This FAQ focuses on questions concerning data, e.g. data format, data processing, CMIP and CORDEX data.
Each ESGF question is assigned to exactly one topic. Questions of general interest, and questions matching several topics, are collected under the topic ESGF General.
Results of climate model runs depend on the starting point of the calculation, on the initialisation method and on the model physics. Ensemble calculations facilitate quantifying the variability of simulation data for a single model. In the CMIP and CORDEX projects, ensemble members are named using the rip nomenclature: r for realization (starting point), i for initialization, p for physics, each followed by an integer, e.g. "r1i1p1". More
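For scripting, a rip label can be split into its components; a minimal sketch (the helper name parse_rip is ours, not part of any ESGF tool):

```python
import re

# Hypothetical helper (not part of any ESGF tool): split an ensemble member
# label such as "r1i1p1" into its realization, initialization and physics numbers.
def parse_rip(label):
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)", label)
    if match is None:
        raise ValueError(f"not a rip label: {label!r}")
    r, i, p = (int(g) for g in match.groups())
    return {"realization": r, "initialization": i, "physics": p}

print(parse_rip("r1i1p1"))   # → {'realization': 1, 'initialization': 1, 'physics': 1}
```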
Means over several ensemble members are not available in the ESGF. You may download the individual ensemble members and calculate the mean yourself, e.g. with the Climate Data Operators (CDO).
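With CDO the corresponding operator is ensmean, e.g. cdo ensmean tas_r1i1p1.nc tas_r2i1p1.nc tas_ensmean.nc (filenames hypothetical). The operation itself is just a pointwise average over the members, sketched here with NumPy stand-in arrays:

```python
import numpy as np

# Stand-ins for the same variable from three ensemble members (e.g. r1i1p1,
# r2i1p1, r3i1p1), here as tiny 2x2 arrays instead of NetCDF files.
r1 = np.array([[1.0, 2.0], [3.0, 4.0]])
r2 = np.array([[3.0, 2.0], [1.0, 0.0]])
r3 = np.array([[2.0, 2.0], [2.0, 2.0]])

# The ensemble mean is the pointwise average over the members.
ens_mean = np.mean([r1, r2, r3], axis=0)
print(ens_mean)   # every point averages to 2.0 in this example
```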
A generally recommended model doesn't exist in the CMIP and CORDEX projects. Many researchers take data from more than one model and also more than one ensemble member per model and calculate a mean or plot them together to have a measure for the deviations of the models.
CMIP and CORDEX variables follow the CF standard, including the CF Standard Name Table (CF stands for "Climate and Forecast"). The table also contains a definition of each variable. Additional information is in the variable requirement tables of the projects; see the next question.
The search category Variable only contains abbreviated variable names. The CF Standard Name follows the CF Standard Name Table (CF stands for "Climate and Forecast"). For CMIP5 variables, the relation between all three (Variable, CF Standard Name and Variable Long Name) is tabulated in the CMIP5 Standard Output document; for CORDEX variables, see the CORDEX Variables Requirement Table.
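As an illustration, the relation can be treated as a simple lookup table; the two example entries below follow the CMIP5 Standard Output (assembled here by hand as an excerpt, not the full table):

```python
# Illustrative excerpt of the Variable / CF Standard Name / Variable Long Name
# relation; the complete mapping is in the CMIP5 Standard Output document.
cmip5_variables = {
    "tas":   ("air_temperature",    "Near-Surface Air Temperature"),
    "sftlf": ("land_area_fraction", "Land Area Fraction"),
}

standard_name, long_name = cmip5_variables["tas"]
print(standard_name)   # → air_temperature
```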
In the CMIP5 project, Near-Term (10-30 years) and Long-Term (century and longer) climate simulations have been performed; many models performed both. Some of the decadal experiments are Near-Term future scenarios. The CMIP5 Long-Term scenarios are the Representative Concentration Pathways (RCPs), which represent the full bandwidth of future emission trajectories for the years 2006-2100, some continued until 2300. [More information]
CMIP5 historicalAA, historical data with anthropogenic aerosol forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = AA" in the metadata of the search results.
CMIP5 historicalLU, historical data with land-use change forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = LU" in the metadata of the search results.
CMIP5 historicalSl, historical data with solar forcing only, can be found in the historicalMisc experiment. Select historicalMisc and look for "Forcing = Sl" in the metadata of the search results.
An overview of which CMIP5 data for historicalSl and other forcings should exist can be found in the tables compiled by Gavin Schmidt.
Climate model output used in the IPCC's Fifth Assessment Report (AR5) is a subset of CMIP5 data. Two snapshots of these data were taken for documentation. Both are based on the status of CMIP5 data on March 15, 2013, the cutoff date for literature to be included in the Working Group I report Climate Change 2013: The Physical Science Basis. Data updates since March 15, 2013, are not included in the snapshots. A more detailed description, including links to the access points of the two snapshots, can be found on the AR5 GCM data page of the Data Distribution Centre (DDC).
Unless you really need the frozen data with the March 15, 2013 cutoff, we recommend current CMIP5 data, because erroneous CMIP5 data have usually been corrected by the publication of a new version. CMIP5 data can be downloaded from the ESGF.
The SRES scenarios (Special Report on Emission Scenarios, for the Third Assessment Report) belong to CMIP3. CMIP3 data are in the ESGF now. In ESGF search, select project=CMIP3 and, for example, experiment=sresa1b.
CORDEX 3hr and 6hr data are usually not in the ESGF but locally stored at the modeling centers according to the CORDEX Archive Design. Please contact the modeling groups.
A central database with descriptions of CORDEX regional climate models does not exist. Nevertheless, every CORDEX data file has a header with a global attribute "references", which usually contains a web address. You may see this and other attributes without a file download: Simply select a CORDEX file in an ESGF portal, follow the OPENDAP link and search the section "Global Attributes".
"landfrac" is not a variable name in ESGF. Please look for variable "sftlf", standard name "land_area_fraction". This is the land sea mask of the model in the projects CMIP5, CORDEX, GeoMIP, LUCID, PMIP3, ...
The following table lists the grid resolutions, i.e. the distance between adjacent grid points in degrees.
| Model | Atmospheric grid: lat (°) | Atmospheric grid: lon (°) | Ocean grid |
|---|---|---|---|
| CESM1(FASTCHEM) | 0.9424 | 1.25 | only time-independent ocean data |
| CSIRO-Mk3L-1-2 | 3.1857 | 5.625 | only time-independent ocean data |
| CanAM4 | 2.7906 | 2.8125 | no ocean data |
| HadGEM2-A | 1.25 | 1.875 | no ocean data |
| MPI-ESM-LR | 1.8653 | 1.875 | orthogonal curvilinear coordinates lat(i,j) and lon(i,j) |
| MRI-AGCM3-2H | 0.562 | 0.5625 | no ocean data |
| MRI-AGCM3-2S | 0.188 | 0.1875 | no ocean data |
For the atmospheric grid, the tabulated latitude resolution is only valid in the equator region; deviations may occur at higher latitudes.
Ocean models have their own, finer grid. If two values are given for the latitude resolution of the ocean grid, the resolution is not constant: the first value applies at the equator, the second at the poles (the maximum of the two poles if they differ). For rotated poles, the resolutions of the rotated coordinates rlon and rlat are tabulated. If latitude and longitude are defined with two indices i and j, the resolution cannot simply be read off; in this case "lat(i,j) and lon(i,j)" has been entered.
MPI-M ocean data are stored upside down, because MPI-M historically stored data from North to South (positive to negative latitude values). Additionally, a curvilinear grid with the North Pole over Greenland is used.
Solution: Use Climate Data Operator remapbil:
cdo remapbil,r240x220 inputfile.nc outputfile.nc
More details in the CDO documentation.
No, see the table below.
| Model | Calendar | Experiments |
|---|---|---|
| GFDL-CM3 | 365_day | all but amip: julian |
| IPSL-CM5A-LR | 365_day | all but aqua4K, aqua4xCO2, aquaControl, past1000: 360_day |
| IPSL-CM5B-LR | 365_day | all but aquaControl: 360_day |
| MIROC-ESM | proleptic_gregorian | 1pctCO2, abrupt4xCO2, past1000 |
| MIROC-ESM | gregorian | esmControl, esmFixClim2, esmHistorical, lgm, midHolocene, piControl, rcp26, rcp45, rcp60, rcp85, esmrcp85, historical, historicalGHG, historicalNat |
| MIROC4h | gregorian | all but piControl: 365_day |
| MIROC5 | 360_day | aqua4K, aqua4xCO2, aquaControl |
| MIROC5 | 365_day | 1pctCO2, abrupt4xCO2, amip, amip4K, amip4xCO2, amipFuture, historical, piControl, rcp26, rcp45, rcp60, rcp85, sstClim, sstClim4xCO2, sstClimAerosol, sstClimSulfate |
The values in the table have been taken from the calendar attributes of the NetCDF files. Since the calendars "standard" and "gregorian" are identical, as are "noleap" and "365_day", only the latter are used in the table. CMIP5 calendars are defined in the CF standard and in the CMIP5 Model Output Requirements.
Our most up-to-date paper describing ESGF can be found here.
For many CMIP5 datasets a DataCite DOI has been assigned, providing persistent citation information; these data may therefore be cited. There are two ways to find the corresponding DOIs:
Yes, if you select matching ensemble members. Look into the header of the RCP data file: the attributes parent_experiment_id and parent_experiment_rip name the correct ensemble member for the combination. [Background information]
CORDEX data: The height level is part of the short variable name. For example, ta500 is the air temperature at the 500 hPa pressure level.
Before download with OPeNDAP: Expand the dataset you need with "Show Files" and click on "OPENDAP". In the OPeNDAP Dataset Access Form look for lev and enable it. Click on "Get ASCII" and login. The lev array with the height levels will be listed.
After download: Use local software, for example ncdump, which is a command line tool belonging to NetCDF software.
ncdump -c filename.nc
The option -c causes ncdump to output header and coordinate arrays.
CORDEX offers cloud fraction variables for the following three height layers.
|Variable name||Lower boundary in Pa||Upper boundary in Pa|
The height boundaries for the three layers are given as pressure levels and are defined in the CORDEX Archive Design document. The height boundaries of the layer are also stored in the netCDF file in the variable plev_bnds.
The Network Common Data Format (NetCDF) is a binary data format for the exchange of scientific data; a NetCDF file consists of a header and a data part. The header contains, besides attributes, the structure of the data part. The data themselves are stored in arrays in the data part, which enables quick access.
Data variables are defined by means of coordinate variables, for example the near-surface air temperature tas is defined as a function of time, latitude and longitude.
tas(time, lat, lon)
Inside a data array the data are ordered as follows:
For Mathematicians: The order inside the array corresponds to the lexicographic order of its index set. The index set of the data variable is the Cartesian product of the index sets of the coordinate variables, for example

I_tas = I_time × I_lat × I_lon
The definition of the data variable in the file header contains the manner and sequence of the coordinate variables.
For Programmers: The first value in the tas array is the value for the first time, first lat and first lon. The second value is that for the first time, first lat and second lon; then the tas values for the other longitudes follow. If the number of longitudes is only 2, the value for the first time, second lat and first lon follows next. If the number of latitudes is also 2, the first tas value for the second time appears at position 5.
| 1 1 1 | 1 1 2 | 1 2 1 | 1 2 2 | 2 1 1 | ...
Technically speaking, the values are written to the array in a nested loop: the innermost loop is lon, the outermost is time, with lat in the middle.
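This C-order (row-major) layout can be checked with NumPy; a small sketch with a 2×2×2 stand-in array:

```python
import numpy as np

# A tiny stand-in for tas(time, lat, lon) with 2 times, 2 latitudes and
# 2 longitudes; the values 0..7 mark the storage order.
tas = np.arange(8).reshape(2, 2, 2)

# Flattening in C order (row-major) reproduces the nested-loop order
# described above: lon varies fastest, then lat, then time.
flat = tas.flatten(order="C")
print(flat)            # → [0 1 2 3 4 5 6 7]
print(tas[0, 0, 1])    # second stored value → 1
print(tas[1, 0, 0])    # first value of the second time, at position 5 → 4
```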
Solution 1: You may compare the version of the data. The version is part of the metadata and can be found in the ESGF portals. It is also printed in the NetCDF header.
Solution 2: ESGF offers a comfortable comparison using Wget scripting. Keep your Wget script after download and again run it with the -u option.
bash wget-###############.sh -u
This does not repeat the download but creates a new version of the download script. The old and new script versions are compared, and this comparison includes the checksums in the download file lists of both scripts. A changed file checksum is a hint that a new dataset version exists.
Solution 3: Sometimes data producers replace data without updating the version number in case of minor changes. In ESGF this is not allowed and fortunately rare. Ruling out these hidden changes is tedious: you may compare the checksum of your downloaded file with that of a freshly downloaded file. Checksums may be calculated with md5sum:
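On the command line this is md5sum filename.nc; the same checksum can be computed in Python with hashlib (a minimal sketch; the chunked read keeps memory use low for large NetCDF files):

```python
import hashlib

# Compute a file's MD5 checksum, equivalent to `md5sum filename.nc`;
# reading in chunks avoids loading large NetCDF files into memory at once.
def md5_of(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Compare the returned hex digest with the checksum shown in the ESGF metadata of the file.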
Data downloaded from ESGF are usually in NetCDF format. NetCDF is a header based binary format and can be read/processed by
An exception is OPeNDAP download. Here you can get ASCII CSV, i.e. readable text (comma-separated values), or DODS (the binary OPeNDAP data format). ASCII CSV can be imported directly into, for example, Microsoft Excel.
There might be several reasons and solutions for this issue:
Solution 1: If you have downloaded the file with your browser's download manager (following an HTTPServer link), compare the checksum of your downloaded file with that in the metadata. If the checksums differ, repeat the download, since the file may have been corrupted during download. ESGF Wget scripts perform this check automatically.
Solution 2: Many data, especially CORDEX data, are stored in the format NetCDF4 or compressed NetCDF4. Ensure that your local software can handle this relatively new data format.
A multi-year average for each month of year can easily be calculated with CDO ymonavg. Example:
cdo splityear OH_Amon_ULAQ_rcp45_r1i1p1_196001-210012.nc OH_   # split into years
cdo cat OH_1995.nc OH_1996.nc OH_1997.nc OH_1998.nc OH_1999.nc OH_2000.nc OH_2001.nc OH_2002.nc OH_2003.nc OH_2004.nc OH_Amon_ULAQ_rcp45_r1i1p1_1995-2004.nc   # concatenate ten years into one file
cdo ymonavg OH_Amon_ULAQ_rcp45_r1i1p1_1995-2004.nc OH_average_over_1995-2004_ULAQ_rcp45_r1i1p1_Jan-Dec.nc   # calculate the multi-year average for each month
More details are in the CDO documentation.
Some native CORDEX grids have rotated poles, for example the native European domains EUR-44 and EUR-11. They can easily be regridded (rotated back).
Solution 1: Use interpolated data
Interpolated data are in the domains with "i" at the end, e.g. EUR-44i. These data already have a grid which has been rotated back.
Solution 2: Use cf-python
cf-python uses the ESMF regridding library as its regridding engine and currently provides first-order conservative (the default) or bilinear spherical regridding. CORDEX data are usually NetCDF/CF compliant, so cf-python only needs the following commands:
import cf
rotated_fields = cf.read('rotated_pole_file.nc')
unrotated_field = cf.read('unrotated_latlon_file.nc')
regridded_fields = rotated_fields.regrids(unrotated_field)

The rotated_fields may have more dimensions than just rotated latitude (X) and rotated longitude (Y). The regrids call will regrid each X-Y slice, so regridded_fields will have the same rank as the original.
More details are in the cf-python documentation.
Solution 3: Use CDO
Climate Data Operators (CDO) offer different ways of regridding, for example cdo rotuvb can perform a backward transformation of velocity components U and V from a rotated spherical system to a geographical system. More details in the CDO documentation.