
Download and browse data from ESGF with OPeNDAP

OPeNDAP is a data transport architecture that can be used for data download, data browsing and data processing, for example image creation. This page describes data download and browsing in the ESGF via OPeNDAP, in particular how to:

  • Browse attributes (global attributes and variable-specific attributes in the NetCDF file header)
  • Convert the data format to ASCII or dods (binary)
  • Cut out data for a specific area and period
  • Access data with your own software

Access via an ESGF portal

Data access via OPeNDAP is possible from any ESGF portal. Perform a normal ESGF search; the image below shows an example.

Search results list with "Show Files" links

Click on "Show Files" for a file listing.

ESGF search results with opened file list

Click on "OPENDAP" to reach the OPeNDAP Dataset Access Form.

OPeNDAP dataset access form

The OPeNDAP Dataset Access Form consists of a global attributes block and several coordinate variable blocks (in the example time, lat, lat_bnds, lon, lon_bnds), followed by the data variable block (in the example psl). The attributes are taken from the NetCDF file header and can be browsed directly in the blocks.

If you want to cut out an area or period, you may do the following:

  • Enable all the coordinate variables to find out which indices you need (check the checkboxes)
  • Click on the “Get ASCII” button
  • Enter your ESGF OpenID and password when prompted
  • The ASCII output contains the values of the coordinate variables. The same indices are used in the data variable array. Choose an index range
  • Check the data variable checkbox and type in your index ranges there, as done in the screenshot above. Three integers should be set for each coordinate variable: lower boundary index, increment, upper boundary index. If the increment is greater than 1, values are skipped. For example, an increment of 2 means that every second value is taken
  • Click on the “Get ASCII” button again for text format or "Get Binary" for dods. Only these two data formats are available here, not NetCDF
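The search for index ranges described in the steps above can also be scripted once the coordinate values are known. The following is a minimal sketch, not part of any ESGF tool: the function names are hypothetical, and the longitude axis (192 points spaced 1.875 degrees apart, starting at 0) is an assumption chosen so that it matches the longitudes in the example.

```python
def index_range(coords, lower, upper):
    """Return (first, last) indices of the values lying in [lower, upper].

    Assumes a monotonic coordinate axis, so the matching indices are contiguous.
    """
    inside = [i for i, value in enumerate(coords) if lower <= value <= upper]
    if not inside:
        raise ValueError("no coordinate value inside the requested window")
    return inside[0], inside[-1]


def hyperslab(first, last, step=1):
    """Format one index range in the order used by the form: [first:step:last]."""
    return f"[{first}:{step}:{last}]"


# Assumed longitude axis: 192 points, 1.875 degrees apart, starting at 0.
lon = [i * 1.875 for i in range(192)]

first, last = index_range(lon, 84.0, 94.0)
print(hyperslab(first, last))            # [45:1:50]
print(list(range(first, last + 1, 2)))   # indices kept with increment 2: [45, 47, 49]
```

The last line illustrates the effect of the increment: with a value of 2, only every second index of the range is requested.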

Result in text format for the filled-in OPeNDAP Dataset Access Form above:

Dataset {
    Grid {
     ARRAY:
        Float32 psl[time = 1][lat = 6][lon = 6];
     MAPS:
        Float64 time[time = 1];
        Float64 lat[lat = 6];
        Float64 lon[lon = 6];
    } psl;
} cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc;
---------------------------------------------
psl.psl[1][6][6]
[0][0], 101965.19, 101979.19, 101995.44, 102007.69, 102016.19, 102012.69
[0][1], 101990.19, 101997.69, 102004.94, 101997.94, 101986.94, 101978.44
[0][2], 101932.44, 101936.19, 101921.44, 101885.94, 101856.94, 101856.19
[0][3], 101808.69, 101803.44, 101784.69, 101757.19, 101739.44, 101746.69
[0][4], 101676.69, 101653.94, 101638.44, 101634.19, 101638.19, 101645.94
[0][5], 101527.69, 101498.44, 101475.19, 101468.94, 101477.19, 101482.94

psl.time[1]
91311.0

psl.lat[6]
-32.64199447631836, -30.776744842529297, -28.9114933013916, -27.046239852905273, -25.180986404418945, -23.315731048583984

psl.lon[6]
84.375, 86.25, 88.125, 90.0, 91.875, 93.75

You may now copy and paste, for example, the data variable array into a file. If you have chosen "Get Binary", a download window for the dods file pops up.
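Instead of copying values by hand, the text response can be read by a small script. The sketch below assumes the layout shown in the example (a header, a line of dashes, then one block per variable) and returns the values of each block as a flat list. The function name and the file name example.nc in the shortened sample are hypothetical.

```python
import re

# Shortened sample in the layout of the ASCII response shown above.
SAMPLE = """\
Dataset {
    Grid {
     ARRAY:
        Float32 psl[time = 1][lat = 2][lon = 3];
     MAPS:
        Float64 time[time = 1];
        Float64 lat[lat = 2];
        Float64 lon[lon = 3];
    } psl;
} example.nc;
---------------------------------------------
psl.psl[1][2][3]
[0][0], 101965.19, 101979.19, 101995.44
[0][1], 101990.19, 101997.69, 102004.94

psl.time[1]
91311.0

psl.lat[2]
-32.64199447631836, -30.776744842529297

psl.lon[3]
84.375, 86.25, 88.125
"""


def parse_ascii_response(text):
    """Return {block name: flat list of floats} for the data part of the response."""
    lines = text.splitlines()
    # The data part starts after the line that consists only of dashes.
    start = next(i for i, l in enumerate(lines) if set(l.strip()) == {"-"}) + 1
    values = {}
    name = None
    for line in lines[start:]:
        line = line.strip()
        if not line:
            continue
        # Block headers look like "psl.psl[1][2][3]".
        header = re.fullmatch(r"([\w.]+)((?:\[\d+\])+)", line)
        if header:
            name = header.group(1)
            values[name] = []
            continue
        # Rows of multi-dimensional arrays carry an index prefix such as "[0][1],".
        line = re.sub(r"^(\[\d+\])+,\s*", "", line)
        values[name].extend(float(v) for v in line.split(","))
    return values


data = parse_ascii_response(SAMPLE)
print(data["psl.lon"])       # [84.375, 86.25, 88.125]
print(len(data["psl.psl"]))  # 6
```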

Aggregations

Usually, data are divided into files of a size that is reasonable for downloading whole files. The split is made along the time coordinate, i.e. each file contains data for one or a few years only. Since downloading whole files is not the main purpose of OPeNDAP, concatenated time series, called aggregations, have been made accessible via OPeNDAP.

Aggregations cannot be found in portals; they are only available from ESGF data nodes. As a rule, an aggregation is offered only by the data node that stores the corresponding non-aggregated data, so an ESGF portal can be used to identify the right data node. Go to the THREDDS catalog of that data node and browse it. When you have found the right dataset, click on its link to get the file list. For the example above, the beginning of the file list is shown in the screenshot below.

File list in the ESGF THREDDS catalog

Scroll down the list until you find the aggregation you need. In the example below, the link to the aggregation has the extension .aggregation.

Link to aggregation in the ESGF THREDDS catalog

Aggregations may be divided into several parts, which are of course longer than the time period of a single non-aggregated file. The aggregation link leads to the page shown in the screenshot below.

Landing page for an aggregation in the ESGF THREDDS catalog

The time period of the aggregation is given in the "Time Coverage" section. Clicking on the link in the "Access" section opens the aggregation's OPeNDAP Dataset Access Form. The form is used in the same way as for non-aggregated data.

OPeNDAP data URL

The filled-in OPeNDAP Dataset Access Form in the example above leads to the following URL if "Get ASCII" is pressed:

https://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc.ascii?psl[0:1:0][30:1:35][45:1:50]

After the file extension .ascii and a question mark, the URL contains the variable name (in the example "psl") followed by the variable's index ranges. Such a URL may be used, for example, by a program for direct data processing. Index ranges and file extension may be changed:

  • .dods instead of .ascii points to the binary file
  • .dds points to the Dataset Descriptor Structure file, which is identical to the text header of the dods file
  • .das points to the Data Attribute Structure file, which contains the attributes (text format)
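A script can assemble such URLs from the dataset URL, the extension, the variable name and the index ranges. The following is a minimal sketch; the function name opendap_url and its parameters are hypothetical, while the dataset URL and index ranges are those of the example above.

```python
def opendap_url(dataset_url, extension, variable=None, ranges=()):
    """Build an OPeNDAP request URL.

    ranges holds one (first, step, last) tuple per dimension of the variable.
    Without a variable, the URL refers to the whole dataset.
    """
    url = f"{dataset_url}.{extension}"
    if variable is not None:
        slab = "".join(f"[{first}:{step}:{last}]" for first, step, last in ranges)
        url += f"?{variable}{slab}"
    return url


BASE = ("https://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/"
        "rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/"
        "psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc")
SUBSET = [(0, 1, 0), (30, 1, 35), (45, 1, 50)]  # time, lat, lon

for ext in ("ascii", "dods", "dds"):
    print(opendap_url(BASE, ext, "psl", SUBSET))
print(opendap_url(BASE, "das"))
```

Note that the square brackets and the question mark are special characters in most shells, so a URL built this way should be quoted when it is passed to a command-line tool.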

The Dataset Descriptor Structure (DDS) for the example above:

Dataset {
    Grid {
     ARRAY:
        Float32 psl[time = 1][lat = 6][lon = 6];
     MAPS:
        Float64 time[time = 1];
        Float64 lat[lat = 6];
        Float64 lon[lon = 6];
    } psl;
} cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc;
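Because the DDS is plain text, a script can read the shape of a subset from it before requesting any data. The sketch below is one possible approach using a regular expression; the function name is hypothetical, and the file name example.nc stands in for the long dataset path.

```python
import re

DDS = """\
Dataset {
    Grid {
     ARRAY:
        Float32 psl[time = 1][lat = 6][lon = 6];
     MAPS:
        Float64 time[time = 1];
        Float64 lat[lat = 6];
        Float64 lon[lon = 6];
    } psl;
} example.nc;
"""


def dds_shapes(dds_text):
    """Return {variable name: [(dimension name, size), ...]} for array declarations."""
    shapes = {}
    for line in dds_text.splitlines():
        # Matches declarations such as "Float32 psl[time = 1][lat = 6][lon = 6];"
        decl = re.match(r"\s*(\w+)\s+(\w+)((?:\[\w+ = \d+\])+);", line)
        if decl:
            dims = re.findall(r"\[(\w+) = (\d+)\]", decl.group(3))
            shapes[decl.group(2)] = [(dim, int(size)) for dim, size in dims]
    return shapes


print(dds_shapes(DDS)["psl"])  # [('time', 1), ('lat', 6), ('lon', 6)]
```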

Access data with the command line via OPeNDAP

OPeNDAP data URLs may be used with local software, for example your own script. Since data access in ESGF is restricted to registered users, valid credentials have to be sent with your requests. These credentials can be created on the command line, either as a side effect of downloading a single file with an ESGF Wget script or with the following myproxy command:

myproxy-logon -s <my_ESGF_portal> -l <username> -b -T -t 72 -o ~/.esg/credentials.pem

<my_ESGF_portal> is the DNS name of the portal which you used to create your ESGF account, for example pcmdi.llnl.gov; <username> is not the complete OpenID but its last part only, your user name. Some Linux distributions offer a package myproxy, which also contains the myproxy-logon tool. ESGF Wget scripts and myproxy-logon create and fetch all needed credentials or renew expired local certificates. ESGF Wget scripts automatically create the credentials directory with name .esg in your HOME directory whereas myproxy-logon expects an existing directory .esg in your HOME. In .esg, the file credentials.pem contains two certificates and the private key you need for data access.

In ESGF, user certificates are short-term certificates valid for 72 hours at most. The exact value depends on the settings of the Identity Provider (IdP) that issued your OpenID. In a UNIX shell, you can check the period of validity with the following command:

openssl x509 -text -noout -in $HOME/.esg/credentials.pem

The period of validity is printed to standard output (console) among other output. Example:

        Validity
            Not Before: Jun 24 16:23:10 2016 GMT
            Not After : Jun 27 16:28:10 2016 GMT
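A script can check these two lines before starting a long series of requests. The following sketch parses text in the format of the example above; the function names are hypothetical, and the month names are assumed to be in English (C locale), as printed by openssl.

```python
import re
from datetime import datetime, timezone

STAMP = "%b %d %H:%M:%S %Y"  # e.g. "Jun 24 16:23:10 2016" (English month names assumed)

SAMPLE = """\
        Validity
            Not Before: Jun 24 16:23:10 2016 GMT
            Not After : Jun 27 16:28:10 2016 GMT
"""


def parse_validity(openssl_text):
    """Return (not_before, not_after) as timezone-aware UTC datetimes."""
    found = {}
    for label in ("Not Before", "Not After"):
        match = re.search(rf"{label}\s*:\s*(.+?)\s+GMT", openssl_text)
        if match is None:
            raise ValueError(f"'{label}' not found in openssl output")
        stamp = datetime.strptime(match.group(1), STAMP)
        found[label] = stamp.replace(tzinfo=timezone.utc)
    return found["Not Before"], found["Not After"]


def is_valid(openssl_text, now=None):
    """True if 'now' (default: current UTC time) lies inside the validity period."""
    not_before, not_after = parse_validity(openssl_text)
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after


not_before, not_after = parse_validity(SAMPLE)
print(not_after - not_before)  # 3 days, 0:05:00
```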

If you only want to create or renew your certificate with the help of an ESGF Wget script, choose a small data file for download, e.g. a fixed-field file. For example, the surface altitude (variable orog) is time-independent and, hence, orog files are small.

Next, you need an OPeNDAP configuration file .dodsrc in your HOME directory. It can be generated, for example, with the following UNIX command:

cat > ${HOME}/.dodsrc << EOF
HTTP.COOKIEJAR=${HOME}/.esg/dods_cookies
HTTP.SSL.VALIDATE=0
HTTP.SSL.CERTIFICATE=${HOME}/.esg/credentials.pem
HTTP.SSL.KEY=${HOME}/.esg/credentials.pem
HTTP.SSL.CAPATH=${HOME}/.esg/credentials.pem
EOF
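If the configuration file is to be created from a Python script instead, the same five settings can be written as follows. This is only a sketch mirroring the listing above: the function names and the directory /home/alice are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import Path, PurePosixPath


def dodsrc_text(home):
    """Return the .dodsrc content for the given HOME directory (POSIX paths)."""
    esg = PurePosixPath(home) / ".esg"
    pem = esg / "credentials.pem"
    settings = [
        ("HTTP.COOKIEJAR", esg / "dods_cookies"),
        ("HTTP.SSL.VALIDATE", 0),
        ("HTTP.SSL.CERTIFICATE", pem),
        ("HTTP.SSL.KEY", pem),
        ("HTTP.SSL.CAPATH", pem),
    ]
    return "".join(f"{key}={value}\n" for key, value in settings)


def write_dodsrc(home=None):
    """Write .dodsrc into HOME (default: the current user's home directory)."""
    home = Path(home) if home is not None else Path.home()
    target = home / ".dodsrc"
    target.write_text(dodsrc_text(home.as_posix()))
    return target


# Show the content for a hypothetical home directory without writing a file.
print(dodsrc_text("/home/alice"), end="")
```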

With these preparations, access to ESGF OPeNDAP data should be possible. For example, ESGF OPeNDAP data can be processed directly with ncdump:

ncdump -h http://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc

ncdump belongs to the NetCDF software and converts the binary NetCDF file to text. The option -h causes ncdump to output the file header only.

A second example: Use of Climate Data Operators (CDO)

cdo showformat http://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc

cdo showformat simply outputs the format of the specified climate data file.

Also possible: Download using the Wget command

wget --certificate=${HOME}/.esg/credentials.pem --private-key=${HOME}/.esg/credentials.pem --ca-certificate=${HOME}/.esg/credentials.pem --no-check-certificate "http://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/6hr/atmos/6hrPlev/r1i1p1/v20111006/psl/psl_6hrPlev_MPI-ESM-LR_rcp45_r1i1p1_2100010100-2100123118.nc.ascii?psl[0:1:0][30:1:35][45:1:50]"

This Wget command writes the same text file as shown above in the first text box. .dods, .dds and .das files can be created using the corresponding file extension in the command.

The credentials directory .esg may also be copied from another computer where it already exists.

Own Python scripts

The esgf-pyclient package enables data access via OPeNDAP and also contains an interface to the ESGF Search API and a helper function for login. A good starting point for your own script using esgf-pyclient is Carsten Ehbrecht's demo notebook. Once installed, this IPython notebook can be run in a web browser. It is an interactive worksheet that lets you run search, login, data access and processing step by step. You can also experiment with changes to the demo script.

To install the demo notebook, go to Carsten's GitHub repository, press the green button "Clone or download" to get the software, and follow the instructions in the README.md file, i.e. install Conda and run the three given initialization commands.

Last Update: Jan. 4, 2017, 11:40 a.m. by Torsten Rathmann