University of Colorado
This page is a guide for ESGF administrators about how to configure their local node to enable downloads of both restricted and public data through Globus.
This guide supports both a full node configuration (Index+IdP+Data) and a split Index+IdP versus data node configuration. Each step below will indicate on which node it needs to be executed.
Note: the same set of Globus credentials can be used when running the ESGF installer on the Index+IdP and Data nodes.
The ESGF installer installs an up-to-date version of Globus Connect Server, but to do so it requires a valid Globus account to associate with the node. The node uses this account to submit data transfer requests on behalf of the user. Before running the installer, you must therefore obtain a Globus username and password (by visiting the Globus website) to supply at installation time. For example:
The choice of the Globus username is important, as it will be the first part of the name of every endpoint set up on the node. For example:
Node: Data node.
In order to be downloadable through Globus, datasets must be published into the ESGF system with Globus URLs. This can be achieved by setting:
thredds_file_services =
    HTTPServer | /thredds/fileServer/ | TDSat<node> | fileservice
    OpenDAP | /thredds/dodsC/ | OpenDAPat<node> | fileservice
    GridFTP | gsiftp://<hostname>:2811/ | GRIDFTP | fileservice
    Globus | globus:<UUID>/ | Globus | fileservice
in the esg.ini file, for example: "globus:b7a8fa70-71d1-11e5-ba4c-22000b92c6ec/". The UUID of the Globus endpoint can be obtained from the Globus website, https://www.globus.org/app/endpoints?scope=my-endpoints.
If you have already published some datasets without Globus URLs, you can run the script https://github.com/ESGF/esgf-utils/blob/master/globus/add_globus_urls.py to add the Globus URLs to the THREDDS catalogs and re-harvest them, without republishing all of the datasets.
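As a minimal sketch of how the Globus entry is composed, the following Python snippet validates an endpoint UUID and formats the corresponding thredds_file_services line. The helper name globus_service_line is hypothetical and is not part of the ESGF tooling; only the field layout is taken from the setting above.

```python
import uuid


def globus_service_line(endpoint_uuid: str) -> str:
    """Format the Globus entry of thredds_file_services for an endpoint UUID.

    Hypothetical helper for illustration only; not part of the ESGF publisher.
    """
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    canonical = str(uuid.UUID(endpoint_uuid))
    return f"Globus | globus:{canonical}/ | Globus | fileservice"


if __name__ == "__main__":
    # UUID taken from the example in this guide.
    print(globus_service_line("b7a8fa70-71d1-11e5-ba4c-22000b92c6ec"))
```

Validating the UUID before editing esg.ini can catch copy-and-paste errors that would otherwise only show up after publication.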
. /etc/esg.env
python add_globus_urls.py
Node: Index+IdP node.
The node where CoG is running must be registered as a client that is authorized to submit data transfer requests to the Globus service on behalf of the user. To register the CoG app, go to https://developers.globus.org, click "Register your app with Globus", and create or add an "ESGF" project. Click the "Manage Project" drop-down, select "Add new app", and fill out the registration form with the following information:
Submit the registration request by clicking "Create App". Scroll down to "Client secret", enter "Globus download" and click "Generate Secret". Save the "Client Secret" and "Client ID" which will be needed in the next step.
Node: Index+IdP node.
CoG needs access to the Globus client ID and secret to be able to request tokens. The following section must be added to the node configuration file, /usr/local/cog/cog_config/cog_settings.cfg, on the Index+IdP node where CoG is running (the values shown are only examples; replace them with your own Globus client ID and secret):
[GLOBUS]
OAUTH_CLIENT_ID = 12345678-9012-3456-7890-123456789012
OAUTH_CLIENT_SECRET = 2345yujhbe3456yuhgfd45234yujhfd3Gev28gFWeBWE42=
ENDPOINTS = /esg/config/esgf_endpoints.xml
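A quick way to confirm that the section was added correctly is to parse it with Python's standard configparser module. The sketch below is illustrative only: the function name missing_globus_settings is hypothetical, and it assumes the [GLOBUS] section can be read as plain INI text.

```python
import configparser

REQUIRED_KEYS = ("OAUTH_CLIENT_ID", "OAUTH_CLIENT_SECRET", "ENDPOINTS")


def missing_globus_settings(cfg_text: str) -> list:
    """Return the required [GLOBUS] keys that are absent or empty.

    Hypothetical check for illustration; CoG itself does not ship this helper.
    """
    # Disable interpolation so that a '%' inside the secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(cfg_text)
    if not parser.has_section("GLOBUS"):
        return list(REQUIRED_KEYS)
    section = parser["GLOBUS"]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]


if __name__ == "__main__":
    # Path taken from this guide; reading it requires access to the node.
    with open("/usr/local/cog/cog_config/cog_settings.cfg") as handle:
        print(missing_globus_settings(handle.read()))
```

An empty list means all three settings are present; any names returned still need to be filled in.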
An empty /esg/config/esgf_endpoints.xml file must also be created, with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<endpoints xmlns="http://www.esgf.org/whitelist">
</endpoints>
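To check that the file is well-formed and uses the expected namespace, a short Python sketch along the following lines could be used. The function name count_endpoints is hypothetical; the namespace URI is the one shown in the file above.

```python
import xml.etree.ElementTree as ET

WHITELIST_NS = "http://www.esgf.org/whitelist"

EMPTY_ENDPOINTS_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<endpoints xmlns="http://www.esgf.org/whitelist">\n'
    "</endpoints>\n"
)


def count_endpoints(xml_text: str) -> int:
    """Parse an endpoints file and return the number of child elements.

    Raises ValueError if the root element is not <endpoints> in the
    whitelist namespace. Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    expected_tag = f"{{{WHITELIST_NS}}}endpoints"
    if root.tag != expected_tag:
        raise ValueError(f"unexpected root element: {root.tag}")
    return len(root)


if __name__ == "__main__":
    print(count_endpoints(EMPTY_ENDPOINTS_XML))
```

For the empty file required here, the function returns 0.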
The file is a part of a legacy implementation of mapping GridFTP URLs to Globus URLs. The legacy implementation will be removed in the next release.
Public datasets are served through a so-called "shared" Globus endpoint. The shared endpoint is created from the Globus endpoint described above for restricted datasets. All public datasets will be accessed and downloaded on behalf of a selected ESGF user who has access to a project with public datasets (that is, who is a member of the project group). In this document, we assume that the user is https://<idp_hostname>/esgf-idp/openid/rootAdmin; however, it is strongly advised to create a separate, dedicated ESGF user account for accessing public datasets. To enable Globus downloads for public datasets, some configuration changes are required in addition to steps 1, 2, and 5 described above for restricted datasets.
Node: Data node.
At this time, the Globus Connect Server (GCS) installed by ESGF must be specially configured to allow access to shared data.
cat /etc/grid-security/grid-mapfile
"/O=ESGF/OU=ESGF.ORG/CN=https://<idp_hostname>/esgf-idp/openid/rootAdmin" sharer
"^.*$" globus

for example:

"/O=ESGF/OU=ESGF.ORG/CN=https://esgf-node.jpl.nasa.gov/esgf-idp/openid/rootAdmin" sharer
"^.*$" globus
Note that the OpenId inside the DN refers to the rootAdmin account on the Index+IdP node: X.509 credentials for "rootAdmin" must be obtained from the IdP node, and they will be mapped to the "sharer" Unix account on the Data node.
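The DN-to-account mapping described above can be inspected with a few lines of Python. This is only a sketch: parse_grid_mapfile is a hypothetical helper, and it assumes each non-comment line holds exactly one quoted DN (or DN pattern) followed by one local account name, as in the example.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict:
    """Map each quoted DN (or DN pattern) to its local Unix account.

    Hypothetical helper; assumes one DN and one account per line.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex honours the double quotes around the DN.
        dn, account = shlex.split(line)
        mapping[dn] = account
    return mapping


if __name__ == "__main__":
    # Path taken from this guide; reading it requires access to the Data node.
    with open("/etc/grid-security/grid-mapfile") as handle:
        for dn, account in parse_grid_mapfile(handle.read()).items():
            print(account, "<-", dn)
```

With the example entries, the rootAdmin DN resolves to the "sharer" account and the catch-all pattern resolves to "globus".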
cat /etc/gridftp.d/globus-connect-server-sharing-esgf
sharing_dn "/C=US/O=Globus Consortium/OU=Globus Online/OU=Transfer User/CN=__transfer__"
sharing_rp R/esg_dataroot/<project_with_public_datasets>
sharing_state_dir /etc/grid-security/sharing/$USER
sharing_users_allow sharer
sharing_users_deny globus
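As a rough sanity check of the sharing configuration, the file can be read into a dictionary and compared with the values this guide expects. The sketch below is an assumption-laden illustration: parse_sharing_config and user_list are hypothetical helpers, and the user lists are assumed to be comma-separated.

```python
import shlex


def parse_sharing_config(text: str) -> dict:
    """Read 'option value' lines of a GridFTP sharing config into a dict.

    Hypothetical helper for illustration only.
    """
    options = {}
    for line in text.splitlines():
        # comments=True drops anything after a '#'; quotes keep the DN whole.
        tokens = shlex.split(line, comments=True)
        if tokens:
            options[tokens[0]] = " ".join(tokens[1:])
    return options


def user_list(options: dict, key: str) -> list:
    """Split a user-list option, assumed here to be comma-separated."""
    return [name for name in options.get(key, "").split(",") if name]


if __name__ == "__main__":
    # Path taken from this guide; reading it requires access to the Data node.
    with open("/etc/gridftp.d/globus-connect-server-sharing-esgf") as handle:
        opts = parse_sharing_config(handle.read())
    print("allowed:", user_list(opts, "sharing_users_allow"))
    print("denied:", user_list(opts, "sharing_users_deny"))
```

For the configuration shown above, "sharer" should appear in the allowed list and "globus" in the denied list.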
Node: Data node.
During Globus setup, the ESGF installer creates and registers a default public endpoint for the node. This endpoint must be activated using any ESGF account on the system, for example the "rootAdmin" account that is created at installation time (the account is only used to retrieve valid credentials from the MyProxy server).
Node: Data node.
Once GCS is up and running on the node, the Node Administrator must create a "shared" endpoint that users can use to download data without any further authentication/authorization. In other words, a "shared" endpoint is suitable for serving public data, and does not need to be manually activated every time a user submits a data transfer request (it is automatically activated by the node through cached credentials).
First, you must create a "sharer" home directory where the shared endpoint information can be stored:
sudo mkdir -p /esg/gridftp_root/home/sharer
sudo chown -R sharer:sharer /esg/gridftp_root/home/sharer
Then, you must create a shared endpoint using the Globus website:
Note that after the shared endpoint has been successfully created, there will be a new configuration file stored in the above directory, of the form: /esg/gridftp_root/home/sharer/.globus/sharing/share-xxx....
Node: Data node.
In order to be downloadable through Globus, datasets must be published into the ESGF system with Globus URLs pointing to the shared endpoint. This can be achieved by setting:
thredds_file_services =
    HTTPServer | /thredds/fileServer/ | TDSat<node> | fileservice
    OpenDAP | /thredds/dodsC/ | OpenDAPat<node> | fileservice
    GridFTP | gsiftp://<hostname>:2811/ | GRIDFTP | fileservice
    # Globus endpoint for restricted datasets
    #Globus | globus:<UUID>/ | Globus | fileservice
    # Globus shared endpoint for public datasets
    Globus | globus:<UUID_of_the_shared_endpoint> | Globus | fileservice
in the esg.ini file, for example: "globus:2854feb6-bb21-11e5-9a07-22000b96db58/". The UUID of the shared Globus endpoint can be obtained from the Globus website, https://www.globus.org/app/endpoints?scope=my-endpoints.
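Because the restricted-endpoint entry is commented out in the setting above, it can be useful to confirm which Globus endpoint is actually active before publishing. The following sketch extracts the UUIDs of the uncommented Globus entries from the value of thredds_file_services. The function name active_globus_uuids is hypothetical, and the parsing is a simplification that does not reproduce how the ESGF publisher reads esg.ini.

```python
def active_globus_uuids(value: str) -> list:
    """Return endpoint UUIDs of uncommented Globus entries.

    `value` is the multi-line text assigned to thredds_file_services.
    Hypothetical helper for illustration only.
    """
    uuids = []
    for line in value.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split("|")]
        if (
            len(fields) == 4
            and fields[0] == "Globus"
            and fields[1].startswith("globus:")
        ):
            # Drop the "globus:" scheme and any trailing slash.
            uuids.append(fields[1][len("globus:"):].rstrip("/"))
    return uuids


if __name__ == "__main__":
    sample = """
    GridFTP | gsiftp://<hostname>:2811/ | GRIDFTP | fileservice
    # Globus endpoint for restricted datasets
    #Globus | globus:b7a8fa70-71d1-11e5-ba4c-22000b92c6ec/ | Globus | fileservice
    # Globus shared endpoint for public datasets
    Globus | globus:2854feb6-bb21-11e5-9a07-22000b96db58/ | Globus | fileservice
    """
    print(active_globus_uuids(sample))
```

Exactly one UUID should be reported, and it should match the shared endpoint listed on the Globus website.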