University of Colorado
In many cases, data collections accessed through the ESGF system comprise data that are hosted on distributed servers, and that must be found by federated searches initiated at any node in the federation. Such is the case for prominent data collections such as CMIP5/6, CORDEX, obs4MIPs, ana4MIPs etc. For these collections, the search indexes are distributed on multiple nodes, and replicated by the other nodes so that a distributed search can retrieve the complete set of results.
In other cases, data collections are of interest only to specific projects or institutions, and they need to be exposed through just a single ESGF node for users to find them and download the data. In these cases, their metadata should not be replicated across all ESGF nodes in the federation, since doing so would inflate every search index and reduce performance.
An ESGF node administrator can set up and publish data to a "local shard" that is not shared with the rest of the ESGF federation by following a few simple steps, described below.
Note: local shard support requires esg-search version 4.3.0 or above, and CoG version 3.0.3 or above.
The following command will create a new Solr instance configuration suitable for publishing and searching local data:
esg-node --add-replica-shard localhost:8982
The esg-node command starts the local shard on port 8982 with neither the option -Denable.master=true nor the option -Denable.slave=true; the flag -Denable.localhost=true is used instead. As a result, this shard does not replicate from any other shard, and it does not expose itself to replication by other shards.
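In general, data can be published to the local shard by meeting two requirements: the publishing request must be sent to the local publishing service (publishingServiceLocal) instead of the default one, and every published record must carry a "shard" metadata field that names the local shard (localhost:8982 in this example).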
If using the ESGF publisher client to publish data, the above requirements can be fulfilled by a few simple changes in the esg.ini file:
Change the publishing service URL
from: https://your.host.name/esg-search/remote/secure/client-cert/hessian/publishingService
to: https://your.host.name/esg-search/remote/secure/client-cert/hessian/publishingServiceLocal
Add the "shard" metadata field as part of the specific project configuration:
categories =
    project | enum | true | true | 0
    ................................................
    shard | string | true | true | 14
category_default =
    project | .....
    ...................
    shard | localhost:8982
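The two esg.ini changes above can be checked with a small script before publishing. The following is a minimal sketch, not part of the ESGF publisher: the helper names to_local_publishing_url and has_shard_default are hypothetical, and the sketch assumes the category_default value is available as plain text with one "name | value" entry per line.

```python
def to_local_publishing_url(url: str) -> str:
    """Return the local-shard variant of a Hessian publishing service URL.

    Hypothetical helper: it only appends "Local" to a URL that ends in
    "/publishingService", as in the from/to pair shown above.
    """
    suffix = "/publishingService"
    if url.endswith(suffix + "Local"):
        return url  # already points at the local publishing service
    if not url.endswith(suffix):
        raise ValueError(f"not a publishing service URL: {url}")
    return url + "Local"


def has_shard_default(category_default: str, shard: str = "localhost:8982") -> bool:
    """Check that a category_default value contains a 'shard | <shard>' entry.

    Assumes one 'name | value' entry per line, as in the esg.ini excerpt.
    """
    for line in category_default.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if parts == ["shard", shard]:
            return True
    return False


if __name__ == "__main__":
    standard = ("https://your.host.name/esg-search/remote/secure/"
                "client-cert/hessian/publishingService")
    print(to_local_publishing_url(standard))
    print(has_shard_default("project | test\nshard | localhost:8982"))
```

Running such a check before a publishing run is one way to catch a record that would otherwise go to the default publishing service and be replicated across the federation.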
In CoG, the search for a project can then be configured in one of two ways:

Search Service URL: http://your.host.name/esg-search/search/
Constraints: do NOT add distrib=false and do NOT specify any shard constraint

Search Service URL: http://your.host.name/esg-search/search/
Constraints: shards=localhost:8983/solr,localhost:8982/solr
The first configuration will cause the CoG interface to return results from ALL shards configured in the local file esgf_shards_static.xml (which may include other ESGF nodes throughout the federation); the second configuration will return results only from those shards that are explicitly listed.
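The same distinction can be reproduced outside CoG by building the search request by hand. The sketch below only constructs the request URLs for the two configurations; the function name search_url and the constraint project=test are illustrative assumptions, and the host name is the placeholder used throughout this page.

```python
from urllib.parse import urlencode


def search_url(base, shards=None, **constraints):
    """Build an esg-search request URL.

    With shards=None no shard constraint is added, so the search service
    decides which shards to query. With a list of shards, only those
    shards are named in the request.
    """
    params = dict(constraints)
    if shards:
        params["shards"] = ",".join(shards)
    # Keep ':', '/' and ',' readable so the shards value matches the
    # form used in the CoG constraint field.
    query = urlencode(params, safe=":/,")
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    base = "http://your.host.name/esg-search/search/"
    # First configuration: no shard constraint.
    print(search_url(base, project="test"))
    # Second configuration: only the explicitly listed shards.
    print(search_url(base,
                     shards=["localhost:8983/solr", "localhost:8982/solr"],
                     project="test"))
```

Listing localhost:8982/solr alone would restrict a query to the local shard, assuming the search service accepts a single-shard constraint in the same form.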