Configuration: Dataset discovery and job configuration
Introduction
Data discovery and job configuration are among the most important steps in using CRAB, because they define which dataset your jobs will run on. This section describes data discovery and the CRAB configuration options for dataset selection. It also describes how to control at which GRID site the jobs will run.
Data discovery
CMS provides the user with a dataset discovery service at http://cmsdbs.cern.ch/discovery/. Various options are available to narrow down the selection of datasets. More information can be found at https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookDataSamples.
The result of the selection process is a single datasetpath in the format
/<dataset>/<tier>/<processed dataset>
and a list of storage elements where this dataset is available. Enter the datasetpath into the datasetpath field in the [CMSSW] section of your CRAB configuration file.
CRAB will automatically configure your jobs to run over the correct files and will use one of the available GRID sites to run them.
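As a rough illustration of the datasetpath format described above, the following sketch reads the datasetpath field from a [CMSSW] section and splits it into its three components. It is not part of CRAB: the function name read_datasetpath and the dataset names in the example are hypothetical placeholders, and the check only enforces the three-component form given above.

```python
import configparser


def read_datasetpath(cfg_text):
    """Return (dataset, tier, processed dataset) from the [CMSSW] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    path = parser.get("CMSSW", "datasetpath").strip()
    parts = path.split("/")
    # A valid path starts with "/" and has exactly three non-empty components.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(
            "expected /<dataset>/<tier>/<processed dataset>, got %r" % path
        )
    return tuple(parts[1:])


# Hypothetical configuration fragment; the dataset names are placeholders.
EXAMPLE_CFG = """
[CMSSW]
datasetpath = /ExamplePrimary/RECO/ExampleProcessed
"""

print(read_datasetpath(EXAMPLE_CFG))
```

A check like this can catch a mistyped path before submission, but it says nothing about whether the dataset actually exists in the discovery service.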
Control at which GRID site the jobs will run
To control at which GRID site the jobs will run, you can use the list of storage elements from the discovery service. To restrict your jobs to specific sites, provide a comma-separated list of the corresponding storage elements in the [EDG] section:
se_white_list = cmssrm.fnal.gov,srm.cern.ch
Conversely, you can exclude sites from running your jobs by providing a comma-separated list of the corresponding storage elements in the [EDG] section:
se_black_list = cmssrm.fnal.gov,srm.cern.ch
You can further narrow down the site selection by using additional compute element criteria:
ce_white_list = cmslcgce.fnal.gov
ce_black_list = cmslcgce.fnal.gov
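The effect of the white and black lists can be pictured with the short sketch below. It is only a model of the behaviour described above, not CRAB's implementation: the function names are hypothetical, se.example.org is a placeholder host, and exact host-name matching is an assumption (CRAB may match entries differently).

```python
def parse_list(value):
    """Split a comma-separated option value into a list of host names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def select_storage_elements(candidates, white_list="", black_list=""):
    """Keep white-listed candidates (if a white list is given), then drop black-listed ones."""
    white = parse_list(white_list)
    black = parse_list(black_list)
    # An empty white list means "no restriction".
    selected = [se for se in candidates if not white or se in white]
    return [se for se in selected if se not in black]


# Storage elements as they might be returned by the discovery service.
candidates = ["cmssrm.fnal.gov", "srm.cern.ch", "se.example.org"]

print(select_storage_elements(candidates, white_list="cmssrm.fnal.gov,srm.cern.ch"))
print(select_storage_elements(candidates, black_list="cmssrm.fnal.gov,srm.cern.ch"))
```

The same filtering idea applies to ce_white_list and ce_black_list, with compute elements in place of storage elements.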
Note: the Condor-G direct submission mode requires exactly one storage element to be selected by the se_white_list parameter. If more than one compute element is associated with that storage element, the user has to specify one using the ce_white_list parameter.
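The Condor-G rule in the note can be checked before submission with a small script such as the sketch below. It is not a CRAB tool: check_condor_g is a hypothetical helper, and the ces_for_se mapping (storage element to its compute elements) is an assumed input that you would have to fill in yourself, since this section does not say how to look it up.

```python
import configparser


def check_condor_g(cfg_text, ces_for_se=None):
    """Return a list of problems with the [EDG] section for Condor-G submission."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    def entries(option):
        # Missing section or option is treated as an empty list.
        value = parser.get("EDG", option, fallback="")
        return [item.strip() for item in value.split(",") if item.strip()]

    problems = []
    ses = entries("se_white_list")
    if len(ses) != 1:
        problems.append(
            "se_white_list must name exactly one storage element, found %d" % len(ses)
        )
        return problems

    # Hypothetical lookup: compute elements known for the selected storage element.
    known_ces = (ces_for_se or {}).get(ses[0], [])
    if len(known_ces) > 1 and len(entries("ce_white_list")) != 1:
        problems.append(
            "%s has several compute elements; select one with ce_white_list" % ses[0]
        )
    return problems


GOOD_CFG = """
[EDG]
se_white_list = cmssrm.fnal.gov
ce_white_list = cmslcgce.fnal.gov
"""

BAD_CFG = """
[EDG]
se_white_list = cmssrm.fnal.gov,srm.cern.ch
"""

print(check_condor_g(GOOD_CFG))
print(check_condor_g(BAD_CFG))
```

An empty result means the two conditions from the note are met; it does not guarantee that the chosen elements exist or accept jobs.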
Software availability
A key requirement for successful CRAB submission is that the software version you use is available at the execution site. To check which sites have that software version installed, consult the following sites: