Configuration: Basics
Introduction
CRAB is configured through a configuration file called crab.cfg. The file should be located in the CMSSW user project directory, in the same location as the CMSSW parameter-set to be used by CRAB. Its basic content is described below.
Basic crab.cfg
The minimal CRAB configuration file has the following content:
[CRAB]
jobtype = cmssw
scheduler = edg
[CMSSW]
datasetpath = /RelVal120Higgs-ZZ-4Mu/FEVT/CMSSW_1_2_0-FEVT-1166242770
pset = io.cfg
total_number_of_events = 100
events_per_job = 10
output_file = output.root
[USER]
return_data = 1
use_central_bossDB = 0
use_boss_rt = 0
[EDG]
lcg_version = 2
rb = CERN
proxy_server = myproxy.cern.ch
virtual_organization = cms
retry_count = 2
lcg_catalog_type = lfc
lfc_host = lfc-cms-test.cern.ch
lfc_home = /grid/cms
The CRAB configuration file is structured into sections, and it matters in which section a specific configuration item is listed. The sections in the configuration file given above are:
[CRAB]
[CMSSW]
[USER]
[EDG]
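The section layout described above follows the common INI style, so it can be inspected with a generic INI reader. The following is a minimal sketch, assuming Python's standard configparser module accepts the file as written; it is not how CRAB itself reads crab.cfg, and the abbreviated CRAB_CFG string is only an illustrative copy of the example above.

```python
import configparser

# Abbreviated copy of the minimal crab.cfg shown above (illustration only).
CRAB_CFG = """
[CRAB]
jobtype = cmssw
scheduler = edg

[CMSSW]
datasetpath = /RelVal120Higgs-ZZ-4Mu/FEVT/CMSSW_1_2_0-FEVT-1166242770
pset = io.cfg
total_number_of_events = 100
events_per_job = 10
output_file = output.root

[USER]
return_data = 1

[EDG]
virtual_organization = cms
retry_count = 2
"""

parser = configparser.ConfigParser()
parser.read_string(CRAB_CFG)

# Section names are kept in file order.
sections = parser.sections()
print(sections)

# A parameter is always looked up through the section that holds it.
scheduler = parser.get("CRAB", "scheduler")
events_per_job = parser.getint("CMSSW", "events_per_job")
print(scheduler, events_per_job)
```

Because every lookup names its section, a parameter placed in the wrong section is simply not found, which is why the placement matters.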
Basic parameters
[CRAB] section
| Parameter | Description |
| jobtype | The jobtype defines the kind of job CRAB should run. As CMSSW only knows one jobtype, this is always cmssw. |
| scheduler | The scheduler defines which GRID middleware is to be used by CRAB. There are three different schedulers for EGEE and one special scheduler only for OSG, listed in the following table. |

| Scheduler | Description |
| edg | Default access mode to all EGEE and OSG resources using the resource broker. |
| glite | New access mode to all EGEE and OSG resources using the new gLite resource broker. |
| glitecoll | New access mode to all EGEE and OSG resources using the new gLite resource broker in high-performance bulk mode. |
| condor_g | Direct access mode to OSG sites only. Requires a local Condor scheduler (see Local user interface for sh family or Local user interface for csh family). |
[CMSSW] section
| Parameter | Description |
| datasetpath | The datasetpath identifies the dataset you want to access. It can be queried using the CMS data discovery page: http://cmsdbs.cern.ch/discovery/. More information is given at Dataset discovery and job configuration. |
| pset | The name of the CMSSW parameter-set of your CMSSW job. The parameter-set has to be in the same directory as the CRAB configuration file. |
| total_number_of_events | Total number of events to be processed by CRAB. If set to -1, all events of the selected dataset are processed. More information is given at Dataset discovery and job configuration. |
| events_per_job | Number of events per job. CRAB creates as many jobs as needed to process total_number_of_events. Because of constraints in the job splitting, the number of jobs may be larger than the arithmetic value total_number_of_events/events_per_job. More information is given at Dataset discovery and job configuration. |
| output_file | Comma-separated list of output filenames. This is usually the filename set in the PoolOutputModule of the CMSSW parameter-set, but the list can also hold user-specific output files such as histogram files. CRAB uses these names when generating the output filenames of the individual jobs, and automatically adds a job identifier to each so that the user can distinguish them. For example, if the output filename is output.root and the selected CRAB configuration results in 10 jobs, the output files of the individual jobs are named output_00001.root, output_00002.root, ... |
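The job splitting and output naming described in this section can be illustrated with a short calculation. This is a sketch under stated assumptions, not CRAB's actual splitting code: the function names are hypothetical, the job count is only the arithmetic lower bound (CRAB may create more jobs), and the naming pattern (underscore plus a five-digit index starting at 1) is inferred solely from the output_00001.root example above.

```python
import os


def minimum_job_count(total_events, events_per_job):
    """Arithmetic lower bound on the number of jobs; CRAB may create more."""
    if total_events <= 0 or events_per_job <= 0:
        # total_number_of_events = -1 means "all events", which cannot be
        # converted into a job count without knowing the dataset size.
        raise ValueError("both values must be positive")
    # Integer ceiling division: a partly filled last job still counts.
    return (total_events + events_per_job - 1) // events_per_job


def job_output_names(output_file, n_jobs):
    """Per-job names following the pattern of the example above (assumed)."""
    stem, ext = os.path.splitext(output_file)
    return [f"{stem}_{i:05d}{ext}" for i in range(1, n_jobs + 1)]


# Values taken from the minimal crab.cfg on this page.
n_jobs = minimum_job_count(100, 10)
names = job_output_names("output.root", n_jobs)
print(n_jobs)
print(names[0], names[-1])
```

With the values from the minimal configuration, the lower bound is 10 jobs and the names run from output_00001.root to output_00010.root.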
[USER] section
| Parameter | Description |
| return_data | Defines the way CRAB handles user output. The default is 1, which uses the GRID middleware sandbox. Attention: the sandbox is limited to 100 MB. More information is given at Output handling. |
| use_central_bossDB | BOSS-specific parameter. |
| use_boss_rt | BOSS-specific parameter. |
[EDG] section
| Parameter | Description |
| lcg_version | EGEE resource broker specific information. |
| rb | Defines which resource broker configuration should be used. If set to CERN, the official CERN configuration is downloaded from cmsdoc.cern.ch; if set to CNAF, the configuration for the CNAF resource broker is downloaded. If this parameter is commented out, the default of the user interface in use is applied. |
| proxy_server | Defines the grid proxy server name. |
| virtual_organization | Has to be: cms |
| retry_count | Resource broker parameter that defines how often the resource broker should try to resubmit a job before giving up. |
| lcg_catalog_type | LFC catalog specific parameter. |
| lfc_host | LFC catalog specific parameter. |
| lfc_home | LFC catalog specific parameter. |
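Since each parameter belongs to a specific section, a small pre-submission check can catch misplaced entries. The sketch below encodes only what the tables on this page state (which section holds which parameter, the four scheduler names, and the fixed values cmssw and cms). The names EXPECTED, check_config and BAD_CFG are hypothetical, the file is read with Python's configparser as an assumption, and CRAB's own validation may behave differently.

```python
import configparser

# Parameters described on this page, grouped by their section.
EXPECTED = {
    "CRAB": ["jobtype", "scheduler"],
    "CMSSW": ["datasetpath", "pset", "total_number_of_events",
              "events_per_job", "output_file"],
    "USER": ["return_data", "use_central_bossDB", "use_boss_rt"],
    "EDG": ["lcg_version", "rb", "proxy_server", "virtual_organization",
            "retry_count", "lcg_catalog_type", "lfc_host", "lfc_home"],
}
# configparser lower-cases option names, so the lookup table does too.
SECTION_OF = {name.lower(): sec for sec, names in EXPECTED.items() for name in names}

# Parameters whose allowed values are stated on this page.
ALLOWED = [
    ("CRAB", "jobtype", {"cmssw"}),
    ("CRAB", "scheduler", {"edg", "glite", "glitecoll", "condor_g"}),
    ("EDG", "virtual_organization", {"cms"}),
]


def check_config(parser):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for section in parser.sections():
        for option in parser.options(section):
            expected = SECTION_OF.get(option)
            if expected is not None and expected != section:
                problems.append(
                    f"{option} is in [{section}] but belongs in [{expected}]")
    for section, option, allowed in ALLOWED:
        if parser.has_option(section, option):
            value = parser.get(section, option).strip()
            if value not in allowed:
                problems.append(
                    f"[{section}] {option} = {value!r} is not one of {sorted(allowed)}")
    return problems


# Deliberately broken example: scheduler is misplaced and the VO is wrong.
BAD_CFG = """
[CRAB]
jobtype = cmssw
[USER]
scheduler = edg
return_data = 1
[EDG]
virtual_organization = atlas
"""
bad = configparser.ConfigParser()
bad.read_string(BAD_CFG)
problems = check_config(bad)
for line in problems:
    print(line)
```

The check reports only misplaced or out-of-range entries and does not flag absent ones, because some parameters, such as rb, may legitimately be commented out.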