Info that can be extracted from LHCb Dirac jobs
Basically there are two sets of information for Dirac jobs, "parameters" and "attributes". I extracted the ones I found to be possibly useful; there are several more.
In addition there is a "Logging Info", which records the state changes the job went through in the DIRAC system from submission until its final state.
| LoggingInfo | Description | Example |
| Source | the overall status of the job | JobManager, InputData, |
| Status | the status of the job as reported in the job monitor i.e. following a "state machine" | Received, Checking, Running, Done, Failed |
| Minor Status | a minor status inside the Status above | Job accepted, JobScheduling, ... |
| Application Status | the status of the payload application if it has started | gaudi-script.py Successful, Executing DaVinci Step X |
| DateTime | The date and time of the status report in UTC | 2014-11-11 15:15 |
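
Since every logging record carries a Status and a DateTime, the records can be turned into a rough timeline of how long the job stayed in each state. The following is only a minimal sketch, assuming the records have already been fetched and reduced to (Status, DateTime) pairs. The function name `time_per_status` and all timestamps except the first are invented for illustration; only the status names and the date format come from the table above.

```python
from datetime import datetime

# Format of the DateTime example in the table (UTC, minute resolution)
FMT = "%Y-%m-%d %H:%M"


def time_per_status(records):
    """Return (status, seconds) for every state except the final one.

    records: (Status, DateTime) pairs ordered by time.
    """
    durations = []
    # Pair each record with its successor; the gap is the time in that state
    for (status, start), (_, end) in zip(records, records[1:]):
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        durations.append((status, int(delta.total_seconds())))
    return durations


# Hypothetical records: timestamps are made up, not taken from a real job
records = [
    ("Received", "2014-11-11 15:15"),
    ("Checking", "2014-11-11 15:16"),
    ("Running", "2014-11-11 15:40"),
    ("Done", "2014-11-11 17:10"),
]

for status, seconds in time_per_status(records):
    print(status, seconds)
```

The final state (Done or Failed) has no successor, so it gets no duration.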
| Parameter Name | Description | Example(s) |
| JobID | Dirac Job ID | 82141213 |
| Status | Final Status of the Job | Done / Failed |
| StartExecTime | time stamp when the payload started on the worker node | 2014-07-07 13:30:51 |
| RescheduleCounter | number of times the same job was retried e.g. b/c of input data not retrievable | 0 |
| Minor Status | internal fine grained Dirac state | Requests done |
| ApplicationStatus | payload status | Job Finished Successfully |
| JobType | major job types, ~ dozen | DataProcessing, MonteCarlo, User, .... |
| SubmissionTime | time stamp when the job was submitted to Dirac | 2014-07-07 12:30:51 |
| Site | LHCb site name where the job was executed | LCG.CERN.ch |
| EndExectime | same as StartExecTime for payload ending on the WN | 2014-07-07 14:30:51 |
| UserPriority | internal Dirac priority of the job | 2 |
| CPUTime | might be useful but the ones I checked are all | 0.0 |
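
The three time stamps in this table are enough to derive how long a job waited before starting and how long the payload ran. A minimal sketch, assuming the values are available as a plain dictionary keyed by the names used in the table (including the spelling `EndExectime` as it appears there). The helper `job_timing` and the result keys `wait_s` and `exec_s` are hypothetical names chosen for this example.

```python
from datetime import datetime

# Format of the time stamp examples in the table
FMT = "%Y-%m-%d %H:%M:%S"


def job_timing(params):
    """Derive waiting and execution time in seconds from the time stamps."""
    submitted = datetime.strptime(params["SubmissionTime"], FMT)
    started = datetime.strptime(params["StartExecTime"], FMT)
    ended = datetime.strptime(params["EndExectime"], FMT)
    return {
        # time between submission to Dirac and payload start on the WN
        "wait_s": (started - submitted).total_seconds(),
        # time between payload start and payload end on the WN
        "exec_s": (ended - started).total_seconds(),
    }


# Example values copied from the table
job = {
    "JobID": 82141213,
    "SubmissionTime": "2014-07-07 12:30:51",
    "StartExecTime": "2014-07-07 13:30:51",
    "EndExectime": "2014-07-07 14:30:51",
}

timing = job_timing(job)
print(timing)
```

With the example values, both intervals come out as one hour (3600 seconds).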
| Attribute Name | Description | Example |
| Pilot Reference | WLCG name e.g. | https://ce404.cern.ch:8443/CREAM895774125 |
| CPUScalingFactor | The scaling factor as posted in the BDII | 4.0 |
| LoadAverage | last (?) load of the WN reported | 24.14 |
| DownloadInputData | the input data for this job if any | Successfully downloaded LFN(s): /lhcb/LHCb/Collision12/CHARMTOBESWUM.DST/00020349/0002/00020349_00024058_1.CharmToBeSwum.dst Downloaded 1 / 1 files from local Storage Elements on first attempt. |
| WallClockTime(s) | seconds it took to execute the payload | 18965.2357981 |
| CacheSize(kB) | cache used on the WN | 12288KB |
| LastUpdateCPU(s) | last CPU seconds reported by the watchdog | 15695.0 |
| DiskSpace(MB) | scratch space available | 6076.0 |
| HostName | host where the payload was executed | p01001532193397.cern.ch |
| TotalCPUTime(s) | Total CPU seconds used to execute the payload | 16251.38 |
| CPUNormalizationFactor | the CPU scaling as measured by the LHCb pilot | 9.3 |
| NormCPUTime(s) | CPUNormalizationFactor * TotalCPUTime(s) | 151137.834 |
| ScaledCPUTime | .... | 117392.0 |
| LocalJobID | local batch system job ID | 543051900 |
| ModelName | cpu type as reported by the WN | Intel(R)Xeon(R)CPUL5640@2.27GHz |
| PayloadPID | process id of the payload on the WN | 2200 |
| AgentLocalSE | what the job thinks are its local SEs to down/upload data | IN2P3-RAW,IN2P3-DST,IN2P3_M-DST,IN2P3-USER,IN2P3-FAILOVER,IN2P3-RDST,IN2P3_MC_M-DST,IN2P3_MC-DST,IN2P3-ARCHIVE,IN2P3-BUFFER |
| UploadedOutputData | the output data produced successfully uploaded to (some) storage | 00039801_00022431_4.swimstrippingd02kskk.mdst |
| OK | but what? | True |
| LocalAccount | wn unix id executing the payload | lhbplt01 |
| Memory(kB) | max memory consumed by the payload | 2450756kB |
| MemoryUsed(kB) | total memory in use on the WN | 6140024.0 |
| CPU(MHz) | cpu clock speed | 2268.000 |
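
The relation given for NormCPUTime(s) can be checked against the example values, and the same numbers allow a rough estimate of how much of the wall clock time was spent on CPU. A minimal sketch, assuming the values arrive as strings in a dictionary keyed by the names in the table. The helper names are hypothetical, and using TotalCPUTime(s) / WallClockTime(s) as a CPU efficiency is an assumption of this sketch, not something the table defines. Some values carry a unit suffix (e.g. `12288KB`, `2450756kB`), so a small parser strips it.

```python
import math


def norm_cpu_time(attrs):
    """NormCPUTime(s) = CPUNormalizationFactor * TotalCPUTime(s), as in the table."""
    return float(attrs["CPUNormalizationFactor"]) * float(attrs["TotalCPUTime(s)"])


def cpu_efficiency(attrs):
    """Assumed metric: fraction of wall clock time spent on CPU."""
    return float(attrs["TotalCPUTime(s)"]) / float(attrs["WallClockTime(s)"])


def parse_kb(value):
    """Convert values such as '2450756kB', '12288KB' or '6140024.0' to a float in kB."""
    text = str(value).strip()
    if text.lower().endswith("kb"):
        text = text[:-2]
    return float(text)


# Example values copied from the table
attrs = {
    "WallClockTime(s)": "18965.2357981",
    "TotalCPUTime(s)": "16251.38",
    "CPUNormalizationFactor": "9.3",
    "NormCPUTime(s)": "151137.834",
    "Memory(kB)": "2450756kB",
    "CacheSize(kB)": "12288KB",
}

# The computed value should agree with the reported NormCPUTime(s)
print(math.isclose(norm_cpu_time(attrs), float(attrs["NormCPUTime(s)"])))
print(round(cpu_efficiency(attrs), 3))
print(parse_kb(attrs["Memory(kB)"]))
```

For the example job the computed and reported NormCPUTime(s) agree, which confirms the formula in the table for at least this one case.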
--
StefanRoiser - 15 Jul 2014