Initial efforts
GRATIA is not well documented, and I need to get a handle on where to go.
There will be a ProbeConfig file that allows the probe to operate, among other things. Xin prepared it in the depths of time, and I have not changed it yet. There are many things I don't understand here. It seems to have no direct link to the probe path.
I have had no success so far in my fumbling attempts to create and populate a UsageRecord -- I think perhaps I missed a step by not having ProbeConfig in my working directory?
Fixed, and hoping that was the sticking point.
Just discovered from the logs in gratia/var/logs that I was indeed creating UsageRecords -- but I suppose that since I had no valid ProbeConfig, that was useless.
I am using the SGE probe template as a starting point. Seems the command sequence is:
1. Get a line from the SGE job log that looks kind of like this:
all.q:osp-vtb00.nersc.gov:joeuser:group1:simple.sh:5:sge:0:1169517620:1169527990:1169528010:0:0:20:0:0:0.000000:0:0:0:0:3281:1:0:0.000000:0:0:0:0:32:41:NONE:defaultdepartment:NONE:1:0:0.040000:0.000080:0.000000:NONE:0.000000:NONE:8212480.000000
2. Parse it via a regular column structure pre-programmed into the SGE object to create a dictionary (in the __init__ call). Simple -- I can get the dictionaries directly from cx_Oracle. I'll just map them to an internal variable in an __init__ call? Is that even necessary?
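A minimal sketch of what steps 1 and 2 amount to: split the colon-separated line and zip it against a fixed tuple of column names. The names in SGE_FIELDS are my assumption, based on the usual SGE accounting file layout, and are not copied from the SGE probe; parse_sge_line is a hypothetical helper.

```python
# Leading column names, assumed from the usual SGE accounting layout.
# They are NOT taken from the SGE probe source; check before relying on them.
SGE_FIELDS = (
    "qname", "hostname", "group", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status", "ru_wallclock",
)


def parse_sge_line(line, fields=SGE_FIELDS):
    """Return a dict of the leading fields of one accounting line (hypothetical helper)."""
    values = line.rstrip("\n").split(":")
    if len(values) < len(fields):
        raise ValueError("expected at least %d fields, got %d" % (len(fields), len(values)))
    # zip stops at the shorter sequence, so trailing columns are ignored.
    return dict(zip(fields, values))


# First 14 fields of the sample line from step 1; the rest is left out here.
SAMPLE = ("all.q:osp-vtb00.nersc.gov:joeuser:group1:simple.sh:5:sge:0:"
          "1169517620:1169527990:1169528010:0:0:20")
record = parse_sge_line(SAMPLE)
```

With a cx_Oracle row already available as a dictionary, this parsing step drops out entirely; only the mapping from column names to record fields remains.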
3. Create a UsageRecord
>>> r=Gratia.UsageRecord('Batch')
2011-05-20 15:42:48 EDT Gratia: Creating a Record 2011-05-20T19:28:50Z
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Gratia.py", line 2067, in __init__
    super(self.__class__,self).__init__()
  File "Gratia.py", line 1920, in __init__
    self.__ProbeName = Config.get_ProbeName()
AttributeError: 'NoneType' object has no attribute 'get_ProbeName'
OK -- this didn't come up when I was working without a ProbeConfig -- so I'm changing the ProbeName to PandaMeter (to match the filename).
Since that failed as well, I suppose I have to read the error message. Here's what I see: there is no defined Config object (it's uninitialized), so it is indeed NoneType. Off to the code.
Tried using just the straight probe template -- failed in the same way. Not a syntax problem.
Durrr -- I needed to call Gratia.Initialize() first.
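For the record, the failure mode looks like this. The sketch below is a stand-in I wrote to reproduce the pattern, not Gratia's code: a module-level Config that stays None until Initialize() runs, and a record class that reads from it in its constructor. The _ProbeConfig class and the probe_name argument are made up for illustration.

```python
class _ProbeConfig(object):
    """Stand-in for a loaded ProbeConfig (hypothetical, not Gratia's class)."""

    def __init__(self, probe_name):
        self._probe_name = probe_name

    def get_ProbeName(self):
        return self._probe_name


# Module-level configuration; None until Initialize() has been called.
Config = None


def Initialize(probe_name="PandaMeter"):
    """Create the module-level Config once (stand-in for the real setup step)."""
    global Config
    if Config is None:
        Config = _ProbeConfig(probe_name)


class UsageRecord(object):
    """Stand-in record that needs Config at construction time."""

    def __init__(self, resource_type):
        # Raises AttributeError on NoneType if Initialize() was skipped.
        self.probe_name = Config.get_ProbeName()
        self.resource_type = resource_type
```

Calling UsageRecord('Batch') before Initialize() gives the same AttributeError as in the traceback; calling Initialize() first makes it go away.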
4. Populate the UsageRecord
Adding JobId and UserId and all that -- use strings only. There should be more protection in Gratia for these things, but oh well.
The code for this step seems to be developed, at least a first version.
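Since Gratia does not protect against non-string values, a small guard before populating the record helps. This is a sketch under assumptions: the column names come from the PanDA job dictionary dumped later in this log, but the pairing of record fields to columns in FIELD_MAP is inferred by matching values and is not taken from GetRecord. stringify_fields is a hypothetical helper.

```python
# Record field -> PanDA column. The pairing is an assumption made by
# matching values between the job dictionary and the generated XML.
FIELD_MAP = {
    "LocalJobId": "batchid",
    "GlobalJobId": "pandaid",
    "LocalUserId": "produserid",
    "GlobalUsername": "produsername",
    "JobName": "jobname",
    "Status": "jobstatus",
    "SubmitHost": "creationhost",
    "Queue": "computingsite",
}


def stringify_fields(row, field_map=FIELD_MAP):
    """Return (values, missing): string values per field, and fields whose column is None or absent."""
    values, missing = {}, []
    for field, column in sorted(field_map.items()):
        raw = row.get(column)
        if raw is None:
            # Report it instead of silently sending the text 'None'.
            missing.append(field)
        else:
            values[field] = str(raw)
    return values, missing


# A few columns copied from the job dictionary shown in this log.
row = {
    "batchid": None,
    "pandaid": 1236909887,
    "produsername": "elisa piccaro",
    "jobname": "13d1c1c9-9bec-4087-a0de-f35b5224b64b",
    "jobstatus": "cancelled",
    "creationhost": "lxplus308.cern.ch",
    "computingsite": "ANALY_NIKHEF-ELPROD",
}
values, missing = stringify_fields(row)
```

Anything listed in missing needs a decision (skip the record, or substitute a value) before the record is sent.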
5. Gratia.Send(r)
No unsuppressed usage records in this packet: not sending
Dang! OK, checking -- this message comes from XML that did not pass the check (CheckXmlDoc).
>>> out[11]
{'maxdiskunit': None, 'assignedpriority': 1000, 'dispatchdblock': None, 'ninputdatafiles': 1, 'nevents': 0, 'creationtime': datetime.datetime(2011, 5, 16, 0, 49, 37), 'maxcpucount': 0, 'cpuconsumptionunit': None, 'destinationse': 'ANALY_NIKHEF-ELPROD', 'maxattempt': 0, 'minramunit': None, 'exeerrorcode': 0, 'pilotid': None, 'specialhandling': 'rebro', 'jobsetid': 2671, 'modificationhost': 'voatlas59.cern.ch', 'brokerageerrorcode': 0, 'relocationflag': 1, 'cloud': 'NL', 'sourcesite': '2599', 'workinggroup': None, 'ninputfiles': None, 'homepackage': 'AnalysisTransforms-AtlasProduction_16.0.2.4', 'prodsourcelabel': 'user', 'ddmerrorcode': 0, 'produsername': 'elisa piccaro', 'taskbuffererrordiag': 'killed by Panda server : upstream job failed', 'ipconnectivity': None, 'jobdispatchererrorcode': 0, 'attemptnr': 0, 'maxcpuunit': None, 'superrorcode': 0, 'metadata': None, 'cpuconversion': None, 'vo': 'atlas', 'computingelement': 'to.be.set', 'inputfiletype': 'AOD', 'transexitcode': None, 'proddbupdatetime': datetime.datetime(1, 1, 1, 0, 0), 'currentpriority': -3229, 'transformation': 'http://pandaserver.cern.ch:25080/trf/user/runAthena-00-00-11', 'jobdefinitionid': 2672, 'jobdispatchererrordiag': None, 'pandaid': 1236909887, 'piloterrorcode': 0, 'maxdiskcount': 0, 'superrordiag': None, 'jobparameters': None, 'proddblock': 'data10_7TeV.periodE.physics_Muons.PhysCont.AOD.repro05_v02/', 'processingtype': 'pathena', 'commandtopilot': None, 'cpuconsumptiontime': 0, 'jobname': '13d1c1c9-9bec-4087-a0de-f35b5224b64b', 'batchid': None, 'brokerageerrordiag': None, 'grid': None, 'jobstatus': 'cancelled', 'parentid': 1236204503, 'atlasrelease': 'Atlas-16.0.2', 'endtime': datetime.datetime(2011, 5, 16, 1, 25, 39), 'prodserieslabel': None, 'computingsite': 'ANALY_NIKHEF-ELPROD', 'exeerrordiag': None, 'ddmerrordiag': None, 'destinationsite': None, 'destinationdblock': 'user.epiccaro.JPsiIt_PeriodE_common_v3/', 'corecount': None, 'inputfileproject': 'data10_7TeV', 'pilottiming': None, 
'cmtconfig': 'i686-slc5-gcc43-opt', 'taskbuffererrorcode': 100, 'modificationtime': datetime.datetime(2011, 5, 16, 1, 25, 39), 'minramcount': 0, 'schedulerid': None, 'lockedby': 'panda-client-0.3.41', 'transfertype': None, 'starttime': None, 'produserid': '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', 'countrygroup': None, 'jobexecutionid': 2613, 'piloterrordiag': None, 'statechangetime': datetime.datetime(2011, 5, 16, 1, 25, 39), 'creationhost': 'lxplus308.cern.ch', 'taskid': 427, 'inputfilebytes': 3816013252}
>>> r=GetRecord(out[11])
>>> r.XmlCreate()
>>> r.XmlData
['\n', 'file:///u:/OSG/urwg-schema.11.xsd">\n', '\n', '\n', '\t', 'None', '\n', '\t', '1236909887', '\n', '\n', '\n', '\t', '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', '\n', '\t', 'elisa piccaro', '\n', '\t', '/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy', '\n', '\n', '\t', '13d1c1c9-9bec-4087-a0de-f35b5224b64b', '\n', '\t', 'cancelled', '\n', '\t', 'PT23H23M58.0S', '\n', '\t', 'PT23H23M58.0S', '\n', '\t', 'PT0S', '\n', '\t', '0', '\n', '\t', '2011-05-16T04:49:37Z', '\n', '\t', '2011-05-16T05:25:39Z', '\n', '\t', 'to.be.set', '\n', '\t', 'lxplus308.cern.ch', '\n', '\t', 'ANALY_NIKHEF-ELPROD', '\n', '\t', 'ANALY_NIKHEF-ELPROD', '\n', '\t', '', '\n', '\t', u'PandaMeter', '\n', '\t', u'PanDA_ATLAS', '\n', '\t', u'OSG', '\n', '\t', '1', '\n', '\t', 'Batch', '\n', '\n']
>>> xmlDoc = Gratia.safeParseXML("".join(r.XmlData))
>>> CheckXmlDoc(xmlDoc,False)
0
OK -- that's what's happening. There's something wrong with the XML we are using.
<?xml version="1.0" encoding="UTF-8"?><JobUsageRecord xmlns="http://www.gridforum.org/2003/ur-wg" xmlns:urwg="http://www.gridforum.org/2003/ur-wg" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gridforum.org/2003/ur-wg file:///u:/OSG/urwg-schema.11.xsd">
<RecordIdentity urwg:recordId="griddev01.usatlas.bnl.gov:16752.1" urwg:createTime="2011-05-20T23:26:55Z" />
<JobIdentity>
<LocalJobId >None</LocalJobId>
<GlobalJobId >1236909887</GlobalJobId>
</JobIdentity>
<UserIdentity>
<LocalUserId >/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy</LocalUserId>
<GlobalUsername >elisa piccaro</GlobalUsername>
<DN >/C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=elisa piccaro/CN=proxy</DN>
</UserIdentity>
<JobName >13d1c1c9-9bec-4087-a0de-f35b5224b64b</JobName>
<Status >cancelled</Status>
<TimeDuration urwg:type="submit" >PT23H23M58.0S</TimeDuration>
<WallDuration urwg:description="Was entered in seconds" >PT23H23M58.0S</WallDuration>
<CpuDuration urwg:usageType="user" urwg:description="Was entered in seconds" >PT0S</CpuDuration>
<Processors consumptionRate="total" urwg:metric="total" >0</Processors>
<StartTime urwg:description="Was entered in seconds" >2011-05-16T04:49:37Z</StartTime>
<EndTime urwg:description="Was entered as text" >2011-05-16T05:25:39Z</EndTime>
<MachineName >to.be.set</MachineName>
<SubmitHost >lxplus308.cern.ch</SubmitHost>
<Host primary="true" >ANALY_NIKHEF-ELPROD</Host>
<Queue >ANALY_NIKHEF-ELPROD</Queue>
<Resource urwg:description="user" ></Resource>
<ProbeName >PandaMeter</ProbeName>
<SiteName >PanDA_ATLAS</SiteName>
<Grid >OSG</Grid>
<Njobs >1</Njobs>
<Resource urwg:description="ResourceType" >Batch</Resource>
</JobUsageRecord>
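One thing that stands out in the record above: WallDuration is PT23H23M58.0S, but creationtime and endtime in the job dictionary are only 36m02s apart, and 24h minus 36m02s is exactly 23h23m58s. That would be consistent with a negative timedelta (earlier minus later) being read through its .seconds attribute. This is a guess, not something checked against GetRecord. A sketch of the pitfall, where iso_duration is a hypothetical helper and not a Gratia function:

```python
from datetime import datetime


def iso_duration(seconds):
    """Format a non-negative number of seconds as PTnHnMn.0S (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("negative duration: %r" % (seconds,))
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return "PT%dH%dM%d.0S" % (hours, minutes, secs)


created = datetime(2011, 5, 16, 0, 49, 37)  # 'creationtime' from out[11]
ended = datetime(2011, 5, 16, 1, 25, 39)    # 'endtime' from out[11]

# Earlier minus later is negative; timedelta stores it as days=-1 plus a
# positive .seconds, so .seconds alone looks like almost a full day.
wrong = (created - ended).seconds
right = (ended - created).total_seconds()

print(iso_duration(wrong))
print(iso_duration(right))
```

If this is what happened, subtracting in the other order (or using total_seconds() and rejecting negatives) would fix the reported durations.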
[18:34:01 Satchel ~]$ xmllint --valid test.xml
test.xml:1: validity error : Validation failed: no DTD found !
Location="http://www.gridforum.org/2003/ur-wg file:///u:/OSG/urwg-schema.11.xsd"
Missing a schema file?
That's not it -- the same structure works if I only add a JobId. Anything else creates the same problem, even if it's clearly no threat to the XML.
15:16:34 EDT Gratia: Warning: UserIdentity block does not have exactly one populated LocalUserId node in Unknown Unknown
15:16:34 EDT Gratia: Info: suppressing record with Unknown Unknown due to Grid == Local
15:16:34 EDT Gratia: No unsuppressed usage records in this packet: not sending
15:16:34 EDT Gratia: *********************************************************
Hm. OK, I changed the config file so that it no longer suppresses records with Grid == Local (SuppressGridLocalRecords = '1' -> '0').
Didn't work. But when I turned the debug level up to 5, I got somewhere: the rejection came from LocalJobId being blank. After populating it, we're in business. Yay!
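In hindsight, a quick scan of the generated XML for empty or literal 'None' leaf elements would have pointed at LocalJobId much sooner. A minimal sketch using only the standard library; the SAMPLE document is a cut-down copy of the record above, and suspicious_leaves is a hypothetical helper, not part of Gratia. Which values Gratia actually rejects is not something this checks.

```python
from xml.dom import minidom

# Cut-down copy of the generated record (most elements removed).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<JobUsageRecord xmlns="http://www.gridforum.org/2003/ur-wg" xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
<JobIdentity>
<LocalJobId >None</LocalJobId>
<GlobalJobId >1236909887</GlobalJobId>
</JobIdentity>
<Status >cancelled</Status>
<Resource urwg:description="user" ></Resource>
</JobUsageRecord>"""


def suspicious_leaves(xml_text):
    """Return tag names of leaf elements whose text is empty or the string 'None'."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    flagged = []
    for node in doc.getElementsByTagName("*"):
        children = node.childNodes
        # Skip container elements; only leaves carry values.
        if any(c.nodeType == c.ELEMENT_NODE for c in children):
            continue
        text = "".join(c.data for c in children if c.nodeType == c.TEXT_NODE).strip()
        if text in ("", "None"):
            flagged.append(node.tagName)
    return flagged


print(suspicious_leaves(SAMPLE))
```

Running this on the full record before Gratia.Send(r) gives a short list of fields to look at first.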
Now I need to figure out what to do with the certs. I am asking Philipe and Steve.
-- AldenStradling - 20-May-2011