Trigger AOD Size Reduction for Rel 13
TrigDecision and HLTResult in 13.0.30
A potential size reduction for 13.0.30 was to turn the HLTResults in TrigDecision into DataLinks instead of embedded data members. This will not work: when trigger hypos are re-run, the original HLTResult is overwritten in StoreGate, so both the old and the new TrigDecision objects would point to the same HLTResult.
To get around this, there seem to be two solutions:
- keep the HLTResult as an embedded data member of TrigDecision, so that each new TrigDecision object contains the new HLTResult, and do not save the HLTResult separately
- make the HLTResult in TrigDecision a DataLink and keep multiple copies of HLTResult in StoreGate; this requires rewriting the TrigSteering/LoopbackConverterFromPersistency class so that the old HLTResult is not removed and the new one gets a sensible name.
The first solution seems more robust: there is no way to accidentally delete the HLTResult object that TrigDecision needs, and there is less book-keeping. Neither solution is feasible for a 13.0.30 pcache, so it will have to wait until 13.X.0, and we will have to live with the duplicated 10kB from TrigDecision.
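The aliasing problem described above can be illustrated with a toy model. This is only a sketch: a plain dict stands in for StoreGate, a string key stands in for a DataLink, and the class names are hypothetical, not Athena APIs.

```python
# Toy model only: a dict stands in for StoreGate and a string key stands in
# for a DataLink. None of these names are real Athena classes.
store = {}


class LinkedDecision:
    """Decision that refers to its HLTResult by store key (DataLink-like)."""

    def __init__(self, key):
        self.key = key

    def result(self):
        return store[self.key]


class EmbeddedDecision:
    """Decision that owns its own copy of the HLTResult."""

    def __init__(self, result):
        self._result = dict(result)

    def result(self):
        return self._result


# First pass: the original trigger result is recorded.
store["HLTResult_EF"] = {"passed": True}
old_linked = LinkedDecision("HLTResult_EF")
old_embedded = EmbeddedDecision(store["HLTResult_EF"])

# Hypos are re-run: the result under the same key is overwritten.
store["HLTResult_EF"] = {"passed": False}
new_linked = LinkedDecision("HLTResult_EF")
new_embedded = EmbeddedDecision(store["HLTResult_EF"])

# Linked decisions both see the new result; embedded ones stay distinct.
print(old_linked.result(), new_linked.result())
print(old_embedded.result(), new_embedded.result())
```

In this model the second solution corresponds to writing the re-run result under a new key rather than overwriting the old one.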
-- AndrewHamilton - 16 Oct 2007
Results with new menu in 13.0.30.1 Sample A
Here is a table of the event size breakdown for 5 datasets (all numbers are kB/event, based on 1000 events). The numbers are extracted using checkFile.py and checkFileParse.py; they are all within 5% of the 'correct' event size (the difference is due to checkFile.py underestimation):
| Dataset (AOD) | total size | event | truth | calo | indet | muon | met | jet | tau | eg | trigger |
| 5011.J2_pythia_jetjet | 187 | 6.0 | 31.3 | 25.7 | 26.2 | 3.7 | 3.3 | 36.1 | 1.2 | 6.5 | 42.1 |
| 5144.PythiaZee | 175 | 6.4 | 25.6 | 20.7 | 19.0 | 3.9 | 3.3 | 33.8 | 1.4 | 5.6 | 49.7 |
| 5702.PythiaB_BsJpsiphi | 220 | 6.0 | 29.0 | 23.9 | 26.3 | 32.0 | 3.4 | 37.9 | 0.8 | 3.4 | 51.2 |
| 6384.PythiaH120gamgam | 179 | 6.4 | 26.6 | 22.3 | 20.2 | 3.5 | 3.4 | 32.9 | 1.3 | 5.0 | 51.5 |
| 5200.T1_McAtNlo_Jimmy | 418 | 8.1 | 53.7 | 36.8 | 45.8 | 17.6 | 3.8 | 65.6 | 3.4 | 18.2 | 158.0 |
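To see how much of each dataset the trigger accounts for, the total and trigger columns of the table above can be turned into a fraction. This is a small sketch using only numbers copied from the table; the dictionary layout is just for illustration.

```python
# kB/event values copied from the table above: (total size, trigger).
sizes = {
    "5011.J2_pythia_jetjet": (187, 42.1),
    "5144.PythiaZee": (175, 49.7),
    "5702.PythiaB_BsJpsiphi": (220, 51.2),
    "6384.PythiaH120gamgam": (179, 51.5),
    "5200.T1_McAtNlo_Jimmy": (418, 158.0),
}

# Fraction of the event size taken by the trigger EDM.
trigger_share = {name: trig / total for name, (total, trig) in sizes.items()}

for name, share in trigger_share.items():
    print(f"{name:26s} trigger = {share:.1%} of total")
```

The ttbar sample (5200) has by far the largest trigger share.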
-- AndrewHamilton - 08 Oct 2007
Results with new menu in 13.0.30
From /afs/cern.ch/atlas/project/RTT/Work/rel_0/val/build/i686-slc4-gcc34-opt/offline/RecExAnaTest/AthenaRecExCommon/RecExTrigTest_RTT_esdprod/319/AOD.pool.root.checkFile we can see that we are now back up to ~160kB/event for the trigger.
Using checkFileParse.py (to use it, remove the .txt extension), I get the following breakdown:
Summary of catagories:
8.132 eventinfo
63.745 truth
36.378 calo
44.364 indet
18.299 muon
3.324 met
64.346 jet
3.194 tau
17.471 egamma
162.405 trigger
With the further breakdown of the trigger items (407 events in file):
Trigger Items:
kB/evt n items class
0.023 536 POOLContainer_MuonFeature
0.039 814 POOLContainer_TrigMissingET
0.090 722 POOLContainer_TrigMuonEFContainer
0.135 407 TrigConf::Lvl1AODPrescaleConfigData_p1_AODConfig-0
0.268 2589 POOLContainer_TrigT2Jet
0.290 503 POOLContainer_CombinedMuonFeature
0.297 407 CTP_Decision_p2_CTP_Decision
0.306 1944 POOLContainer_TrigTau
0.371 6457 POOLContainer_DataVector<TrigL2Bjet>
0.375 15103 POOLContainer_TrigRoiDescriptor
0.392 6457 POOLContainer_DataVector<TrigEFBjet>
0.452 407 LVL1_ROI_p1_LVL1_ROI
0.626 407 TrigConf::Lvl1AODConfigData_p1_AODConfig-0
0.647 277 POOLContainer_DataVector<TrigEFBphys>
0.710 1928 POOLContainer_TauJetContainer_p1
0.779 2144 POOLContainer_TrigEMCluster
1.085 3093 POOLContainer_egammaContainer_p1
1.135 2009 POOLContainer_TrigTauCluster
1.508 1928 POOLContainer_TauDetailsContainer_tlp1
1.627 3927 POOLContainer_CaloClusterContainer_p2
2.153 3745 POOLContainer_DataVector<TrigElectron>
2.531 3868 POOLContainer_egDetailContainer_p1
2.831 5220 POOLContainer_DataVector<TrigPhoton>
4.002 407 HLT::HLTResult_p1_HLTResult_L2
4.116 407 TrigInDetTrackTruthMap_MyTrigInDetTrackTruthMap
6.818 407 HLT::HLTResult_p1_HLTResult_EF
10.879 407 TrigDec::TrigDecision_p1_TrigDecision
12.345 481 POOLContainer_DataVector<TrigL2Bphys>
13.855 407 TrigConf::HLTAODConfigData_p1_AODConfig-0
16.150 20827 POOLContainer_DataVector<TrigVertex>
32.086 6356 POOLContainer_Rec::TrackParticleContainer_tlp1
43.484 12100 POOLContainer_TrigInDetTrackCollection
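The per-event multiplicities behind the largest entries can be read off the listing by dividing the item count by the 407 events in the file. The sketch below does this for a few rows copied from the listing; the tuple layout is an illustrative choice, not the checkFileParse.py format.

```python
# A few rows copied from the listing above: (kB/evt, n items, class).
N_EVENTS = 407
TRIGGER_TOTAL_KB = 162.405

rows = [
    (13.855, 407, "TrigConf::HLTAODConfigData_p1_AODConfig-0"),
    (16.150, 20827, "POOLContainer_DataVector<TrigVertex>"),
    (32.086, 6356, "POOLContainer_Rec::TrackParticleContainer_tlp1"),
    (43.484, 12100, "POOLContainer_TrigInDetTrackCollection"),
]

# For each class: (items per event, share of the trigger total).
summary = {
    name: (n_items / N_EVENTS, kb / TRIGGER_TOTAL_KB)
    for kb, n_items, name in rows
}

for name, (per_event, share) in summary.items():
    print(f"{per_event:6.1f} items/evt  {share:5.1%} of trigger  {name}")
```

This gives roughly 51 TrigVertex containers and 30 TrigInDetTrackCollections per event, while the configuration data is written exactly once per event.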
Ideas to reduce size:
- make HLTAODConfigData once per file, not once per event, savings ~14kB/event (Till)
- make HLTResult and LVL1Result ElementLinks, not pointers, in TrigDecision, savings ~11kB/event (Andrew) - not possible in 13.0.30, see above
- make TrigVertex pointer in TrigL2Bphys transient, savings ~10kB/event (Julie)
- make std::list and double m_cov[6] transient in TrigVertex, savings ~12kB/event (Julie - but are there other TrigVertex clients?)
However, the biggest factor in the size change is the number of RoIs per event (50 TrigVertex and 30 TrigInDetTrackCollection per event!) due to the change of menu. By disabling the BphysicsSlice, the size of the trigger EDM is significantly reduced. The table shows the trigger size per event based on the checkFileParse.py calculation, after correction for checkFile estimate errors:
| 50 ttbar events | AOD file size (kB) | size from checkfile sum | correction | trigger size (kB/evt after corr.) |
| 13.0.30 default | 13719 | 13080 | 1.05 | 173 |
| 13.0.30 no BPhysSlice | 10281 | 9650 | 1.06 | 107 |
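The correction column appears to be the ratio of the real AOD file size to the sum reported by checkFile. The sketch below recomputes it under that assumption and also works out the saving from disabling the BphysicsSlice. All numbers come from the table above.

```python
# Numbers from the table above (50 ttbar events; file sizes in kB).
samples = {
    "13.0.30 default": {
        "file_kb": 13719, "checkfile_sum_kb": 13080, "trigger_kb_evt": 173,
    },
    "13.0.30 no BPhysSlice": {
        "file_kb": 10281, "checkfile_sum_kb": 9650, "trigger_kb_evt": 107,
    },
}

# Assumed definition: correction = real file size / checkFile sum.
corrections = {
    name: s["file_kb"] / s["checkfile_sum_kb"] for name, s in samples.items()
}

# Trigger size saved per event by disabling the BphysicsSlice.
saving_kb = (
    samples["13.0.30 default"]["trigger_kb_evt"]
    - samples["13.0.30 no BPhysSlice"]["trigger_kb_evt"]
)
saving_fraction = saving_kb / samples["13.0.30 default"]["trigger_kb_evt"]

for name, corr in corrections.items():
    print(f"{name:24s} correction = {corr:.3f}")
print(f"saving = {saving_kb} kB/evt ({saving_fraction:.0%} of the trigger size)")
```

The recomputed ratios agree with the tabulated corrections to within rounding.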
-- AndrewHamilton - 26 Sep 2007
List of changes that have been made:
- the L1 objects RecEmTauRoI, RecJetRoI, and RecEnergyRoI were removed from ESD and AOD because they are not expected to be used for user trigger analysis. (very small space savings expected)
- Olya removed persistence of pointers from TrigTau to TauCluster and TrigInDetTrackCollection. (Expect to save most of 6.4 kB/event since TrigTau is small compared to track and cluster. Cluster and track can be reached by navigation instead.)
- L2Result and EFResult in 12.0.6 (total 23.8 kB/evt) are replaced by new HLTResult class in 13.0.0. Expect significant size reduction, to be quantified. This saving will however be offset by the config data unless that can be stored per run. (See end of next section).
- Ricardo Goncalo removed the TrigInDetTrack and TrigEMCluster pointers from TrigElectron, but had to add 3 ints and 1 float. He also made the necessary changes to TrigL2IDCaloHypo/Fex and the configuration files (4 April 2007).
- Denis Damazio changed doubles to floats for Egamma and Taus (TrigCaloEvent-00-01-23 in CVS, but not yet in the tag collector). He tested it with TrigT2CaloEgamma/Common and Egamma ntuple filling (CBNT_TrigT2Calo) and found only one error of 10^(-6) in Eratio. Note that ESDs made before the change can no longer be read back (17 April 2007).
- Iwona has made the change to allow track pt thresholds to be set depending on the algorithm, so EF tracks can have a pt threshold of 1 GeV for most triggers (tau and full scan excepted). Expected space saving for EF is 16% of 25 kB = 4 kB; will be less because of tau exception.
- Carlo arranged to drop tracks from SiTrack with only 3 space points, as they are never used. Estimated saving: 50% of tracks from the L2 b-jet slice. There are approx. 3.4 J20 RoIs per event (J20 are processed by the b-jet trigger), so this affects 3.4/19.8 track collections, but these probably have more tracks than average, so expect to save at least 1.7 kB (approx. 1kB/collection).
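The SiTrack estimate in the last item can be written out as a short calculation. This is only a sketch of the arithmetic; the variable names are illustrative and the inputs are the figures quoted in the list.

```python
# Inputs quoted in the list above; the variable names are illustrative.
j20_rois_per_event = 3.4        # J20 RoIs processed by the b-jet trigger
kb_per_collection = 1.0         # approx. size of one track collection
fraction_tracks_dropped = 0.5   # est. share of tracks with only 3 space points
total_collections = 19.8        # track collections per event

# One track collection per J20 RoI, half of its tracks dropped.
saving_kb = j20_rois_per_event * kb_per_collection * fraction_tracks_dropped
affected_fraction = j20_rois_per_event / total_collections

print(f"saving ~{saving_kb:.1f} kB/event")
print(f"{affected_fraction:.0%} of track collections affected")
```

Because these collections probably hold more tracks than average, the result is a lower bound on the saving.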
List of things we are currently testing:
- doubles to floats
- no problems expected
- total savings expected ~10%
- TrigInDetTrack needs double precision during reconstruction. Float precision is adequate for persistent rep so use TP separation
- main classes now done
- put min pt threshold on TrigInDetTracks in track collections
- ~8% of tracks in all LVL2 track collections and about 16% of EF tracks have pt < 1 GeV
- most LVL2 low-pt tracks come from TRTxk - see below. Try: TRTMinRecPt = 1000 #MeV. Expected saving ~1.8 kB.
- remove covariance matrix from Rec::TrackParticle
- made possible with new version of TrackParticle. Needs change in the way they are created by EF ID code. Jiri & Andrew will look into this for 13.0.20.
- drop the VxContainer
- where vertex position is needed (e.g. for tau, bjet, bphysics), use Trk::RecVertex (or even just Trk::Vertex) as the object put in the HLT navigation and retrieved by the hypothesis. It holds the vertex position as a Hep3Vector (in the Vertex base class) and, if needed, RecVertex holds the ErrorMatrix and FitQuality.
- ok for high-pt electrons, photons, muons and single-prong taus. 3-prong taus make their own RecVertex so also ok.
- to check: b-jets, b-physics
- save most of 14.6 kB?
- what needs to be done in practice?
- drop VxContainer from output stream lists should be enough
- preferably, ElementLinks should not be set either; otherwise, trying to dereference them in the AOD would trigger lengthy attempts to backtrack before failing.
- Trigger configuration in AOD
- currently (rel 13 nightlies) stored event by event
- estimated size for 400 chains (based on CDF/D0 trigger) is 40kB; for rel 13 menu estimate approx 5kB.
- should be stored per run instead, waiting for RDS/DM to provide infrastructure.
List of things we have thought of and the reason we are not pursuing them:
- Use of ElementLinks in LVL2 trigger EDM because ElementLinks require StoreGate, which is not necessarily available online.
- New navigation: feature request has been made to Tomasz to make it possible to selectively drop collections from one algorithm but not another, if the algorithms save data in the HLT navigation with their instance name as the label. It is not currently supported but is being considered. - deferred to rel 14
Understanding the size of the 12.0.6 trigger AOD
File: 100 top events: mc11.004100.T1_McAtNLO_top.digit.RDO.v11000401._00001
Run 12.0.6
RecExCommon with AllAlgs=False.
Total size 116 kB/event
Top 10 classes which take 90% of size.
| No. collections/evt | Disk size/evt (kB) | Class name |
| 12.2 | 25.1 | Rec::TrackParticleContainer |
| 22.6 | 22.4 | TrigInDetTrackCollection |
| 1.0 | 18.0 | EFResult |
| 12.2 | 14.6 | VxContainer |
| 4.7 | 6.4 | TrigTau |
| 1.0 | 5.8 | L2Result |
| 8.2 | 4.1 | TrigElectron |
| 1.0 | 3.8 | DataHeader |
| 9.0 | 2.7 | JetCollection |
| 1.0 | 2.3 | TrigInDetTrackTruthMap |
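The statement that the top 10 classes take 90% of the size can be cross-checked by summing the disk sizes in the table and dividing by the 116 kB/event total. A minimal sketch, using only the tabulated values:

```python
# Disk size per event (kB) for the top 10 classes, copied from the table.
TOTAL_KB_PER_EVENT = 116

top10_kb = {
    "Rec::TrackParticleContainer": 25.1,
    "TrigInDetTrackCollection": 22.4,
    "EFResult": 18.0,
    "VxContainer": 14.6,
    "TrigTau": 6.4,
    "L2Result": 5.8,
    "TrigElectron": 4.1,
    "DataHeader": 3.8,
    "JetCollection": 2.7,
    "TrigInDetTrackTruthMap": 2.3,
}

top10_sum = sum(top10_kb.values())
top10_share = top10_sum / TOTAL_KB_PER_EVENT

print(f"top 10 classes: {top10_sum:.1f} kB/event = {top10_share:.0%} of total")
```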
Understanding the number of collections
- take into account the average no. of RoIs of each type
- three LVL2 tracking algorithms run for each EM RoI, so three collections produced per EM RoI
- MU and HA (tau) RoIs each run one tracking algorithm
- One track collection is produced for JT20 (not the lowest jet threshold, about half the jets pass it)
- TrigElectron collection is produced for each threshold (except the lowest) for each EM RoI
- e.g. TrigInDetTrackCollection: 3 (track algos) x 3.6 (EM RoIs) + 5 (HA) + 0.6 (MU) + 0.5 (JT20) x 6.8 = 19.8
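The TrigInDetTrackCollection count above can be reproduced term by term. The sketch below uses the RoI multiplicities quoted in the example. The variable names are illustrative, and reading 6.8 as the number of jet RoIs per event is an assumption.

```python
# Average RoI multiplicities per event, as quoted in the example above.
track_algos_per_em_roi = 3   # three LVL2 tracking algorithms per EM RoI
em_rois = 3.6
ha_rois = 5.0                # one tracking algorithm per HA (tau) RoI
mu_rois = 0.6                # one tracking algorithm per MU RoI
jet_rois = 6.8               # assumed meaning of the 6.8 in the formula
jt20_pass_fraction = 0.5     # about half the jets pass JT20

collections_per_event = (
    track_algos_per_em_roi * em_rois
    + ha_rois
    + mu_rois
    + jt20_pass_fraction * jet_rois
)

print(f"expected TrigInDetTrackCollections per event: {collections_per_event:.1f}")
```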
Understanding the size of tracks
- According to Dmitry's slides from the December Trigger AOD meeting: TrigInDetTrack with end params contains 80 doubles (x 8 bytes) and 7 ints (x 4 bytes) => 668 bytes/track
- According to checkFile.py on the 100 top events above, the memory size (not compressed disk size) of TrigInDetTrackCollections is 34 kB/event
- According to the CBNT (see below) there are approx 57 tracks per event.
- 57 x 668 = 38 kB/event
- Not too far from the actual size shown by checkFile.py
- Difference: perhaps some tracks don't have end params?
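The track size estimate above is simple arithmetic, sketched here with the numbers quoted in the list. Taking 1 kB as 1000 bytes is an assumption made for the comparison.

```python
# Per-track content quoted from the slides: 80 doubles and 7 ints.
N_DOUBLES, DOUBLE_BYTES = 80, 8
N_INTS, INT_BYTES = 7, 4
TRACKS_PER_EVENT = 57          # from the CBNT
MEASURED_KB_PER_EVENT = 34     # memory size reported by checkFile.py

bytes_per_track = N_DOUBLES * DOUBLE_BYTES + N_INTS * INT_BYTES
expected_bytes = TRACKS_PER_EVENT * bytes_per_track
expected_kb = expected_bytes / 1000  # assuming 1 kB = 1000 bytes

print(f"{bytes_per_track} bytes/track")
print(f"expected {expected_kb:.1f} kB/event, measured {MEASURED_KB_PER_EVENT} kB/event")
print(f"measured / expected = {MEASURED_KB_PER_EVENT / expected_kb:.2f}")
```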
The effect of pointers
- TrigTau contains pointers to a TrigInDetTrackCollection and a TrigTauCluster. Since POOL/ROOT follows the pointers and inserts the actual objects, they are included in the TrigTau. Clearly these objects dominate the size of the TrigTau, so removing the pointers should reduce it to a negligible size on disk.
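The same effect can be seen in any serializer that follows references. The sketch below is only an analogy, using Python's pickle rather than POOL/ROOT. The dictionaries are hypothetical stand-ins for TrigTau and the objects it points to.

```python
import pickle

# Hypothetical stand-in for a bulky object such as a cluster or track collection.
cluster = {"cells": [float(i) for i in range(500)]}

# A small "tau" that holds a reference to the cluster, and one that does not.
tau_with_pointer = {"et": 25.0, "cluster": cluster}
tau_without_pointer = {"et": 25.0, "cluster": None}

# pickle follows the reference and writes the cluster into the tau's byte stream.
size_with = len(pickle.dumps(tau_with_pointer))
size_without = len(pickle.dumps(tau_without_pointer))

print(f"with pointer: {size_with} bytes, without: {size_without} bytes")
```

The small object itself contributes almost nothing; nearly all of the serialized size comes from the object it references.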
Details of Testing
Track Pt Cut
Made a CBNT by adding doWriteCBNT=True to my job options file. Looked at the pt of the tracking collections in ROOT by doing:
- TFile* f = new TFile("ntuple.root");
- TTree* t = (TTree*)f->Get("CollectionTree");
- t->Draw("T2IdPt", "(T2IdPt>-20)&&(T2IdPt<100000)");
- t->Draw("T2IdPt", "(T2IdPt>-20)&&(T2IdPt<100000)&&(T2IdAlgo==1)");
See TrigInDetTrack.h for definitions of the T2IdAlgo variable: SITRACKID=1, IDSCANID=2, TRTLUTID=3, TRTXKID=4
In 100 events, here are the number of LVL2 tracks with pt less than 1 GeV for each collection:
| Collection | No. Tracks < 1 GeV | No. Tracks < 100 GeV |
| SITRACKID | 0 | 1141 |
| IDSCANID | 34 | 3248 |
| TRTLUTID | 0 | 0 |
| TRTXKID | 428 | 1354 |
| All Collections | 462 | 5743 |
EF tracks: total 8257, of which 1333 have pt under 1 GeV.
Therefore the potential saving in EF track collection size from a 1 GeV min pt cut is 4kB.
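The low-pt fractions and the resulting savings can be recomputed from the track counts above. This sketch assumes that collection size scales linearly with the number of tracks. The per-event collection sizes are taken from the 12.0.6 size table (22.4 kB for TrigInDetTrackCollection, 25.1 kB for Rec::TrackParticleContainer).

```python
# LVL2 track counts in 100 events: (tracks < 1 GeV, tracks < 100 GeV).
lvl2_tracks = {
    "SITRACKID": (0, 1141),
    "IDSCANID": (34, 3248),
    "TRTLUTID": (0, 0),
    "TRTXKID": (428, 1354),
}
EF_LOW_PT, EF_TOTAL = 1333, 8257

# Collection sizes from the 12.0.6 size table (kB/event).
LVL2_COLLECTION_KB = 22.4   # TrigInDetTrackCollection
EF_COLLECTION_KB = 25.1     # Rec::TrackParticleContainer

lvl2_low = sum(low for low, _ in lvl2_tracks.values())
lvl2_all = sum(total for _, total in lvl2_tracks.values())
lvl2_fraction = lvl2_low / lvl2_all
ef_fraction = EF_LOW_PT / EF_TOTAL

# Assumption: size scales linearly with the number of tracks kept.
lvl2_saving_kb = lvl2_fraction * LVL2_COLLECTION_KB
ef_saving_kb = ef_fraction * EF_COLLECTION_KB

print(f"LVL2: {lvl2_fraction:.1%} below 1 GeV, saving ~{lvl2_saving_kb:.1f} kB/event")
print(f"EF:   {ef_fraction:.1%} below 1 GeV, saving ~{ef_saving_kb:.1f} kB/event")
```

Almost all of the low-pt LVL2 tracks come from TRTXKID, which is why a pt threshold on that algorithm alone recovers most of the LVL2 saving.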
Working area for 12.0.6
A trial work area for 12.0.6 has been created at
/afs/cern.ch/atlas/software/dist/trials/v-a/sgeorge/