Towards a common format for interfacing fast detector simulation and analysis tools

Interested people

MadAnalysis people: E. Conte, B. Fuks, G. Serret
Delphes people
Bats ANR project: J. Andrea, C. Collard

Goals of this page

Set up a new format allowing for communication between fast detector simulation and analysis tools.
This format must come with a reader and possibly a concatenator allowing for merging the samples easily.
This format must be compatible with all the existing event formats (stdhep, hepmc, lhe, etc...).
It must include all the features needed by the existing codes (delphes, madanalysis, pgs, ...).
The format must be flexible: nothing should break if some information is missing => the reader should know what to do.
The format should be backward and forward compatible when a new attribute is added (this is hard with Root, see below).
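
One way to picture the "reader should know what to do" requirement is a record whose attributes are looked up by name, with an explicit answer when an attribute is absent. The sketch below is only an illustration under that assumption: the `Record` type and its `get`/`getOr` methods are hypothetical names, not part of any existing code (C++17 is assumed for `std::optional`).

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical record: attributes are stored by name, so a file written with
// an older or newer version of the format can omit or add entries freely.
struct Record {
    std::map<std::string, double> attributes;

    // Returns the stored value, or an empty optional if the attribute is absent.
    std::optional<double> get(const std::string& name) const {
        const auto it = attributes.find(name);
        if (it == attributes.end()) return std::nullopt;
        return it->second;
    }

    // Returns the stored value, or the fallback chosen by the caller.
    double getOr(const std::string& name, double fallback) const {
        return get(name).value_or(fallback);
    }
};
```

With this layout, a reader that asks for a missing `lifetime` gets an empty result (or its own fallback) instead of failing, and unknown attributes written by a newer producer are simply ignored by an older reader.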

Concepts (alphabetical order)

Branching system (time efficiency): one should be able to choose whether or not to load specific branches of the event (e.g., reading the photon collections or not, according to the needs).
Flexible, easily extensible: adding a new attribute, method, etc. should not break the format.
Interoperability: open format
Object-oriented: all the information is available once the event is loaded; pointers can be included in the format.
Pipe-lining: avoiding reading and writing files (efficiency, time,...).
Size of the variables: universality (int = 16/32/64bits? etc...).
Not reinventing the wheel if not necessary: STL libraries, TLorentzVector objects work well => to be used.
Traceability: dates, authors, versions, changelog, ...
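
The branching concept above can be sketched without Root as a reader that only runs the loaders of the branches the user has switched on (Root offers a comparable switch through `TTree::SetBranchStatus`). The `BranchReader` class and its method names are hypothetical, and the loaders are plain callbacks standing in for real deserialization code (C++17 is assumed).

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical reader: each branch has a loader, and only enabled branches
// are actually read when an event is loaded.
class BranchReader {
public:
    using Loader = std::function<void()>;

    void registerBranch(const std::string& name, Loader loader) {
        loaders_[name] = std::move(loader);
    }

    void enable(const std::string& name) { enabled_.insert(name); }

    // Runs the loaders of the enabled branches; returns how many were run.
    std::size_t loadEvent() const {
        std::size_t loaded = 0;
        for (const auto& [name, loader] : loaders_) {
            if (enabled_.count(name) != 0) {
                loader();
                ++loaded;
            }
        }
        return loaded;
    }

private:
    std::map<std::string, Loader> loaders_;
    std::set<std::string> enabled_;
};
```

An analysis that never looks at photons would register all branches but enable only, say, `jets`, so the photon loader is never called.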

First draft for the format (so far, nothing is fixed)

General features

A folder structure, with collections included in each folder, seems to be the most appropriate since we have different types of information (see below).
The collections should be able to communicate (gen-reco matching) through pointers.
This fits very well into a Root structure.
- Problem with the Root serialization: if we add a new attribute -> we lose backward compatibility.
- Code versioning, general information about the events, etc. are not easy to save in the Root format.
- One possible solution is Sezen's method (STL vector of simple variables), but the structure becomes messy.
- Another obvious option is to get rid of Root => what should we use instead? Do we have the time budget to develop something new, for instance using the Boost libraries?
- One could think of an adaptive serializer which is passed to Root a priori, so that Root can generate the dictionary necessary for reading events.

First (still incomplete) skeleton for the structure: 3 folders

info (general information; should ensure reproducibility) (a)
- detector cards
- trigger cards
- MC generator cards
- versions and names of the used codes at each level of the chain (from the model to the end).
mctruth (this is important for gen-reco matching; if not interested, this folder can be empty).
- gen particles before hadronization (b)
  - Four-momentum (c)
  - PDG-id, statuscode
  - vertex (3Vector) where created
  - vertex (3Vector) where destroyed
  - lifetime
- gen particles after hadronization (b)
  - four-momentum (c)
  - PDG-id, statuscode
  - vertex (3Vector) where created
  - vertex (3Vector) where destroyed
  - lifetime
- initial particles (b)
  - same as gen particles (with primary as the vertex where created)
  - information on the momentum fraction, alpha_s, etc...
- gen vertices: vertex
  - position (3Vector)
  - links to incoming and outgoing particles (d)
- noise particles (pile-up and the like) (e)
  - not associated to any vertex
reco
- electrons
- muons
- taus
- photons
- jets
- met
- vertex (f)
- tracks (f)
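
To make the skeleton above more concrete, here is a minimal sketch of how the three folders could map onto plain C++ types. Everything in it is an assumption made for illustration: the type and member names are hypothetical, `FourMomentum` and `Vec3` stand in for TLorentzVector and a 3Vector so that the sketch needs no Root, vertex links are stored as indices instead of pointers, and `info` is kept once per sample rather than once per event.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for TLorentzVector and a 3Vector (assumption: no Root dependency).
struct FourMomentum { double px = 0, py = 0, pz = 0, e = 0; };
using Vec3 = std::array<double, 3>;

// "info" folder: what is needed to reproduce the sample.
struct Info {
    std::string detectorCard;
    std::string triggerCard;
    std::string generatorCard;
    std::vector<std::pair<std::string, std::string>> codes;  // (name, version)
};

struct GenParticle {
    FourMomentum p4;
    std::int32_t pdgId = 0;
    std::int32_t status = 0;
    Vec3 createdAt{};    // vertex where created
    Vec3 destroyedAt{};  // vertex where destroyed
    double lifetime = 0.0;
};

struct GenVertex {
    Vec3 position{};
    // Assumption: links are indices into MCTruth::hadrons, not pointers.
    std::vector<std::uint32_t> incoming;
    std::vector<std::uint32_t> outgoing;
};

// "mctruth" folder: may stay empty if gen-reco matching is not needed.
struct MCTruth {
    std::vector<GenParticle> partons;  // before hadronization
    std::vector<GenParticle> hadrons;  // after hadronization
    std::vector<GenParticle> initial;  // initial particles
    std::vector<GenVertex> vertices;
    std::vector<GenParticle> noise;    // pile-up and the like
};

struct RecoObject { FourMomentum p4; };

// "reco" folder.
struct Reco {
    std::vector<RecoObject> electrons, muons, taus, photons, jets;
    FourMomentum met;
    std::vector<Vec3> vertices;
    std::vector<RecoObject> tracks;
};

struct Event {
    MCTruth mctruth;
    Reco reco;
};

// Assumption: one Info block per sample, shared by all its events.
struct Sample {
    Info info;
    std::vector<Event> events;
};
```

Indices are used here only because they survive serialization trivially; the draft itself leaves open whether links should be real pointers.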

Footnotes

(a) All the information about the fastsim is currently missing in the existing format. It must be included for reproducibility and must take into account the modularity of the fast simulation.

(b) One should differentiate three types of particles since they could be used independently and contain different pieces of information.

Information on scales, alpha_s, etc. can be merged with the initial particles.
The originating vertex for the initial particles should be a reserved word such as primary.
Particles before and after hadronization should be kept separate for the following reasons:
- Speed of the algorithms acting on partons only (we don't need the heavy hadron structure if interested in partons only)
- Practical
- Costless
- According to the needs of the user, one of the collections (for instance the hadron collection) can be dropped to save disk/memory space.

(c) TLorentzVector = 4 double-precision numbers.
Otherwise, 4 independent floats + PT, eta, phi = 7 floats, and we lose all the methods associated with the TLorentzVector class.
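
The trade-off in this footnote can be illustrated with two plain structs: on common platforms four doubles take 32 bytes and seven floats take 28 bytes, so the gain in size is small, while PT, eta and phi can always be recomputed from the four components. The struct and function names below are hypothetical, the sizes assume 8-byte doubles and 4-byte floats, and the real TLorentzVector object carries some additional overhead beyond its four numbers.

```cpp
#include <cmath>

// Four doubles: the numerical payload of a four-momentum.
struct P4Double { double px, py, pz, e; };

// Alternative layout: four floats plus three cached derived quantities.
struct P4Floats { float px, py, pz, e, pt, eta, phi; };

// These hold for 8-byte doubles and 4-byte floats (an assumption).
static_assert(sizeof(P4Double) == 32, "expected 4 x 8 bytes");
static_assert(sizeof(P4Floats) == 28, "expected 7 x 4 bytes");

// Transverse momentum.
double pt(const P4Double& p) { return std::hypot(p.px, p.py); }

// Pseudorapidity; only meaningful for pt > 0, since sinh(eta) = pz / pt.
double eta(const P4Double& p) { return std::asinh(p.pz / pt(p)); }

// Azimuthal angle in (-pi, pi].
double phi(const P4Double& p) { return std::atan2(p.py, p.px); }
```

Storing only the four components and deriving the rest on demand is what the TLorentzVector methods already provide, which is the argument made above for reusing that class.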

(d) Do we need such an "ingoing" or "outgoing" tag?

(e) Necessary? Isn't that going beyond the fastsim scheme? If yes, what should we put in there? Do we need the position of the vertices (it is random anyway)? etc...

(f) Tracks necessary for tau reconstruction + displaced vertices. Vertices necessary for displaced vertices studies.

Comments

There could be several collections of jets, electrons, etc., which must be automatically recognized by the reader. The name of a collection must therefore always start with the corresponding class name, such as jet_myjets and electron_myelectron for a collection of jets and electrons, respectively. Another example: for the same event, one could have a collection of jets jet_delphes after the use of delphes and a collection of jets jet_pgs after the use of PGS.
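
A reader could recognize collections by splitting each name at the first underscore, as in the following minimal sketch. The function name `splitCollectionName` is hypothetical, and the sketch assumes that class names themselves never contain an underscore (C++17 is assumed for `std::optional`).

```cpp
#include <optional>
#include <string>
#include <utility>

// Splits a collection name such as "jet_delphes" into its class name ("jet")
// and its user label ("delphes"). Returns an empty optional if the name does
// not follow the classname_label convention.
std::optional<std::pair<std::string, std::string>>
splitCollectionName(const std::string& name) {
    const auto pos = name.find('_');
    const bool malformed = pos == std::string::npos  // no separator
                        || pos == 0                  // empty class name
                        || pos + 1 == name.size();   // empty label
    if (malformed) return std::nullopt;
    return std::make_pair(name.substr(0, pos), name.substr(pos + 1));
}
```

With this rule, jet_delphes and jet_pgs are both recognized as jet collections, and a name without a prefix is rejected instead of being guessed at.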

-- BenjF - 12-Jun-2012

Topic revision: r3 - 2012-06-12 - EricConte
