LCG HepML PROPOSAL

PROBLEM DESCRIPTION

One of the main requests from the experimental collaborations to the authors of Monte-Carlo (MC) generators and other simulation tools is the unification of the input/output formats of the data files used in the software. The first steps in this direction were taken in the Les Houches Accord I (hep-ph/0109068), the SUSY Les Houches Accord (hep-ph/0311123), and the PDF Les Houches Accord (LHAPDF). All of these standards are already supported by MC codes.

To show where the HepML concept fits, we first describe the full simulation chain. It consists of several stages:

  • Description of theoretical models: Lagrangians and Feynman rules (hep-ph/0203102)
  • Definition of a general model for Monte-Carlo simulation (properties of particles, physics parameters, etc.)
  • Description of a physics process, the subprocesses contributing to it, and the corresponding Feynman graphs
  • Parameters of Monte-Carlo integration methods and event generation details
  • Presentation of the results of the previous item: total or differential cross sections, parton/particle level events
  • Simulation of showering/hadronization and other effects applied
  • Simulation of the particular detector response
  • Model of triggers and reconstruction program output.

Figure: Simulation chain

Usually each level of the chain is covered by several software tools. For example, several MC generators can simulate the same process, or an experimental analysis may be carried out by independent groups (within one collaboration) using several analysis tools. If these tools have their own input/output formats (the usual situation), it is difficult to compare tools at the same level of simulation or to continue the simulation with another available MC package. Unification of data formats would therefore be useful at all stages of the chain. Since developing a general format for all stages is a very complicated task, both technically and conceptually, in the current HepML project we restrict ourselves to items 1-5 of the list above. These stages share a lot of general information that can be formulated in a unified manner. A unified format (or standard) for data files can simplify the development of standard tools for graphical representation, comparison, and validation of the results from the different available MC packages.

The most appropriate technology for unifying the formats seems to be XML, which makes it possible to describe the stored information in a very flexible and standardized way. Different MC generators may use the same tags to describe physics parameters, or may keep generator-specific information in new special tags. An XML parser can select only the tags needed for a specific task, and the other tags will not interfere with parsing. The HepML format should consist of tags that describe the meta-information of MC generation (e.g. theoretical models, physics processes, generation parameters, cuts, etc.) and should provide the rules for reading the specific information about each event (e.g. four-momenta, QCD scale, color chain, ...).
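The selective reading described above can be illustrated with a minimal Python sketch. It assumes a hypothetical HepML-like document: the tag and attribute names (`process`, `crossSection`, `generatorSpecific`, `cuts`) are invented for illustration and are not taken from any published HepML schema. The reader asks only for the tags it understands, and the generator-specific block is simply skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all tag and attribute names are illustrative
# assumptions, not part of an actual HepML schema.
DOC = """<hepml>
  <process name="gg_to_ttbar"/>
  <crossSection unit="pb">5.5</crossSection>
  <generatorSpecific tool="SomeGenerator">
    <internalSeed>12345</internalSeed>
  </generatorSpecific>
  <cuts>
    <cut variable="pT" min="20.0"/>
  </cuts>
</hepml>"""


def read_known_tags(xml_text, wanted):
    """Return {tag: [elements]} for the wanted tags; all other tags are ignored."""
    root = ET.fromstring(xml_text)
    return {tag: root.findall(f".//{tag}") for tag in wanted}


# A tool that only needs the process and the cross section asks for just those.
found = read_known_tags(DOC, ["process", "crossSection"])
process_name = found["process"][0].get("name")
cross_section = float(found["crossSection"][0].text)
print(process_name, cross_section)
```

The unknown `generatorSpecific` element does not have to be declared or handled by the reader, which is the property the paragraph above relies on.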

Why XML?
  • XML describes the document structure.
  • XML has very simple native structures and syntax.
  • Most information in HEP can be presented as a tree structure or, in the more general case, as an acyclic graph.
  • XML is an extensible language
    • XML has no fixed set of tags and attributes. Users can introduce their own tags and attributes. It defines rules for the document syntax and allows problem-specific sets of elements to be developed.
  • Web ready
    • An XML document can be displayed in different ways: HTML, txt, a record in an SQL table, etc.
    • Most modern Web browsers (Mozilla, IE, Opera) can display XML documents by default.
  • Well-defined open standard
    • Recommended by W3C
    • Supported by IBM, Sun, Microsoft, Linux/UN*X community
  • XML is legible: XML syntax can be read directly by users.
  • Available tools to process XML documents
    • Xerces, Expat, etc.; good support in C/C++, Perl, Python

GENERAL CONCEPTION of HepML

We propose HepML as an extensible standard for the information transferred along the HEP simulation chain. We therefore have to develop a general extensible XML schema that meets the following requirements:

  • One general schema, which will include all particular schemas (sub-schemas) developed by independent groups
  • Strict versioning policy of the schema
  • The sub-schemas will describe the necessary information on a concrete level of simulation (or a part of the information).

As the first step of the project, we consider the information that passes from matrix-element (ME) generators to showering/hadronization (SH) generators. The XML tags for the event file format should naturally be based on the Les Houches Accord I. The HepML concept allows two different ways to implement HepML tags in ME codes:

  • Preservation of the native formats of the ME generators. The HepML document is generated in addition to the original event files. This keeps old codes working and adds new functionality through HepML.
  • The original event file can be stored inside HepML tags. In this case, a separate HepML document is not needed.

In general, not all information has to be included in the HepML documents; we propose to store meta-information there. The choice of meta-information is left to the MC code authors (they may store all information in the document).
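The two options can be contrasted in a short Python sketch. Everything in it is an assumption made for illustration: the tag names (`generator`, `eventFile`, `events`), the `format` attribute, and the stand-in text used as a "native" event file do not come from any real generator or schema.

```python
import xml.etree.ElementTree as ET

# Stand-in for the content of a native event file (format is invented).
NATIVE_EVENTS = "1 21 21 6 -6\n2 21 21 6 -6\n"


def sidecar_document(generator, event_file_name):
    """Option 1: meta-information only; events stay in the native file."""
    root = ET.Element("hepml")
    ET.SubElement(root, "generator").text = generator
    ET.SubElement(root, "eventFile").text = event_file_name
    return root


def embedded_document(generator, native_text):
    """Option 2: the original event file is stored inside a HepML tag."""
    root = ET.Element("hepml")
    ET.SubElement(root, "generator").text = generator
    ET.SubElement(root, "events", format="native").text = native_text
    return root


sidecar = sidecar_document("SomeGenerator", "events.dat")
embedded = embedded_document("SomeGenerator", NATIVE_EVENTS)

# Serialize and re-parse the embedded variant to show that the native
# text can be recovered unchanged from the XML document.
restored = ET.fromstring(ET.tostring(embedded)).find("events").text
print(restored == NATIVE_EVENTS)
```

In the first variant old codes keep reading `events.dat` as before, while the second variant yields a single self-contained file at the cost of wrapping the native text in XML.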

We are going to develop an API library for different tag sets (which can be defined by the Les Houches Accord or by MC code authors). This API should be available for C/C++ and FORTRAN environments. Any ME generator will then be able to save the necessary information in XML form (HepML format), and any SH generator will be able to read event files in the HepML format. In addition, we are going to develop standalone utilities (based on the same API library) to convert event files from the native formats of the main ME generators to the HepML format.

XML schema in HepML
We suggest having one general HepML schema that includes the tags of the different HepML branches with their namespaces preserved. The branches correspond to separate sub-schemas developed by separate groups. This gives HepML more flexibility and lets different groups work on each part independently. The main advantage for end users of HepML is a single general method to access all HepML structures.

References to all permitted HepML root tags will be included in the general HepML schema. HepML documents can use types and elements from all schemas included in the general schema.

The idea of the main (general) HepML schema is quite simple. It should contain references to the main schemas of the branches (which in turn contain all their subschemas) and permit the use of all elements from the different namespaces without explicit redeclaration.



All tags will be placed inside a reserved common <hepml> tag, which should carry a reference to the unified HepML schema. The top-level tag can define a default namespace for a HepML document. Tags from other namespaces must then refer to their namespaces explicitly when they are used in the document.
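The namespace rule can be sketched in Python as follows. The namespace URIs (`http://example.org/...`), the `lha` prefix, and the element names are placeholders invented for this example; the actual HepML namespaces are defined by the schemas themselves. The top-level `<hepml>` tag sets the default namespace, and elements from another branch carry an explicit prefix.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs and tag names, assumed only for this sketch.
NS = {
    "hepml": "http://example.org/hepml",
    "lha": "http://example.org/hepml/lha1",
}

DOC = """<hepml xmlns="http://example.org/hepml"
       xmlns:lha="http://example.org/hepml/lha1">
  <description>sample run</description>
  <lha:event>
    <lha:scale>91.2</lha:scale>
  </lha:event>
</hepml>"""

root = ET.fromstring(DOC)

# Unprefixed tags belong to the default namespace declared on <hepml>.
description = root.find("hepml:description", NS).text

# Tags from another branch are addressed through their own namespace.
scale = float(root.find("lha:event/lha:scale", NS).text)

print(root.tag, description, scale)
```

A reader interested in only one branch can query by that branch's namespace and never collide with identically named tags from another group.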

Main HepML schema (in Wiki, hepml.xsd)

Figure: General schema structure

We propose to change the CEDAR schema slightly so that it fits into the general concept and supports HepML extensibility.
Adapted version: in Wiki, cedar-hepml.xsd

Figure: CEDAR adapted schema structure

The LCG HepML schema (in Wiki, lcg-hepml.xsd) contains several subschemas: one for the Les Houches Accord I (in Wiki, lha1.xsd), one for MCDB (in Wiki, mcdb.xsd), and an auxiliary schema (in Wiki, types.xsd).

Figure: LCG HepML schema structure

Development of libraries and utilities

The main goal of the HepML libraries and standalone tools is to provide operations on HepML documents. In general, the HepML libraries will provide three types of functionality: creating, reading, and validating documents. Validation can also be performed with standalone tools.
Additional software will be provided for special purposes (e.g. filling FORTRAN common blocks according to LHA I).
We intend to describe C++ classes that operate on specific HepML entities directly in C++ code. It will thus be possible to read a HepML document into C++ objects and to store HepML tags to a file or a stream from the corresponding C++ classes.
To work with FORTRAN, special wrappers around the C++ HepML libraries will be developed. This is simpler than writing HepML handling routines directly in Fortran.
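
The object-to-document mapping planned for the C++ classes can be illustrated with a small sketch. It is written in Python purely for brevity and does not represent the proposed C++ API; the `CrossSection` class, its fields, and the `crossSection` tag are hypothetical names chosen for this example.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CrossSection:
    """Hypothetical HepML entity; tag and attribute names are assumptions."""
    value: float
    error: float
    unit: str = "pb"

    def to_element(self):
        """Store this object as an XML element."""
        elem = ET.Element("crossSection", unit=self.unit, error=repr(self.error))
        elem.text = repr(self.value)
        return elem

    @classmethod
    def from_element(cls, elem):
        """Rebuild an object from its XML element."""
        return cls(
            value=float(elem.text),
            error=float(elem.get("error")),
            unit=elem.get("unit"),
        )


original = CrossSection(value=5.5, error=0.25)

# Write the entity to a stream, then read it back into a new object.
stream = io.BytesIO()
ET.ElementTree(original.to_element()).write(stream)
stream.seek(0)
restored = CrossSection.from_element(ET.parse(stream).getroot())
print(restored)
```

Keeping serialization inside the entity class means user code works with typed objects and never touches raw tags, which is the convenience the planned class library aims for.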

Collaboration with CEDAR

The main idea of HepML is to provide the HEP community with a general XML standard for HEP information, so it is important to attract as many people as possible to the project for development and discussion. We propose HepML as an extensible standard in which the general schemas are as flexible as possible and different groups can develop different subschemas independently, with their tags separated into different namespaces. As a first step, we propose to combine the CEDAR HepML schemas and the LCG HepML schemas as a set of subschemas with different namespaces under the general HepML schema, which will contain only a minimal description. In this way new HepML subprojects (e.g. an XML description of theoretical models) from any group around the world can be added easily. The technical details are described in the section "XML schema in HepML" above.

Topic revision: r16 - 2007-10-01 - SergeyBelov
 