A summary of the ATLAS Computing Technical Proposal [1] is given, outlining the plans of the ATLAS collaboration for computing in terms of software development, computing infrastructure, manpower and cost. We can only lay down the requirements as they are known today, the technical directions we currently follow, and a rough estimate of the resources needed. Computing is instrumental to the success of the experiment. In many areas we do not yet have solutions adequate to the scale of the problem: its complexity, data rate and data volume, the organisational problems posed by the world-wide dispersion of resources, and the long duration of the project.
Computing for the ATLAS experiment at the LHC proton-proton collider at CERN will pose a number of new, as yet unsolved, technological and organisational challenges:
It is expected that about 85% of the ATLAS software effort will come from small, geographically separated groups not based at CERN. In order to optimise the quality, to guarantee the long-term maintenance, and to minimise the necessary resources, a well-defined ‘ATLAS Software Process’ (ASP) [2] has been developed in the framework of the RD41 (MOOSE) project [3]. In the software development we will adhere to accepted international standards; wherever possible we will seek common developments with other experiments and employ commercial solutions. We plan to implement the software following the object-oriented paradigm. Currently, we are studying the implementation using the C++ language.
For a project the size of ATLAS we must adopt appropriate engineering techniques for the construction of the software. The key elements of the ASP are:
The organisation of computing is part of the overall ATLAS organisation. The ATLAS Computing Steering Group (ACOS) deals with offline computing matters and more global computing aspects such as software engineering and computing infrastructure. The chairperson of ACOS is the ATLAS computing co-ordinator, who represents offline computing in the ATLAS Executive Board. In ACOS, the computing representatives of the ATLAS systems (Inner Detector, Liquid Argon Calorimeter, Tile Calorimeter, Muon System, Trigger/DAQ) provide direct contact to their respective communities. The detector communities organise their software work relatively autonomously. The co-ordinators of the major packages, such as simulation, reconstruction and trigger simulation, integrate the software prepared in these sub-domains. Additional members represent specific geographical regions or computing tasks within the ATLAS collaboration.
The Software Development Environment
The Software Development Environment (SDE) is everything needed on the developer's desktop (CASE tools, testing tools, compilers, linkers, debuggers etc.) in order to participate in the orderly development or modification of a software product. It should be a fully integrated operational software environment, and not just a collection of individual task-oriented tools.
A working group has discussed the requirements and has produced a first document listing the functional requirements and making some initial choices for the Software Development Environment.
It has been decided that ATLAS will adopt the Unified Notation as soon as the standard has been published officially and tool support is acceptable. Until then, the OMT notation will be used for diagrams showing static associations between classes (the Object Model in OMT terminology), along with a diagram showing message flow using the notation supported by the Object Interaction Editor of the StP CASE tool.
Transition from FORTRAN to Object-Oriented Software
We have a detector simulation program based on the GEANT 3.21 detector simulation package and a reconstruction program for the simulated data. All this code is written in FORTRAN 77 and uses ZEBRA for memory management. These programs have been used in the past to study the detector behaviour and to optimise its parameters; they produced all results for the ATLAS Technical Proposal [4] and are being used for the Technical Design Reports of the various sub-detectors. It is foreseen to use and upgrade these programs for at least the next two to three years.
The ATLAS detector has been described in the simulation program in great detail, resulting in 11 million GEANT volumes. The simulation of a single di-jet event takes on average 20 minutes on a HP-735 workstation. Between November 1996 and May 1997, about two million jets were produced.
A major challenge will be to design and develop new software while still maintaining the FORTRAN software. We plan to start the development from the proven algorithms, improving them where necessary. We will follow an OO design, implementing in C++. The data, now described in common blocks and stored in ZEBRA banks, will then be encapsulated in objects together with the functions acting on that data.
To introduce OO and C++ we propose to follow two lines: one starts from the existing simulation and reconstruction code and makes extensions and modifications in C++; the other starts from scratch with a new OO design. The first line allows new algorithms written in C++ to be implemented in an established environment, so that users can continue to use the services, such as input/output and histogramming, provided by the framework. In the second line we start to develop a framework and a set of tools for long-term use in ATLAS. It is hoped that most of the effort invested in the first approach can be reused in the second.
It is felt that pursuing these two parallel lines of development minimises the risk of the changeover from procedural to object-oriented programming. It does not significantly disturb the necessary detector studies based on simulation, and it allows new FORTRAN code to follow closely the development of ideas on the ATLAS detector itself.
Current Object Oriented Projects
We have started several projects providing object-oriented software for practical applications in ATLAS. The purpose of these projects is to gain experience with the ASP and to provide hands-on experience with the new style of programming for the ATLAS physicists. The projects have a relatively short timescale of about one year such that they can be used to try out the ASP and provide examples for the ATLAS community to start more software developments in the new style.
Examples of such projects are:
The ATLAS computing model describes the global architecture of how we plan to use software, processing power, storage and networks to do the offline computing at LHC. The term offline computing encompasses detector calibration and alignment, event reconstruction, Monte Carlo generation, and physics analysis of both real and simulated data. The basic inputs to the model are:
The event reconstruction must handle the 100 Hz rate out of the event filter. We propose to reconstruct quasi-online, allowing for a short delay (~ a few hours) for the generation of the alignment and calibration constants from the data themselves. For the output of the reconstruction, we target an event size of ~100 kbyte, i.e. a reduction of a factor of 10 in data volume. The set of objects produced by the reconstruction is labelled Event Summary Data (ESD). It is anticipated that the reprocessing of events, to account for changes in the calibration, alignment, or reconstruction algorithms, can begin from either the ESD or the raw data. We propose to allocate sufficient resources to reprocess events a few times per year starting from the ESD, and once per year starting from the raw data.
There are five activities requiring access to the data which are of interest for the computing model:
For the physics studies, the computing model must allow efficient access to select and study relatively small event samples embedded in large, mostly background samples. For example, many physics channels consist of 10^7 to 10^8 events, i.e. 1-10% of the annual event sample. One can imagine several groups applying different selection criteria to define ‘analysis samples’ that are several orders of magnitude smaller. These are then studied extensively, resulting in a new set of selection criteria, which is used to repeat the exercise.
The information required for physics studies is generally just simple ‘physics objects’, i.e. electrons, muons, jets, tracks, etc., requiring only a small amount of data per event (relative to the initial 1 Mbyte), estimated to be less than ~10 kbyte. This set of objects is labelled Analysis Object Data (AOD). An important point for the computing model is that these ‘physics objects’ evolve with time as the calibration constants and reconstruction algorithms improve. This is particularly true during the early phase of the experiment where the first physics data will be used to understand the detector response.
Monte Carlo simulation has already been exploited extensively for the design of the detector and trigger and for the understanding of test-beam results. Studies will continue during the construction phase, and Monte Carlo generated events will also be used to test the offline event reconstruction and analysis software. As the experiment begins taking data, the Monte Carlo will need to be tuned and then used to calculate corrections for physics results. It is estimated that the required number of Monte Carlo generated events is approximately 10% of the number of real events, corresponding to ~5x10^4 SPECint95 of processing power. The low I/O bandwidth required for Monte Carlo generation allows it to be distributed across the collaboration. This effort will need to be organised collaboration-wide, and most likely only the ESD and/or AOD information will need to be made available for general use.
Technology is an important ingredient in the computing model, since the offline system which will eventually be designed relies on the capabilities of an underlying technological layer. The extrapolation of cost estimates for networks, storage and computing power is difficult over a time-scale of several years. However, for the requirements of the computing model presented in this chapter, it is reasonable to expect from recent trends that the cost of storage and computing power will have decreased sufficiently for the requirements of ATLAS to be satisfied. The largest uncertainty lies in the affordability of wide area network (WAN) bandwidth, in particular because of the deregulation of the European telecommunications industry and the recent rapid growth in Internet usage. The importance of WAN bandwidth becomes clear when one understands that, in order to analyse the large volumes of data produced at LHC, the processing power is required to be close to the data, and thus analysis facilities will be localised at CERN and possibly a few regional centres.
The key software elements, which directly concern the computing model, are the management and the storage of data. Commercial object database management systems (ODBMS) and mass storage systems (MSS) are currently under study by RD45 [6]; the preliminary results are promising. An ODBMS would serve as a front-end tool where one organises and manages the data from a logical perspective, i.e. one directly manipulates runs of events, individual events, tracks of an event, different samples of events, etc., and the ODBMS manages the physical location of the information, i.e. which part of an event is stored on which file. An MSS would serve as a large bandwidth back-end file server allowing hierarchical storage management of the data, which is transparent to the front-end user. The current view is that the combination of a commercial ODBMS and MSS will manage all of the data for both the event reconstruction and the physics analysis.
We expect that there will be ~500 ‘equivalent physicists’ performing some analysis task with ~150 users simultaneously accessing the data. We assume that from start-up all physicists will have adequate access to the data to perform analysis from their home institute. It will be important that there is a coherent view of the data independently of where the data physically resides, i.e. at CERN, at a regional centre or at one’s home institute.
The question of the rôle of regional centres in the ATLAS computing model has not yet been resolved. It is generally agreed to perform the event reconstruction at CERN. Also, the bulk of the raw data will remain at CERN. Thus, any reprocessing of a large fraction of the raw data will be done at CERN. The rôle of regional centres would be to concentrate on the areas of physics analysis, MC generation, and possibly some of the reprocessing which begins from the ESD information. The information provided in the following sections is intended to begin the preparation for a decision that will be taken by the end of 1998.
The participating institutes in ATLAS provide the basic support for their physicists. This includes desktop support and a certain amount of computing power and storage. The rôle of the institute within the ATLAS computing model will also need to be understood. The key point is to provide the resources so that one can perform the required analysis tasks from the home institute. This may include some data that is physically transferred. However, it should be stressed again that the majority of the data will have to remain at the large facilities, i.e. CERN and possibly regional centres.
A precise cost estimate for the computing hardware is impossible due to the uncertainties in both the requirements and the evolution of the technology and the market. A rough estimate puts the cost of the central installation of data storage and processing power at CERN for ATLAS at about 20 million Swiss francs, to be spent over several years. The cost is dominated by the CPU requirement of 2.5x10^5 SPECint95 (10^7 MIPS). Subsequently, about 9 million Swiss francs will be needed to expand the facilities to meet the increasing requirements and for maintenance.
To enable physics analysis in a world-wide collaboration, good networking is a necessity. Today it is impossible to predict the evolution of the cost and the performance of international networks at the time of LHC running. As these are important parameters for the precise planning of an analysis scenario, we have to follow the developments and adjust our planning accordingly. Already during the construction phase, we need international networks for document and code exchange as well as for communication such as video-conferencing in order to minimise travel. Currently, in some areas the networks are still insufficient even for code exchange.
Computing for ATLAS is an important and challenging task. We have started to define solutions for software development using a well-defined software process based on object-oriented design and implementation. First ideas for a computing model concentrate on event storage in an object-oriented database. Because of the long lifetime of the project, we have to build into our strategy the flexibility to cope with the rapid evolution in the field of computing and with the changing requirements of the experiment.
References
[1] The ATLAS Collaboration, ATLAS Computing Technical Proposal, CERN/LHCC/96-43
[2] S. Fisher, The ATLAS Software Process, Presentation at CHEP97
[3] K. Bos, RD41 or MOOSE Collaboration, Presentation at CHEP97
[4] The ATLAS Collaboration, ATLAS Technical Proposal, CERN/LHCC/94-43, LHCC/P2
[5] S. Giani, GEANT4: a world-wide collaboration to build Object Oriented HEP simulation software, Presentation at CHEP97
[6] J. Shiers, RD45 Status Report, Presentation at CHEP97