Last Update: May 29, 1999
1. To develop a formal framework for describing analysis architectures and to apply it to existing experiments. The framework can then be used to discuss architectural models for the LHC.
2. To provide quantitative information on the resources employed in the most challenging data analysis efforts undertaken in High Energy Physics (HEP) so far. This information can provide a baseline to which the magnitude of the LHC data analysis problem can be compared.
3. To identify successful approaches that can be expected to carry over to the future and to identify key problem areas which are already known to exist at this scale of effort.
In the second section of the report, we explain the scope of the survey and our reasons for choosing a particular subset of experiments. The third section presents a brief explanation of the methodology used to collect the data and provides the information required to interpret the data tables presented in section four. In section five, we provide the conclusions drawn from the data by the members of the working group.

Scope of the Survey:

For the reasons described below, we believe that we can learn only a limited amount from existing experiments. The survey is therefore not exhaustive; it concentrates on a representative selection of LEP, HERA, FNAL Collider, and large CERN and FNAL fixed-target experiments. We considered experiments that had, by past standards, very large volumes of data, high rates, high degrees of complexity, and large, geographically dispersed collaborations. We included in the list experiments that are running now at CERN, such as NA48 and NA45, which have large amounts of data, at least by current standards, and which are being used to try out new models of computing designed to be relevant to future experiments with even larger datasets.
We believe that this survey will produce only a limited amount of information that can be used to guide the development of computing architectures for LHC experiments. Many of these experiments are quite mature, and their initial computing architecture was established in the late 1980s or early 1990s, when computing technology, software, and networking were very different from what they are today. While all of these experiments have evolved to take partial advantage of new developments in computing, they have all been heavily influenced and constrained by their initial conditions. In particular, their detailed hardware solutions are unlikely to be applicable to the timeframe of the LHC startup. We can, however, learn much about the accuracy with which these groups were able to forecast their computing needs, their ability to reconfigure while operating, and the speed with which they could introduce new technologies and methods. One area on which we focused was the degree to which analysis was centralized or distributed, and the reasons why particular patterns emerged.
After some discussion, we decided to contact the experiments listed in the next section.

Methodology of the Survey:

The survey was carried out by first formulating a list of questions, then appointing a `summarizer' for each experiment to contact recognized leaders of computing in that experiment, who provided the answers to the questions. The answers can, of course, depend to a degree on the view of the respondent for the experiment, and often represent a snapshot of the computing model at a particular point in time. In addition, the terminology used by different computing models can vary enough that answers do not always translate well from one experiment to another. We would like to thank the contact people for their effort in answering our questions, which were non-trivial to research.
The list of experiments, summarizers and contact people is:
Experiment | Summarizer | Contact Person |
LEP experiments: | ||
ALEPH | Tim Smith | Marco Cattaneo |
DELPHI | Tim Smith | Richard Gokieli |
L3 | Tim Smith | Bob Clare |
OPAL | Tim Smith | Ann Williamson, Gordon Long, Steve O'Neal |
CERN Fixed Target | ||
NA48 | Tim Smith | Bernd Panzer-Steindel, Monica Pepe |
NA45 | Tim Smith | Bernd Panzer-Steindel, Andreas Pfeiffer |
FNAL experiments: | ||
CDF | Mike Diesburg | Eric Wicklund |
D0 | Mike Diesburg | Mike Diesburg |
KTeV | Vivian O'Dell | Vivian O'Dell |
FOCUS | Vivian O'Dell | Harry Cheung |
DESY experiments | ||
ZEUS | Ian McArthur | Dave Baily |
The questions that were posed were the following:
Survey of Experiments - Form
The results of these questionnaires were then distilled into the
tables presented in the next section.
Question marks in the table indicate that the requested information could not
be obtained -- usually because the experiment did not or could not track the
information. The areas where the question marks occur are therefore, in themselves,
significant.
The information in the tables and the
conclusions drawn in the final section of the report constitute the results
of this study.
Quantitative Results of the Survey:
Experiment | Event Recon. Time | Event Analysis Time | Event Size | Collaboration Size |
D0 | 45 Si95-secs | ? | 600 kB | 32 Institutes,400 members |
CDF | 35 Si95-secs | ? | 130 kB | 47 Institutes,460 members |
ZEUS | 33 Si95-secs | 1.0 Si95-secs | 100 kB | 52 Institutes, 450 members |
ALEPH | 2.6 Si95-secs | ? | 30 kB | 32 Institutes,493 members |
DELPHI | 5 Si95-secs | ? | 50 kB | 55 Institutes,534 members |
L3 | 7.7 Si95-secs | 2.0 Si95-secs (average) | 40 kB | 54 Institutes,619 members |
OPAL | 2.0 Si95-secs | 1.0 Si95-secs | 20 kB | 35 Institutes,418 members |
NA45 | 6.4 Si95-secs | 0.01 Si95-secs | 200 kB | 8 Institutes,45 members |
NA48 | 0.15 Si95-secs | ? | 15 kB | 16 Institutes,207 members |
KTeV | 0.0465 Si95-secs | 0.07 Si95-secs | 7 kB | 12 Institutes,80 members |
FOCUS | 4 Si95-secs | 0.25-1.0 Si95-secs | 4 kB | 20 Institutes,120 members |
Note that there are many gaps in the "Event Analysis Time" column of the table. This is because the analysis time varies widely with the type of analysis being done, and so this number (even as a range) is not well defined.
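As an illustration of how the per-event figures in the table translate into farm capacity, the following sketch computes the SpecInt95 capacity needed to keep up with data taking. The event-logging rate and catch-up factor below are illustrative assumptions, not survey data; only the 45 Si95-secs/event figure is taken from the D0 row above.

```python
# Sketch: farm capacity implied by a per-event reconstruction time.
# The logging rate (5 Hz) is an assumed, illustrative number.

def farm_capacity_si95(recon_si95_secs, event_rate_hz, catchup_factor=1.0):
    """SpecInt95 capacity needed to reconstruct events as fast as they arrive.

    recon_si95_secs -- CPU time per event in Si95-seconds (from the table)
    event_rate_hz   -- events logged per second (assumption)
    catchup_factor  -- >1 to reprocess faster than real time
    """
    return recon_si95_secs * event_rate_hz * catchup_factor

# D0's 45 Si95-secs/event at an assumed 5 Hz logging rate:
needed = farm_capacity_si95(45, 5)
print(needed)  # 225 Si95 just to keep pace, before any reprocessing margin
```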
Experiment | Central CPU for IO (SpecInt95) | Onsite CPU (SpecInt95) | Offsite CPU (SpecInt95) | Onsite/Offsite Reconstruction | Onsite/Offsite Data Analysis |
D0 | 20 | 275 | 25 | all onsite | 95% onsite/ 5% offsite |
CDF | 180 | 100 | 25 | all onsite | all onsite |
ZEUS | 150 | 170 | ? | all onsite | all onsite |
ALEPH | 130 | 170 | ? | mainly onsite | 50% onsite/50% offsite |
DELPHI | 250 | 265 | ? | all onsite | 50% onsite/50% offsite |
L3 | 125 | 500 | ? | all onsite | 90% onsite/10% offsite |
OPAL | 190 | 645 | ? | all onsite | ? |
NA45 | 587 | None | ? | all onsite | 0% onsite/100% offsite |
NA48 | 600 | 50 | ? | all onsite | ? |
KTeV | 280 | 195 | 460 | during analysis | 80% onsite/20% offsite |
FOCUS | 9 | 35 | 340 (?) | all onsite | 10% onsite/90% offsite |
Many experiments provided most of their own computing resources during preparation and initial data taking. This took many forms, including private IBM mainframes and VAX clusters for the exclusive use of individual institutes, and farms of emulators, VAXes, and RISC machines for dedicated services such as online filtering, quasi-online event reconstruction, and analysis farms for DST access. For some experiments, spare cycles on university-funded desktop machines were used for experiment-wide simulation via batch jobs or other job-distribution systems, and in groups for specific analysis topics, where specific machines hosted the data disks.
We chose to list machines as "onsite" if they met the following criteria:
Thus machines listed as "onsite" CPU in the table correspond to their use from the point of view of distributed computing architectures, even though they may be considered "offsite" by funding agencies and by COCOTIME (CERN's computer resource allocation/oversight committee).
Experiment | Total Raw Data (TB) | Total DST Data (TB) | DST Event Size | Onsite/Offsite Data Storage |
D0 | 30 | 2 | 50 kB | 100% onsite/ 1% offsite |
CDF | 8.3 | 2 | 32 kB | 100% onsite / <5% offsite |
ZEUS | 3.5/year | 5.3/year | 150 kB | 100% onsite |
ALEPH | 3.5 | 0.94 | 1.5 kB | 100% onsite/100% offsite |
DELPHI | 3.5 | 0.55 | 31 kB | ? |
L3 | 3.4 | 0.6 | 7.1 kB | 100% onsite |
OPAL | 2.7 | 1.0 | 7.3 kB | 100% onsite |
NA45 | 2.0 | 0.4 | 100 kB | 100% onsite |
NA48 | 100 | 1.5 | 1.7 kB | 100% onsite |
KTeV | 45 | N/A | 7 kB | 100% onsite/25% offsite |
FOCUS | 25 | 18 TB (L1/L2 DSTs) | 1-3 kB | 100% onsite/?% offsite |
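The raw-to-DST volume reduction can be read directly off the storage table; a small script, with values copied from the rows above (restricted to experiments quoting totals for both raw and DST data), makes the factors explicit:

```python
# Raw and DST totals in TB, copied from the storage table above.
raw_dst_tb = {
    "D0":   (30,  2),
    "CDF":  (8.3, 2),
    "L3":   (3.4, 0.6),
    "NA48": (100, 1.5),
}

# Reduction factor: how many times larger the raw data is than the DSTs.
reduction = {exp: raw / dst for exp, (raw, dst) in raw_dst_tb.items()}

for exp, factor in sorted(reduction.items(), key=lambda kv: -kv[1]):
    print(f"{exp}: raw data is {factor:.0f}x the DST volume")
```

The spread (from roughly 4x for CDF to roughly 67x for NA48) shows how strongly the raw-to-DST ratio depends on the physics program and the DST design.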
Experiment | MC Event Generation Time | MC Event Size | Onsite/Offsite generation | Total MC Data Set Size |
D0 | 125 Si95-secs | 1 MB | 50% onsite/ 50% offsite | ? |
CDF | 25 Si95-secs | 100 kB | 95%/5% | ? |
ZEUS | 36 Si95-secs | 45 kB | 40%/60% | 9 TB |
ALEPH | 7.8 Si95-secs | 100 kB | 50%/50% | 2 TB |
DELPHI | 100-500 Si95-secs | 500 kB | 20%/80% | 5 TB |
L3 | 300 Si95-secs | 350 kB | 45%/55% | ? |
OPAL | 200 Si95-secs | 100-300 kB | 85%/15% | 12 TB |
NA45 | 900 Si95-secs | 50 kB | 40%/60% | 2 TB |
NA48 | 0.4-0.8 Si95-secs | 30 kB | 0%/100% (50%-70% Lyon) | 0.15 TB |
KTeV | 0.2 Si95-secs | 2 kB | ? | ? |
FOCUS | ? Si95-secs | 0.5-3 kB | ? | ? |
Experiment | Onsite Central CPU for I/O | Onsite Distributed CPU | Offsite |
D0 | 10 MB/s | 30 MB/s | 3.2 MB/s |
CDF | 5 MB/s | all data access done centrally | 4.7 MB/s |
Experiment | LAN network aggregate rate | WAN network aggregate rate |
D0 | 3 x 100 Mb/s FDDI backbones, 300 Mb/s aggregate | 2 T3 lines, 90 Mb/s |
CDF | 100 Mb/s FDDI backbone | 2 T3 lines, 90 Mb/s |
ZEUS | 800 Mb/s HIPPI(analysis),100 Mb/s Ethernet (reconstruction),10 Mb/s Ethernet (desktop) | 34 Mb/s (DESY - European network) |
ALEPH | 800 Mb/s HIPPI. Switched 100 Mb/s ethernet in Computer Center. 10 Mb/s ethernet on desktop | ? |
DELPHI | 800 Mb/s HIPPI + Gb Ethernet + switched 100 Mb/s ethernet in Computer Center. 10 Mb/s ethernet on desktop | ? |
L3 | 800 Mb/s HIPPI. Switched 100 Mb/s ethernet in Computer Center. FDDI and 10 Mb/s ethernet on desktop | ? |
OPAL | 800 Mb/s HIPPI. Switched 100 Mb/s ethernet in Computer Center. 10 Mb/s ethernet on desktop | ? |
NA45 | 800 Mb/s HIPPI + Gb Ethernet + switched 100 Mb/s ethernet in Computer Center. 10 Mb/s ethernet on desktop | ? |
NA48 | 800 Mb/s HIPPI + Gb Ethernet + switched 100 Mb/s ethernet in Computer Center. 10 Mb/s ethernet on desktop | ? |
While it is understood that the magnitude of the computing required by the LHC experiments far exceeds anything achieved so far in HEP, it is useful to quantify this by comparing the projected needs of ATLAS and CMS to the surveyed experiments. The table below compares the surveyed experiments with an estimate of the size of one LHC experiment. For illustration, current numbers for BaBar and estimates for CDF/D0 Run 2 are also shown. Note that the BaBar numbers listed are at turn-on. (We are indebted to Charlie Young for the information about BaBar.) The LHC numbers come from estimates for CMS offline computing during the first year of operation, currently projected to be 2006 (see "Rough Sizing Estimates for a Computing Facility for a Large LHC Experiment" by Les Robertson).
Experiment | Onsite CPU (Si95) | onsite disk (TB) | onsite tape (TB) | LAN capacity | Data Import/Export | Box Count |
LHC | 520,000 | 540 | 3000 | 46 GB/s | 10 TB/day (sustained) | ~1400 |
CDF - 2 | 12,000 | 20 | 800 | 1 Gb/s | 18 MB/s | ~250 |
D0 - 2 | 7,000 | 20 | 600 | 300 Mb/s | 10 MB/s | ~250 |
Babar | ~6000 | 8 | ~300 | Mix of Gb/sec, 100 Mb/sec Ethernet to servers | ~400 GB/day | ~400 |
D0 | 295 | 1.5 | 65 | 300 Mb/s | ? | 180 |
CDF | 280 | 2 | 100 | 100 Mb/s | ~100 GB/day | ? |
ALEPH | 300 | 1.8 | 30 | 1 Gb/s | ? | 70 |
DELPHI | 515 | 1.2 | 60 | 1 Gb/s | ? | 80 |
L3 | 625 | 2.0 | 40 | 1 Gb/s | ? | 160 |
OPAL | 835 | 1.6 | 22 | 1 Gb/s | ? | 220 |
NA45 | 587 | 1.3 | 2 | 1 Gb/s | 5 GB/day | 30 |
NA48 | 650 | 4.5 | 140 | 1 Gb/s | 5 GB/day | 50 |
KTeV | 475 | 1 | 50 | 100 Mb/s FDDI | 150 GB/day | 6 |
It is clear from this table that the scale of the computing problem in the surveyed experiments is so much smaller than that of the LHC experiments that they cannot possibly define a framework for the LHC. The point of this survey, however, is to make the best possible use of the experience of the past in formulating what is essentially a new model of computing for the LHC era.
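To make the gap concrete, a short script using the onsite numbers from the table above computes the ratio between the LHC estimate and the largest value found among the surveyed (pre-Run-2) experiments in each category:

```python
# LHC first-year estimate, from the comparison table above.
lhc = {"cpu_si95": 520_000, "disk_tb": 540, "tape_tb": 3000}

# Largest value among the surveyed experiments in the same table
# (excluding the Run-2 and BaBar projections).
largest_surveyed = {
    "cpu_si95": 835,   # OPAL
    "disk_tb":  4.5,   # NA48
    "tape_tb":  140,   # NA48
}

ratios = {k: lhc[k] / largest_surveyed[k] for k in lhc}
for k, r in ratios.items():
    print(f"{k}: LHC estimate is ~{r:.0f}x the largest surveyed experiment")
```

The factors range from roughly 20x in tape to well over 100x in disk and CPU, which is the quantitative content of the statement above.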
In discussions with experiments about how to improve remote data processing, the following issues were the most commonly listed.
Probably the issue raised most often during the survey is that there are not enough people at any individual remote institution to ensure that reconstruction and analysis code is kept up to date. This is especially true when the software is under development and in the first year of running when software is most likely to change often. Although there are tools that have been, and continue to be, developed for both code management and code distribution, remote institutions still find it difficult to keep software current.
Professional system management and 24 x 7 coverage are essential for a remote institution to run production successfully. In addition, at least one full-time software engineer familiar with the software, and physicists to test each local release, are needed to keep the site up to date.
In our survey we found that many remote institutions are running large Monte Carlo productions, while not being able to efficiently run data production jobs. This is mainly due to the inability of the remote institutions to access and store large quantities of raw data.
For remote sites, raw data access is typically through exchange of physical media. To date, wide area network speeds have prevented most experiments from collecting data in real time or semi-real time over a network. All of the experiments report archiving the full set of raw data onsite for safekeeping, and raw data is copied before being shipped offsite.
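A back-of-the-envelope calculation shows why media exchange won out over the network. The sketch below estimates the time to move a full raw dataset over the WAN links quoted in the network table, assuming (optimistically for wide-area TCP of this era) that the link can be fully saturated:

```python
# Sketch: transfer time for a raw dataset over a WAN link.
# Assumes the link is fully saturated, which is optimistic.

def transfer_days(dataset_tb, link_mbit_s, utilization=1.0):
    """Days to move dataset_tb terabytes over a link_mbit_s Mb/s link."""
    bits = dataset_tb * 1e12 * 8                      # TB -> bits
    seconds = bits / (link_mbit_s * 1e6 * utilization)
    return seconds / 86_400

# D0's 30 TB of raw data over its 2 x T3 (90 Mb/s) WAN connection:
days = transfer_days(30, 90)
print(f"{days:.0f} days")  # roughly a month, even at full link saturation
```

At realistic utilizations the figure grows further, which is why every surveyed experiment copied raw data to physical media instead.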
Although remote centers often have lots of CPU, most of the remote sites we polled did not have sufficient resources to handle large datasets. Clearly for a remote site to fully participate in raw data reconstruction and analysis, it must be able to provide users easy access to the data.
For a remote site to be able to be part of the event reconstruction project means it must have access to geometry, calibration, hardware and other constants typically stored in databases. Keeping the remote site up to date means that automatic tools for updating remote databases must exist. For LHC experiments, the size of the databases could be quite large, which will put heavy demands on networking. This, of course, depends on how often the databases need to be updated.
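The bandwidth implication of database updates can be sketched with illustrative numbers. Every figure below (update size, number of sites, update window) is an assumption for the sake of the example; the survey did not collect database sizes:

```python
# Sketch: aggregate WAN rate needed to push a calibration/geometry
# database update to all remote sites within a time window.
# All input numbers are illustrative assumptions.

def db_update_mbit_s(update_gb, sites, window_hours):
    """Average aggregate rate (Mb/s) to send update_gb GB to each of
    `sites` remote sites within `window_hours` hours."""
    bits = update_gb * 1e9 * 8 * sites
    return bits / (window_hours * 3600 * 1e6)

# e.g. a 1 GB update pushed to 20 remote sites overnight (8 hours):
rate = db_update_mbit_s(1, 20, 8)
print(f"{rate:.1f} Mb/s aggregate")  # 5.6 Mb/s aggregate
```

Even this modest example is a noticeable fraction of a 1999-era T3 line; frequent updates of larger databases would dominate the WAN budget.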
A stumbling block for many of the remote institutions we talked to was that the officially distributed software for an experiment did not run on their hardware platform, either because of operating-system incompatibility or because of a fundamental difference in hardware architecture. For an experiment to successfully include remote sites in its reconstruction/analysis production, a commitment must be made to do so, and effort must be supplied by the central institution, the remote site, or both, either to homogenize the computing and analysis hardware or to support the different platforms and configurations.
Why do these problems exist? None of them is insurmountable. The main issue is building a distributed-computing mindset into the computing model from the outset. The Fermilab fixed-target experiment FOCUS did much of its computing at remote sites; although it had access to fast networks, the raw data tapes were in fact shipped conventionally, and the network was not used for raw data reconstruction. Looking at the table comparing onsite to offsite CPU for FOCUS, it is clear that distributed reconstruction and analysis were necessary from the beginning. While making distributed computing integral to the experiment's model may not guarantee the effective use of offsite resources for data processing, not developing such a model early on will guarantee ineffective, or even non-existent, use of offsite computing.
There are some clear requirements that can come out of these observations for both the central site and the remote sites. The remote sites must have:
The central site must:
In researching this report, it became clear that for many experiments, computing models and predictions had been written years before the experiments began taking data (see Lessons for MONARC from previous CERN computing planning exercises). In reviewing these documents, several lessons emerged:
As we attempted to collect data for this report, we encountered several problems. Large-scale production jobs, such as event reconstruction, are generally carefully monitored, and important information such as the reconstruction time and the total amount of data in and out is well documented and easy to obtain. However, there are whole areas of the data analysis where accurate information was very hard to obtain, was completely lacking, or was not consistent among sources. This is especially true of the final physics analysis and of distributed computing. While this is understandable, it does make it difficult to profit from the experience of the past. Over the next year, many new experiments will be starting up whose computing is much closer to the LHC scale than that of the experiments reviewed in this document. It would be worthwhile to develop a plan to learn as much as possible from their experiences. Having a liaison from the LHC experiments to the offline group of each major new experiment, perhaps through MONARC, would be a start. This would provide a continuous flow of authoritative information to the LHC experiments while it can still be useful to the planning process.
In this document we have attempted to survey existing solutions to technical problems in computing in High Energy Physics. By definition all of these solutions are now, or certainly will be by LHC turn-on, obsolete. New approaches in defining and structuring computing architectures for the LHC era will use new technologies and hence generate new problems. Much research and development is and will be needed over the next few years in order to define a working architecture for doing distributed computing for LHC scale high energy physics experiments.