M. Aderholz (MPI), K. Amako (KEK), E. Arderiu Ribera (CERN), E. Auge (L.A.L/Orsay), G. Bagliesi (Pisa/INFN), L. Barone (Roma1/INFN), G. Battistoni (Milano/INFN), J. Bunn (Caltech/CERN), J. Butler (FNAL), M. Campanella (Milano/INFN), P. Capiluppi (Bologna/INFN), M. Dameri (Genova/INFN), D. Diacono (Bari/INFN), A. di Mattia (Roma1/INFN), U. Gasparini (Padova/INFN), F. Gagliardi (CERN), I. Gaines (FNAL), P. Galvez (Caltech), C. Grandi (Bologna/INFN), F. Harris (Oxford/CERN), K. Holtman (CERN), V. Karimäki (Helsinki), J. Klem (Helsinki), M. Leltchouk (Columbia), D. Linglin (IN2P3/Lyon Computing Centre), P. Lubrano (Perugia/INFN), L. Luminari (Roma1/INFN), M. Michelotto (Padova/INFN), I. McArthur (Oxford), H. Newman (Caltech), S.W. O'Neale (Birmingham), B. Osculati (Genova/INFN), M. Pepe (Perugia/INFN), L. Perini (Milano/INFN), J. Pinfold (Alberta), R. Pordes (FNAL), S. Rolli (Tufts), T. Sasaki (KEK), L. Servoli (Perugia/INFN), R.D. Schaffer (Orsay), M. Sgaravatto (Padova/INFN), T. Schalk (BaBar), J. Shiers (CERN), L. Silvestris (Bari/INFN), G.P. Siroli (Bologna/INFN), K. Sliwa (Tufts), C. Stanescu (Roma3/INFN), T. Smith (CERN), C. von Praun (CERN), E. Valente (INFN), I. Willers (CERN), R. Wilkinson (Caltech), D.O. Williams (CERN)
Executive Summary
The MONARC project will attempt to determine which classes of computing models are feasible for the LHC experiments. The boundary conditions for the models will be the network capacity and data handling resources likely to be available at the start of and during LHC running.
The main deliverable from the project will be a set of example "baseline" models. The project will also help to define regional centre architectures and functionality, the physics analysis process for the LHC experiments, and guidelines for retaining feasibility over the course of running. The results will be made available in time for the LHC Computing Progress Reports, and could be refined for use in the Experiments' Computing Technical Design Reports by 2002.
The approach taken in the Project is to develop and execute discrete event simulations of the various candidate distributed computing systems. The granularity of the simulations will be adjusted according to the detail required from the results. The models will be iteratively tuned in the light of experience. Simulation of the diverse tasks that are part of the spectrum of computing in HEP will be undertaken, and a simulation and modelling tool kit will be developed to enable studies of the impact of network and data handling limitations on the models.
Chapter 1: Introduction
The LHC experiments have envisaged computing models involving hundreds of physicists doing analysis on petabytes of data at institutions around the world. CMS and ATLAS are also considering the use of regional centres, each of which would complement the functionality of the CERN centre. The use of these centres would be well matched to the worldwide-distributed structure of the collaboration. They are intended to facilitate access to the data, with more efficient and cost-effective data delivery to the groups in each world region, using national networks of greater capacity than may be available on intercontinental links.
The LHC models encompass a complex set of wide-area, regional and local-area networks, a heterogeneous set of compute- and data-servers, and a yet-to-be determined set of priorities for group-oriented and individual demands for remote data. Distributed systems of this scope and complexity do not yet exist, although systems of a similar size to those foreseen for the LHC experiments are predicted to come into operation by around 2005 in large corporations.
In order to proceed with the planning
and design of the LHC computing models, and to correctly dimension the
capacity of the networks and the size and characteristics of regional centres,
it is essential to conduct a systematic study of these distributed systems.
The MONARC project therefore intends to simulate and study network-distributed
computing architectures, data access and data management systems that are
major components of the computing model, and the ways in which the components interact
across networks. MONARC will bring together the efforts and relevant expertise
from the LHC experiments and R&D projects, as well as from current
or imminent experiments already engaged in building distributed systems
for computing, data access, simulation and analysis.
The primary goals of this project are:
Distributed databases are a crucial aspect of these studies. The RD45 project has developed considerable expertise in the field of Object Oriented Database Management Systems (ODBMS). MONARC will benefit from this experience and cooperate with RD45 in the specific areas where the work of the two projects overlaps. MONARC will investigate questions which are largely complementary to RD45, such as network performance and prioritisation of traffic, for a variety of applications that must coexist and share the network resources.
Chapter 2: Objectives
A set of common modelling and simulation tools will be developed in MONARC. These tools will be integrated in an environment which will enable the LHC experiments to realistically evaluate and optimise their physics analysis procedures and their computing models, basing them on distributed data and computing architectures. Tools to realistically estimate the network bandwidth required in a given computing model will be developed. The parameters that are necessary and sufficient to characterise the computing model and its performance will be identified. The methods and tools to measure the model's performance, and to detect bottlenecks, will be designed and developed, and also tested in prototypes. This work will be done in as close co-operation as possible with the present LHC R&D projects and with current or imminent experiments.
The final goal is to determine a set of feasible models, and to provide a set of guidelines which the experiments could use to build their respective computing models.
The main objectives leading to this goal are:
Chapter 3: Workplan
3.1 Scope
MONARC aims to study analysis models and architectures suitable for LHC experiments, in order to contribute to their computing models in time for the Computing Progress Reports (CPR) that are due around the end of 1999.
The project involves collaboration not only from the LHC experiments, but also from other HEP experiments preparing to run in the near future, such as BaBar and COMPASS. These experiments will develop expertise in many of the computing-related fields of interest for the LHC. MONARC will also interact with other teams, e.g. RD45, the ICFA network group, the HPSS teams and the GIOD project.
Although the manpower for the project is mostly provided by the collaborating institutes, the project also requires a significant manpower contribution from CERN:
The working methods to be employed in MONARC are largely determined by features of the project's structure:
3.4 Working Groups and Steering Group
The Working Groups are:
The steering group is composed of the chairpersons of the working groups, together with the spokesperson, the project leader and representatives of regional centres (see Chapter 9).
3.5 Phases of the Project
This PEP proposes a project workplan divided into two phases:
The level of detail that can be reached in planning the
work is obviously different for Phase 1 and for Phase 2, as the actual
planning for Phase 2 will largely depend on the outcome of Phase 1.
In the following the workplan for Phase 1 is presented
in some detail, while for Phase 2 only a summary view is given.
The workplan for Phase 1 is organised into three sub-phases.
3.6 Scope Limitations
3.7 Assumptions and Pre-Requisites
Chapter 4: Task Definitions
4.1 Overview
The major tasks are matched to the Working Groups listed in Section 3.4.
For each task a summary is given of the manpower resources available in the collaborating institutes, and of the current request.
4.2 Task 1: Simulation and Modelling
Realistic simulation and modelling of the distributed computing systems are the most important tasks in the first phase of the project. The goal is to be able to reliably model the behaviour of the system of site facilities and networks, given the assumed physical structure of the computer systems and the usage patterns, including the manner in which hundreds of physicists will access LHC data. The hardware and networking costs, and the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, are the main metrics that will be used to evaluate the models. The goal is to narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.
The planned research can be divided into the following subtasks, where the activities are expected to follow an iterative approach, with a complete cycle of the design-development-modelling-validation steps for every model studied, and every simulation tool used.
It is essential to decide on the appropriate level of modelling complexity. This work should start immediately, as it must run in parallel with the task of developing the modelling tools. We anticipate that the models will include sufficient detail of data transfers from disk to CPU, of hardware configurations, and of complex network connections whose available bandwidth varies with geographical topology, with time, and with the level of quality-of-service implemented. It is also essential to develop, as well as possible, models for data access patterns and analysis patterns. Here, the importance of input from physicists in the experiments participating in MONARC cannot be overstated. Experience from current, or near-future, large-statistics HEP experiments should also be examined.
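To make the intended granularity concrete, the following is a minimal sketch (not MONARC's actual toolkit) of a discrete event simulation of one such component: analysis jobs at a regional centre queue FIFO for a shared wide-area link, then process their data locally. All parameter values (link bandwidth, processing cost, job count, sample sizes) are illustrative assumptions, not MONARC results.

    import heapq
    import random

    LINK_MBPS = 34.0         # assumed share of a WAN link, in Mbit/s
    CPU_SEC_PER_GB = 600.0   # assumed processing cost per GB of data
    N_JOBS = 20              # concurrent physicists' jobs

    def simulate(seed=1):
        random.seed(seed)
        # Each job: (submission time in seconds, sample size in GB).
        jobs = [(random.uniform(0, 3600), random.uniform(1, 10))
                for _ in range(N_JOBS)]
        events = [(t, i, "transfer") for i, (t, _) in enumerate(jobs)]
        heapq.heapify(events)
        link_free = 0.0      # the shared link serves transfers FIFO
        turnaround = []
        while events:
            t, i, phase = heapq.heappop(events)
            size = jobs[i][1]
            if phase == "transfer":
                start = max(t, link_free)            # queue behind others
                link_free = start + size * 8000.0 / LINK_MBPS
                heapq.heappush(events, (link_free, i, "compute"))
            else:   # assume a dedicated CPU per job, for simplicity
                turnaround.append(t + size * CPU_SEC_PER_GB - jobs[i][0])
        return max(turnaround)

    print("worst-case turnaround: %.0f s" % simulate())

Even a toy model of this kind exposes the queueing behaviour that flat bandwidth-times-volume estimates miss; the real tools must add, among other things, tape access, prioritisation and time-varying link quality.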
Significant development work might be required to extend the existing SoDA class libraries so that they can describe the models to be simulated. The goal is to have an advanced set of simulation tools by Spring 1999.
This subtask will involve detailed simulations performed with the adopted set of tools, using agreed sets of input parameter values, in order to explore meaningfully the multidimensional parameter space of variables which describe the computer system models. The goal is to deliver the first reliable results to the experiments by the Summer of 1999.
This important step is anticipated to take place in parallel with the first implementation of the models.
We expect preparations for this subtask to begin almost immediately.
Coding significant patterns of data handling in existing experiments, and simulating them, will be a first validation step.
Designing the test-bed measurements with which one can verify the results of model simulations, at first simple and then more complex, is of paramount importance. There may be overlap here with the design of the test-bed measurements used to improve our knowledge of a number of parameters needed as input to the models of the computing systems.
Provide and organise a repository to contain relevant information
We require at least 100 person-months for this task.
The manpower currently available for this simulation task amounts to
75 person-months (Bologna, Caltech, Milano, Perugia, Tufts).
In order to provide the remaining manpower required for this
activity, we are requesting a major contribution of 18 person-months
from CERN, and 40 person-months from the US.
This task addresses the issues of hardware and network
architecture of the distributed computing systems to be modelled.
In general this task will provide information on architectures used at major
HEP computing centres for previous and current generations of experiments,
on the plans of major centres for future experiments, and on technology
and cost trends for the major components
(CPU, disk, mass storage and networks)
of potential distributed computing systems.
This information will be fed into the model simulations so that models can be
based on reality (both technologically and sociologically), so that models
can be optimised based on expected costs, and so that the dependence
of the models on costs and technology projections can be clearly seen.
The information will also be used to suggest avenues of study
for the testbed task.
Descriptions of computing architectures used by current
experiments should be prepared, concentrating on LEP, HERA, the FNAL Collider,
the large fixed target experiments at CERN and FNAL. Architectural
descriptions should include:
Similar descriptions as in subtask 4.3.1 should be provided
for the plans for meeting the needs of major upcoming experiments:
Deliverables:
Potential sites for LHC regional centres should be identified
and surveyed as to plans for hardware deployment and personnel support
expected to be available.
It should be recognised that there are
likely to be several different styles of regional centres,
from comprehensive centres offering
large amounts of CPU, disk and mass storage for all stages
of the analysis process,
to centres specialising in certain components of the full analysis
stream.
Different amounts of support and different topologies should also be
considered and our models must take these differences into account.
Surveys should include:
Realistic models require up-to-date estimates of hardware
cost and performance.
This subtask will require both market tracking and
measurements of hardware components as they are acquired
by participating institutes in the MONARC collaboration, in the categories
of:
Networks are a critical component of all distributed computing
models, and availability is influenced both by technological
and external forces.
The most accurate projections for network performance
are a crucial input to any distributed architecture models.
Measurements of current performance
and projections of future availability and cost should
be acquired in conjunction with other groups, in the categories of:
The manpower currently committed to this task amounts
to 39 person-months (Caltech, CERN, FNAL, Milano, Oxford, Perugia, Roma, Tufts).
The additional manpower required is 8 person-months (from CERN).
4.4 Task 3: Analysis Process Design
This task aims at the definition of
a few different schemes of "the way of doing analysis"
among the many possibilities
afforded by new (and perhaps unforeseen) computing technologies.
The task can be addressed with two
different and complementary approaches:
Both approaches will be pursued and
combined suitably in this project.
The task will select some different
scenarios for the analysis process, taking into account:
In summary, this Task will identify possible analysis processes, defining where the raw data reside, how the reconstructed objects will be produced and stored, how and where the data selected as relevant for the analyses will be accessed, and finally where and how the physicists will carry out their analyses.
All of the above will have to be coded and
parameterised in the simulation.
A very simple
example can better illustrate the nature of the task.
Given a PByte/year of data, an analysis
group goes through the tag database (produced quasi-online) selecting 1%
of reconstructed objects created by the "offline reconstruction".
They need to go through the full 1% sample once per month,
producing an analysis
object sample (reduced in number of events and in size) which contains
relevant analysis information, refined by experience and results from previous
steps.
Each member of the group needs access to the selected analysis objects,
at a time which fits their personal work schedule,
to extract results and personal sub-samples.
The timings, turnaround, consistency, residence, storage, CPU needs and efficiency of such an example will be addressed by this task, which will try to describe schematically some different (and affordable) possibilities.
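A back-of-the-envelope check of this example, taking the 1% selection relative to the full PByte as an upper bound; the 24-hour pass time is an illustrative assumption, not a figure from the example:

    RAW_TB_PER_YEAR = 1000.0    # "a PByte/year of data"
    SELECTED_FRACTION = 0.01    # tag-based selection of 1%
    PASS_HOURS = 24.0           # assumed duration of one monthly pass

    sample_tb = RAW_TB_PER_YEAR * SELECTED_FRACTION
    mb_per_s = sample_tb * 1e6 / (PASS_HOURS * 3600.0)
    print("selected sample: %.0f TB" % sample_tb)              # -> 10 TB
    print("aggregate I/O for a %g h pass: %.0f MB/s"
          % (PASS_HOURS, mb_per_s))
    # -> roughly 116 MB/s sustained: more than a single disk or server
    #    of the era can deliver, so striping across servers or a longer
    #    turnaround is implied.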
For the analysis processes retained for
further study and simulation,
either initially or later during the project,
clear diagrams will be produced to show the sequential, parallel and
iterative steps.
The task will be performed through the following work packages or subtasks, which embrace some or all of the phases described in Chapter 3 (Workplan).
The aim of the task is to extract information
from the current experiments' analysis processes.
For example, the number of concurrent
users doing analysis in running experiments will give some constraint (scaled
to LHC) on the range of parameters to be simulated.
Other information such as the number of analysing groups and their dispersion may be
investigated and recorded.
Deliverables:
The aim of this task is to identify
a range of schemes of users' needs while performing data analysis.
Some features are
the response time of a database query,
the ability (and willingness) to make queries locally or remotely (regionally
or centrally) and the tools and methods the user will adopt.
Deliverables:
The aim of the
task is to identify a set of analysis processes
that will meet the requirements of LHC data reconstruction and physics
analysis. The models will be expressed in clear diagrams that specify
the input and output data volumes and the frequency of data access,
along with the locations of the data handling capability and computing
power which are intended to meet the needs. Candidate analysis processes
will be selected taking into account the assumed data handling capacity
and network throughput assumed at each site, and matching these
resources with a specification of where, when and how often each
physicist and each analysis group accesses its own samples of selected
data. Individuals' and group-oriented activities will have to be
prioritised, as part of the overall process specification.
This subtask has a first step,
which is to Eliminate Obviously
Unfeasible Models.
The aim of this first step is to exclude, early in the project, analysis models that lead to technically unfeasible or clearly unaffordable resource requirements, even when projected to the year 2005. An example would be an analysis process that requires local desktop storage of hundreds of terabytes, or a network bandwidth of more than a gigabit/sec dedicated to every analysing physicist. Such models are to be dropped without wasting time on detailed simulations.
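A sketch of what this first step could look like; the two cut-off values follow the examples just given (the exact numbers are assumptions chosen for illustration), and the model records and field names are hypothetical:

    # Coarse feasibility cuts applied before any detailed simulation.
    MAX_DESKTOP_TB = 100.0        # "hundreds of terabytes" locally is out
    MAX_GBPS_PER_PHYSICIST = 1.0  # "more than a gigabit/sec" each is out

    def obviously_unfeasible(model):
        """Return a reason string if the model fails a coarse cut, else None."""
        if model["desktop_storage_tb"] >= MAX_DESKTOP_TB:
            return "local desktop storage of %(desktop_storage_tb)s TB" % model
        if model["wan_gbps_per_physicist"] > MAX_GBPS_PER_PHYSICIST:
            return "%(wan_gbps_per_physicist)s Gbit/s per physicist" % model
        return None

    candidates = [
        {"name": "all-data-on-desktops", "desktop_storage_tb": 300,
         "wan_gbps_per_physicist": 0.01},
        {"name": "regional-centres", "desktop_storage_tb": 0.05,
         "wan_gbps_per_physicist": 0.002},
    ]
    for m in candidates:
        reason = obviously_unfeasible(m)
        print(m["name"], "->", "dropped (%s)" % reason if reason else "kept")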
Deliverables:
The aim of the task is to study,
and then define different schemes of
use of the LHC collaborations' central and distributed resources for
computing and data handling. The schemes will include
priority-assignments for each of the classes of activity that make up
the data analysis, in an attempt to ensure that all components of the
data analysis are completed as needed, in an acceptably short time.
Policies and relative priorities for the use of regional centres by
users from other regions, for the use of network bandwidth, and for
access to remote data-handling facilities will be expressed
parametrically, and methods that relate the throughput or the "rate of
doing work" to the priority-profiles will be developed in the course of
this subtask.
The relative prioritisation
schemes will include components that are
driven by immediate physics needs, as well as the ongoing throughput
requirements for organised high-priority activities (such as the
first-round reconstruction of the raw data). For example, when an
analysis topic is granted priority over others on physics grounds, the
coordinating analysis group must have the methods and be granted the
authority to set priorities on the collaboration's resources. The
response of the system to the new high-priority task should be to
complete the task within a specified (short) time, without undue
disruption of the other analysis activities.
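As an illustration of such a parametric priority-profile, the sketch below splits a centre's capacity among activity classes in proportion to assigned weights, so that granting a new high-priority task a large weight shrinks, but does not suspend, the other activities. The class names, weights and the 100-unit capacity are illustrative assumptions:

    def allocate(capacity, weights):
        """Split `capacity` among activity classes in proportion to weight."""
        total = float(sum(weights.values()))
        return {name: capacity * w / total for name, w in weights.items()}

    weights = {
        "first-round reconstruction": 5.0,   # organised, high priority
        "group analysis": 3.0,
        "individual analysis": 1.0,
    }
    base = allocate(100.0, weights)          # e.g. 100 units of CPU

    weights["urgent physics topic"] = 8.0    # priority granted on physics grounds
    boosted = allocate(100.0, weights)
    for name in boosted:
        print("%-28s %5.1f -> %5.1f" % (name, base.get(name, 0.0), boosted[name]))

Proportional sharing is only one candidate policy; the point is that the priority-profile is a small set of parameters that the simulation can vary.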
Regional centre access, use of the network and database distribution are the key parameters for this task. The coordinating analysis group must also be able to understand and predict the scheduling needed to achieve the results in time.
Deliverables:
This task is complicated by the fact that realistic political constraints, such as policies for the use of computing resources by remote users (from other countries or world-regions), have to be taken into account, as well as the technical parameters that determine the ideal performance of the system.
The task will attempt to produce
a relatively small set of parameters
that contain most of the information required to determine if a given
Model is feasible, and to evaluate its effectiveness in satisfying the
needs of LHC data analysis, relative to the computing and manpower
resources required.
Deliverables:
The manpower currently committed by
collaborating institutes for this task amounts to 30 person-months
(Birmingham, Bologna, Caltech,
FNAL, Milano, Tufts).
4.5 Task 4: Testbeds and Measurement of Critical Parameters
In order to analyse the computing models and to evaluate the impact of data distribution schemes, testbeds have to be implemented to measure key parameters.
The task can be subdivided into the following subtasks; their milestones and schedules are given in the workplan.
This subtask will implement several "use-cases" based on different configurations
for the distributed computing model,
and data access patterns related to different functionalities
such as reconstruction and analysis.
This subtask includes:
This subtask will be carried out in collaboration with the other working groups and RD45.
The manpower currently committed by the
collaborating institutes for testbed measurements amounts to 60 person-months
(Bari, Birmingham, Bologna, Caltech, CERN, FNAL, Genova, Milano, Padova, Perugia, Pisa, Roma, Tufts).
Chapter 5: Deliverables
MONARC will deliver:
Chapter 6: Resources
More than 50 physicists and computing
experts have joined MONARC, and committed a significant fraction of their time.
Many others have expressed interest and are expected to join the project in the near future.
Most MONARC members are also involved in other activities within Atlas, CMS, LHCb etc.
The total manpower provided by the present participants is estimated to be
200 person-months.
Manpower totalling 36 person-months
is requested from CERN,
for specific tasks for which CERN can provide the most efficient solution.
For the operation of the MONARC project,
computing equipment,
travel funds and probably software licenses are needed.
The biggest investment is related to equipment, especially for the testbeds.
Most of the needed equipment is available in the institutes,
partly being acquired specifically for MONARC and partly being reused from
or shared with other activities.
Funding at the level of 140 kCHF is requested from CERN for the duration of the MONARC project. This includes 40 kCHF to cover the cost of commercial discrete event simulation software, in case the studies in the start-up phase (Section 3.5) determine that such a purchase is needed.
Chapter 7: Schedule
In this chapter the main milestones
of the MONARC project are summarised. More details on the work flow are
given in the Workplan.
MONARC Main Milestones
Chapter 8: Risk Identification
The risks facing the MONARC project come primarily from the uncertain evolution of technology and prices between now and the time when the LHC is running. In this respect the most uncertain areas are:
As the role of the professional
"Modellers" is of key importance for
the project, the milestone time scale relies on an early,
effective contribution from this highly skilled staff.
The schedule is aggressive out of necessity, bearing in mind the coming CPRs in late 1999. Even if not all the objectives are fully met on this timescale, the work is necessary for planning LHC computing and will continue in some form beyond the end of 1999. Such results as exist will be used for the CPRs, and will continue to be refined beyond the publication of the CPRs.
Chapter 9: Management and Organisational Responsibility
The responsibilities which are already accepted are:
4.2.8 Milestones and Schedules
4.3 Task 2: Site and Network Architecture
4.3.1 Subtask: Survey of existing computing architectures
Deliverables:
Schedule:
4.3.2 Subtask: Survey of planned computing architectures
Near term experiments: BaBar, BELLE, CDF, D0, HERA-B,
RHIC
Later experiments: LHC
Schedule:
4.3.3 Subtask: Survey of potential regional centres and proposed architectures
Deliverables:
Schedule:
4.3.4 Subtask: Technology evaluation and cost tracking
Deliverables:
Schedule:
4.3.5 Subtask: Network performance and cost tracking
Deliverables:
Schedule:
4.3.6: Resources
An effort is made to establish a diagram of the analysis process in which all the aspects, together with the relations between them (e.g. calibration, partial re-processing, short runs for setting up programs and analyses), are taken into account in an abstract way.
Only the aspects likely to absorb the most resources are taken into consideration, and these are iterated rapidly on paper to optimise the resource consumption directly.
In addition, this task will address the definition of a set of values able to characterise the analysis processes in conjunction with the different computing model architectures, since the correlation between the two is very strong.
4.4.1 Subtask: Analyse contemporary production and analysis procedures
Milestones:
Duration/Schedule:
4.4.2 Subtask: Identify user requirements
Milestones:
Duration/Schedule:
4.4.3 Subtask: Identify feasible models to be simulated
Milestones:
The task will extend over the whole duration of the project.
Duration/Schedule:
4.4.4 Subtask: Elaborate policies, priorities and schedules for different models
Milestones:
The task will end with the project, since the coordination and management of resources is an item intrinsically embedded in the analysis process.
Duration/Schedule:
4.4.5 Subtask: Identify key parameters to evaluate simulated models
The aim of the task
is to establish those parameters which have the
greatest effect in determining whether the overall Model is feasible,
both in terms of the resource requirements and time-to-completion of the
components of the data analysis. Obvious examples are the network
bandwidth, computing power, data handling capacity and the times
required to return data-samples of varying sizes at each site. Less
obvious examples are the means of responding to peak demands, the
efficiency as a function of the load for various system components, the
character and flexibility of the prioritisation mechanisms, and
trade-off procedures for maximising the (priority-weighted) throughput
of the system when all demands cannot be met simultaneously.
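As a minimal illustration of how a few of these parameters might combine into one evaluation figure, the sketch below estimates the time to return a data sample to a physicist, with a crude load-dependent queueing penalty. The formula and every value in it are illustrative assumptions:

    def turnaround_hours(sample_gb, wan_mbps, cpu_sec_per_gb, cpu_count,
                         load=0.5):
        transfer = sample_gb * 8000.0 / wan_mbps          # seconds on the WAN
        compute = sample_gb * cpu_sec_per_gb / cpu_count  # seconds of CPU
        # Crude penalty: queueing inflates service time as load -> 1.
        return (transfer + compute) / (1.0 - load) / 3600.0

    print("%.1f h" % turnaround_hours(sample_gb=100, wan_mbps=155,
                                      cpu_sec_per_gb=600, cpu_count=50))
    # -> about 3.5 h for a 100 GB sample under these assumed conditions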
Milestones:
Duration/Schedule:
4.4.6 Resources
4.5.1 Subtask: Define scope and configuration of testbeds
This subtask requires preliminary
information to be collected in collaboration with the "Analysis Process
Design" and the "Site and Network Architecture" Working Groups:
This subtask will define which testbeds will be implemented to evaluate:
4.5.2 Subtask: Implementation and operation of testbeds
Testbeds will involve many institutes running the appropriate tests. There will be defined procedures for managing the global setup and for facilitating testbed execution at remote sites. This global setup includes the configuration and management of hardware and software in all the involved regional centres.
4.5.3 Subtask: Verify key simulation parameters
4.5.4 Resources
The extra manpower required from CERN is 6 person-months, for setting up, maintaining and operating a central testbed facility, and for defining configurations which can be largely replicated at the outside institutes.
The Caltech, Tufts, Milano, and Bologna groups are actively searching for professionals or young researchers to be recruited, who would devote most of their time to the technical work of the project.
Manpower (person-months)  Description
18    Design and implementation of the models in the simulation, plus development and support of the simulation tools
6     Setup, maintenance and operation of the CERN-based testbed system and the related software tools
8     Analysis and design of the CERN-site architectures
4     Analysis of networks
36    Total manpower requested from CERN
200   Availability of MONARC participants
Both in Italy and in the US,
more than 200 GBytes of disk space will be devoted to MONARC Objectivity/DB storage,
with read/write speeds expected to reach 100 MBytes/sec.
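For scale, a full sequential pass over such a store, assuming the quoted 100 MBytes/sec is actually sustained, takes about half an hour:

    STORE_GB = 200.0
    RATE_MB_PER_S = 100.0       # assumed sustained, not peak
    seconds = STORE_GB * 1000.0 / RATE_MB_PER_S
    print("full pass: %.0f s (~%.0f min)" % (seconds, seconds / 60.0))  # ~33 min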
All the groups taking responsibility in the
testbed task have one or more workstations, or PC farms
that can be largely devoted to the MONARC work.
Caltech and Milano have access to Exemplar systems (at CACR and at CILEA
respectively) which can be used for short term tests.
Funding (kCHF)  Description
80    Capital cost of the CERN-based testbed system and of development systems for running the modelling and simulation
20    Travel money
40    Potential cost of commercial discrete event simulation software
140   Total funding requested from CERN for the duration of phases 1 and 2 of the MONARC project
500   For comparison, estimated value of the dedicated and shared computing facilities outside of CERN
Choose modelling tools
Hardware setup for testbed systems
Validate the chosen tools with a Model taken from an existing experiment
Complete first technical run of simulations of a well-defined LHC-type Model
Start measurements on testbed systems
Choose the range of models to be simulated
Completion of the first cycle of simulation of possible Models for LHC experiments; first classification of the Models and first evaluation criteria
Progress Report with detailed workplan for Phase 2
Completion of the coding and validating phase for second-round Models
Completion of the second cycle of simulations of refined Models for LHC experiments
Completion of the project and delivery of the deliverables
Another source of concern is the very ambitious statements made in the CTPs about the quality of the analysis environment for every physicist: full transparency and minimal turnaround time for any query are probably unrealistic goals (albeit useful as asymptotic aims).
Realistic user requirements, also in view of a meaningful cost/benefit ratio, will have to be negotiated with the physicists of the experiments.
The responsibilities for the management
of the MONARC project are shared between the Spokesperson, the Project
Leader, and the Steering group.
The members of the Steering Group will include:
the Spokesperson, the Project Leader, the Chairs of the Working Groups,
plus representatives of major computer centres.
The project reference person in technical matters: ensures the technical direction of the project and the coordination of resources.
The project reference person for the operation of the project: takes care of the relations with the LCB and CERN in general, and is responsible for keeping the project on track.
Steering Group

Function                          Person Accepting Responsibility
Spokesperson                      Harvey Newman
Project Leader                    Laura Perini
Simulation and Modelling WG       Krzysztof Sliwa
Site and Network Architecture WG  -
Analysis Process Design WG        Paolo Capiluppi
Testbed WG                        -
Computer Centres                  -
Chapter 10: References

ATLAS worldwide computing group paper: http://atlasinfo.cern.ch/Atlas/GROUPS/WWCOMP/pap_june30.html
MONARC (RD55) proposal: http://www.mi.infn.it/~cmp/rd55/rd55-1-98.html
ATLAS software TDR pages: http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/TDR/html/Welcome.html
CMS Computing Technical Proposal (CTP): http://cmsdoc.cern.ch/ftp/CMG/CTP/index.html
RD45 reports: http://wwwinfo.cern.ch/pl/cernlib/rd45/reports.htm
The RD45 web site: http://wwwinfo.cern.ch/asd/rd45/index.html
SoDA: http://wwwinfo.cern.ch/pdp/pc/soda/
Objectivity/DB: http://www.objectivity.com/
CERN HSM project: http://wwwinfo.cern.ch/pdp/vm/guide/hsm_project.html
HPSS official page: http://www.sdsc.edu/hpss/hpss.html
GIOD project: http://pcbunn.cithep.caltech.edu/
ICFA network group report (July 1998): http://nicewww.cern.ch/~davidw/icfa/July98Report.html and their requirements report: http://l3www.cern.ch/~newman/icfareq98.html
Home page: http://nicewww.cern.ch/~les/pasta/run2/welcome.html
Home page: http://wwwcs.cern.ch/