LCB 98-xx

Models of Networked Analysis at Regional Centres for LHC Experiments

(MONARC)

PROJECT EXECUTION PLAN

Prepared by

M. Aderholz (MPI), K. Amako (KEK), E. Arderiu Ribera (CERN), E. Auge (L.A.L/Orsay), G. Bagliesi (Pisa/INFN), L. Barone (Roma1/INFN), G. Battistoni (Milano/INFN), J. Bunn (Caltech/CERN), J. Butler (FNAL), M. Campanella (Milano/INFN), P. Capiluppi (Bologna/INFN), M. Dameri (Genova/INFN), D. Diacono (Bari/INFN), A. di Mattia (Roma1/INFN), U. Gasparini (Padova/INFN), F. Gagliardi (CERN), I. Gaines (FNAL), P. Galvez (Caltech), C. Grandi (Bologna/INFN), F. Harris (Oxford/CERN), K. Holtman (CERN), V. Karimäki (Helsinki), J. Klem (Helsinki), M. Leltchouk (Columbia), D. Linglin (IN2P3/Lyon Computing Centre), P. Lubrano (Perugia/INFN), L. Luminari (Roma1/INFN), M. Michelotto (Padova/INFN), I. McArthur (Oxford), H. Newman (Caltech), S.W. O'Neale (Birmingham), B. Osculati (Genova/INFN), M. Pepe (Perugia/INFN), L. Perini (Milano/INFN), J. Pinfold (Alberta), R. Pordes (FNAL), S. Rolli (Tufts), T. Sasaki (KEK), L. Servoli (Perugia/INFN), R.D. Schaffer (Orsay), M. Sgaravatto (Padova/INFN), T. Schalk (BaBar), J. Shiers (CERN), L. Silvestris (Bari/INFN), G.P. Siroli (Bologna/INFN), K. Sliwa (Tufts), C. Stanescu (Roma3/INFN), T. Smith (CERN), C. von Praun (CERN), E. Valente (INFN), I. Willers (CERN), R. Wilkinson (Caltech), D.O. Williams (CERN)

 
20 September 1998

Executive Summary

The MONARC project will attempt to determine which classes of computing models are feasible for the LHC experiments. The boundary conditions for the models will be the network capacity and data handling resources likely to be available at the start of and during LHC running.

The main deliverable from the project will be a set of example "baseline" models. The project will also help to define regional centre architectures and functionality, the physics analysis process for the LHC experiments, and guidelines for retaining feasibility over the course of running. The results will be made available in time for the LHC Computing Progress Reports, and could be refined for use in the Experiments' Computing Technical Design Reports by 2002.

The approach taken in the Project is to develop and execute discrete event simulations of the various candidate distributed computing systems. The granularity of the simulations will be adjusted according to the detail required from the results. The models will be iteratively tuned in the light of experience. Simulation of the diverse tasks that are part of the spectrum of computing in HEP will be undertaken, and a simulation and modelling tool kit will be developed to enable studies of the impact of network and data handling limitations on the models.

Chapter 1: Introduction

The LHC experiments have envisaged computing models involving hundreds of physicists doing analysis on petabytes of data at institutions around the world. ATLAS and CMS are also considering the use of regional centres, each of which would complement the functionality of the CERN centre. The use of these centres would be well matched to the worldwide-distributed structure of the collaborations. They are intended to facilitate access to the data, with more efficient and cost-effective data delivery to the groups in each world region, using national networks of greater capacity than may be available on intercontinental links.

The LHC models encompass a complex set of wide-area, regional and local-area networks, a heterogeneous set of compute- and data-servers, and a yet-to-be determined set of priorities for group-oriented and individual demands for remote data. Distributed systems of this scope and complexity do not yet exist, although systems of a similar size to those foreseen for the LHC experiments are predicted to come into operation by around 2005 in large corporations.

In order to proceed with the planning and design of the LHC computing models, and to correctly dimension the capacity of the networks and the size and characteristics of regional centres, it is essential to conduct a systematic study of these distributed systems. The MONARC project therefore intends to simulate and study network-distributed computing architectures, data access and data management systems that are major components of the computing model, and the ways in which the components interact across networks. MONARC will bring together the efforts and relevant expertise from the LHC experiments and R&D projects, as well as from current or imminent experiments already engaged in building distributed systems for computing, data access, simulation and analysis.
 

The primary goals of this project are:

As a result of this study, MONARC will deliver a set of tools for simulating candidate computing models of the experiments, and a set of common guidelines to allow the experiments to formulate their final models.

Distributed databases are a crucial aspect of these studies. The RD45 project has developed considerable expertise in the field of Object Oriented Database Management Systems (ODBMS). MONARC will benefit from this experience and cooperate with RD45 in the specific areas where the work of the two projects overlaps. MONARC will investigate questions which are largely complementary to RD45, such as network performance and prioritisation of traffic, for a variety of applications that must coexist and share the network resources.

 

Chapter 2: Objectives

A set of common modelling and simulation tools will be developed in MONARC. These tools will be integrated in an environment which will enable the LHC experiments to realistically evaluate and optimise their physics analysis procedures and their computing models, basing them on distributed data and computing architectures. Tools to realistically estimate the network bandwidth required in a given computing model will be developed. The parameters that are necessary and sufficient to characterise the computing model and its performance will be identified. The methods and tools to measure the model's performance, and to detect bottlenecks, will be designed and developed, and also tested in prototypes. This work will be done with as much co-operation as possible with the present LHC R&D projects and with current or imminent experiments.

The final goal is to determine a set of feasible models, and to provide a set of guidelines which the experiments could use to build their respective computing models.

The main objectives leading to this goal are:

 

Chapter 3: Workplan

3.1 Scope

MONARC aims to study analysis models and architectures suitable for LHC experiments, in order to contribute to their computing models in time for the Computing Progress Reports (CPR) that are due around the end of 1999.

The project involves collaboration not only from the LHC experiments, but also from other HEP experiments preparing to run in the near future, such as BaBar and COMPASS. These experiments are going to develop expertise in many of the computing-related fields of interest for LHC. MONARC will also interact with other teams, e.g. RD45, the ICFA network group, the HPSS teams and the GIOD project.

Although the manpower for the project is mostly provided by the collaborating institutes, the project also requires a significant manpower contribution from CERN:

3.2 Interaction with other experiments and projects

3.3 Working methods

The working methods to be employed in MONARC are largely determined by features of the project's structure:

MONARC working methods will embody the following principles:

The principles and requirements stated above lead naturally to a collaboration structure based on Working Groups and on a Steering Group. The Steering Group assures the integration of the various tasks that are pursued in the Working Groups.

3.4 Working Groups and Steering Group

The Working Groups are:

The interplay between the different working groups and their activities, and the need to coordinate the decomposition/integration steps in the various iteration cycles, require that a detailed schedule be established to synchronise the relevant tasks. This schedule is therefore explained in section 3.5, Phases of the Project; chapter 4 details the tasks and subtasks and a summary of the milestones is given in chapter 7: Schedule.

The steering group is composed of the chairpersons of the working groups together with the spokesperson, the project leader and representatives of regional centres (see chapter 9).

3.5 Phases of the Project

This PEP proposes a project workplan divided into two phases:

The Phase 3 mentioned in the PAP is now more clearly envisaged as a further R&D project, aimed at prototype designs and test implementations of the computing models. It will be the natural continuation of this project; it is, however, distinct from it in terms of deliverables. It should start at the completion of Phase 2 and contribute to the CTDR.

The level of detail that can be reached in planning the work is obviously different for Phase 1 and for Phase 2, as the actual planning for Phase 2 will largely depend on the outcome of Phase 1.
In the following the workplan for Phase 1 is presented in some detail, while for Phase 2 only a summary view is given.

The workplan for Phase 1 is organised into three sub-phases:

The detailed workplan for Phase 2 will be provided in a progress report of this project which will be delivered at the conclusion of Phase 1, in mid-1999.
In this phase, two or three cycles of simulation/modelling will be performed. Here a summary is given of the items that will be addressed:

At the end of Phase 2, the deliverables of this project will be:

3.6 Scope Limitations

 

3.7 Assumptions and Pre-Requisites

Chapter 4: Task Definitions

4.1 Overview

The major tasks are matched to the Working Groups listed in Section 3.4.

For each task a summary is given of the manpower resources available in the collaborating institutes, and of the current request.

4.2 Task 1: Simulation and Modelling

Realistic simulation and modelling of the distributed computing systems are the most important tasks in the first phase of the project. The goal is to be able to reliably model the behaviour of the system of site facilities and networks, given the assumed physical structure of the computer systems and the usage patterns, including the manner in which hundreds of physicists will access LHC data. The hardware and networking costs, and the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, are the main metrics that will be used to evaluate the models. The goal is to narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.

The planned research can be divided into the following subtasks, where the activities are expected to follow an iterative approach, with a complete cycle of the design-development-modelling-validation steps for every model studied, and every simulation tool used.

4.2.1 Subtask: Survey existing modelling tools

Ideally one would like to use a small number (preferably 2-3) of tools, to be able to cross-check the results. A number of packages exist, for example SoDA, MODNET, SES, COMNET, Ptolemy, Simple++ and PARASOL, some of which have already been examined by various groups within MONARC. This work should be pursued vigorously, with a recommendation by the end of 1998. SoDA, a simulation environment developed by Christoph von Praun at CERN IT, is at present the leading candidate to become one of the chosen packages.

4.2.2 Subtask: Use the tools for coding the models which MONARC will explore

It is essential to decide on the appropriate level of modelling complexity. This work should start immediately, as it must run in parallel with the development of the modelling tools. We anticipate that the models will include sufficient detail of data transfers from disk to CPU, hardware configurations, and complex network connections whose available bandwidth varies with geographical topology, time of day and the level of quality of service implemented. It is also essential to develop, as well as possible, models for data access patterns and analysis patterns. Here, the importance of input from physicists in the experiments participating in MONARC cannot be overestimated. Experience from current, or near-future, large-statistics HEP experiments should also be examined.
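
To make the intended level of modelling concrete, the following is a minimal, illustrative sketch (in Python) of the kind of queueing behaviour such models must capture: analysis jobs at a single centre competing for a shared network link and a CPU farm. The tools chosen in subtask 4.2.1 will be far richer; all names and numbers below are assumptions made purely for illustration, not MONARC design choices.

    import heapq

    def simulate(n_jobs, sample_gb, link_mbps, n_cpus, cpu_hours_per_job):
        """Mean turnaround time (hours) for n_jobs all submitted at t=0.
        Each job first pulls its data sample over a shared FIFO network link,
        then runs on the first free CPU of the centre's farm."""
        transfer_h = (sample_gb * 8.0e3) / link_mbps / 3600.0  # GB -> Mbit -> hours
        link_free = 0.0                     # time at which the link becomes free
        cpu_free = [0.0] * n_cpus           # min-heap of CPU availability times
        heapq.heapify(cpu_free)
        finish = []
        for _ in range(n_jobs):
            start_xfer = link_free          # jobs queue for the link in FIFO order
            link_free = start_xfer + transfer_h
            cpu_start = max(link_free, heapq.heappop(cpu_free))
            done = cpu_start + cpu_hours_per_job
            heapq.heappush(cpu_free, done)
            finish.append(done)
        return sum(finish) / n_jobs

    if __name__ == "__main__":
        # Illustrative numbers only: 20 jobs, 10 GB samples, 34 Mbit/s link, 10 CPUs.
        print("mean turnaround: %.1f h" % simulate(20, 10.0, 34.0, 10, 2.0))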

4.2.3 Subtask: Develop modelling packages or a combination of existing tools

Significant development work might be required to extend the existing SoDA class libraries in order to be able to describe the models which will be simulated. The goal is to have an advanced set of simulation tools by Spring of 1999.

4.2.4 Subtask: Run simulations of the coded models

This subtask will involve detailed simulations performed with the adopted set of tools on agreed sets of input parameter values, in order to explore meaningfully the multidimensional parameter space of variables which describe the computer system models. The goal is to deliver the first reliable results to the experiments by the summer of 1999.
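
The sweep over agreed parameter sets could be organised along the following lines, shown here with a crude analytic stand-in for the real simulation. The parameter names, ranges and the 24-hour turnaround target are illustrative assumptions only.

    import itertools

    def turnaround_h(link_mbps, n_cpus, n_jobs, sample_gb=10.0, cpu_h=2.0):
        """Stand-in for the real simulation: a crude analytic estimate of the
        mean turnaround time (hours), used here only to make the sweep runnable."""
        transfer_h = n_jobs * sample_gb * 8.0e3 / link_mbps / 3600.0  # serialised link
        compute_h = n_jobs * cpu_h / n_cpus                           # ideal farm scaling
        return max(transfer_h, compute_h) + cpu_h

    # The agreed parameter sets would be scanned along these lines; the ranges
    # below are invented for illustration, not MONARC baseline values.
    grid = {"link_mbps": [10, 34, 155, 622],
            "n_cpus":    [10, 50, 200],
            "n_jobs":    [20, 100]}

    results = []
    for link, cpus, jobs in itertools.product(*grid.values()):
        results.append((link, cpus, jobs, turnaround_h(link, cpus, jobs)))

    # Flag the configurations that meet an (assumed) 24-hour turnaround target.
    for link, cpus, jobs, t in sorted(results, key=lambda r: r[3]):
        flag = "ok" if t <= 24.0 else "--"
        print("%s  link=%4d Mbit/s  cpus=%3d  jobs=%3d  turnaround=%7.1f h"
              % (flag, link, cpus, jobs, t))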

4.2.5 Subtask: Validate simulation results on testbeds

This important step is anticipated to take place in parallel with the first implementation of the models. We expect preparations for this subtask to begin almost immediately. Coding significant patterns of data handling in existing experiments and simulating them will be a first validating step.
Designing the test-bed measurements with which one can verify the results of model simulations, at first simple and then more complex, is of paramount importance. There may be overlap here with the design of the test-bed measurements intended to improve our knowledge of a number of parameters which are needed as input to models of the computing systems.

4.2.6 Subtask: Establish a repository for the MONARC project

Provide and organise a repository to contain relevant information.

4.2.7 Resources

We require at least 100 person-months for this task. The manpower currently available for this simulation task amounts to 75 person-months (Bologna, Caltech, Milano, Perugia, Tufts). In order to provide the remaining manpower required for this activity, we are requesting a major contribution of 18 person-months from CERN, and 40 person-months from the US.

 

4.2.8 Milestones and Schedules

4.3 Task 2: Site and Network Architecture

This task addresses the issues of hardware and network architecture of the distributed computing systems to be modelled. In general this task will provide information on architectures used at major HEP computing centres for previous and current generations of experiments, on the plans of major centres for future experiments, and on technology and cost trends for the major components (CPU, disk, mass storage and networks) of potential distributed computing systems. This information will be fed into the model simulations so that models can be based on reality (both technologically and sociologically), so that models can be optimised based on expected costs, and so that the dependence of the models on costs and technology projections can be clearly seen. The information will also be used to suggest avenues of study for the testbed task.

4.3.1 Subtask: Survey of existing computing architectures

Descriptions of computing architectures used by current experiments should be prepared, concentrating on LEP, HERA, the FNAL Collider, the large fixed target experiments at CERN and FNAL.  Architectural descriptions should include:

Deliverables:
Schedule:

4.3.2 Subtask: Survey of planned computing architectures

Similar descriptions as in subtask 4.3.1 should be provided for the plans for meeting the needs of major upcoming experiments:
  • Near-term experiments: BaBar, BELLE, CDF, D0, HERA-B, RHIC
  • Later experiments: LHC

Deliverables:


Schedule:

4.3.3 Subtask: Survey of potential regional centres and proposed architectures

Potential sites for LHC regional centres should be identified and surveyed as to plans for hardware deployment and personnel support expected to be available.  It should be recognised that there are likely to be several different styles of regional centres, from comprehensive centres offering large amounts of CPU, disk and mass storage for all stages of the analysis process, to centres specialising in certain components of the full analysis stream.  Different amounts of support and different topologies should also be considered and our models must take these differences into account. Surveys should include:


Deliverables:
Schedule:

4.3.4 Subtask: Technology evaluation and cost tracking

Realistic models require up-to-date estimates of hardware cost and performance.  This subtask will require both market tracking and measurements of hardware components as they are acquired by participating institutes in the MONARC collaboration, in the categories of:


Deliverables:
Schedule:  

4.3.5 Subtask: Network performance and cost tracking

Networks are a critical component of all distributed computing models, and their availability is influenced by both technological and external forces. Accurate projections of network performance are a crucial input to any distributed architecture model. Measurements of current performance and projections of future availability and cost should be acquired in conjunction with other groups, in the categories of:


Deliverables:
Schedule:

4.3.6 Resources

The manpower currently committed to this task amounts to 39 person-months (Caltech, CERN, FNAL, Milano, Oxford, Perugia, Roma, Tufts). The additional manpower required is 8 person-months (from CERN).

4.4 Task 3: Analysis Process Design

This task aims at the definition of a few different schemes of "the way of doing analysis" among the many possibilities afforded by new (and perhaps unforeseen) computing technologies. The task can be addressed with two different and complementary approaches:

Both approaches will be pursued and combined suitably in this project.

The task will select some different scenarios for the analysis process, taking into account:

In addition, this task will address the definition of a set of values able to characterise the analysis processes in conjunction with the different computing model architectures, since the correlation is very strong.

In summary, this Task will identify possible analysis processes, defining where the raw data reside, how the reconstructed objects will be produced and stored, how and where the data relevant for the analyses will be selected and accessed, and finally where and how the physicists will carry out their analyses.

All of the above will have to be coded and parameterised in the simulation.

A very simple example can better illustrate the nature of the task. Given a PByte/year of data, an analysis group goes through the tag database (produced quasi-online), selecting 1% of the reconstructed objects created by the "offline reconstruction". The group needs to go through the full 1% sample once per month, producing an analysis object sample (reduced in number of events and in size) which contains the relevant analysis information, refined by experience and results from previous steps. Each member of the group needs access to the selected analysis objects, at a time which fits their personal work schedule, to extract results and personal sub-samples.
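
The data rates implied by such an example can be roughed out as follows. The assumption that the 1% selection corresponds to 1% of the full PByte (i.e. 10 TB) is made here purely for illustration, since the example leaves the object sizes open.

    # Back-of-the-envelope rates for the example above (illustrative assumptions).
    TB = 1.0e12                        # bytes
    sample_bytes = 0.01 * 1000 * TB    # assume the 1% selection is 1% of 1 PByte
    month_s = 30 * 86400
    day_s = 86400

    sustained = sample_bytes / month_s   # pass spread evenly over the month
    one_day = sample_bytes / day_s       # monthly pass compressed into one day

    print("sample size        : %.0f TB" % (sample_bytes / TB))
    print("sustained rate     : %.1f MB/s averaged over a month" % (sustained / 1e6))
    print("one-day pass rate  : %.0f MB/s" % (one_day / 1e6))

Under these assumptions, a 10 TB sample read once per month corresponds to roughly 4 MB/s sustained, or of order 100 MB/s if the monthly pass is compressed into a single day.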

The timings, turnaround, consistency, data residency, storage and CPU needs, and efficiency of such an example will be addressed by this task, which will attempt to describe schematically some different (and affordable) possibilities.

For the analysis processes retained for further study and simulation, either initially or later during the project, clear diagrams will be produced to show the sequential, parallel and iterative steps.

The task will be performed through the following work packages or subtasks, which embrace some or all of the phases described in Chapter 3: Workplan.

 

4.4.1 Subtask: Analyse contemporary production and analysis procedures

The aim of the task is to extract information from the analysis processes of current experiments. For example, the number of concurrent users doing analysis in running experiments will give some constraint (scaled to LHC) on the range of parameters to be simulated. Other information, such as the number of analysis groups and their dispersion, may also be investigated and recorded.

Deliverables:


Milestones:
Duration/Schedule:
  • A working relationship is needed with other experiments, particularly those at hadron colliders. The duration/schedule is connected to the milestones previously stated, and to the evolution of the processes used by our colleagues at running experiments that may lead us to modify some of the simulations in this project.
 

4.4.2 Subtask: Identify user requirements

The aim of this task is to identify a range of schemes of users' needs while performing data analysis. Some features are the response time of a database query, the ability (and willingness) to make queries locally or remotely (regionally or centrally) and the tools and methods the user will adopt.

Deliverables:

  • A set of constrained parameters defining the steps followed by the users, individually and in a variety of group-oriented activities, while doing analysis. The parameters also will define the load presented to the computing and data handling facilities, and the networks, as the users carry out their analysis.

Milestones:
  • First milestone during Phase 1B for the initial simulation.
  • Second milestone during Phase 1C to refine some of the parameter ranges.
  • Third milestone during Phase 2 to settle the final range of affordable user requests.

Duration/Schedule:
  • This task ends at the end of the Project, and its schedule is set by the previously-stated milestones. It may be revisited during Phase 3, in a follow-on R&D project.

 

4.4.3 Subtask: Identify feasible models to be simulated

The aim of the task is to identify a set of analysis processes that will meet the requirements of LHC data reconstruction and physics analysis. The models will be expressed in clear diagrams that specify the input and output data volumes and the frequency of data access, along with the locations of the data handling capability and computing power which are intended to meet the needs. Candidate analysis processes will be selected taking into account the data handling capacity and network throughput assumed at each site, and matching these resources with a specification of where, when and how often each physicist and each analysis group accesses its own samples of selected data. Individual and group-oriented activities will have to be prioritised as part of the overall process specification.

This subtask has a first step, which is to Eliminate Obviously Unfeasible Models. The aim of this first step is to exclude, early in the project, analysis models that lead to technically unfeasible or clearly unaffordable resource requirements, even when projected to the year 2005. An example would be an analysis process that requires local desktop storage of hundreds of terabytes, or a network bandwidth of more than a gigabit/sec dedicated to every analysing physicist. Such models are to be dropped without wasting time on detailed simulations.
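
A coarse screening of this kind could be expressed as simple cuts applied before any detailed simulation. The two thresholds below mirror the examples just given; treating them as hard limits, and the candidate models themselves, are illustrative assumptions rather than MONARC decisions.

    MAX_DESKTOP_STORAGE_TB = 100.0    # "hundreds of terabytes" on a desktop
    MAX_PER_PHYSICIST_MBPS = 1000.0   # a gigabit/sec dedicated to each physicist

    def obviously_unfeasible(model):
        """Return the reasons (if any) why a model can be dropped without simulation."""
        reasons = []
        if model["desktop_storage_tb"] > MAX_DESKTOP_STORAGE_TB:
            reasons.append("needs %.0f TB of local desktop storage"
                           % model["desktop_storage_tb"])
        if model["per_physicist_mbps"] > MAX_PER_PHYSICIST_MBPS:
            reasons.append("needs %.0f Mbit/s dedicated per physicist"
                           % model["per_physicist_mbps"])
        return reasons

    # Hypothetical candidate models, for illustration only.
    candidates = {"all-data-on-desktop": {"desktop_storage_tb": 300.0,
                                          "per_physicist_mbps": 10.0},
                  "regional-centre-tier": {"desktop_storage_tb": 0.5,
                                           "per_physicist_mbps": 2.0}}

    for name, model in candidates.items():
        problems = obviously_unfeasible(model)
        verdict = "dropped: " + "; ".join(problems) if problems else "kept for simulation"
        print("%-22s %s" % (name, verdict))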

Deliverables:

  • A set of feasible analysis processes for LHC treatment of data, suitable for simulation according to the computing model architectures defined by the present project.

Milestones: The task will extend over the whole duration of the project.
  • A First milestone is foreseen for Phase 1B, when an analysis process for the first simulation is needed.
  • A Second milestone is due for Phase 1C, when a set of analysis processes is needed to understand the possible spread of models.
  • A Third milestone is due during Phase 2, in order to refine the analysis process possibilities taking into account the constraints given by the simulations performed and by technology and budget limitations.
  • Finally, other milestones can be foreseen just before the end of the Project or, possibly, for an extension to Phase 3.

Duration/Schedule:
  • See Workplan.
 

4.4.4 Subtask: Elaborate policies, priorities and schedules for different models

The aim of the task is to study, and then define different schemes of use of the LHC collaborations' central and distributed resources for computing and data handling. The schemes will include priority-assignments for each of the classes of activity that make up the data analysis, in an attempt to ensure that all components of the data analysis are completed as needed, in an acceptably short time. Policies and relative priorities for the use of regional centres by users from other regions, for the use of network bandwidth, and for access to remote data-handling facilities will be expressed parametrically, and methods that relate the throughput or the "rate of doing work" to the priority-profiles will be developed in the course of this subtask.

The relative prioritisation schemes will include components that are driven by immediate physics needs, as well as the ongoing throughput requirements for organised high-priority activities (such as the first-round reconstruction of the raw data). Regional centre access, use of the network and database distribution are the key parameters for this subtask. For example, when an analysis topic is granted priority over others on physics grounds, the coordinating analysis group must have the methods, and be granted the authority, to set priorities on the collaboration's resources and to understand and predict the scheduling needed to achieve the results in time. The response of the system to the new high-priority task should be to complete the task within a specified (short) time, without undue disruption of the other analysis activities.
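
One simple way of expressing such priority-profiles parametrically is as relative weights that fix each activity class's share of a resource. The classes, weights and capacity below are invented for illustration; which parametric forms are actually adequate is precisely what this subtask will study.

    def allocate(capacity, weights):
        """Split a resource capacity among activity classes by relative weight."""
        total = float(sum(weights.values()))
        return {name: capacity * w / total for name, w in weights.items()}

    weights = {"first-pass reconstruction": 5.0,
               "organised group analysis":  3.0,
               "individual analysis":       2.0}
    print(allocate(1000.0, weights))        # e.g. 1000 CPU-equivalents at a centre

    weights["urgent physics topic"] = 4.0   # a topic granted priority on physics grounds
    print(allocate(1000.0, weights))        # other classes shrink but are not stopped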

Deliverables:

  • A set of rules for access to the collaboration's computing resources, together with methods to implement them. The rules have to be "tuned" to the different analysis resources and to the different computing model architectures.

Milestones: The task will end with the project, since the coordination and management of resources is intrinsically embedded in the analysis process.
  • A First milestone can be foreseen for Phases 1B-1C, when different schemes of processes have to be simulated on distributed resources.
  • A Second milestone is certainly due for Phase 2 when priorities and schedules have to be taken into account to get reasonable simulation results.
  • Some more refined milestones will be necessary for the evolution to Phase 3.

Duration/Schedule:
  • According to previous milestones and to the evolution of distributed database management systems during the project's life.
 

4.4.5 Subtask: Identify key parameters to evaluate simulated models

The aim of the task is to establish those parameters which have the greatest effect in determining whether the overall Model is feasible, both in terms of the resource requirements and time-to-completion of the components of the data analysis. Obvious examples are the network bandwidth, computing power, data handling capacity and the times required to return data-samples of varying sizes at each site. Less obvious examples are the means of responding to peak demands, the efficiency as a function of the load for various system components, the character and flexibility of the prioritisation mechanisms, and trade-off procedures for maximising the (priority-weighted) throughput of the system when all demands cannot be met simultaneously.

This task is complicated by the fact that realistic political constraints, such as policies for the use of computing resources by remote users (from other countries or world regions), as well as the technical parameters that determine the ideal performance of the system, have to be taken into account.

The task will attempt to produce a relatively small set of parameters that contain most of the information required to determine if a given Model is feasible, and to evaluate its effectiveness in satisfying the needs of LHC data analysis, relative to the computing and manpower resources required.

Deliverables:

  • A set of parameters to evaluate the simulated analysis models in terms of their feasibility and relative effectiveness.

Milestones:
  • A First milestone stating a preliminary set of parameters is needed during Phases 1B/1C.
  • A Second milestone is due during Phase 2 to specify the results of the simulations in a measurable way.
  • A Third (and probably not definitive) milestone will be requested for Phase 3 of the Project.

Duration/Schedule:
  • According to previous milestones.
 

4.4.6 Resources

The manpower currently committed by collaborating institutes for this task amounts to 30 person-months (Birmingham, Bologna, Caltech, FNAL, Milano, Tufts).

 
 

4.5 Task 4: Testbeds and Measurement of Critical Parameters

In order to accomplish the analysis of computing models and to evaluate the impact of data distribution schemes, testbeds have to be implemented to measure key parameters.

The task can be subdivided into the following subtasks; for their milestones and schedules see the Workplan.

 

4.5.1 Subtask: Define scope and configuration of testbeds

This subtask will implement several "use-cases" based on different configurations of the distributed computing model, and data access patterns related to different functionalities such as reconstruction and analysis.
This subtask requires preliminary information to be collected in collaboration with the "Analysis Process Design" and the "Site and Network Architecture" Working Groups:

  • Input information from RD45 with respect to ODBMS issues.
  • From experiments such as ATLAS and CMS, data distribution requirements, to be mapped to possible architectures based on the previous point.
  • From experiments such as BaBar, information about the actual performance of distributed databases.
  • HPSS configurations and performance.
This subtask will define which testbeds will be implemented to evaluate:
  • Data access patterns
  • The degree of data replication permitted in the federated database
  • The distribution of CPU and storage resources
 

4.5.2 Subtask: Implementation and operation of testbeds

Testbeds will involve many institutes running the appropriate tests. There will be defined procedures for managing the global setup and facilitating test bed execution in remote sites. This global setup includes the configuration and management of hardware and software in all the involved regional centres.

This subtask includes:

  • Creation of scripts which automate the test bed installation/execution in remote sites (a minimal sketch of such a driver is given after this list).
  • Set up access mechanisms to real and simulated data in an Objectivity/DB database (as is currently available from the CMS test beam runs and the GIOD project respectively).
  • Set up a dedicated ODBMS at CERN (on the requested testbed system), and in some outside sites, in collaboration with the GIOD and RD45 projects.
  • Set up resources to use the mass storage management (HPSS) facilities at CERN, in collaboration with the IT/PDP group.
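
As an indication of what such driver scripts might look like, the following sketch runs the same test command at a list of remote sites and gathers timings centrally. The site names, the test command and the use of ssh are placeholders; a real testbed driver would also handle authentication, software installation and error recovery.

    import subprocess
    import time

    SITES = ["testbed.cern.example", "testbed.infn.example", "testbed.caltech.example"]
    TEST_CMD = "run_monarc_test --samples 100"     # hypothetical test executable

    def run_at(site):
        """Run the test at one remote site over ssh and record the elapsed time."""
        start = time.time()
        proc = subprocess.run(["ssh", site, TEST_CMD], capture_output=True, text=True)
        return {"site": site, "ok": proc.returncode == 0,
                "elapsed_s": time.time() - start, "output": proc.stdout.strip()}

    if __name__ == "__main__":
        for r in map(run_at, SITES):
            print("%-30s %-6s %6.1f s" % (r["site"], "ok" if r["ok"] else "FAILED",
                                          r["elapsed_s"]))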
 

4.5.3 Subtask: Verify key simulation parameters

This subtask will be carried out in collaboration with the other working groups and RD45.

  • Verify, on the testbeds, the key parameters obtained in the simulation phase.
  • Study ODBMS-related parameters, including overheads and network protocol latencies, using the testbeds.
  • Study parameters related to the network-link bandwidth and the topology of the connections between regional centres.

 

4.5.4 Resources

The manpower currently committed by the collaborating institutes for testbed measurements amounts to 60 person-months (Bari, Birmingham, Bologna, Caltech, CERN, FNAL, Genova, Milano, Padova, Perugia, Pisa, Roma, Tufts).
The extra manpower required from CERN is 6 person-months, for setting up, maintaining and operating a central testbed facility, and for defining configurations which can mostly be replicated in the outside institutes.

 

Chapter 5: Deliverables

MONARC will deliver:

  • Specifications for a set of feasible models.
  • Guidelines for the LHC collaborations to use in building their computing models.
  • A set of modelling tools to enable the LHC experiments to simulate and refine their computing models.

Chapter 6: Resources

More than 50 physicists and computing experts have joined MONARC, and committed a significant fraction of their time. Many others have expressed interest and are expected to join the project in the near future. Most MONARC members are also involved in other activities within ATLAS, CMS, LHCb, etc. The total manpower provided by the present participants is estimated to be 200 person-months.
The Caltech, Tufts, Milano, and Bologna groups are actively searching for professionals or young people to be recruited to devote most of their time to the technical work of the project.

Manpower totalling 36 person-months is requested from CERN, for specific tasks for which CERN can provide the most efficient solution.

 
Manpower (person-months)   Description
18   Design and implementation of the models in the simulation, plus development and support of the simulation tools
6   Setup, maintenance and operation of the CERN-based testbed system and the related software tools
8   Analysis and design of the CERN-site architectures
4   Analysis of networks
36   Total manpower (person-months) requested from CERN
200   For comparison, manpower (person-months) available from MONARC participants

For the operation of the MONARC project, computing equipment, travel funds and probably software licenses are needed. The biggest investment is related to equipment, especially for the testbeds. Most of the needed equipment is available in the institutes, partly being acquired specifically for MONARC and partly being reused from or shared with other activities.
Both in Italy and in the US, more than 200 GBytes of disk space will be devoted to MONARC Objectivity/DB storage, with read/write speeds expected to reach 100 MBytes/sec. All the groups taking responsibility in the testbed task have one or more workstations, or PC farms that can be largely devoted to the MONARC work. Caltech and Milano have access to Exemplar systems (at CACR and at CILEA respectively) which can be used for short term tests.

Funding at the level of 140 kCHF is requested from CERN for the duration of the MONARC project. This includes 40 kCHF to cover the cost of commercial discrete event simulation software, in case the studies in the start-up phase (section 3.5) determine that such a purchase is needed.

 
Funding (kCHF)   Description
80 Capital cost of the CERN based testbed system and development systems for modelling and simulation running
20 Travel money
40 Potential cost of commercial discrete event simulation software
140 Total funding (kCHF) requested from CERN for the duration of phases 1 and 2 of the MONARC project
500 For comparison, estimated value of the dedicated and shared computing facilities outside of CERN

Chapter 7: Schedule

In this chapter the main milestones of the MONARC project are summarised. More details on the work flow are given in the Workplan.

MONARC Main Milestones

Chapter 8: Risk Identification

The risks facing the MONARC project come primarily from the unknown technology and price evolution between now and the time when the LHC will be running. In this respect the most uncertain areas are:

Another source of concern is the very ambitious statements made in the CTPs about the quality of the analysis environment for every physicist: full transparency and minimum turnaround time for any query are probably unrealistic goals (albeit useful as asymptotic aims).
Realistic user requirements, in view also of a meaningful cost/benefit ratio, will have to be negotiated with the physicists of the experiments.

As the role of the professional "Modellers" is of key importance for the project, the milestone time scale relies on an early, effective contribution from this highly skilled staff.

The schedule is aggressive out of necessity, bearing in mind the coming CPRs in late 1999. Even if not all the objectives are fully met on this timescale, the work is necessary for planning LHC computing and will continue in some form beyond the end of 1999. Such results as exist will be used for the CPRs, and will be further refined after the CPRs are published.

Chapter 9: Management and Organisational Responsibility
The responsibilities for the management of the MONARC project are shared between the Spokesperson, the Project Leader, and the Steering group.

The members of the Steering Group will include: the Spokesperson, the Project Leader, the Chairs of the Working Groups, plus representatives of major computer centres.

The responsibilities which are already accepted are:

 
Steering Group Function Person Accepting Responsibility
Spokesperson Harvey Newman
Project Leader Laura Perini
Simulation and Modelling WG Krzysztof Sliwa
Site and Network Architecture WG
Analysis Process Design WG Paolo Capiluppi
Testbed WG
Computer Centres

Chapter 10: References

  1. MONARC PAP, June 1998
    http://atlasinfo.cern.ch/Atlas/GROUPS/WWCOMP/pap_june30.html
  2. The analysis model and the optimisation of the geographical distribution of computing resources, M. Campanella, L. Perini, INFN/Milan, July 1998, MONARC Note 98/1
    http://www.mi.infn.it/~cmp/rd55/rd55-1-98.html
  3. ATLAS Computing Technical Proposal, CERN/LHCC 96-43, 19 Dec 1996
    http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/TDR/html/Welcome.html
  4. CMS Computing Technical Proposal, CERN/LHCC 96-45, 19 Dec 1996
    http://cmsdoc.cern.ch/ftp/CMG/CTP/index.html
  5. Status report of the RD45 project, 8 April 1998
    http://wwwinfo.cern.ch/pl/cernlib/rd45/reports.htm
    The RD45 web site is at: http://wwwinfo.cern.ch/asd/rd45/index.html
  6. Simulation of Distributed Architectures (SoDA), C. von Praun (follow WWW links for several relevant documents)
    http://wwwinfo.cern.ch/pdp/pc/soda/
  7. Objectivity/DB official page
    http://www.objectivity.com/
  8. HPSS, the High Performance Storage System project at CERN
    http://wwwinfo.cern.ch/pdp/vm/guide/hsm_project.html
    HPSS official page http://www.sdsc.edu/hpss/hpss.html
  9. GIOD Globally Interconnected Object Databases, Caltech, CERN, HP Joint Project
    http://pcbunn.cithep.caltech.edu/
  10. Status report of the ICFA Networking Task Force, July 1998, ICFA/98/671
    http://nicewww.cern.ch/~davidw/icfa/July98Report.html
    and their requirements report http://l3www.cern.ch/~newman/icfareq98.html
  11. PASTA Technology Tracking Team for Processors, Memory, Storage and Architectures
    Home page http://nicewww.cern.ch/~les/pasta/run2/welcome.html
  12. NT3 Network Technology Tracking Team have documents on the CS Group web pages
    Home page http://wwwcs.cern.ch/