LCB 98-xx

Models of Networked Analysis at Regional Centres for LHC Experiments

(MONARC)

PROJECT EXECUTION PLAN

Prepared by

M. Aderholz (MPI), K. Amako (KEK), E. Arderiu Ribera (CERN), E. Auge (L.A.L/Orsay), G. Bagliesi (Pisa/INFN), L. Barone (Roma1/INFN), G. Battistoni (Milano/INFN), J. Bunn (Caltech/CERN), J. Butler (FNAL), M. Campanella (Milano/INFN), P. Capiluppi (Bologna/INFN), M. Dameri (Genova/INFN), D. Diacono (Bari/INFN), A. di Mattia (Roma1/INFN), U. Gasparini (Padova/INFN), F. Gagliardi (CERN), I. Gaines (FNAL), P. Galvez (Caltech), C. Grandi (Bologna/INFN), F. Harris (Oxford/CERN), K. Holtman (CERN), V. Karimäki (Helsinki), J. Klem (Helsinki), M. Leltchouk (Columbia), D. Linglin (IN2P3/Lyon Computing Centre), P. Lubrano (Perugia/INFN), L. Luminari (Roma1/INFN), M. Michelotto (Padova/INFN), I. McArthur (Oxford), H. Newman (Caltech), S.W. O'Neale (Birmingham), B. Osculati (Genova/INFN), M. Pepe (Perugia/INFN), L. Perini (Milano/INFN), J. Pinfold (Alberta), R. Pordes (FNAL), S. Rolli (Tufts), T. Sasaki (KEK), L. Servoli (Perugia/INFN), R.D. Schaffer (Orsay), M. Sgaravatto (Padova/INFN), T. Schalk (BaBar), J. Shiers (CERN), L. Silvestris (Bari/INFN), G.P. Siroli (Bologna/INFN), K. Sliwa (Tufts), C. Stanescu (Roma3/INFN), T. Smith (CERN), C. von Praun (CERN), E. Valente (INFN), I. Willers (CERN), R. Wilkinson (Caltech), D.O. Williams (CERN)

 
20 September 1998

Executive Summary

The MONARC project will attempt to determine which classes of computing models are feasible for the LHC experiments. The boundary conditions for the models will be the network capacity and data handling resources likely to be available at the start of and during LHC running.

The main deliverable from the project will be a set of example "baseline" models. The project will also help to define regional centre architectures and functionality, the physics analysis process for the LHC experiments, and guidelines for retaining feasibility over the course of running. The results will be made available in time for the LHC Computing Progress Reports, and could be refined for use in the Experiments' Computing Technical Design Reports by 2002.

The approach taken in the Project is to develop and execute discrete event simulations of the various candidate distributed computing systems. The granularity of the simulations will be adjusted according to the detail required from the results. The models will be iteratively tuned in the light of experience. Simulation of the diverse tasks that are part of the spectrum of computing in HEP will be undertaken, and a simulation and modelling tool kit will be developed to enable studies of the impact of network and data handling limitations on the models.

Chapter 1: Introduction

The LHC experiments have envisaged computing models involving hundreds of physicists doing analysis on petabytes of data at institutions around the world. ATLAS and CMS are also considering the use of regional centres, each of which would complement the functionality of the CERN centre. The use of these centres would be well matched to the worldwide-distributed structure of the collaborations. They are intended to facilitate access to the data, with more efficient and cost-effective data delivery to the groups in each world region, using national networks of greater capacity than may be available on intercontinental links.

The LHC models encompass a complex set of wide-area, regional and local-area networks, a heterogeneous set of compute- and data-servers, and a yet-to-be determined set of priorities for group-oriented and individual demands for remote data. Distributed systems of this scope and complexity do not yet exist, although systems of a similar size to those foreseen for the LHC experiments are predicted to come into operation by around 2005 in large corporations.

In order to proceed with the planning and design of the LHC computing models, and to correctly dimension the capacity of the networks and the size and characteristics of regional centres, it is essential to conduct a systematic study of these distributed systems. The MONARC project therefore intends to simulate and study network-distributed computing architectures, data access and data management systems that are major components of the computing model, and the ways in which the components interact across networks. MONARC will bring together the efforts and relevant expertise from the LHC experiments and R&D projects, as well as from current or imminent experiments already engaged in building distributed systems for computing, data access, simulation and analysis.
 

The primary goals of this project are:

As a result of this study, MONARC will deliver a set of tools for simulating candidate computing models of the experiments, and a set of common guidelines to allow the experiments to formulate their final models.

Distributed databases are a crucial aspect of these studies. The RD45 project has developed considerable expertise in the field of Object Oriented Database Management Systems (ODBMS). MONARC will benefit from this experience and cooperate with RD45 in the specific areas where the work of the two projects overlaps. MONARC will investigate questions which are largely complementary to RD45, such as network performance and prioritisation of traffic, for a variety of applications that must coexist and share the network resources.

 

Chapter 2: Objectives

A set of common modelling and simulation tools will be developed in MONARC. These tools will be integrated in an environment which will enable the LHC experiments to realistically evaluate and optimise their physics analysis procedures and their computing models, basing them on distributed data and computing architectures. Tools to realistically estimate the network bandwidth required in a given computing model will be developed. The parameters that are necessary and sufficient to characterise the computing model and its performance will be identified. The methods and tools to measure the model's performance, and to detect bottlenecks, will be designed and developed, and also tested in prototypes. This work will be done with as much co-operation as possible with the present LHC R&D projects and with current or imminent experiments.

The final goal is to determine a set of feasible models, and to provide a set of guidelines which the experiments could use to build their respective computing models.

The main objectives leading to this goal are:

 

Chapter 3: Workplan

3.1 Scope

MONARC aims to study analysis models and architectures suitable for LHC experiments, in order to contribute to their computing models in time for the Computing Progress Reports (CPR) that are due around the end of 1999.

The project involves collaboration not only from the LHC experiments, but also from other HEP experiments preparing to run in the near future, such as BaBar and COMPASS. These experiments are going to develop expertise in many of the computing-related fields of interest for LHC. MONARC will also interact with other teams, e.g. RD45, the ICFA network group, the HPSS teams and the GIOD project.

Although the manpower for the project is mostly provided by the collaborating institutes, the project also requires a significant manpower contribution from CERN:

3.2 Interaction with other experiments and projects

3.3 Working methods

The working methods to be employed in MONARC are largely determined by features of the project's structure:

MONARC working methods will embody the following principles:

The principles and requirements stated above lead naturally to a collaboration structure based on Working Groups and on a Steering Group. The Steering Group assures the integration of the various tasks that are pursued in the Working Groups.

3.4 Working Groups and Steering Group

The Working Groups are:

The interplay between the different working groups and their activities, and the need to coordinate the decomposition/integration steps in the various iteration cycles, require that a detailed schedule be established to synchronise the relevant tasks. This schedule is therefore explained in section 3.5, Phases of the Project; chapter 4 details the tasks and subtasks and a summary of the milestones is given in chapter 7: Schedule.

The steering group is composed of the chairpersons of the working groups together with the spokesperson, the project leader and representatives of regional centres (see chapter 9).

3.5 Phases of the Project

This PEP proposes a project workplan divided into two phases:

The Phase 3 mentioned in the PAP is now more clearly envisaged as a further R&D project, aimed at prototype designs and test implementations of the computing models. It will be the natural continuation of this project; it is, however, distinct from it in terms of deliverables. It should start at the completion of Phase 2 and contribute to the CTDR.

The level of detail that can be reached in planning the work is obviously different for Phase 1 and for Phase 2, as the actual planning for Phase 2 will largely depend on the outcome of Phase 1.
In the following the workplan for Phase 1 is presented in some detail, while for Phase 2 only a summary view is given.

The workplan for Phase 1 is organised into three sub-phases:

The detailed workplan for Phase 2 will be provided in a progress report of this project which will be delivered at the conclusion of Phase 1, in mid-1999.
In this phase, two or three cycles of simulation/modelling will be performed. Here a summary is given of the items that will be addressed:

At the end of Phase 2, the deliverables of this project will be:

3.6 Scope Limitations

 

3.7 Assumptions and Pre-Requisites

Chapter 4: Task Definitions

4.1 Overview

The major tasks are matched to the Working Groups listed in Section 3.4.

For each task a summary is given of the manpower resources available in the collaborating institutes, and of the current request.

4.2 Task 1: Simulation and Modelling

Realistic simulation and modelling of the distributed computing systems are the most important tasks in the first phase of the project. The goal is to be able to reliably model the behaviour of the system of site facilities and networks, given the assumed physical structure of the computer systems and the usage patterns, including the manner in which hundreds of physicists will access LHC data. The hardware and networking costs, and the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, are the main metrics that will be used to evaluate the models. The goal is to narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.

The planned research can be divided into the following subtasks, where the activities are expected to follow an iterative approach, with a complete cycle of the design-development-modelling-validation steps for every model studied, and every simulation tool used.

4.2.1 Subtask: Survey existing modelling tools

Ideally one would like to use a small number (preferably 2-3) of tools, to be able to cross-check the results. A number of packages exist, for example SoDA, MODNET, SES, COMNET, Ptolemy, Simple++ and PARASOL, some of which have already been examined by various groups within MONARC. This work should be pursued vigorously, with a recommendation by the end of 1998. SoDA, a simulation environment developed by Christoph von Praun at CERN IT, is at present the leading candidate to become one of the chosen packages.

4.2.2 Subtask: Use the tools for coding the models which MONARC will explore

It is essential to decide on the appropriate level of modelling complexity. This work should start immediately, as it must run in parallel with the development of the modelling tools. We anticipate that the models will include sufficient detail of data transfers from disk to CPU, hardware configurations, and complex network connections whose available bandwidth varies with geographical topology, time of day and the level of quality of service implemented. It is also essential to develop, as well as possible, models for data access patterns and analysis patterns. Here, the importance of input from physicists in the experiments participating in MONARC cannot be overestimated. Experience from current, or near-future, large-statistics HEP experiments should also be examined.
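
To make the intended level of modelling concrete, the following is a minimal, illustrative sketch (in Python) of the kind of queueing behaviour such models must capture: analysis jobs at a single centre competing for a shared network link and a CPU farm. The tools chosen in subtask 4.2.1 will be far richer; all names and numbers below are assumptions made purely for illustration, not MONARC design choices.

    import heapq

    def simulate(n_jobs, sample_gb, link_mbps, n_cpus, cpu_hours_per_job):
        """Mean turnaround time (hours) for n_jobs all submitted at t=0.
        Each job first pulls its data sample over a shared FIFO network link,
        then runs on the first free CPU of the centre's farm."""
        transfer_h = (sample_gb * 8.0e3) / link_mbps / 3600.0  # GB -> Mbit -> hours
        link_free = 0.0                     # time at which the link becomes free
        cpu_free = [0.0] * n_cpus           # min-heap of CPU availability times
        heapq.heapify(cpu_free)
        finish = []
        for _ in range(n_jobs):
            start_xfer = link_free          # jobs queue for the link in FIFO order
            link_free = start_xfer + transfer_h
            cpu_start = max(link_free, heapq.heappop(cpu_free))
            done = cpu_start + cpu_hours_per_job
            heapq.heappush(cpu_free, done)
            finish.append(done)
        return sum(finish) / n_jobs

    if __name__ == "__main__":
        # Illustrative numbers only: 20 jobs, 10 GB samples, 34 Mbit/s link, 10 CPUs.
        print("mean turnaround: %.1f h" % simulate(20, 10.0, 34.0, 10, 2.0))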

4.2.3 Subtask: Develop modelling packages or a combination of existing tools

Significant development work might be required to extend the existing SoDA class libraries in order to be able to describe the models which will be simulated. The goal is to have an advanced set of simulation tools by Spring of 1999.

4.2.4 Subtask: Run simulations of the coded models

This subtask will involve detailed simulations performed with the adopted set of tools on agreed sets of input parameter values, in order to explore meaningfully the multidimensional parameter space of variables which describe the computer system models. The goal is to deliver the first reliable results to the experiments by the summer of 1999.
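
The sweep over agreed parameter sets could be organised along the following lines, shown here with a crude analytic stand-in for the real simulation. The parameter names, ranges and the 24-hour turnaround target are illustrative assumptions only.

    import itertools

    def turnaround_h(link_mbps, n_cpus, n_jobs, sample_gb=10.0, cpu_h=2.0):
        """Stand-in for the real simulation: a crude analytic estimate of the
        mean turnaround time (hours), used here only to make the sweep runnable."""
        transfer_h = n_jobs * sample_gb * 8.0e3 / link_mbps / 3600.0  # serialised link
        compute_h = n_jobs * cpu_h / n_cpus                           # ideal farm scaling
        return max(transfer_h, compute_h) + cpu_h

    # The agreed parameter sets would be scanned along these lines; the ranges
    # below are invented for illustration, not MONARC baseline values.
    grid = {"link_mbps": [10, 34, 155, 622],
            "n_cpus":    [10, 50, 200],
            "n_jobs":    [20, 100]}

    results = []
    for link, cpus, jobs in itertools.product(*grid.values()):
        results.append((link, cpus, jobs, turnaround_h(link, cpus, jobs)))

    # Flag the configurations that meet an (assumed) 24-hour turnaround target.
    for link, cpus, jobs, t in sorted(results, key=lambda r: r[3]):
        flag = "ok" if t <= 24.0 else "--"
        print("%s  link=%4d Mbit/s  cpus=%3d  jobs=%3d  turnaround=%7.1f h"
              % (flag, link, cpus, jobs, t))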

4.2.5 Subtask: Validate simulation results on testbeds

This important step is anticipated to take place in parallel with the first implementation of the models. We expect preparations for this subtask to begin almost immediately. Coding significant patterns of data handling in existing experiments and simulating them will be a first validating step.
Designing the test-bed measurements with which one can verify the results of model simulations, at first simple and then more complex, is of paramount importance. There may be overlap here with the design of the test-bed measurements intended to improve our knowledge of a number of parameters which are needed as input to models of the computing systems.

4.2.6 Subtask: Establish a repository for the MONARC project

Provide and organise a repository to contain relevant information.

4.2.7 Resources

We require at least 100 person-months for this task. The manpower currently available for this simulation task amounts to 75 person-months (Bologna, Caltech, Milano, Perugia, Tufts). In order to provide the remaining manpower required for this activity, we are requesting a major contribution of 18 person-months from CERN, and 40 person-months from the US.

 

4.2.8 Milestones and Schedules

4.3 Task 2: Site and Network Architecture

This task addresses the issues of hardware and network architecture of the distributed computing systems to be modelled. In general this task will provide information on architectures used at major HEP computing centres for previous and current generations of experiments, on the plans of major centres for future experiments, and on technology and cost trends for the major components (CPU, disk, mass storage and networks) of potential distributed computing systems. This information will be fed into the model simulations so that models can be based on reality (both technologically and sociologically), so that models can be optimised based on expected costs, and so that the dependence of the models on costs and technology projections can be clearly seen. The information will also be used to suggest avenues of study for the testbed task.

4.3.1 Subtask: Survey of existing computing architectures

Descriptions of computing architectures used by current experiments should be prepared, concentrating on LEP, HERA, the FNAL Collider, the large fixed target experiments at CERN and FNAL.  Architectural descriptions should include:

Deliverables:
Schedule:

4.3.2 Subtask: Survey of planned computing architectures

Similar descriptions as in subtask 4.3.1 should be provided for the plans for meeting the needs of major upcoming experiments:
  • Near-term experiments: BaBar, BELLE, CDF, D0, HERA-B, RHIC
  • Later experiments: LHC

Deliverables:


Schedule:

4.3.3 Subtask: Survey of potential regional centres and proposed architectures

Potential sites for LHC regional centres should be identified and surveyed as to plans for hardware deployment and personnel support expected to be available.  It should be recognised that there are likely to be several different styles of regional centres, from comprehensive centres offering large amounts of CPU, disk and mass storage for all stages of the analysis process, to centres specialising in certain components of the full analysis stream.  Different amounts of support and different topologies should also be considered and our models must take these differences into account. Surveys should include:


Deliverables:
Schedule:

4.3.4 Subtask: Technology evaluation and cost tracking

Realistic models require up-to-date estimates of hardware cost and performance.  This subtask will require both market tracking and measurements of hardware components as they are acquired by participating institutes in the MONARC collaboration, in the categories of:


Deliverables:
Schedule:  

4.3.5 Subtask: Network performance and cost tracking

Networks are a critical component of all distributed computing models, and their availability is influenced by both technological and external forces. Accurate projections of network performance are a crucial input to any distributed architecture model. Measurements of current performance and projections of future availability and cost should be acquired in conjunction with other groups, in the categories of:


Deliverables:
Schedule:

4.3.6 Resources

The manpower currently committed to this task amounts to 39 person-months (Caltech, CERN, FNAL, Milano, Oxford, Perugia, Roma, Tufts). The additional manpower required is 8 person-months (from CERN).

4.4 Task 3: Analysis Process Design

This task aims at the definition of a few different schemes of "the way of doing analysis" among the many possibilities afforded by new (and perhaps unforeseen) computing technologies. The task can be addressed with two different and complementary approaches:

Both approaches will be pursued and combined suitably in this project.

The task will select some different scenarios for the analysis process, taking into account:

In addition, this task will address the definition of a set of values able to characterise the analysis processes in conjunction with the different computing model architectures, since the correlation is very strong.

In summary, this Task will identify possible analysis processes, defining where the raw data reside, how the reconstructed objects will be produced and stored, how and where the data relevant for the analyses will be selected and accessed, and finally where and how the physicists will carry out their analyses.

All of the above will have to be coded and parameterised in the simulation.

A very simple example can better illustrate the nature of the task. Given a PByte/year of data, an analysis group goes through the tag database (produced quasi-online), selecting 1% of the reconstructed objects created by the "offline reconstruction". The group needs to go through the full 1% sample once per month, producing an analysis object sample (reduced in number of events and in size) which contains the relevant analysis information, refined by experience and results from previous steps. Each member of the group needs access to the selected analysis objects, at a time which fits their personal work schedule, to extract results and personal sub-samples.
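
The data rates implied by such an example can be roughed out as follows. The assumption that the 1% selection corresponds to 1% of the full PByte (i.e. 10 TB) is made here purely for illustration, since the example leaves the object sizes open.

    # Back-of-the-envelope rates for the example above (illustrative assumptions).
    TB = 1.0e12                        # bytes
    sample_bytes = 0.01 * 1000 * TB    # assume the 1% selection is 1% of 1 PByte
    month_s = 30 * 86400
    day_s = 86400

    sustained = sample_bytes / month_s   # pass spread evenly over the month
    one_day = sample_bytes / day_s       # monthly pass compressed into one day

    print("sample size        : %.0f TB" % (sample_bytes / TB))
    print("sustained rate     : %.1f MB/s averaged over a month" % (sustained / 1e6))
    print("one-day pass rate  : %.0f MB/s" % (one_day / 1e6))

Under these assumptions, a 10 TB sample read once per month corresponds to roughly 4 MB/s sustained, or of order 100 MB/s if the monthly pass is compressed into a single day.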

The timings, turnaround, consistency, data residency, storage and CPU needs, and efficiency of such an example will be addressed by this task, which will attempt to describe schematically some different (and affordable) possibilities.

For the analysis processes retained for further study and simulation, either initially or later during the project, clear diagrams will be produced to show the sequential, parallel and iterative steps.

The task will be performed through the following work packages or subtasks, which embrace some or all of the phases described in Chapter 3: Workplan.

 

4.4.1 Subtask: Analyse contemporary production and analysis procedures

The aim of the task is to extract information from the analysis processes of current experiments. For example, the number of concurrent users doing analysis in running experiments will give some constraint (scaled to LHC) on the range of parameters to be simulated. Other information, such as the number of analysis groups and their dispersion, may also be investigated and recorded.

Deliverables:


Milestones:
Duration/Schedule:
  • A working relationship is needed with other experiments, particularly those at hadron colliders. The duration/schedule is connected to the milestones previously stated, and to the evolution of the processes used by our colleagues at running experiments that may lead us to modify some of the simulations in this project.
 

4.4.2 Subtask: Identify user requirements

The aim of this task is to identify a range of schemes of users' needs while performing data analysis. Some features are the response time of a database query, the ability (and willingness) to make queries locally or remotely (regionally or centrally) and the tools and methods the user will adopt.

Deliverables:

  • A set of constrained parameters defining the steps followed by the users, individually and in a variety of group-oriented activities, while doing analysis. The parameters also will define the load presented to the computing and data handling facilities, and the networks, as the users carry out their analysis.

Milestones:
  • First milestone during Phase 1B for the initial simulation.
  • Second milestone during Phase 1C to refine some of the parameter ranges.
  • Third milestone during Phase 2 to settle the final range of affordable user requests.

Duration/Schedule:
  • This task ends at the end of the Project, and its schedule is set by the previously-stated milestones. It may be revisited during Phase 3, in a follow-on R&D project.

 

4.4.3 Subtask: Identify feasible models to be simulated

The aim of the task is to identify a set of analysis processes that will meet the requirements of LHC data reconstruction and physics analysis. The models will be expressed in clear diagrams that specify the input and output data volumes and the frequency of data access, along with the locations of the data handling capability and computing power which are intended to meet the needs. Candidate analysis processes will be selected taking into account the data handling capacity and network throughput assumed at each site, and matching these resources with a specification of where, when and how often each physicist and each analysis group accesses its own samples of selected data. Individual and group-oriented activities will have to be prioritised as part of the overall process specification.

This subtask has a first step, which is to Eliminate Obviously Unfeasible Models. The aim of this first step is to exclude, early in the project, analysis models that lead to technically unfeasible or clearly unaffordable resource requirements, even when projected to the year 2005. An example would be an analysis process that requires local desktop storage of hundreds of terabytes, or a network bandwidth of more than a gigabit/sec dedicated to every analysing physicist. Such models are to be dropped without wasting time on detailed simulations.
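
A coarse screening of this kind could be expressed as simple cuts applied before any detailed simulation. The two thresholds below mirror the examples just given; treating them as hard limits, and the candidate models themselves, are illustrative assumptions rather than MONARC decisions.

    MAX_DESKTOP_STORAGE_TB = 100.0    # "hundreds of terabytes" on a desktop
    MAX_PER_PHYSICIST_MBPS = 1000.0   # a gigabit/sec dedicated to each physicist

    def obviously_unfeasible(model):
        """Return the reasons (if any) why a model can be dropped without simulation."""
        reasons = []
        if model["desktop_storage_tb"] > MAX_DESKTOP_STORAGE_TB:
            reasons.append("needs %.0f TB of local desktop storage"
                           % model["desktop_storage_tb"])
        if model["per_physicist_mbps"] > MAX_PER_PHYSICIST_MBPS:
            reasons.append("needs %.0f Mbit/s dedicated per physicist"
                           % model["per_physicist_mbps"])
        return reasons

    # Hypothetical candidate models, for illustration only.
    candidates = {"all-data-on-desktop": {"desktop_storage_tb": 300.0,
                                          "per_physicist_mbps": 10.0},
                  "regional-centre-tier": {"desktop_storage_tb": 0.5,
                                           "per_physicist_mbps": 2.0}}

    for name, model in candidates.items():
        problems = obviously_unfeasible(model)
        verdict = "dropped: " + "; ".join(problems) if problems else "kept for simulation"
        print("%-22s %s" % (name, verdict))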

Deliverables:

  • A set of feasible analysis processes for LHC treatment of data, suitable for simulation according to the computing model architectures defined by the present project.

Milestones: The task will extend over the whole duration of the project.
  • A First milestone is foreseen for Phase 1B, when an analysis process for the first simulation is needed.
  • A Second milestone is due for Phase 1C, when a set of analysis processes is needed to understand the possible spread of models.
  • A Third milestone is due during Phase 2, in order to refine the analysis process possibilities taking into account the constraints given by the simulations performed and by technology and budget limitations.
  • Finally, other milestones can be foreseen just before the end of the Project or, possibly, for an extension to Phase 3.

Duration/Schedule:
  • See Workplan.
 

4.4.4 Subtask: Elaborate policies, priorities and schedules for different models

The aim of the task is to study, and then define different schemes of use of the LHC collaborations' central and distributed resources for computing and data handling. The schemes will include priority-assignments for each of the classes of activity that make up the data analysis, in an attempt to ensure that all components of the data analysis are completed as needed, in an acceptably short time. Policies and relative priorities for the use of regional centres by users from other regions, for the use of network bandwidth, and for access to remote data-handling facilities will be expressed parametrically, and methods that relate the throughput or the "rate of doing work" to the priority-profiles will be developed in the course of this subtask.

The relative prioritisation schemes will include components that are driven by immediate physics needs, as well as the ongoing throughput requirements for organised high-priority activities (such as the first-round reconstruction of the raw data). Regional centre access, use of the network and database distribution are the key parameters for this subtask. For example, when an analysis topic is granted priority over others on physics grounds, the coordinating analysis group must have the methods, and be granted the authority, to set priorities on the collaboration's resources and to understand and predict the scheduling needed to achieve the results in time. The response of the system to the new high-priority task should be to complete the task within a specified (short) time, without undue disruption of the other analysis activities.
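
One simple way of expressing such priority-profiles parametrically is as relative weights that fix each activity class's share of a resource. The classes, weights and capacity below are invented for illustration; which parametric forms are actually adequate is precisely what this subtask will study.

    def allocate(capacity, weights):
        """Split a resource capacity among activity classes by relative weight."""
        total = float(sum(weights.values()))
        return {name: capacity * w / total for name, w in weights.items()}

    weights = {"first-pass reconstruction": 5.0,
               "organised group analysis":  3.0,
               "individual analysis":       2.0}
    print(allocate(1000.0, weights))        # e.g. 1000 CPU-equivalents at a centre

    weights["urgent physics topic"] = 4.0   # a topic granted priority on physics grounds
    print(allocate(1000.0, weights))        # other classes shrink but are not stopped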

Deliverables:

  • A set of rules for access to the collaboration's computing resources, together with methods to implement them. The rules have to be "tuned" to the different analysis resources and to the different computing model architectures.

Milestones: The task will end with the project, since the coordination and management of resources is intrinsically embedded in the analysis process.
  • A First milestone can be foreseen for Phases 1B-1C, when different schemes of processes have to be simulated on distributed resources.
  • A Second milestone is certainly due for Phase 2 when priorities and schedules have to be taken into account to get reasonable simulation results.
  • Some more refined milestones will be necessary for the evolution to Phase 3.

Duration/Schedule:
  • According to previous milestones and to the evolution of distributed database management systems during the project's life.
 

4.4.5 Subtask: Identify key parameters to evaluate simulated models

The aim of the task is to establish those parameters which have the greatest effect in determining whether the overall Model is feasible, both in terms of the resource requirements and time-to-completion of the components of the data analysis. Obvious examples are the network bandwidth, computing power, data handling capacity and the times required to return data-samples of varying sizes at each site. Less obvious examples are the means of responding to peak demands, the efficiency as a function of the load for various system components, the character and flexibility of the prioritisation mechanisms, and trade-off procedures for maximising the (priority-weighted) throughput of the system when all demands cannot be met simultaneously.

This task is complicated by the fact that realistic political constraints, such as policies for the use of computing resources by remote users (from other countries or world regions), as well as the technical parameters that determine the ideal performance of the system, have to be taken into account.

The task will attempt to produce a relatively small set of parameters that contain most of the information required to determine if a given Model is feasible, and to evaluate its effectiveness in satisfying the needs of LHC data analysis, relative to the computing and manpower resources required.

Deliverables:

  • A set of parameters to evaluate the simulated analysis models in terms of their feasibility and relative effectiveness.

Milestones:
  • A First milestone stating a preliminary set of parameters is needed during Phases 1B/1C.
  • A Second milestone is due during Phase 2 to specify the results of the simulations in a measurable way.
  • A Third (and probably not definitive) milestone will be requested for Phase 3 of the Project.

Duration/Schedule:
  • According to previous milestones.
 

4.4.6 Resources

The manpower currently committed by collaborating institutes for this task amounts to 30 person-months (Birmingham, Bologna, Caltech, FNAL, Milano, Tufts).

 
 

4.5 Task 4: Testbeds and Measurement of Critical Parameters

In order to accomplish the analysis of computing models and to evaluate the impact of data distribution schemes, testbeds have to be implemented to measure key parameters.

The task can be subdivided into the following subtasks; for their milestones and schedules see the Workplan.

 

4.5.1 Subtask: Define scope and configuration of testbeds

This subtask will implement several "use-cases" based on different configurations of the distributed computing model, and data access patterns related to different functionalities such as reconstruction and analysis.
This subtask requires preliminary information to be collected in collaboration with the "Analysis Process Design" and the "Site and Network Architecture" Working Groups:

  • Input information from RD45 with respect to ODBMS issues.
  • From experiments such as ATLAS and CMS, data distribution requirements, to be mapped to possible architectures based on the previous point.
  • From experiments such as BaBar, information about the actual performance of distributed databases.
  • HPSS configurations and performance.
This subtask will define which testbeds will be implemented to evaluate:
  • Data access patterns
  • The degree of data replication permitted in the federated database
  • The distribution of CPU and storage resources
 

4.5.2 Subtask: Implementation and operation of testbeds

Testbeds will involve many institutes running the appropriate tests. There will be defined procedures for managing the global setup and facilitating test bed execution in remote sites. This global setup includes the configuration and management of hardware and software in all the involved regional centres.

This subtask includes:

  • Creation of scripts which automate the test bed installation/execution in remote sites (a minimal sketch of such a driver is given after this list).
  • Set up access mechanisms to real and simulated data in an Objectivity/DB database (as is currently available from the CMS test beam runs and the GIOD project respectively).
  • Set up a dedicated ODBMS at CERN (on the requested testbed system), and in some outside sites, in collaboration with the GIOD and RD45 projects.
  • Set up resources to use the mass storage management (HPSS) facilities at CERN, in collaboration with the IT/PDP group.
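
As an indication of what such driver scripts might look like, the following sketch runs the same test command at a list of remote sites and gathers timings centrally. The site names, the test command and the use of ssh are placeholders; a real testbed driver would also handle authentication, software installation and error recovery.

    import subprocess
    import time

    SITES = ["testbed.cern.example", "testbed.infn.example", "testbed.caltech.example"]
    TEST_CMD = "run_monarc_test --samples 100"     # hypothetical test executable

    def run_at(site):
        """Run the test at one remote site over ssh and record the elapsed time."""
        start = time.time()
        proc = subprocess.run(["ssh", site, TEST_CMD], capture_output=True, text=True)
        return {"site": site, "ok": proc.returncode == 0,
                "elapsed_s": time.time() - start, "output": proc.stdout.strip()}

    if __name__ == "__main__":
        for r in map(run_at, SITES):
            print("%-30s %-6s %6.1f s" % (r["site"], "ok" if r["ok"] else "FAILED",
                                          r["elapsed_s"]))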
 

4.5.3 Subtask: Verify key simulation parameters

This subtask will be carried out in collaboration with the other working groups and RD45.

  • Verify, on the testbeds, the key parameters obtained in the simulation phase.
  • Study ODBMS-related parameters, including overheads and network protocol latencies, using the testbeds.
  • Study parameters related to the network-link bandwidth and the topology of the connections between regional centres.

 

4.5.4 Resources

The manpower currently committed by the collaborating institutes for testbed measurements amounts to 60 person-months (Bari, Birmingham, Bologna, Caltech, CERN, FNAL, Genova, Milano, Padova, Perugia, Pisa, Roma, Tufts).
The extra manpower required from CERN is 6 person-months, for setting up, maintaining and operating a central testbed facility, and for defining configurations which can mostly be replicated in the outside institutes.

 

Chapter 5: Deliverables

MONARC will deliver:

  • Specifications for a set of feasible models.
  • Guidelines for the LHC collaborations to use in building their computing models.
  • A set of modelling tools to enable the LHC experiments to simulate and refine their computing models.

Chapter 6: Resources

More than 50 physicists and computing experts have joined MONARC, and committed a significant fraction of their time. Many others have expressed interest and are expected to join the project in the near future. Most MONARC members are also involved in other activities within ATLAS, CMS, LHCb, etc. The total manpower provided by the present participants is estimated to be 200 person-months.
The Caltech, Tufts, Milano, and Bologna groups are actively searching for professionals or young people to be recruited to devote most of their time to the technical work of the project.

Manpower totalling 36 person-months is requested from CERN, for specific tasks for which CERN can provide the most efficient solution.

 
Manpower (person-months)   Description
18   Design and implementation of the models in the simulation, plus development and support of the simulation tools
6   Setup, maintenance and operation of the CERN-based testbed system and the related software tools
8   Analysis and design of the CERN-site architectures
4   Analysis of networks
36   Total manpower (person-months) requested from CERN
200   For comparison, manpower (person-months) available from MONARC participants

For the operation of the MONARC project, computing equipment, travel funds and probably software licenses are needed. The biggest investment is related to equipment, especially for the testbeds. Most of the needed equipment is available in the institutes, partly being acquired specifically for MONARC and partly being reused from or shared with other activities.
Both in Italy and in the US, more than 200 GBytes of disk space will be devoted to MONARC Objectivity/DB storage, with read/write speeds expected to reach 100 MBytes/sec. All the groups taking responsibility in the testbed task have one or more workstations, or PC farms that can be largely devoted to the MONARC work. Caltech and Milano have access to Exemplar systems (at CACR and at CILEA respectively) which can be used for short term tests.

Funding at the level of 140 kCHF is requested from CERN for the duration of the MONARC project. This includes 40 kCHF to cover the cost of commercial discrete event simulation software, in case the studies in the start-up phase (section 3.5) determine that such a purchase is needed.

 
Funding (kCHF)   Description
80 Capital cost of the CERN based testbed system and development systems for modelling and simulation running
20 Travel money
40 Potential cost of commercial discrete event simulation software
140 Total funding (kCHF) requested from CERN for the duration of phases 1 and 2 of the MONARC project
500 For comparison, estimated value of the dedicated and shared computing facilities outside of CERN

Chapter 7: Schedule

In this chapter the main milestones of the MONARC project are summarised. More details on the work flow are given in the Workplan.

MONARC Main Milestones

Chapter 8: Risk Identification

The risks facing the MONARC project come primarily from the unknown technology and price evolution between now and the time when the LHC will be running. In this respect the most uncertain areas are:

Another source of concern is the very ambitious statements made in the CTPs about the quality of the analysis environment for every physicist: full transparency and minimum turnaround time for any query are probably unrealistic goals (albeit useful as asymptotic aims).
Realistic user requirements, in view also of a meaningful cost/benefit ratio, will have to be negotiated with the physicists of the experiments.

As the role of the professional "Modellers" is of key importance for the project, the milestone time scale relies on an early, effective contribution from this highly skilled staff.

The schedule is aggressive out of necessity, bearing in mind the coming CPRs in late 1999. Even if not all the objectives are fully met on this timescale, the work is necessary for planning LHC computing and will continue in some form beyond the end of 1999. Such results as exist will be used for the CPRs, and will be further refined after the CPRs are published.

Chapter 9: Management and Organisational Responsibility
The responsibilities for the management of the MONARC project are shared between the Spokesperson, the Project Leader, and the Steering group.

The members of the Steering Group will include: the Spokesperson, the Project Leader, the Chairs of the Working Groups, plus representatives of major computer centres.

The responsibilities which are already accepted are:

 
Steering Group Function Person Accepting Responsibility
Spokesperson Harvey Newman
Project Leader Laura Perini
Simulation and Modelling WG Krzysztof Sliwa
Site and Network Architecture WG
Analysis Process Design WG Paolo Capiluppi
Testbed WG
Computer Centres

Chapter 10: References

  1. MONARC PAP, June 1998
    http://atlasinfo.cern.ch/Atlas/GROUPS/WWCOMP/pap_june30.html
  2. The analysis model and the optimisation of the geographical distribution of computing resources, M. Campanella, L. Perini, INFN/Milan, July 1998, MONARC Note 98/1
    http://www.mi.infn.it/~cmp/rd55/rd55-1-98.html
  3. ATLAS Computing Technical Proposal, CERN/LHCC 96-43, 19 Dec 1996
    http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/TDR/html/Welcome.html
  4. CMS Computing Technical Proposal, CERN/LHCC 96-45, 19 Dec 1996
    http://cmsdoc.cern.ch/ftp/CMG/CTP/index.html
  5. Status report of the RD45 project, 8 April 1998
    http://wwwinfo.cern.ch/pl/cernlib/rd45/reports.htm
    The RD45 web site is at: http://wwwinfo.cern.ch/asd/rd45/index.html
  6. Simulation of Distributed Architectures (SoDA), C. von Praun (follow WWW links for several relevant documents)
    http://wwwinfo.cern.ch/pdp/pc/soda/
  7. Objectivity/DB official page
    http://www.objectivity.com/
  8. HPSS, the High Performance Storage System project at CERN
    http://wwwinfo.cern.ch/pdp/vm/guide/hsm_project.html
    HPSS official page http://www.sdsc.edu/hpss/hpss.html
  9. GIOD Globally Interconnected Object Databases, Caltech, CERN, HP Joint Project
    http://pcbunn.cithep.caltech.edu/
  10. Status report of the ICFA Networking Task Force, July 1998, ICFA/98/671
    http://nicewww.cern.ch/~davidw/icfa/July98Report.html
    and their requirements report http://l3www.cern.ch/~newman/icfareq98.html
  11. PASTA Technology Tracking Team for Processors, Memory, Storage and Architectures
    Home page http://nicewww.cern.ch/~les/pasta/run2/welcome.html
  12. NT3 Network Technology Tracking Team have documents on the CS Group web pages
    Home page http://wwwcs.cern.ch/