CREAM Performance/Reliability tests preparation

This page is a collection of notes. Up-to-date information on this activity is reported in the CREAM pilot page.

The points mentioned in the LCGCEtoCREAMCETransition plan require a performance comparison between CREAM and the lcg-CE. In general, CREAM performance must be >= lcg-CE performance, with a few well-defined metric goals. In this document we clarify the performance/reliability tests that will be done within the CREAM Pilot activity (PpsPilotCream) in order to satisfy some of the points mentioned in the transition plan.

In order to satisfy point 'A' of the LCGCEtoCREAMCETransition plan, an identical set of tests must be performed on both CREAM and the lcg-CE, and the results must be compared. At the same time, by choosing some factors (*) (controlled variables), we can study the effect of these factors on CREAM performance.
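The comparison described above can be sketched as follows. This is only an illustration: the metric names, the choice of which metrics count as "higher is better", and the sample numbers are assumptions made for the example, not measured results or an agreed procedure.

```python
# Assumed orientation of each metric: True means a larger value is better.
# The metric names are illustrative, not defined by the transition plan.
HIGHER_IS_BETTER = {
    "pct_done": True,
    "pct_aborted": False,
    "submission_rate": True,
}


def compare(cream: dict, lcg_ce: dict) -> dict:
    """For each metric, report whether CREAM is at least as good as the lcg-CE."""
    result = {}
    for metric, higher in HIGHER_IS_BETTER.items():
        c, l = cream[metric], lcg_ce[metric]
        result[metric] = (c >= l) if higher else (c <= l)
    return result


# Made-up numbers, used only to show the call.
cream_run = {"pct_done": 99.5, "pct_aborted": 0.3, "submission_rate": 40.0}
lcg_ce_run = {"pct_done": 99.0, "pct_aborted": 0.5, "submission_rate": 45.0}
print(compare(cream_run, lcg_ce_run))
```

Both runs must come from the same test set for the comparison to be meaningful.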

Transition plan points to be verified

In this section we list the points of the transition plan that we plan to verify.

A. The CREAM CE should provide at least equivalent functionality and performance as the LCG CE (excluding the ability for users to fork processes directly on the CE)

D.i. The ICE / CREAM job submission chain should be able to meet all performance criteria and otherwise perform at least as well as the WMS / LCG CE submission chain.

J. At least 5000 simultaneous jobs per CE node

K. Unlimited number of user/role/submission node combinations from many VO's (at least 50), up to the limit of the number of jobs supported on a CE node

L. Job failure rates in normal operations due to the CE < 0.1%

M. Job failures due to restart of CE services or reboot < 0.1%

O. Graceful failure or self-limiting behavior when the CE load reaches its maximum (e.g. if a CE node can support only 5000 jobs it must not crash or become unresponsive with more than that)
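
The numeric thresholds in points J, L and M lend themselves to an automatic check at the end of a test run. The sketch below assumes a hypothetical RunSummary record whose field names are chosen for this example. Only the thresholds (5000 jobs, 0.1%) come from the transition plan.

```python
from dataclasses import dataclass

# Thresholds taken from points J, L and M of the transition plan.
MIN_SIMULTANEOUS_JOBS = 5000
MAX_FAILURE_FRACTION = 0.001  # 0.1%


@dataclass
class RunSummary:
    """Hypothetical summary of one test run; field names are illustrative."""
    peak_simultaneous_jobs: int
    total_jobs: int
    failed_due_to_ce: int
    failed_due_to_restart: int


def check_criteria(run: RunSummary) -> dict:
    """Return a pass/fail flag for points J, L and M."""
    if run.total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    return {
        "J": run.peak_simultaneous_jobs >= MIN_SIMULTANEOUS_JOBS,
        "L": run.failed_due_to_ce / run.total_jobs < MAX_FAILURE_FRACTION,
        "M": run.failed_due_to_restart / run.total_jobs < MAX_FAILURE_FRACTION,
    }
```

Note that verifying a failure rate below 0.1% needs a run of several thousand jobs at least, since a single failure in 1000 jobs already reaches the limit.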

Parameters (variables) list

  1. LRMS (Batch system): PBS-Torque/Maui, LSF, Condor (work in progress), SGE (work in progress, IFAE).
  2. Submission method: WMS, Direct, Condor-G (work in progress).
  3. Hardware specification
  4. Job type: collection, simple, parametric (this should not affect CE performance, since jobs are delivered to the CE sequentially; should we check this?)
  5. Submission rate/pattern: constant (define rate), in bulk (define periodicity and rate).
  6. Job length: a few minutes to a few hours
  7. Number of users submitting jobs: at least 50 user/role/VO combinations must be used
  8. Proxy renewal: yes, no.
  9. Delegation: explicit, automatic.
  10. DB, MySQL: on the CREAM host, separated.
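
The two submission patterns in item 5 (constant and in bulk) can be expressed as lists of submission times. The sketch below is one possible way to generate them. The function names, the choice of minutes as the unit, and the parameter values in the example are assumptions, since the actual rates and periodicity are still to be defined.

```python
import math


def constant_schedule(rate_per_min: float, duration_min: float) -> list:
    """Submission times (in minutes) for jobs sent at a constant rate."""
    if rate_per_min <= 0 or duration_min <= 0:
        raise ValueError("rate and duration must be positive")
    n_jobs = int(rate_per_min * duration_min)
    return [i / rate_per_min for i in range(n_jobs)]


def bulk_schedule(burst_size: int, period_min: float, duration_min: float) -> list:
    """Submission times (in minutes) for bursts of jobs sent every period_min."""
    if burst_size <= 0 or period_min <= 0 or duration_min <= 0:
        raise ValueError("all arguments must be positive")
    n_bursts = math.ceil(duration_min / period_min)
    times = []
    for k in range(n_bursts):
        # Every job of a burst is submitted at the same nominal time.
        times.extend([k * period_min] * burst_size)
    return times


# Example values are placeholders, not agreed test parameters.
print(len(constant_schedule(2, 3)), len(bulk_schedule(3, 20, 60)))
```

A test driver would then sleep until each time in the list and submit one job, which keeps the pattern definition separate from the submission method (WMS or direct).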

Factors (controlled variables) and levels (variable values)

Factors define the parameters that will be varied during the tests and the levels (values) that they will assume. For each factor we can study its effects on the measured metrics.

  1. LRMS: PBS, LSF
  2. Submission method: WMS, Direct
  3. Submission rate: to be defined; 5000 simultaneous jobs must be handled.
  4. Delegation: Explicit, Automatic.
  5. Proxy renewal: Yes, No.
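
If every combination of levels is to be tested (a full factorial design, in the terminology of the reference cited at the end of this page), the list of test configurations can be generated mechanically. The sketch below covers only the four factors whose levels are already defined; the submission rate is left out because its levels are not yet fixed. The dictionary keys are illustrative names.

```python
from itertools import product

# Factors with defined levels, as listed in this section.
# Submission rate is omitted: its levels are still to be defined.
FACTORS = {
    "lrms": ["PBS", "LSF"],
    "submission_method": ["WMS", "Direct"],
    "delegation": ["Explicit", "Automatic"],
    "proxy_renewal": ["Yes", "No"],
}


def full_factorial(factors: dict) -> list:
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


configs = full_factorial(FACTORS)
print(len(configs))  # 2 * 2 * 2 * 2 = 16 configurations
```

Each additional two-level factor doubles the number of runs, so a fractional design may be worth considering if the full set proves too expensive.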

To be completed

Metrics

In this section we list the so-called response variables, that is, the metrics that will be measured throughout the tests in order to quantify performance.

  1. Memory footprint on the CREAM host.
  2. CPU utilization on the CREAM host.
  3. % Jobs Done
  4. % Jobs Aborted
  5. % Jobs not finished
  6. % Re-submission
  7. Submission rate (that is the rate of submission from the CE to the LRMS)
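
The job-based metrics (items 3 to 6) can be derived from a list of final job records. The sketch below assumes each record is a (final state, resubmission count) pair, that the state strings are "Done" and "Aborted", and that any other state counts as not finished. It also interprets "% Re-submission" as the share of jobs resubmitted at least once. All of these are assumptions for the example.

```python
def job_percentages(jobs: list) -> dict:
    """Compute percentage metrics from (final_state, resubmission_count) pairs."""
    total = len(jobs)
    if total == 0:
        raise ValueError("no jobs to summarize")
    done = sum(1 for state, _ in jobs if state == "Done")
    aborted = sum(1 for state, _ in jobs if state == "Aborted")
    # Assumption: a job counts as resubmitted if it was resubmitted at least once.
    resubmitted = sum(1 for _, count in jobs if count > 0)

    def pct(n: int) -> float:
        return 100.0 * n / total

    return {
        "pct_done": pct(done),
        "pct_aborted": pct(aborted),
        "pct_not_finished": pct(total - done - aborted),
        "pct_resubmitted": pct(resubmitted),
    }


# Made-up sample: 7 done, 2 aborted after one resubmission, 1 still running.
sample = [("Done", 0)] * 7 + [("Aborted", 1)] * 2 + [("Running", 0)]
print(job_percentages(sample))
```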

To be completed


(*) The terminology used and some good methods are presented in R. Jain, "The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling" (Wiley-Interscience, New York, NY, April 1991).

-- GianniPucciani - 30 Mar 2009

Topic revision: r6 - 2009-04-02 - GianniPucciani
 