Castor Monitoring

The purpose of this page is to collect all requirements for additional monitoring in CASTOR, including the GridFTP, FTS and SRM areas.

All monitoring information should be sampled at short time intervals, the visualization should be refreshed regularly (at most every 2 minutes), and all information should be stored indefinitely so that trends can be computed.
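As an illustration of why the samples need to be kept, the following minimal sketch computes a linear trend over stored samples. The function name `linear_trend` and the `(timestamp, value)` sample layout are assumptions made for this example; they are not part of lemon or CASTOR.

```python
from statistics import mean


def linear_trend(samples):
    """Return the least-squares slope (value units per second).

    `samples` is a list of (timestamp_seconds, value) pairs; this layout
    is an assumption of the sketch, not an existing lemon format.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a trend")
    t_mean = mean(t for t, _ in samples)
    v_mean = mean(v for _, v in samples)
    num = sum((t - t_mean) * (v - v_mean) for t, v in samples)
    den = sum((t - t_mean) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples have the same timestamp")
    return num / den


# Usage: disk pool fill level (GB) sampled once per minute.
history = [(0, 10.0), (60, 16.0), (120, 22.0)]
slope_gb_per_s = linear_trend(history)
```

A positive slope on a pool's fill level, for example, gives a rough estimate of how quickly the pool is filling up.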

One of the crucial units is the disk pool. The first questions below relate to a given disk pool.

1. what is the total input and output data rate to/from a pool ? --> this is already implemented in lemon

2. what is the input and output data rate to/from tape servers ?

3. what is the input and output data rate to/from the client nodes (e.g. lxbatch) ?

4. what is the input and output data rate to/from external sites ?

5. how are the rates split over the different protocols (rootd, rfiod, GridFTP?, xrootd) ?

6. how many files of size X have been read/written during the last Y time interval ?

7. what percentage of files (by number and by total size) have not been read during time interval X ?

8. number of files staged into a pool more than once in time interval X ?

9. age distribution of files in time interval X ?

10. transfer performance histograms for the different protocols per time interval X ?

11. histograms of the achieved compression ratios on tape per time interval X ?

12. which files are currently on disk in a pool ?

13. LSF job throughput numbers: how many jobs have started/finished during a time interval ? per pool queue and with user details, as in the batch monitoring

14. 2d view of nodes and throughput over time (to identify non-working nodes and judge the load balancing)

15. monitoring presentation of the GB switches. CS enabled this 9 months ago already, but there are no lemon plots yet. This is essential for congestion evaluations

16. throughput of SRM requests by type

17. backlog of SRM requests
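Several of the questions above (rates split by protocol, files of size X read or written in interval Y) can be answered from per-transfer records. The following is only a sketch under that assumption: the `Transfer` record, its field names, and both helper functions are hypothetical and do not describe an existing CASTOR or lemon schema.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical per-transfer record (not an existing CASTOR schema)."""
    pool: str
    protocol: str    # e.g. "rfiod", "rootd", "xrootd"
    direction: str   # "in" or "out", seen from the pool
    nbytes: int
    end_time: float  # seconds since epoch


def rates_by_protocol(transfers, pool, start, end):
    """Average bytes/s per (protocol, direction) for one pool in [start, end)."""
    if end <= start:
        raise ValueError("end must be later than start")
    totals = defaultdict(int)
    for t in transfers:
        if t.pool == pool and start <= t.end_time < end:
            totals[(t.protocol, t.direction)] += t.nbytes
    window = end - start
    return {key: total / window for key, total in totals.items()}


def size_histogram(transfers, pool, start, end, edges):
    """Count transfers per size bucket for one pool in [start, end).

    `edges` is an ascending list of byte boundaries; bucket i covers
    [edges[i-1], edges[i]), the first bucket starts at 0 and the last
    bucket is open-ended.
    """
    counts = [0] * (len(edges) + 1)
    for t in transfers:
        if t.pool == pool and start <= t.end_time < end:
            counts[bisect.bisect_right(edges, t.nbytes)] += 1
    return counts


# Usage with made-up records for a hypothetical pool "poolA".
records = [
    Transfer("poolA", "rfiod", "in", 1000, 10.0),
    Transfer("poolA", "rfiod", "in", 3000, 20.0),
    Transfer("poolA", "xrootd", "out", 5_000_000, 50.0),
    Transfer("poolB", "rfiod", "in", 999, 30.0),
]
rates = rates_by_protocol(records, "poolA", 0.0, 100.0)
hist = size_histogram(records, "poolA", 0.0, 100.0, [2000, 1_000_000])
```

Keeping one record per transfer, rather than pre-aggregated counters, allows the same data to answer the per-protocol, per-size and per-interval questions without separate sensors for each.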

The Castor operations team currently runs a "parallel" lemon facility where some of the questions are answered and more details are presented: http://castoradm1/new/ http://castoradm1/new/c2svc_class.php?cluster=c2atlast0%25t0perm&detailed=Get%20Detailed20Information&auto_update=Not%20auto%20update&time=0

This needs to be integrated into the official lemon system. The iptables monitoring has started, but is still only local to the disk servers and needs to be accessible in lemon.

A very important view for debugging and load distribution analysis is the state of all servers in a collection over time in a 2d plot, which does not yet exist.
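The data behind such a 2d view could be prepared roughly as follows: one row per server, one column per time bin, each cell holding the mean throughput in that bin. This is a minimal sketch; the function `throughput_grid`, the sample layout, and the node names are assumptions for illustration, and the plotting itself is left out.

```python
def throughput_grid(samples, known_nodes, t0, bin_s, nbins):
    """Build a node x time-bin grid of mean throughput.

    `samples` is an iterable of (node, timestamp_s, bytes_per_s) tuples
    (an assumed layout). `known_nodes` lists every server in the
    collection, so that servers that reported nothing still get a row.
    A cell is None when the node has no sample in that bin.
    """
    sums = {n: [0.0] * nbins for n in known_nodes}
    counts = {n: [0] * nbins for n in known_nodes}
    for node, ts, rate in samples:
        j = int((ts - t0) // bin_s)
        if node in sums and 0 <= j < nbins:
            sums[node][j] += rate
            counts[node][j] += 1
    nodes = sorted(known_nodes)
    grid = [
        [sums[n][j] / counts[n][j] if counts[n][j] else None
         for j in range(nbins)]
        for n in nodes
    ]
    return nodes, grid


# Usage with hypothetical node names and two 60-second bins.
samples = [("n1", 0, 10.0), ("n1", 30, 20.0), ("n2", 70, 5.0)]
nodes, grid = throughput_grid(samples, ["n1", "n2", "n3"], 0, 60, 2)
```

Rows that are entirely empty point to servers that are not reporting, and uneven values along a column indicate poor load balancing at that time.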

-- BerndPanzersteindel - 03 Apr 2007

Topic revision: r4 - 2007-05-30 - unknown
 