Data Popularity of the EOS files

The purpose of this working page is to collect feedback and requests about the popularity service, which looks at the data accessed at CERN inside EOS.

Two major goals should be achieved:

  • validation of the monitoring workflow
    • the monitoring tool must provide the expected answer to known aspects of the EOS system (read-only rates, number of pps test files accessed per hour, etc.)
  • collection of the metrics needed to satisfy the monitoring purposes
    • this implies the formulation of well-defined questions, to be converted into specific data aggregations and DB queries

Brief description of the monitoring workflow

xrootd-popularity-workflow.png

  • Collector of the Xrootd detailed monitoring data
    • Based on the UDP packet listener developed by M. Tadel and described in the talk Tadel-UsCmsXrdMon-Lyon-20111122.pdf
    • xrootd monitor configuration: xrootd.monitor all flush 30s mbuff 1472 window 5s dest files (PLEASE CHECK)
  • Messaging System for Grid (MSG)
    • Publish-subscribe model
    • Reduce the number of services collecting the UDP packets
    • Several consumers can access the MSG Broker
  • MSG Consumer to Oracle DB
    • Collects the UDP packets cached in the MSG broker and uploads them into the DB.
      • NB: only read file information is uploaded
  • Web Frontend to expose query results

More details about the popularity service system already developed for CMS and about its extension to the EOS Data can be found in

Data collected for popularity purposes

Not all the data in the UDP packets are used for popularity purposes.

The data currently received in the UDP packets are sent when a file is closed, and summarize the activity on that file from open time to close time.

UDP packet content

  • unique_id=xrd-1326472302000000
  • file_lfn=/eos/cms/store/data/Run2011A/Cosmics/ALCARECO/TkAlCosmics0T-v4/000/166/065/58C1BF80-6C8E-E011-AB24-001D09F24024.root
  • start_time=1326472302
  • end_time=1326472302
  • read_bytes=0
  • read_operations=0
  • read_min=0
  • read_max=0
  • read_average=0.000000
  • read_sigma=0.000000
  • write_bytes=0
  • write_operations=0
  • write_min=0
  • write_max=0
  • write_average=0.000000
  • write_sigma=0.000000
  • read_bytes_at_close=642348
  • write_bytes_at_close=0
  • user_dn=
  • user_vo=
  • user_role=
  • user_fqan=
  • client_domain=cern.ch
  • client_host=lxbrg1204
  • server_username=
  • server_domain=cern.ch
  • server_host=lxfsre07a02
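
As an illustration of how such a record could be handled downstream, here is a minimal Python sketch that turns the fields listed above into a typed dictionary. It assumes the record arrives as newline-separated key=value text, which is only how it is displayed on this page and not necessarily the wire format; the function name parse_record and the suffix-based type rules are hypothetical.

```python
# Assumption: one "key=value" pair per line, as displayed on this page.
INT_SUFFIXES = ("_time", "_bytes", "_operations", "_min", "_max", "_at_close")
FLOAT_SUFFIXES = ("_average", "_sigma")


def parse_record(text):
    """Parse key=value lines into a dict, converting numeric fields."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip lines that carry no key=value pair
        if value == "":
            record[key] = None  # empty fields such as user_dn=
        elif key.endswith(FLOAT_SUFFIXES):
            record[key] = float(value)
        elif key.endswith(INT_SUFFIXES):
            record[key] = int(value)
        else:
            record[key] = value
    return record
```

Empty fields (user_dn, user_vo, ...) are mapped to None so that later aggregations can tell "not provided" apart from an empty string.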

popdb content

Only the following data are stored in the DB:

  • unique_id=xrd-1326472302000000
  • file_lfn=/eos/cms/store/data/Run2011A/Cosmics/ALCARECO/TkAlCosmics0T-v4/000/166/065/58C1BF80-6C8E-E011-AB24-001D09F24024.root
  • start_time=1326472302
  • end_time=1326472302
  • read_bytes=0
  • read_bytes_at_close=642348
  • write_bytes_at_close=0
  • user_dn=
  • user_vo=
  • client_domain=cern.ch
  • client_host=lxbrg1204
  • server_username=
  • server_domain=cern.ch
  • server_host=lxfsre07a02

fields removed

The following fields are not included in the popularity DB:

  • write_max
  • read_sigma
  • read_average
  • read_bytes
  • user_fqan
  • user_role
  • read_min
  • write_operations
  • write_average
  • write_bytes
  • write_min
  • read_max
  • write_sigma
  • read_operations
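
The reduction from a full UDP record to a popularity DB row can be sketched as a simple projection that drops the removed fields. This is only an illustration in Python, not the actual consumer code; the function name to_popdb_row is hypothetical. Note that read_bytes appears both among the stored fields and in the removed list above; the sketch follows the removed list, which is an assumption.

```python
# Fields listed under "fields removed" on this page.
# Assumption: read_bytes is treated as removed, although it also
# appears in the "popdb content" example.
REMOVED_FIELDS = frozenset({
    "write_max", "read_sigma", "read_average", "read_bytes",
    "user_fqan", "user_role", "read_min", "write_operations",
    "write_average", "write_bytes", "write_min", "read_max",
    "write_sigma", "read_operations",
})


def to_popdb_row(record):
    """Return a copy of the record without the removed fields."""
    return {k: v for k, v in record.items() if k not in REMOVED_FIELDS}
```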

Monitoring page

The entry point for the monitoring page of EOS Atlas is

http://dashboard28.cern.ch/eosatl

The entry point for the monitoring page of EOS CMS is

http://dashboard28.cern.ch/popdb/xrdpopularity/

We will mainly use EOS Atlas for this validation, but it will sometimes be useful to compare with EOS CMS as well.

Since the GUIs for CMS and for our EOS Atlas test are being developed in parallel, the two will not necessarily evolve in the same way. Do not expect to find exactly the same information in both GUIs. I will try to keep them as symmetric as possible.

Since the GUI is still being deployed, it may not be accessible at all times.

Validation

DB content

First of all, I would like to make sure that the information collected in the popularity DB is exhaustive, or find out whether other fields should be added. Please have a look at the removed fields and at the kept fields and comment on them. Keep in mind that currently only the files accessed in read mode are stored in the popularity DB.

Efficiency of the collection workflow

Given that the data collection workflow consists of several steps (UDP packets -> MSG Broker -> DB), with services running at each step to extract and handle the information, I am strongly interested in validating this workflow, to verify that none of the steps is inefficient.

To do that, we can compare the measured rate of read files with the expected one, for the full set of files in EOS and/or for specific files that are systematically read for test purposes.

pps monitor

The number of pps test files accessed per hour in read-only mode by the user dteam001, in the time range defined by [StartDate, EndDate], is shown in http://dashboard28.cern.ch/eosatl/xrdmonplotppstest

It would be interesting to know whether the EOS team can confirm that the numbers shown are as expected. In particular, are the glitches expected and known? Is there any way to check whether these glitches really happened on the EOS side, or whether they are a consequence of the popularity collection chain?
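
A check of this kind can be sketched in Python as follows: bin the recorded accesses per hour and flag the hours in which fewer accesses than expected were recorded. This is only a sketch; the function names and the expected_per_hour parameter are hypothetical, and the real expected rate has to come from the EOS side.

```python
from collections import Counter


def accesses_per_hour(start_times):
    """Count accesses per hour bucket (epoch seconds // 3600)."""
    return Counter(t // 3600 for t in start_times)


def find_glitches(start_times, start, end, expected_per_hour):
    """Return {hour_start_epoch: count} for hours in [start, end)
    with fewer recorded accesses than expected_per_hour."""
    counts = accesses_per_hour(t for t in start_times if start <= t < end)
    glitches = {}
    for hour in range(start // 3600, (end - 1) // 3600 + 1):
        n = counts.get(hour, 0)  # hours with no record at all count as 0
        if n < expected_per_hour:
            glitches[hour * 3600] = n
    return glitches
```

If start and end are not aligned to full hours, the first and last buckets are only partially covered and may be flagged even when nothing was lost. A flagged hour only says that records are missing in the DB; it does not say at which step of the chain they were lost.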

For CMS, two recurring pps accesses are found and shown in this plot:

http://dashboard28.cern.ch/popdb/xrdpopularity/xrdmonplotppstest

Do they correspond to different tests, as it appears? In particular:

  • pps_dteam accesses files: /eos/ppsscratch/test/slstest-eospps/test-eospps.cern.ch-from-srmmon04.cern.ch-xrdcp.static

  • pps_srmmon accesses files: /eos/ppsscratch/test/slstest-eospps/test-srm-eospps.cern.ch-PPSEOSSCRATCHDISK-4f930089-cfe9-4bb7-83b9-4adf2869cf59

Monitoring Requests

Please list here the metrics and aggregations you need in order to extract the information you want:

  • Access patterns - how much of each file has been read?

  • Access patterns - which files were most popular? (idea: automatic suffix removal to get top-10 most common prefixes - may need to develop algorithm).
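
As a starting point for the prefix idea, the following Python sketch removes the suffix of each LFN by truncating it to a fixed directory depth and then counts the resulting prefixes. This is a simplification, not the automatic algorithm mentioned above: the depth parameter and the function name top_prefixes are assumptions made for illustration.

```python
from collections import Counter


def top_prefixes(lfns, depth=5, n=10):
    """Truncate each LFN to its first `depth` path components
    and return the n most common prefixes with their counts."""
    counts = Counter()
    for lfn in lfns:
        parts = lfn.strip("/").split("/")
        prefix = "/" + "/".join(parts[:depth])
        counts[prefix] += 1
    return counts.most_common(n)
```

For example, with depth=4 the file /eos/cms/store/data/Run2011A/... is counted under /eos/cms/store/data. An automatic version would have to choose the depth per branch of the namespace instead of using one fixed value.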

See plots on Phillip Zigann's page at ZigannGeneralRequirementsECC and his initial presentation to IT-DSS group. Example: pie chart on percentage of file read by Phillip Zigann:
Zigann-bytesfromfile.png
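
The "how much of each file has been read" metric behind such a pie chart can be sketched in Python as below. The file size is not part of the record described on this page, so file_size is a hypothetical input that would have to come from another source (for instance the EOS namespace); using read_bytes_at_close as the number of bytes read is also an assumption. The bin edges are arbitrary.

```python
from bisect import bisect_left
from collections import Counter


def fraction_read(read_bytes_at_close, file_size):
    """Fraction of the file that was read; None if the size is unknown."""
    if file_size <= 0:
        return None
    return read_bytes_at_close / file_size


def bin_fractions(fractions, edges=(0.25, 0.5, 0.75, 1.0)):
    """Count fractions per bin: bin i holds values <= edges[i] (and above
    the previous edge); the last bin holds values above edges[-1]."""
    counts = Counter()
    for f in fractions:
        if f is not None:
            counts[bisect_left(edges, f)] += 1
    return [counts.get(i, 0) for i in range(len(edges) + 1)]
```

The fraction can exceed 1 when parts of a file are read more than once, which is why the last bin is kept separate rather than clipped.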

  • File access count (or used bandwidth) grouped by client network: use DNS to group by DNS domain, and group non-resolvable hosts into subnets (class B or below). Ideally the groups should be auto-detected (i.e. no preconfigured list; just compute the top 10 automatically).
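
A possible grouping by client network is sketched below in Python. It uses the client_domain field when it is filled and otherwise falls back to a /16 subnet, assuming that client_host then contains an IPv4 address; whether the records really carry an address for non-resolvable hosts is an assumption, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter


def client_group(client_host, client_domain):
    """Return the DNS domain if known, else the /16 subnet of the host."""
    if client_domain:
        return client_domain
    try:
        addr = ipaddress.IPv4Address(client_host)
    except ValueError:  # not an IPv4 address (or missing)
        return "unknown"
    return str(ipaddress.ip_network(f"{addr}/16", strict=False))


def top_client_groups(records, n=10):
    """Count accesses per client group and return the n largest groups."""
    counts = Counter(
        client_group(r.get("client_host"), r.get("client_domain"))
        for r in records
    )
    return counts.most_common(n)
```

No list of groups is configured in advance: the groups are whatever domains and subnets show up in the data, and only the largest ones are returned.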

-- DomenicoGiordano - 02-Mar-2012

-- SpigaDaniele - 09-Jun-2011

Topic revision: r6 - 2012-04-23 - JanIven