The block latency analysis

Meetings

For our first meeting (Wednesday 26th November, 10:00am) we will use vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=AGJFEAldmWkm. The goals for this meeting are to organise people, data and process.

Timeline

This analysis needs to be done in time for CHEP 2015, which means we need final plots by about February, not much later than that. Time is not on our side!

Eventually, we want to be able to monitor the latency of transfers continuously in a way that makes sense. How we do that will depend partly on what we learn from this analysis.

Background

Since roughly PhEDEx version 4, in 2012, PhEDEx has stored information on the latency of block transfers and file transfers. Historical information is kept in the t_log_block_latency and t_log_file_latency tables. Their definitions are:
create table t_log_block_latency
  (time_update          float           not null,
   destination          integer         not null,
   block                integer                 , -- block id, can be null if block removed
   files                integer         not null, -- number of files
   bytes                integer         not null, -- block size in bytes
   priority             integer         not null, -- t_dps_block_dest priority
   is_custodial         char (1)        not null, -- t_dps_block_dest custodial
   time_subscription    float           not null, -- time block was subscribed
   block_create         float           not null, -- time the block was created
   block_close          float           not null, -- time the block was closed
   first_request        float                   , -- time block was first routed (t_xfer_request appeared)
   first_replica        float                   , -- time the first file was replicated
   percent25_replica    float                   , -- time the 25th-percentile file was replicated
   percent50_replica    float                   , -- time the 50th-percentile file was replicated
   percent75_replica    float                   , -- time the 75th-percentile file was replicated
   percent95_replica    float                   , -- time the 95th-percentile file was replicated
   last_replica         float           not null, -- time the last file was replicated
   primary_from_node    integer                 , -- id of the node from which most of the files were transferred
   primary_from_files   integer                 , -- number of files transferred from primary_from_node
   total_xfer_attempts  integer                 , -- total number of transfer attempts for all files in the block
   total_suspend_time   float                   , -- seconds the block was suspended since the start of the transfer
   latency              float           not null  -- final latency for this block
);

create table t_log_file_latency
  (time_subscription    float           not null,
   time_update          float           not null,
   destination          integer         not null, -- destination node id
   fileid               integer                 , -- file id, can be NULL for invalidated files
   inblock              integer         not null, -- block id
   filesize             integer         not null, -- file size in bytes
   priority             integer                 , -- task priority
   is_custodial         char (1)                , -- task custodiality
   time_request         float                   , -- timestamp of the first time the file was activated for transfer by FileRouter
   original_from_node   integer                 , -- node id of the source node for the first valid transfer path created by FileRouter
   from_node            integer                 , -- node id of the source node for the successful transfer task (can differ from above in case of rerouting)
   time_route           float                   , -- timestamp of the first time that a valid transfer path was created by FileRouter
   time_assign          float                   , -- timestamp of the first time that a transfer task was created by FileIssue
   time_export          float                   , -- timestamp of the first time the file was exported for transfer (staged at source Buffer, or same as assigned time for T2s)
   attempts             integer                 , -- number of transfer attempts
   time_first_attempt   float                   , -- timestamp of the first transfer attempt
   time_on_buffer       float                   , -- timestamp of the successful WAN transfer attempt (to Buffer for T1 nodes)
   time_at_destination  float                   , -- timestamp of arrival on destination node (same as before for T2 nodes, or migration time for T1s)
);

The block latency information is kept without pruning, but the file latency information is only kept for about 3-4 months (I'm not sure of the exact time), because it would grow too quickly. For this analysis we only need the block information; I don't think we will want to look at the file information. In any case, we can't: we haven't been harvesting it, and anything older than the last few months is lost.

The key parameters in the block latency log table are the first_replica, percentX_replica, and last_replica fields. These record the time at which the first file was transferred, the times at which X percent of the block's files (by file count) had been transferred, and the time at which the last file was transferred.
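To make the meaning of these fields concrete, the time spent in each segment of a block transfer can be read straight off one row of the table. This is a minimal Python sketch; the dict-of-timestamps input and the numbers in it are invented for illustration, not taken from real data:

```python
# Sketch: deriving per-segment transfer durations from one row of the
# block-latency log. Field names match t_log_block_latency; the example
# timestamps are made up.

def segment_durations(row):
    """Return the time (seconds) spent in each transfer segment:
    first->25%, 25%->50%, 50%->75%, 75%->95%, 95%->last."""
    marks = [row["first_replica"],
             row["percent25_replica"],
             row["percent50_replica"],
             row["percent75_replica"],
             row["percent95_replica"],
             row["last_replica"]]
    return [b - a for a, b in zip(marks, marks[1:])]

# Hypothetical block whose transfers slow down towards the end:
row = {"first_replica": 0.0, "percent25_replica": 100.0,
       "percent50_replica": 200.0, "percent75_replica": 300.0,
       "percent95_replica": 380.0, "last_replica": 1000.0}
print(segment_durations(row))  # [100.0, 100.0, 100.0, 80.0, 620.0]
```

A block with a long tail, like the one above, is exactly the kind of thing the skew variables defined below are meant to flag.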

Q for Nicolo: Are these values adjusted correctly for blocks that are filled while transferring? I think so, but am not sure.

Code, organisation...

I have a github repository, git@github.com:TonyWildish/PhEDEx-latency.git, with some initial code that I wrote a while ago to explore this data. There's nothing complete there, but it does provide a starting point, and we can use it to host the analysis code. Please send me your github account name if you want to be able to write to it directly, or just make pull requests when you have something to add and I will merge them.

The github repository contains a bin and a data directory, with README files. The short version is that the bin directory has a script for extracting the data from PhEDEx (you will need a PhEDEx installation for the Perl modules, and a DBParam with read access to the production database). For convenience, I've extracted a set of CSV files and stored them in the data directory. This may be enough for the analysis to work with, or we may need to correlate with other variables later on, in which case the extraction will need to be enhanced.

There is also a directory 'Tony' for my preliminary analysis code. There's a README there, but basically it reads the CSV files and produces a bunch of R data-frames and a few initial plots to explore the data. The code all runs, but isn't that well documented. Note that it takes quite a bit of memory; you may have trouble running it on an older laptop.

For now, I suggest we each make our own directory in the repository, rather than trying to share code directly. Once we are established we can revisit how to organise ourselves.

I personally don't care what language is used for the analysis. I prefer R; others will definitely prefer Python. I say we each use whatever we like best; we can converge or convert later if we need to.

Starting the analysis

Cleaning the data

The block latency data contains information on every block transferred since we made the schema changes. This includes all the single-file blocks, all the blocks with small files, and all the other odd things that we get in PhEDEx. Someone will have to spend some effort cleaning the data to extract a meaningful sub-set for analysis.

There will probably also be blocks that were growing while the transfer was taking place, in which case we may need to correct for them, exclude them, or treat them differently somehow. To spot that, we will have to look at the t_dps_file table for the creation time of files in blocks and see if they fall in the window between first_replica and last_replica.
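As a starting point, the first cleaning cut could be a simple filter on file count and mean file size. The sketch below is only a suggestion: it assumes the extracted CSV exposes "block", "files" and "bytes" columns under those names (the real column names in the data directory may differ), and the thresholds are placeholders to be tuned once we look at the distributions:

```python
# Sketch of the first cleaning cut on the extracted block-latency CSVs.
# Column names and thresholds are assumptions, to be checked against
# the actual files in the data directory.

import csv
import io

MIN_FILES = 5                 # drop single-file and tiny blocks
MIN_MEAN_FILE_SIZE = 1 << 30  # e.g. 1 GB average file size

def clean_blocks(rows):
    """Keep only blocks with a reasonable file count and mean file size."""
    kept = []
    for row in rows:
        files = int(row["files"])
        size = int(row["bytes"])
        if files >= MIN_FILES and size / files >= MIN_MEAN_FILE_SIZE:
            kept.append(row)
    return kept

# Tiny inline example standing in for one of the extracted CSV files:
sample = io.StringIO(
    "block,files,bytes\n"
    "1,1,2000000000\n"      # single-file block: rejected
    "2,10,50000000000\n"    # 10 files of ~5 GB each: kept
    "3,100,1000000000\n"    # 100 tiny files: rejected
)
print([r["block"] for r in clean_blocks(csv.DictReader(sample))])  # prints ['2']
```

The growing-block check against t_dps_file would then be a second pass over the surviving blocks.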

Some analysis variables

I had the idea of defining a skew for a dataset. Define the variable skew_X for a block as:
 skew_X = (time spent transferring the last 5 percent of the files) / (time spent transferring the first X percent of the files) * (X/5)

If transfers happen at a constant rate, the skew should be one for all values of X. If the skew is much greater than one, then the last 5 percent took much longer than they should compared to the first X percent. If the skew is much lower than one then the first X percent took much longer than it should, compared to the last 5 percent of the files.

Given the values recorded in the table, we can calculate skew_25, skew_50, skew_75 and skew_95. Depending on what the source of latency turns out to be, one or more of these skews may be relevant, but I expect the skew_75 and skew_95 variables to show the most promise.

We could define other skews, based not on the last 5 percent but on the last 25 percent etc, but that might be less useful.
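For reference, here is a sketch of the skew calculation in Python. It assumes the "first X percent" window is measured from the start of the transfer (taken here to be first_replica) to percentX_replica, and the "last 5 percent" window from percent95_replica to last_replica; whether that is the right choice of windows is part of what the preliminary analysis should settle. The constant-rate example block is invented:

```python
# Sketch of the skew_X variable defined above. Window boundaries
# (first_replica as the start, percent95_replica -> last_replica as the
# tail) are assumptions, not a settled definition.

def skew(row, x):
    """skew_X = (last-5% time / first-X% time) * (X / 5)."""
    pct_field = "percent%d_replica" % x
    tail = row["last_replica"] - row["percent95_replica"]
    head = row[pct_field] - row["first_replica"]
    if head <= 0:
        return None  # degenerate block; exclude from the analysis
    return (tail / head) * (x / 5.0)

# Constant-rate block: every 1 percent of the files takes 10 seconds.
uniform = {"first_replica": 0.0, "percent25_replica": 250.0,
           "percent50_replica": 500.0, "percent75_replica": 750.0,
           "percent95_replica": 950.0, "last_replica": 1000.0}
for x in (25, 50, 75, 95):
    print(x, skew(uniform, x))  # skew is 1.0 for every X
```

For the segment_durations-style block with a long tail, the same function would give skews well above one, which is the signature we are looking for.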

Plan of attack

Here's a proposal for how to proceed:
  • (send me your github names if you want to write code!)
  • Someone (who?) needs to do a preliminary analysis to select a subsample of blocks for the analysis. These should have a reasonable number of files, each of a reasonable size, whatever that turns out to be.
  • We need to cross-check that the blocks we select are clean, in that they weren't being added to while they were transferring.
  • For blocks that were growing while being transferred, we need to see if there are lots or not, and figure out how to deal with them (ignore them or not?)
  • Once we have a clean subset we can figure out what the next steps are

The difficult question is: how do we break down the work to achieve this?

Topic revision: r2 - 2014-11-26 - TonyWildish