Recovery of files lost on NIKHEF

We received a list (atlas_md5sums, 56.2 MB) of the lost files, which were damaged by malfunctioning hardware on matrix.sara.nl. The list also included each file's calculated md5 checksum. The original file was first checked for unexpected input with the check_weird_lines script.
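A minimal sketch of this kind of sanity check, assuming each line of atlas_md5sums has the form "&lt;md5&gt; &lt;filename&gt;" (the real format, and what check_weird_lines actually flags, is not shown here):

```python
import re

# Expected shape of a valid line: 32 lowercase hex digits, whitespace, a filename.
MD5_LINE = re.compile(r"^[0-9a-f]{32}\s+\S+$")

def find_weird_lines(path):
    """Return (line_number, line) pairs that don't match the
    expected '<md5> <filename>' format."""
    weird = []
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not MD5_LINE.match(line.strip()):
                weird.append((n, line.rstrip("\n")))
    return weird
```

Lines that fail the pattern (truncated checksums, stray headers, filenames with embedded whitespace) are reported with their line numbers so they can be fixed or stripped before the list is used further.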

The cleaned input file was then fed to a slightly modified version of David Cameron's checkforfiles script. Its output needs to be captured (e.g. with the nohup command), and it produces two files: a "results" file containing the files that were not found, and the actual output listing the files that were found in some LFC catalogue. These results were parsed with two scripts: parse_output, which splits the found files into separate lists according to the LFC in which each was found, and parse_results, which strips the checkforfiles "results" file down to just the LFNs of the files that were not found (to keep better control over the file counts).
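The grouping step done by parse_output could be sketched as follows. The line format assumed here ("&lt;lfn&gt; found in &lt;lfc-host&gt;") is hypothetical; the real checkforfiles output may look different:

```python
from collections import defaultdict

def split_found_by_lfc(lines):
    """Group found LFNs by the LFC catalogue they were located in.
    Assumes each line looks like '<lfn> found in <lfc-host>'."""
    by_lfc = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) >= 4 and parts[1:3] == ["found", "in"]:
            lfn, lfc = parts[0], parts[3]
            by_lfc[lfc].append(lfn)
    return dict(by_lfc)
```

Each value of the returned dict can then be written to its own per-LFC file for the later md5 checks.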

Then I merged each file created by parse_output with the original file list using the LFN_from_filename_with_md5 script, which attaches the md5 sum to each file in the list (the toNativeLFN script is needed to use it).

Then I ran my md5check script against each LFC separately, nine times in my case. Its input is a list of files, one file name with its md5 per line, plus the LFC catalogue to search through. Its output is four lists: files whose md5 in the given LFC matches the input, files whose md5 differs, files whose md5 is missing from the LFC entirely, and files that could not be found in the LFC at all (a sign that something went wrong). I then merged the results from all the LFCs with the cat command and ran the parse_md5check_results script, which checks for duplicates and for files that have different md5 sums stored in different catalogues.
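The cross-catalogue collision check at the end of this step could look roughly like this (a sketch, not the actual parse_md5check_results script; the per-LFC results are assumed to be available as lfc-host → {lfn: md5} mappings):

```python
def find_md5_collisions(per_lfc_results):
    """Across the per-LFC results, flag files whose stored md5
    differs between catalogues.  per_lfc_results maps
    lfc-host -> {lfn: md5}."""
    seen = {}                       # lfn -> set of md5 sums seen
    for lfc, entries in per_lfc_results.items():
        for lfn, md5 in entries.items():
            seen.setdefault(lfn, set()).add(md5)
    return sorted(lfn for lfn, md5s in seen.items() if len(md5s) > 1)
```

Files appearing in the returned list are the "LFC collision" cases described in the results below: the same LFN registered with two different checksums in two catalogues.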

These are the results:

  • correct files list (22.2 MB) - 99,434 files which were unharmed
  • corrupted files list (9.3 MB) - 34,188 files which were damaged at NIKHEF
  • list of files missing their md5 (1.3 MB) - 10,959 files whose md5 could not be retrieved from any of the LFCs
  • bad input list (3.1 MB) - 32,309 files which were given to us without an md5 and probably were not even found on the NIKHEF disks. I will try to find them elsewhere, just to be sure.
  • not found list (2 KB) - 22 files which were not found in any LFC or LRC even though they were shipped to us with an md5 sum
  • LFC collision list (3 KB) - 19 files which had different md5 sums in two different LFCs. One of the two was always identical to the one we received with the original file list, so the file appeared to be OK, while the other differed. This bug(?) will be reported.
Topic attachments
Attachment | Size | Date | Who | Comment
check_weird_lines.py.txt | 0.4 K | 2007-06-27 | JanKubalec | Checks for particular unexpected lines in the input file
dump_check.py.txt | 1.8 K | 2007-06-27 | JanKubalec | Finds the correct SRM for files in a separate dump file
md5check.py.txt | 2.0 K | 2007-06-27 | JanKubalec | Checks md5's from the file in the LFC
strip_bad_lines.py.txt | 0.3 K | 2007-06-27 | JanKubalec | Strips defined lines from the input file
unregister_files.py.txt | 1.5 K | 2007-06-27 | JanKubalec | Unregisters a given list of files from the current LFC
Topic revision: r18 - 2007-11-08 - JanKubalec