Corrupted files
From Castor Operations: during routine checks we have detected files of yours in an inconsistent state in the CASTOR repository at CERN.
The origin could be an aborted transfer (for example a crash or a ctrl-c on the sending side) or genuine data corruption.
The affected files have been renamed with a .badchecksum suffix and migrated to tape as such. The original file names now correspond to zero-size files.
Castor fid CASTOR file details
631440648 -rw-r--r-- 1 lhcbprod z5 56013448 Jan 06 14:43 /castor/cern.ch/grid/lhcb/MC/MC10/DST/00008526/0007/00008526_00078236_5.AllStreams.dst
What to do
Check if the replica is registered in the LFC:
dirac-dms-lfn-replicas /lhcb/MC/MC10/DST/00008526/0007/00008526_00078236_5.AllStreams.dst
2011-01-12 11:31:51 UTC dirac-dms-lfn-replicas.py/DiracAPI INFO: Replica Lookup Time: 0.87 seconds
{'Failed': {},
'Successful': {'/lhcb/MC/MC10/DST/00008526/0007/00008526_00078236_5.AllStreams.dst': {'NIKHEF_MC_M-DST': 'srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/MC/MC10/DST/00008526/0007/00008526_00078236_5.AllStreams.dst'}}}
In this example the only registered replica is the one at NIKHEF_MC_M-DST: the replica on Castor has never been registered in the LFC, possibly because the transfer never completed.
In this case, notify Castor Operations so that they can remove the file.
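The check above can also be scripted. The following is a minimal sketch, not part of DIRAC: it assumes that the LFN is obtained by dropping the /castor/cern.ch/grid prefix from the CASTOR path (inferred from the single example on this page) and that a CASTOR replica can be recognised by /castor/cern.ch/ appearing in its SURL. The function names castor_path_to_lfn and has_castor_replica are hypothetical.

```python
import ast

# Assumption: prefix inferred from the one example on this page.
CASTOR_PREFIX = "/castor/cern.ch/grid"


def castor_path_to_lfn(castor_path):
    """Drop the CASTOR namespace prefix to obtain the LFN."""
    if not castor_path.startswith(CASTOR_PREFIX + "/"):
        raise ValueError("not under %s: %s" % (CASTOR_PREFIX, castor_path))
    return castor_path[len(CASTOR_PREFIX):]


def has_castor_replica(result_text, lfn):
    """Return True if the printed replica dictionary lists a CASTOR SURL for lfn.

    result_text is the dictionary printed by dirac-dms-lfn-replicas,
    without the leading INFO log line.
    """
    result = ast.literal_eval(result_text)
    replicas = result.get("Successful", {}).get(lfn, {})
    # Assumption: a CASTOR replica at CERN has /castor/cern.ch/ in its SURL.
    return any("/castor/cern.ch/" in surl for surl in replicas.values())
```

For the file listed on this page, castor_path_to_lfn returns the LFN used in the lookup, and has_castor_replica returns False for the dictionary shown, which is the case where Castor Operations should be notified.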
Why does this happen?
In principle the use case is that a transfer of the file to CERN was attempted and failed. The request was then put in the failover and the file was copied somewhere else.
But when the transfer failed in the first instance, the corrupted replica was apparently not deleted (to be confirmed).
So the file is left in the state that Massimo discovered. If the DMS code is changed to properly trap the failure, these cases will disappear.
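If this explanation is right, the fix amounts to cleaning up the partial copy before going to failover. The sketch below only illustrates that ordering; it is not the DMS code, and upload, remove_partial and failover are hypothetical callables standing in for whatever the DMS actually uses.

```python
def upload_with_cleanup(upload, remove_partial, failover, lfn):
    """Try the primary upload; on failure, clean up and then fail over.

    All three callables are hypothetical stand-ins, not DIRAC functions.
    """
    try:
        upload(lfn)
        return "uploaded"
    except Exception:
        # Trap the failure and remove whatever was partially written, so that
        # no unregistered, corrupted replica is left at the primary site.
        try:
            remove_partial(lfn)
        except Exception:
            # Cleanup is best effort; the failover must still run.
            pass
        failover(lfn)
        return "failover"
```

The cleanup is deliberately best effort: a failed removal must not block the failover, although it would leave the same kind of leftover file described on this page.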
-- ElisaLanciotti - 12-Jan-2011
Topic revision: r1 - 2011-01-12