The whole model (all diagrams) is currently stored in my private svn here.
The current database schema is here.
The new db schema:
An FTS request is defined as:
- list of LFNs,
- targetSE (could be a list of targetSEs),
- timeout condition (deadline or interval of validity),
- catalogue for replica registration.
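The four items above could be held in one small container. This is only a sketch: the class name `FTSRequestSpec`, its field names and the `with_validity` helper are made up for illustration and are not part of any existing DIRAC API. The timeout is stored as an absolute deadline, and an interval of validity is converted to a deadline at creation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class FTSRequestSpec:
    """Hypothetical container mirroring the four items of an FTS request."""
    lfns: List[str]                  # list of LFNs
    target_ses: List[str]            # one targetSE or a list of targetSEs
    deadline: datetime               # timeout condition kept as an absolute deadline
    catalogue: Optional[str] = None  # catalogue for replica registration

    @classmethod
    def with_validity(cls, lfns, target_ses, validity, catalogue=None, now=None):
        """Build a request from an interval of validity instead of a deadline."""
        if isinstance(target_ses, str):
            target_ses = [target_ses]  # accept a single targetSE as well
        now = now or datetime.now()
        return cls(list(lfns), list(target_ses), now + validity, catalogue)

    def is_timed_out(self, now=None):
        """True once the deadline has been reached."""
        return (now or datetime.now()) >= self.deadline
```

For example, `FTSRequestSpec.with_validity(["/some/lfn"], "SOME-SE", timedelta(hours=1))` gives a request that times out one hour after creation (the LFN and SE name are placeholders).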
The average number of files in one FTS request is about 10:
mysql> select avg(c),stddev(c) from ( select count(FTSReqID) as c from FileToFTS group by FTSReqID ) a ;
+--------+-----------+
| avg(c) | stddev(c) |
+--------+-----------+
| 9.5519 |   21.2040 |
+--------+-----------+
but the real number varies from 1 to more than 100.
The db is probably not complete.
- What about keeping all information about particular file (or FTS transfer) history in one table?
- All information could be kept in 4-5 tables instead of 10 (not including Requests and SubRequests, which are going to a separate db anyway).
ver 1 (lost in space)
ver 2
ver 3
ver 4
Removed status New, setting Scheduling as default.
Missing table for transfers.
ver 5
TODO: add OwnerDN and GroupDN into Transfer.
Classes and inheritance
For brevity only public methods are shown.
What about introducing an object made of FTSRequest and related LFNS, PFNS, Channel and ReplicationTree records (a la ORM)?
- db interface would be light
- during one transaction (session) records would be locked (thread safe?), so it will be a real state machine
- full encapsulation
- on the other hand, the FTSRequest object would be heavy, but it would be used only internally (inner class) in FileTransferDB
There is also the possibility of introducing a cascade of objects, retrieved on demand, e.g.:
- first select FTSRequests with some status, say New (or whatever other selection criteria)
- then, if required by processing, attach to them on demand a list of FTSLFNs and/or a list of FTSPFNs with some selection criteria
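The cascade could look roughly like the sketch below. It uses an in-memory sqlite3 database only to stay self-contained (the real thing would sit on MySQL), and the column names, the status values other than New and the helper names `select_requests` and `make_demo_db` are assumptions made for illustration.

```python
import sqlite3


class FTSRequest:
    """Request record; related LFN records are fetched only when first asked for."""

    def __init__(self, conn, req_id, status):
        self._conn = conn
        self.req_id = req_id
        self.status = status
        self._lfns = None  # nothing attached yet

    def lfns(self, status=None):
        """Attach the LFN list on demand, optionally filtered by status."""
        if self._lfns is None:
            self._lfns = self._conn.execute(
                "SELECT LFN, Status FROM FTSLFNs WHERE FTSReqID = ? ORDER BY LFN",
                (self.req_id,),
            ).fetchall()
        if status is None:
            return list(self._lfns)
        return [row for row in self._lfns if row[1] == status]


def select_requests(conn, status):
    """First step of the cascade: requests only, no related records."""
    rows = conn.execute(
        "SELECT FTSReqID, Status FROM FTSRequest WHERE Status = ? ORDER BY FTSReqID",
        (status,),
    ).fetchall()
    return [FTSRequest(conn, req_id, req_status) for req_id, req_status in rows]


def make_demo_db():
    """Tiny demo schema; the table layout is assumed, not the real one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FTSRequest (FTSReqID INTEGER PRIMARY KEY, Status TEXT);
        CREATE TABLE FTSLFNs (LFN TEXT, FTSReqID INTEGER, Status TEXT);
        INSERT INTO FTSRequest VALUES (1, 'New'), (2, 'Done');
        INSERT INTO FTSLFNs VALUES ('/a', 1, 'Waiting'), ('/b', 1, 'Done'), ('/c', 2, 'Done');
    """)
    return conn
```

Calling `select_requests(conn, "New")` touches only the FTSRequest table; the FTSLFNs query runs the first time `lfns()` is called on a given request and the result is cached on the object afterwards.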
State machines
All processing should be done (exclusively) in a new python module interacting with the db itself.
- this module should inherit from the DIRAC.Core.Utilities.MySQL class
- in the above model agents would rather be used to trigger some actions depending on the FTS state
- state transitions would be done in DB module itself (no excuses! no exceptions!)
A possible way of enforcing integrity is "passing a token" between agents and db: a python object built up from FileTransferDB.FTSRequest, FileTransferDB.FTSFiles and FileTransferDB.ReplicationTree. This could be a client (in the sense of DIRAC) and an ORM in the sense of db.
FileTransferDB.PFNS.Status:
FileTransferDB.FTSRequest.Status:
ver 1
The first version, raw and outdated.
ver 2
Comments
Obviously too complicated:
- put in a new table for spotted errors (i.e. Alarms, name it if you have a better idea) with content:
    table Alarms (
      AlarmID (PK),
      FTSReqID (FK),
      Timestamp (datetime),
      AlarmType (enum 'SchedulingError', 'SubmitError', 'TransferError', 'RegistrationError', 'Fatal', what else?),
      AlarmInfo (varchar, could be a traceback, or some description, reason of raising etc.)
    );
- store only the latest alarm raised in FTSRequest as FK
- remove 'SchedulingError', 'SubmitError', 'TransferError' and 'RegistrationError' from states and add 'Fatal' for some definitive-100%-lethal-and-completely-unrecoverable failure (like some vital service not running etc.).
In that way the state machine would be much easier to implement:
- agents will pick up only requests in a certain state,
- they will change the state of a request only if no errors are spotted during processing,
- on error conditions they will raise an alarm, which will go to the Alarms table, but the state of the request will remain unchanged - just keep trying to redo your job until time out, unless of course there is some "definitive-100%-lethal-and-completely-unrecoverable failure" - whatever it is (we should meet and discuss that some day).
Also please notice that it will be rather easy to reset an FTSRequest on a time out condition: you have to change its state to New and extend its deadline. So if the system is dead for a while we have a simple recovery solution. At least at the R&D stage.
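A minimal sketch of the alarm idea, in plain python without any db. Everything here is illustrative: the `Request` class, the `process` and `reset_on_timeout` helpers and the state name Scheduled in the usage note are assumptions, the `alarms` list stands in for the Alarms table, and skipping Done and Fatal requests on reset is my guess at sensible behaviour, not something decided above.

```python
from datetime import datetime, timedelta


class FatalError(Exception):
    """Stands for the definitive, unrecoverable failure."""


class Request:
    def __init__(self, req_id, state, deadline):
        self.req_id = req_id
        self.state = state
        self.deadline = deadline
        self.last_alarm = None  # only the latest alarm is referenced from the request


alarms = []  # stand-in for the Alarms table


def raise_alarm(request, alarm_type, info, now):
    """Record an alarm and remember it as the latest one for this request."""
    alarm = {"FTSReqID": request.req_id, "Timestamp": now,
             "AlarmType": alarm_type, "AlarmInfo": info}
    alarms.append(alarm)
    request.last_alarm = alarm


def process(request, expected_state, next_state, action, alarm_type, now):
    """Run one agent step; the state changes only when the action succeeds."""
    if request.state != expected_state:
        return False  # agents pick up only requests in their own state
    try:
        action(request)
    except FatalError as error:
        raise_alarm(request, "Fatal", str(error), now)
        request.state = "Fatal"
        return False
    except Exception as error:
        raise_alarm(request, alarm_type, str(error), now)
        return False  # state unchanged, the agent retries on its next cycle
    request.state = next_state
    return True


def reset_on_timeout(request, now, extension):
    """Recovery path: put a timed-out request back to New with a new deadline."""
    # Assumption: finished or fatally failed requests are never reset.
    if now < request.deadline or request.state in ("Done", "Fatal"):
        return False
    request.state = "New"
    request.deadline = now + extension
    return True
```

An agent would call something like `process(req, "New", "Scheduled", schedule, "SchedulingError", now)`, where `schedule` is whatever callable does the real work; on an ordinary exception the request stays in New and only the alarm list grows.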
ver 3
Simplicity is the ultimate form of sophistication.
Comments
No need to keep the New state; of course all new requests undergo scheduling eventually, so we could start from Scheduling, right?
ver 4
Simplicity is the ultimate form of sophistication (again!)
Comments
- what if the file is already there? Or is being transferred in another request? (it could be marked as 'Done' in FTSScheduler.execute; the being-transferred situation seems to be more complicated)
- re-scheduling should come out from schedule, submit or transfer (or from all???)
Token passing
DB structure
- encapsulation: each table in the DB has to have its own class describing it, with a full API for manipulating records
- state machine: each of the above classes should behave like a state machine
- foreign keys: if any exist in a table, they should be retrieved on demand from within the object that has them
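The three points above might be combined as in the following sketch. All names and state values here are hypothetical (only Scheduling as the starting state comes from the notes above), the FTSRequest to Channel foreign key is assumed for the sake of the example, and the `loader` callable stands in for a real db lookup.

```python
class IllegalTransition(Exception):
    """Raised when a record is asked to make a transition its table forbids."""


class Record:
    """Base class: one subclass per table, each with its own transition map."""
    TRANSITIONS = {}  # state -> set of states reachable from it

    def __init__(self, status):
        self.status = status

    def move_to(self, new_status):
        """Change state only along an allowed edge of the state machine."""
        allowed = self.TRANSITIONS.get(self.status, set())
        if new_status not in allowed:
            raise IllegalTransition(f"{self.status} -> {new_status}")
        self.status = new_status


class Channel(Record):
    TRANSITIONS = {"Active": {"Stopped"}, "Stopped": {"Active"}}

    def __init__(self, channel_id, status="Active"):
        super().__init__(status)
        self.channel_id = channel_id


class FTSRequest(Record):
    TRANSITIONS = {
        "Scheduling": {"Submitted", "Fatal"},
        "Submitted": {"Done", "Fatal"},
    }

    def __init__(self, req_id, channel_id, loader, status="Scheduling"):
        super().__init__(status)
        self.req_id = req_id
        self.channel_id = channel_id  # foreign key value only
        self._loader = loader         # callable fetching a Channel by id
        self._channel = None          # related record not retrieved yet

    @property
    def channel(self):
        """Retrieve the Channel record on first access and cache it."""
        if self._channel is None:
            self._channel = self._loader(self.channel_id)
        return self._channel
```

The transition maps live in the table classes, so agents can only ask for a transition and never set the status column themselves; whether a shared base class or separate per-table code is better is still open.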
--
KrzysztofCiba - 01-Apr-2011