General information about the HTR board (search the "DATA FORMATS" link for more details of the HTR data formats and internal logic).

Dynamic Latency Tuning in HCAL

The CMS HCAL front-end electronics (FEE) send detector data to the counting room over about 3000 fibers, each running at ~1.6 Gbps (using the GOL serializer). The latency of the data links from the FEE to the HTR cards (in the counting room) can change from link to link every time a link is re-established. The root of the problem is mostly the random latency of a commercial component used to receive the link (TLK2501 from TI); it is aggravated by bad settings or status of the TTCrx and QPLL chips, and possibly by temperature variations. The situation is complicated by the fact that the fibers differ in length by a few meters. The problem is reduced but not eliminated by calibrating the TTCrx delay registers in the RBX (front-end) or in the HTR. This latter calibration is a "static latency tuning", a.k.a. "sweet spot".

The main tool for dynamic latency tuning is the ability to send to the FEE and to the HTR a command (we call it QIE_reset; it is derived from BC0) with a fixed timing with respect to the accelerator orbit. This command is used to "time stamp" the transmitted data, so that the data can then be aligned at the receiver side. When the FEE receive a QIE_reset TTC command, they send 69 IDLE words instead of normal DATA words. The HTR cards (where the front-end links are received) detect the transition from IDLE to DATA words and store its arrival time, expressed as a BCN (bunch-crossing number). The resulting quantity is called Idle-BCN and indicates the latency of each link.
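
The detection of the IDLE-to-DATA transition can be illustrated with a short Python sketch. It is not the firmware logic: the function name `find_idle_bcn` and the representation of each received word as a `(bcn, is_idle)` pair are hypothetical, and we assume that the stored BCN is that of the first DATA word after the IDLE string.

```python
from typing import Iterable, Optional, Tuple


def find_idle_bcn(words: Iterable[Tuple[int, bool]]) -> Optional[int]:
    """Return the BCN at which the first DATA word follows an IDLE word.

    Each element of `words` is (bcn, is_idle): the local bunch-crossing
    number at which the word was received, and whether it is an IDLE word.
    Returns None if no IDLE-to-DATA transition is seen.
    """
    previous_was_idle = False
    for bcn, is_idle in words:
        if previous_was_idle and not is_idle:
            return bcn
        previous_was_idle = is_idle
    return None


# Two links receiving the same IDLE string, the second one tick later.
link_a = [(3, False), (4, True), (5, True), (6, False), (7, False)]
link_b = [(4, False), (5, True), (6, True), (7, False), (8, False)]
print(find_idle_bcn(link_a), find_idle_bcn(link_b))  # 6 7
```

In this toy case the two links report Idle-BCN values of 6 and 7, i.e. a latency difference of one tick between them.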

We decided to implement dynamic latency tuning in the HTR firmware (fw). There are two possible schemes:

  • dynamic latency tuning with the Latency Fifo
  • dynamic latency tuning with delay registers

The two schemes are described below. The scheme implemented in the HTR firmware since Autumn 2009 is "dynamic latency tuning with the Latency Fifo"; it has worked successfully since then. The scheme "dynamic latency tuning with delay registers" has been abandoned.

Dynamic Latency Tuning with the Latency Fifo

This scheme relies on a FIFO: either the Synchronization Fifo, which transfers the data from the domain of the TLK2501 recovered clock to the domain of the HTR FPGA system clock, or a new fifo. The scheme was proposed in the following email:

From: Richard Kellogg <richard.kellogg@cern.ch>

To: Tullio Grassi <tullio.grassi@gmail.com>

Cc: Drew Baden <drew@umd.edu>, Edward Laird <edward.laird@cern.ch>, Jose Carlos Da Silva <Jc.Silva@cern.ch>, Harold Nguyen <Harold.Nguyen@cern.ch>

Date: Wed, Apr 29, 2009 at 5:38 PM

Subject: Re: cross-checks on TTCrx tuning

Ciao Tullio,

[...] I have also been thinking about failure modes, including some that you discuss. I have been a bit schematic about the detection of the data/idle/data transitions, and the treatment of the idle frames themselves, but I have always thought of retaining the idles in the data stream, and thought this would happen rather automatically, due to properties of the current fifo.

[...] I propose to use ~8 consecutive idles as the "stop writing" signal, so that by the time the fifo stops accepting data, the first idles will already have appeared on the output side.

After the writing stops, the fifo will gradually empty, and deliver all eight of the accepted idles to the output. After a maximum of 16 frames the fifo will be empty. Now, if the fifo behaves as I think it does, subsequent fifo reads will simply repeatedly read copies of the last idle, ie, the output will look as if our old scheme is still intact.

The next step in the sync cycle is to stop reading the fifo, say at 24 frames after the first idle frame was read. We should continue passing the last "stale" idle frame up the chain, so the data continues to mimic what it would have been in the old scheme. I guess this is sort of a "reading, but not turning the page" mode, rather than a mode of not reading at all, but it is important that the data being repeatedly transferred be stored outside the fifo, since it should remain constant when the fifo writes are reenabled, in the next step.

This continues until the first data frame is received. This triggers the restart of fifo writes, and the first data frame enters the fifo. With subsequent data frames the fifo continues to fill, but the output data is still idles, since the reads have not yet been reenabled.

Then, on the expiration of the QIE_Reset delay, reads are reenabled, and the first data frame after the idles is transferred to the output. Since the fifo is now being read and written at the same rate (although with different clocks) the fifo latency is established at just the value which delivers the first data frame at the time of the delayed QIE_reset, and remains constant (fluctuating by one) until the next idle string.

Voila.

I propose that the fifo is initialized as it is now, with an occupancy of 1 and reading and writing enabled. If an idle string never arrives, it just keeps passing the data like that. (first of all, do no harm). This behavior implies the re-sync operations described above are governed by a state-machine, so that reception of QIE_Reset does nothing unless the reception of the idle string has already initiated the re-sync. But I strongly suspect you were already thinking in terms of a state-machine anyway. Actually, since the reception of delayed QIE_Reset only enables fifo reads, reenabling reads in the case where the reading has not yet been disabled would do no harm. This looks all very simple. [...] If a link is intermittent (and to consider the 99.9% case, is also delivering idle streams) it would re-sync on every QIE_Reset. If the link-drop rate is comparable to this (so that an appreciable fraction of data arrives after a re-link, but before a QIE_Reset) we cannot expect much from this data in any case, so we are not losing much at all.

In the most likely trouble mode, where a CCM QPLL unlocks, we are going to lose latency no matter what we do, but the proposed scheme is no worse than the old. The fifo will shift latency when the recovered clock (ie, unlocked QPLL) shifts by one cycle. If this happens fast enough to either empty or fill the fifo between QIE_Resets, the performance of the new scheme is about what it was before. If the QIE_Rate is large compared with the cycle-slip rate, then the re-syncs will give significant help in improving the latency behavior of the affected links. Typically an unlocked QPLL runs a few 100 Hz off the LHC frequency, so the present QIE_Reset rate (prescaled by 103) is comparable to the cycle-slip rate. But we could choose a smaller QIE_Reset prescale if we wanted to.

In conclusion, I am increasingly bullish about this radical change of scheme. Let's keep thinking.

Yours, Dick
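
The sequence described in the email can be tried out with a small behavioural model in Python. This is only a sketch under simplifying assumptions, not the firmware: it uses a single clock (one write and one read per tick), the reads are disabled as soon as the fifo runs empty after the writes have stopped (instead of counting 24 frames), and the demo uses an idle string of 20 frames instead of 69. The names `LatencyFifo`, `link_stream` and `first_data_tick` are hypothetical.

```python
from collections import deque

IDLE = "I"
STOP_WRITE_AFTER = 8  # consecutive idles used as the "stop writing" signal


class LatencyFifo:
    """Single-clock behavioural model of the re-sync sequence."""

    def __init__(self):
        self.fifo = deque(["pre"])  # initial occupancy of 1, as in the email
        self.writing = True
        self.reading = True
        self.idle_run = 0           # consecutive idles seen on the input
        self.held = IDLE            # frame currently presented on the output

    def tick(self, frame_in, qie_reset_delayed=False):
        if qie_reset_delayed:
            self.reading = True     # harmless if reads were never disabled
        # Write side.
        if frame_in == IDLE:
            self.idle_run += 1
            if self.writing:
                self.fifo.append(frame_in)
            if self.idle_run >= STOP_WRITE_AFTER:
                self.writing = False
        else:
            self.idle_run = 0
            self.writing = True     # first data frame restarts the writes
            self.fifo.append(frame_in)
        # Read side: an empty fifo leaves the "stale" frame on the output.
        if self.reading:
            if self.fifo:
                self.held = self.fifo.popleft()
            elif not self.writing:
                self.reading = False  # simplification, see the text above
        return self.held


def link_stream(latency, n_ticks=50, idle_start=5, n_idle=20):
    """Frames arriving at the receiver for a link with `latency` extra ticks."""
    frames = []
    for t in range(n_ticks):
        s = t - latency  # position in the stream as sent by the front end
        if s < idle_start:
            frames.append("pre")
        elif s < idle_start + n_idle:
            frames.append(IDLE)
        else:
            frames.append(f"D{s - idle_start - n_idle}")
    return frames


def first_data_tick(latency, reset_tick=35):
    """Tick at which the first data frame after the idles reaches the output."""
    fifo = LatencyFifo()
    for t, frame in enumerate(link_stream(latency)):
        if fifo.tick(frame, qie_reset_delayed=(t == reset_tick)) == "D0":
            return t
    return None


print(first_data_tick(0), first_data_tick(3))  # 35 35
```

In this model two links whose arrival times differ by three ticks both deliver their first data frame on the tick of the delayed QIE_Reset; the difference is absorbed by the fifo occupancy.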


Note that HTR pattern data are injected after the FIFO, so they would not be affected by this scheme. The scheme above should be implemented on a second fifo, just after the synchronization fifo; we call this fifo the LatencyFifo. The advantage is that this second fifo operates in a single clock domain. The time difference between the first data word (before the LatencyFifo) and QIE_reset_delayed should be monitored: if it is not stable, an error is asserted.
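
The monitoring idea can be sketched as follows. This is an illustration under stated assumptions, not the firmware: the class name `LatencyMonitor`, the orbit length of 3564 ticks (BCN 0 to 3563) and the `tolerance` parameter are our assumptions. The first measured difference is taken as the reference, and any later difference that deviates from it by more than the tolerance sets an error flag.

```python
class LatencyMonitor:
    """Checks that (QIE_reset_delayed BCN - first-data BCN) stays stable."""

    def __init__(self, orbit_len=3564, tolerance=0):
        self.orbit_len = orbit_len  # assumption: BCN runs from 0 to 3563
        self.tolerance = tolerance  # assumption: allowed deviation in ticks
        self.reference = None       # difference seen at the first re-sync
        self.error = False

    def update(self, first_data_bcn, reset_delayed_bcn):
        """Record one re-sync and return the measured difference in ticks."""
        diff = (reset_delayed_bcn - first_data_bcn) % self.orbit_len
        if self.reference is None:
            self.reference = diff
        else:
            # Orbit-aware distance between this difference and the reference.
            delta = (diff - self.reference) % self.orbit_len
            if min(delta, self.orbit_len - delta) > self.tolerance:
                self.error = True
        return diff


mon = LatencyMonitor()
mon.update(25, 35)
mon.update(25, 35)
print(mon.error)  # False
mon.update(26, 35)
print(mon.error)  # True
```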


Dynamic Latency Tuning with delay registers (abandoned scheme)

After a run is configured, the time-marker Idle-BCN is compared to a target ("TargetIdleBCN"); delays ("ProgInputDelay") are added on a channel-by-channel basis so that all channels report the same Idle-BCN.

Specifications and Constraints

1) the FE can send the sequence of IDLE every orbit or every N orbits. Typical values are N=1 and N=103.

2) there should be options to run the tuning all the time (while a control bit is set), or only in between runs. Possibly also the option of running in a stepper mode (where the steps of the algorithm are executed individually) and the option to execute the tuning every N orbits.

3) for "historical" reasons, the HTR can add 1, 2, or 3 delays (in units of 25-ns clock cycles, or ticks). This is normally enough to cover our need if all other settings are sensible.

4) the HTR must provide a read-only register ("StatusLatencyTun") with the status of the tuning, per channel. If bit=1 : target was reached.

5) after a delay register is changed (by fw or external sw), Idle-BCN should take the value 0xDEF. This value is larger than the last bunch in the orbit (0xDEB = 3563) and indicates that Idle-BCN is not valid. When the next IDLE-to-DATA transition is detected, Idle-BCN will acquire a new valid value.

6) when the latency tuning is stopped, the channels that have not been successfully tuned should be set back to their original delay (the value before the tuning started). Even better would be for the firmware to avoid changing the delay settings at all if it appears that the target is not reachable.
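
Items 4 and 5 can be illustrated with a short Python sketch. The bit assignment (bit i of the status word corresponds to channel i), the assumption that valid BCNs run from 0 to 0xDEB, and the function names are ours; the text above only fixes the invalid marker 0xDEF and the meaning of a set bit.

```python
IDLE_BCN_INVALID = 0xDEF  # marker written after a delay register changes
LAST_BCN = 0xDEB          # 3563, last bunch in the orbit


def idle_bcn_is_valid(idle_bcn):
    """A valid Idle-BCN lies inside the orbit; 0xDEF (3567) does not."""
    return 0 <= idle_bcn <= LAST_BCN


def status_latency_tun(idle_bcns, target):
    """Pack the per-channel 'target reached' flags into one integer.

    Assumption: bit i is set when channel i has a valid Idle-BCN equal
    to the target.
    """
    status = 0
    for channel, bcn in enumerate(idle_bcns):
        if idle_bcn_is_valid(bcn) and bcn == target:
            status |= 1 << channel
    return status


# Channels 0 and 1 are on target, channel 2 is waiting for a new
# IDLE-to-DATA transition, channel 3 is off by one tick.
print(bin(status_latency_tun([8, 8, IDLE_BCN_INVALID, 7], target=8)))  # 0b11
```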

Examples

EXAMPLE 1 (simple case)

ProgInputDelay = 0 (set by HCAL DAQ at configuration)

Idle-BCN = 6

TargetIdleBCN = 8

Desired behavior : the HTR fw should increase the ProgInputDelay to 2, so that Idle-BCN becomes 8, and report that the Target is reached


EXAMPLE 2 (simple case)

ProgInputDelay = 0 (set by HCAL DAQ at configuration)

Idle-BCN = 6

TargetIdleBCN = 10

Desired behavior : the HTR fw should realize that it cannot reach the target, leave the ProgInputDelay at the original value, and report that the Target is not reached


EXAMPLE 3 (less simple)

ProgInputDelay = 3 (set by HCAL DAQ at configuration)

Idle-BCN = 8

TargetIdleBCN = 6

Desired behavior : the HTR fw should decrease the ProgInputDelay to 1, so that Idle-BCN becomes 6, and report that the Target is reached. Note that performing regular subtractions with negative results in fw can be tricky, due to bugs in the development tools.


EXAMPLE 4 (complicated)

ProgInputDelay = 3 (set by HCAL DAQ at configuration)

Idle-BCN = 1

TargetIdleBCN = 3563

Desired behavior : the HTR fw should recognize the orbit boundary (at BCN=3563), decrease the ProgInputDelay to 2, so that Idle-BCN becomes 3563, and report that the Target is reached. Note that this requires a "smart" (orbit-aware) subtraction.
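
The four examples can be checked with a small Python sketch of the orbit-aware subtraction. It is only a model of the desired behavior, not the abandoned firmware: the function names, the minimum delay of 0 and the BCN numbering are assumptions. The numbering matters for Example 4: if BCN runs from 0 to 3563, going from Idle-BCN = 1 to 3563 takes two ticks (ProgInputDelay = 1), while the value of 2 quoted above corresponds to BCN 1 following directly after 3563. Both cases are shown.

```python
MAX_DELAY = 3  # largest programmable delay in ticks (item 3 above)
MIN_DELAY = 0  # assumption: no added delay, as in Examples 1 and 2


def signed_bcn_diff(target, idle, first_bcn=0, last_bcn=3563):
    """Shortest signed distance in ticks from idle to target, wrapping
    around the orbit boundary."""
    n = last_bcn - first_bcn + 1  # number of BCN values in one orbit
    d = (target - idle) % n       # always in 0 .. n-1 in Python
    return d - n if d > n // 2 else d


def tune(prog_delay, idle_bcn, target_bcn, first_bcn=0, last_bcn=3563):
    """Return (new ProgInputDelay, target reached?).

    The delay is left untouched when the target is not reachable.
    """
    shift = signed_bcn_diff(target_bcn, idle_bcn, first_bcn, last_bcn)
    new_delay = prog_delay + shift
    if MIN_DELAY <= new_delay <= MAX_DELAY:
        return new_delay, True
    return prog_delay, False


print(tune(0, 6, 8))                  # Example 1: (2, True)
print(tune(0, 6, 10))                 # Example 2: (0, False)
print(tune(3, 8, 6))                  # Example 3: (1, True)
print(tune(3, 1, 3563))               # Example 4, BCN 0..3563: (1, True)
print(tune(3, 1, 3563, first_bcn=1))  # Example 4, BCN 1..3563: (2, True)
```

Working modulo the orbit length avoids the negative intermediate results mentioned in Example 3.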


-- TullioGrassi - Jan 2010

Topic revision: r11 - 2014-03-21 - TullioGrassi
 