Tests with the new hybrid switch

On this page we describe tests of TCP and AQM using the new hybrid switch. The idea and implementation of the new hybrid switch are described here.

In the previous tests, done to verify the implementation, we used almost constant flow rates. Here we try with TCP-controlled rates.

Basic test with discrete TCP

Implementation

To test with discrete TCP we introduce a coupled model that implements TCP (exactly the same one used in the TCP-AQM model). This coupled model contains the TCP sender and receiver sides. The ACKs sent by the receiver suffer a constant delay that simulates the return path (to start with we only test a one-way path; ACKs are very small, so they are not expected to generate congestion).
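As a rough illustration of the return path just described (a minimal sketch only, not the actual coupled model; the delay value is an assumption chosen for illustration), every ACK is simply delivered a fixed time after the receiver emits it:

    #include <cstdio>
    #include <queue>

    // Sketch of a constant-delay return path for ACKs: each ACK is delivered
    // exactly RETURN_DELAY seconds after the receiver emits it, with no queueing.
    struct Ack { double emitTime; int seqNo; };

    int main() {
        const double RETURN_DELAY = 0.010;   // assumed value, for illustration only
        std::queue<Ack> inFlight;

        // The receiver emits three ACKs at different times.
        for (int seq = 1; seq <= 3; ++seq)
            inFlight.push({0.001 * seq, seq});

        // The sender sees each ACK exactly RETURN_DELAY later.
        while (!inFlight.empty()) {
            Ack a = inFlight.front();
            inFlight.pop();
            printf("ACK %d emitted at %.3f s, delivered at %.3f s\n",
                   a.seqNo, a.emitTime, a.emitTime + RETURN_DELAY);
        }
        return 0;
    }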

To compare and contrast, we also use a normal discrete router. For convenience we put it inside the same model: when we run the hybrid experiment we disconnect the discrete router, and when we run the discrete experiment for comparison we disconnect the hybrid router.
Note that in the discrete experiment the hybrid router still receives packets, so it measures everything, but its forwarded packets are not connected anywhere; the simulation is driven by the packets forwarded by the discrete model.

Left: hybrid router; right: discrete model

Configuration

  1. Router_Capacity = 5 Mb/s (1250 packets/s)
  2. Discrete_Generation = 5 Mb/s (1250 packets/s)
  3. Continuous_Generation = Pulse(Start=3, Finish=7, Amplitude=500 packets/s)
  4. Router_queueMax = 100
With this configuration we expect that during the first 3 s only Wd (the sampled discrete rate) moves.
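As a quick sanity check of the units (assuming a fixed packet size of 500 bytes, which is not stated on this page but is the size that makes 5 Mb/s correspond to 1250 packets/s), the configuration can be verified with a few lines:

    #include <cstdio>

    int main() {
        // Assumed packet size: 500 bytes = 4000 bits (an assumption, chosen
        // because it makes 5 Mb/s equal exactly 1250 packets/s).
        const double linkRate_bps   = 5e6;        // Router_Capacity / Discrete_Generation
        const double packetSize_bit = 500 * 8.0;

        const double capacity_pps = linkRate_bps / packetSize_bit;   // 1250 packets/s
        const double pulse_pps    = 500.0;                           // Continuous_Generation amplitude

        printf("router capacity : %.0f packets/s\n", capacity_pps);
        printf("pulse (3s-7s)   : %.0f packets/s (%.0f%% of capacity)\n",
               pulse_pps, 100.0 * pulse_pps / capacity_pps);
        return 0;
    }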

Detected bugs & workarounds

  1. A bug in the TCPSND model was detected and was creating problems, so to work around it we set the INTER_SND_TIME and the INTER_REQ_TIME to 4e-12. This is only a workaround that gives the bug fewer chances to appear, not a fix.
  2. Also, TCP in slow start (SS) seems to double the window at the very first ACK received. For example, if windowSize=2 (there are 2 packets in flight), when the first ACK arrives 3 new packets are sent, and then when the 2nd ACK arrives a single new packet is sent. The expected behavior is that when an ACK arrives, 2 new packets are sent (sketched right after this list).

    This was neither fixed nor worked around, as it should not change the behavior much.
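For reference, a minimal sketch of the expected per-ACK behavior in slow start (only an illustration of the expectation stated in item 2, not the model's code; window bookkeeping is simplified):

    #include <cstdio>

    // Expected slow-start behavior: every ACK grows the window by one packet,
    // so each ACK releases 2 new packets (1 replacing the acknowledged packet
    // + 1 from window growth).
    int main() {
        int cwnd     = 2;   // initial window: 2 packets in flight
        int inFlight = 2;

        for (int ack = 1; ack <= 4; ++ack) {
            inFlight -= 1;              // one packet acknowledged
            cwnd     += 1;              // slow start: window grows by 1 per ACK
            int newPackets = cwnd - inFlight;
            inFlight += newPackets;
            printf("ACK %d: cwnd=%d, packets sent=%d\n", ack, cwnd, newPackets);
        }
        return 0;
    }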

Results

Discrete model

In red are TCP variables (CWND and measuredRTT). In green is the discrete queueSize. In blue are the hybrid variables (queueSize and Wd sampled rate).

It is interesting to see how well the hybrid router estimates the queueSize. Also, you can see that the packetSampler outputs almost constantly the same rate.

Hybrid model

Here we can see that the hybrid queue model seems to forward packets fine, and the window size shape seems reasonable and consistent with expected TCP behaviour.
Nevertheless, it is quite different from the discrete model results. You can see that the discrete model had 2 drops of the CWND while the hybrid model had 3. The sampled rate in the discrete model seems quite stable, while in the hybrid model the sampled rate keeps changing (apparently following the window size). It is still interesting to see that the queueSize estimation is very similar to the discrete queue.

Tracking discrepancies

To understand this discrepancy we dig deeper into the results.
The biggest difference is the measured rate of packets. Below we print the values of the sampled wd, first for the discrete experiment and then for the hybrid one.

Discrete Router_hybrid.wd.value(1:40)
column 1 to 11
100. 0. 200. 0. 400. 0. 800. 0. 1600. 0. 2400.

column 12 to 20
800. 2000. 1200. 1300. 1300. 1300. 1200. 1400. 1200.

column 21 to 28
1300. 1200. 1400. 1200. 1300. 1200. 1400. 1200.

column 29 to 36
1300. 1300. 1300. 1200. 1300. 1300. 1300. 1200.

column 37 to 40
1300. 1300. 1300. 1200.

Hybrid Router_hybrid.wd.value(1:40)
column 1 to 11
100. 0. 200. 0. 400. 0. 800. 0. 1600. 0. 3200.

column 12 to 22
0. 4500. 0. 0. 4600. 0. 0. 0. 4700. 0. 0.

column 23 to 33
0. 4800. 0. 0. 4900. 0. 0. 0. 5000. 0. 0.

column 34 to 40
0. 5100. 0. 0. 0. 0. 5200.

This shows that in the discrete model packets arrive at the sampler at a more constant rate (always 12-14 packets in each 0.1 sampling period). In the hybrid model packets seem to arrive in bursts (45 packets in 0.1 s, then 0 packets for 0.2 s; 46 packets, then 0 for 0.3 s; and so on).


This suggests problems with the sampling (also seen before, related to packet discards, here). In this case the bursty sampling seems to produce a bad calculation of the queue size.
Looking at the blue plot of each experiment: in the discrete case wd is constant, causing the queueSize to rise smoothly; the queue becomes full at ~t=5.8 s and then falls abruptly.
In the hybrid case wd is bursty (lots of 0s) and each burst is bigger than the previous one. The 0s cause the queueSize to drop abruptly, while the big bursts cause it to grow very fast. At ~t=2 s it becomes full for the first time and reaches this state several times until ~t=3 s.

Tracking these bursts and the TCP behavior (reading logs), we found that they are caused because ALL packets of the CWND get exactly the same delay at the hybrid queue. In the discrete queue each packet suffers a delay according to the queue size, so with a window size of 32 the queueing-delay difference between the first and last packet can be ~0.012 s. In the hybrid queue all packets arrive at the same time (that is how the TCP model generates them), and because the queue size will not change until the next sampling they all get the same delay (maybe 0 if the queue was empty). Because they all get the same delay, they exit the router at the same time, arrive at TCP_rsv at the same time, and the ACKs all arrive at exactly the same time. This creates the bursts. The bigger the CWND, the bigger the burst in the hybrid model, and the bigger the delay difference in the discrete model.

To reduce this effect one might think of increasing the sampling rate to avoid sampling bursts followed by 0s. But whatever value we set for the sampling rate, this effect will happen, because all packets arrive at EXACTLY the same time. A very small sampling period would also make the measured burst even bigger (but for shorter times).
We need each packet in the burst to get a different delay.

Indeed, it is not necessary that every packet gets a different delay; it would be good enough to detect when the delay would change above a certain threshold. This starts to sound like a QSS sampler: one that outputs a new rate only if the rate would change above a certain threshold.
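A minimal sketch of such a threshold-based sampler (only an illustration of the idea, not an existing model; the class name and threshold value are made up):

    #include <cmath>
    #include <cstdio>

    // Sketch of a quantized ("QSS-like") rate sampler: a new rate is forwarded
    // only when it differs from the last emitted value by more than a threshold,
    // instead of being emitted on every fixed sampling period.
    class QuantizedRateSampler {
    public:
        explicit QuantizedRateSampler(double threshold)
            : threshold(threshold), lastEmitted(0.0), hasEmitted(false) {}

        // Returns true if the new measurement should be propagated downstream.
        bool update(double measuredRate) {
            if (!hasEmitted || std::fabs(measuredRate - lastEmitted) > threshold) {
                lastEmitted = measuredRate;
                hasEmitted  = true;
                return true;
            }
            return false;
        }

    private:
        double threshold;
        double lastEmitted;
        bool   hasEmitted;
    };

    int main() {
        QuantizedRateSampler sampler(100.0);   // 100 packets/s threshold, arbitrary
        const double rates[] = {1250, 1260, 1300, 1400, 1390, 900};
        for (double r : rates)
            if (sampler.update(r))
                printf("emit new rate: %.0f packets/s\n", r);
        return 0;
    }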
Another problem is that packets arrive at the same time, and we need any change to be applied to incoming packets BEFORE they get to the queue. So we need to properly set the priorities in the queue model so that any sampling output impacts the delay first, and only then are the packets forwarded to the queue (where the calculated delay is applied to them). This is hard to solve because the delay depends on the queueSize. In the continuous world the queueSize can grow as time advances, but it cannot grow instantly (no matter how high we set the input rate, the queue will not grow from 0 packets to 50 packets in 0 s).
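The ordering problem can be pictured as tie-breaking between events that carry the same timestamp (a generic illustration, not the simulator's actual scheduler; model names and priority numbers are made up): events with the same t are processed in priority order, so the rate/probability updates can be handled before the packet itself.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Three events share the same timestamp; ties are broken by model priority
    // (lower number = handled first), so the discard probability is refreshed
    // before the packet reaches the discard/queue stage.
    struct Event {
        double      time;
        int         priority;
        const char* what;
    };

    int main() {
        std::vector<Event> events = {
            {2.000, 5, "packet arrives at packetDiscard"},
            {2.000, 1, "packetToRate emits a new rate"},
            {2.000, 2, "discardProb recomputed"},
        };

        std::stable_sort(events.begin(), events.end(),
            [](const Event& a, const Event& b) {
                if (a.time != b.time) return a.time < b.time;
                return a.priority < b.priority;
            });

        for (const Event& e : events)
            printf("t=%.3f  priority=%d  %s\n", e.time, e.priority, e.what);
        return 0;
    }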

In summary we found the following issues:

  1. Bugs in the TCP models (mainly in tcpsender)
  2. The hybridQueue cannot handle bursts of packets arriving at the same t
  3. packetDiscard and delay are applied long AFTER the packet has actually arrived

Solutions and Model changes

Solutions to detected problems

To solve the previous issues we made several changes to the hybrid queue:

  1. We worked around the bugs in the tcpsender model. The main bug is: when an ACK arrives, new packets are requested from the queue and then sent. If another ACK arrives between the previous ACK and the sending of the new packets, the second ACK is forgotten (it does not trigger new packets to be requested and sent).
    1. We set INTER_REQ time to a very small number (4e-12) to reduce the chances of the bug triggering
    2. We set INTER_SND time to 1/(C*10)=0.00008. This is the value used in the original model. (This value interferes with the bug but is required for item #2.)
    3. We added an INTER_SND time to the tcpreceiver and set it to TCP_SND.interPacketSendTime*2=0.00016. This avoids triggering the bug and allows the tcpsender to use the inter-send time.
  2. To avoid bursts of packets arriving at the queue at the same t (which never happens in the real world, as packets are always serialized on the cable) we used the tcpsender INTER_SND time. The proper thing to do would be to add a link between the tcpSender and the router.
  3. To fix the packetDiscard and delay issues, the following changes were introduced (see the image below for reference):
    1. The continuous models that calculate the discard rate were moved inside a discardRate coupled model to simplify visualization. This does not change behaviour.
    2. The continuous models that calculate the discard probability were moved inside a discardProb coupled model to simplify visualization. This does not change behaviour.
    3. A new packetToRate model replaces the sampler. This model outputs a rate every time a new packet arrives. The output value is calculated based on the routerCapacity and is designed to cause an increase of 1 unit in the queueSize in a very short time (it is the continuous counterpart of a discrete packet arrival). The "very short time" parameter is called rateFrequency.
      Thus, the output value is calculated as: (1 / this->rateFrequency) + this->routerCapacity (a small sketch of this calculation is shown below, after the model figure).
    4. In order to have an accurate calculation of the queueSize, the integrator parameters were updated to: dqmin=1 ; dqrel=0 (uniform quantum).
      Setting dqrel to 1 caused many packets to get the same delay. TODO: more tests should be performed to understand the effect of relaxing these values.
    5. Priorities: in order for the packet discard to work properly, the calculated discard probability has to arrive at the packetDiscard model BEFORE the actual packet. To allow this to happen, the model priorities had to be adjusted as follows:
      1. The PacketToRate, discardProb and discardRate models are set with higher priority. The packetDiscard model is set with low priority.
      2. A LessPriority model is introduced just before the packetDiscard model and set with the least priority. This model forwards packets without any delay; it is there to allow the discardProb signal to arrive BEFORE the packet at the packetDiscard model.
Model after changes:
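The packetToRate calculation described above can be sketched as follows (a minimal sketch, not the actual model implementation; the rateFrequency value used here is only an example). The point is that, once integrated against the router capacity for rateFrequency seconds, each pulse adds exactly one packet to the continuous queueSize:

    #include <cstdio>

    // Sketch of the packetToRate idea: every packet arrival is converted into a
    // rate pulse of duration rateFrequency. Integrating (pulseRate - capacity)
    // over that duration yields exactly one packet of continuous queue growth.
    struct PacketToRate {
        double routerCapacity;   // packets/s the router can serve
        double rateFrequency;    // duration of the pulse ("very short time")

        double pulseRate() const {
            return (1.0 / rateFrequency) + routerCapacity;
        }
    };

    int main() {
        PacketToRate p2r{1250.0, 1e-4};   // rateFrequency = 1e-4 s is an assumed example
        const double rate = p2r.pulseRate();

        // Net queue growth over one pulse: (inputRate - capacity) * duration = 1 packet.
        const double growthPerPacket = (rate - p2r.routerCapacity) * p2r.rateFrequency;

        printf("pulse rate = %.1f packets/s\n", rate);
        printf("queue growth per arriving packet = %.3f packets\n", growthPerPacket);
        return 0;
    }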

Tests with new changes

With the changes described above we performed new tests, first with only a discrete flow.

Results with only a discrete TCP flow

In this experiment we set up only one discrete packet flow.

Configuration

- The source generates at a very high rate, so that the sending rate is controlled by the TCP window.

Discrete router

HybridRouter

-- MatiasAlejandroBonaventura - 2016-08-15
