Bandwidth and Latency’s Relationship to Throughput
This post aims to explain bandwidth and latency individually and then look at how they relate to throughput. It came about when discussing why some transfers, like mailbox moves, got nowhere near using the overall capacity of a network link, and why SMTP transfers of large emails were taking much longer than expected. We will keep this post largely at the lower levels of the network model. For those of you who don’t know what the network model is, here is a quick recap before we explain the individual terms. At the end we put it all together and explain how latency is often more of a determining factor than bandwidth.
OSI Network Model
There are 7 layers in the OSI Model, which allow network traffic to go from one application on one machine across to another application on another machine. Your computer, network card and network equipment will dissect or combine pieces of information as needed to pass them on to the next layer.
- Physical – the hardware that actually transmits/receives and converts the signal from digital to light, electrical pulses, etc.
- Data Link – flow control (who can send/receive) and frame synchronization
- Network – switching and routing
- Transport – end to end recovery and data flow
- Session – sets up, coordinates, and terminates conversations, exchanges, and dialogues between the applications at each end
- Presentation – works to transform data into the form that the application layer can accept
- Application – the actual application you’re working with
Refer to OSI_Layers for more information
There is a handy mnemonic I use to remember these: ‘Please Do Not Take Sales Peoples’ Advice’. The key thing to remember is that when you send anything from one application to another application on another machine, it will traverse up and down this stack.
At the different layers a message is broken up into smaller pieces, and those pieces have different names. For example, the network layer uses packets, the data link layer uses frames and the physical layer uses bits. At the transport layer you have segments (for TCP) or datagrams (for UDP). So different terms are used for the same information at different layers of the OSI model.
Bandwidth
Bandwidth is the overall size of the pipe, or the theoretical maximum amount of traffic you can get down that cable. It can be likened to the number of cars that can drive along a stretch of highway. Increase the number of lanes on a highway and you can get more cars along that stretch (well in theory anyway…)
Bandwidth is measured in bits per second. Note that a bit is different to a byte (which is 8 bits). The measures used for bandwidth include Kbps (Kilobits per second), Mbps (Megabits per second) and Gbps (Gigabits per second). Note again that these are different to KB, MB and GB. An interesting fact about bits per second is that as you go from K to M to G you are increasing by a factor of 1000 and NOT 1024. So 1 Mbps = 1000 Kbps and not 1024 Kbps.
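As a quick sanity check on that arithmetic, here is a small Python sketch (the function name is just for illustration) converting a link speed quoted in Mbps into the bytes per second you would actually see in a file transfer:

```python
def mbps_to_bytes_per_sec(mbps):
    """Convert a link speed in megabits per second to bytes per second."""
    bits_per_sec = mbps * 1_000_000   # network rates scale by 1000, not 1024
    return bits_per_sec / 8           # 8 bits to a byte

print(mbps_to_bytes_per_sec(100))     # 100 Mbps -> 12,500,000 bytes/s (~12.5 MB/s)
print(mbps_to_bytes_per_sec(2.048))   # 2.048 Mbps -> 256,000 bytes/s (256 KB/s)
```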
Latency
Latency is the total amount of time it takes for a message to get from one layer on one machine to that same layer on another machine. We often measure latency at layer 7 (for application or end-to-end latency), or, in the case of network monitoring, at layer 3 (for network latency).
Many factors go into latency, starting with perhaps the most obvious one: the speed of light. Since nothing can travel faster than this theoretical maximum, that’s the first piece. Light travels quickly, at just over 186,000 miles per second, or 186 miles per millisecond. However, if your cabling takes you the long way around (for example, international fibre cables are not in as straight a line as you might think) then you can quickly get to thousands of miles, or quite a few milliseconds.
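To put some rough numbers on that, here is a tiny Python sketch of the propagation-delay arithmetic. The route lengths are purely illustrative, and real routes add more on top because cables rarely run in a straight line:

```python
# 186 miles per millisecond, as above; distances below are illustrative only
MILES_PER_MS = 186

def propagation_delay_ms(route_miles):
    """One-way propagation delay for a route of the given length."""
    return route_miles / MILES_PER_MS

print(propagation_delay_ms(3_500))    # a ~3,500 mile route: roughly 19 ms one way
print(propagation_delay_ms(10_000))   # a ~10,000 mile route: roughly 54 ms one way
```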
The second factor is hops, or more importantly how long it takes the device at each hop to receive, process and forward the packet. A hop can involve many different kinds of network devices, including switches, routers, firewalls, load balancers and WAN optimization devices. All of these will add a little bit of latency to the overall measurement.
It is interesting to note that most of these devices are aware of this and will try to process packets as low down in the OSI model as possible. For example, a router will often only actually route the first packet it sees and after that let switching technology process the packets instead. It is all about reducing latency. This is also why, when you are doing a traceroute, the intermediate hops can sometimes show a higher latency than the end-to-end path. A traceroute is a series of pings with an increasing Time To Live (TTL); at each hop the TTL is reduced by one, and when it reaches zero that device must handle the packet itself and respond. A hop showing higher latency simply means that device is under some kind of load which makes it take longer to push the packet up the stack for processing (since a packet addressed to the router itself requires higher-layer processing, not just routing or switching at the network layer).
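To make the TTL mechanism concrete, here is a minimal, simplified Python sketch of a traceroute-style probe: send UDP packets with an increasing TTL and listen for the ICMP reply each hop sends back when the TTL hits zero. It needs root privileges for the raw ICMP socket, sends only one probe per hop, and the destination name and port are just example values:

```python
import socket
import time

def trace(dest_name, max_hops=30, port=33434, timeout=2.0):
    """Very simplified traceroute: one UDP probe per TTL, print who replied."""
    dest_addr = socket.gethostbyname(dest_name)
    for ttl in range(1, max_hops + 1):
        # raw ICMP socket to catch the "Time Exceeded" reply (requires root)
        recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        recv.settimeout(timeout)
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)  # hop limit for this probe
        start = time.time()
        send.sendto(b"", (dest_addr, port))
        hop_addr = None
        try:
            _, (hop_addr, _) = recv.recvfrom(512)
            print(f"{ttl:2d}  {hop_addr}  {(time.time() - start) * 1000:.1f} ms")
        except socket.timeout:
            print(f"{ttl:2d}  *")
        finally:
            send.close()
            recv.close()
        if hop_addr == dest_addr:
            break

# trace("example.com")   # uncomment to run (needs root/administrator rights)
```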
Similarly a firewall will act as a router or switch after a session is established and it has checked if the traffic is allowed.
Types of Packets
If we concentrate on TCP for now, which is the more common protocol, then there are different types of packets. You may have heard or seen terms like SYN, FIN and ACK. These three packet types in particular are used to start and end a transmission. The start of a transmission is three packets, also called the three-way handshake: a SYN from source to destination, a SYN-ACK from destination to source, and an ACK from source to destination. After this the two sides know about each other and know that both are ready to start chatting.
When one or the other side is done it will most often send a FIN packet, which is responded to with a FIN-ACK to acknowledge that all communication is now over.
There is also a type of packet called RST, which is used to reset the transmission. This is often used by firewalls and intrusion prevention devices to terminate a rogue transmission.
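At the socket API level you never build these packets yourself; the operating system’s TCP stack sends them for you. The following self-contained Python sketch (over loopback, with an arbitrary example port) shows where they happen: connect()/accept() drive the SYN, SYN-ACK, ACK exchange, and closing the sockets drives the FIN exchange:

```python
import socket
import threading

PORT = 50007                     # arbitrary example port
ready = threading.Event()

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", PORT))
        srv.listen(1)
        ready.set()              # tell the client it is safe to connect
        conn, _ = srv.accept()   # accept() completes once the handshake is done
        with conn:
            conn.sendall(b"hello")
        # leaving the 'with' blocks closes the sockets, triggering the FIN exchange

t = threading.Thread(target=server)
t.start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", PORT))   # OS sends SYN, gets SYN-ACK, replies with ACK
    print(cli.recv(1024))              # b'hello'
# closing the client socket sends a FIN, which the other side acknowledges

t.join()
```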
The flow of TCP data
After the three-way handshake has taken place and both sides know we’re ready to go, the flow of data really begins. Simplistically, the sender sends the data in segments and the receiver acknowledges (ACKs) receipt. If this were done one segment at a time, the effect of having to wait for each ACK would quickly become pretty limiting. So instead the sender sends multiple segments and the receiver ACKs them as it gets them (or even selectively ACKs, based on how much data has arrived or how much time has passed since the last ACK).
However, now the problem is: what happens when the recipient misses a segment? The sender needs to keep track of how much of its transmission has been ACKed. If it doesn’t get an ACK back in time, it has to assume that something went wrong, and it will go back to the last ACKed point and retransmit the traffic from there.
OK, so this is great, but all this retransmission could cause an enormous storm of traffic. To resolve that, a TCP Window Size was introduced. This is negotiated between the sender and the recipient and is constantly monitored by both; the TCP Window Size is the amount of un-ACKed traffic, or traffic in flight, allowed at any time. Since TCP is quite an old protocol, the maximum TCP Window Size is 64KB. This was plenty in the old days, when network links were measured in Kbps and dropped packets on a regular basis. The two sides would find a happy balance between waiting for ACKs and retransmission.
As networks became faster and more reliable, it turned out that applications were still waiting for ACKs to be received before being able to send more data. To overcome this, something called TCP Window Scaling was added, which multiplies the window size by a power of 2. The scale factor can be at most 14, making the maximum window size 64KB * 2^14, or about 1GB.
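The arithmetic behind that, as a quick Python check:

```python
max_window = 65_535                   # largest value a 16-bit window field can hold (~64 KB)
max_scale = 14                        # largest window scale factor allowed
print(max_window * 2 ** max_scale)    # 1,073,725,440 bytes -- roughly 1 GB
```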
Putting this all together
Right, so finally we understand the various bits (pun intended), so let’s see how they relate. Latency is how long messages take to get from A to B, and dictates how long A has to wait before sending more data. So if latency is, say, 250ms and the maximum Window Size is 64KB, then at most we can send 4 x 64KB in a second, or 256KB per second. Note that this is regardless of how much bandwidth you have, as long as it is more than 256KB per second, or 2.048Mbps. So even on a 100Mbps connection, two devices that negotiate a 64KB window size will only ever be able to transmit just over 2Mbps when latency is 250ms.
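Expressed as a formula, the most data you can have in flight is one window per round trip, so throughput is roughly window size divided by round-trip time. A small Python sketch of the numbers above (treating the 250ms as the round-trip time):

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """At most one window of data can be in flight per round trip."""
    return (window_bytes * 8) / rtt_seconds / 1_000_000

# 64 KB window, 250 ms round trip -> about 2 Mbps, however big the pipe is
print(max_throughput_mbps(64 * 1024, 0.250))

# and the amount of data you would need in flight to actually fill a
# 100 Mbps link at 250 ms (the bandwidth-delay product): far more than 64 KB
print(100_000_000 * 0.250 / 8, "bytes in flight needed")   # 3,125,000 bytes (~3 MB)
```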
What about Window Scaling though, I hear you say? That’s right: if window scaling is supported by the sender, the recipient and all intermediary devices, then it is likely that you will be able to get closer to the 100Mbps limit set by the bandwidth. However, once you leave your own network and head out onto the Internet, you may well find that TCP Window Scaling is not something you can get to work.
So on the Internet, if you have a latency of 250ms and you’re stuck at no more than 2Mbps, then you’re falling victim to the TCP Window Size maximum.
Final Comments
Two other terms should at least be mentioned: QoS and Traffic Shaping. QoS is Quality of Service. It assigns a priority to different types of traffic; for example, voice traffic could be given a high priority and email a low priority. It is important to note that this does not prevent saturation of a link, it just means that when a link is saturated, email traffic is more likely to be dropped before voice traffic.
Traffic Shaping is about splitting a physical path into multiple smaller paths and ensuring that some traffic can only use the smaller path. So again you could split voice and email traffic, but this time the email traffic could be limited to no more than 10% of the bandwidth, leaving plenty free for voice at all times.
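As a toy illustration of the shaping idea, here is a minimal token-bucket sketch in Python. The class name, rates and byte counts are made up for illustration; real shaping is done in network equipment or the operating system, not in application code like this:

```python
import time

class TokenBucket:
    """Toy token bucket: allow() returns True while traffic stays under the rate."""
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, nbytes):
        now = time.monotonic()
        # refill tokens in proportion to the time that has passed
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True      # within the shaped rate: send now
        return False         # over the rate: queue or drop

# email capped at 10% of a 100 Mbps link = 10 Mbps = 1,250,000 bytes/s
email_bucket = TokenBucket(rate_bytes_per_sec=1_250_000, burst_bytes=64_000)
print(email_bucket.allow(1_500))   # a single full-size frame fits comfortably
```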