Preventing Long Tail Latency

We recently had a client approach us who had been struggling with long tail latency at their previous CDN provider. A small portion of their customers were experiencing load times of up to 30 seconds, which in their words was “completely unacceptable”.

For this particular client, reducing overall average latency was important, but getting the outliers under control mattered even more.

What Is Long Tail Latency and What Contributes To It?

Tail latencies are always expressed in percentile terms; long tail latency refers to the higher percentiles of the latency distribution (such as the 98th or 99th) in comparison to the average latency. When looking at your analytics dashboard, you may notice it says something like “per request, 1% of your users are experiencing an average delay of a second”.
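
To make the percentile framing concrete, here is a minimal sketch in Python (with made-up sample data) showing how an average can look healthy while the 99th percentile tells a very different story:

```python
import random

# Simulated request latencies in milliseconds: most requests are fast,
# but roughly 1 in 100 hits a slow path (e.g. a cache miss or GC pause).
random.seed(42)
latencies = [
    random.uniform(20, 60) if random.random() > 0.01 else random.uniform(1000, 3000)
    for _ in range(100_000)
]

def percentile(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]

print(f"average: {sum(latencies) / len(latencies):.1f} ms")
print(f"p50:     {percentile(latencies, 50):.1f} ms")
print(f"p99:     {percentile(latencies, 99):.1f} ms")
print(f"p99.9:   {percentile(latencies, 99.9):.1f} ms")
```

The average and median look fine, but one request in a hundred is an order of magnitude slower, and only the percentiles expose it.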

For service providers, it can be a challenge to keep the tails of the latency distribution short, particularly for interactive services at scale. A study led by Luiz André Barroso, Distinguished Engineer at Google, found that the larger a system grows, the wider its latency variability becomes. As a system's size and complexity scale up, it becomes harder to provide consistency; it is not simply a question of scaling everything up as overall use increases. When a request is processed in parallel across many components, the long tail of the slowest parallel operation can dominate the overall response time: every sub-response must have consistently low latency, or the overall operation will be slow. High performance leads to high tolerances, meaning your entire system needs to be designed to exacting standards.
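
The arithmetic behind this fan-out effect is worth making explicit. If a single server is slow once in a hundred requests, a request that must wait on N such servers in parallel is slow whenever any one of them is, with probability 1 − 0.99^N. A quick sketch in Python:

```python
# Probability that a single server responds slowly.
p_slow = 0.01

# If a request must wait for N parallel sub-requests, it is slow
# whenever at least one of them is slow: 1 - (1 - p)^N.
for fanout in (1, 10, 100):
    p_request_slow = 1 - (1 - p_slow) ** fanout
    print(f"fan-out {fanout:>3}: {p_request_slow:.1%} of requests hit the tail")
```

With a fan-out of 100, roughly 63% of requests are gated on at least one slow sub-response, even though each individual server is slow only 1% of the time.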

Other causes of long tail latency include:

  • Shared resources – Machines may be shared by different applications, all vying for the same resources (e.g. CPU cores, processor caches, memory, and network bandwidth); within a single application, different requests may also end up competing for those same resources.
  • Global resource sharing – Applications that are running on separate machines might compete for global resources (e.g. shared file systems, network switches).
  • Background daemons – Background daemons may use only a limited amount of resources on average, but when scheduled they can generate multi-millisecond hiccups. Even if these are few and far between, they can affect a significant fraction of all requests in large-scale distributed systems.

The Impacts of Tail Latency

Why is it that the tails are more important than the average? Gil Tene put it succinctly in an SE-Radio podcast episode on tail latency: it is “because humans do not perceive the commonplace; they do not forgive you because your average is good… pain and bad experience is what you remember”.

Systems that respond quickly to user actions (within 100ms) feel more natural to users than those that take longer. Responsiveness is key, and good averages are not enough because web users respond to speed. In an A/B test in which Amazon delayed page loads in increments of 100 milliseconds, even tiny delays led to substantial and costly drops in revenue. Similarly, a Google study showed that a half-second delay in load time led to a 20% drop in traffic.

The Barroso study uses an example to illustrate how long tail latencies occur:

Imagine a scenario in which a client makes a request to a single web server. Ninety-nine times out of a hundred the request returns within a reasonable amount of time, but one time out of a hundred it might be slow. If you examine the distribution of latencies, most are small, with the occasional large one out on the tail. On its own, this doesn't have a major impact: it only means that one customer receives a slightly slower response every so often.

However, imagine you have millions of requests spread across multiple servers. Now, instead of one customer occasionally receiving a slow response, 10K are affected (1% of one million requests is 10,000 slow responses), which significantly changes the impact of the tail latencies.

Using the same components and scaling them leads to an unexpected outcome. This is a fundamental property of scaling systems: high performance equals high tolerances. At scale, you can’t afford to ignore tail latency.

How Section Addresses Long Tail Latency

Completely eliminating all sources of latency variability is not practical at scale or in shared environments; however, tail-tolerant software techniques can be implemented to help form a predictable whole out of less predictable parts.
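
One classic tail-tolerant technique from the Barroso work referenced above is the “hedged request”: send a request to one replica and, if it has not responded within a short deadline (for instance, the 95th-percentile latency), send a second copy to another replica and take whichever answers first. Here is a minimal sketch in Python; the replica names and fetch function are hypothetical placeholders:

```python
import concurrent.futures
import random
import time

def fetch(replica: str) -> str:
    """Hypothetical stand-in for a network call to one replica."""
    # Most replies are fast, but roughly 1 in 20 hits the tail.
    time.sleep(random.uniform(0.02, 0.05) if random.random() > 0.05 else 1.0)
    return f"response from {replica}"

def hedged_request(primary: str, backup: str, hedge_after: float = 0.1) -> str:
    """Send to the primary replica; hedge to a backup if it is too slow."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(fetch, primary)]
    done, _ = concurrent.futures.wait(futures, timeout=hedge_after)
    if not done:
        # Primary missed the hedge deadline: fire a second copy and
        # take whichever of the two answers first.
        futures.append(pool.submit(fetch, backup))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
    pool.shutdown(wait=False)  # abandon the straggler; don't block on it
    return done.pop().result()

start = time.time()
print(hedged_request("replica-a", "replica-b"),
      f"after {(time.time() - start) * 1000:.0f} ms")
```

The Barroso paper reports that hedging against the 95th percentile keeps the extra load small (roughly 5% more requests) while sharply shortening the tail.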

Section practices multiple tail-tolerant techniques to combat issues of long tail latency, including:

  • Isolation - containers isolate different environments from one another
  • Memory limits on all containers - prevent any single workload from putting a strain on the entire system (a process-level sketch follows this list)
  • Kubernetes - manages the deployment of multiple replicas and provides a high-availability architecture
  • Anycast DNS resolution - allows DNS queries to be routed to the closest data center for fast DNS response times and improved website performance
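
As a small-scale illustration of why memory limits matter, the sketch below uses Python's resource module (Unix-only) to cap a single process's address space; container runtimes enforce the same principle via cgroups. The 256 MB figure is an arbitrary example:

```python
import resource

# Cap this process's address space at 256 MB (soft and hard limits).
# Container runtimes apply the same principle via cgroups, so one
# runaway workload cannot starve every other tenant on the machine.
limit_bytes = 256 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

try:
    hog = bytearray(512 * 1024 * 1024)  # try to allocate 512 MB
except MemoryError:
    print("allocation refused: the limit contained the damage")
```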

A Higher Level Solution

At a higher level, a solution unique to Section involves caching HTML in a way that brings time to first byte (TTFB) down. Our competitors do not use the same method, tending to focus more on static assets. Caching HTML is an essential step in eliminating long tail latency: we have found that delivering the HTML to the user as fast as possible is the best way to reduce page load times.
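
As a minimal illustration of the idea (a sketch, not Section's actual implementation), here is a reverse-proxy-style HTML cache with a short TTL; origin_fetch is a hypothetical stand-in for the round trip to the origin server:

```python
import time

CACHE_TTL_SECONDS = 30
_cache: dict[str, tuple[float, str]] = {}

def origin_fetch(path: str) -> str:
    """Hypothetical stand-in for a slow round trip to the origin."""
    time.sleep(0.5)  # pretend the origin takes 500 ms to render HTML
    return f"<html><body>page for {path}</body></html>"

def get_html(path: str) -> str:
    """Serve cached HTML when fresh; fall back to the origin otherwise."""
    entry = _cache.get(path)
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]  # cache hit: no origin round trip, fast TTFB
    html = origin_fetch(path)
    _cache[path] = (time.time(), html)
    return html

get_html("/home")                # miss: pays the 500 ms origin cost
start = time.time()
get_html("/home")                # hit: served from memory
print(f"cached fetch took {(time.time() - start) * 1000:.1f} ms")
```

Because the browser cannot begin requesting stylesheets, scripts, or images until the HTML arrives, a fast first byte on the HTML pulls the entire page load forward.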

Another method for reducing the time to first byte is our ability to spin up our PoPs anywhere via our Composable Edge Cloud. Our network currently includes over 60 PoPs across North and South America, Europe, Asia and Australasia, and we are able to create new PoPs on demand.

Third, slow load times can often be attributed to third-party resources. At Section, we either provide suggestions on how to defer those resources until after the page load event, or we bring the third-party resources onto the client's domain and cache them, which also leads to faster delivery.
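
The second approach, bringing third-party resources onto the client domain, can be sketched as a simple HTML rewrite that points third-party URLs at a first-party path the edge can cache; the host names and path prefix below are made-up examples:

```python
import re

# Hypothetical third-party hosts we want to serve from our own domain.
THIRD_PARTY_HOSTS = ["cdn.example-analytics.com", "fonts.example-widgets.net"]

def rewrite_third_party(html: str) -> str:
    """Point third-party asset URLs at a first-party, cacheable path."""
    for host in THIRD_PARTY_HOSTS:
        # https://cdn.example-analytics.com/tag.js -> /third-party/cdn.example-analytics.com/tag.js
        html = re.sub(
            rf"https?://{re.escape(host)}/",
            f"/third-party/{host}/",
            html,
        )
    return html

page = '<script src="https://cdn.example-analytics.com/tag.js"></script>'
print(rewrite_third_party(page))
# The edge then fetches /third-party/... from the real host once and caches it.
```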

Finally, we consistently provide excellent visibility into long tail latency through Section's comprehensive RUM (real user monitoring) data, giving our customers the insight they need to get those long tail outliers under control.
