Users of the platform occasionally ask: “Why is the time to first byte (TTFB) so high even though Varnish Cache reports a cache hit?” In this post, we take a moment to address this niggling question.
Users typically first notice this issue when they examine the time to first byte of the asset in question in their browser dev tools, alongside a cache debug header. A cache hit usually has a TTFB of around 5 to 100 milliseconds, with the exact time depending mostly on the latency between the user and the cache. Occasionally, however, it takes several seconds; hence the question.
The usual culprit is a Varnish Cache behaviour known as request coalescing, also called request collapsing.
Varnish Cache Request Coalescing
If an asset is not in cache and Varnish Cache receives multiple requests for the same URL, it coalesces them into a single request to the web application. Once the web application responds with the asset, Varnish Cache examines the response and, if it is cacheable, serves all the coalesced requests from that single response. For more information on this, see the Varnish Cache docs.
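The mechanism can be sketched in a few lines of Python. This is a toy illustration, not Varnish Cache's actual implementation: the first request for a given key becomes the "leader" and calls the slow backend, while any concurrent requests for the same key simply wait for the leader's result.

```python
import threading

class Coalescer:
    """Toy request coalescer: concurrent fetches of the same key
    share a single backend call, as Varnish Cache does on a miss."""

    def __init__(self, backend):
        self._backend = backend       # callable: key -> response (the slow origin)
        self._lock = threading.Lock()
        self._inflight = {}           # key -> (done event, result box)

    def fetch(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, box = entry
        if leader:
            box["response"] = self._backend(key)  # only the leader hits the origin
            with self._lock:
                del self._inflight[key]
            done.set()
        else:
            done.wait()  # coalesced request: wait for the leader's response
        return box["response"]
```

In real Varnish Cache the response is also inserted into the cache, so later requests (like user D's below) are served directly from cache without touching either the origin or the coalescing machinery.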
Consider this example scenario:
- Varnish Cache is set up to cache HTML for www.example.com and has an empty cache. It sends HTTP response header “x-cache” with value “hit” or “miss” for debugging.
- The web application is slow and takes 20 seconds to generate an HTML page (lots of DB queries, third-party API calls, etc.).
- Assume the latency between the user and Varnish Cache, and between Varnish Cache and the web application, is negligible.
Time = 0s:
- User A requests www.example.com/product.html
- User A’s request hits Varnish Cache, which performs a lookup in cache and does not find the asset. Varnish Cache forwards the request to the web application.
- The web application receives the request for product.html and starts generating the HTML.
Time = 5s:
- User B requests www.example.com/product.html
- User B’s request hits Varnish Cache, which performs a lookup in cache and does not find it. Varnish Cache sees a request already in progress from user A and coalesces user B’s request with request A.
Time = 10s:
- User C requests www.example.com/product.html
- User C’s request hits Varnish Cache, which performs a lookup in cache and does not find it. Varnish Cache sees a request already in progress from user A and coalesces user C’s request with request A.
- Now three requests (from users A, B and C) for the same asset have reached Varnish Cache, but only one request has gone from Varnish Cache to the origin.
Time = 20s:
- The web application finishes generating /product.html and sends a response to Varnish Cache.
- Varnish Cache examines /product.html and determines it is cacheable. It serves the asset to user A and to the two other coalesced requests in the queue.
- User A receives the response, and sees x-cache: miss. The time to first byte is 20 seconds.
- User B receives the response, and sees x-cache: hit. The time to first byte is 15 seconds.
- User C receives the response, and sees x-cache: hit. The time to first byte is 10 seconds.
Time = 25s:
- User D requests www.example.com/product.html
- User D’s request hits Varnish Cache, which performs a lookup in cache, finds the asset, and serves it immediately, in contrast to the earlier responses.
- User D receives the response, and sees x-cache: hit. The time to first byte is 5 to 100 ms, as expected for a standard cache hit.
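The arithmetic behind the TTFB figures above can be checked in a few lines. Every coalesced request is answered the moment the single origin request completes, so each user's TTFB is simply the completion time minus that user's arrival time (times in seconds, matching the scenario):

```python
ORIGIN_SECONDS = 20                      # time the origin needs to build product.html
arrivals = {"A": 0, "B": 5, "C": 10}     # when each request reached Varnish Cache

# The single origin request was started by user A's miss, so it completes
# at A's arrival time plus the origin's generation time. Each coalesced
# user's TTFB is that completion time minus their own arrival time.
completion = arrivals["A"] + ORIGIN_SECONDS
ttfb = {user: completion - arrived for user, arrived in arrivals.items()}
print(ttfb)  # {'A': 20, 'B': 15, 'C': 10}
```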
Why does Varnish Cache do this?
Imagine you run an online shop and your origin server can take a few seconds to generate an HTML page. You send out a Tweet telling tens of thousands of users about a sale event, and your Tweet links to the same handful of sales pages. Suddenly, users on the site jump from a few dozen to several thousand. If those pages were not already stored in cache, the traffic spike could wreak havoc on the web application. Even if the web application had auto scaling, instances always take time to spin up before serving traffic. With Varnish Cache sitting in front of the servers (and configured correctly to handle HTML caching), instead of thousands of requests hitting the web application at once, only a handful reach it, which the server can easily handle.
What does this look like in the section.io logs?
section.io records detailed logs about every request the platform serves. In the screenshot below, you can see a typical example of Varnish Cache request coalescing behaviour.
In the screenshot, the Varnish Cache ncsa logs show a request made to Varnish Cache at 17:06:23 for the homepage, labelled A. This was a cache miss, so Varnish Cache then made a request to origin. We log this as the last_proxy-access-log record, which shows the origin took 25,615 ms (about 25 seconds) to respond with the home page. During this time, another user requested the homepage at 17:06:41, labelled B. Request B was coalesced with request A. When the origin finally answered request A, Varnish Cache used the same response for request B, which was therefore a cache hit, but a slow one, because the origin was slow to respond to request A.
Edge-access-logs then indicate that response A finished being served to the user at 17:06:49 and response B at 17:06:53. The difference in serving time is an indicator of network congestion between section.io and end user B, typically caused by limited bandwidth on user B’s Internet connection.
section.io specialises in getting the best performance out of Varnish Cache. Learn more about how section.io can help you improve Varnish Cache performance and gain valuable insights into real world behaviour.