How Varnish Cache matches requests to cached responses.

Once you’ve been introduced to Varnish Cache’s power as a web application accelerator (an HTTP caching reverse proxy) and understand the basic benefits it provides, you’ll quickly want to maximize the number of requests it answers itself rather than passing them on to your web application server. To do this, it needs to match incoming requests to items in its cache as often as possible without ever giving an incorrect or mismatched response.

What requests should be considered equal?

The flexibility of HTTP means that only the web server constructing the responses truly knows what parts of an HTTP request are important.

  • Downloading binary files? Only the URI path is important.
  • Serving content customized for the requesting device? The HTTP user-agent header is important.
  • Serving content customized for the user’s location? The requesting IP address or geo-location is important.
  • Running multiple sites sharing some common resources (CSS/images)? The website (host) name may not be important.
  • Need to cache per-user (session) content? Cookies (or specific parts) may be important.

The patterns for which aspects of the request alter the content of the response vary as much as the features of web applications do. Within Varnish Cache, the dilemma of how to deal with all of these parts is handled by a function called “hashing”.
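To make one of these cases concrete, here is a minimal sketch of the shared-resources scenario, written as a vcl_hash subroutine (the subroutine itself is covered later in this article). The /static/ path prefix is a hypothetical layout chosen for illustration, and the sketch assumes every site really does serve identical files under it.

```vcl
sub vcl_hash {
    # Hypothetical layout: all sites serve the same files under /static/.
    if (req.url ~ "^/static/") {
        # Hash only the URL for these requests.
        hash_data(req.url);
        # Returning here skips the built-in vcl_hash, so the Host header
        # stays out of the hash and all sites share one cached copy.
        return (lookup);
    }
    # All other requests fall through to the built-in vcl_hash.
}
```

Leaving the host name out is only safe when the responses truly are identical across sites; otherwise one site's file could be served for another.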

Finding matches without onerous searches or masses of original data

Let’s try an analogy:

A carpet & floor covering shop wants to keep copies of floor plans for all the housing units it supplies. A number of apartment buildings and housing developments have duplicate buildings/apartment layouts and we don’t want to keep duplicate copies of the same floor plans for each one. How could we store a single unique copy of the housing layout and still connect it to a specific building when we get a call to install some carpet?

One approach to this general problem is to take some aspects of the items and mix them together into a form that is unique, simple and sortable. Though perhaps a bit impractical in real life, our hypothetical carpet shop goes about solving this layout problem by asking each customer for one drop of paint of each color used on the building mixed together. This mixed color provides a unique signature for each building that is then kept with our copy of the abstract floor plan and placed in a filing cabinet where we sort them by the swatch’s hue. When we get a new order for carpet and need to pull the right set of plans from our filing cabinet, we request another mixed paint sample and use it to match the plans in our filing cabinet. Because we can match the swatches very precisely, the chances of matching a layout to the wrong building are essentially zero.

Varnish Cache uses a similar approach. After a request arrives, a few of the request’s properties are mixed into the “hash” (by analogy, our mixed paint sample). By default (https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl#L78-L86), these are the request’s URL and its Host header, or the server’s IP address when no Host header is present.
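For reference, the built-in vcl_hash at the link above looks roughly like this in the 6.0 branch. It is reproduced here for convenience, so check the linked file for the exact text in your version.

```vcl
sub vcl_hash {
    # The URL (path and query string) always goes into the hash.
    hash_data(req.url);
    if (req.http.host) {
        # Normal case: the Host header separates one site from another.
        hash_data(req.http.host);
    } else {
        # No Host header: fall back to the address the request arrived on.
        hash_data(server.ip);
    }
    # Finish hashing and look the result up in the cache.
    return (lookup);
}
```

Each hash_data() call adds one more ingredient to the mixture, and return (lookup) ends the mixing.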

If you’d like to differentiate your stored responses by another property (for example, if your web server customizes the content for different types of devices/browsers), you can add your own vcl_hash subroutine that will run before the default:

sub vcl_hash {
    if (req.http.user-agent) {
        hash_data(req.http.user-agent);
    }
}
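Hashing the raw User-Agent string stores a separate copy for every distinct browser version, which can shrink your hit rate considerably. A common refinement, sketched below, is to reduce the header to a small set of device classes first and hash that instead. The X-Device header name and the regular expression are illustrative assumptions, not part of Varnish Cache, and real device detection usually needs a more careful pattern.

```vcl
sub vcl_recv {
    # Always overwrite X-Device (a hypothetical header name) so that a
    # client cannot supply its own value.
    if (req.http.User-Agent ~ "(?i)mobile|android|iphone") {
        set req.http.X-Device = "mobile";
    } else {
        set req.http.X-Device = "desktop";
    }
}

sub vcl_hash {
    # Add the device class to the hash; the built-in vcl_hash then
    # adds the URL and the host as usual.
    hash_data(req.http.X-Device);
}
```

With this in place each URL is cached at most twice, once per device class, rather than once per User-Agent string.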

How specific is the hash (paint mixture)? Varnish Cache can differentiate roughly 115 quattuorvigintillion different hash values (that’s 115 followed by 75 zeros, or about 1.16 × 10^77). For some perspective, if every atom in the visible universe were numbered, each unique Varnish Cache hash value would have to be shared by only about 1,000 atoms.
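These figures can be checked with some rough arithmetic. The sketch assumes the hash is a 256-bit SHA-256 digest and uses the commonly quoted estimate of about 10^80 atoms in the observable universe; both numbers are order-of-magnitude inputs, not precise measurements.

```latex
\begin{align*}
2^{256} &\approx 1.158 \times 10^{77} \\
\frac{10^{80}}{2^{256}} &\approx \frac{10^{80}}{1.158 \times 10^{77}} \approx 8.6 \times 10^{2}
\end{align*}
```

So there are a little over 10^77 possible hash values, and spreading 10^80 atoms across them leaves on the order of a thousand atoms per value.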

Summary

The default operation of Varnish Cache will allow you to cache efficiently and without false matches in most scenarios. To unlock the full power of caching, you’ll need to give Varnish Cache an understanding of what aspects of a request can make the response unique. Understanding hashing and VCL’s vcl_hash is the key to imparting that understanding.
