A highly effective website cache stores and shares every response that contains only information common to all users, while ensuring that responses containing user-specific data are only ever served to the correct user.
The nature of state management on the web (i.e. cookies) makes maintaining this separation challenging, and when mistakes happen the consequences can be severe, as demonstrated by Valve’s Steam Store last December and Menulog this month.
Essentially, there are three different ways that caching can fail to work as intended:
Failing to cache shareable responses

Arguably the preferred failure mode is when responses that should be cached and shared are not, and are instead served by the origin web server for every request. This leads to slower page load times for users and much more traffic to the origin, and it can escalate into a complete origin outage if the web servers do not have the capacity to handle all the extra requests.
This is the stance taken by the default configuration of Varnish Cache, the caching proxy offered by section.io: Varnish treats the presence of a cookie in either the request or the response as a potential indicator of user-specific data and bypasses the cache to avoid either of the next two failure modes.
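Abridged from the built-in VCL that ships with Varnish 4 (the exact logic varies between versions), the default behaviour looks roughly like this:

```vcl
sub vcl_recv {
    if (req.http.Authorization || req.http.Cookie) {
        # A cookie or auth header may indicate user-specific content,
        # so bypass the cache entirely for this request.
        return (pass);
    }
}

sub vcl_backend_response {
    if (beresp.ttl <= 0s || beresp.http.Set-Cookie) {
        # Responses that set cookies are treated as uncacheable.
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```

Neither subroutine needs to be copied into your own VCL; Varnish appends the built-in versions automatically whenever your custom subroutines do not terminate with a return.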
Sharing user-specific response bodies
The next possible failure scenario is when an origin web server response is intended for only one specific user but is cached and then served to other users, exposing potentially sensitive data. In a trivial case this may just mean that Alice sees “Hi Carol” at the top of a page but a much worse example would be Alice seeing Carol’s contact details and purchase history on the account page.
This is what happened to both Valve and Menulog. It usually results from a cache configuration that strips cookies to achieve better cache utilisation on assets that should be shared, but then accidentally applies the same logic to user-specific responses.
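As an illustration (the asset pattern below is hypothetical, not a recommendation), the mistake in VCL often looks something like this:

```vcl
sub vcl_recv {
    # Dangerous: unconditionally stripping cookies makes every request
    # look anonymous, so user-specific pages can be cached and shared:
    #   unset req.http.Cookie;

    # Safer: only strip cookies for requests known to produce the same
    # response for all users, such as static assets.
    if (req.url ~ "\.(css|js|png|jpe?g|gif|svg|woff2?)$") {
        unset req.http.Cookie;
    }
}
```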
Having a CDN like section.io where the cache configuration can be tested in environments prior to Production can help to catch these issues before they affect real users.
Sharing user session cookies
The last failure mode, and probably the worst, is when an origin web server response that includes a Set-Cookie header containing one user’s session identifier is cached and then served to other users. Those users receive a session identifier that represents someone other than themselves, and then continue to browse the website in the context of the user whose session id was cached.
Two or more unrelated users can then end up adding items to the same shopping cart, and when one of them logs in with their credentials, the other users holding the same session cookie become logged in as that user too, able to perform actions on the site such as changing the delivery address and perhaps even making a purchase with saved payment details.
This situation can result from a cache configuration that leaves the Set-Cookie header in a response while explicitly marking the response as cacheable. More often than not this is an unexpected edge case in some complex conditional logic, as it is rarely desirable to cache and share a Set-Cookie header.
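A sketch of the hazard in VCL terms (the /static/ prefix is illustrative):

```vcl
sub vcl_backend_response {
    # Dangerous: forcing cacheability while Set-Cookie is still present
    # means one user's session cookie can be served to everyone:
    #   set beresp.ttl = 1h;
    #   return (deliver);

    # If a shared response must be cached, drop the cookie first, and be
    # certain the body itself contains no user-specific data.
    if (bereq.url ~ "^/static/") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```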
While none of the effects of failing to cache are ideal, falling back to increased load on the origin is usually the best of the bad options. When configuring your caching solution, err in that direction so that unexpected changes to the shape of responses fail safely. With Varnish Cache this usually means working with the built-in VCL instead of bypassing it.
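One way to keep that safety net, sketched below with a hypothetical health-check route, is to let your custom vcl_recv fall through rather than ending every path with an explicit return:

```vcl
sub vcl_recv {
    # Handle only the cases you understand...
    if (req.url == "/health") {
        return (synth(200, "OK"));
    }
    # ...and end without a return statement: Varnish then runs the
    # built-in vcl_recv, which still passes requests carrying Cookie
    # or Authorization headers straight to the origin.
}
```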
Be sure to test your caching configuration before releasing new changes to production, and preferably integrate your caching solution into the workflow for developing new website features.
Lastly, set up monitoring on your cache miss rate so you can be notified quickly when excess traffic is passing through to your origin web servers.
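For example, with Varnish the MAIN.cache_hit and MAIN.cache_miss counters reported by varnishstat can be turned into a miss rate. The numbers below are hypothetical stand-ins for real varnishstat output:

```shell
# Compute the cache miss rate from Varnish's hit/miss counters.
# In production, replace the printf with real output: varnishstat -1
printf 'MAIN.cache_hit 9000\nMAIN.cache_miss 1000\n' |
awk '/MAIN.cache_hit /  {hit = $2}
     /MAIN.cache_miss / {miss = $2}
     END {printf "miss rate: %.1f%%\n", 100 * miss / (hit + miss)}'
```

Feed the resulting percentage into whatever alerting you already use, so a sudden jump in misses pages someone before the origin saturates.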