Contributing back to the community

Varnish Cache was the first reverse proxy offered by section.io and, for many of us, it is our favourite because it always just works and gives great results.

Yesterday though we encountered an issue where our Traces page in the section.io Aperture portal failed to retrieve the Varnish Logs for a particular section.io-hosted Varnish instance. It had timed out before the results returned. Our platform team investigated.

Aperture Traces page

Broken

Our system gathers the Varnish logs on-demand by executing the varnishlog tool in the relevant proxy container and capturing the output. By default, varnishlog would tail the log indefinitely, reporting activity as traffic is handled by Varnish, but because we want a sample to return to the portal quickly, we specify the -d option to varnishlog which instructs it to output the recently recorded logs, then exit.

Upon investigation though we found that varnishlog -d hadn’t exited, it was still running and still outputting logs as traffic flowed. My colleague, Ivan, discovered a Varnish bug reported in 2013 that varnishlog was ignoring the -d option entirely. This bug report included to a reference to a code change that resolved the issue, so we investigated that code further.

Ivan also noted that the output from the still active varnishlog process included the text Log overrun and Log reacquired which lead us to another part of varnishlog’s source code where we found logic almost identical to that which was changed in the aforementioned bug fix. This led us to conclude that there was an edge case where the -d option would become ignored if the circular buffer used Varnish for its log was overrun whilst trying to read it. This edge case also appeared to exist in all versions of Varnish Cache since 4.0 inclusive.

Fixed

We did not know how long it would take to fix this issue in Varnish’s log tool itself, or if any such fix would even be applied to all affected Varnish versions. The chance of the issue re-occuring is also somewhat unpredictable, so we needed to work around it in our system. We decided on an approach that would improve upon our platform’s general reliability and began to implement the changes.

Meanwhile, we submitted a description of the problem to the Varnish team along with a suggested code change to fix the issue. Within about 4 hours, our change had been accepted to be included in the eventual release of Varnish 5 and very likely in any updates to both Varnish 4.1 and 4.0.

I think I like Varnish even more now.

Blog Categories

Interested in articles about a specific topic? Click on a category to see all related content.

section.io Sign up

Want to get started improving your website performance, scalability, and security? Sign up for a 14 day free trial of section.io and see what we can do for you!

Get started