Sometimes when you are working on Varnish you want to pinpoint a specific request that has happened. For example, your customer service team reports that customers are seeing a “Guru Meditation” error.
When we see problems like this we often reach for an SSH session and grep the logs. We might even find ourselves grepping a huge varnishncsa log file on a production system, trying to find a single entry.
The job gets even harder when we have to work in a high availability Varnish cluster, as we need access to all the systems. In the worst case we have to check all the servers and the log we are looking for is on the last one.
Even worse, sometimes we are alerted too late: our log files have rotated and we don’t have access to the data anymore.
Auto-scaling Varnish behind something like an Amazon Web Services Elastic Load Balancer makes this worse still, because a server that fails its Varnish health checks may be terminated before we get a chance to log on to it.
Instead, you can set up a log centralization system for Varnish logs, based on the ELK stack:
- Run varnishncsa on your hosts, and use rsyslog or syslog-ng to ship the data to a Logstash endpoint.
- Configure Logstash to accept your data, and enrich the data with the various filters that Logstash provides.
- Configure Logstash to output your data to an Elasticsearch cluster.
- Use Kibana to query the data in an ad hoc fashion, or build your own Varnish management console.
All the tools here are free for you to download, set up and run.
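To make the enrichment step more concrete, here is a minimal Python sketch of the kind of parsing a Logstash filter would do for you. It is only an illustration, not a replacement for Logstash. It assumes varnishncsa is writing its default NCSA combined format; the regular expression, the `parse_line` helper, the field names and the `varnish-01` host name are all made up for this example.

```python
import json
import re
from datetime import datetime

# Assumption: varnishncsa is using its default NCSA combined format:
#   %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i"
NCSA_COMBINED = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line, host):
    """Turn one log line into a structured event, or None if it does not match."""
    match = NCSA_COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert numeric fields so they can be filtered and aggregated later.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    # Normalise the NCSA timestamp to ISO 8601.
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.isoformat()
    # Record which Varnish server wrote the line (hypothetical field name).
    event["host"] = host
    return event


# A made-up log line for illustration only.
sample = (
    '203.0.113.7 - - [10/Oct/2014:13:55:36 +0000] '
    '"GET http://example.com/index.html HTTP/1.1" 503 419 "-" "Mozilla/5.0"'
)
event = parse_line(sample, host="varnish-01")
print(json.dumps(event, indent=2))
```

Once every line is a structured event with a numeric status and the originating host attached, it no longer matters which server in the cluster served the request, or whether that server still exists.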
Now, when your boss asks “what is this Guru Meditation business all about?” you’ll have tools at your fingertips that let you answer that question quickly and with confidence. You can type your query into the Kibana interface and you’ll be presented with a time-series chart, along with every matching log line that your Varnish instances wrote.
You can say to your boss “Don’t worry, there have only been X instances of the Guru Meditation in the last 24 hours”. You’ll be able to move on with your life and focus on your priorities.
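If you would rather get that number from a script than from Kibana, a sketch like the following builds a query body you could send to Elasticsearch's count endpoint. It makes several assumptions: that Guru Meditation responses are logged with status 503, that your Logstash configuration stores the status in a numeric field called `status`, and that the event time lives in `@timestamp`. The helper name is hypothetical, and the sketch only builds the request body; it does not talk to a cluster.

```python
import json


def guru_meditation_count_query(hours=24, status_field="status",
                                time_field="@timestamp"):
    """Build a query body matching 503 responses in the last `hours` hours.

    The field names are assumptions; adjust them to match your own mapping.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumption: Guru Meditation pages are served as 503s.
                    {"term": {status_field: 503}},
                    # Elasticsearch date math: from `hours` ago until now.
                    {"range": {time_field: {"gte": f"now-{hours}h"}}},
                ]
            }
        }
    }


body = guru_meditation_count_query()
print(json.dumps(body, indent=2))
```

The count that comes back for a body like this is the X in the sentence above.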
Having these tools available is a necessity for a low-stress operations environment.
You can also set up Varnish dashboards using the Elasticsearch-Logstash-Kibana (ELK) stack.
You could use the ELK stack to generate the types of metrics described in my article on Varnish Monitoring. However, you’ll find that the aggregation performed by Graphite outperforms large Elasticsearch queries, and that the cost of data retention and server infrastructure is much lower. Graphite is able to show metrics that span years of data in milliseconds.
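For comparison, feeding Graphite is very lightweight. Its plaintext protocol takes one line per data point in the form `path value timestamp`, conventionally on TCP port 2003. The sketch below formats such a line; the metric path `varnish.web01.cache_hit`, the value and the helper names are invented for illustration, and `send_metric` is defined but not called, so nothing is sent anywhere.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send a formatted line to a carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name and value, with a fixed timestamp for repeatability.
line = format_metric("varnish.web01.cache_hit", 1234, timestamp=1413000000)
print(line, end="")
```

Because each data point is just a name, a number and a time, Graphite can store and aggregate it far more cheaply than a full log event.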
So, log all the things and measure everything that moves.