New Relic is one of the best tools for performance monitoring, and many of our clients use it to monitor both their internal infrastructure and the performance of their sites delivered through a CDN. To accurately measure performance and the impact of deployments, New Relic needs to be set up correctly so that the relevant metrics are not obscured by extraneous data.
Static vs dynamic transactions
One common mistake we see is New Relic’s APM combining metrics for both dynamic and static resources into a single Web transaction response time. This can happen out of the box when New Relic is installed on certain platforms. The problem is that serving a static file is normally orders of magnitude faster than generating a dynamic response, which may require database queries, calls to third-party services, execution of PHP application code, and so on.
Combining these two types of resources into a single performance metric will often mask issues and does not offer true insight into how the application is performing.
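To see why, it helps to model the headline figure as a request-weighted mean. This is a simplifying assumption, since New Relic charts can also show medians or percentiles. With n_s static requests averaging t̄_s and n_d dynamic requests averaging t̄_d, the blended time and its response to a change in dynamic time (holding the request counts and static time fixed) are:

```latex
\bar{t} = \frac{n_s\,\bar{t}_s + n_d\,\bar{t}_d}{n_s + n_d},
\qquad
\Delta\bar{t} = \frac{n_d}{n_s + n_d}\,\Delta\bar{t}_d
```

With hypothetical numbers of 9 static requests per dynamic request, t̄_s = 20 ms and t̄_d = 1000 ms, the blended mean is (9 × 20 + 1 × 1000) / 10 = 118 ms, and a further 1 s slowdown in dynamic requests raises it by only 100 ms.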
Consider a site whose New Relic dashboard shows an excellent Web transaction time of 134ms.
Looking at this metric alone, someone might conclude that the origin application responds quickly to all requests. Closer inspection, however, shows that much of this figure is made up of static resources.
The median time for HTML responses is actually over 1 second.
We win! …oh wait
A problem can occur with the above setup when examining the effects of a deployment, for example its impact on caching performance and on load at the origin. Unlike other CDNs, the Section platform allows easy caching of dynamic content such as HTML documents as well as static files. Offloading commonly requested HTML files from the origin frees up resources for higher-value tasks. As sites evolve over time, the caching configuration may need to be updated to achieve better offload. In one instance, a client deployed a change to the caching configuration that reduced cache hit rates for static assets. As a result, more requests for static resources reached the origin. Because their New Relic setup blended static and dynamic transactions, these extra, fast static requests pulled the median Web transaction time down, so the change actually showed up as an improvement.
So a deploy that had actually hurt performance was celebrated as a win until a deeper investigation revealed the true result.
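A minimal sketch of this effect, using made-up response times (1000 ms for an HTML page, 20 ms for a static file) and assuming the reported median is taken over every request the origin handles. The helper `origin_samples` and all numbers are hypothetical. Origin load doubles, yet the median collapses, because every extra request is a fast static one.

```python
from statistics import median

# Hypothetical origin response times in milliseconds (illustrative only).
HTML_MS = 1000  # a dynamic HTML page rendered by the application
STATIC_MS = 20  # a static file (image, CSS, JS) served by the origin


def origin_samples(html_requests, static_requests):
    """All response times recorded at the origin for a given request mix."""
    return [HTML_MS] * html_requests + [STATIC_MS] * static_requests


# Before the deploy: most static files are cache hits, so few reach the origin.
before = origin_samples(html_requests=100, static_requests=50)

# After the deploy: the static cache hit rate drops, so more static requests
# fall through to the origin. The HTML workload is unchanged.
after = origin_samples(html_requests=100, static_requests=200)

for label, samples in (("before", before), ("after", after)):
    print(f"{label}: {len(samples)} origin requests, median {median(samples)} ms")
```

Running it reports 150 origin requests with a 1000 ms median before the change and 300 origin requests with a 20 ms median after it: more work for the origin, but a "better" blended number.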
Ok, what else can go wrong?
Another problem occurs when there is a sudden change in performance for dynamic requests. For example, if dynamic requests rely on a database that suddenly becomes very slow to respond, the Web transaction time should reflect this to warn the dev team that something is wrong. However, if multiple static assets are served for every dynamic request, the slowdown is diluted: the Web transaction metric rises only in proportion to the dynamic share of traffic rather than by the true increase. This can be detrimental if monitors and alerts depend on this metric, for example through the Apdex score. The site can slow to a crawl, but as long as the static content is served quickly, any monitors in place may be blissfully unaware of the sudden drop in performance. This can delay the response and the deployment of fixes, which can turn a minor incident into a full-blown outage.
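The sketch below applies the standard Apdex formula, (satisfied + tolerating / 2) / total, to a hypothetical page view made of one dynamic request and nine static assets. The 500 ms threshold, the `page_view` helper and every timing are assumptions for illustration. A database slowdown pushes the dynamic request from 300 ms to 2500 ms, past the 4T "frustrated" cutoff.

```python
from statistics import mean

# Assumed Apdex threshold T of 500 ms; use the value configured for your app.
APDEX_T_MS = 500


def apdex(samples_ms, t_ms=APDEX_T_MS):
    """Standard Apdex score: (satisfied + tolerating / 2) / total samples."""
    satisfied = sum(1 for s in samples_ms if s <= t_ms)
    tolerating = sum(1 for s in samples_ms if t_ms < s <= 4 * t_ms)
    return (satisfied + tolerating / 2) / len(samples_ms)


def page_view(dynamic_ms, static_assets=9, static_ms=20):
    """Hypothetical page view: one dynamic request plus its static assets."""
    return [dynamic_ms], [static_ms] * static_assets


# Hypothetical timings: the database is healthy, then suddenly slows down.
healthy_dynamic, healthy_static = page_view(dynamic_ms=300)
degraded_dynamic, degraded_static = page_view(dynamic_ms=2500)

healthy_blended = healthy_dynamic + healthy_static
degraded_blended = degraded_dynamic + degraded_static

for label, dynamic, blended in (
    ("healthy", healthy_dynamic, healthy_blended),
    ("degraded", degraded_dynamic, degraded_blended),
):
    print(
        f"{label}: blended mean {mean(blended):.0f} ms, "
        f"blended Apdex {apdex(blended):.2f}, dynamic-only Apdex {apdex(dynamic):.2f}"
    )
```

In this toy mix the blended Apdex only slips from 1.00 to 0.90, a dip an alert could easily tolerate, while the dynamic-only Apdex falls from 1.00 to 0.00.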
How to combat this
Focusing New Relic on gathering the metrics that are of value will greatly enhance the information it provides. Does it need to know how quickly a static resource is served? In most cases, the answer is no. Even when the answer is yes, static resources should not be assessed together with dynamic ones. Make a separate custom metric for them.
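Where to draw the line between static and dynamic depends on the site. The sketch below classifies exported request timings by file extension and reports a separate median for each class. The extension list, the `classify` and `medians_by_class` helpers, and the sample data are all hypothetical. In practice the split would be made in New Relic itself, for example by excluding static files from APM instrumentation or recording them as a custom metric, and the exact mechanism depends on your agent and platform.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from statistics import median
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as static; tune it for your own site.
STATIC_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2",
}


def classify(url):
    """Label a request 'static' or 'dynamic' from its path extension."""
    path = urlsplit(url).path  # drop any query string, e.g. "?v=3"
    suffix = PurePosixPath(path).suffix.lower()
    return "static" if suffix in STATIC_EXTENSIONS else "dynamic"


def medians_by_class(requests):
    """Group (url, duration_ms) pairs by class and return each group's median."""
    groups = defaultdict(list)
    for url, duration_ms in requests:
        groups[classify(url)].append(duration_ms)
    return {label: median(durations) for label, durations in groups.items()}


# Illustrative timings only, not real measurements.
sample = [
    ("/", 1150),
    ("/products/widget", 980),
    ("/checkout?step=2", 1320),
    ("/static/site.css", 12),
    ("/static/app.js?v=3", 15),
    ("/img/logo.png", 9),
    ("/img/hero.jpg", 22),
]

blended = median([duration for _, duration in sample])
print("blended median:", blended, "ms")
print("per-class medians:", medians_by_class(sample))
```

On this sample the blended median is 22 ms, while the dynamic pages alone have a median of 1150 ms, which is the number the dev team actually needs to watch.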
Bruce Lee said it best: “Absorb what is useful, discard what is not, add what is uniquely your own.”
Section is a leading New Relic partner. We can help with everything from implementing and configuring New Relic through to full-stack performance and availability reviews. Contact firstname.lastname@example.org for more info.