Without Observability, DevOps is Doomed

October 5, 2020

Speed of innovation is a key differentiator for businesses in a competitive market. Developers working at the bleeding edge of technology are always looking for ways to iterate quickly, yet safely. Observability helps DevOps teams move quickly with greater confidence.

As Arijit Mukherji laid out in his opening talk at a recent Splunk event, DevOps and observability go hand-in-hand.

“Move fast and break things. If we don’t move fast, the competition will beat us. That’s the reality. 52% of Fortune 500 companies have disappeared since 2000. Without DevOps services, digital transformation is doomed because it won’t be able to keep up. Without observability, DevOps is doomed. Observability provides me with vision. Imagine DevOps as a fast car, but to stop it from going in circles or crashing headlong into a wall, I need to know where I’m going. That’s the lens observability provides.” - Arijit Mukherji, Distinguished Architect, Splunk (formerly CTO at SignalFx)

Are There Three Pillars of Observability?

Kaslin Fields, Developer Advocate at Google (who illustrates cloud-native concepts as comic books), describes observability as “the board you see at an amusement park that tells you the wait time and whether or not a ride is closed”. In other words, it means knowing your systems well enough to tell what is going on. Robust observability tooling can help you iterate more quickly by providing visibility into every stage of the development lifecycle.

The three commonly defined pillars of observability are logs, metrics, and tracing. (Thanks to O’Reilly for the definitions.)

Logs

“an immutable, timestamped record of discrete events that happened over time”

Logs surface highly granular, local information, such as who was last in the system before you. Because a log line is simply a string, a blob of JSON, or a set of typed key-value pairs, almost any data can be represented as one.

There are three forms of event logs:

  1. plaintext - the most common log format; may be free-form text;
  2. structured - increasingly popular, typically in JSON format;
  3. binary - for example, logs in the Protobuf format, MySQL binlogs, or the pflog format.
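To make the difference between the three forms concrete, here is a minimal Python sketch that renders the same event each way. The function names, field names (`user_id`, `service`), and the fixed binary layout are assumptions made up for illustration; the binary record uses Python's `struct` module as a stand-in and is not Protobuf, a MySQL binlog, or pflog.

```python
import json
import struct
from datetime import datetime, timezone


def plaintext_log(ts, level, message):
    """Free-form text: easy to read, hard to query reliably."""
    return f"{ts.isoformat()} {level} {message}"


def structured_log(ts, level, message, **fields):
    """JSON: every field is named, so tools can filter on it."""
    record = {"ts": ts.isoformat(), "level": level, "msg": message, **fields}
    return json.dumps(record, sort_keys=True)


# Hypothetical fixed layout: 8-byte timestamp, 1-byte level code, 2-byte user id.
BINARY_LAYOUT = "!dBH"


def binary_log(ts, level_code, user_id):
    """Binary: compact, but unreadable without knowing the layout."""
    return struct.pack(BINARY_LAYOUT, ts.timestamp(), level_code, user_id)


ts = datetime(2020, 10, 5, 12, 0, 0, tzinfo=timezone.utc)
plain = plaintext_log(ts, "ERROR", "payment failed for user 42")
structured = structured_log(ts, "ERROR", "payment failed", user_id=42, service="checkout")
binary = binary_log(ts, 3, 42)

print(plain)
print(structured)
print(len(binary), "bytes")
```

Note how the structured form lets you ask "all errors for user 42 in checkout" without writing a regular expression, which is one likely reason it is gaining popularity.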

Metrics

“a numeric representation of data measured over intervals of time”

Metrics allow you to track trends and patterns in the behavior of a system over set intervals of time, making them ideally suited to building dashboards that reflect historical trends.
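As a rough illustration of "measured over intervals of time", the Python sketch below groups raw latency samples into fixed one-minute buckets and summarizes each bucket. The sample data, the `aggregate` function, and the choice of count, mean, and max are all assumptions for the example; real metrics systems typically do this aggregation for you.

```python
from collections import defaultdict


def aggregate(samples, interval_s=60):
    """Group (timestamp_seconds, value) samples into fixed-width intervals.

    Returns a dict keyed by interval start time, in ascending order.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = int(ts // interval_s) * interval_s
        buckets[bucket_start].append(value)
    return {
        start: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical request latencies in milliseconds, keyed by seconds since start.
samples = [(0, 120.0), (15, 80.0), (61, 200.0), (119, 100.0), (120, 50.0)]
series = aggregate(samples)

for start, summary in series.items():
    print(start, summary)
```

Each bucket collapses many events into a few numbers, which is what makes metrics cheap to store and well suited to dashboards, at the cost of losing the detail of individual events.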

Tracing

“a trace is a representation of causally related distributed events that encode the end-to-end request flow through a distributed system”

Traces are closely related to logs: the data structure of a trace looks much like that of an event log. A trace offers visibility into both the path traveled by a request and the structure of that request. Traces help system admins understand the amount of work performed at each layer while preserving causality through happens-before semantics.
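One way to picture a trace is as a set of timed spans linked by parent IDs. The Python sketch below is a simplified illustration, not the data model of any particular tracing tool: the `Span` fields, service names, and timings are invented, and the self-time calculation assumes that a span's children run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the request
    service: str
    start_ms: int
    end_ms: int


def request_path(spans):
    """Walk parent links from the root to recover the order work happened in."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    path = []

    def walk(parent_id):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            path.append(span.service)
            walk(span.span_id)

    walk(None)
    return path


def self_time_by_service(spans):
    """Time spent in each service, excluding time spent waiting on child spans."""
    child_time = {}
    for span in spans:
        if span.parent_id is not None:
            duration = span.end_ms - span.start_ms
            child_time[span.parent_id] = child_time.get(span.parent_id, 0) + duration

    totals = {}
    for span in spans:
        own = (span.end_ms - span.start_ms) - child_time.get(span.span_id, 0)
        totals[span.service] = totals.get(span.service, 0) + own
    return totals


# Hypothetical trace for a single request.
spans = [
    Span("a", None, "gateway", 0, 100),
    Span("b", "a", "checkout", 10, 90),
    Span("c", "b", "db", 20, 60),
    Span("d", "b", "cache", 60, 70),
]

print(request_path(spans))
print(self_time_by_service(spans))
```

The parent links are what carry causality: a child span cannot begin before its parent does, so the tree shows both where the request went and how much work each layer did.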

Despite widespread adoption of these definitions, some argue that the three pillars are too narrow and that observability tooling needs a wider definition.

“You may achieve observability with all three, or none [of the pillars] - what matters is what you do with the data, not the data itself.” - Ben Sigelman, CEO & Co-Founder, LightStep

His view is that these pillars are most appropriate for “planet-scale” enterprises such as Google, Facebook or Twitter, and that engineers at smaller organizations need to find “a new scorecard for observability.” Charity Majors, CEO of Honeycomb, takes a different approach: instead of listing preset tools, she defines a set of technical specifications that observability tooling should meet. The debate continues.

The Growing Popularity of Observability

“In 2020, observability is making the transition from being a niche concern to becoming a new frontier for user experience, systems and service management in web companies and enterprises alike.” – James Governor, RedMonk

Several factors have driven the rise in observability interest and adoption over the last few years. These include growing uncertainty in systems and apps and the added complexity that comes with microservices and containerization. Visibility into these complex ecosystems is essential.

Feature management and experimentation require the ability to query the system from end to end at any given time in order to understand performance levels, detect bugs and execute fixes. In addition, as mentioned, there is an increased demand for regular changes and updates. Indeed, change is the goal in today’s development environment and observability provides the visibility necessary to give developers confidence to move quickly.

There is often a debate over whether observability is only necessary when companies reach a certain scale. However, in Charity Majors’ words,

“Developing software with observability is better at ANY scale. It’s better for monoliths, it’s better for tiny one-person teams, it’s better for pre-production services, it’s better for literally everyone always. The sooner and earlier you adopt it, the more compounding value you will reap over time, and the more of your engineers’ time will be devoted to forward progress and creating value.” - Charity Majors, CEO, Honeycomb

The Role of Observability at Every Stage of the Development Lifecycle

There are several critical initiatives where observability should play a particularly central role. These include:

  • Hybrid/multi-cloud monitoring
  • Cloud migration
  • Application monitoring
  • Incident response

However, observability plays a critical role in shining a light on software systems at every stage of the development lifecycle, not just in these initiatives. It provides an essential feedback loop for Engineering, Ops, SREs and DevOps teams to improve the end-user experience.

Ultimately, observability is about understanding performance from the end user’s point of view. How well does a request execute from end to end? It’s essential to understand the entire service delivery chain, and observability provides that lens from the perspective of the user.

Finally, observability is a mindset and a set of standards as much as it is a technology. The tools you adopt and how you use them will, of course, depend on your specific needs.