Logs Are an Anti-Pattern for Observability

By Trent Shupe posted Mon February 14, 2022 08:31 AM


Observability is not just another name for monitoring

It seems that everyone refers to their application monitoring tool as observability. Even experienced IT team members have been known to use the terms “observability” and “application performance monitoring” (APM) interchangeably, but the two are not the same thing.

This confusion is natural – it comes with the territory of “newness.” The new thing always sounds cooler, so APM vendors like to call themselves observability companies instead. What exactly is the difference?

  • Monitoring is a passive process. According to the Oxford English Dictionary, it is “the act of observing and checking the progress or quality of something over a period of time.”
  • Observability is an active process: teams use it to troubleshoot application problems when they occur, reduce outages, and get back to “normal” operations.

In a recent webinar, Instana Observability Strategist Chris Farrell dove into some of these differences to explain what observability is and what it isn’t.

Analyzing anti-patterns to arrive at best practices

Common solutions to recurring problems become standardized as patterns. Solutions that have proven to not work or even be counter-productive are known as anti-patterns. Farrell said we can learn a lot from anti-patterns, and one that he mentioned specifically was throwing all the monitoring data into logs.

“There are quite a few great log analysis tools out there now, [and] they’re not all extremely expensive,” Farrell explained. “That’s fed this idea of [just throwing] everything into a log.”

Unfortunately, this approach just puts an unmanageable amount of data in one place. One reality of observability is that sometimes you don’t have enough information to know what you’re looking for, and a disorganized mess of data doesn’t necessarily help.
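To make the point about disorganized data concrete, here is a minimal Python sketch. It assumes events carry a shared `request_id` field; that field, the service names, and the helper functions are hypothetical illustrations rather than anything from the webinar or the Instana product. Once events from interleaved requests can be grouped by that identifier, the first failing step in each request is easy to find. Without it, the same lines are just a pile of text to search.

```python
from collections import defaultdict

# Hypothetical events from two interleaved requests.
# Field names and services are illustrative assumptions.
events = [
    {"request_id": "a1", "service": "cart", "msg": "add item", "ok": True},
    {"request_id": "b2", "service": "cart", "msg": "add item", "ok": True},
    {"request_id": "a1", "service": "payment", "msg": "charge card", "ok": True},
    {"request_id": "b2", "service": "payment", "msg": "charge card", "ok": False},
    {"request_id": "b2", "service": "checkout", "msg": "order failed", "ok": False},
]


def group_by_request(events):
    """Collect events per request, preserving their original order."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)
    return dict(grouped)


def first_failure(request_events):
    """Return the service where a request first failed, or None if it succeeded."""
    for event in request_events:
        if not event["ok"]:
            return event["service"]
    return None


failing = {rid: first_failure(evts) for rid, evts in group_by_request(events).items()}
print(failing)
```

The grouping only works because the context was recorded when each event was written, which is exactly what ad hoc log lines tend to lack.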

Logs, anti-patterns, and Murphy’s Law

“How do you find your data?” Farrell asked. “You don’t know what you don’t know, and if you’re relying on what you wrote into the logs to be what helps you solve the problem… Murphy’s law states that you won’t have the data at the time when you need it.

“I personally shudder at the thought of having one of my high-powered, very smart, high-paid – hopefully high-paid – software engineers take time out of their day when I need them writing new code and creating new features, to go spend a couple of hours analyzing log data, to try to figure out where the problem occurred.”

The right ways to reduce MTTR

A related anti-pattern that Farrell discussed is having your developers be the first line of troubleshooting defense.

In the you-build-it-you-own-it model, a developer would have to sift through logs when there is a problem to find the root cause. In theory, this is a forcing function for developers to write code that is easier to triage when they know what they’re looking for.

But what if the developers don’t know what the problem is? Then your developers are spending time analyzing logs instead of creating new product features.

“I talked about the idea of having your developers be your first line of MTTR,” Farrell said. In the non-monitoring mode, you’re practically guaranteeing that your developers will be the only ones who can help figure out where the problems are before the resolvers (actual problem owners) can fix them.

In the best circumstances, this is fine. In the worst circumstances, it creates a bottleneck.

Finding a real solution to modern application monitoring

All APM solutions reflect the health of your applications, but getting under the hood into details can present challenges. Observability tools track metrics, traces and logs to give a deeper understanding beyond a binary (good/bad) health check.
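As a rough illustration of what "beyond a binary health check" can mean, the Python sketch below compares a simple up/down signal with two metrics computed from the same requests. The sample latencies, status codes, and the nearest-rank `percentile` helper are assumptions made for this example and do not represent how any particular tool calculates its metrics.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical measurements for ten recent requests to one service.
latencies_ms = [42, 45, 40, 48, 51, 44, 47, 43, 950, 1200]
status_codes = [200, 200, 200, 200, 200, 200, 200, 200, 200, 500]

# Binary view: the service answered at least one request successfully.
healthy = any(code == 200 for code in status_codes)

# Metric view: how often it failed and how slow the slow requests were.
error_rate = sum(code >= 500 for code in status_codes) / len(status_codes)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

print(f"healthy={healthy} error_rate={error_rate:.0%} p50={p50}ms p95={p95}ms")
```

In this made-up sample the binary check reports a healthy service, while the error rate and the tail latency show that some users are having a very different experience.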

Enterprise Observability platforms automate continuous discovery of applications, including microservice-based containerized components. They continuously monitor upstream and downstream infrastructure components, and track the relationships and dependencies between everything. They even monitor testing and development environments.
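One way to picture tracking "relationships and dependencies" is to derive a service dependency map from trace spans. The Python sketch below is a simplified illustration under stated assumptions: the span fields (`id`, `parent`, `service`), the service names, and both helper functions are hypothetical, and this is not a description of how Instana builds its dependency maps.

```python
# Hypothetical spans from a single trace; each span points at its parent span.
spans = [
    {"id": "1", "parent": None, "service": "web"},
    {"id": "2", "parent": "1", "service": "checkout"},
    {"id": "3", "parent": "2", "service": "payment"},
    {"id": "4", "parent": "2", "service": "inventory"},
    {"id": "5", "parent": "3", "service": "postgres"},
]


def dependency_edges(spans):
    """Return (caller, callee) pairs for every call that crosses a service boundary."""
    by_id = {span["id"]: span for span in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span["parent"])
        if parent and parent["service"] != span["service"]:
            edges.add((parent["service"], span["service"]))
    return edges


def downstream(service, edges):
    """Return every service reachable from `service` by following call edges."""
    seen, stack = set(), [service]
    while stack:
        current = stack.pop()
        for caller, callee in edges:
            if caller == current and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


edges = dependency_edges(spans)
print(sorted(downstream("checkout", edges)))
```

With a map like this, a problem in `postgres` can be traced back to the services that depend on it without anyone having to reconstruct the call path from log lines.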

This level of detail is critical for identifying potential problems before they manifest themselves as issues. Enterprise Observability also automatically provides context when it detects trouble, making it easy for your team to fix the problem without the bottlenecks that logs can present.

Can your APM do that? If not, please listen to the on-demand webinar to hear more about anti-patterns and Enterprise Observability. You can also get your hands dirty in Play With, a guided tour through use cases in the Instana platform.