Stop Making Classical Monitoring Famous

There is a discrepancy between what developers want and what evangelists or architects think the right way is. It’s clearly seen in the world of monitoring and observability. Architects and C-level executives promote monitoring solutions that developers just can’t stand. Monitoring solutions are just wrong. Let’s see why.
Published on
August 4, 2024
Contributors
Adam Furmanek
Dev Rel
Metis Team

Why Developers Have Difficulties With Monitoring

There are many reasons why developers have difficulties with solutions like Datadog, Dynatrace, New Relic, or AppDynamics. We are going to examine several areas where these solutions fall short, and later we will see why we need a new approach.

Useless Data

Business loves metrics, targets, and KPIs. Efficient organizations need a way of measuring their performance and achievements. They define measurements that can be easily compared over time and used for bragging in the industry.

These measurements should reflect our business goals as directly as possible. That’s often not the case. We can’t easily quantify the customers’ satisfaction, so we look for some indirect signals like the number of their complaints. Unfortunately, it’s well-known that we get what we measure. If the metric is the number of complaints (issues, tickets, etc.), then the support teams will focus on making sure there are fewer of them. They will try to prove that complaints are unjustified, will merge many complaints into one to reduce the number of tickets, or will juggle when the tickets are open and for how long. Effectively, the number of complaints may go down suggesting that customers are happy whereas, in fact, the customers are even more frustrated.

One of the best examples of inefficient measurements in the world of monitoring is the number of signals the monitoring solution can bring automatically. The leadership thinks that the more metrics we have the better. Effectively, monitoring solutions strive to capture everything that can be captured and show these metrics efficiently.

Monitoring solutions excel at bringing OS-level metrics, CPU counters, memory statistics, garbage collector trends, network traffic aggregates, JVM figures, and other numbers from across the stack. This is wrong. Developers don’t need that.

When was the last time you checked your L1 cache size to optimize the slow database query? Or when did you check the number of used inodes in your system when troubleshooting why your database index wasn’t used? We know the answer. And yet, the architects and C-level executives think monitoring solutions are great because they bring more metrics.

It’s not about how many metrics we bring. It’s about how many useful metrics we have. Most of the metrics that monitoring solutions bring are effectively useless for developers. And yet, architects are elated when they hear that their monitoring solution just scraped another thousand metrics with little to no changes.

Developers don’t need thousands of metrics. They need a few relevant metrics about their business.

Tedious Configuration

Another thing that architects love is alarms and alerts. That’s another example of optimizing measurements instead of doing the right things. What’s the best way to reduce the MTTR? It’s obvious - just bring the people to work on the issue earlier. How to achieve that? Just page them at night when the metrics spike.

This sounds reasonable at first glance. We observe the metrics going up and we call the technicians. And again, monitoring solutions have a super simple KPI to optimize - the number of alarms and configuration options. They decide to bring more and more alerts and swamp developers with pages at night.

After some troubleshooting, developers learn how to deal with alerts. They learn which metrics are important and which can be safely ignored. So they reconfigure their alerts to be paged less often. Unfortunately, they need to do it on a case-by-case basis for each application they deal with.

Monitoring solutions fall short again. They don’t have decent ways to detect what’s important. They would rather be safe than sorry, so they notify about every possible anomaly. Developers need to tune the alerts manually and configure them based on their business knowledge. Monitoring solutions don’t make it any easier because they don’t understand what the application is doing. Monitoring systems blindly bring data, detect anomalies based on numbers, and then scream that metrics changed. Metrics, not the business.

Developers don’t need alerts when metrics spike. They need alerts when their business breaks.
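
To make the difference concrete, here is a minimal sketch of the two kinds of alert rules. Everything in it is a hypothetical illustration: the `Snapshot` fields, the thresholds, and the order-based business signal are assumptions chosen for the example, not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One monitoring interval. All fields are hypothetical, for illustration."""
    cpu_percent: float
    orders_attempted: int
    orders_failed: int


def metric_alert(s: Snapshot, cpu_threshold: float = 80.0) -> bool:
    """Classical rule: page whenever an infrastructure metric crosses a line."""
    return s.cpu_percent > cpu_threshold


def business_alert(s: Snapshot, max_failure_rate: float = 0.05,
                   min_orders: int = 20) -> bool:
    """Business rule: page only when customers fail to place orders."""
    if s.orders_attempted < min_orders:
        return False  # too little traffic to judge the failure rate
    return s.orders_failed / s.orders_attempted > max_failure_rate


# A busy but healthy system: high CPU, almost no failed orders.
busy_but_healthy = Snapshot(cpu_percent=95.0, orders_attempted=1000, orders_failed=3)
# A quiet but broken system: low CPU, many failed orders.
quiet_but_broken = Snapshot(cpu_percent=20.0, orders_attempted=200, orders_failed=60)

for name, snap in [("busy_but_healthy", busy_but_healthy),
                   ("quiet_but_broken", quiet_but_broken)]:
    print(name, "metric:", metric_alert(snap), "business:", business_alert(snap))
```

In this sketch the metric rule pages for the healthy system and stays silent for the broken one, while the business rule does the opposite.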

Non-Actionable Signals

So we get the CPU going up. We detect that with our beloved monitoring solution. And then we page the developer. The developer comes and sees the CPU spiked. And then they say “CPU spiked, so what?” Is it good? Is it bad? Should they do something? What should they do exactly? So many questions and so few answers.

Monitoring solutions focus on showing what happens without explaining what to do. They bring data to show potential issues and then let the developers do the work. Unfortunately, current monitoring solutions focus on the data instead of the solutions. For instance, New Relic explicitly asks the developers to read the logs. Similarly, Dynatrace introduces AI to let developers find more metrics to browse.

Developers don’t want to read logs or browse metrics. They need to fix the issues and they need help with doing so. Showing the data is like giving them textbooks about computers. Obviously, this is a solution and developers should expand their knowledge. However, it’s not efficient. Standing on the shoulders of giants is not about doing the same things they did. It’s about using the understanding they gained. Monitoring solutions bring data that developers can use to educate themselves, but they don’t bring understanding.

Developers don’t need to be paged about the issues. Developers need to get solutions!

Reaction Rather Than Prevention

Yet another metric the monitoring solutions optimize is reaction time. They need to detect the issues as early as possible to notify the developers and let them troubleshoot. This means that monitoring systems react when issues appear instead of preventing them from happening.

Instead of preventing the issues from happening by analyzing what may go wrong during deployments, migrations, or traffic changes, monitoring solutions just observe what happened and detect the errors afterward. It’s easy to understand why they do it this way. It’s easier to prove that monitoring solutions detected issues earlier rather than prove that they made the issues not happen at all.

Developers don’t want to troubleshoot their issues fast. They want to avoid issues entirely.

What Developers Want Instead

Let’s now see what developers want instead and how Observability 2.0 can bring that.

Avoiding the Issues

First and foremost, developers want to have no issues at all. Developers want modern observability solutions that can prevent the issues from happening. This is far bigger than just observing the metrics. This includes the whole SDLC pipeline and all stages of the development inside the organization.

Issues in production do not start with excessive traffic. They start far earlier when the developers implement their solutions. Later, developers deploy these solutions to production, customers start using them, and then issues arise.

Metis takes a different approach. Metis integrates with developers’ environments and provides observability very early during development. Metis checks queries and schema migrations to detect changes that would take production down.

Metis prevents data loss and slow schema migrations by integrating with CI/CD pipelines.

Metis observes everything from the very first keystroke. This way, developers can avoid issues entirely instead of fixing them afterward.
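
As a rough illustration of what a pre-deployment check on schema migrations can look like, here is a minimal sketch of a CI step that scans a migration script for risky statements. This is not how Metis works internally (the article does not describe that); the rule list and the `review_migration` helper are hypothetical, the statement splitting is deliberately naive, and the `CONCURRENTLY` rule assumes PostgreSQL syntax.

```python
import re

# Hypothetical rules, for illustration only: each pairs a pattern with the risk it signals.
RISKY_PATTERNS = [
    (re.compile(r"\bDROP\s+(TABLE|COLUMN)\b", re.IGNORECASE),
     "possible data loss"),
    (re.compile(r"\bTRUNCATE\b", re.IGNORECASE),
     "possible data loss"),
    # PostgreSQL assumption: a plain CREATE INDEX blocks writes while it builds.
    (re.compile(r"\bCREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)", re.IGNORECASE),
     "index build may block writes on a large table"),
]


def review_migration(sql: str) -> list[str]:
    """Return one finding per risky statement found in a migration script."""
    findings = []
    # Naive split on ';' (ignores semicolons inside string literals).
    for statement in filter(None, (part.strip() for part in sql.split(";"))):
        for pattern, risk in RISKY_PATTERNS:
            if pattern.search(statement):
                findings.append(f"{risk}: {statement}")
    return findings


migration = """
ALTER TABLE users DROP COLUMN legacy_email;
CREATE INDEX idx_orders_user ON orders (user_id);
CREATE INDEX CONCURRENTLY idx_orders_date ON orders (created_at);
ALTER TABLE users ADD COLUMN nickname text;
"""

findings = review_migration(migration)
for finding in findings:
    print(finding)
```

A pipeline could fail the build, or request a manual review, whenever the list of findings is not empty. A real check would also need table sizes and query statistics, which a regular expression cannot provide.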

No Manual Tuning

Developers deal with hundreds of applications each day. They can’t waste their time manually tuning alerting for each application separately. Metis takes care of that.

Metis analyzes database-oriented signals to detect anomalies and fix issues automatically. Metis can analyze schemas, indexes, extensions, configurations, and queries running in the database. Metis knows how databases work and understands which things are issues.

Developers don’t need to analyze their applications manually and use their business knowledge to tune the alerts. Metis handles that automatically.

Solutions Instead of Alerts

Developers need to optimize their databases. Monitoring solutions focus on bringing data and ask the developers to do the hard work. Metis fixes the problems instead. Metis finds slow queries, analyzes them, and shows how to make them faster.

Similarly, Metis analyzes database schemas, indexes, configurations, and extensions to find improvements. Metis gives actionable solutions instead of raw metrics.

Metis monitors all the database activity and detects anomalies or performance issues. Instead of showing the data points, Metis troubleshoots automatically and provides solutions.

This way, developers immediately know how to fix the issues to make their systems better.
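
To illustrate the difference between reporting a signal and proposing a fix, here is a minimal sketch that inspects a query plan and suggests an index when the plan shows a full table scan. It uses SQLite only because it ships with Python's standard library. The `suggest_fix` helper, the table, and the index name are hypothetical, and the heuristic is far simpler than what a real tool would need; it does not represent Metis's actual analysis.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def suggest_fix(conn, sql, table, column):
    """Hypothetical heuristic: propose an index if the plan scans the whole table."""
    plan = query_plan(conn, sql)
    if any(step.startswith("SCAN") for step in plan):
        return f"CREATE INDEX idx_{table}_{column} ON {table} ({column})"
    return None  # the plan already uses an index lookup


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
slow_query = "SELECT total FROM orders WHERE customer_id = 42"

# Before: the plan is a full scan, so the helper returns an actionable statement.
suggestion = suggest_fix(conn, slow_query, "orders", "customer_id")
print("suggested fix:", suggestion)

# After applying the suggestion, the plan switches to an index search.
conn.execute(suggestion)
plan_after = query_plan(conn, slow_query)
print("plan after fix:", plan_after)
```

The developer receives a statement they can review and apply, not a chart of query latency.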

Stop Making Classical Monitoring Famous. Go For Contextful Observability 2.0

All big monitoring solutions are wrong. They overfocus on bringing the data. They are great at extracting thousands of infrastructure metrics about your CPU, network, memory, and garbage collector. However, they swamp you with data points instead of giving you answers. They want to optimize KPIs that are easy to tackle but bring little to no benefits.

Observability is not about bringing more metrics and paging developers earlier. Observability is about preventing issues from happening and automatically giving solutions when they do happen.

Architects and C-level executives love monitoring because they see KPIs that seem reasonable. Developers dislike classical monitoring solutions because these solutions make them waste time. Observability 2.0 is about bringing solutions and understanding to developers so they can be true owners. Metis leads the way and replaces tedious troubleshooting with actionable insights and automated solutions. Stop using monitoring. Use Observability 2.0.
