Your Tests Are Not Enough and How Linear Lost Their Data

Published on
December 18, 2024
Contributors
Adam Furmanek
Dev Rel
Metis Team

We’re all familiar with the core principles of DevOps: building small, well-tested increments, deploying frequently, and automating pipelines to eliminate manual processes. We closely monitor applications, set up alerts, roll back problematic changes, and ensure we’re notified when issues occur. Yet, when it comes to databases, this level of control and visibility is often lacking. Debugging performance issues can be daunting, and understanding why databases slow down isn’t always straightforward. Schema migrations and modifications can quickly become chaotic, creating significant challenges. Addressing these issues requires strategies to streamline schema migration and adaptation, allowing for seamless database changes with minimal downtime or performance degradation. Without these measures, the risk of data loss becomes very real - just as Linear experienced. Learn how to avoid such pitfalls.

Your Tests Miss a Lot

Databases are susceptible to a variety of failures, yet they often don’t receive the same level of rigorous testing as applications. While developers typically ensure that applications can read and write data correctly, they frequently overlook how these operations are performed. Critical factors, such as proper indexing, avoiding unnecessary lazy loading, and ensuring query efficiency, are often left unchecked. For instance, while we may verify the number of rows returned by a query, we might ignore how many rows were read to achieve that result. Additionally, rollback procedures are rarely tested, leaving systems vulnerable to data loss with every change. To address these issues, we need robust automated tests that proactively identify problems, reducing reliance on manual interventions.
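To make the difference between "rows returned" and "rows read" concrete, here is a minimal sketch of a test that checks how a query runs, not only what it returns. It assumes SQLite (through Python's standard `sqlite3` module) purely so the example is self-contained; the `orders` table, the `idx_orders_customer` index, and the helper names are hypothetical. Other engines expose the same information through their own `EXPLAIN` output.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def uses_full_scan(details):
    """True if any plan step scans instead of searching via an index."""
    return any(detail.startswith("SCAN") for detail in details)


# Hypothetical schema and data, large enough that a scan would matter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"

# The usual functional check: the right number of rows comes back.
rows = conn.execute(query, (7,)).fetchall()
assert len(rows) == 100

# The check that is usually missing: how were those rows found?
plan_before = plan_details(conn, query, (7,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_details(conn, query, (7,))

print("without index:", plan_before)
print("with index:   ", plan_after)
```

The functional assertion passes both before and after the index is created, which is exactly why a test suite that stops there cannot tell the two situations apart.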

Load tests are often used to uncover performance issues, but they have significant limitations. While they can confirm whether queries are production-ready, they come at a high cost. Building and maintaining load tests is expensive and requires careful attention to GDPR compliance, data anonymization, and state management. More critically, load tests are performed too late in the development process. By the time performance issues are detected, the associated changes have already been implemented, reviewed, and merged, requiring teams to backtrack and potentially start over. Moreover, load tests are time-intensive, often taking hours to warm up caches and validate application reliability, which makes them impractical for catching issues early in the development cycle.

Another common challenge is testing with databases that are too small to reveal performance issues early in the development process. This limitation not only wastes time during load testing but also leaves critical areas, such as schema migrations, untested. As a result, development velocity slows, application-breaking issues arise, and overall agility is compromised.

However, there is one more gap that our tests routinely miss: the schema migrations themselves.

What Happened to Linear

Schema migrations are often overlooked in testing processes. Typically, test suites are run only after migrations are completed, leaving critical factors unexamined - such as the duration of the migration, whether it caused table rewrites, or whether it introduced performance bottlenecks. These issues frequently remain undetected during testing, only to surface when the changes are deployed to production.
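One way to examine the migration itself, rather than only the state after it, is to run it against a seeded copy of the data and record how long it took and whether the data survived. The sketch below assumes SQLite so that it runs anywhere; the `users` table, the migration statements, and the `MAX_SECONDS` budget are hypothetical and would have to be adapted to a real schema and a realistically sized dataset.

```python
import sqlite3
import time


def row_counts(conn, tables):
    """Count rows per table (table names are trusted, test-only input)."""
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in tables
    }


def run_timed_migration(conn, statements):
    """Apply the migration statements and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for statement in statements:
        conn.execute(statement)
    conn.commit()
    return time.perf_counter() - start


# Seed a table that is big enough for timing to mean something.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(50_000)],
)
conn.commit()

# Hypothetical migration: add a column and backfill it.
migration = [
    "ALTER TABLE users ADD COLUMN display_name TEXT",
    "UPDATE users SET display_name = email",
]

before = row_counts(conn, ["users"])
elapsed = run_timed_migration(conn, migration)
after = row_counts(conn, ["users"])

MAX_SECONDS = 5.0  # hypothetical budget; pick one that fits your deployment window
assert elapsed < MAX_SECONDS, f"migration took {elapsed:.2f}s"
assert after == before, f"row counts changed: {before} -> {after}"
```

A check like this only says something useful when the seeded data resembles production in size and shape, which ties back to the problem of testing against databases that are too small.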

Linear lost data because of a faulty schema migration. As they explain in their post-mortem, the migration deleted data in production, and they had to restore from a backup. Note that the migration took 19 minutes to complete, which is concerning in itself because the database may be unavailable for that long. Their system then stayed up for another two and a half hours before they put it into maintenance mode to restore the backup. What’s worse, the backup had been taken over 2 hours before the faulty migration started. Taken together, this amounts to over 5 hours of data loss.
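As an illustration of the kind of guard that can stop a destructive migration before it commits, the sketch below runs a migration inside a single transaction and rolls it back if a protected table loses rows. This is not what Linear did or how their migration failed in detail; the `issues` table, the faulty statements, and the `apply_guarded` helper are hypothetical. It also relies on transactional DDL, which SQLite and PostgreSQL support but some engines do not.

```python
import sqlite3


class MigrationAborted(Exception):
    """Raised when a migration would remove rows from a protected table."""


def apply_guarded(conn, statements, protected_tables):
    """Run statements in one transaction; roll back on any row loss."""

    def counts():
        # Table names are trusted input here, not user-supplied values.
        return {
            table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            for table in protected_tables
        }

    before = counts()
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        after = counts()
        lost = {t: before[t] - after[t] for t in protected_tables if after[t] < before[t]}
        if lost:
            raise MigrationAborted(f"row loss detected: {lost}")
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us control BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO issues (title) VALUES (?)",
    [(f"Issue {i}",) for i in range(1000)],
)

# Hypothetical faulty migration that wipes a table as a side effect.
faulty_migration = [
    "CREATE TABLE issues_new (id INTEGER PRIMARY KEY, title TEXT)",
    "DELETE FROM issues",
]

aborted = False
try:
    apply_guarded(conn, faulty_migration, ["issues"])
except MigrationAborted as error:
    aborted = True
    print("migration rolled back:", error)

remaining = conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0]
```

Some migrations remove rows on purpose, so a real guard would need an explicit allowlist or threshold instead of treating every decrease as an error.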

All that could have been prevented. Let’s see how.

Introduce Database Observability

When deploying to production, system dynamics inevitably evolve. CPU usage might spike, memory consumption could grow, data volumes expand, and distribution patterns shift. While identifying these issues quickly is crucial, merely detecting them isn’t enough. Current monitoring tools flood us with raw signals but offer little in the way of context, leaving us to manually investigate and piece together the root cause. For instance, a tool might flag a spike in CPU usage but fail to explain why it occurred. This outdated, inefficient approach places the entire burden of analysis on us.

To move faster and more effectively, we need to transition from traditional monitoring to full observability. Instead of drowning in raw data, we require actionable insights that illuminate the root cause of issues. Database guardrails make this possible by connecting the dots, illustrating how different factors interrelate, identifying the source of the problem, and providing guidance for resolution. For example, rather than just reporting a CPU spike, guardrails might reveal that a recent deployment altered a query, bypassing an index and causing a surge in CPU load. With this understanding, we can take targeted action, such as optimizing the query or the index, to resolve the issue. This shift from merely "seeing" to truly "understanding" is critical to sustaining both speed and reliability.
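The "deployment altered a query and bypassed an index" scenario can be approximated with a small plan comparison. The sketch below is only an illustration of the idea, not how any particular guardrail product works: it assumes SQLite, a hypothetical `users` table with an index on `email`, and a hypothetical new query version that wraps the column in `lower()`, which prevents the plain index from being used for the lookup.

```python
import sqlite3


def access_path(conn, sql, params):
    """Return the first word ('SCAN' or 'SEARCH') of the first plan step.

    Good enough for simple single-table queries; multi-step plans would
    need every row of the EXPLAIN QUERY PLAN output inspected.
    """
    row = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchone()
    return row[3].split()[0]


def find_regressions(conn, baseline, candidate, params):
    """Names of queries that searched an index before and scan now."""
    return [
        name
        for name in baseline
        if access_path(conn, baseline[name], params[name]) == "SEARCH"
        and access_path(conn, candidate[name], params[name]) == "SCAN"
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Query text before and after a hypothetical deployment.
baseline = {"find_user": "SELECT id FROM users WHERE email = ?"}
candidate = {"find_user": "SELECT id FROM users WHERE lower(email) = ?"}
params = {"find_user": ("a@example.com",)}

regressions = find_regressions(conn, baseline, candidate, params)
print("queries that lost their index lookup:", regressions)
```

Linking a finding like this to the deployment that introduced it is what turns a raw CPU alert into something a developer can act on.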

Metis empowers you to overcome these challenges. It monitors all environments, including development and other non-production ones, and captures the details of database interactions: queries, indexes, execution plans, and statistics. Metis then projects these activities onto the production database to determine whether they are safe to run there. This happens automatically and shortens the feedback loop, so developers don’t need to verify database behavior by hand. The same coverage extends to schema migrations and configuration changes.

If Linear had had Metis in place, they would have been warned about the faulty schema migration. The data loss could have been prevented automatically without any explicit actions from developers or database administrators. Metis provides database reliability across all environments, CI/CD pipelines, and databases.

Database Observability Is a Shared Concern

Database observability focuses on proactively preventing issues, advancing toward automated insights and resolutions, and embedding database-specific checks into every stage of the development process. Outdated tools and workflows can no longer keep up with today’s complexities. Modern solutions, like database guardrails, address these challenges. They empower developers to avoid inefficient code, assess schemas and configurations, and validate each step of the software development lifecycle directly within the development pipelines.

Metis changes the world of databases by catching issues automatically. This way, your business can avoid data loss and database outages. Use Metis and never worry about your databases again.
