Testing shows the presence of defects, not their absence
Just because all your tests pass doesn't mean the software works well. This testing principle explains why a green suite can give you a false sense of security.

The test suite passes. Green across the board. The CI pipeline completes without a single failure. Then someone from support messages you: “Hey, a customer says they haven't been able to complete signup for three days.” You look at the tests: all green. You look at production: the bug has been there since the last deploy. Your tests didn't fail because they were never designed to catch that defect.
What the principle says
The first testing principle according to ISTQB makes it clear: testing can show that defects exist, but it can't show that they don't. It's a subtle but fundamental distinction. When your tests pass, all it means is that the scenarios you defined behave as expected. Nothing more.
The idea isn't new. It has roots in the philosophy of science, specifically Karl Popper's principle of falsifiability. In the same way that a single black swan disproves the claim that all swans are white, a single bug found proves the presence of defects. But running a million tests without failures doesn't prove that the software is bug-free.
Dijkstra summed it up memorably decades ago: “Program testing can be used to show the presence of bugs, but never to show their absence”. It wasn't a pessimistic opinion, but a technical observation about the inherent limits of any verification process.
Why it matters in practice
This principle sounds theoretical until it hits you in production. There are patterns that repeat in pretty much every team I've worked on.
The false confidence of a green suite
A team with 2,000 tests and 95% coverage tends to feel like their product is locked down. That feeling is dangerous. I've seen projects with enviable coverage metrics that still kept piling up critical bugs because the suite exercised lots of lines of code but validated very little real behavior. Coverage measures what code runs during tests, not what conditions are actually being checked.
High coverage without meaningful asserts
This is one of the most common problems, and one of the hardest to spot in a quick review. A test that calls a function and checks that it doesn't throw technically covers those lines. But if it doesn't validate that the result is correct, that side effects happen, or that the final state is what you expect, that test is smoke and mirrors. In my experience, when you do a serious audit of a large suite, you find that somewhere between 10 and 20% of the tests have no meaningful asserts, or only validate trivial things like whether the result isn't null.
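To see the difference, here's a minimal sketch in TypeScript with a Jest-style API. The createOrder function and its shape are invented for the example; the point is the contrast between the two tests.

```typescript
// Hypothetical function under test: totals an order and applies a discount.
type Item = { price: number; qty: number };

function createOrder(input: { items: Item[]; discount: number }) {
  const subtotal = input.items.reduce((sum, i) => sum + i.price * i.qty, 0);
  const discountApplied = subtotal * input.discount;
  return { total: subtotal - discountApplied, discountApplied };
}

describe("createOrder", () => {
  // Weak test: covers every line above, but would pass even if the math were wrong.
  it("does not throw", () => {
    const order = createOrder({ items: [{ price: 100, qty: 2 }], discount: 0.1 });
    expect(order).not.toBeNull();
  });

  // Meaningful test: fails the moment the discount logic misbehaves.
  it("applies the discount to the order total", () => {
    const order = createOrder({ items: [{ price: 100, qty: 2 }], discount: 0.1 });
    expect(order.total).toBe(180);          // 200 minus 10%
    expect(order.discountApplied).toBe(20); // the side effect we actually care about
  });
});
```

Both tests produce the same coverage number. Only one of them protects you.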
The classic “it works on my machine”
The tests pass in the dev environment. They pass in CI. But the bug shows up in production because there's a difference in server configuration, in the version of a transitive dependency, in real user data, or in concurrency under load. Your tests prove that the code works under the conditions you've defined. Production has its own conditions, and they don't always match.
The signup flow nobody tested
Imagine a signup form with email, password, and name validation. You have tests for invalid email, short password, and empty fields. They all pass. But nobody wrote a test for an email with unicode characters that are valid according to the RFC but not supported by your validation library. Nobody tested what happens when the confirmation email service takes more than 30 seconds. Nobody simulated a signup from a browser with JavaScript partially blocked. Tests cover what you imagined, and bugs live where you didn't.
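As a sketch of what one of those missing tests might look like: the naive regex below stands in for whatever validation library the form actually uses, and the internationalized address is made up for illustration. The second test fails against this validator, which is exactly the point — it exposes a gap the green suite never looked at.

```typescript
// Naive validator standing in for the real library (hypothetical).
const isValidEmail = (email: string): boolean =>
  /^[\w.+-]+@[\w-]+\.[a-z]{2,}$/i.test(email);

describe("signup email validation", () => {
  // The case everyone writes: it passes, the suite is green.
  it("rejects an address without an @", () => {
    expect(isValidEmail("not-an-email")).toBe(false);
  });

  // The case nobody wrote: internationalized addresses are valid per RFC 6531,
  // but this pattern rejects them. This test fails and surfaces the gap.
  it("accepts an internationalized address", () => {
    expect(isValidEmail("josé@例え.jp")).toBe(true);
  });
});
```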
Common mistakes when you ignore this principle
When a team forgets that tests only prove the presence of defects, it falls into predictable traps.
- Using coverage as a goal instead of an indicator. Chasing a coverage number leads to tests that walk through code without validating behavior. The result is a high metric and low protection.
- Cutting back on exploratory testing because the suite is green. If the automated tests pass, people assume there's no need to explore manually. But the suite only tests what is known. The unknown needs human eyes and a destructive mindset.
- Making release decisions based only on the suite result. I've seen teams automate deploys to production if the tests pass, with no extra validation at all. That works until it doesn't, and when it stops working the impact is usually big.
- Not questioning tests that always pass. A test that hasn't failed in a year can mean two things: the code it validates is very stable, or the test isn't sensitive to real changes. The second option is more common than it seems.
How to apply it in your team
Accepting that tests don't guarantee the absence of bugs isn't giving up. It's adjusting your strategy to maximize actual defect detection.
1. Add mutation testing
Mutation testing is the most direct way to tell whether your tests actually detect defects. Tools like Stryker change production code in controlled ways, for example by changing a > to >=, deleting a line, or flipping a boolean, and then check whether any test fails. If the mutation survives, your tests wouldn't have caught that kind of mistake.
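To make the mechanics concrete, here's a hypothetical Jest-style sketch: a boundary-blind test lets the mutant survive, and one extra case kills it.

```typescript
// Production code (hypothetical): free shipping above a threshold.
export function qualifiesForFreeShipping(total: number): boolean {
  return total > 50;
}

// Mutant generated by the tool: `>` becomes `>=`.
//   return total >= 50;

// This test only uses values far from the boundary, so it passes against
// both the original and the mutant: the mutant survives.
it("gives free shipping on large orders", () => {
  expect(qualifiesForFreeShipping(200)).toBe(true);
  expect(qualifiesForFreeShipping(10)).toBe(false);
});

// Adding the boundary case kills the mutant: exactly 50 must NOT qualify.
it("does not give free shipping at exactly the threshold", () => {
  expect(qualifiesForFreeShipping(50)).toBe(false);
});
```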
You don't need to run it on the whole project. Start with the most critical modules and expand from there. The results are usually revealing: suites with 90% coverage where 40% of mutations survive.
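As a rough starting point, a Stryker configuration scoped to a couple of critical modules could look something like this (assuming a Jest test runner; the globs and module names are placeholders you'd adapt to your project):

```json
{
  "mutate": ["src/billing/**/*.ts", "src/auth/**/*.ts"],
  "testRunner": "jest",
  "coverageAnalysis": "perTest",
  "reporters": ["clear-text", "progress", "html"]
}
```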
2. Do focused exploratory testing
Exploratory testing is not “clicking around to see what happens”. It's a structured activity where you define a mission, set a time limit, and document what you find. For example: “for 30 minutes I'm going to explore the password recovery flow using email addresses with special characters and slow connections”.
This kind of testing complements the automated suite because it goes exactly where automation doesn't reach, the paths nobody thought to encode.
3. Review the quality of your asserts
Set aside a session to review the tests in a critical module and classify each assert into three categories. Strong asserts validate specific behavior and would fail if the code behaved incorrectly. Weak asserts validate generic things like the result not being null or the response having a 200 status, without checking the content. And missing asserts are tests that execute code but don't check anything meaningful.
If more than 20% of your asserts are weak or missing, you have a test quality problem that coverage doesn't show.
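To make the first two categories concrete, a quick sketch with a hypothetical API client standing in for your real HTTP layer:

```typescript
// Hypothetical client, stubbed so the example is self-contained;
// in a real suite this would hit the API.
async function getUser(id: string) {
  return { status: 200, body: { id, email: "ada@example.com", active: true } };
}

describe("GET /users/:id", () => {
  // Weak assert: only the status code. A 200 with an empty or wrong body still passes.
  it("responds", async () => {
    const res = await getUser("42");
    expect(res.status).toBe(200);
  });

  // Strong asserts: validate the content and the invariants that matter.
  it("returns the requested user with its account state", async () => {
    const res = await getUser("42");
    expect(res.status).toBe(200);
    expect(res.body.id).toBe("42");
    expect(res.body.email).toContain("@");
    expect(res.body.active).toBe(true);
  });
});
```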
4. Ask yourself what you're NOT testing
After writing tests for a feature, stop and ask yourself an uncomfortable question: “if I had to find a bug here, where would I attack it?” Think about the inputs you haven't considered, the prior states you didn't set up, the external dependencies you mocked with happy responses, and the race conditions you ignored.
In my experience, the most painful production bugs live exactly in those gaps between what we test and what we assume.
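A sketch of what closing one of those gaps can look like: a dependency that was only ever mocked with happy responses finally gets a failure-path test. Names and behavior below are invented for illustration.

```typescript
// Hypothetical signup service depending on an external email client.
type EmailClient = { sendConfirmation: (to: string) => Promise<void> };

async function signUp(
  email: string,
  emailClient: EmailClient
): Promise<"active" | "pending_email"> {
  try {
    await emailClient.sendConfirmation(email);
    return "active";
  } catch {
    // Account is still created; confirmation will be retried later.
    return "pending_email";
  }
}

describe("signUp", () => {
  // The usual test: the mock always answers happily, so the error path never runs.
  it("activates the account when the confirmation email is sent", async () => {
    const happyClient: EmailClient = { sendConfirmation: async () => {} };
    expect(await signUp("ada@example.com", happyClient)).toBe("active");
  });

  // "Where would I attack this?" leads straight here: what happens when the dependency fails?
  it("leaves the account pending when the email service fails", async () => {
    const failingClient: EmailClient = {
      sendConfirmation: async () => {
        throw new Error("timeout");
      },
    };
    expect(await signUp("ada@example.com", failingClient)).toBe("pending_email");
  });
});
```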
5. Diversify your defect sources
Don't rely only on automated tests. Combine several layers of detection that complement each other.
- Automated tests for known regressions and happy paths.
- Exploratory testing to discover what automation doesn't cover.
- Production monitoring with alerts for 5xx errors, abnormal latency, and error rates by endpoint.
- Feature flags for gradual rollouts that limit the blast radius if something breaks.
- Code review focused on the paths that aren't being tested, not just style or readability.
Tests are a net, not a wall
A test suite is like a fishing net. It catches a lot, but there are always fish that slip through the gaps. The key isn't pretending the net has no holes, but knowing where the biggest ones are and using complementary tools to cover them.
Next time your suite passes at 100%, instead of feeling relieved, ask yourself: “what defects could still be out there that my tests can't see?” That question, repeated honestly, is what separates a team that blindly trusts its tests from one that actually protects its software.
An exercise for this week: pick three tests from your most critical module and analyze their asserts. If any of them only check that there's no exception or that the result isn't null, rewrite them so they validate real behavior. That small change already gets you closer to a suite that actually catches defects.
This is the first of the seven ISTQB testing principles. The next one is “Exhaustive testing is impossible”.
