Software Testing in CI/CD Pipelines
Continuous integration (CI) and continuous delivery (CD) pipelines automate the process of building, testing, and deploying software. At the heart of every effective pipeline is a reliable automated test suite. When that suite is unreliable — slow, flaky, or poorly organised — the entire pipeline suffers.
By Hasnain Iqbal · Updated 1 April 2025
What CI/CD Pipelines Expect from Tests
A CI/CD pipeline treats the test suite as a quality gate. Every time code is pushed, the pipeline runs tests and either lets the change through or blocks it.
For this to work well, tests must be:
Fast — a 60-minute test suite blocks developers for an hour per commit. Target under 10 minutes for the main feedback loop; run slower tests in parallel or on a separate schedule.
Deterministic — a test must always produce the same result on the same code. Non-deterministic (flaky) tests undermine the trustworthiness of the gate.
Independent — tests must not rely on execution order. Any test should be able to run in any position without affecting others.
Complete — the suite should cover the code changes being merged, particularly the critical paths.
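The determinism and independence requirements above can be illustrated with a small sketch. The names `is_expired`, `fixed_clock`, and `make_cart` are hypothetical and exist only for this example; the idea is to inject the clock rather than read wall-clock time, and to build fresh state per test rather than share it.

```python
from datetime import datetime, timezone
from typing import Callable


def is_expired(expires_at: datetime, now: Callable[[], datetime]) -> bool:
    """Return True once the injected clock reaches or passes expires_at."""
    return now() >= expires_at


def fixed_clock() -> datetime:
    """A frozen clock for tests, used in place of the real current time."""
    return datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)


def make_cart() -> list[str]:
    """Build fresh state for each test instead of sharing a module-level list."""
    return []


def test_token_expiry_is_deterministic() -> None:
    # The result cannot depend on when or where the test runs,
    # because the clock is fixed.
    past = datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc)
    future = datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc)
    assert is_expired(past, fixed_clock)
    assert not is_expired(future, fixed_clock)


def test_cart_starts_empty() -> None:
    # Passes in any position: no other test can have touched this list.
    assert make_cart() == []


def test_cart_add_item() -> None:
    cart = make_cart()
    cart.append("book")
    assert cart == ["book"]
```

Had the cart been a shared module-level list, `test_cart_starts_empty` would pass or fail depending on whether `test_cart_add_item` ran first, which is exactly the order dependence the list above warns against.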
Common Testing Problems in CI/CD
Flaky tests — tests that pass or fail on the same code without any change create noise and erode trust. Developers start ignoring failures, which masks real bugs. Research suggests that 14–16% of test failures in large suites are attributable to test flakiness.
Slow builds — as suites grow, build times creep up. When a full run takes 45 minutes, developers push fewer commits, batch changes, and accumulate more risk per deployment.
Insufficient parallelisation — running tests sequentially when they could run in parallel is a common cause of unnecessarily slow pipelines.
Poor test granularity — a suite that leans too heavily on end-to-end (E2E) tests is slow, brittle, and expensive to maintain. A healthy test suite has many fast unit tests, fewer integration tests, and very few E2E tests (the testing pyramid).
Missing test environments — tests that only run locally, not in CI, miss environment-specific failures. All tests that matter should run in CI.
Structuring Tests for CI/CD
The testing pyramid — the classical model divides tests into three layers: many unit tests (fast, cheap), fewer integration tests (medium), and very few end-to-end tests (slow, expensive). CI/CD pipelines should reflect this pyramid.
Test splitting — divide the suite into fast (unit) and slow (integration/E2E) groups. Run fast tests on every push. Run slow tests on merge to main, or on a scheduled basis.
Parallelisation — use the CI platform's built-in parallelisation to distribute tests across multiple workers. Split tests by their recorded durations so that each worker is loaded evenly; test prioritisation is a separate technique that decides which tests run first.
Caching — cache dependencies and build artefacts between runs. Most CI platforms support this natively.
Quarantine — move known flaky tests to a separate, non-blocking suite while they are being fixed. Do not let known flaky tests block production deployments.
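As a rough illustration of the parallelisation item above, the sketch below balances tests across workers using recorded durations. It is a greedy longest-first heuristic, not the algorithm of any particular CI platform, and the function name `shard_by_duration` and the timing data are assumptions made for the example.

```python
import heapq


def shard_by_duration(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Assign each test, longest first, to the currently least-loaded worker."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    shards: list[list[str]] = [[] for _ in range(workers)]
    # Min-heap of (total_seconds, worker_index): least-loaded worker on top.
    heap = [(0.0, index) for index in range(workers)]
    heapq.heapify(heap)
    # Sort by duration descending, then by name, so the split is reproducible.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


# Hypothetical timings in seconds, e.g. taken from a previous CI run.
timings = {
    "test_checkout": 120.0,
    "test_search": 90.0,
    "test_login": 60.0,
    "test_profile": 30.0,
    "test_utils": 30.0,
}
shards = shard_by_duration(timings, workers=2)
# shards[0] totals 180 s and shards[1] totals 150 s for this data.
```

The heuristic does not guarantee an optimal split, but it is usually close, and the deterministic sort keeps the same test on the same worker from run to run when timings are unchanged.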
Measuring Testing Health
Track these metrics to understand the health of your testing pipeline:
Pass rate — what percentage of builds pass on the first run? Low pass rates indicate either real bugs or high flakiness.
Flake rate — what percentage of test failures are not reproducible on retry? High flake rates indicate systemic unreliability.
Build time — how long does the full pipeline take? Track over time to catch regressions.
Test coverage — what percentage of the codebase is covered by tests? Track critical paths specifically.
Time to feedback — how long does a developer wait to know if their change is good? This is the metric that most directly affects developer productivity.
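A minimal sketch of how the first three metrics could be computed from build records follows. The `BuildRecord` fields are hypothetical; real CI platforms expose this data in their own formats, so treat the structure as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    passed_first_run: bool         # did the build pass without any retry?
    test_failures: int             # test failures seen on the first run
    failures_passed_on_retry: int  # of those, how many passed when retried
    duration_minutes: float        # wall-clock time of the full pipeline


def pass_rate(builds: list[BuildRecord]) -> float:
    """Share of builds that passed on the first run."""
    if not builds:
        return 0.0
    return sum(b.passed_first_run for b in builds) / len(builds)


def flake_rate(builds: list[BuildRecord]) -> float:
    """Share of test failures that did not reproduce on retry."""
    failures = sum(b.test_failures for b in builds)
    if failures == 0:
        return 0.0
    return sum(b.failures_passed_on_retry for b in builds) / failures


def mean_build_minutes(builds: list[BuildRecord]) -> float:
    """Average pipeline duration, useful for spotting build-time creep."""
    if not builds:
        return 0.0
    return sum(b.duration_minutes for b in builds) / len(builds)


# Illustrative data only.
history = [
    BuildRecord(True, 0, 0, 8.0),
    BuildRecord(False, 2, 1, 12.0),
    BuildRecord(False, 2, 2, 10.0),
    BuildRecord(True, 0, 0, 10.0),
]
print(pass_rate(history), flake_rate(history), mean_build_minutes(history))
# prints: 0.5 0.75 10.0
```

Computing these over a rolling window, for example the last 100 builds, makes trends visible without letting old data dominate.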
Frequently asked questions
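What is CI/CD?
CI/CD stands for Continuous Integration and Continuous Delivery (or Deployment). CI is the practice of automatically building and testing every code change. Continuous delivery extends this by keeping every change that passes the CI stage ready to release, often with a manual approval before production; continuous deployment goes one step further and releases those changes automatically.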
Why do tests fail in CI but pass locally?
CI environments differ from local environments in several ways: they may have different operating systems, fewer resources, stricter network policies, or run tests in parallel. Flaky tests that depend on timing, resource availability, or specific orderings are especially prone to failing in CI.
What is the testing pyramid?
The testing pyramid is a model for test suite composition. The base is unit tests — many, fast, cheap. The middle is integration tests — fewer, slower. The top is end-to-end tests — very few, slow, expensive. Healthy CI/CD pipelines reflect this pyramid.
How do I speed up my CI test suite?
Common approaches: parallelise tests across multiple workers, split tests into fast and slow groups (run only fast tests on every push), cache dependencies between runs, use test prioritisation to fail fast, and fix or quarantine flaky tests.
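One simple way to prioritise tests so that failures surface early is to run historically failure-prone tests first and break ties by running quicker tests first. The sketch below assumes that ordering rule; the `RunHistory` record and the choice to run tests with no history first are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunHistory:
    name: str
    recent_runs: int       # how many recent runs were recorded
    recent_failures: int   # how many of those runs failed
    avg_seconds: float     # average duration of the test


def failure_rate(test: RunHistory) -> float:
    """Recent failure rate; tests with no history are treated as highest risk."""
    if test.recent_runs == 0:
        return 1.0
    return test.recent_failures / test.recent_runs


def prioritise(history: list[RunHistory]) -> list[str]:
    """Order tests by failure rate (high first), then duration (short first)."""
    ranked = sorted(
        history,
        key=lambda t: (-failure_rate(t), t.avg_seconds, t.name),
    )
    return [t.name for t in ranked]


# Illustrative data only.
order = prioritise([
    RunHistory("test_payment", 20, 4, 30.0),
    RunHistory("test_login", 20, 4, 5.0),
    RunHistory("test_utils", 20, 0, 1.0),
    RunHistory("test_new_feature", 0, 0, 0.0),
])
# order == ["test_new_feature", "test_login", "test_payment", "test_utils"]
```

This ordering only pays off when tests are independent of execution order; an order-dependent suite may behave differently once it is reordered.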
What is a test quarantine?
A test quarantine is a separate test suite where known flaky tests are moved while they await fixing. Quarantined tests run separately and do not block the main CI build. This prevents flaky tests from blocking deployments while ensuring they are not forgotten.
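The gating logic behind a quarantine can be sketched in a few lines. The function `gate_build` and its inputs are hypothetical; in practice the quarantine list usually lives in test markers or a configuration file, and the CI platform supplies the results.

```python
def gate_build(
    results: dict[str, bool],
    quarantined: set[str],
) -> tuple[bool, list[str], list[str]]:
    """Return (build_passes, blocking_failures, quarantined_failures).

    results maps a test name to True (passed) or False (failed).
    Failures of quarantined tests are reported but never block the build.
    """
    failed = [name for name, passed in results.items() if not passed]
    blocking = sorted(name for name in failed if name not in quarantined)
    reported = sorted(name for name in failed if name in quarantined)
    return (not blocking, blocking, reported)


# Illustrative data only.
outcome = gate_build(
    {"test_login": True, "test_checkout": False, "test_search": True},
    quarantined={"test_checkout"},
)
# outcome == (True, [], ["test_checkout"])
```

Returning the quarantined failures separately, rather than discarding them, keeps them visible in reports so that the tests are eventually fixed and moved back into the blocking suite.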