# QA Advisor — Delivery Health Reference

*Part of the QA Advisor skill: https://wavect.io/.well-known/agent-skills/qa-advisor/SKILL.md*

DORA metrics as a quality signal, and shift-left testing (where each test type belongs in the pipeline).

## DORA Metrics — Delivery Health as a Quality Signal

DORA (DevOps Research and Assessment) metrics measure delivery pipeline health.
They are tightly correlated with software reliability and quality. A team with
poor DORA metrics is a team that cannot safely change their system.

### The Four Metrics

**Deployment Frequency** — How often do you deploy to production?
- Elite: multiple times per day
- High: once per day to once per week
- Medium: once per week to once per month
- Low: less than once per month

Low deployment frequency correlates with large batch sizes, which correlate
with high-risk deployments, which correlate with more production incidents.
If a team cannot deploy daily, the test suite is often part of the reason: it
is too slow, too flaky, or requires too much manual verification.
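
A minimal sketch of checking the tier from CI/CD logs, assuming you can export production deploy timestamps as `datetime` values. The 90-day trailing window, the use of an average daily rate, and approximating a month as 30 days are choices made here for illustration, not part of the DORA definition.

```python
from datetime import datetime, timedelta


def deployment_frequency_tier(deploys: list[datetime], now: datetime,
                              window_days: int = 90) -> str:
    """Map the average deploy rate over a trailing window to the tiers above."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    start = now - timedelta(days=window_days)
    recent = [t for t in deploys if start <= t <= now]
    per_day = len(recent) / window_days
    if per_day > 1:
        return "Elite"   # more than one deploy per day on average
    if per_day >= 1 / 7:
        return "High"    # between once per day and once per week
    if per_day >= 1 / 30:
        return "Medium"  # between once per week and once per month
    return "Low"         # less than once per month
```

An average hides bursty teams that ship ten times on Monday and never again; if that matters, count distinct deploy days instead.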

**Lead Time for Changes** — From commit to production: how long?
- Elite: less than one hour
- High: one hour to one day
- Medium: one day to one week
- Low: more than one week

Long lead time means changes are batched, and batched changes fail together:
one bad change delays or rolls back every other change in the same release.
If CI takes 40 minutes, deploys are manual, and a staging environment requires
human sign-off, lead time is measured in days, not hours.
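
Lead time can be computed in the same way once each change is paired with the deploy that shipped it. This sketch assumes you already have `(commit_time, deploy_time)` pairs (for example by joining VCS history with deploy logs, which is not shown). It reports the median because a few stuck changes would distort a mean. Tier boundaries follow the list above, and the tier that an exact boundary value falls into is an arbitrary choice here.

```python
from datetime import datetime, timedelta
from statistics import median

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)
WEEK = timedelta(weeks=1)


def lead_time_p50(changes: list[tuple[datetime, datetime]]) -> tuple[timedelta, str]:
    """Median commit-to-production time and its tier from the list above.

    Each item is (commit_time, deployed_to_production_time) for one change.
    """
    if not changes:
        raise ValueError("need at least one deployed change")
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    p50 = timedelta(seconds=median(seconds))
    if p50 < HOUR:
        tier = "Elite"
    elif p50 <= DAY:
        tier = "High"
    elif p50 <= WEEK:
        tier = "Medium"
    else:
        tier = "Low"
    return p50, tier
```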

**Mean Time to Restore (MTTR)** — When a production incident occurs, how long
to restore service?
- Elite: less than one hour
- High: less than one day
- Low: more than one day

MTTR is primarily a function of observability (can you find the cause?) and
deployment speed (can you ship the fix quickly?). If MTTR is high, the testing
strategy likely does not include rollback testing or feature flag testing.
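
MTTR itself is simple arithmetic over incident records. A minimal sketch, assuming incidents are available as hypothetical `(started_at, restored_at)` pairs. It averages the five most recent incidents, which matches the audit question later in this section.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]],
                         last_n: int = 5) -> timedelta:
    """Mean restore time over the `last_n` most recent incidents.

    Each incident is (started_at, restored_at). "Started" could mean detection
    time or impact start; pick one definition and use it consistently.
    """
    if not incidents:
        raise ValueError("no incidents to average")
    # Sort by start time so "most recent" does not depend on input order.
    recent = sorted(incidents, key=lambda pair: pair[0])[-last_n:]
    total = sum((restored - started for started, restored in recent), timedelta())
    return total / len(recent)
```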

**Change Failure Rate** — What percentage of production deployments cause a
production incident?
- Elite: 0–5%
- High: 5–15%
- Medium: 15–30%
- Low: more than 30%

High change failure rate is the most direct evidence that the test strategy
is failing to catch real bugs before production.
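
The following sketch computes change failure rate over a trailing 90-day window. It assumes each deploy record carries a hypothetical `caused_incident` flag; in practice, that link comes from your incident tracker. An empty window returns `None` rather than 0%, because having no deploys is not the same as having no failures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    caused_incident: bool  # assumption: set by linking incidents back to the deploy


def change_failure_rate(deployments: list[Deployment], now: datetime,
                        window_days: int = 90) -> Optional[float]:
    """Fraction of production deploys in the trailing window that caused an incident."""
    start = now - timedelta(days=window_days)
    recent = [d for d in deployments if start <= d.deployed_at <= now]
    if not recent:
        return None  # no deploys: the rate is undefined, not 0%
    return sum(d.caused_incident for d in recent) / len(recent)


def change_failure_tier(rate: float) -> str:
    """Bands from the list above (shared boundaries go to the better tier)."""
    if rate <= 0.05:
        return "Elite"
    if rate <= 0.15:
        return "High"
    if rate <= 0.30:
        return "Medium"
    return "Low"
```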

**The DORA audit questions:**
- What is the current deployment frequency? Can it be verified from CI/CD logs?
- What is the P50 (median) lead time from commit to production?
- What was the MTTR for the last 5 production incidents?
- What is the change failure rate over the last 90 days?
- Is there a feature flag system? Are flags used to decouple deploy from release?
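
The last question is easier to assess with a picture of what decoupling looks like in code. This is a minimal sketch built on a hypothetical in-memory `FeatureFlags` class, not any particular vendor's API. New code ships to production behind a flag that defaults to off, and both release and rollback happen by changing the flag rather than by deploying. Percentage rollout uses a stable hash, so a given user keeps the same answer as the percentage rises.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only.

    Real systems usually read flags from a flag service or config at runtime,
    so changing a flag does not require a deploy.
    """

    def __init__(self) -> None:
        self._rollout_percent: dict[str, int] = {}

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_percent[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout_percent.get(flag, 0)  # unknown flags stay off
        # Hash flag + user so each user lands in a stable bucket 0..99 per flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


flags = FeatureFlags()
assert not flags.is_enabled("new_checkout", "user-42")  # deployed, not released
flags.set_rollout("new_checkout", 10)  # release to roughly 10% of users
flags.set_rollout("new_checkout", 0)   # roll back without a redeploy
```

Flag-based rollback is also what makes the MTTR point above testable: "turn the flag off" is a scenario that can be exercised before an incident.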

---

## Shift-Left Testing — Where Each Test Type Belongs

"Shift-left" means moving testing earlier in the development pipeline. The
later a bug is found, the more expensive it is to fix, and the cost grows
roughly exponentially with each stage the bug survives.

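**The cost multiplier (widely cited figures, often attributed to NIST or IBM; the exact ratios vary by source, so treat them as order-of-magnitude guidance):**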
| Phase found | Relative cost |
|---|---|
| During design / requirements | 1× |
| During coding | 6× |
| During integration testing | 15× |
| During system testing | 40× |
| In production | 100× |
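
To see what the table implies, here is the arithmetic for ten defects. It uses the table's multipliers as illustrative weights, and the phase keys are names chosen for this sketch.

```python
# Multipliers copied from the table above; illustrative, not measured constants.
RELATIVE_COST = {
    "design": 1,
    "coding": 6,
    "integration": 15,
    "system": 40,
    "production": 100,
}


def relative_fix_cost(defects_found: dict[str, int]) -> int:
    """Total relative cost of fixing defects, given how many were found in each phase."""
    return sum(RELATIVE_COST[phase] * count for phase, count in defects_found.items())


all_in_production = relative_fix_cost({"production": 10})
mostly_shifted_left = relative_fix_cost({"coding": 8, "production": 2})
```

Catching all ten in production costs 1,000 units. Catching eight during coding and letting two escape costs 8 × 6 + 2 × 100 = 248, about a quarter of that, and the two escapes still dominate the total.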

**The pipeline map — each stage and what should run:**

```
┌─ Developer's machine (pre-commit hook) ──────────────────────────────┐
│  • Type checking (tsc --noEmit / mypy / cargo check)                 │
│  • Linting (eslint / ruff / clippy)                                  │
│  • Unit tests (< 30 seconds)                                         │
└──────────────────────────────────────────────────────────────────────┘
                    ↓
┌─ PR pipeline (every commit to a branch) ─────────────────────────────┐
│  • All of above + full unit test suite                               │
│  • Dependency vulnerability scan (npm audit / pip-audit)             │
│  • SAST (Semgrep / CodeQL / SonarQube)                               │
│  • Integration tests against real services (Docker Compose)          │
│  • Contract tests (Pact provider verification)                       │
│  • Coverage enforcement (fail if below threshold)                    │
└──────────────────────────────────────────────────────────────────────┘
                    ↓
┌─ Merge to main ──────────────────────────────────────────────────────┐
│  • All of above + E2E tests (Playwright / Cypress on staging)        │
│  • Performance regression test (k6 baseline comparison)              │
│  • Visual regression (Percy / Chromatic)                             │
│  • DAST (OWASP ZAP against staging endpoint)                         │
└──────────────────────────────────────────────────────────────────────┘
                    ↓
┌─ Production deploy ──────────────────────────────────────────────────┐
│  • Smoke tests (critical path verification post-deploy)              │
│  • Synthetic monitoring (every 5 minutes, canary region first)       │
│  • Rollback trigger if error rate > threshold within 10 minutes      │
└──────────────────────────────────────────────────────────────────────┘
```
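
The first box can be as small as one script. This is a minimal sketch of a Git pre-commit hook that runs the checks in order and stops at the first failure. It assumes a Python repo using mypy, ruff and pytest; the `tests/unit` path is hypothetical. Applying the 30-second budget to each check is a choice made here. Git aborts the commit when the hook exits nonzero.

```python
import subprocess
import sys

# Assumed toolchain for a Python repo; swap in tsc/eslint or cargo check/clippy
# for other stacks. "tests/unit" is a hypothetical path.
CHECKS = [
    ("type check", ["mypy", "."]),
    ("lint", ["ruff", "check", "."]),
    ("unit tests", ["pytest", "-q", "-x", "tests/unit"]),
]
TIME_BUDGET_SECONDS = 30  # the "< 30 seconds" budget, applied to each check here


def run_checks(checks, run=subprocess.run) -> int:
    """Run checks in order and stop at the first failure (fail fast).

    Returns an exit code: 0 lets the commit proceed, anything else blocks it.
    `run` is injectable so the logic can be tested without the real tools.
    """
    for name, cmd in checks:
        try:
            result = run(cmd, timeout=TIME_BUDGET_SECONDS)
        except FileNotFoundError:
            print(f"pre-commit: {name}: '{cmd[0]}' is not installed", file=sys.stderr)
            return 1
        except subprocess.TimeoutExpired:
            print(f"pre-commit: {name} exceeded {TIME_BUDGET_SECONDS}s", file=sys.stderr)
            return 1
        if result.returncode != 0:
            print(f"pre-commit: {name} failed; commit aborted", file=sys.stderr)
            return result.returncode
    return 0


def main() -> int:
    return run_checks(CHECKS)
```

To use this as the hook, save it as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` line at the top, make it executable and end it with `sys.exit(main())`. The same checks still need to run in the PR pipeline, because hooks are local and `git commit --no-verify` skips them.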

**What most pipelines are missing:**
- Pre-commit hooks (if tests run only in CI, every feedback loop is a full CI round-trip, often 15 minutes or more)
- Dependency vulnerability scanning (added only after a breach)
- Contract tests (added only after a breaking API change hits production)
- Performance regression (added only after a slow release ships)
- Rollback automation (added only after a bad release stayed up too long)
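
The rollback trigger in the production-deploy stage comes down to a small decision function that can be unit-tested like any other code. This minimal sketch assumes your monitoring exposes per-interval request and error counts, and the `MetricSample` shape is hypothetical. The 2% threshold and 100-request minimum are placeholder values, and the actual revert is left to your deploy tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MetricSample:
    at: datetime
    requests: int
    errors: int


def should_roll_back(deployed_at: datetime, samples: list[MetricSample],
                     threshold: float = 0.02,
                     window: timedelta = timedelta(minutes=10),
                     min_requests: int = 100) -> bool:
    """True if the error rate in the post-deploy window exceeds `threshold`.

    threshold and min_requests are hypothetical defaults; tune them per service.
    """
    in_window = [s for s in samples if deployed_at <= s.at <= deployed_at + window]
    requests = sum(s.requests for s in in_window)
    errors = sum(s.errors for s in in_window)
    if requests < min_requests:
        return False  # too little traffic to judge; keep watching instead
    return errors / requests > threshold
```

The minimum-traffic guard keeps a single failed request at 3 a.m. from reverting a healthy release.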
