# QA Advisor — Reliability and Scale Reference

*Part of the QA Advisor skill: https://wavect.io/.well-known/agent-skills/qa-advisor/SKILL.md*

Database testing, chaos engineering, and load testing.

## Database Testing — The Most Underspecified Area

Most teams fall into one of two traps: they mock the database entirely (so tests
pass but real queries are never verified), or they write integration tests that
share state (so tests are order-dependent and fail intermittently).

### Transaction Rollback Testing

The correct pattern for database integration tests: wrap each test in a
transaction and roll it back. No cleanup needed. No state leakage.

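```python
# Django pattern (assumes pytest-django, which provides the `db` fixture).
# SQLAlchemy has no transaction.atomic(); the equivalent there is a session bound
# to a connection whose outer transaction is rolled back after each test.
import pytest
from django.db import transaction

@pytest.fixture(autouse=True)
def db_transaction(db):
    with transaction.atomic():
        yield
        # Mark the atomic block for rollback so nothing the test wrote is committed
        transaction.set_rollback(True)
```

```typescript
// TypeORM / Node.js pattern; queryRunner is assumed to be a connected QueryRunner.
// Returning the promise makes the test runner wait for each call to finish.
beforeEach(() => queryRunner.startTransaction());
afterEach(() => queryRunner.rollbackTransaction());
```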

**What to test at the database level (not mockable):**
- ORM query correctness: does the query return the right rows?
- Index usage: does the query hit an index or do a full table scan?
  Use `EXPLAIN ANALYZE` in tests that touch large-ish datasets.
- Constraint enforcement: unique constraints, foreign keys, not-null — these
  cannot be tested with mocked repositories
- Migration correctness: does the migration produce the exact schema expected?
  Run migrations in CI against a real database, not against a mock schema
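
As a rough illustration of the constraint and index checks above, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a real database. The schema, the `idx_orders_user_id` index, and the helper names are hypothetical. SQLite exposes `EXPLAIN QUERY PLAN` rather than PostgreSQL's `EXPLAIN ANALYZE`, so the plan format shown here is SQLite-specific.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a small, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            status TEXT NOT NULL
        );
        CREATE INDEX idx_orders_user_id ON orders(user_id);
    """)
    return conn

def violates_constraint(conn, sql, params=()):
    """Return True if the database rejects the statement with a constraint error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the plan detail text

conn = make_db()
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")

# Constraint enforcement: only a real database can reject these rows.
duplicate_rejected = violates_constraint(
    conn, "INSERT INTO users (email) VALUES ('a@example.com')")
orphan_rejected = violates_constraint(
    conn, "INSERT INTO orders (user_id, status) VALUES (999, 'new')")

# Index usage: the indexed filter searches via the index, the other scans the table.
indexed_plan = query_plan(conn, "SELECT * FROM orders WHERE user_id = ?", (1,))
unindexed_plan = query_plan(conn, "SELECT * FROM orders WHERE status = ?", ("new",))
```

A mocked repository would accept both rejected inserts without complaint, which is why these assertions belong in integration tests.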

### Migration Testing

```bash
# CI step: verify migrations apply cleanly and are reversible
# Apply all migrations
alembic upgrade head
# Revert all migrations
alembic downgrade base
# Apply again: if this fails, a downgrade left schema objects behind
alembic upgrade head
```

The most dangerous migration bugs: adding a NOT NULL column without a default
to a table with existing rows, and non-reversible data migrations. Both pass
against an empty test database and stay invisible until they cause a production
deployment failure.
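
One way to catch the NOT NULL case early is to run the migration against a database seeded with rows as well as against an empty one. The sketch below assumes a hypothetical `migrate_add_email` migration and uses SQLite's table-rebuild style so that it runs with the standard library alone. The exact failure mode differs between database engines, so treat it as an illustration of the test shape rather than of any engine's behavior.

```python
import sqlite3

def migrate_add_email(conn):
    """Hypothetical migration: rebuild `users` with a NOT NULL `email` column
    that has no default, copying the existing rows across."""
    conn.executescript("""
        CREATE TABLE users_new (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT NOT NULL
        );
        INSERT INTO users_new (id, name) SELECT id, name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
    """)

def migration_succeeds(seed_names):
    """Apply the migration to a fresh database seeded with the given user names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [(n,) for n in seed_names])
    conn.commit()
    try:
        migrate_add_email(conn)
        return True
    except sqlite3.IntegrityError:
        # Existing rows have no value for the new NOT NULL column
        return False
    finally:
        conn.close()

empty_ok = migration_succeeds([])            # empty table: migration applies
seeded_ok = migration_succeeds(["alice"])    # existing row: migration is rejected
```

The empty-database run is what most CI pipelines exercise; the seeded run is the one that resembles production.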

### Deadlock and Race Condition Testing

```python
# Test concurrent writes for deadlock risk.
# inventory_service and DeadlockError are application-specific: substitute your
# own service and the deadlock exception raised by your database driver.
import threading

def test_concurrent_inventory_deduction():
    errors = []

    def deduct():
        try:
            inventory_service.deduct(product_id='SKU-1', qty=1)
        except Exception as e:
            errors.append(e)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=deduct) for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert not any(isinstance(e, DeadlockError) for e in errors)
    final_stock = inventory_service.get_stock('SKU-1')
    assert final_stock >= 0  # inventory must not go negative
```

---

## Chaos Engineering — Testing Failure, Not Just Success

Most test suites verify that the system works correctly when dependencies
cooperate. Chaos engineering verifies that the system degrades gracefully
when they do not.

**The Principles of Chaos Engineering (which originated at Netflix) applied at codebase level:**

1. Define the "steady state" — the observable behavior that indicates the system
   is healthy (e.g., orders are processed, error rate < 0.1%, P99 < 500ms)
2. Hypothesize that the steady state holds during a failure
3. Introduce the failure in a controlled way
4. Observe whether the steady state was maintained
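
The four steps above can be sketched as a small in-process experiment. Everything here is hypothetical: `PriceService`, `FlakyBackend`, and the 0.1% error-rate threshold are stand-ins chosen for illustration, and a real experiment would inject the failure at the infrastructure level rather than by flipping a flag.

```python
class FlakyBackend:
    """Hypothetical dependency whose availability the experiment can toggle."""
    def __init__(self):
        self.down = False

    def fetch(self, sku):
        if self.down:
            raise ConnectionError("backend unavailable")
        return 100

class PriceService:
    """Hypothetical service that serves a cached value when the backend fails."""
    def __init__(self, fetch_price):
        self._fetch_price = fetch_price
        self._cache = {}

    def get_price(self, sku):
        try:
            price = self._fetch_price(sku)
            self._cache[sku] = price
            return price
        except ConnectionError:
            if sku in self._cache:
                return self._cache[sku]  # degrade gracefully: stale but usable
            raise

def error_rate(service, skus):
    """Step 1: the steady-state metric, here the fraction of failed requests."""
    failures = 0
    for sku in skus:
        try:
            service.get_price(sku)
        except Exception:
            failures += 1
    return failures / len(skus)

def run_experiment(service, skus, inject_failure, max_error_rate=0.001):
    """Steps 2-4: measure a baseline, inject the failure, measure again."""
    baseline = error_rate(service, skus)
    inject_failure()
    during_failure = error_rate(service, skus)
    return {
        "baseline": baseline,
        "during_failure": during_failure,
        "steady_state_held": during_failure <= max_error_rate,
    }

backend = FlakyBackend()
service = PriceService(backend.fetch)
result = run_experiment(service, ["SKU-1", "SKU-2"],
                        inject_failure=lambda: setattr(backend, "down", True))
```

Here the hypothesis holds because the baseline run warms the cache. A service without that fallback would report an error rate of 1.0 during the failure, which is the signal the experiment exists to surface.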

**Failure modes to test:**

| Failure | How to test | What correct behavior looks like |
|---|---|---|
| Dependency unavailable (DB down) | `docker stop postgres` during test run | Service returns 503, circuit breaker opens |
| Slow dependency | Add artificial latency (Toxiproxy) | Timeout triggered, retry with backoff |
| Partial response (truncated) | Fault injection at HTTP layer | Error surfaced, no data corruption |
| Message queue full | Fill queue to capacity | Producer applies backpressure, does not crash |
| Disk full | Fill disk to 100% | Graceful shutdown, no data corruption |
| Clock skew | Advance system clock 1 hour | JWT expiry validated correctly, caches invalidated |

**Toxiproxy** (Shopify) is the most practical tool for introducing network-level
faults in integration tests without requiring a real network failure.
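
For the "Slow dependency" row, the behavior under test is a timeout followed by retries with backoff. The sketch below simulates the slow dependency in-process with a hypothetical `SlowThenFast` stub instead of a network proxy, and injects the `sleep` function so the test does not actually wait. `call_with_retry` is an illustrative helper, not part of any library.

```python
import time

def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TimeoutError, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...

class SlowThenFast:
    """Hypothetical dependency that times out a fixed number of times, then succeeds."""
    def __init__(self, slow_calls):
        self.remaining = slow_calls
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("dependency too slow")
        return "ok"

delays = []                      # records requested backoff delays instead of sleeping
dependency = SlowThenFast(slow_calls=2)
outcome = call_with_retry(dependency, sleep=delays.append)
```

Recording the delays lets a test assert on the backoff schedule itself, which a wall-clock measurement cannot do reliably.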

**Circuit breaker testing:**
```typescript
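it('should open circuit breaker after 5 consecutive failures', async () => {
  // Stub that always fails (assumes Jest). PaymentGatewayWithCircuitBreaker and
  // CircuitOpenError are application code and assumed to exist.
  const brokenGateway = { charge: jest.fn().mockRejectedValue(new Error('gateway down')) };
  const gateway = new PaymentGatewayWithCircuitBreaker(brokenGateway, {
    threshold: 5, resetTimeout: 30_000
  });
  for (let i = 0; i < 5; i++) {
    await expect(gateway.charge(100)).rejects.toThrow();
  }
  // 6th call must fail-fast without calling the broken gateway
  await expect(gateway.charge(100)).rejects.toThrow(CircuitOpenError);
  // Asserting on call count is more reliable in CI than a wall-clock threshold
  expect(brokenGateway.charge).toHaveBeenCalledTimes(5);
});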
```
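
For readers who want to see what such a test exercises, here is a minimal circuit breaker sketch in Python. It is an assumption-laden illustration, not the implementation behind `PaymentGatewayWithCircuitBreaker`: the class name, parameters, and half-open handling are all hypothetical, and the clock is injectable so tests can advance time without waiting.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, call, threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self._call = call
        self._threshold = threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def __call__(self, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: reject immediately without touching the dependency
                raise CircuitOpenError("circuit open, call rejected")
            # Reset timeout elapsed: half-open, let one trial call through
        try:
            result = self._call(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count
        self._failures = 0
        self._opened_at = None
        return result

# Usage with a fake clock and a dependency that always fails
now = [0.0]
calls = []

def broken_charge(amount):
    calls.append(amount)
    raise RuntimeError("gateway down")

breaker = CircuitBreaker(broken_charge, threshold=5, reset_timeout=30.0,
                         clock=lambda: now[0])
```

Counting calls to the wrapped function is what proves the fast-fail path never reached the dependency.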

---

## Load Testing — Methodology and Interpretation

Load tests are frequently run but rarely interpreted correctly. A load test
that does not stress the actual bottleneck of the system tells you nothing.

**The correct ramp pattern (k6):**

```javascript
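// k6 load test options: warm up, ramp to peak, sustain, spike, recover, ramp down.
// A complete k6 script also exports a default function that issues the requests.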
export const options = {
  stages: [
    { duration: '2m', target: 50 },   // warm-up
    { duration: '5m', target: 200 },  // ramp to expected peak load
    { duration: '10m', target: 200 }, // sustain — look for memory leaks
    { duration: '2m', target: 500 },  // spike — burst above peak
    { duration: '5m', target: 200 },  // recover — system must recover
    { duration: '2m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'], // P95 < 500ms, P99 < 1.5s
    http_req_failed: ['rate<0.01'],                  // error rate < 1%
  },
};
```

**The six load patterns and what each reveals:**

| Pattern | Reveals |
|---|---|
| **Ramp test** | At what load does performance degrade? |
| **Spike test** | Does the system recover after sudden burst? |
| **Soak test** (24h constant load) | Memory leaks, connection pool exhaustion, log file growth |
| **Stress test** (beyond peak) | Where does it break? Graceful or cascading? |
| **Breakpoint test** | Exact breaking point — scales linearly or exponentially? |
| **Capacity test** | Maximum throughput the system can sustain |

**The most common load test mistake:** Testing a single endpoint in isolation.
Real load tests must reflect production traffic patterns (mix of reads, writes,
searches) because bottlenecks often emerge from the interaction between
concurrent operations, not from any single one.
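
A traffic mix can be modeled as weighted random selection over operation types. The sketch below is in Python for illustration only; the 70/20/10 split in `TRAFFIC_MIX` is a made-up example, and real weights should come from production access logs. In a load tool the same idea would drive which request each virtual user sends next.

```python
import random
from collections import Counter

# Hypothetical production mix; derive real weights from access logs.
TRAFFIC_MIX = {"read": 0.70, "search": 0.20, "write": 0.10}

def pick_operations(n, mix=TRAFFIC_MIX, seed=None):
    """Draw n operation names at random, weighted by their share of traffic."""
    rng = random.Random(seed)  # seedable so a run can be reproduced
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

operations = pick_operations(10_000, seed=1)
counts = Counter(operations)
```

Interleaving the operations randomly, rather than running all reads and then all writes, is what allows lock contention and cache invalidation effects to appear.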

**Reading P95 / P99:**
P99 = 1500ms means 99% of requests complete within 1.5 seconds and 1% take
longer. At 1 million requests per day, that is 10,000 requests per day with
unacceptable latency. P99 describes the experience of your slowest 1% of
requests, so define your SLA on P95/P99 rather than on the average, which hides
the tail.
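
To make the arithmetic concrete, the sketch below computes percentiles with the nearest-rank method and compares them with the mean. The sample latencies are invented for illustration, and load tools may use a different interpolation method, so their reported values can differ slightly.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of all
    samples at or below it. p is an integer percentage such as 95 or 99."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def requests_beyond_percentile(daily_requests, p):
    """Number of daily requests that fall beyond the p-th percentile."""
    return daily_requests * (100 - p) // 100

# Invented sample: 98 fast requests and 2 slow ones, in milliseconds
latencies_ms = [100] * 98 + [5000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)   # 198.0: looks healthy
p50_ms = percentile(latencies_ms, 50)             # 100
p99_ms = percentile(latencies_ms, 99)             # 5000: the tail the mean hides

slow_per_day = requests_beyond_percentile(1_000_000, 99)  # 10000
```

The mean of 198 ms would pass most dashboards, while P99 shows that some users wait five seconds.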