Testing and Quality

Mar 18, 2026
Computer Science

I used to think testing was something you did after writing code—a checkbox on the PR template. Then I shipped a "minor refactor" that broke checkout for six hours because the one scenario I didn't test was the one customers hit. After that, testing stopped being a chore and started being a design tool. The test isn't proving your code works. It's defining what "works" means before you have to explain it in a post-mortem.

This post covers the testing strategies that actually prevent bugs in production, the tradeoffs between different testing levels, and the practices that keep code quality high over time.


The Testing Pyramid

The testing pyramid is a model for how to allocate your testing effort. Many fast, cheap tests at the bottom. Fewer slow, expensive tests at the top.

        /  E2E  \          Few — slow, brittle, high confidence
       /----------\
      / Integration \      Some — moderate speed, real boundaries
     /----------------\
    /    Unit Tests     \  Many — fast, isolated, focused
   /---------------------\

Unit tests verify individual functions or components in isolation. They're fast (milliseconds), cheap to write, and easy to debug when they fail.

Integration tests verify that multiple units work together correctly—your API route calls the database, your component renders with real data, your auth middleware correctly blocks unauthorized requests.

End-to-end (E2E) tests verify the entire system from the user's perspective—open a browser, click buttons, fill forms, assert results. They're slow, flaky, and expensive, but they catch bugs that no other level can.

The Practical Distribution

The pyramid is a guideline, not a religion. For a typical web application:

  • 70% unit tests: Pure functions, business logic, data transformations, utility functions.
  • 20% integration tests: API endpoints, database queries, component rendering with context.
  • 10% E2E tests: Critical user flows—signup, checkout, payment, core feature.

The mistake I see most often: teams skip unit and integration tests and write only E2E tests. The test suite takes 20 minutes to run, is constantly flaky, and nobody trusts it. Invert the pyramid at your own risk.


Unit Tests

A unit test verifies a single unit of behavior in isolation. The unit might be a function, a class method, or a React component.

// The function
function calculateDiscount(
  price: number,
  tier: "basic" | "premium" | "vip"
): number {
  if (tier === "vip") return price * 0.7
  if (tier === "premium") return price * 0.85
  return price
}
 
// The tests
import { describe, it, expect } from "vitest"
 
describe("calculateDiscount", () => {
  it("applies 30% discount for VIP tier", () => {
    expect(calculateDiscount(100, "vip")).toBe(70)
  })
 
  it("applies 15% discount for premium tier", () => {
    expect(calculateDiscount(100, "premium")).toBe(85)
  })
 
  it("applies no discount for basic tier", () => {
    expect(calculateDiscount(100, "basic")).toBe(100)
  })
 
  it("handles zero price", () => {
    expect(calculateDiscount(0, "vip")).toBe(0)
  })
})

What makes a good unit test:

  1. Fast. Milliseconds, not seconds. If your unit tests hit the network or database, they're integration tests.
  2. Isolated. No shared state between tests. Each test sets up its own data and cleans up after itself.
  3. Deterministic. Same input, same output, every time. No randomness, no timestamps, no network calls.
  4. Focused. Tests one behavior. When it fails, you know exactly what broke.
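
Determinism usually comes down to injecting your sources of nondeterminism rather than eliminating them. A minimal sketch, where the `Clock` type and `isExpired` are hypothetical names for illustration:

```typescript
// Sketch: make a time-dependent function deterministic by injecting the clock.
type Clock = () => number

function isExpired(expiresAt: number, now: Clock = Date.now): boolean {
  return expiresAt <= now()
}

// In a test, pass a fixed clock: same result on every run, no flaky
// assertions that depend on when CI happens to execute.
const fixedNow: Clock = () => 1_700_000_000_000
```

Production callers never notice the extra parameter because of the default; tests get full control.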

Testing React Components

import { render, screen } from "@testing-library/react"
 
function Greeting({ name }: { name: string }) {
  return <h1>Hello, {name}!</h1>
}
 
it("renders the greeting with the given name", () => {
  render(<Greeting name="Alice" />)
  expect(screen.getByRole("heading")).toHaveTextContent("Hello, Alice!")
})

Test behavior, not implementation. Don't assert that useState was called with a specific value. Assert that the user sees the right thing on screen.


Integration Tests

Integration tests verify that components work together across boundaries—your code talking to a real (or realistic) database, your API endpoint handling a full request-response cycle, your component rendering with actual context providers.

import { describe, it, expect, beforeAll, afterAll } from "vitest"
import { UserRepository } from "./user-repository" // wherever UserRepository lives
import { createTestDatabase, destroyTestDatabase, type TestDatabase } from "./test-helpers"
 
describe("UserRepository", () => {
  let db: TestDatabase
 
  beforeAll(async () => {
    db = await createTestDatabase()
    await db.migrate()
  })
 
  afterAll(async () => {
    await destroyTestDatabase(db)
  })
 
  it("creates and retrieves a user", async () => {
    const repo = new UserRepository(db)
 
    await repo.save({ id: "1", name: "Alice", email: "alice@test.com" })
    const user = await repo.findById("1")
 
    expect(user).toEqual({
      id: "1",
      name: "Alice",
      email: "alice@test.com",
    })
  })
 
  it("returns null for non-existent user", async () => {
    const repo = new UserRepository(db)
    const user = await repo.findById("nonexistent")
    expect(user).toBeNull()
  })
})

Integration tests are slower than unit tests but catch bugs that unit tests can't: SQL syntax errors, missing database indexes, incorrect join logic, serialization issues.

API Integration Tests

import { describe, it, expect } from "vitest"
 
describe("POST /api/orders", () => {
  it("creates an order for authenticated users", async () => {
    const response = await fetch("http://localhost:3000/api/orders", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${testToken}`,
      },
      body: JSON.stringify({ items: [{ id: "prod_1", quantity: 2 }] }),
    })
 
    expect(response.status).toBe(201)
    const order = await response.json()
    expect(order.items).toHaveLength(1)
    expect(order.status).toBe("pending")
  })
 
  it("returns 401 for unauthenticated requests", async () => {
    const response = await fetch("http://localhost:3000/api/orders", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ items: [] }),
    })
 
    expect(response.status).toBe(401)
  })
})

End-to-End Tests

E2E tests automate a real browser and interact with your application as a user would. They're the ultimate confidence test—if the E2E passes, the feature works for real users.

import { test, expect } from "@playwright/test"
 
test("user can sign up and see their dashboard", async ({ page }) => {
  await page.goto("/signup")
 
  await page.fill('[name="email"]', "test@example.com")
  await page.fill('[name="password"]', "SecurePass123!")
  await page.click('button[type="submit"]')
 
  await expect(page).toHaveURL("/dashboard")
  await expect(page.getByRole("heading")).toContainText("Welcome")
})
 
test("user can add item to cart and checkout", async ({ page }) => {
  await page.goto("/products")
 
  await page.click('[data-testid="product-1"] button')
  await page.click('[data-testid="cart-icon"]')
 
  await expect(page.getByTestId("cart-count")).toHaveText("1")
 
  await page.click("text=Checkout")
  await expect(page).toHaveURL("/checkout")
})

E2E tradeoffs:

  • Pros: Highest confidence. Catches CSS, JavaScript, API, and database issues together.
  • Cons: Slow (seconds to minutes per test), flaky (network timeouts, animation timing), expensive to maintain (UI changes break selectors).

Best practice: Only write E2E tests for critical user journeys—the paths that, if broken, would cost you revenue or users. Use stable selectors (data-testid, roles) rather than CSS classes or text content.


Test Doubles: Mocks, Stubs, and Fakes

When a unit test needs to isolate from a dependency (database, API, file system), you replace the real dependency with a test double.

Stub: Returns a predetermined response. Doesn't verify how it was called.

const stubUserRepo = {
  findById: async (id: string) => ({
    id,
    name: "Alice",
    email: "alice@test.com",
  }),
}

Mock: Records interactions and asserts that specific calls were made.

import { vi } from "vitest"
 
const mockEmailService = {
  sendWelcomeEmail: vi.fn(),
}
 
await userService.createUser(data)
expect(mockEmailService.sendWelcomeEmail).toHaveBeenCalledWith("alice@test.com")

Fake: A working implementation that's simpler than the real thing.

class InMemoryUserRepository implements UserRepository {
  private users = new Map<string, User>()
 
  async save(user: User) {
    this.users.set(user.id, user)
  }
  async findById(id: string) {
    return this.users.get(id) ?? null
  }
  async findByEmail(email: string) {
    return [...this.users.values()].find((u) => u.email === email) ?? null
  }
}

When to use which:

  • Stubs when you just need to control the return value.
  • Mocks when you need to verify that a side effect happened (email sent, event emitted).
  • Fakes when you need realistic behavior without real infrastructure (in-memory databases, fake HTTP servers).

The over-mocking trap: Tests that mock everything test nothing. If your unit test mocks the function under test's only dependency and then asserts the mock was called—what have you actually verified? Mock at boundaries (database, HTTP, file system), not within your own code.
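
A concrete way to stay on the right side of that line: put the fake at the repository boundary and run your real service logic against it, replacing nothing inside your own code. A sketch, where `UserService` and its duplicate-email rule are hypothetical:

```typescript
// Hypothetical service tested through an in-memory fake at the boundary
// (a trimmed version of the InMemoryUserRepository fake above).
interface User {
  id: string
  name: string
  email: string
}

class InMemoryUserRepository {
  private users = new Map<string, User>()

  async save(user: User) {
    this.users.set(user.id, user)
  }
  async findByEmail(email: string) {
    return [...this.users.values()].find((u) => u.email === email) ?? null
  }
}

class UserService {
  constructor(private repo: InMemoryUserRepository) {}

  // The real business rule under test: no duplicate email registrations.
  async register(user: User): Promise<void> {
    if (await this.repo.findByEmail(user.email)) {
      throw new Error("email already registered")
    }
    await this.repo.save(user)
  }
}
```

The fake provides realistic reads and writes; the duplicate-email check itself runs unmodified, so a passing test actually verifies the rule.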


TDD vs Test-After

Test-Driven Development (TDD): Write a failing test first, write the minimum code to pass it, refactor. Red → Green → Refactor.

1. Write test: expect(add(2, 3)).toBe(5)         → RED (add doesn't exist)
2. Write code: function add(a, b) { return a + b } → GREEN
3. Refactor: (nothing to refactor here)
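
The same three steps, written out with the `add` example from the steps above:

```typescript
// Step 1 (RED): the test exists before the function does.
//   expect(add(2, 3)).toBe(5)   // fails: add is not defined
// Step 2 (GREEN): the minimum code that passes.
function add(a: number, b: number): number {
  return a + b
}
// Step 3 (REFACTOR): nothing to simplify yet, so stop here.
```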

Test-after: Write the code first, then write tests to verify it.

The honest comparison:

  • Design feedback: TDD high — forces you to think about interfaces first. Test-after low — tests are shaped by the implementation.
  • Coverage: TDD high — every behavior has a test by construction. Test-after variable — depends on discipline.
  • Speed (initial): TDD slower — writing tests before code feels unnatural. Test-after faster — just write the code.
  • Speed (long-term): TDD faster — fewer bugs, easier refactoring. Test-after slower — bugs found later, refactoring is risky.
  • Suitability: TDD suits pure logic, business rules, utilities. Test-after suits UI, exploratory work, prototypes.

I use TDD for business logic and pure functions. I use test-after for UI components and integration points. The important thing isn't the order—it's that tests exist and test behavior, not implementation.


Property-Based Testing

Traditional tests check specific examples. Property-based testing checks that a property holds for all possible inputs (by generating random ones).

import { it, expect } from "vitest"
import fc from "fast-check"
 
// Traditional: specific examples
it("sorts an array", () => {
  expect(sort([3, 1, 2])).toEqual([1, 2, 3])
})
 
// Property-based: the property holds for ALL arrays
it("sorted array has same length as input", () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      expect(sort(arr)).toHaveLength(arr.length)
    })
  )
})
 
it("sorted array is monotonically non-decreasing", () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      const sorted = sort(arr)
      for (let i = 1; i < sorted.length; i++) {
        expect(sorted[i]).toBeGreaterThanOrEqual(sorted[i - 1])
      }
    })
  )
})

Property-based tests find edge cases you'd never think to write: empty arrays, arrays with one element, arrays with all identical elements, arrays with NaN, very large arrays.

Good properties to test:

  • Roundtrip: deserialize(serialize(x)) equals x.
  • Idempotence: f(f(x)) equals f(x) (applying twice = applying once).
  • Invariants: A sorted array is always non-decreasing. A balanced tree always has O(log n) height.
  • Commutativity: add(a, b) equals add(b, a).
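
The roundtrip property can be spot-checked by hand before reaching for a library. This sketch checks `JSON` serialization over a few edge-case inputs; fast-check would generate far more, but the shape of the property is the same:

```typescript
// Hand-rolled roundtrip check: deserialize(serialize(x)) should equal x.
const serialize = (x: unknown) => JSON.stringify(x)
const deserialize = (s: string) => JSON.parse(s)

function roundtripHolds(samples: unknown[]): boolean {
  // Compare via a second stringify to get structural equality.
  return samples.every(
    (x) => JSON.stringify(deserialize(serialize(x))) === JSON.stringify(x)
  )
}

// The kinds of edge cases a property-based tool would find for you.
const samples = [[], [0], [-1, 2, 3], "", { a: 1, b: [true, null] }]
```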

Refactoring Discipline

Refactoring is changing the structure of code without changing its behavior. Tests are what make refactoring safe.

The refactoring loop:

  1. Ensure tests pass (green).
  2. Make a small structural change.
  3. Run tests immediately.
  4. If green, continue. If red, revert and try a smaller change.

Without tests, refactoring is gambling. With tests, it's routine maintenance.

Code smells that signal a need for refactoring:

  • Duplicated logic in multiple places (extract a function).
  • Long functions (>30 lines) doing multiple things (split by responsibility).
  • Deep nesting (3+ levels of if/for) (extract conditions, use early returns).
  • Primitive obsession (passing string where a type would be clearer).
  • Feature envy (a function that mostly accesses data from another module—move it there).
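
As a worked example of the deep-nesting smell, here is the same hypothetical guard before and after early returns. A behavior-preserving refactor means the two versions agree on every input, which is exactly what a test suite can confirm:

```typescript
type Cart = { items: number[]; userVerified: boolean }

// Before: three levels of nesting.
function canCheckoutNested(cart: Cart): boolean {
  if (cart.items.length > 0) {
    if (cart.userVerified) {
      if (cart.items.every((q) => q > 0)) {
        return true
      }
    }
  }
  return false
}

// After: early returns flatten the logic without changing behavior.
function canCheckout(cart: Cart): boolean {
  if (cart.items.length === 0) return false
  if (!cart.userVerified) return false
  return cart.items.every((q) => q > 0)
}
```

Each guard now reads on its own line, and adding a fourth condition is a one-line change instead of another nesting level.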

Measuring Quality Beyond Coverage

Code coverage measures which lines/branches your tests execute. 80% coverage means 20% of your code is untested. But coverage is a measure of quantity, not quality.

// 100% coverage, zero value
it("calls the function", () => {
  calculateDiscount(100, "vip") // no assertion!
})

This test executes the code but verifies nothing. Read coverage in one direction only: low coverage reliably means the code is undertested, but high coverage does not mean it's well tested. Use coverage to find gaps, not as a target to hit.

Better quality signals:

  • Mutation testing: Automatically changes your code (mutants) and checks if tests fail. If a mutant survives (tests still pass despite the change), you have a gap.
  • Test failure rate in CI: If tests never fail, they're either too simple or the code never changes.
  • Time to diagnose failures: If a failing test points you directly to the bug, it's a good test. If you have to debug the test itself, it's testing implementation, not behavior.
  • Confidence to refactor: The ultimate test of your test suite is whether you feel safe making structural changes.
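
Here is the mutation-testing idea hand-rolled: run the same test against the original function and a mutant with one constant changed. A good test kills the mutant; an assertion-free test lets it survive. The functions are made up for illustration:

```typescript
// Original and a "mutant" with the discount constant flipped.
const vipDiscount = (price: number) => price * 0.7
const vipDiscountMutant = (price: number) => price * 0.8

// Assertion-free test: executes the code, verifies nothing.
// It "passes" for both versions, so the mutant survives.
const weakTest = (fn: (p: number) => number) => {
  fn(100)
  return true
}

// Test with an assertion: fails for the mutant, so the mutant is killed.
const strongTest = (fn: (p: number) => number) => fn(100) === 70
```

Tools such as Stryker automate this loop: generate mutants across the codebase, run the suite against each, and report the survivors as gaps in your assertions.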

The Pragmatic Takeaway

Testing isn't about proving your code is correct—it's about defining the behavior you care about and getting fast feedback when something violates it.

Write unit tests for business logic. Write integration tests for boundaries. Write a few E2E tests for critical flows. Mock at the edges, not in the middle. Test behavior, not implementation. If your tests make refactoring harder instead of easier, your tests are the problem.

The best test suite isn't the one with 100% coverage. It's the one that gives the team confidence to ship changes on Friday afternoon without breaking a sweat.