Patterns for the Age of Agents

The shapes that emerged because LLM-driven agents now write, refactor, test, and review code at machine speed.

None of these patterns is new. Naming discipline, small modules, property tests, type checkers, and staged rollouts have always paid off. Agents punish the codebases that neglected them and reward the ones that didn't. What is new is the cadence: feedback loops have shrunk from days to minutes, and the bottleneck is no longer keystrokes but judgment about what to build and which diffs to trust.

Three groups, in order: how a codebase is shaped so an agent can do useful work, how that work is verified before merge, and how humans stay in the decision path without becoming the bottleneck. The Foundations section anchors the lineage. For the timeless shapes that pre-date all of this, see Patterns of Practice.

Foundations

The lineage. Essays, talks, and books the rest of this directory is in conversation with. Some load-bearing today, some worth knowing as the argument behind a pattern.

AGENTS.md

Best Practices Agentic AI Foundation

A README for agents. An open Markdown convention for build steps, conventions, and guardrails that coding agents actually read.

Vibing a Non-Trivial Ghostty Feature

Discussion Mitchell Hashimoto Oct 2025

Hashimoto ships a real Ghostty feature across 16 agent sessions for $15.98 and publishes the full transcripts. A concrete look at staying the architect.

Designing agentic loops

Discussion Simon Willison Sep 2025

Willison frames agent skill as designing the loop itself. Pick the right tools, scope the goal, and run safely in YOLO mode.

Spec Kit

Tutorial GitHub Sep 2025

GitHub's open toolkit for Spec-Driven Development. Spec, Plan, Tasks, Implement, with each phase producing an artifact the next phase consumes.

Ralph Wiggum as a "software engineer"

Discussion Geoffrey Huntley Jul 2025

Huntley's Ralph loop. A bash while-loop feeding the same prompt to a coding agent until it converges. Brute-force agentic engineering, working.

Software Is Changing (Again)

Talk Andrej Karpathy · YC AI Startup School Jun 2025

Karpathy lays out Software 3.0: LLMs as a new computer programmed in English, with humans verifying what models generate.

The New Code

Talk Sean Grove · OpenAI (AI Engineer World's Fair 2025) Jun 2025

Grove argues specifications, not code, are the durable artifact. The spec compiles to implementations. Prompts thrown away are wasted source.

Practices for Governing Agentic AI Systems

Best Practices Shavit, Agarwal et al. · OpenAI Dec 2023

Baseline practices for keeping agentic systems aligned with operator intent: scoping, oversight, interruptibility, accountability.

Exploring Generative AI

Discussion Birgitta Boeckeler et al. · Thoughtworks (martinfowler.com) Jul 2023

A running Thoughtworks memo series on AI-assisted delivery. Context engineering, harness design, spec-driven coding, what holds up in practice.

Cheating is all you need

Discussion Steve Yegge · Sourcegraph Mar 2023

Yegge's early call that LLM-augmented coding is a step change, not a parlor trick. The polemic that primed many engineers to take agents seriously.

The Bitter Lesson

Discussion Rich Sutton Mar 2019

Sutton's argument that general methods leveraging compute beat human-engineered cleverness. The intellectual backdrop for letting agents search.

Specification by Example

Tutorial Gojko Adzic · Manning May 2011

Adzic's case studies on turning concrete examples into shared specifications that drive delivery. The pre-agent root of spec-driven coding.

Patterns by Category

Patterns grouped by where they live in the agent workflow. Each category intro names the scope. Click a pattern name for its canonical reference, then read the per-pattern resources below.

Spec-First Patterns

How a codebase is shaped so an agent can do useful work on it. Naming discipline, module boundaries, AGENTS.md as a machine-readable contract, and specification-by-example are about giving the agent the same on-ramp a competent new hire gets. None of this is agent-specific. Agents punish the codebases that already neglected it.

Naming Discipline Names are the agent's primary index into the codebase. Ambiguity multiplies into wrong calls and bad PRs.
Module Boundaries Modules small enough that an agent can hold the whole thing in its context window and reason about it.
AGENTS.md A README written for agents. Build steps, test commands, conventions, and guardrails the agent should respect.
Specification by Example Encode the spec as concrete input-output examples that double as documentation and test fixtures.
Spec-Driven Development Treat the spec as the source artifact. Plan, tasks, and code are downstream products the agent regenerates.
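
One way to read Specification by Example in code is as a table of concrete cases that serves as documentation and test fixtures at once. The sketch below assumes a made-up shipping rule; `Example`, `EXAMPLES`, `shipping_cost`, and `check_examples` are hypothetical names invented for illustration, not taken from Adzic's book or any toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One row of the spec: a readable description plus input and expected output."""
    description: str
    order_total: float
    is_member: bool
    expected_shipping: float


# Hypothetical business rule, invented for this sketch:
# members ship free, orders of 50.00 or more ship free, everyone else pays 4.99.
EXAMPLES = [
    Example("small order, non-member pays the flat rate", 20.00, False, 4.99),
    Example("order at the threshold ships free", 50.00, False, 0.00),
    Example("member ships free regardless of total", 5.00, True, 0.00),
]


def shipping_cost(order_total: float, is_member: bool) -> float:
    """Implementation under test. An agent may rewrite this freely."""
    if is_member or order_total >= 50.00:
        return 0.00
    return 4.99


def check_examples(examples=EXAMPLES) -> list:
    """Return a message for every example the implementation fails."""
    failures = []
    for ex in examples:
        actual = shipping_cost(ex.order_total, ex.is_member)
        if actual != ex.expected_shipping:
            failures.append(
                f"{ex.description}: expected {ex.expected_shipping}, got {actual}"
            )
    return failures
```

The table is the part humans review and agents read. The implementation can change as long as `check_examples()` keeps returning an empty list.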

Two Hard Things

Discussion Martin Fowler

Fowler's bliki entry on the old joke that there are only two hard things in computer science: cache invalidation and naming things.

Naming as a Process

Tutorial Arlo Belshee

Belshee's seven-step ladder from "nonsense" to "domain abstraction". A working method for renaming legacy code one move at a time.

The Power of Ten: Rules for Developing Safety-Critical Code

Best Practices Gerard J. Holzmann · NASA JPL

Holzmann's ten rules for safety-critical C. Short functions, simple control flow, bounded loops. The constraints that keep a module analyzable by tools also keep it readable by humans and agents.

Modular Monolith: A Primer

Tutorial Kamil Grzybek

How to draw hard module boundaries inside one deployable. The shape agents work best inside.

AGENTS.md spec

Best Practices Agentic AI Foundation

Community spec for the AGENTS.md file. Conventions, commands, and tests an agent should respect when changing this repo.

How to write AGENTS.md

Tutorial Amp · Sourcegraph

Amp's own AGENTS.md, published as a worked example. Build commands, code style, testing, and PR conventions in one file.

Specification by Example

Tutorial Gojko Adzic · Manning

Adzic's reference text. Living specs that double as tests, written in concrete examples humans and machines can read.

Spec-Driven Development with AI: Get Started with a New Open Source Toolkit

Tutorial GitHub blog Sep 2025

GitHub's introduction to Spec Kit. Spec, Plan, Tasks, Implement, each producing an artifact the next phase consumes.

The New Code

Talk Sean Grove · OpenAI Jun 2025

Grove makes the case that specifications, not prompts or code, are the durable artifact. The spec compiles. Prompts thrown away are wasted source.

Verification Patterns

How an agent's work is verified at machine speed. Property tests, snapshot tests, contract tests, types, differential testing, and evals close the loop without lengthening the review queue. The agent's job is to clear the fence. Humans look at the diff only when the fence catches something interesting.

Property Tests as Agent Fence Use stated invariants as the safety rail the agent's diff has to clear, not just as bug-finding tools.
Snapshot and Golden Tests Lock current outputs as truth so a refactor either reproduces them exactly or surfaces its disagreements.
Contract Tests and Types as Fences Use type checkers and contract tests as the cheap fast layer the agent has to satisfy before a human looks.
Differential Testing Run the agent's rewrite next to the original against live traffic. Ship only when disagreement reaches zero.
Evals Fixture sets of representative tasks. Track pass rate, regression rate, and cost across prompt and model changes.
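
A property test used as a fence can be sketched without any library: state the invariants once, generate inputs, and reject any diff that breaks them. The version below is hand-rolled with the standard library's `random` module and is not the Hypothesis API; `dedupe` and `find_counterexample` are hypothetical names, and the invariants are assumptions chosen for illustration.

```python
import random


def dedupe(items):
    """Function under test: drop repeats, keep first-seen order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def find_counterexample(fn, trials=200, seed=0):
    """Check stated invariants on random inputs.

    Returns a message describing the first violation, or None if every
    trial passes. A fixed seed keeps the fence deterministic in CI.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        result = fn(list(data))
        if len(result) != len(set(result)):
            return f"duplicates remain for {data}"
        if set(result) != set(data):
            return f"elements lost or invented for {data}"
        if fn(list(result)) != result:
            return f"not idempotent for {data}"
        # First-seen order: sort the distinct values by first position in data.
        if result != sorted(set(data), key=data.index):
            return f"order not preserved for {data}"
    return None
```

An agent's rewrite of `dedupe` clears the fence only if `find_counterexample` returns `None`. A rewrite such as `sorted(set(xs))` removes duplicates but fails the ordering invariant, which is the kind of disagreement worth a human look.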

What is Property Based Testing?

Tutorial Hypothesis

The Hypothesis project's intro essay. State invariants and let the runner generate inputs. The starting point for property testing on real code.

Property-Based Testing in Python

Tutorial Hypothesis docs

Hands-on quickstart. From decorator to passing assertion, with strategies for generating realistic inputs.

Snapshot Testing

Tutorial Jest docs

Jest's reference treatment. Record current output as truth, fail when it changes. Useful when the spec is "whatever it does today."

ContractTest

Discussion Martin Fowler

Fowler's bliki entry on contract tests. Narrow fast tests that pin down the shape of an inter-service contract.

Type-Driven Development with Idris

Tutorial Edwin Brady · Manning

Brady's book on types-as-fences taken to its conclusion. The type checker as the first and cheapest verifier.

Scientist

Tutorial GitHub

GitHub's open-source library for differential testing. Run new code in shadow next to old, compare, switch over.
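
The shadow-run idea can be sketched in a few lines. This is not Scientist's API (Scientist is a Ruby library); it is a minimal Python analog under assumed names. `run_experiment`, `slugify_old`, and `slugify_new` are hypothetical and exist only for illustration.

```python
import re


def run_experiment(control, candidate, inputs):
    """Run both implementations on each input.

    Callers always receive the control's results. Any disagreement,
    including a crash in the candidate, is recorded as a mismatch.
    """
    results = []
    mismatches = []
    for value in inputs:
        expected = control(value)
        try:
            observed = candidate(value)
        except Exception as exc:  # a crash counts as a disagreement
            mismatches.append((value, expected, repr(exc)))
        else:
            if observed != expected:
                mismatches.append((value, expected, observed))
        results.append(expected)
    return results, mismatches


def slugify_old(title):
    """Original behavior: lowercase, join whitespace-separated words."""
    return "-".join(title.lower().split())


def slugify_new(title):
    """Hypothetical rewrite: also collapses punctuation, so it disagrees on some inputs."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
```

Feeding both functions `"Hello, World!"` records one mismatch, because the rewrite strips punctuation the original keeps. Whether that is a bug or an improvement is the judgment call the mismatch list surfaces.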

How we rebuilt Next.js with AI in one week

Discussion Cloudflare blog

vinext at 94% of the Next.js API surface, scored against 1,700 Vitest plus 380 Playwright tests ported straight from Next.js. Differential testing at framework scale.

A 10x Faster TypeScript

Discussion Anders Hejlsberg · Microsoft

Hejlsberg on porting tsc to Go for semantic parity. The reason it's a port and not a rewrite is so the existing test suite stays the source of truth.

Anthropic's Bun team trials port from Zig to Rust

Discussion The Register

Bun's AI-assisted Zig-to-Rust port at 99.8% of the pre-existing test suite. The rewrite stays in shadow until disagreement hits zero, exactly the pattern.

Building effective agents

Best Practices Anthropic

Anthropic's reference on agent design. Workflows vs. agents, tool design, evals, and where the autonomy band actually pays off.

Demystifying evals for AI agents

Best Practices Anthropic Engineering

Anthropic's working notes on building eval suites for agents. Treat the agent like untrusted input. Run a fixture set on every change.

Human-in-the-Loop Patterns

How humans stay in the decision path without becoming the bottleneck. Stage gates, confidence-tiered autonomy, review queue design, agent-as-reviewer, and tight feedback loops like the Ralph loop are about giving humans the calls only they can make and letting agents do the rest.

Stage Gates Split agent work into plan, propose, and apply with a human checkpoint between each, not one big leap.
Confidence-Tiered Autonomy Agents act on low-risk classes, propose on medium-risk, and ask on high-risk. Risk graded per category.
Review Queue Design Treat the inbound stream of agent PRs as an explicit queue with rules and SLAs, not an ad-hoc reviewer pileup.
Agent-as-Pair and Reviewer Put the agent in the reviewer seat for human work. The cheap pass catches the boring things humans miss.
The Ralph Loop A bash while-loop feeds the same prompt to a coding agent until it converges. Brute-force, working.
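
Confidence-tiered autonomy reduces to a small policy table. The sketch below grades a change by the riskiest path it touches; the path patterns and tier assignments are assumptions made up for illustration, and `Action`, `RULES`, `classify_path`, and `classify_change` are hypothetical names.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Action(Enum):
    ACT = 1      # low risk: the agent may apply the change itself
    PROPOSE = 2  # medium risk: the agent opens a PR for review
    ASK = 3      # high risk: the agent asks before doing anything


# Hypothetical policy. First matching pattern wins, so high-risk rules go first.
RULES = [
    ("auth/*", Action.ASK),
    ("migrations/*", Action.ASK),
    ("billing/*", Action.ASK),
    ("src/*", Action.PROPOSE),
    ("docs/*", Action.ACT),
    ("*.lock", Action.ACT),
]


def classify_path(path):
    """Grade a single path. Unknown paths default to the strictest tier."""
    for pattern, action in RULES:
        if fnmatchcase(path, pattern):
            return action
    return Action.ASK


def classify_change(paths):
    """A change is as risky as the riskiest path it touches."""
    return max(
        (classify_path(p) for p in paths),
        key=lambda action: action.value,
        default=Action.ASK,
    )
```

Defaulting unknown paths and empty changes to `ASK` keeps the policy fail-closed. Widening the autonomy band then means adding an explicit rule, not forgetting one.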

Building effective agents

Best Practices Anthropic

Anthropic's reference on agent design. Workflows vs. agents, tool design, evals, and where the autonomy band actually pays off.

Designing agentic loops

Discussion Simon Willison Sep 2025

Willison's essay on designing the loop an agent runs in: which tools it gets, how tightly the goal is scoped, and how to run YOLO mode safely. Read it for where the human checkpoints actually sit.

Vibing a Non-Trivial Ghostty Feature

Discussion Mitchell Hashimoto Oct 2025

Sixteen agent sessions to ship one Ghostty feature, all transcripts published. A working example of stage gates as a real-world workflow.

Practices for Governing Agentic AI Systems

Best Practices Shavit, Agarwal et al. · OpenAI Dec 2023

OpenAI's white paper on governing agentic systems through scoping, oversight, interruptibility, and accountability. The underpinnings of confidence-tiered policies.

Software Is Changing (Again)

Talk Andrej Karpathy · YC AI Startup School Jun 2025

Karpathy on the autonomy slider. The Tesla Autopilot analogy for handing tasks to an agent gradually instead of full autonomy on day one.

How to do a code review (Google)

Best Practices Google Engineering Practices

Google's public reviewer guide. The rules and SLAs that scale a review queue without sinking individual reviewers.

Pull Request Reviews with Copilot

Tutorial GitHub docs

GitHub's docs on letting Copilot review PRs. The cheap pass that catches the boring things humans miss.

GitHub Copilot Workspace

Discussion GitHub blog Apr 2024

GitHub's announcement for Copilot Workspace. An early working sketch of the agent-as-pair interaction.

Ralph Wiggum as a "software engineer"

Discussion Geoffrey Huntley Jul 2025

Huntley's original Ralph loop post. A bash while-loop feeding the same prompt to a coding agent until it converges.
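
Huntley's original is a bash while-loop. The sketch below is a Python analog of the same shape, with two additions that are assumptions of this sketch and not part of his post: an explicit convergence check and an iteration cap. `ralph_loop`, `run_agent`, and `converged` are hypothetical names; the agent call is injected so that no real CLI or flag is implied.

```python
from typing import Callable


def ralph_loop(
    prompt: str,
    run_agent: Callable[[str], None],
    converged: Callable[[], bool],
    max_iterations: int = 50,
) -> int:
    """Feed the same prompt to the agent until the convergence check passes.

    Returns the number of iterations used. Raises RuntimeError when the
    cap is reached, so a stuck loop fails loudly instead of running forever.
    """
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)   # same prompt every time: progress lives in the repo, not the prompt
        if converged():     # e.g. the test suite passes on the working tree
            return iteration
    raise RuntimeError(f"no convergence after {max_iterations} iterations")
```

In practice `run_agent` would shell out to whichever coding agent is in use, and `converged` would run the project's test command and check its exit code. Both are left abstract here on purpose.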

Creators to follow

Writers and engineers consistently publishing substantive material on agentic coding, spec-driven workflows, and what changes when AI joins the team.

Simon Willison blog · @simonw Daily field notes on LLM tooling, agentic coding, and the practical limits of AI-assisted development.
Andrej Karpathy YouTube · @karpathy Long-form lectures on neural networks, LLMs, and how agents actually learn. Also posts at karpathy.ai and on X.
Birgitta Boeckeler martinfowler.com · @bboeckel Thoughtworks lead writing field memos on agentic coding, harness engineering, and what changes when AI joins the team.
Geoff Huntley blog · @ghuntley Sharp writeups on the Ralph loop, autonomous coding agents, and what it looks like to run Claude in production.
Sean Grove YouTube · @sgrove OpenAI engineer making the case that specifications, not prompts or code, are the new unit of programming.
Anthropic Engineering engineering blog · @anthropic Practical writeups on building effective agents, designing tools, and evaluating agent output in production.
Mitchell Hashimoto blog · @mitchellh HashiCorp founder and Ghostty author writing on how agentic tools fit into serious systems work.
Chip Huyen blog · @chiphuyen Author of AI Engineering and Designing ML Systems. Writes on shipping AI to production without the magic.

Frequently Asked Questions

Short answers to the questions that recur about working with coding agents in real codebases.

How do I structure a codebase so an agent can be useful in it?

Give the agent the same on-ramp you would give a new hire. Add an AGENTS.md at the repo root with build, test, and convention notes. Keep modules small and well-named. Make sure the test suite runs cleanly from a fresh clone. Agents do their best work in codebases where humans already do their best work.

Source: AGENTS.md

What's the difference between spec-driven development and just writing good tests?

Tests check behavior after you decide what to build. A spec captures intent before the code exists: goals, constraints, success criteria, and example inputs and outputs. Sean Grove's framing is that the spec is the new source code and tests are one of several artifacts generated from it. Good tests are necessary but not sufficient.

Source: The New Code (Sean Grove, OpenAI)

When should an agent merge without human review?

Only when the blast radius is small, the tests are trustworthy, and the change is reversible. Think dependency bumps, formatting passes, or scoped refactors behind a feature flag. Anything that touches auth, data migrations, billing, or public APIs still wants a human signoff. Start narrow and widen the autonomy band as your evals catch real regressions.

Source: Anthropic: Building effective agents

How do I keep agent-generated code reviewable at scale?

Make the agent write small, single-purpose PRs with a clear summary of intent and the spec or task it was working from. Require the same conventions you require of humans: meaningful commit messages, passing tests, no drive-by changes. Birgitta Boeckeler's memos on harness engineering are a good model. Invest in the scaffolding that makes the output legible.

Source: Exploring Gen AI (Birgitta Boeckeler)

What's AGENTS.md, and is it worth adopting now?

AGENTS.md is an open format for telling coding agents how your project works: build commands, test commands, conventions, gotchas. It is stewarded by the Agentic AI Foundation and read natively by major agents including Claude Code, Codex, and Gemini CLI. Adoption is cheap, the file is just Markdown, and you get immediate leverage. Worth adding today.

Source: AGENTS.md spec

How do I evaluate an agent's output without reading every diff?

Treat the agent like any other untrusted input: write evals. Build a fixture set of representative tasks, run the agent against them on every prompt or model change, and track pass rate, regression rate, and cost per task. Anthropic's writeups on demystifying evals are a solid starting point. Spot-check diffs only when the eval surfaces something interesting.

Source: Anthropic Engineering
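
The three numbers named in the answer above can be computed from two runs of the same fixture set. This is a minimal sketch under assumed names, not any eval framework's API: `TaskResult` and `summarize` are hypothetical, and regression rate is defined here as the share of previously passing tasks that now fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one fixture task in one eval run."""
    task_id: str
    passed: bool
    cost_usd: float


def summarize(baseline, current):
    """Compare the current run against a baseline run of the same fixtures."""
    baseline_passed = {r.task_id: r.passed for r in baseline}
    total = len(current)
    # Only tasks that passed in the baseline can regress.
    previously_passing = [r for r in current if baseline_passed.get(r.task_id)]
    regressions = [r.task_id for r in previously_passing if not r.passed]
    return {
        "pass_rate": sum(r.passed for r in current) / total if total else 0.0,
        "regression_rate": (
            len(regressions) / len(previously_passing) if previously_passing else 0.0
        ),
        "cost_per_task": sum(r.cost_usd for r in current) / total if total else 0.0,
        "regressions": regressions,
    }
```

The `regressions` list is the spot-check queue: those are the diffs worth reading by hand after a prompt or model change.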

Where does the human stay in the loop, and where does the agent take over?

Humans own intent, constraints, and the merge button. Agents own the mechanical work in between: drafting, refactoring, running tests, proposing changes. The GitHub Spec Kit workflow makes this concrete. Humans write and approve the spec, plan, and tasks. The agent implements against them. Keep the handoff points explicit and you keep your judgment where it matters.

Source: GitHub Spec Kit

From Smarter Dev

Original writing coming.

Smarter Dev essays, walkthroughs, and short courses on shipping with agents, designing your harness, and keeping judgment where it matters will land here as they're written.


Last updated May 14, 2026