The shapes that emerged because LLM-driven agents now write, refactor, test, and review code at machine speed.
None of these patterns is exclusive to the agent era. Naming discipline, small modules, property tests, type checkers, and staged rollouts have always paid off. Agents punish the codebases that already neglected them and reward the ones that didn't. What is new is the cadence: feedback loops have shrunk from days to minutes, and the bottleneck is no longer keystrokes. It is judgment about what to build and which diffs to trust.
Three groups, in order: how a codebase is shaped so an agent can do useful work, how that work is verified before merge, and how humans stay in the decision path without becoming the bottleneck. The Foundations section anchors the lineage. For the timeless shapes that pre-date all of this, see Patterns of Practice.
The lineage. Essays, talks, and books the rest of this directory is in conversation with. Some load-bearing today, some worth knowing as the argument behind a pattern.
A README for agents. An open Markdown convention for build steps, conventions, and guardrails that coding agents actually read.
Hashimoto ships a real Ghostty feature across 16 agent sessions for $15.98 and publishes the full transcripts. A concrete look at staying the architect.
Willison frames agent skill as designing the loop itself. Pick the right tools, scope the goal, and run safely in YOLO mode.
GitHub's open toolkit for Spec-Driven Development. Spec, Plan, Tasks, Implement, with each phase producing an artifact the next phase consumes.
Huntley's Ralph loop. A bash while-loop feeding the same prompt to a coding agent until it converges. Brute-force agentic engineering, working.
Karpathy lays out Software 3.0: LLMs as a new computer programmed in English, with humans verifying what models generate.
Grove argues specifications, not code, are the durable artifact. The spec compiles to implementations. Prompts thrown away are wasted source.
Baseline practices for keeping agentic systems aligned with operator intent: scoping, oversight, interruptibility, accountability.
A running Thoughtworks memo series on AI-assisted delivery. Context engineering, harness design, spec-driven coding, what holds up in practice.
Yegge's early call that LLM-augmented coding is a step change, not a parlor trick. The polemic that primed many engineers to take agents seriously.
Sutton's argument that general methods leveraging compute beat human-engineered cleverness. The intellectual backdrop for letting agents search.
Adzic's case studies on turning concrete examples into shared specifications that drive delivery. The pre-agent root of spec-driven coding.
Patterns grouped by where they live in the agent workflow. Each category intro names the scope. Click a pattern name for its canonical reference, then read the per-pattern resources below.
How a codebase is shaped so an agent can do useful work on it. Naming discipline, module boundaries, AGENTS.md as a machine-readable contract, and specification-by-example are about giving the agent the same on-ramp a competent new hire gets. None of this is agent-specific. Agents punish the codebases that already neglected it.
Fowler's bliki entry on the joke that names half of every onboarding doc. Cache invalidation and naming things.
Belshee's seven-step ladder from "nonsense" to "domain abstraction". A working method for renaming legacy code one move at a time.
Ten rules for keeping modules readable by humans and agents. Small functions, no surprises, bounded loops.
How to draw hard module boundaries inside one deployable. The shape agents work best inside.
Community spec for the AGENTS.md file. Conventions, commands, and tests an agent should respect when changing this repo.
Amp's own AGENTS.md, published as a worked example. Build commands, code style, testing, and PR conventions in one file.
Adzic's reference text. Living specs that double as tests, written in concrete examples humans and machines can read.
GitHub's introduction to Spec Kit. Spec, Plan, Tasks, Implement, each producing an artifact the next phase consumes.
Grove makes the case that specifications, not prompts or code, are the durable artifact. The spec compiles. Prompts thrown away are wasted source.
How an agent's work is verified at machine speed. Property tests, snapshot tests, contract tests, types, differential testing, and evals close the loop without lengthening the review queue. The agent's job is to clear the fence. Humans look at the diff only when the fence catches something interesting.
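To make "state invariants and let the runner generate inputs" concrete, here is a minimal sketch of a property test in plain Python. It uses only the standard library's `random` module as a stand-in for a real property-testing library, so it has no shrinking and no rich strategies. The function under test (`dedupe_keep_order`) and the helper `check_property` are hypothetical names chosen for illustration.

```python
import random


def dedupe_keep_order(items):
    """Hypothetical function under test: drop duplicates, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_property(prop, trials=200, seed=0):
    """Run `prop` on many random integer lists.

    Returns the first input that violates the property, or None if all pass.
    The fixed seed keeps runs reproducible (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            return xs
    return None


def no_duplicates_and_same_members(xs):
    # Invariant 1: output has no repeats and the same set of members as the input.
    ys = dedupe_keep_order(xs)
    return len(ys) == len(set(ys)) and set(ys) == set(xs)


def idempotent(xs):
    # Invariant 2: running the function twice changes nothing.
    once = dedupe_keep_order(xs)
    return dedupe_keep_order(once) == once


counterexample = check_property(no_duplicates_and_same_members)
```

A `None` result means no generated input broke the invariant. When an agent rewrites `dedupe_keep_order`, the same two properties act as the fence, with no hand-written cases to update.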
The Hypothesis project's intro essay. State invariants and let the runner generate inputs. The starting point for property testing on real code.
Hands-on quickstart. From decorator to passing assertion, with strategies for generating realistic inputs.
Jest's reference treatment. Record current output as truth, fail when it changes. Useful when the spec is "whatever it does today."
Fowler's bliki entry on contract tests. Narrow fast tests that pin down the shape of an inter-service contract.
Brady's book on types-as-fences taken to its conclusion. The type checker as the first and cheapest verifier.
GitHub's open-source library for differential testing. Run new code in shadow next to old, compare, switch over.
vinext at 94% of the Next.js API surface, scored against 1,700 Vitest plus 380 Playwright tests ported straight from Next.js. Differential testing at framework scale.
Hejlsberg on porting tsc to Go for semantic parity. The reason it's a port and not a rewrite is so the existing test suite stays the source of truth.
Bun's AI-assisted Zig-to-Rust port at 99.8% of the pre-existing test suite. The rewrite stays in shadow until disagreement hits zero, exactly the pattern.
Anthropic's reference on agent design. Workflows vs. agents, tool design, evals, and where the autonomy band actually pays off.
Anthropic's working notes on building eval suites for agents. Treat the agent like untrusted input. Run a fixture set on every change.
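The "run new code in shadow next to old, compare, switch over" idea can be sketched in a few lines of Python. This is not the API of any library listed above. It is an illustrative helper (`run_experiment`, a hypothetical name) that always returns the old implementation's result and records any disagreement from the candidate. The two slugify functions are made up for the example, and the new one carries a deliberate bug.

```python
def run_experiment(control, candidate, args, mismatches):
    """Return control's result; run candidate in shadow and record disagreements."""
    expected = control(*args)
    try:
        observed = candidate(*args)
    except Exception as exc:
        # A crash in the candidate must never affect the caller.
        mismatches.append((args, expected, repr(exc)))
        return expected
    if observed != expected:
        mismatches.append((args, expected, observed))
    return expected


def old_slugify(title):
    """Existing behavior, treated as the source of truth."""
    return "-".join(title.lower().split())


def new_slugify(title):
    """Hypothetical rewrite with a deliberate bug: it forgets to lowercase."""
    return "-".join(title.split())


mismatches = []
for title in ["hello world", "Hello World", "  spaced   out  "]:
    run_experiment(old_slugify, new_slugify, (title,), mismatches)

# mismatches now holds one entry, for the "Hello World" input.
```

The switch-over rule then becomes mechanical: the candidate replaces the control only after the mismatch list has stayed empty over enough real inputs.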
How humans stay in the decision path without becoming the bottleneck. Stage gates, confidence-tiered autonomy, review queue design, agent-as-reviewer, and tight feedback loops like the Ralph loop are about giving humans the calls only they can make and letting agents do the rest.
Anthropic's reference on agent design. Workflows vs. agents, tool design, evals, and where the autonomy band actually pays off.
Willison's essay on splitting agent runs into plan, propose, and apply phases. Where the human checkpoints actually sit.
Sixteen agent sessions to ship one Ghostty feature, all transcripts published. A working example of stage gates as a real-world workflow.
OpenAI's framework for tiering agent autonomy by risk class. The underpinnings of confidence-tiered policies.
Karpathy on the autonomy slider. The Tesla Autopilot analogy for handing tasks to an agent gradually instead of full autonomy on day one.
Google's public reviewer guide. The rules and SLAs that scale a review queue without sinking individual reviewers.
GitHub's docs on letting Copilot review PRs. The cheap pass that catches the boring things humans miss.
GitHub's announcement for Copilot Workspace. The working sketch of the agent-as-reviewer interaction.
Huntley's original Ralph loop post. A bash while-loop feeding the same prompt to a coding agent until it converges.
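The Ralph loop is described above as a bash while-loop. The same shape can be sketched in Python for readers who want to see the control flow in isolation. Everything here is an assumption for illustration: `run_agent` stands in for whatever invokes a coding agent, `is_done` stands in for a convergence check, and the iteration cap is a safety addition that is not part of the description above.

```python
def ralph_loop(run_agent, is_done, prompt, max_iterations=50):
    """Feed the same prompt to the agent until is_done() reports convergence.

    Returns the number of iterations used. Raises RuntimeError if the cap
    is reached first (the cap is an assumption of this sketch).
    """
    for iteration in range(1, max_iterations + 1):
        run_agent(prompt)  # same prompt every time; state lives in the repo
        if is_done():
            return iteration
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


# A fake agent for demonstration: each run completes one task from a list.
todo = ["parse", "validate", "render"]


def fake_agent(prompt):
    if todo:
        todo.pop(0)


def all_done():
    return not todo


iterations = ralph_loop(
    fake_agent, all_done, "Pick the next unfinished task and complete it."
)
```

The point of the sketch is that the prompt never changes. Progress is carried by the working tree between runs, and the loop only needs a cheap way to tell when to stop.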
Writers and engineers consistently publishing substantive material on agentic coding, spec-driven workflows, and what changes when AI joins the team.
Short answers to the questions that recur about working with coding agents in real codebases.
Give the agent the same on-ramp you would give a new hire. Add an AGENTS.md at the repo root with build, test, and convention notes. Keep modules small and well-named. Make sure the test suite runs cleanly from a fresh clone. Agents do their best work in codebases where humans already do their best work.
Source: AGENTS.md
Tests check behavior after you decide what to build. A spec captures intent before the code exists: goals, constraints, success criteria, and example inputs and outputs. Sean Grove's framing is that the spec is the new source code and tests are one of several artifacts generated from it. Good tests are necessary but not sufficient.
Only when the blast radius is small, the tests are trustworthy, and the change is reversible. Think dependency bumps, formatting passes, or scoped refactors behind a feature flag. Anything that touches auth, data migrations, billing, or public APIs still wants a human sign-off. Start narrow and widen the autonomy band as your evals catch real regressions.
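One way to make that rule executable is a small policy function that a merge bot could consult. This is a sketch under stated assumptions: the `Change` fields, the area labels, and the `max_files` threshold used as a proxy for blast radius are all hypothetical, and a real policy would be tuned per repository.

```python
from dataclasses import dataclass

# Areas that always need a human, per the answer above (labels are illustrative).
SENSITIVE_AREAS = {"auth", "migrations", "billing", "public_api"}


@dataclass(frozen=True)
class Change:
    areas: frozenset       # which parts of the codebase the diff touches
    tests_trusted: bool    # is the covering test suite considered reliable?
    reversible: bool       # can the change be rolled back or flagged off?
    files_touched: int     # crude proxy for blast radius


def requires_human_signoff(change, max_files=20):
    """Return True unless the change is small, well-tested, and reversible."""
    if change.areas & SENSITIVE_AREAS:
        return True
    if not change.tests_trusted or not change.reversible:
        return True
    return change.files_touched > max_files


dependency_bump = Change(frozenset({"deps"}), True, True, 2)
auth_refactor = Change(frozenset({"auth"}), True, True, 2)
```

Widening the autonomy band then means loosening one threshold at a time and watching whether the evals still catch regressions.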
Make the agent write small, single-purpose PRs with a clear summary of intent and the spec or task it was working from. Require the same conventions you require of humans: meaningful commit messages, passing tests, no drive-by changes. Birgitta Boeckeler's memos on harness engineering are a good model. Invest in the scaffolding that makes the output legible.
AGENTS.md is an open format for telling coding agents how your project works: build commands, test commands, conventions, gotchas. It is stewarded by the Agentic AI Foundation and read natively by major agents including Claude Code, Codex, and Gemini CLI. Adoption is cheap, the file is just Markdown, and you get immediate leverage. Worth adding today.
Source: AGENTS.md spec
Treat the agent like any other untrusted input: write evals. Build a fixture set of representative tasks, run the agent against them on every prompt or model change, and track pass rate, regression rate, and cost per task. Anthropic's writeups on demystifying evals are a solid starting point. Spot-check diffs only when the eval surfaces something interesting.
Source: Anthropic Engineering
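The three numbers named above (pass rate, regression rate, cost per task) can be computed from a very small record per task. The sketch below assumes a hypothetical `TaskResult` shape and defines regression rate as the share of previously passing tasks that now fail. Other definitions are reasonable, and the task ids and costs are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    passed: bool
    cost_usd: float


def summarize(results, baseline_passed):
    """Summarize one eval run.

    baseline_passed: set of task ids that passed on the previous run.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")
    passed = sum(r.passed for r in results)
    regressions = [
        r.task_id for r in results
        if not r.passed and r.task_id in baseline_passed
    ]
    return {
        "pass_rate": passed / total,
        "regression_rate": (
            len(regressions) / len(baseline_passed) if baseline_passed else 0.0
        ),
        "cost_per_task": sum(r.cost_usd for r in results) / total,
        "regressions": regressions,
    }


results = [
    TaskResult("rename-module", True, 0.25),
    TaskResult("add-endpoint", False, 0.5),
    TaskResult("fix-flaky-test", True, 0.25),
    TaskResult("bump-deps", False, 1.0),
]
baseline = {"rename-module", "add-endpoint", "fix-flaky-test"}
report = summarize(results, baseline)
```

The `regressions` list is the part worth a human's attention: it names exactly which diffs to spot-check after a prompt or model change.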
Humans own intent, constraints, and the merge button. Agents own the mechanical work in between: drafting, refactoring, running tests, proposing changes. The GitHub Spec Kit workflow makes this concrete. Humans write and approve the spec, plan, and tasks. The agent implements against them. Keep the handoff points explicit and you keep your judgment where it matters.
Source: GitHub Spec Kit
Original writing coming.
Smarter Dev essays, walkthroughs, and short courses on shipping with agents, designing your harness, and keeping judgment where it matters will land here as they're written.