A colleague recently introduced me to GSD (Get Shit Done), a meta-prompting wrapper that sits on top of native AI CLIs like Claude Code or OpenCode. What impressed me is how deliberately it has been architected, and how robust the process is when using it within a project. Rather than just bolting prompts onto an LLM, it completely hijacks the reasoning loop to prevent “context rot”.

I’m working on an in-house workflow automation project at my employer that already does a small portion of what GSD does. I wanted to capture the architectural patterns GSD uses so that I can implement and adapt them. What follows is a breakdown of those patterns, analysed with Gemini and Claude over several iterations.


Core philosophy: meta-prompting and eradicating context rot

GSD is not a standalone terminal emulator or file-reading sandbox. Built by a solo developer who explicitly rejected enterprise-heavy project management tools, it leverages native AI CLI execution environments but shifts the burden of memory and planning away from the LLM’s internal context window and into a highly structured external file system, adversarial sub-agents, and strict requirement traceability.

The problem it solves is context rot: the quality degradation that occurs as an LLM fills its context window. Rather than attempting to manage this through summarisation or selective pruning, GSD’s architecture avoids the problem entirely through structural separation.


Phase 0: the thin orchestrator architecture

This is the foundational design primitive that makes everything else work. Without it, every subsequent phase degrades at scale.

GSD’s architecture rests on a three-layer separation that keeps orchestrator context usage at 10–15%, even when coordinating dozens of agents across a full phase execution.

The three layers

Commands (~50–100 lines) are thin entry points with YAML frontmatter and a single delegation instruction. They’re the user-facing interface: slash commands like /gsd:plan, /gsd:execute, and /gsd:quick. A command’s only job is to validate input and delegate to a workflow.

Workflows (~200–400 lines) coordinate multi-step orchestration. They read structured JSON from gsd-tools init and pass file paths (never content) to agents via Task(). This is the bit that matters: the orchestrator never reads project files itself. It passes paths and lets agents read them in their own fresh context windows.

Agents (~800–1,500 lines) are heavyweight prompt specialists spawned with fresh 200K-token contexts. All the real work (reading files, reasoning about code, writing plans) happens inside these disposable subagent windows. When an agent completes, its context is discarded. The orchestrator only receives a structured result.

Why path-passing matters

The traditional approach is to read a file, pass its content to a sub-agent, and receive the result. This burns context in the orchestrator for content it never reasons about. GSD’s path-passing pattern means the orchestrator’s context contains only metadata: file paths, status flags, and routing decisions.
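A minimal sketch of the difference, with invented function and field names standing in for the real Task() machinery:

```javascript
// Sketch of the path-passing pattern (function and field names are
// illustrative, not GSD's actual API). The orchestrator never reads file
// content; it forwards paths and lets the agent read them in its own context.

// Content-passing (anti-pattern): the orchestrator pays for every byte it relays.
function spawnWithContent(readFile, spawnAgent, path) {
  const content = readFile(path); // burns orchestrator context
  return spawnAgent({ prompt: content });
}

// Path-passing (GSD pattern): orchestrator context holds only metadata.
function spawnWithPath(spawnAgent, path, status) {
  return spawnAgent({ files_to_read: [path], status }); // a few dozen tokens
}

// Toy demonstration with stub dependencies that measure payload size.
const files = { '.planning/phases/1-auth/plans/PLAN.md': 'x'.repeat(10000) };
const readFile = (p) => files[p];
const spawnAgent = (payload) => JSON.stringify(payload).length;

const contentCost = spawnWithContent(readFile, spawnAgent, '.planning/phases/1-auth/plans/PLAN.md');
const pathCost = spawnWithPath(spawnAgent, '.planning/phases/1-auth/plans/PLAN.md', 'pending');
console.log(contentCost, pathCost); // content payload exceeds 10,000 chars; path payload stays under 100
```

The payload size difference scales with file size: a larger plan file makes content-passing worse while the path-passing cost stays constant.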

Compound init commands

The gsd-tools init <workflow> command returns all needed metadata as structured JSON in a single call, reducing orchestrator round-trips from 4+ to 1. Large JSON payloads exceeding 50KB are written to tempfiles to prevent tool-call truncation. Each field stays under 50 tokens.
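For illustration, a single init payload might look something like this — the field names are invented, since the article only specifies that fields stay small and that large payloads spill to tempfiles:

```json
{
  "workflow": "execute-phase",
  "phase": { "number": 4, "slug": "auth", "dir": ".planning/phases/4-auth" },
  "plans": [".planning/phases/4-auth/plans/PLAN-01.md"],
  "state": { "status": "planned", "current_plan": 1 },
  "config": { "mode": "interactive", "model_profile": "balanced" }
}
```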

Context pressure awareness

GSD calibrates agent behaviour around a 50% context target. The system explicitly acknowledges that LLMs degrade when they perceive context pressure, entering a “completion mode” where quality drops. The rule: stop BEFORE quality degrades. If an agent approaches its limit, it summarises its knowledge to a temp file, the context window is wiped, and the agent reboots reading only that summary.


Phase 1: the deterministic CLI layer

Every bash pattern an LLM improvises is a potential hallucination vector. Replace them with deterministic tooling.

gsd-tools: anti-hallucination infrastructure

GSD includes gsd-tools (bin/gsd-tools.cjs), a deterministic Node.js CLI with 40+ commands that offloads mechanical operations from LLM interpretation. The .cjs extension prevents ESM conflicts in projects using "type": "module".

Key command categories:

  • State management (state get, state patch, state advance-plan): atomic operations on STATE.md with schema validation, replacing fragile manual edits
  • Phase lifecycle (phase add, phase insert, phase remove, phase complete): manages the planning directory with safety checks
  • Frontmatter CRUD (frontmatter get, frontmatter set, frontmatter merge, frontmatter validate): schema-validated YAML manipulation, replacing grepping
  • Verification suites (verify plan-structure, verify artifacts, verify commits, verify references, verify key-links): deterministic quality gates
  • Roadmap analysis (roadmap analyze, roadmap get-phase): structured phase extraction
  • Template operations (template select, template fill): deterministic plan scaffolding
  • Atomic commits (commit): standardised git operations with message formatting
  • Context-optimising parsers (phase-plan-index, state-snapshot, summary-extract, history-digest): return structured JSON instead of raw file content, saving 5,000–10,000 tokens per workflow execution

Why this matters

Without this layer, agents must improvise bash commands like ls | sort -V for directory listings or grep through YAML for state. Each improvisation is a hallucination risk. gsd-tools replaces every such pattern with a tested, deterministic alternative.
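As a flavour of what a deterministic alternative looks like, here is a hypothetical version-aware directory sort of the kind gsd-tools would own instead of an improvised ls | sort -V (this is not GSD's actual code — a naive lexical listing would incorrectly put 10-deploy before 3-auth):

```javascript
// Illustrative stand-in for one gsd-tools responsibility: deterministic
// version-aware sorting of phase directories, including decimal phases.
function sortPhaseDirs(names) {
  // '3.2-hotfix' -> [3, 2]; '10-deploy' -> [10]
  const key = (name) => name.split('-')[0].split('.').map(Number);
  return [...names].sort((a, b) => {
    const [ka, kb] = [key(a), key(b)];
    for (let i = 0; i < Math.max(ka.length, kb.length); i++) {
      const d = (ka[i] ?? 0) - (kb[i] ?? 0); // missing segments sort first
      if (d !== 0) return d;
    }
    return 0;
  });
}

console.log(sortPhaseDirs(['10-deploy', '3.2-hotfix', '3-auth', '3.1-retry']));
// → ['3-auth', '3.1-retry', '3.2-hotfix', '10-deploy']
```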


Phase 2: context engineering and state externalisation

Stop trying to give agents infinite memory. Give them a perfect, tiny context window that is wiped clean and reloaded from text files for every single task.

Greenfield initialisation

For new projects, /gsd:new-project is the entry point that populates the entire state file hierarchy. It runs a structured flow: an interactive questionnaire captures project goals and constraints, parallel research agents investigate the domain, requirements are extracted and assigned to phases, and a roadmap is generated. The output is a fully populated .planning/ directory ready for execution.

The state file hierarchy

State files operate at four scope levels:

Living files (updated continuously):

  • PROJECT.md: immutable core rules, project identity, and constraints
  • REQUIREMENTS.md: the requirements specification with phase assignments and traceability checkboxes
  • STATE.md: the agent’s working memory with nested YAML frontmatter. The underlying CLI (Node/TypeScript) parses progress states, requirement completions, and boolean flags from the YAML without burning LLM tokens
  • config.json: 15+ workflow settings across five namespaces covering mode (yolo/interactive), depth (quick/standard/comprehensive), model profile (quality/balanced/budget), workflow toggles (research, plan_check, verifier, nyquist_validation, auto_advance), planning settings (commit_docs, search_gitignored), git strategy (branching, templates), and parallelisation

Milestone-scoped files (archived per milestone):

  • ROADMAP.md: the bridge between abstract requirements and granular execution. It chunks requirements into numbered Milestones and Phases. Downstream agents never look at the “whole project”; execution commands target a specific phase, strictly bounding the context
  • MILESTONES.md: append-only historical record of completed milestones

Phase-scoped files (created per phase):

  • {phase}-CONTEXT.md: frozen preference decisions from the interview loop
  • {phase}-DISCOVERY.md: mandatory discovery findings
  • {phase}-RESEARCH.md: domain research output
  • {phase}-VALIDATION.md: Nyquist validation feedback contract
  • PLAN.md files within .planning/phases/{N}-{slug}/plans/: the execution plans
  • SUMMARY.md: post-execution verification results
  • {phase}-UAT.md: user acceptance testing results
  • {phase}-VERIFICATION.md: three-level verification results

Historical/tracking files:

  • usage.json: session history tracking model selections, costs, complexity
  • current-agent-id.txt: active agent marker for resume detection
  • agent-history.json: complete agent spawn audit trail
  • .continue-here-*: mid-checkpoint handoff files
  • todos/pending/ and todos/done/: idea capture system, managed via /gsd:add-todo and /gsd:check-todos commands
  • debug/ and debug/resolved/: debug session persistence

Brownfield mapping: four structured artefacts

For existing codebases, the /gsd:map-codebase command runs an onboarding sweep using parallel specialist agents. It produces four distinct documents:

  • codebase/STACK.md: technology stack inventory (languages, frameworks, dependencies)
  • codebase/ARCHITECTURE.md: design patterns, module structure, data flow
  • codebase/CONVENTIONS.md: coding conventions discovered through analysis
  • codebase/CONCERNS.md: technical debt, fragility points, security risks

These load automatically during subsequent planning, so /gsd:new-project asks questions focused on what the user is adding rather than rediscovering existing structure. Future agents must adhere to the established patterns captured in CONVENTIONS.md.


Phase 3: the interview loop (constraint mapping)

Do not let the AI make invisible assumptions about what to build.

Identifying grey areas

The system analyses the requested roadmap phase and flags missing parameters based on categories: Visuals, APIs, Logic, Data, Authentication, Error Handling, and more.

The preference capture loop

The AI presents logical multiple-choice options for these grey areas. For example: “For infinite scroll, do you want to use IntersectionObserver or a React library?”

It iteratively asks if you are satisfied, freezing answers into an immutable CONTEXT.md file. This state-freezing step matters because once preferences are captured, they become the contract for all downstream agents. The /gsd:list-phase-assumptions command lets you review these frozen assumptions after the fact, so you can see exactly what constraints downstream agents are operating under.
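The article doesn’t show a CONTEXT.md, but based on the description a frozen file might look roughly like this (the layout and field names are my sketch, not GSD’s actual schema):

```markdown
---
phase: 4-feed
status: frozen
---

## Decisions

- Infinite scroll: IntersectionObserver (no external library)
- Error handling: toast notifications, silent retry up to 3 times
- Empty state: skeleton placeholders, not spinners

## Deferred

- Pull-to-refresh: out of scope for this phase
```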


Phase 4: the domain research layer

Before any planning happens, the system must validate the user’s preferences against reality.

Parallel researchers

Once preferences are captured in CONTEXT.md, parallel Researcher Agents are spawned. They read the preferences (e.g., “Use Stripe for payments”) and investigate external API documentation, component libraries, or the existing codebase.

The output

They generate a {phase_num}-RESEARCH.md file. The heavy Planner model is forced to read this before drafting XML, which prevents it from hallucinating deprecated APIs or made-up syntax.


Phase 5: goal-backward planning and adversarial validation

Never trust the first draft of an AI plan. Use an adversarial dynamic and mathematical traceability to ensure 100% scope fulfilment.

The five-step goal-backward derivation

GSD’s planner implements a structured methodology that transforms plans from task lists into verifiable outcome contracts:

  1. Extract requirement IDs from ROADMAP.md and distribute across plans. Every ID must appear in at least one plan
  2. State the goal in outcome-shaped language (“users can send messages”) not task-shaped language (“implement message API”)
  3. Derive observable truths: 3–7 user-verifiable statements that must be TRUE for the goal to be achieved
  4. Derive required artefacts: specific files with path, provides, min_lines, and exports fields
  5. Identify key links: critical connections where breakage causes cascading failures, with from, to, via, and pattern (regex) fields

These derivations are encoded as must_haves in YAML frontmatter, forming machine-readable contracts that the verifier agent later checks against the actual codebase.
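Assembling the fields the article names (truths, artefacts with path/provides/min_lines/exports, key links with from/to/via/pattern), a must_haves block might plausibly look like this — the example values are invented for a messaging feature:

```yaml
must_haves:
  truths:
    - "Users can send a message and see it appear in the thread"
    - "Invalid payloads return a 400 with an error body"
  artifacts:
    - path: src/api/messages.ts
      provides: "POST /api/messages endpoint"
      min_lines: 40
      exports: [createMessage]
  key_links:
    - from: src/ui/Thread.tsx
      to: src/api/messages.ts
      via: "fetch('/api/messages')"
      pattern: "fetch\\(['\"]/api/messages"
```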

End-to-end traceability

When the roadmap is generated, every requirement is assigned strict IDs (phase_req_ids). These IDs are passed as hard parameters down the entire chain: roadmap → planner → plan XML → executor → verifier.

The six verification dimensions (Nyquist layer)

A separate “Auditor” agent (plan-checker) evaluates the proposed plan against six dimensions:

  1. Requirement coverage: every ROADMAP requirement appears in at least one plan’s frontmatter
  2. Task completeness: all tasks have <verify> elements with <automated> commands (the Nyquist Rule; a blocking failure, not a suggestion)
  3. Dependency correctness: wave assignments are consistent with declared depends_on
  4. Key links: critical inter-component connections are identified and testable
  5. Scope sanity: plans stay within ~50% context budget, 2–3 tasks maximum
  6. must_haves derivation: truths, artefacts, and key_links properly derived from goals

Orphaned requirements (those mapped to a phase in REQUIREMENTS.md but unclaimed by any plan) are explicitly detected and flagged.
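Orphan detection is essentially a set difference between the phase’s assigned requirement IDs and the IDs claimed across plan frontmatter; a toy sketch (not GSD’s code):

```javascript
// Sketch of orphaned-requirement detection: IDs assigned to a phase in
// REQUIREMENTS.md but claimed by no plan's frontmatter. Illustrative logic.
function findOrphans(phaseReqIds, plans) {
  const claimed = new Set(plans.flatMap((p) => p.req_ids));
  return phaseReqIds.filter((id) => !claimed.has(id));
}

const phaseReqIds = ['auth-01', 'auth-02', 'auth-03'];
const plans = [
  { name: 'PLAN-01', req_ids: ['auth-01'] },
  { name: 'PLAN-02', req_ids: ['auth-02'] },
];
console.log(findOrphans(phaseReqIds, plans)); // → ['auth-03']
```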

Wave 0 test scaffolding

When automated tests don’t yet exist, the system maps test coverage to each requirement before code is written, creating Wave 0 test scaffold tasks. This forces test-first development even when starting from scratch.

Plans as prompts

PLAN.md files are designed as direct prompt inputs for executor subagents, not human-readable documents. The XML structure is optimised for LLM consumption. This “plans-as-prompts” doctrine means the plan format itself is a prompt engineering decision.


Phase 6: deterministic task definition (XML prompting)

Format plans so rigidly that downstream coding agents cannot misunderstand them.

XML task schema

Once the plan passes the Nyquist Layer, it is formatted as strict XML:

<task type="auto" req_id="auth-01">
  <name>Create login endpoint</name>
  <files>src/api/auth.ts</files>
  <action>Use jose for JWT. Return httpOnly cookie.</action>
  <verify>
    <automated>curl -X POST localhost:3000/api/auth/login returns 200</automated>
    <manual>Check browser cookie inspector shows httpOnly flag</manual>
    <sampling_rate>always</sampling_rate>
  </verify>
  <done>Valid credentials return cookie, invalid return 401</done>
</task>

The <verify> blocks have sub-tags the planner must populate: <automated> (required command, enforced by the Nyquist Rule), <manual> (optional human check), and <sampling_rate> (execution frequency).

User setup sections

Plans carry a user_setup section for actions the AI literally cannot perform (obtaining API keys, creating webhook endpoints), with structured fields for service, why, env_vars with source, and dashboard_config with task and location. This separates human prerequisites from automated work at the schema level.

Good vs bad specificity

The planner includes concrete examples to calibrate output quality:

  • Good: “Create POST endpoint accepting {email, password}, validates using bcrypt, returns JWT in httpOnly cookie”
  • Bad: “Implement authentication system”

Anti-enterprise framing

A deliberate prompt engineering pattern: “If it sounds like corporate PM theatre, delete it.” This prevents the LLM from generating ceremonial project management artefacts that waste tokens without adding value.


Phase 7: strategic human handoffs and checkpoint taxonomy

Manage the transition from planning to execution with a typed checkpoint system, not ad-hoc pauses.

Four checkpoint types

GSD defines four distinct checkpoint types that govern human-agent interaction boundaries:

auto: fully autonomous execution, no human involvement. The executor proceeds without pausing.

checkpoint:human-verify (~90% of checkpoints): the human confirms Claude’s automated work. Each checkpoint specifies exact test steps and expected behaviour in <how-to-verify> tags, presented under unmissable “→ YOUR ACTION:” box headers.

checkpoint:setup: the human configures external services before Claude continues. Used for API key provisioning, webhook creation, and other actions outside the AI’s reach.

checkpoint:human-action: authentication gates that block execution even in --auto mode. These are hard stops that cannot be bypassed.

The automation-first principle

The governing rule: If Claude CAN do it via CLI/API, Claude MUST do it. Checkpoints verify AFTER automation, not replace it. Checkpoint types are declared in plan XML, and the executor agent’s prompt enforces distinct protocols for each type.
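The article doesn’t reproduce the checkpoint XML itself; combining the task schema shown earlier with the <how-to-verify> tag described above, a declaration might plausibly look like this (the type attribute values for checkpoints are my guess from the taxonomy names):

```xml
<task type="checkpoint:human-verify" req_id="auth-01">
  <how-to-verify>
    1. Log in to the running app with valid credentials
    2. Confirm the browser cookie inspector shows an httpOnly session cookie
  </how-to-verify>
</task>
```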

The auto-advance escape hatch

For fully automated workflows, the --auto flag (or workflow.auto_advance setting) overrides the human firewall, chaining Discuss → Plan → Execute back-to-back across full milestones. It relies entirely on the Nyquist Validation Layer to halt on errors.

One implementation detail worth noting: the --auto flag persists workflow.auto_advance: true to config.json on disk, not just in the agent’s context. This ensures the chaining instruction survives Claude Code’s context compaction events, which can silently discard in-context state.

Auto-advance is automatically cleared on milestone completion to prevent runaway chains across milestones.


Phase 8: wave execution and self-healing

Force execution into a strict, verifiable Software Development Life Cycle.

Dependency graphing and wave execution

Independent tasks run in parallel workers (Wave 1); dependent tasks wait for Wave 1 to succeed (Wave 2). The parallelization.enabled config flag controls whether waves actually run concurrently.
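Wave assignment from depends_on is a small topological computation; a sketch under the assumption that a task’s wave is one more than the deepest wave among its dependencies (this is my reconstruction of the idea, not GSD’s code):

```javascript
// Sketch of wave assignment from declared depends_on edges: tasks with no
// dependencies land in Wave 1, each dependent task one wave after its
// deepest dependency. Assumes the dependency graph is acyclic.
function assignWaves(tasks) {
  const waves = {};
  const wave = (id) => {
    if (waves[id] !== undefined) return waves[id];
    const deps = tasks[id].depends_on ?? [];
    waves[id] = deps.length ? 1 + Math.max(...deps.map(wave)) : 1;
    return waves[id];
  };
  Object.keys(tasks).forEach(wave);
  return waves;
}

const tasks = {
  schema: { depends_on: [] },
  api:    { depends_on: ['schema'] },
  ui:     { depends_on: ['api'] },
  docs:   { depends_on: [] },
};
console.log(assignWaves(tasks));
// → { schema: 1, api: 2, ui: 3, docs: 1 } — schema and docs can run in parallel
```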

Atomic commits and automated self-healing

After finishing an <action>, the executor runs the <verify> command. If it passes, it executes an atomic git commit. If it fails, a Diagnostic Agent spawns.

The debug agent: scientific methodology with circuit breakers

GSD includes a dedicated gsd-debugger agent (~990 lines) implementing:

  • Scientific debugging methodology: hypothesis generation → testing → evidence gathering → root cause identification
  • 7+ investigation techniques built into the prompt
  • A 3-strike rule: after three failed debugging attempts, the agent dumps state and recommends a fresh session to prevent circular reasoning
  • Human verification requirement: debug sessions need human confirmation before resolution (cannot self-close)
  • Persistent state: maintains debugging context in .planning/debug/, resolved sessions archived to debug/resolved/

File-system garbage collection (milestone archiving)

The milestone lifecycle has three distinct steps. First, /gsd:audit-milestone runs a pre-completion review to verify all requirements are met and flag any gaps. Once the audit passes, /gsd:complete-milestone permanently archives and zips the executed phase directories out of the active working tree. This prevents the .planning/ directory from bloating and stops the orchestrator from wasting tokens parsing outdated feature plans. Finally, /gsd:new-milestone starts the next cycle, scaffolding a fresh milestone structure ready for planning.


Phase 9: three-level verification

Trust but verify, empirically, against the actual codebase.

The verifier agent validates completed work using a three-level methodology:

Level 1, file existence: must_haves artefacts actually exist on disk.

Level 2, substantive implementation: files contain real code, not stubs, TODOs, or placeholder content. Checked via min_lines thresholds and contains assertions.

Level 3, integration wiring: components are properly connected and functional. Verified through regex pattern matching on actual source files via key_links (the from, to, via, and pattern fields declared in the plan).

The /gsd:verify-work [N] command triggers this verification for a given phase, combining automated checks with manual UAT. When automated checks fail, a diagnostic agent spawns to identify the root cause — you get an actual diagnosis, not just a pass/fail.

The verifier produces a requirements table with Source Plan, Description, and Evidence columns, cross-referencing requirement IDs from PLAN frontmatter rather than grepping REQUIREMENTS.md by phase number. This catches plans that claim to address requirements without actually implementing them.
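A toy sketch of the three levels, run against an in-memory file map instead of a real working tree (names and structure are illustrative; in reality the checks run against the actual codebase):

```javascript
// Sketch of three-level verification: Level 1 existence, Level 2 substance
// via min_lines, Level 3 wiring via key_links regex patterns.
function verify(mustHaves, files) {
  const failures = [];
  for (const a of mustHaves.artifacts) {
    const content = files[a.path];
    if (content === undefined) {                            // Level 1: existence
      failures.push(`${a.path}: missing`);
    } else if (content.split('\n').length < a.min_lines) {  // Level 2: substance
      failures.push(`${a.path}: under ${a.min_lines} lines (stub?)`);
    }
  }
  for (const link of mustHaves.key_links) {                 // Level 3: wiring
    if (!new RegExp(link.pattern).test(files[link.from] ?? '')) {
      failures.push(`${link.from} -> ${link.to}: pattern not found`);
    }
  }
  return failures;
}

const files = {
  'src/api/messages.ts': 'export function createMessage() {}\n'.repeat(40),
  'src/ui/Thread.tsx': "const r = await fetch('/api/messages');",
};
const mustHaves = {
  artifacts: [{ path: 'src/api/messages.ts', min_lines: 40 }],
  key_links: [{ from: 'src/ui/Thread.tsx', to: 'src/api/messages.ts',
                pattern: "fetch\\('/api/messages" }],
};
console.log(verify(mustHaves, files)); // → [] (all three levels pass)
```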


Phase 10: session persistence and recovery

Agent work must survive crashes, context compaction, and overnight breaks.

Three-level resume

GSD implements three-level resume via /gsd:resume-work:

Agent-level: current-agent-id.txt is written before spawning each executor and deleted on completion. If the marker file exists, the system offers Task(resume="{agent_id}") to continue the interrupted subagent in its preserved context.

Mid-plan checkpoint: .continue-here-{phase}-{plan}-checkpoint-{N}.md handoff files capture the checkpoint context, allowing seamless resumption at the exact task where human input was needed.

Plan-level: if PLAN.md exists but no corresponding SUMMARY.md, the system re-runs execute-plan for the incomplete plan.
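Plan-level detection reduces to a simple scan of the planning tree; a sketch (not GSD’s implementation):

```javascript
// Sketch of plan-level resume detection: any plan directory containing a
// PLAN.md but no SUMMARY.md is incomplete and should be re-executed.
function incompletePlans(dirs) {
  // dirs: { '<plan dir>': ['PLAN.md', 'SUMMARY.md', ...] }
  return Object.entries(dirs)
    .filter(([, names]) => names.includes('PLAN.md') && !names.includes('SUMMARY.md'))
    .map(([dir]) => dir);
}

const dirs = {
  'phases/4-auth/plans/01': ['PLAN.md', 'SUMMARY.md'],
  'phases/4-auth/plans/02': ['PLAN.md'],
};
console.log(incompletePlans(dirs)); // → ['phases/4-auth/plans/02']
```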

Supporting commands

  • /gsd:progress: read-only command that reads all state files and suggests the exact next command
  • /gsd:pause-work: creates a structured handoff document
  • agent-history.json: complete audit trail of all agent spawns

The stopped_at and resume_file frontmatter fields in STATE.md enable cross-session continuity.


Phase 11: quick mode, the graduated quality spectrum

Not every change needs the full planning pipeline. But every change needs atomic commits and state tracking.

The /gsd:quick command offers a fast-path alternative that preserves core guarantees (atomic commits, state tracking, fresh agent context) while skipping research, plan-checking, and verification.

Quick tasks live in .planning/quick/001-{slug}/ with sequential numbering. The --full flag enables plan-checking (max 2 iterations) and post-execution verification, creating a graduated quality spectrum:

  • Quick (default): just do the work with atomic commits
  • Quick --full: add plan-checking and verification
  • Full pipeline (/gsd:discuss → /gsd:plan → /gsd:execute): research, interview loop, adversarial planning, wave execution, and three-level verification

This pattern is essential for ad-hoc work that doesn’t justify the overhead of the full pipeline.


Phase 12: dynamic roadmap modification

Plans change. The system must adapt without losing traceability.

Roadmap mutation commands

  • /gsd:add-phase: appends a new phase to the roadmap
  • /gsd:insert-phase N: inserts between existing phases using decimal numbering (inserting between 3 and 4 creates phase 3.1, then 3.2, supporting multi-level decimals like 72.1.1)
  • /gsd:remove-phase N: removes with safety checks. Blocks removal of in-progress or completed phases; requires --force if SUMMARY files exist
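The decimal numbering itself is straightforward; a sketch of how the next insertion slot might be computed (my guess at the logic, not GSD’s code):

```javascript
// Sketch of decimal phase numbering for /gsd:insert-phase: inserting after
// phase N takes the next free N.k slot, and the same rule recurses for
// multi-level decimals like 72.1.1.
function nextInsertNumber(existing, after) {
  const prefix = `${after}.`;
  const used = existing
    .filter((n) => n.startsWith(prefix) && !n.slice(prefix.length).includes('.'))
    .map((n) => Number(n.slice(prefix.length)));
  return `${after}.${used.length ? Math.max(...used) + 1 : 1}`;
}

console.log(nextInsertNumber(['3', '4'], '3'));        // → '3.1'
console.log(nextInsertNumber(['3', '3.1', '4'], '3')); // → '3.2'
console.log(nextInsertNumber(['72.1'], '72.1'));       // → '72.1.1'
```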

Gap analysis

/gsd:plan-milestone-gaps creates entirely new phases from audit findings, updating the REQUIREMENTS.md traceability table with new phase assignments and checkbox resets.


Phase 13: model tiering and multi-runtime deployment

Match cognitive load to model capability, and deploy one source format to every runtime.

Model routing

Use the right model for the right job:

  • Architects (e.g., Opus/GPT-4o): Discuss and Planner phases, where deep reasoning about requirements and architecture matters
  • Executors (e.g., Sonnet/GPT-4o-mini): code generation and syntax work, where following a well-defined plan matters more than novel reasoning
  • Verifiers (e.g., Haiku/Flash): fast pattern matching on logs, file existence checks, and structured validation

The model_profile config (quality/balanced/budget) selects from these tiers. The model resolution system returns inherit rather than hardcoded model names, allowing the underlying runtime’s default model to be used. Per-agent model overrides are available beyond the three-tier profile system.
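A sketch of how such resolution could work; the tier-to-model mapping below is invented for illustration (the article only names the profile and tier labels, not the exact assignments):

```javascript
// Sketch of profile-based model resolution with per-agent overrides and an
// 'inherit' fallback to the runtime's default model. Mapping values are
// hypothetical, not GSD's actual configuration.
const profiles = {
  quality:  { planner: 'opus',   executor: 'sonnet', verifier: 'haiku' },
  balanced: { planner: 'sonnet', executor: 'sonnet', verifier: 'haiku' },
  budget:   { planner: 'sonnet', executor: 'haiku',  verifier: 'haiku' },
};

function resolveModel(profile, role, overrides = {}) {
  return overrides[role] ?? profiles[profile]?.[role] ?? 'inherit';
}

console.log(resolveModel('quality', 'planner'));                         // → 'opus'
console.log(resolveModel('budget', 'executor', { executor: 'sonnet' })); // → 'sonnet'
console.log(resolveModel('balanced', 'researcher'));                     // → 'inherit'
```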

Multi-runtime transpilation

A single source format deploys to four runtimes with genuine structural transformations:

  • Claude Code: native format with nested commands, YAML frontmatter, PascalCase tool names
  • OpenCode: flat command structure, tool name mapping (e.g., AskUserQuestion → question), general-purpose subagent type mapping, JSONC handling for comments/trailing commas/BOM
  • Gemini CLI: TOML config conversion, MCP tool filtering, HTML tag stripping, auto-enables experimental.enableAgents
  • Codex: skills-first architecture where commands become skills/gsd-*/SKILL.md, slash syntax rewrites (/gsd:* → $gsd-*), argument normalisation

The installer (bin/install.js) has 11 domain modules, detects non-interactive stdin for WSL2/container environments, removes orphaned hooks from previous versions, and preserves locally modified files in gsd-local-patches/ with a manifest for /gsd:reapply-patches restoration.

Granular permission bypassing

Implement a “Safe Command Whitelist.” Automatically approve non-destructive read commands (cat, ls) and safe writes (git commit). Only pause to request human approval for destructive actions.
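A whitelist classifier of this kind can be very small; a sketch with illustrative patterns (the actual safe list would be curated per project):

```javascript
// Sketch of a safe-command whitelist: auto-approve known non-destructive
// reads and safe writes, pause for everything else. Pattern list is
// illustrative, not GSD's actual whitelist.
const SAFE = [/^cat\s/, /^ls(\s|$)/, /^git status/, /^git diff/, /^git commit\s/];

function needsApproval(command) {
  return !SAFE.some((re) => re.test(command.trim()));
}

console.log(needsApproval('ls -la'));            // → false (auto-approved)
console.log(needsApproval('git commit -m "x"')); // → false (safe write)
console.log(needsApproval('rm -rf build/'));     // → true  (human approval)
```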


Cross-cutting concerns

Security patterns

  • Write tool enforcement: agents must always use the Write tool, never Bash heredocs such as cat << 'EOF', which have corrupted configuration files in production
  • Scope boundary and attempt limits on executors prevent runaway loops
  • Strict package safety checks in /gsd:update: only the trusted package name is accepted; scoped/user-derived names are rejected; install commands are allowlisted to trusted forms
  • Timestamped backups before any state repair operation (health --repair creates a backup before regenerating STATE.md)
  • Secret commit protection: agents are prevented from committing secrets, with a recommended configuration to add .env and credential files to Claude’s deny list in .claude/settings.json so they cannot be read at all
  • Runtime bug workarounds: execute-phase and quick workflows spot-check actual git output before reporting agent failure, defending against the host runtime’s false failure reports (e.g., classifyHandoffIfNeeded bugs)

System integrity and configuration commands

Beyond the health --repair mode mentioned above, /gsd:health is a general system integrity check — it validates state files, directory structure, and config consistency. /gsd:settings exposes the full configuration for inspection and modification, while /gsd:set-profile switches between model profiles (quality/balanced/budget) without manually editing config.json.

Prompt engineering patterns

Several transferable techniques are embedded across all agent definitions:

  • Mandatory file-read blocks: <files_to_read> tags at the start of every agent force the agent to use the Read tool before any actions, preventing hallucination from assumed file contents
  • <project_context> injection: every agent checks for ./CLAUDE.md and .agents/skills/ at spawn time, so project-specific conventions override generic behaviour
  • Good/bad example contrasting: concrete examples of good specificity versus bad vagueness, calibrating the LLM’s output quality
  • Context pressure calibration: explicit acknowledgement that quality degrades under context pressure, with the 50% context target as a hard operational limit
  • Anti-enterprise framing: “If it sounds like corporate PM theatre, delete it.” Prevents ceremonial artefact generation

Configuration depth

The full config.json schema has 15+ configurable parameters across five namespaces, plus user-level defaults at ~/.gsd/defaults.json:

  • mode (yolo/interactive), depth (quick/standard/comprehensive), model_profile (quality/balanced/budget)
  • workflow.*: research, plan_check, verifier, nyquist_validation, auto_advance, each independently toggleable
  • planning.*: commit_docs (auto-false if .planning/ in .gitignore), search_gitignored
  • git.*: branching_strategy (none/phase/milestone), phase_branch_template, milestone_branch_template with {phase}, {slug}, {milestone} template variables
  • parallelization.enabled for wave-based parallel execution
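Putting the namespaces above together, a config.json might look roughly like this (values and template strings are illustrative, not defaults taken from GSD):

```json
{
  "mode": "interactive",
  "depth": "standard",
  "model_profile": "balanced",
  "workflow": {
    "research": true,
    "plan_check": true,
    "verifier": true,
    "nyquist_validation": true,
    "auto_advance": false
  },
  "planning": { "commit_docs": true, "search_gitignored": false },
  "git": {
    "branching_strategy": "phase",
    "phase_branch_template": "gsd/phase-{phase}-{slug}"
  },
  "parallelization": { "enabled": true }
}
```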

The architecture at a glance

The system is a pipeline where each phase produces artefacts consumed by the next, with the thin orchestrator managing flow without accumulating context:

User Intent
    │
    ▼
Phase 0: Thin Orchestrator (command → workflow → agent delegation)
    │
    ▼
Phase 1: Deterministic CLI Layer (gsd-tools replaces improvised bash)
    │
    ▼
Phase 2: State Externalisation (state files + brownfield mapping)
    │
    ▼
Phase 3: Interview Loop (grey area → CONTEXT.md)
    │
    ▼
Phase 4: Domain Research (parallel researchers → RESEARCH.md)
    │
    ▼
Phase 5: Goal-Backward Planning + Nyquist Validation (6 dimensions)
    │
    ▼
Phase 6: XML Task Definition (deterministic schema for executors)
    │
    ▼
Phase 7: Checkpoint Taxonomy (4 types, automation-first)
    │
    ▼
Phase 8: Wave Execution + Self-Healing (debug agent, 3-strike rule)
    │
    ▼
Phase 9: Three-Level Verification (existence → substance → integration)
    │
    ▼
Phase 10: Session Persistence (3-level resume)
    │
    ▼
Phase 11: Quick Mode (graduated quality spectrum)
    │
    ▼
Phase 12: Dynamic Roadmap Mutation (decimal phase insertion)
    │
    ▼
Phase 13: Model Tiering + Multi-Runtime Deployment

The insight running through every phase: don’t fight context rot, architect around it. Keep orchestrators thin, make state external, pass paths not content, spawn fresh agents for real work, and validate everything adversarially.