Agentic Architecture Playbook: Patterns for Reliable LLM Workflows
A colleague recently introduced me to GSD (Get Shit Done), a meta-prompting wrapper that sits on top of native AI CLIs like Claude Code or OpenCode. What impressed me is how deliberately it has been architected, and how robust the process is when using it within a project. Rather than just bolting prompts onto an LLM, it completely hijacks the reasoning loop to prevent “context rot”.
I’m working on an in-house workflow automation project at my employer which was already doing a small portion of what GSD does. I wanted to capture the architectural patterns GSD uses so that I can implement and adapt them. What follows is a breakdown of those patterns, analysed by Gemini and Claude in several iterations.
Core philosophy: meta-prompting and eradicating context rot
GSD is not a standalone terminal emulator or file-reading sandbox. Built by a solo developer who explicitly rejected enterprise-heavy project management tools, it leverages native AI CLI execution environments but shifts the burden of memory and planning away from the LLM’s internal context window and into a highly structured external file system, adversarial sub-agents, and strict requirement traceability.
The problem it solves is context rot: the quality degradation that occurs as an LLM fills its context window. Rather than attempting to manage this through summarisation or selective pruning, GSD’s architecture avoids the problem entirely through structural separation.
Phase 0: the thin orchestrator architecture
This is the foundational design primitive that makes everything else work. Without it, every subsequent phase degrades at scale.
GSD’s architecture rests on a three-layer separation that keeps orchestrator context usage at 10–15%, even when coordinating dozens of agents across a full phase execution.
The three layers
Commands (~50–100 lines) are thin entry points with YAML frontmatter and a single delegation instruction. They’re the user-facing interface, slash commands like /gsd:plan, /gsd:execute, /gsd:quick. A command’s only job is to validate input and delegate to a workflow.
Workflows (~200–400 lines) coordinate multi-step orchestration. They read structured JSON from gsd-tools init and pass file paths (never content) to agents via Task(). This is the bit that matters: the orchestrator never reads project files itself. It passes paths and lets agents read them in their own fresh context windows.
Agents (~800–1,500 lines) are heavyweight prompt specialists spawned with fresh 200K-token contexts. All the real work (reading files, reasoning about code, writing plans) happens inside these disposable subagent windows. When an agent completes, its context is discarded. The orchestrator only receives a structured result.
Why path-passing matters
The traditional approach is to read a file, pass its content to a sub-agent, and receive the result. This burns context in the orchestrator for content it never reasons about. GSD’s path-passing pattern means the orchestrator’s context contains only metadata: file paths, status flags, and routing decisions.
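The path-passing pattern can be sketched in a few lines. This is a hypothetical illustration (the function and field names are mine, not GSD's): the orchestrator assembles a payload of paths and metadata only, and the spawned agent does all file reading in its own fresh context.

```javascript
// Hypothetical sketch of the path-passing pattern: the orchestrator
// builds a Task() payload containing only paths and metadata, never
// file content. Names and shapes are illustrative, not GSD's.
function buildAgentPayload(phase, planPaths) {
  return {
    role: "executor",
    phase,                        // e.g. "3-auth"
    plan_paths: planPaths,        // the agent reads these itself
    state_path: ".planning/STATE.md",
    // Note: no file *content* ever enters this object.
  };
}

const payload = buildAgentPayload("3-auth", [
  ".planning/phases/3-auth/plans/PLAN-01.md",
]);

// The orchestrator's context cost is the size of this metadata only.
const approxTokens = Math.ceil(JSON.stringify(payload).length / 4);
```

Whatever the plan files contain, the orchestrator's cost stays constant: a few dozen tokens of routing metadata.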
Compound init commands
The gsd-tools init <workflow> command returns all needed metadata as structured JSON in a single call, reducing orchestrator round-trips from 4+ to 1. Large JSON payloads exceeding 50KB are written to tempfiles to prevent tool-call truncation. Each field stays under 50 tokens.
Context pressure awareness
GSD calibrates agent behaviour around a 50% context target. The system explicitly acknowledges that LLMs degrade when they perceive context pressure, entering a “completion mode” where quality drops. The rule: stop BEFORE quality degrades. If an agent approaches its limit, it summarises its knowledge to a temp file, the context window is wiped, and the agent reboots reading only that summary.
Phase 1: the deterministic CLI layer
Every bash pattern an LLM improvises is a potential hallucination vector. Replace them with deterministic tooling.
gsd-tools: anti-hallucination infrastructure
GSD includes gsd-tools (bin/gsd-tools.cjs), a deterministic Node.js CLI with 40+ commands that offloads mechanical operations from LLM interpretation. The .cjs extension prevents ESM conflicts in projects using "type": "module".
Key command categories:
- State management (`state get`, `state patch`, `state advance-plan`): atomic operations on STATE.md with schema validation, replacing fragile manual edits
- Phase lifecycle (`phase add`, `phase insert`, `phase remove`, `phase complete`): manages the planning directory with safety checks
- Frontmatter CRUD (`frontmatter get`, `frontmatter set`, `frontmatter merge`, `frontmatter validate`): schema-validated YAML manipulation, replacing grepping
- Verification suites (`verify plan-structure`, `verify artifacts`, `verify commits`, `verify references`, `verify key-links`): deterministic quality gates
- Roadmap analysis (`roadmap analyze`, `roadmap get-phase`): structured phase extraction
- Template operations (`template select`, `template fill`): deterministic plan scaffolding
- Atomic commits (`commit`): standardised git operations with message formatting
- Context-optimising parsers (`phase-plan-index`, `state-snapshot`, `summary-extract`, `history-digest`): return structured JSON instead of raw file content, saving 5,000–10,000 tokens per workflow execution
Why this matters
Without this layer, agents must improvise bash commands like ls | sort -V for directory listings or grep through YAML for state. Each improvisation is a hallucination risk. gsd-tools replaces every such pattern with a tested, deterministic alternative.
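As a concrete example, here is a minimal deterministic stand-in for the improvised `ls | sort -V` pattern: natural-version ordering of phase directories in plain Node.js, so no agent has to guess at shell semantics. This is a hypothetical helper, not gsd-tools' actual code:

```javascript
// Deterministic replacement for an improvised `ls | sort -V`:
// numeric-aware ordering of phase directory names.
function sortPhaseDirs(names) {
  return [...names].sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true })
  );
}

// Numeric-aware compare puts 2 before 10, unlike plain lexical sort.
const ordered = sortPhaseDirs(["10-deploy", "2-auth", "1-scaffold"]);
```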
Phase 2: context engineering and state externalisation
Stop trying to give agents infinite memory. Give them a perfect, tiny context window that is wiped clean and reloaded from text files for every single task.
Greenfield initialisation
For new projects, /gsd:new-project is the entry point that populates the entire state file hierarchy. It runs a structured flow: an interactive questionnaire captures project goals and constraints, parallel research agents investigate the domain, requirements are extracted and assigned to phases, and a roadmap is generated. The output is a fully populated .planning/ directory ready for execution.
The state file hierarchy
State files operate at four scope levels:
Living files (updated continuously):
- `PROJECT.md`: immutable core rules, project identity, and constraints
- `REQUIREMENTS.md`: the requirements specification with phase assignments and traceability checkboxes
- `STATE.md`: the agent’s working memory with nested YAML frontmatter. The underlying CLI (Node/TypeScript) parses progress states, requirement completions, and boolean flags from the YAML without burning LLM tokens
- `config.json`: 15+ workflow settings across five namespaces covering mode (yolo/interactive), depth (quick/standard/comprehensive), model profile (quality/balanced/budget), workflow toggles (research, plan_check, verifier, nyquist_validation, auto_advance), planning settings (commit_docs, search_gitignored), git strategy (branching, templates), and parallelisation
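The point of keeping flags in frontmatter is that a plain parser can read them without spending LLM tokens. A hedged sketch of the idea, handling only flat `key: value` pairs (the real STATE.md is nested, and this is not the CLI's actual parser):

```javascript
// Minimal frontmatter flag extraction: flat `key: value` pairs only.
// Illustrates reading state deterministically, without an LLM or
// even a YAML library.
function parseFrontmatter(text) {
  const match = text.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return {};
  const out = {};
  for (const line of match[1].split("\n")) {
    const m = line.match(/^(\w+):\s*(.+)$/);
    if (!m) continue;
    const raw = m[2].trim();
    out[m[1]] = raw === "true" ? true : raw === "false" ? false : raw;
  }
  return out;
}

const state = parseFrontmatter(
  "---\nphase: 3-auth\nplan_complete: true\n---\n# STATE\n"
);
```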
Milestone-scoped files (archived per milestone):
- `ROADMAP.md`: the bridge between abstract requirements and granular execution. It chunks requirements into numbered Milestones and Phases. Downstream agents never look at the “whole project”; execution commands target a specific phase, strictly bounding the context
- `MILESTONES.md`: append-only historical record of completed milestones
Phase-scoped files (created per phase):
- `{phase}-CONTEXT.md`: frozen preference decisions from the interview loop
- `{phase}-DISCOVERY.md`: mandatory discovery findings
- `{phase}-RESEARCH.md`: domain research output
- `{phase}-VALIDATION.md`: Nyquist validation feedback contract
- `PLAN.md` files within `.planning/phases/{N}-{slug}/plans/`: the execution plans
- `SUMMARY.md`: post-execution verification results
- `{phase}-UAT.md`: user acceptance testing results
- `{phase}-VERIFICATION.md`: three-level verification results
Historical/tracking files:
- `usage.json`: session history tracking model selections, costs, complexity
- `current-agent-id.txt`: active agent marker for resume detection
- `agent-history.json`: complete agent spawn audit trail
- `.continue-here-*`: mid-checkpoint handoff files
- `todos/pending/` and `todos/done/`: idea capture system, managed via the `/gsd:add-todo` and `/gsd:check-todos` commands
- `debug/` and `debug/resolved/`: debug session persistence
Brownfield mapping: four structured artefacts
For existing codebases, the /gsd:map-codebase command runs an onboarding sweep using parallel specialist agents. It produces four distinct documents:
- `codebase/STACK.md`: technology stack inventory (languages, frameworks, dependencies)
- `codebase/ARCHITECTURE.md`: design patterns, module structure, data flow
- `codebase/CONVENTIONS.md`: coding conventions discovered through analysis
- `codebase/CONCERNS.md`: technical debt, fragility points, security risks
These load automatically during subsequent planning, so /gsd:new-project asks questions focused on what the user is adding rather than rediscovering existing structure. Future agents must adhere to the established patterns captured in CONVENTIONS.md.
Phase 3: the interview loop (constraint mapping)
Do not let the AI make invisible assumptions about what to build.
Identifying grey areas
The system analyses the requested roadmap phase and flags missing parameters based on categories: Visuals, APIs, Logic, Data, Authentication, Error Handling, and more.
The preference capture loop
The AI presents logical multiple-choice options for these grey areas. For example: “For infinite scroll, do you want to use IntersectionObserver or a React library?”
It iteratively asks if you are satisfied, freezing answers into an immutable CONTEXT.md file. This state-freezing step matters because once preferences are captured, they become the contract for all downstream agents. The /gsd:list-phase-assumptions command lets you review these frozen assumptions after the fact, so you can see exactly what constraints downstream agents are operating under.
Phase 4: the domain research layer
Before any planning happens, the system must validate the user’s preferences against reality.
Parallel researchers
Once preferences are captured in CONTEXT.md, parallel Researcher Agents are spawned. They read the preferences (e.g., “Use Stripe for payments”) and investigate external API documentation, component libraries, or the existing codebase.
The output
They generate a {phase_num}-RESEARCH.md file. The heavy Planner model is forced to read this before drafting XML, which prevents it from hallucinating deprecated APIs or made-up syntax.
Phase 5: goal-backward planning and adversarial validation
Never trust the first draft of an AI plan. Use an adversarial dynamic and mathematical traceability to ensure 100% scope fulfilment.
The five-step goal-backward derivation
GSD’s planner implements a structured methodology that transforms plans from task lists into verifiable outcome contracts:
1. Extract requirement IDs from ROADMAP.md and distribute them across plans. Every ID must appear in at least one plan
2. State the goal in outcome-shaped language (“users can send messages”), not task-shaped language (“implement message API”)
3. Derive observable truths: 3–7 user-verifiable statements that must be TRUE for the goal to be achieved
4. Derive required artefacts: specific files with `path`, `provides`, `min_lines`, and `exports` fields
5. Identify key links: critical connections where breakage causes cascading failures, with `from`, `to`, `via`, and `pattern` (regex) fields
These derivations are encoded as must_haves in YAML frontmatter, forming machine-readable contracts that the verifier agent later checks against the actual codebase.
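A must_haves contract and a structural check of it might look like the following. The field names (`truths`, `artifacts`, `key_links`) follow the derivation steps above; the example values and the validator itself are a hypothetical sketch:

```javascript
// Illustrative must_haves contract, as derived goal-backward.
const mustHaves = {
  truths: ["users can send messages", "messages persist across reload"],
  artifacts: [
    { path: "src/api/messages.ts", provides: "message API", min_lines: 20 },
  ],
  key_links: [
    { from: "src/ui/Chat.tsx", to: "src/api/messages.ts",
      via: "fetch", pattern: "fetch\\(.*messages" },
  ],
};

// Tiny structural validator: every contract must carry truths,
// checkable artifacts, and testable key links.
function validateMustHaves(mh) {
  const errors = [];
  if (!mh.truths || mh.truths.length < 1) errors.push("no truths");
  for (const a of mh.artifacts ?? [])
    if (!a.path || !a.min_lines) errors.push(`bad artifact: ${a.path}`);
  for (const l of mh.key_links ?? [])
    if (!l.from || !l.to || !l.pattern) errors.push("bad key_link");
  return errors;
}
```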
End-to-end traceability
When the roadmap is generated, every requirement is assigned strict IDs (phase_req_ids). These IDs are passed as hard parameters down the entire chain: roadmap → planner → plan XML → executor → verifier.
The six verification dimensions (Nyquist layer)
A separate “Auditor” agent (plan-checker) evaluates the proposed plan against six dimensions:
- Requirement coverage: every ROADMAP requirement appears in at least one plan’s frontmatter
- Task completeness: all tasks have `<verify>` elements with `<automated>` commands (the Nyquist Rule, and this is a blocking failure, not a suggestion)
- Dependency correctness: wave assignments are consistent with declared `depends_on`
- Key links: critical inter-component connections are identified and testable
- Scope sanity: plans stay within ~50% context budget, 2–3 tasks maximum
- must_haves derivation: truths, artefacts, and key_links properly derived from goals
Orphaned requirements (those mapped to a phase in REQUIREMENTS.md but unclaimed by any plan) are explicitly detected and flagged.
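The orphan check itself is a simple set difference. A sketch, with illustrative data shapes: phase requirement IDs on one side, the `req_ids` claimed in each plan's frontmatter on the other.

```javascript
// Orphaned-requirement detection: IDs assigned to the phase but not
// claimed by any plan. Data shapes are illustrative.
function findOrphans(phaseReqIds, plans) {
  const claimed = new Set(plans.flatMap((p) => p.req_ids));
  return phaseReqIds.filter((id) => !claimed.has(id));
}

const orphans = findOrphans(
  ["auth-01", "auth-02", "auth-03"],
  [{ req_ids: ["auth-01"] }, { req_ids: ["auth-03"] }]
);
// auth-02 is flagged before execution ever starts.
```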
Wave 0 test scaffolding
When automated tests don’t yet exist, the system maps test coverage to each requirement before code is written, creating Wave 0 test scaffold tasks. This forces test-first development even when starting from scratch.
Plans as prompts
PLAN.md files are designed as direct prompt inputs for executor subagents, not human-readable documents. The XML structure is optimised for LLM consumption. This “plans-as-prompts” doctrine means the plan format itself is a prompt engineering decision.
Phase 6: deterministic task definition (XML prompting)
Format plans so rigidly that downstream coding agents cannot misunderstand them.
XML task schema
Once the plan passes the Nyquist Layer, it is formatted as strict XML:
```xml
<task type="auto" req_id="auth-01">
  <name>Create login endpoint</name>
  <files>src/api/auth.ts</files>
  <action>Use jose for JWT. Return httpOnly cookie.</action>
  <verify>
    <automated>curl -X POST localhost:3000/api/auth/login returns 200</automated>
    <manual>Check browser cookie inspector shows httpOnly flag</manual>
    <sampling_rate>always</sampling_rate>
  </verify>
  <done>Valid credentials return cookie, invalid return 401</done>
</task>
```
The <verify> blocks have sub-tags the planner must populate: <automated> (required command, enforced by the Nyquist Rule), <manual> (optional human check), and <sampling_rate> (execution frequency).
User setup sections
Plans carry a user_setup section for actions the AI literally cannot perform (obtaining API keys, creating webhook endpoints), with structured fields for service, why, env_vars with source, and dashboard_config with task and location. This separates human prerequisites from automated work at the schema level.
Good vs bad specificity
The planner includes concrete examples to calibrate output quality:
- Good: “Create POST endpoint accepting {email, password}, validates using bcrypt, returns JWT in httpOnly cookie”
- Bad: “Implement authentication system”
Anti-enterprise framing
A deliberate prompt engineering pattern: “If it sounds like corporate PM theatre, delete it.” This prevents the LLM from generating ceremonial project management artefacts that waste tokens without adding value.
Phase 7: strategic human handoffs and checkpoint taxonomy
Manage the transition from planning to execution with a typed checkpoint system, not ad-hoc pauses.
Four checkpoint types
GSD defines four distinct checkpoint types that govern human-agent interaction boundaries:
auto: fully autonomous execution, no human involvement. The executor proceeds without pausing.
checkpoint:human-verify (~90% of checkpoints): the human confirms Claude’s automated work. Each checkpoint specifies exact test steps and expected behaviour in `<how-to-verify>` tags, and is presented with unmissable `→ YOUR ACTION:` box headers.
checkpoint:setup: the human configures external services before Claude continues. Used for API key provisioning, webhook creation, and other actions outside the AI’s reach.
checkpoint:human-action: authentication gates that block execution even in --auto mode. These are hard stops that cannot be bypassed.
The automation-first principle
The governing rule: If Claude CAN do it via CLI/API, Claude MUST do it. Checkpoints verify AFTER automation, not replace it. Checkpoint types are declared in plan XML, and the executor agent’s prompt enforces distinct protocols for each type.
The auto-advance escape hatch
For fully automated workflows, the --auto flag (or workflow.auto_advance setting) overrides the human firewall, chaining Discuss → Plan → Execute back-to-back across full milestones. It relies entirely on the Nyquist Validation Layer to halt on errors.
One implementation detail worth noting: the --auto flag persists workflow.auto_advance: true to config.json on disk, not just in the agent’s context. This ensures the chaining instruction survives Claude Code’s context compaction events, which can silently discard in-context state.
Auto-advance is automatically cleared on milestone completion to prevent runaway chains across milestones.
Phase 8: wave execution and self-healing
Force execution into a strict, verifiable Software Development Life Cycle.
Dependency graphing and wave execution
Independent tasks run in parallel workers (Wave 1); dependent tasks wait for Wave 1 to succeed (Wave 2). The parallelization.enabled config flag controls whether waves actually run concurrently.
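Wave assignment falls out of the `depends_on` declarations: a task's wave is one past the deepest wave among its dependencies, so independent tasks land in Wave 1 together. A hypothetical helper (assuming an acyclic dependency graph):

```javascript
// Compute wave numbers from depends_on declarations.
// Assumes the dependency graph is acyclic.
function assignWaves(tasks) {
  const waves = {};
  const wave = (id) => {
    if (waves[id]) return waves[id];
    const deps = tasks[id].depends_on ?? [];
    waves[id] = deps.length === 0 ? 1 : Math.max(...deps.map(wave)) + 1;
    return waves[id];
  };
  Object.keys(tasks).forEach(wave);
  return waves;
}

const waves = assignWaves({
  schema: {},
  api: { depends_on: ["schema"] },
  ui: { depends_on: ["schema"] },
  e2e: { depends_on: ["api", "ui"] },
});
// schema runs alone in Wave 1; api and ui run in parallel in Wave 2.
```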
Atomic commits and automated self-healing
After finishing an <action>, the executor runs the <verify> command. If it passes, it executes an atomic git commit. If it fails, a Diagnostic Agent spawns.
The debug agent: scientific methodology with circuit breakers
GSD includes a dedicated gsd-debugger agent (~990 lines) implementing:
- Scientific debugging methodology: hypothesis generation → testing → evidence gathering → root cause identification
- 7+ investigation techniques built into the prompt
- A 3-strike rule: after three failed debugging attempts, the agent dumps state and recommends a fresh session to prevent circular reasoning
- Human verification requirement: debug sessions need human confirmation before resolution (cannot self-close)
- Persistent state: maintains debugging context in `.planning/debug/`, with resolved sessions archived to `debug/resolved/`
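The 3-strike rule is essentially a circuit breaker around the debug loop. A minimal sketch (the attempt/diagnose machinery here is simulated, not the agent's real protocol):

```javascript
// Circuit breaker: after three failed debugging attempts, stop and
// hand off rather than keep looping on the same hypothesis space.
function debugWithCircuitBreaker(attemptFix, maxStrikes = 3) {
  for (let strike = 1; strike <= maxStrikes; strike++) {
    if (attemptFix(strike)) return { fixed: true, attempts: strike };
  }
  // Dump state and recommend a fresh session to avoid circular reasoning.
  return { fixed: false, attempts: maxStrikes, action: "dump-state-and-restart" };
}

const neverFixes = () => false;
const fixesOnSecond = (n) => n === 2;

const bailed = debugWithCircuitBreaker(neverFixes);
const recovered = debugWithCircuitBreaker(fixesOnSecond);
```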
File-system garbage collection (milestone archiving)
The milestone lifecycle has three distinct steps. First, /gsd:audit-milestone runs a pre-completion review to verify all requirements are met and flag any gaps. Once the audit passes, /gsd:complete-milestone permanently archives and zips the executed phase directories out of the active working tree. This prevents the .planning/ directory from bloating and stops the orchestrator from wasting tokens parsing outdated feature plans. Finally, /gsd:new-milestone starts the next cycle, scaffolding a fresh milestone structure ready for planning.
Phase 9: three-level verification
Trust but verify, empirically, against the actual codebase.
The verifier agent validates completed work using a three-level methodology:
Level 1, file existence: must_haves artefacts actually exist on disk.
Level 2, substantive implementation: files contain real code, not stubs, TODOs, or placeholder content. Checked via min_lines thresholds and contains assertions.
Level 3, integration wiring: components are properly connected and functional. Verified through regex pattern matching on actual source files via key_links (the from, to, via, and pattern fields declared in the plan).
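The three levels can be illustrated as a toy check over an in-memory "filesystem", mirroring the existence → substance → wiring progression. A real verifier reads disk; the shapes here are illustrative:

```javascript
// Toy three-level verification of one artifact and one key link.
function verifyArtifact(files, artifact, keyLink) {
  const content = files[artifact.path];
  if (content === undefined) return { level: 1, ok: false };  // missing
  const lines = content.split("\n").length;
  if (lines < artifact.min_lines || /TODO/.test(content))
    return { level: 2, ok: false };                            // stub
  const wired = new RegExp(keyLink.pattern).test(files[keyLink.from] ?? "");
  return { level: 3, ok: wired };                              // wiring
}

const files = {
  "src/api/messages.ts": Array(25).fill("export const x = 1;").join("\n"),
  "src/ui/Chat.tsx": 'fetch("/api/messages")',
};
const result = verifyArtifact(
  files,
  { path: "src/api/messages.ts", min_lines: 20 },
  { from: "src/ui/Chat.tsx", to: "src/api/messages.ts",
    pattern: "fetch\\(.*messages" }
);
```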
The /gsd:verify-work [N] command triggers this verification for a given phase, combining automated checks with manual UAT. When automated checks fail, a diagnostic agent spawns to identify the root cause — you get an actual diagnosis, not just a pass/fail.
The verifier produces a requirements table with Source Plan, Description, and Evidence columns, cross-referencing requirement IDs from PLAN frontmatter rather than grepping REQUIREMENTS.md by phase number. This catches plans that claim to address requirements without actually implementing them.
Phase 10: session persistence and recovery
Agent work must survive crashes, context compaction, and overnight breaks.
Three-level resume
GSD implements three-level resume via /gsd:resume-work:
Agent-level: current-agent-id.txt is written before spawning each executor and deleted on completion. If the marker file exists, the system offers Task(resume="{agent_id}") to continue the interrupted subagent in its preserved context.
Mid-plan checkpoint: .continue-here-{phase}-{plan}-checkpoint-{N}.md handoff files capture the checkpoint context, allowing seamless resumption at the exact task where human input was needed.
Plan-level: if PLAN.md exists but no corresponding SUMMARY.md, the system re-runs execute-plan for the incomplete plan.
Supporting commands
- `/gsd:progress`: read-only command that reads all state files and suggests the exact next command
- `/gsd:pause-work`: creates a structured handoff document
- `agent-history.json`: complete audit trail of all agent spawns
The stopped_at and resume_file frontmatter fields in STATE.md enable cross-session continuity.
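The resume decision itself is a priority check over the markers, finest granularity first. A sketch with simulated inputs (the return shape is mine, not GSD's):

```javascript
// Three-level resume decision: a live agent marker wins, then a
// mid-plan checkpoint handoff, then an unfinished plan.
function resumeStrategy({ agentMarker, checkpointFile, planWithoutSummary }) {
  if (agentMarker) return { level: "agent", via: `Task(resume="${agentMarker}")` };
  if (checkpointFile) return { level: "checkpoint", via: checkpointFile };
  if (planWithoutSummary) return { level: "plan", via: "execute-plan" };
  return { level: "none", via: "/gsd:progress" };
}

const a = resumeStrategy({ agentMarker: "agent-42" });
const b = resumeStrategy({ checkpointFile: ".continue-here-3-01-checkpoint-2.md" });
const c = resumeStrategy({ planWithoutSummary: true });
```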
Phase 11: quick mode, the graduated quality spectrum
Not every change needs the full planning pipeline. But every change needs atomic commits and state tracking.
The /gsd:quick command offers a fast-path alternative that preserves core guarantees (atomic commits, state tracking, fresh agent context) while skipping research, plan-checking, and verification.
Quick tasks live in .planning/quick/001-{slug}/ with sequential numbering. The --full flag enables plan-checking (max 2 iterations) and post-execution verification, creating a graduated quality spectrum:
- Quick (default): just do the work with atomic commits
- Quick `--full`: add plan-checking and verification
- Full pipeline (`/gsd:discuss` → `/gsd:plan` → `/gsd:execute`): research, interview loop, adversarial planning, wave execution, and three-level verification
This pattern is essential for ad-hoc work that doesn’t justify the overhead of the full pipeline.
Phase 12: dynamic roadmap modification
Plans change. The system must adapt without losing traceability.
Roadmap mutation commands
- `/gsd:add-phase`: appends a new phase to the roadmap
- `/gsd:insert-phase N`: inserts between existing phases using decimal numbering (inserting between 3 and 4 creates phase 3.1, then 3.2, supporting multi-level decimals like 72.1.1)
- `/gsd:remove-phase N`: removes with safety checks. Blocks removal of in-progress or completed phases; requires `--force` if SUMMARY files exist
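Decimal insertion amounts to finding the next free decimal under the anchor phase. A hypothetical, string-based helper (strings rather than floats, so multi-level numbers like 72.1.1 work):

```javascript
// Next free decimal phase number under `after`, e.g. inserting
// between 3 and 4 yields "3.1", then "3.2".
function insertPhaseNumber(existing, after) {
  const prefix = `${after}.`;
  const taken = existing
    .filter((p) => p.startsWith(prefix) && !p.slice(prefix.length).includes("."))
    .map((p) => Number(p.slice(prefix.length)));
  const next = taken.length ? Math.max(...taken) + 1 : 1;
  return `${prefix}${next}`;
}

const first = insertPhaseNumber(["3", "4"], "3");
const second = insertPhaseNumber(["3", "3.1", "4"], "3");
```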
Gap analysis
/gsd:plan-milestone-gaps creates entirely new phases from audit findings, updating the REQUIREMENTS.md traceability table with new phase assignments and checkbox resets.
Phase 13: model tiering and multi-runtime deployment
Match cognitive load to model capability, and deploy one source format to every runtime.
Model routing
Use the right model for the right job:
- Architects (e.g., Opus/GPT-4o): Discuss and Planner phases, where deep reasoning about requirements and architecture matters
- Executors (e.g., Sonnet/GPT-4o-mini): code generation and syntax work, where following a well-defined plan matters more than novel reasoning
- Verifiers (e.g., Haiku/Flash): fast pattern matching on logs, file existence checks, and structured validation
The model_profile config (quality/balanced/budget) selects from these tiers. The model resolution system returns inherit rather than hardcoded model names, allowing the underlying runtime’s default model to be used. Per-agent model overrides are available beyond the three-tier profile system.
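A resolution chain like this can be sketched in a few lines. The tier table and names below are illustrative, not GSD's shipped mapping; the point is the precedence order: per-agent override, then profile tier, then `inherit`:

```javascript
// Illustrative profile → tier table.
const PROFILES = {
  quality:  { planner: "opus",   executor: "sonnet", verifier: "haiku" },
  balanced: { planner: "sonnet", executor: "sonnet", verifier: "haiku" },
  budget:   { planner: "sonnet", executor: "haiku",  verifier: "haiku" },
};

// Override beats profile; anything unresolved defers to the
// runtime's default via "inherit".
function resolveModel(profile, role, overrides = {}) {
  return overrides[role] ?? PROFILES[profile]?.[role] ?? "inherit";
}

const planner = resolveModel("quality", "planner");
const overridden = resolveModel("quality", "executor", { executor: "opus" });
const fallback = resolveModel("balanced", "researcher"); // unknown role
```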
Multi-runtime transpilation
A single source format deploys to four runtimes with genuine structural transformations:
- Claude Code: native format with nested commands, YAML frontmatter, PascalCase tool names
- OpenCode: flat command structure, tool name mapping (e.g., `AskUserQuestion` → `question`), general-purpose subagent type mapping, JSONC handling for comments/trailing commas/BOM
- Gemini CLI: TOML config conversion, MCP tool filtering, HTML tag stripping, auto-enables `experimental.enableAgents`
- Codex: skills-first architecture where commands become `skills/gsd-*/SKILL.md`, slash syntax rewrites (`/gsd:*` → `$gsd-*`), argument normalisation
The installer (bin/install.js) has 11 domain modules, detects non-interactive stdin for WSL2/container environments, removes orphaned hooks from previous versions, and preserves locally modified files in gsd-local-patches/ with a manifest for /gsd:reapply-patches restoration.
Granular permission bypassing
Implement a “Safe Command Whitelist.” Automatically approve non-destructive read commands (cat, ls) and safe writes (git commit). Only pause to request human approval for destructive actions.
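A whitelist gate can be as simple as a prefix check. The sketch below is deliberately naive (a real gate would parse the command line properly rather than match prefixes), and the whitelist entries are illustrative:

```javascript
// Naive safe-command whitelist: auto-approve known non-destructive
// commands and safe writes, pause for everything else.
const SAFE_PREFIXES = ["cat ", "ls", "git status", "git diff", "git commit"];

function requiresApproval(command) {
  const cmd = command.trim();
  return !SAFE_PREFIXES.some((p) => cmd === p.trim() || cmd.startsWith(p));
}

const autoApproved = !requiresApproval("ls -la src/");
const blocked = requiresApproval("rm -rf .planning/");
```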
Cross-cutting concerns
Security patterns
- Write tool enforcement: agents must always use the Write tool, never `Bash(cat << 'EOF')` or heredocs. Bash heredocs have corrupted configuration files in production
- Scope boundary and attempt limits on executors prevent runaway loops
- Strict package safety checks in `/gsd:update`: only the trusted package name is accepted; scoped/user-derived names are rejected; install commands are allowlisted to trusted forms
- Timestamped backups before any state repair operation (`health --repair` creates a backup before regenerating STATE.md)
- Secret commit protection: agents are prevented from committing secrets, with a recommended configuration to add `.env` and credential files to Claude’s deny list in `.claude/settings.json` so they cannot be read at all
- Runtime bug workarounds: execute-phase and quick workflows spot-check actual git output before reporting agent failure, defending against the host runtime’s false failure reports (e.g., `classifyHandoffIfNeeded` bugs)
System integrity and configuration commands
Beyond the health --repair mode mentioned above, /gsd:health is a general system integrity check — it validates state files, directory structure, and config consistency. /gsd:settings exposes the full configuration for inspection and modification, while /gsd:set-profile switches between model profiles (quality/balanced/budget) without manually editing config.json.
Prompt engineering patterns
Several transferable techniques are embedded across all agent definitions:
- Mandatory file-read blocks: `<files_to_read>` tags at the start of every agent force the agent to use the Read tool before taking any action, preventing hallucination from assumed file contents
- `<project_context>` injection: every agent checks for `./CLAUDE.md` and `.agents/skills/` at spawn time, so project-specific conventions override generic behaviour
- Good/bad example contrasting: concrete examples of good specificity versus bad vagueness, calibrating the LLM’s output quality
- Context pressure calibration: explicit acknowledgement that quality degrades under context pressure, with the 50% context target as a hard operational limit
- Anti-enterprise framing: “If it sounds like corporate PM theatre, delete it.” Prevents ceremonial artefact generation
Configuration depth
The full config.json schema has 15+ configurable parameters across five namespaces, plus user-level defaults at ~/.gsd/defaults.json:
- `mode` (yolo/interactive), `depth` (quick/standard/comprehensive), `model_profile` (quality/balanced/budget)
- `workflow.*`: research, plan_check, verifier, nyquist_validation, auto_advance, each independently toggleable
- `planning.*`: commit_docs (auto-false if `.planning/` is in `.gitignore`), search_gitignored
- `git.*`: branching_strategy (none/phase/milestone), phase_branch_template, milestone_branch_template with `{phase}`, `{slug}`, `{milestone}` template variables
- `parallelization.enabled` for wave-based parallel execution
The architecture at a glance
The system is a pipeline where each phase produces artefacts consumed by the next, with the thin orchestrator managing flow without accumulating context:
```
User Intent
    │
    ▼
Phase 0: Thin Orchestrator (command → workflow → agent delegation)
    │
    ▼
Phase 1: Deterministic CLI Layer (gsd-tools replaces improvised bash)
    │
    ▼
Phase 2: State Externalisation (state files + brownfield mapping)
    │
    ▼
Phase 3: Interview Loop (grey area → CONTEXT.md)
    │
    ▼
Phase 4: Domain Research (parallel researchers → RESEARCH.md)
    │
    ▼
Phase 5: Goal-Backward Planning + Nyquist Validation (6 dimensions)
    │
    ▼
Phase 6: XML Task Definition (deterministic schema for executors)
    │
    ▼
Phase 7: Checkpoint Taxonomy (4 types, automation-first)
    │
    ▼
Phase 8: Wave Execution + Self-Healing (debug agent, 3-strike rule)
    │
    ▼
Phase 9: Three-Level Verification (existence → substance → integration)
    │
    ▼
Phase 10: Session Persistence (3-level resume)
    │
    ▼
Phase 11: Quick Mode (graduated quality spectrum)
    │
    ▼
Phase 12: Dynamic Roadmap Mutation (decimal phase insertion)
    │
    ▼
Phase 13: Model Tiering + Multi-Runtime Deployment
```
The insight running through every phase: don’t fight context rot, architect around it. Keep orchestrators thin, make state external, pass paths not content, spawn fresh agents for real work, and validate everything adversarially.