How I Built 12 Custom AI Agents and 9 Hooks to Automate My Entire Dev Workflow in Claude Code

A practitioner walkthrough from a single session that changed how I ship code every day.

Last week I sat down for what I thought would be a quick audit of my Claude Code hooks. Three hours later, I had 12 custom AI agents, 9 optimized hooks, and a 1,200-line reference doc that fundamentally changed my daily development workflow. This isn't a tutorial — it's the story of what I actually built, why I built it, and what it looks like to use Claude Code as a full operating system for software development.

The Starting Point: Four Basic Hooks

I'd been running Claude Code with four hooks that felt good enough:

  1. PostToolUse — Prettier auto-format on every Write/Edit operation
  2. PreToolUse — Safety check blocking git push --force, git reset --hard, and rm -rf
  3. Stop — macOS notification when a task finishes (so I can context-switch while waiting)
  4. SessionStart — Directory guard blocking sessions launched from the home directory

These four hooks were fine when I was running 30-40 interactions a day. But my usage had grown to about 195 interactions daily across multiple agents, and the cracks were showing.

The Incidents That Forced My Hand

Four specific incidents made me realize my hooks were dangerously insufficient:

The invisible type error. I was developing with Turbopack, which skips full tsc type checking for speed. My code worked perfectly in dev. I pushed to staging. Vercel's build ran tsc, found 3 type errors, and failed. I didn't catch it for hours because I assumed the push was fine. This happened three times in one week.

The production contamination. A feature branch accidentally got pushed to staging, then auto-promoted to production. The branch had experimental code that wasn't ready. Rolling back required a full revert commit and a Slack apology. The root cause? Nothing prevented feature/* branches from pushing to staging or main.

The scope creep commit. I asked an agent to fix a bug in one file. The agent fixed the bug, then "helpfully" reformatted imports in 8 other files, added comments to 5 unrelated files, and created a new utility function. The commit touched 20+ files. The PR review was a nightmare because the actual fix was buried in noise.

The accidental staging of secrets. git add . picked up a .env.local file that a teammate had created and that .gitignore didn't cover. We caught it before push, but only by luck.

Each of these was preventable with better hooks. So I built them.

Upgrading to 9 Hooks

I added 5 new hooks, upgraded 1 existing one, and optimized all of them with the if field introduced in Claude Code v2.1.85.

Hook 1: Pre-Push Type Checking

This hook runs tsc --noEmit before every push. One subtlety: a stale .next/types cache can produce false type errors that block legitimate pushes, so the hook clears the cache first:

{
  "event": "PreToolUse",
  "hook_type": "command",
  "if": "Bash(git push *)",
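  "command": "rm -rf .next/types && npx tsc --noEmit 2>&1 || exit 2",
  "timeout_ms": 120000,
  "exit_codes": {
    "0": "pass",
    "2": "block"
  }
}

The 120-second timeout handles large codebases. Because tsc can exit with either 1 or 2 when it reports errors, the trailing || exit 2 normalizes any failure to the blocking code. Exit code 2 means hard block, with no override possible. If types don't pass, you don't push.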
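If you would rather keep this check in a script than in a one-liner, here is a minimal Python sketch of the same idea. The function name run_typecheck and the injectable runner argument are my own illustrative choices, not part of Claude Code; only the .next/types cleanup and the npx tsc --noEmit call come from the hook above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

PASS = 0
BLOCK = 2  # the exit code this post uses for a hard block


def run_typecheck(project_dir=".", runner=subprocess.run):
    """Clear the stale Next.js type cache, run tsc, and return a hook exit code."""
    # Same effect as `rm -rf .next/types`; a missing directory is not an error.
    shutil.rmtree(Path(project_dir) / ".next" / "types", ignore_errors=True)

    result = runner(
        ["npx", "tsc", "--noEmit"],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # tsc can exit with 1 or 2 on errors, so any failure becomes a block.
        print(result.stdout or result.stderr, file=sys.stderr)
        return BLOCK
    return PASS
```

A wrapper script would end with sys.exit(run_typecheck()) so the hook runner sees the normalized code.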

Hook 2: Branch Isolation Enforcement

This one directly prevents the production contamination incident. Feature branches can only push to their own remote. Staging and main are protected:

{
  "event": "PreToolUse",
  "hook_type": "command",
  "if": "Bash(git push *)",
  "command": "branch=$(git branch --show-current) && if [[ \"$branch\" == feature/* ]] && echo \"$TOOL_INPUT\" | grep -qE '[ :](refs/heads/)?(staging|main)([^[:alnum:]_./-]|$)'; then echo 'BLOCKED: feature branches cannot push to staging or main' && exit 2; fi && exit 0",
  "timeout_ms": 5000,
  "exit_codes": {
    "0": "pass",
    "2": "block"
  }
}
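Matching on raw text is easy to get subtly wrong, because a branch such as feature/maintenance contains the substring "main". Here is a minimal Python sketch of a token-based version of the same rule. It assumes the pending command is available as a plain string beginning with git push, and the names push_targets and is_blocked are hypothetical helpers, not anything Claude Code provides. It does not handle options that take a separate value.

```python
import shlex

PROTECTED = {"staging", "main"}


def push_targets(command):
    """Return the remote name and destination branches named in a `git push` command."""
    words = shlex.split(command)
    # Skip "git" and "push", then ignore option flags such as -u or --force.
    args = [w for w in words[2:] if not w.startswith("-")]
    targets = set()
    for position, arg in enumerate(args):
        if position == 0:
            targets.add(arg)  # first positional argument is the remote
            continue
        dest = arg.lstrip("+").split(":")[-1]  # "src:dst" refspec -> "dst"
        if dest.startswith("refs/heads/"):
            dest = dest[len("refs/heads/"):]
        targets.add(dest)
    return targets


def is_blocked(branch, command):
    """True when a feature/* branch pushes to a protected remote or branch."""
    if not branch.startswith("feature/"):
        return False
    return bool(push_targets(command) & PROTECTED)
```

Treating the remote name as a target covers the case where staging is a remote rather than a branch, as in git push staging.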

Hook 3: git add . Blocker

This is a hard block on git add ., git add --all, and git add -A. It forces specific file staging every time:

{
  "event": "PreToolUse",
  "hook_type": "command",
  "if": "Bash(git add *)",
  "command": "if echo \"$TOOL_INPUT\" | grep -qE 'git add (\\.|--all|-A)([^[:alnum:]_./-]|$)'; then echo 'BLOCKED: Use specific file paths instead of git add . / --all / -A' && exit 2; fi && exit 0",
  "timeout_ms": 5000,
  "exit_codes": {
    "0": "pass",
    "2": "block"
  }
}

This is the simplest hook and the one I'm most grateful for. It has blocked 47 instances of lazy staging in two weeks.
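The rule is simple enough to express as a token check, which avoids accidentally blocking dotfiles staged by name. A minimal sketch, assuming the command arrives as a plain string; is_bulk_add is a hypothetical helper name:

```python
import shlex

BULK_FLAGS = {"-A", "--all"}


def is_bulk_add(command):
    """True for `git add .`, `git add -A`, and `git add --all`."""
    words = shlex.split(command)
    if words[:2] != ["git", "add"]:
        return False
    # "." only counts when it is a whole argument, so .gitignore is still allowed.
    return any(word == "." or word in BULK_FLAGS for word in words[2:])
```

Because the check compares whole arguments, git add .gitignore and git add ./src/app.ts pass while git add . does not.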

Hook 4: Scope Creep Detector

This fires before every git commit and counts how many files are staged. If it's more than 10, it raises a soft warning:

{
  "event": "PreToolUse",
  "hook_type": "command",
  "if": "Bash(git commit *)",
  "command": "count=$(git diff --cached --name-only | wc -l | tr -d ' ') && if [ \"$count\" -gt 10 ]; then echo \"WARNING: $count files staged, review for scope creep\" && exit 1; fi && exit 0",
  "timeout_ms": 5000,
  "exit_codes": {
    "0": "pass",
    "1": "warn"
  }
}

The hook has to run before the commit: once the commit lands, the index matches HEAD and git diff --cached reports nothing. Exit code 1 is key here because it's a soft warning. The user is prompted to approve or deny. Sometimes 12 files is legitimate (a migration touching many tables). But the pause forces you to actually look.
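The staged-file count can be sketched in Python as follows. The per-directory breakdown is my own addition to make the warning easier to act on; the function names and the injectable runner are illustrative assumptions. The threshold of 10 and the exit code 1 follow the hook above.

```python
import subprocess
from collections import Counter


def staged_files(runner=subprocess.run):
    """List the paths currently staged in the git index."""
    result = runner(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def files_by_top_dir(files):
    """Count staged files per top-level directory ('.' for repo-root files)."""
    return dict(Counter(f.split("/", 1)[0] if "/" in f else "." for f in files))


def scope_exit_code(files, limit=10):
    """Return 1 (soft warning) when more than `limit` files are staged, else 0."""
    if len(files) > limit:
        print(f"WARNING: {len(files)} files staged, review for scope creep")
        print(files_by_top_dir(files))
        return 1
    return 0
```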

Hook 5: Post-Compaction Context Recovery

When Claude Code compacts context (either automatically or via /compact), you lose working context like your current branch, recent commits, and modified files. This hook re-injects that context automatically:

{
  "event": "PostCompact",
  "hook_type": "command",
  "command": "echo '--- Context Recovery ---' && echo \"Branch: $(git branch --show-current)\" && echo '--- Last 5 commits ---' && git log --oneline -5 && echo '--- Modified files ---' && git status --short",
  "timeout_ms": 10000,
  "exit_codes": {
    "0": "pass"
  }
}

This one is subtle but important. Before I had it, I'd lose 10-15 minutes after every compaction re-establishing context. Now it's automatic.
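The same summary is straightforward to build in a script if you want to extend it later, for example with open PR numbers. A minimal sketch that mirrors the three git commands in the hook above; git and context_summary are hypothetical helper names:

```python
import subprocess


def git(*args):
    """Run a git subcommand and return its stdout without trailing whitespace."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    # rstrip only: `git status --short` lines can start with a meaningful space.
    return result.stdout.rstrip()


def context_summary(git_cmd=git):
    """Build the text block that is echoed back after a compaction."""
    return "\n".join(
        [
            "--- Context Recovery ---",
            f"Branch: {git_cmd('branch', '--show-current')}",
            "--- Last 5 commits ---",
            git_cmd("log", "--oneline", "-5"),
            "--- Modified files ---",
            git_cmd("status", "--short"),
        ]
    )
```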

Upgraded: Expanded Safety Check

The original safety check only blocked force push, reset --hard, and rm -rf. I expanded it to also block:

  • git checkout . — discards all unstaged changes
  • git restore . — same thing, newer syntax
  • git clean -f — deletes untracked files permanently
  • Direct push to main — all production pushes must go through staging first
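The expanded rule set above can be sketched as a table of patterns. This is a minimal Python illustration, not my exact hook: the regular expressions and the violation helper are assumptions, and they only cover the common spellings of each command. This sketch lets --force-with-lease through, which is a policy choice you may want to change.

```python
import re

DANGEROUS = [
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)(\s|$)"), "force push"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "git reset --hard"),
    (re.compile(r"\brm\s+-(rf|fr)\b"), "rm -rf"),
    (re.compile(r"\bgit\s+checkout\s+\.(\s|$)"), "git checkout ."),
    (re.compile(r"\bgit\s+restore\s+\.(\s|$)"), "git restore ."),
    (re.compile(r"\bgit\s+clean\s+-[a-zA-Z]*f"), "git clean -f"),
    (re.compile(r"\bgit\s+push\b.*[\s:]main(\s|$)"), "direct push to main"),
]


def violation(command):
    """Return the label of the first rule the command breaks, or None if it is safe."""
    for pattern, label in DANGEROUS:
        if pattern.search(command):
            return label
    return None
```

A hook script would print the label and exit with the blocking code whenever violation returns something other than None.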

The Optimization: The if Field

The most impactful optimization was adding the if field to hooks. Before v2.1.85, every PreToolUse hook ran on every Bash command. My type-checking hook was spawning a process on every ls, cat, and echo. With the if field:

"if": "Bash(git push *)"

The hook only fires when the command starts with git push. This eliminated roughly 90% of unnecessary hook process spawns. My session felt noticeably faster.
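To see why the filter removes so many spawns, it helps to model it. The sketch below is only an illustration using shell-style globbing from Python's fnmatch. It is not Claude Code's actual matcher, and the session list is made up.

```python
from fnmatch import fnmatchcase


def parse_rule(rule):
    """Split a rule such as 'Bash(git push *)' into ('Bash', 'git push *')."""
    tool, _, rest = rule.partition("(")
    glob = rest[:-1] if rest.endswith(")") else rest
    return tool, glob


def would_fire(rule, tool, command):
    """Approximate the filter: same tool name and the command matches the glob."""
    rule_tool, glob = parse_rule(rule)
    return tool == rule_tool and fnmatchcase(command, glob)


def spawn_count(rule, session):
    """Count how many (tool, command) calls in a session would start the hook."""
    return sum(would_fire(rule, tool, command) for tool, command in session)
```

With a session of four Bash calls where only one is a push, spawn_count reports a single hook process instead of four.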

The Exit Code Strategy

Getting exit codes right matters more than you'd think:

  • Exit 0 — Pass silently. Used for most hooks when the check passes.
  • Exit 1 — Soft warning. User is prompted to approve or deny. Used for the scope creep detector because sometimes large commits are legitimate.
  • Exit 2 — Hard block. No override possible. Used for safety checks, type checking, and branch isolation. If these fail, there is no valid reason to proceed.
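If your hooks live in scripts, it helps to name these codes once and reuse them. A minimal sketch of the convention described above. HookExit and exit_code_for are hypothetical names. Claude Code's hook documentation describes exit 2 as the blocking code and other non-zero codes as non-blocking errors, so it is worth confirming how exit 1 surfaces in your version.

```python
from enum import IntEnum


class HookExit(IntEnum):
    PASS = 0   # check passed, stay silent
    WARN = 1   # soft warning, a human may still proceed
    BLOCK = 2  # hard block, no override


def exit_code_for(check_failed, overridable):
    """Map a check result to an exit code using the strategy in this post."""
    if not check_failed:
        return HookExit.PASS
    return HookExit.WARN if overridable else HookExit.BLOCK
```

The scope creep detector would call exit_code_for(failed, overridable=True), while type checking and branch isolation pass overridable=False.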

The Deep Research Phase

With hooks solid, I wanted to understand everything Claude Code could do that I wasn't using. I ran two research agents in parallel:

  • Agent 1: Comprehensive power-user feature research — hooks, MCP servers, context management, agents, skills, memory, prompts, and automation patterns
  • Agent 2: Full changelog analysis from v2.0.41 through v2.1.91, extracting every feature addition and behavioral change

The combined output was a 1,200-line reference document covering 10 sections. Some discoveries that changed my workflow immediately:

/btw for side questions. When you're mid-task and want to ask Claude something unrelated, /btw asks the question without polluting the current context. I was creating new sessions for side questions. This is 10x faster.

/compact <focus> for proactive compaction. Instead of waiting for auto-compaction to fire (which happens at inconvenient moments), you can proactively compact with a focus hint: /compact keep the migration changes and test results. This preserves what matters and discards what doesn't.

Session naming. claude -n "migration-review" starts a named session. /rename renames the current one. When you're running 5+ sessions, names are the difference between productivity and chaos.

/context for token budget auditing. Shows exactly how many tokens are used by CLAUDE.md, memory files, MCP schemas, and conversation history. I discovered my MCP schemas alone were consuming 15% of my context window.

CLAUDE_CODE_NO_FLICKER=1 — An environment variable that eliminates the flickering during output rendering. Small thing, huge quality-of-life improvement.

--bare -p for scripted calls. When calling Claude Code from scripts or cron jobs, these flags skip the interactive UI and produce clean output. About 14% faster for automated tasks.

21+ hook events. Most people know about PreToolUse and PostToolUse. But there's also PostCompact, SessionStart, Notification, Stop, SubagentStop, and more. I was using 4 of 21+ available events.

Building 12 Custom Agents

With the reference doc complete, I built 12 specialized agents in the .claude/agents/ directory. Each agent has YAML frontmatter specifying its model, effort level, max turns, and allowed tools. Here's what I built and why.

Core Workflow Agents

1. deploy-monitor (Sonnet, background mode)

Runs automatically after git push. It checks Vercel deployment status (with retries for in-progress builds), hits health endpoints, and checks Sentry for new errors in the last 10 minutes. It returns a structured report: deployment status, health check results, Sentry errors, and build time. Before this agent, I'd push and then manually check three different dashboards. Now I push and get a single report.

2. migration-reviewer (Opus, high effort)

Reviews database migration SQL before applying. This agent exists because of bugs that took me weeks to diagnose. It checks 6 specific things:

  • SECURITY DEFINER on trigger functions that access the auth schema (without it, triggers fail silently due to RLS)
  • Foreign key ON DELETE SET NULL requiring corresponding UPDATE RLS policies (missing policies cause cascading permission denied errors)
  • Missing role-specific RLS policies
  • Realtime publication implications
  • Column name verification against the actual schema
  • Multi-statement transaction wrapping

Every check in this list corresponds to a real bug I shipped to production.

3. scope-check (Sonnet)

Pre-commit review agent. Reads git diff --cached and classifies each changed file as Related, Infrastructure, or Unrelated based on the task description. It specifically catches: new tests for unchanged code, refactored imports in untouched files, added comments to unmodified files, unused utility functions, and unexpected package.json changes. Outputs a clear verdict: CLEAN or REVIEW N FILES.

4. incident-triage (Opus, high effort)

Production fire investigator. When something breaks, this agent runs 5 investigations in parallel: Sentry errors from the last 24 hours, Vercel deployment history, runtime logs, recent commits, and targeted code inspection. It has a "known failure patterns" section built from real incidents I've experienced:

  • Environment variable trailing newlines (copy-paste from dashboards adds invisible \n)
  • Missing maxDuration on serverless wrapper routes (causing silent timeouts)
  • Stale build cache producing phantom errors
  • Module-level class instantiation that crashes during build but works in dev
  • Foreign key + RLS permission denied cascades

This agent has cut my incident response time from 45 minutes to about 10.
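The first pattern in that list is easy to check mechanically. Here is a minimal Python sketch that flags environment variables whose values carry leading or trailing whitespace. The helper name is hypothetical, and in practice you would run it against whatever your deployment platform exports rather than the local shell.

```python
import os


def vars_with_stray_whitespace(env=None):
    """Return the names of variables whose values start or end with whitespace."""
    env = os.environ if env is None else env
    return sorted(name for name, value in env.items() if value != value.strip())
```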

5. pre-push-audit (Sonnet, high effort)

Semantic code audit that hooks can't do. Hooks check syntax — this agent checks semantics. It reviews all files changed since the last push for:

  • console.log in production code (allowed only in the logger module and test files)
  • Hardcoded API keys or secrets
  • Missing timeout configuration on serverless wrapper routes
  • Missing input validation on dynamic routes
  • Module-level environment variable instantiation that crashes during build
  • Forbidden brand colors in UI files (we use design tokens)
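The console.log rule is the most mechanical item on that list, and a rough version of it fits in a few lines. This is a sketch under stated assumptions: the lib/logger path and the .test. / .spec. naming convention are placeholders for wherever your logger module and tests live, and the agent itself reasons about more context than a regex can.

```python
import re

CONSOLE_LOG = re.compile(r"\bconsole\.log\s*\(")


def is_exempt(path):
    """Logger module and test files may use console.log (paths are assumptions)."""
    name = path.rsplit("/", 1)[-1]
    return ".test." in name or ".spec." in name or path.startswith("lib/logger")


def console_log_hits(files):
    """Given {path: source_text}, return (path, line_number) for each violation."""
    hits = []
    for path, text in files.items():
        if is_exempt(path):
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if CONSOLE_LOG.search(line):
                hits.append((path, number))
    return hits
```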

6. context-scout (Sonnet)

Pre-feature exploration in an isolated context. Before starting any significant feature, I run this agent to explore the codebase area. It maps: related files and their sizes, existing patterns I should reuse, recent git history in the area, test coverage, dependency ripple effects, and potential conflicts with other in-progress work. Returns a 50-line brief. The main session stays clean — no wasted context on exploration.

Operations Agents

7. db-health (Opus, high effort)

Weekly database audit. Runs 8 checks against Supabase: RLS coverage across all tables, table sizes and row counts, missing indexes (high sequential scan counts), unused indexes wasting space, orphaned foreign key references, trigger functions missing SECURITY DEFINER, realtime publication gaps, and connection pool stats. Outputs a prioritized report. I run this every Monday morning.

8. pr-preparer (Sonnet, high effort)

One command from "code done" to "PR open." Pipeline: scope check, pre-push audit, tsc type check, stage specific files, write commit message, push to staging, create PR with summary. It never stages with git add . and, if any step fails, it stops and reports. This agent replaced a 12-step manual process.

9. extraction-debugger (Opus, high effort)

Debugs document processing pipeline failures. Takes a document ID, pulls extraction records from the database, checks for known error codes (password_protected, corrupt_file, api_error, parse_error), reads pipeline source code, checks the prompt management system for drift, and verifies the file exists in storage. Has 5 documented common failure patterns built from real incidents.

10. client-pulse (Sonnet)

Daily pipeline status brief. Runs 6 SQL queries: pipeline distribution by status, stuck items (more than 3 days in the same status), unpaid invoices, missed appointments, recent completions, and upload activity. Outputs a 20-line morning brief with recommended actions. I run this every morning before standup.

11. blog-publisher (Sonnet, high effort)

Content publishing pipeline. Takes a markdown draft, optimizes for SEO (title, meta description, slug, heading structure, internal links), formats for the CMS, publishes via API, and verifies the post is live. Includes an SEO checklist. Before this agent, publishing a post involved 8 manual steps across 3 tools.

12. prompt-auditor (Opus, high effort)

Weekly AI prompt quality audit. Inventories all prompt references in the codebase, fetches active versions from the prompt management system (Langfuse), and compares managed prompts against hardcoded fallbacks for drift. It also checks that anti-regression rules are present, estimates token costs per prompt, and verifies prompt cache compatibility (dynamic content positioned incorrectly breaks caching). Prompt drift is silent and expensive, which is why this runs weekly.

Model Selection Strategy

Choosing between Opus and Sonnet for each agent wasn't arbitrary. The decision framework:

Opus for high-stakes agents where mistakes are costly or the reasoning is complex: migration-reviewer (SQL bugs take weeks to diagnose), incident-triage (production is down, accuracy matters), extraction-debugger (pipeline debugging requires deep reasoning), db-health (database recommendations need nuance), and prompt-auditor (prompt quality directly impacts product quality).

Sonnet for speed-oriented agents where iteration speed matters more than depth: deploy-monitor (checking dashboards, not reasoning), scope-check (classification task), context-scout (exploration, not decision-making), client-pulse (SQL queries and formatting), pr-preparer (orchestration, not analysis), blog-publisher (formatting and API calls), and pre-push-audit (pattern matching).

The cost difference is significant — Opus agents cost roughly 3-4x more per run. But for database migrations and production incidents, I want the best reasoning available.

Making It Compound

Building agents is useless if the knowledge doesn't persist across sessions. Three things make this compound:

1. On-demand /tips skill. The 1,200-line reference doc is loaded via a skill that only activates when you type /tips. Zero context cost until invoked. This means every session has access to the full power-user guide without paying for it in tokens.

2. Memory references. A memory entry ensures every future session — including subagents — knows the agents exist, what they do, and where the reference doc lives. No repeated discovery.

3. Environment optimization. CLAUDE_CODE_NO_FLICKER=1 in the shell profile for flicker-free rendering. Small, but it eliminates a constant visual annoyance that accumulates over 195 daily interactions.

My Daily Workflow Now

Here's what a typical day looks like with all 12 agents and 9 hooks active:

Morning

@client-pulse                      # Pipeline status for standup
@context-scout [feature-name]      # Explore before coding
... implement the feature ...
@scope-check [task description]    # Before commit: catches scope creep
@migration-reviewer                # If SQL migration involved
@pre-push-audit                    # Before push: semantic code review
git push staging                   # Hooks run tsc + branch isolation check
@deploy-monitor                    # Watches build in background, reports when done

Weekly

@db-health                         # Monday morning database audit
@prompt-auditor                    # AI prompt drift check

When Things Break

@incident-triage [symptoms]        # Instant parallel diagnosis
@extraction-debugger [document-id] # Document pipeline issues

Shipping

@pr-preparer [description]         # Code to PR in one command

Results After Two Weeks

Some numbers after running this setup for two weeks:

  • Type errors caught pre-push: 14 (previously these broke Vercel builds and cost 20-30 minutes each)
  • Branch isolation violations blocked: 6 (any one of these could have been a production incident)
  • Scope creep warnings: 23 (of which 18 led to splitting the commit into smaller, focused commits)
  • git add . blocks: 47 (the agents really want to use git add . — now they can't)
  • Incident response time: Down from ~45 minutes to ~10 minutes
  • PR preparation time: Down from ~15 minutes (manual) to ~3 minutes (one command)
  • Context recovery after compaction: Instant (was 10-15 minutes of re-establishing context)
  • False positive rate on hooks: Less than 5% — the if field optimization eliminated most noise

The compounding effect is what matters most. Each agent and hook removes a class of mistakes permanently. I'm not just faster — I'm making fewer errors, catching them earlier, and spending my cognitive energy on the actual work instead of on process.

What I'd Do Differently

If I were starting over:

  1. Build the hooks first. Agents are powerful, but hooks are the foundation. They run automatically on every interaction. Get those right before building anything else.
  2. Start with the if field. Don't build hooks without conditional execution. The performance difference is dramatic.
  3. Use exit code 2 more than exit code 1. I initially made too many hooks soft warnings (exit 1). The whole point of automation is removing the need for human judgment on routine checks. If the type checker fails, there's no valid reason to override it.
  4. Run the deep research first. I built my initial 4 hooks without knowing about the if field, PostCompact events, or session naming. Thirty minutes of research would have saved me hours of iteration.

Getting Started

If you're using Claude Code with just the defaults, here's my recommended order:

  1. Add the safety hook (blocks force push, reset hard, rm -rf) — 5 minutes
  2. Add the git add . blocker — 2 minutes
  3. Add pre-push type checking — 10 minutes
  4. Run /context to see your token budget — 1 minute
  5. Add CLAUDE_CODE_NO_FLICKER=1 to your shell profile — 1 minute
  6. Build your first agent for whatever you do most often (for me it was deploy monitoring)

The whole setup took me one session. The ROI has been measured in incidents prevented, not just time saved.


I'm building a production app with Next.js, Supabase, and Vercel — running about 195 Claude Code interactions per day across multiple agents. If you're using Claude Code for serious development work, I'd love to hear what hooks and agents you've built. The best ideas come from practitioners, not documentation.