Parallel Agents by Default: When to Fork, When to Serialize, and Why I Cap at 2
April 11, 2026 · 12 min read · Claude Code
Last Updated: 2026-04-11
Claude Code parallel agents -- subagents running concurrently on independent tasks -- complete work approximately 1.7x faster than sequential execution. The default should be parallel, not serial. But there is a hard ceiling: on a 16GB machine, 3 or more concurrent agents trigger memory exhaustion that corrupts git, kills builds, and disconnects MCP servers. Two concurrent agents is the empirical sweet spot. This post covers the decision framework for when to fork, when to serialize, and why "more agents" does not mean "faster."
Why should parallel be the default, not the exception?
Most developers using AI coding assistants work sequentially: ask a question, wait for the answer, ask the next question. This is the natural pattern because it mirrors how humans think -- one thing at a time. But AI agents are not constrained by single-threaded cognition. When two tasks share no dependencies, running them one after another wastes exactly the duration of the shorter task.
I tracked my own usage over 30 days. Before adopting parallel-by-default, I averaged 195 interactions per day, all sequential. After switching to parallel execution for independent tasks, I completed the same volume of work in approximately 70% of the time -- a consistent 1.7x throughput improvement measured across 4 weeks. The remaining 0.3x gap from a theoretical 2x speedup comes from shared I/O: both agents read and write to the same filesystem, the same git repository, and the same terminal output buffer.
According to Anthropic's Claude Code documentation, subagents can run in parallel or in background mode. The documentation recommends parallel execution for independent research, code generation, and file exploration tasks. In practice, this means the tool was designed for parallelism -- the serial workflow most people default to is leaving performance on the table. I covered the broader agent architecture in my Claude Code operating system post.
How do you decide whether to parallelize or serialize?
The decision framework is a single question: does Task B need Task A's output? If no, parallelize. If yes, serialize. Every other consideration -- complexity, duration, risk -- is secondary to this dependency check.
```
START: You have Tasks A and B
  |
  v
Does Task B need Task A's output?
  |
  +-- YES --> Serialize: Run A, then B
  |             |
  |             v
  |           Does Task B need ALL of A's output,
  |           or just a subset?
  |             |
  |             +-- SUBSET --> Start B when subset is ready
  |             +-- ALL ----> Wait for A to complete
  |
  +-- NO ---> Parallelize: Run A and B together
                |
                v
              Do you need B's result before proceeding?
                |
                +-- YES --> Foreground: wait for both
                +-- NO ---> Background: fire B, continue with A
```
This flowchart governs every multi-task decision in my workflow. The key insight is the second branch on the serialize path: sometimes Task B only needs a small piece of Task A's output. In that case, you can start B as soon as that piece is available, rather than waiting for A to fully complete. This partial-dependency pattern comes up frequently in practice -- for example, an agent exploring a codebase to find a function signature (Task A) while another agent writes tests (Task B) that only needs the function name to begin scaffolding.
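The flowchart's logic is small enough to state as code. Here is a Python sketch of the decision, assuming a made-up `Task` shape -- the fields, helper name, and return strings are illustrative, not part of any Claude Code API:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    needs: set[str] = field(default_factory=set)  # names of tasks whose output this one consumes
    partial_ok: bool = False                      # can start once a subset of its inputs exists

def schedule(a: Task, b: Task, b_gates_next_step: bool) -> str:
    """Apply the fork/serialize decision from the flowchart."""
    if a.name in b.needs:                         # B depends on A's output -> serialize path
        if b.partial_ok:
            return "serialize-partial: start B once A emits the subset B needs"
        return "serialize: run A to completion, then B"
    if b_gates_next_step:                         # independent, but the result is needed now
        return "parallel-foreground: run A and B, wait for both"
    return "parallel-background: fire B, continue with A"

# Example: a migration (A) gates type generation (B) -> serialize
migration = Task("migration")
types = Task("types", needs={"migration"})
print(schedule(migration, types, b_gates_next_step=True))
```

The dependency check comes first, mirroring the flowchart: foreground versus background is only asked once independence is established.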
What are concrete examples of each pattern?
Here are five real scenarios from my daily workflow across a production codebase with 189 API routes, 65+ database tables, and 206 modules:
- Parallel (no dependency): Drafting two blog posts simultaneously. Post A about agent architecture and Post B about deployment pipelines share zero content. Fork both, write both, merge both. Time saved: 45 minutes on a 90-minute total task.
- Parallel (no dependency): Running a security review agent and a frontend review agent on the same PR. The security agent checks API routes, RLS policies, and auth middleware. The frontend agent checks component patterns, accessibility, and bundle size. They examine different files with different criteria. I detailed the 4-reviewer pattern in a previous post.
- Parallel (no dependency): One agent explores the codebase to map data flow while another agent writes unit tests for an existing, well-defined function. The exploration agent's output might inform future tests, but the current test batch does not need it.
- Serialize (hard dependency): Generating a database migration (Task A) and then updating the TypeScript types that depend on the new schema (Task B). B literally cannot start until A defines the column names and types.
- Partial dependency: Agent A extracts field mappings from a PDF document. Agent B needs the field names to build a form UI, but not the full extraction logic. B can start scaffolding the form component structure as soon as A outputs the field list -- typically within the first 30 seconds of a 3-minute extraction task.
Why cap at 2 concurrent agents?
This is the lesson that cost me a corrupted git repository and a lost afternoon.
Each Claude Code agent spawns its own Node.js process tree. Within each agent, tool calls can trigger additional processes: tsc for type checking, vitest for test runs, esbuild for bundling, MCP server connections. A single active agent on a moderately complex project can consume 2-4 GB of RAM. Two agents sit comfortably within a 16GB machine's budget, leaving headroom for the OS, browser, and other applications.
At three concurrent agents, the math breaks. Here is what I measured on a 16GB MacBook Pro:
| Concurrent Agents | Peak RAM Usage | Throughput vs 1 Agent | Failure Rate | Observed Issues |
|---|---|---|---|---|
| 1 (sequential) | 3.2 GB | 1.0x (baseline) | 0% | None |
| 2 (parallel) | 6.8 GB | 1.7x | <1% | Occasional I/O contention on git |
| 3 (parallel) | 11.4 GB | 1.4x | 12% | Swap thrashing, slow responses |
| 5 (parallel) | 16+ GB (swap) | 0.6x | 45% | OOM kills, git corruption, MCP disconnects |
The 5-agent row deserves emphasis: throughput dropped below a single sequential agent. At 5 concurrent agents, the system spent more time swapping memory pages to disk than doing actual computation. The operating system's OOM killer terminated processes unpredictably -- sometimes killing tsc (exit code 138, signal 10: SIGBUS on macOS), sometimes killing npm run build (exit code 137, SIGKILL), and in one memorable incident, corrupting the .git/HEAD file by killing a git process mid-write.
The OOM corruption incident: Five background agents were running simultaneously. The system ran out of physical memory. The OS killed a git process while it was writing to .git/HEAD, leaving the file empty -- zero bytes. Every subsequent git command failed with "fatal: not a git repository." Recovery required manually writing ref: refs/heads/staging back into the HEAD file. The entire incident took 90 minutes to diagnose and fix. One 90-minute debugging session permanently convinced me: cap at 2.
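When a process dies with an exit code above 128, the code itself tells you which signal killed it: shells report 128 plus the signal number. A minimal Python sketch of that decoding -- the helper name is mine, not part of any tool mentioned here:

```python
import signal

def decode_exit(code: int) -> str:
    """Shells report a signal death as exit code 128 + signal number."""
    if code > 128:
        # e.g. 137 -> signal 9 (SIGKILL, the classic OOM-kill signature);
        # 138 -> signal 10, which is SIGBUS on macOS and SIGUSR1 on Linux
        return signal.Signals(code - 128).name
    return f"normal exit ({code})"

print(decode_exit(137))  # SIGKILL
```

This is a quick way to distinguish "the build failed" (exit 1) from "the build was killed" (exit 137) when triaging a parallel-agent session.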
What about machines with more RAM?
On a 32GB machine, you could theoretically run 3-4 agents safely. On a 64GB machine, perhaps 5-6. But RAM is not the only bottleneck. Filesystem I/O becomes contended when multiple agents write to the same repository. Git locks prevent concurrent writes to the index. MCP server connections share a limited pool. In my testing, even on a 32GB machine, the throughput improvement from 3 agents over 2 agents was marginal -- approximately 1.9x versus 1.7x -- because I/O contention became the dominant bottleneck once memory pressure was resolved.
The Amdahl's Law framing is useful here: the serial portion of the work (git operations, file writes, terminal output) limits the theoretical speedup regardless of how many parallel agents you add. For a typical Claude Code workflow, approximately 15-20% of operations are inherently serial, which caps the theoretical maximum speedup at roughly 5-6x even with infinite agents and infinite RAM. Practical gains plateau well before that.
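That framing can be checked numerically. A short Python sketch of Amdahl's Law, using 17.5% as the midpoint of the 15-20% serial share estimated above:

```python
def amdahl_speedup(serial_fraction: float, n_agents: float) -> float:
    """Amdahl's Law: speedup = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_agents)

s = 0.175  # midpoint of the 15-20% serial share estimated in the post
print(round(amdahl_speedup(s, 2), 2))             # ~1.70 with two agents
print(round(amdahl_speedup(s, float("inf")), 2))  # ~5.71 ceiling with infinite agents
```

The two-agent prediction lands almost exactly on the measured 1.7x, which is a useful sanity check that shared I/O, not model latency, is what eats the theoretical 2x.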
What is the difference between foreground and background agents?
Claude Code offers two modes for parallel work, and choosing the wrong one creates unnecessary blocking or missed results.
| Dimension | Foreground Agent | Background Agent |
|---|---|---|
| When to use | You need the result before your next step | The task is genuinely independent of your current work |
| Blocking behavior | Blocks -- you wait for completion | Non-blocking -- you continue working |
| Result visibility | Immediately visible in conversation | Notification when complete; read output later |
| Best for | Code generation you will edit next, research that informs your next prompt | Test runs (>30s), web research, doc lookups, builds, linting |
| Risk | Wastes time if you could have done other work while waiting | Missed context if you needed the result and did not check |
| Memory impact | Active in current context window | Separate context, lower impact on main thread |
My rule of thumb: if a task will take more than 30 seconds and its output does not gate my next action, it goes to background. Test suites, build verification, web research, documentation lookups -- all background. Code generation I plan to review and modify immediately stays in the foreground. I wrote about the broader background task patterns in my automation post.
How does this work in a real session?
Here is a typical morning workflow from my production environment. The timestamps are real, measured across multiple sessions:
# 9:00 AM - Morning briefing
# Foreground: Load project context, review overnight CI results
# Background: Agent runs test suite (takes ~2 minutes)
# 9:03 AM - Tests running in background, I start feature work
# Foreground: Write new API route for document processing
# Background: (still running tests from 9:00)
# 9:05 AM - Test results arrive, no failures
# Foreground: Continue API route implementation
# Background: Agent explores codebase for similar patterns to reference
# 9:08 AM - Pattern exploration complete, confirms approach
# Foreground: Finish API route, write integration test
# Background: Agent runs security review on the new route
# Total elapsed: 8 minutes
# Sequential equivalent: ~14 minutes (1.75x improvement)

The key pattern: background agents handle tasks that would otherwise become "dead time" -- periods where you are waiting for a result with nothing productive to do. By filling dead time with independent work, you eliminate the gaps without adding cognitive overhead.
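The fill-the-dead-time pattern is ordinary futures-based concurrency. A Python sketch using only the standard library -- the sleep stands in for a long test run, and nothing here is Claude Code's actual mechanism:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_tests() -> str:
    """Stand-in for a long background task (test suite, build, web research)."""
    time.sleep(0.2)  # pretend this takes ~2 minutes
    return "tests passed"

with ThreadPoolExecutor(max_workers=1) as pool:
    background = pool.submit(run_tests)    # fire the background agent
    foreground_work = "API route drafted"  # keep working instead of waiting
    # ...later, when the result actually gates the next step:
    result = background.result()           # collect the output; no dead time in between

print(foreground_work, "|", result)
```

The design point is where `result()` is called: as late as possible, at the moment the output genuinely gates the next step, so the wait overlaps with useful foreground work.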
What is the anti-pattern and why does spawning 5 agents make everything slower?
The most common mistake I see -- and one I made repeatedly before learning -- is the "shotgun" approach: spawning as many agents as possible under the assumption that more parallelism equals more speed. Here is why it fails:
- Memory pressure creates swap thrashing. Once physical RAM is exhausted, the OS pages memory to disk. Disk I/O is 100-1000x slower than RAM access. Every agent slows down, not just the one that triggered the swap.
- Context switching overhead compounds. The CPU must context-switch between agent processes. At 2 agents, this overhead is negligible. At 5 agents, it can consume 10-15% of CPU cycles -- cycles that produce zero useful output.
- Git lock contention serializes I/O. Git uses lockfiles to prevent concurrent index writes. When 5 agents all try to read or write git state, they serialize on the lock. The theoretical parallelism collapses to serial execution with extra overhead.
- MCP server connection limits. MCP servers maintain connection pools. Saturating the pool with 5 concurrent agent connections causes timeouts and retries, adding latency that would not exist with fewer agents.
- Error recovery cascades. When one agent's process gets OOM-killed, it can leave corrupted state (lockfiles, partial writes, broken pipes) that causes other agents to fail. One failure becomes five.
The throughput curve is not linear. It peaks at 2 agents (1.7x), degrades at 3 (1.4x), and inverts at 5 (0.6x). This is not theoretical -- these are measurements from a 16GB MacBook Pro running a production Next.js/Supabase codebase with 11 active MCP server connections.
How does the comparison look across all execution modes?
| Mode | Throughput | RAM Required | Best For | Risk Level |
|---|---|---|---|---|
| 1 agent, sequential | 1.0x baseline | 3-4 GB | Complex dependent tasks, debugging, initial exploration | Low |
| 2 agents, parallel | 1.7x | 6-8 GB | Independent research, review + build, draft + test | Low |
| 3 agents, parallel | 1.4x | 10-12 GB | Only on 32GB+ machines with lightweight tasks | Medium |
| 5+ agents, parallel | 0.6x (slower than 1) | 16+ GB (swap) | Never -- this is the anti-pattern | High (OOM, corruption) |
| Worktree mode | 1.7-1.9x | 8-10 GB | Working on separate branches simultaneously (hotfix + feature) | Low |
Worktree mode deserves special mention. Git worktrees allow you to check out multiple branches in separate directories simultaneously, eliminating git lock contention entirely. When two agents work on different branches (for example, a hotfix on main while feature development continues on a staging branch), worktree mode achieves closer to 1.9x throughput because each agent has its own .git working directory. The tradeoff is higher disk usage and slightly more complex branch management.
What rules should you follow when implementing parallel agents?
After two months of daily use with parallel agents, these are the operational rules I enforce:
- Default to parallel. When you have 2+ independent tasks, do not ask whether to parallelize -- just do it. The sequential default is a cognitive bias, not an optimal strategy.
- Hard cap at 2. On a 16GB machine, never exceed 2 concurrent agents. If you are on 32GB, you can experiment with 3, but measure before committing.
- Background anything over 30 seconds. Test runs, builds, web research, documentation lookups -- if it takes more than 30 seconds and does not block your next step, run it in the background.
- Review before committing. When a background agent completes, review its output before committing any changes. Background agents lack the context of your foreground conversation and can make locally-correct but globally-wrong decisions.
- Kill before spawning. If your machine was recently under heavy load, run `pkill -9 -f node` to clear orphaned processes before starting a new batch of agents. Zombie processes from killed agents consume memory silently.
- Prefer foreground for critical work. File splits, database migrations, commits, and deployments should always run in the foreground where you can observe and intervene immediately.
What does the data show after 30 days of parallel-by-default?
I tracked metrics across 30 consecutive workdays after adopting parallel-by-default execution:
- Average tasks per day: 195 (unchanged from sequential baseline -- same workload, faster completion)
- Average completion time per task batch: reduced by 30% (from sequential baseline)
- OOM incidents: 0 in 30 days (after enforcing the 2-agent cap). Previously: 3 incidents in 10 days when running 5+ agents.
- Git corruption incidents: 0 in 30 days (previously 2 incidents requiring manual HEAD file repair)
- Effective throughput: 1.7x average across all parallel sessions, consistent within +/- 0.1x
- Tasks that benefited most from parallelism: blog drafting (2 posts simultaneously), code review (security + frontend in parallel), and codebase exploration paired with active development
The most significant finding is not the speed improvement. It is the reliability. Before the 2-agent cap, I was faster on good days and catastrophically slower on bad days (OOM recovery, git corruption repair). After the cap, every day is consistently 1.7x. Predictable throughput beats peak throughput.
The practitioner's takeaway: Parallel agents are a multiplier, not a silver bullet. The maximum safe multiplier on commodity hardware is 1.7x. Attempting to exceed that by adding more agents produces negative returns. The discipline is knowing when 1.7x is enough -- and it almost always is.
Frequently Asked Questions
How many Claude Code agents can run in parallel safely?
On a 16GB machine, two concurrent agents is the safe maximum. Each agent consumes 3-4 GB of RAM including spawned processes (tsc, vitest, esbuild). At three agents, memory pressure causes swap thrashing and degrades performance. At five agents, the OOM killer terminates processes unpredictably, potentially corrupting git state. On 32GB machines, three agents are feasible but offer marginal throughput gains over two due to I/O contention.
What is the actual speedup from running 2 parallel agents versus 1?
Approximately 1.7x, measured across 30 days of production use. The gap from a theoretical 2x speedup comes from shared I/O: both agents access the same filesystem, git repository, and terminal output buffer. Amdahl's Law applies -- the serial portion of the work (approximately 15-20% of operations) limits the maximum possible speedup regardless of parallelism.
When should you use background agents versus foreground agents?
Use background agents for tasks exceeding 30 seconds whose output does not gate your next action: test suites, builds, web research, documentation lookups. Use foreground agents when you need the result before proceeding: code generation you will immediately edit, research that shapes your next prompt, or any task where the output determines your next decision.
Does the 2-agent cap apply to Claude Code Agent Teams?
Agent Teams use a coordinator pattern where a lead agent delegates to specialist subagents. The memory constraint still applies: each active subagent consumes RAM. In practice, Agent Teams manage this by serializing subagent execution or limiting concurrency internally. The 2-agent cap is a hardware constraint, not a software one -- it applies regardless of how agents are orchestrated.
How do you recover from OOM-related git corruption?
If the OOM killer corrupts .git/HEAD (zero-byte file), write the branch reference back manually: `echo "ref: refs/heads/main" > .git/HEAD`, then `touch .git/packed-refs`. If corruption is deeper (missing objects, broken index), a fresh clone is safer than attempting repair. Prevention is better: enforce the 2-agent cap and monitor memory usage during parallel sessions.
Dinesh Challa is an AI Product Manager building production software with Claude Code. Follow him on LinkedIn.
Published April 11, 2026. Part of a series on Claude Code workflows and AI-assisted development, covering 20 custom agents, 8 skills, and 11 hooks in a production environment.