Sub-Agents and Parallel Execution Patterns

The Task tool spawns sub-agents with their own context windows. When to spawn them, how to brief them, the write-to-disk discipline that protects context budget, and how to run agents truly in parallel.

Larry Maguire

GenAI Skills Academy

Sub-agents are how Claude Code handles work that would otherwise overwhelm the main conversation. They are separate Claude instances spawned via the Task tool, each running in its own context window, each capable of running in parallel. Used well, they make Claude Code productive on tasks that are too large or too varied for a single session. Used badly, they fragment work into messes that take longer than doing it directly.

This article covers when to spawn sub-agents, how to brief them so they actually help, the discipline that protects your context budget, and how to run them genuinely in parallel.

For the conceptual primer on what sub-agents are versus skills versus commands, see Slash Commands, Skills, and Agents. This article assumes that's understood.

What spawning a sub-agent actually does

When the main Claude Code session calls the Task tool, behind the scenes it starts a new Claude instance. The new instance:

  • Has its own fresh context window (200K tokens, or 1M tokens if configured)
  • Inherits the workspace's .claude/CLAUDE.md and rules files
  • Loads any skills configured for that agent type
  • Receives the brief you wrote in the Task call as its only user message
  • Does not see the parent's conversation history
  • Does not share session-scoped permission decisions — anything sensitive prompts again

It then runs autonomously until it produces a final message. That message goes back to the parent. Any files the sub-agent wrote to disk are persisted; they were real file edits, not isolated to the agent's context.

Crucially: the sub-agent does not return mid-task progress to the parent. The parent waits for the final message and gets only that. Long sub-agent runs feel like a black box from the parent's perspective.

When to spawn a sub-agent

Heavy reading that would clutter the main context

If a task requires reading 5+ files of any size, or 1-2 large files (PDFs over 50 pages, transcripts, big spreadsheets), spawn a sub-agent to do the reading and return only the synthesis. The main context never sees the raw material; you get the conclusion.

The Explore agent type is built for this. Brief it with the question and the search scope; it returns a summary of what it found.

Parallelisable work

Five clients to research. Six PDFs to summarise. Twelve files to grade. Anything where the items are independent of each other is a candidate for parallel sub-agents — one per item, all spawned in a single response, all running concurrently.

This is the highest-leverage pattern. Run sequentially, 6 PDFs take roughly 6× as long as one. Run in parallel, they take roughly as long as the slowest one.

Bounded research or planning

Tasks that have a clear, narrow goal and don't need ongoing dialogue benefit from a focused sub-agent context. The Plan agent type is good for "design a solution to X without implementing it"; the general-purpose agent handles broader tasks where you don't have a more specific agent type defined.

Work where the parent's context is already heavy

If you've been working in the same conversation for a long time and the context is filling up, delegating the next big task to a sub-agent gives that task a clean context to work in. The parent stays lean.

When NOT to spawn a sub-agent

  • Quick tasks. If the task is one or two tool calls in the main context, just do it. Sub-agent overhead (briefing, spawning, waiting, parsing return) is more than the work.
  • Tasks needing dialogue. Sub-agents are one-shot. They cannot ask follow-up questions. If you don't know exactly what you want at the start, the back-and-forth has to happen in the main context.
  • Tasks where you want every step visible. Sub-agent activity is opaque to the parent until the final message. If you need to watch what's happening (security-sensitive operations, learning a new tool), keep the work in the main context.
  • Tasks where the parent already holds the context. If the parent has just read a file the sub-agent would need, you have to restate that content in the brief. It is often quicker to keep going in the parent.
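The rules of thumb in this section and the previous one can be condensed into a small checklist. The sketch below is a hypothetical Python helper, not part of Claude Code. `TaskProfile` and `should_delegate` are invented names, and the thresholds (five files, one large file, two tool calls) come from the text above rather than from any official guidance.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a task; none of these fields come from Claude Code."""
    tool_calls: int             # rough estimate of tool calls if done in the main context
    files_to_read: int          # how many files the task needs to read
    large_files: int            # PDFs over ~50 pages, transcripts, big spreadsheets
    independent_items: int      # items that could each get their own agent
    needs_dialogue: bool        # requirements still being worked out with you
    needs_visibility: bool      # you want to watch every step
    parent_has_context: bool    # the parent already holds what an agent would need
    parent_context_heavy: bool  # the main conversation is already filling up


def should_delegate(t: TaskProfile) -> bool:
    # "When NOT to spawn": any one of these keeps the work in the main context.
    if t.tool_calls <= 2 or t.needs_dialogue or t.needs_visibility or t.parent_has_context:
        return False
    # "When to spawn": heavy reading, independent items, or a crowded parent context.
    heavy_reading = t.files_to_read >= 5 or t.large_files >= 1
    return heavy_reading or t.independent_items >= 2 or t.parent_context_heavy


six_pdfs = TaskProfile(tool_calls=12, files_to_read=6, large_files=6, independent_items=6,
                       needs_dialogue=False, needs_visibility=False,
                       parent_has_context=False, parent_context_heavy=False)
quick_fix = TaskProfile(tool_calls=1, files_to_read=1, large_files=0, independent_items=1,
                        needs_dialogue=False, needs_visibility=False,
                        parent_has_context=False, parent_context_heavy=False)
print(should_delegate(six_pdfs), should_delegate(quick_fix))  # True False
```

The "not" checks run first on purpose. A task that needs dialogue stays in the parent even if it also involves heavy reading.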

How to brief a sub-agent

The brief is the entire context the sub-agent has for its work. Skimping on the brief produces shallow generic output. Over-briefing wastes the sub-agent's context budget. The right brief includes:

  1. Goal — what the sub-agent is trying to accomplish, in one sentence
  2. Why it matters — enough context for the sub-agent to make judgement calls (otherwise it just follows literal instructions and misses the point)
  3. What you've already learned or ruled out — so the sub-agent doesn't repeat work
  4. Success criteria — how you'll know the output is good
  5. Output constraint — explicitly ask for a short return; "report in under 200 words" or "write to disk and return only the file path"
  6. Specific files, paths, or commands if the sub-agent needs to find specific things

Compare:

Weak brief:

Research competitor X.

Strong brief:

Research competitor Acme Consulting (acmeconsulting.com). Goal: understand their pricing, service mix, and target client segment so we can position our proposal differently. Already know their Dublin office is small and they focus on SaaS. Look at their blog, case studies, and pricing page; ignore press coverage. Write findings to /tmp/acme-research.md as five bullet points; return only the file path and a one-line headline. Under 5 minutes.
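One way to keep briefs honest is to treat the six parts as required fields. The Python sketch below is purely illustrative. The `Brief` class and its field names are hypothetical, since the Task tool simply takes text, and the rendered string is what you would use as the brief. It rebuilds the strong Acme brief and shows that the weak brief is missing almost everything.

```python
from dataclasses import dataclass, field

REQUIRED = ("goal", "why", "already_known", "success_criteria", "output_constraint")


@dataclass
class Brief:
    """Hypothetical container for the six parts of a sub-agent brief."""
    goal: str
    why: str = ""
    already_known: str = ""
    success_criteria: str = ""
    output_constraint: str = ""
    pointers: list[str] = field(default_factory=list)  # optional files, paths, URLs

    def missing(self) -> list[str]:
        # Every empty required part is something the agent will have to guess.
        return [name for name in REQUIRED if not getattr(self, name).strip()]

    def render(self) -> str:
        lines = [
            f"Goal: {self.goal}",
            f"Why it matters: {self.why}",
            f"Already known or ruled out: {self.already_known}",
            f"Success criteria: {self.success_criteria}",
            f"Output: {self.output_constraint}",
        ]
        if self.pointers:
            lines.append("Look at: " + "; ".join(self.pointers))
        return "\n".join(lines)


weak = Brief(goal="Research competitor X.")
strong = Brief(
    goal="Understand Acme Consulting's pricing, service mix and target client segment.",
    why="So we can position our proposal differently.",
    already_known="Their Dublin office is small and they focus on SaaS.",
    success_criteria="Five bullet points; ignore press coverage.",
    output_constraint="Write findings to /tmp/acme-research.md; return only the file path "
                      "and a one-line headline. Under 5 minutes.",
    pointers=["acmeconsulting.com blog", "case studies", "pricing page"],
)
print(weak.missing())    # ['why', 'already_known', 'success_criteria', 'output_constraint']
print(strong.missing())  # []
```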

The write-to-disk discipline

The single most important pattern for sub-agents: have them write to disk and return only a short summary. Do not let sub-agents return long output into the parent's context.

Why: every sub-agent return becomes part of the parent's context. If you spawn 10 agents that each return 500 lines of output, you've added 5,000 lines to your main context for what could have been 10 file paths and 10 one-liners. The whole point of parallelism is wasted if the consolidation is bloated.

The pattern in every brief:

Write your full result to [path]. Return only the file path and a 1-line summary. Do not return the full content.

The parent then reads from disk only the files it needs for the next step. Often that's 1-2 of the agent outputs, not all of them.
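The arithmetic above is easy to check with a simulation. In this hypothetical Python sketch, `finish` stands in for the last step of each sub-agent: write the full result to disk and return one line. Ten agents each produce a 500-line report, yet the parent's context receives only ten lines. The parent reads back a single file when it needs the detail. The temporary paths and the `path | headline` return format are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def finish(result: str, out_path: Path) -> str:
    """Hypothetical final step of a sub-agent: persist everything, return one line."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result, encoding="utf-8")
    headline = next((line for line in result.splitlines() if line.strip()), "(empty)")
    return f"{out_path} | {headline[:80]}"


workdir = Path(tempfile.mkdtemp())
returns = []
for i in range(10):
    # Each simulated agent produces a 500-line report.
    report = "\n".join([f"Agent {i}: headline finding"] + [f"detail {n}" for n in range(499)])
    returns.append(finish(report, workdir / f"agent-{i}.md"))

# What lands in the parent's context: one line per agent.
lines_in_parent = sum(len(r.splitlines()) for r in returns)
print(lines_in_parent)  # 10

# The parent opens only the file it needs for the next step.
path, _ = returns[3].split(" | ", 1)
detail = Path(path).read_text(encoding="utf-8")
print(len(detail.splitlines()))  # 500
```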

Running agents in parallel

To run multiple sub-agents concurrently, the parent must spawn them all in a single response — multiple Task tool calls in one message. Spawning sequentially (one Task call, wait for return, next Task call) runs them serially.

Example pattern: "Research these 5 competitors" — the parent spawns 5 Task calls in one go, each with one competitor's brief. All 5 run at the same time. Total wall time is about the time of the slowest one, not the sum.
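Spawning all Task calls in one response is analogous to issuing async jobs together. Spawning them one at a time is like awaiting each job before starting the next. The Python sketch below uses `asyncio` purely as an analogy and is not how Claude Code schedules agents internally. `fake_agent` is a hypothetical stand-in that sleeps instead of working, and the durations are invented.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a sub-agent; real runs take minutes, not fractions of a second.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def serial(jobs):
    # One call, wait for its return, then the next: total time is the sum.
    return [await fake_agent(name, secs) for name, secs in jobs]


async def parallel(jobs):
    # All calls issued together: total time is roughly the slowest job.
    return await asyncio.gather(*(fake_agent(name, secs) for name, secs in jobs))


def timed(runner, jobs):
    start = time.perf_counter()
    results = asyncio.run(runner(jobs))
    return results, time.perf_counter() - start


jobs = [("acme", 0.3), ("beta", 0.1), ("gamma", 0.2), ("delta", 0.1), ("epsilon", 0.2)]
serial_results, serial_time = timed(serial, jobs)
parallel_results, parallel_time = timed(parallel, jobs)
print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

Both runners return the same results. On a typical machine, the serial run takes close to the 0.9s sum of the durations. The parallel run takes close to the 0.3s of the slowest job.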

Practical limits:

  • There is a cap on how many sub-agents can run concurrently from a single response. Batches of 3-10 parallel agents are typical; for larger sets, split the work into successive batches.
  • Each parallel agent costs separately on token usage. 10 agents in parallel costs roughly 10× one agent's tokens. The wall-time saving is real; the cost saving is not.
  • Genuinely independent work parallelises well; work where each item depends on the previous does not.
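When there are more items than you want running at once, "batch them" amounts to a loop over fixed-size groups, with each group launched together. This Python sketch uses `asyncio` as an analogy. The `MAX_PARALLEL = 5` cap and the 40,000 tokens per agent are hypothetical numbers, not Anthropic limits or measured costs. The sketch also makes the cost point concrete: batching changes wall time, not the total token bill.

```python
import asyncio

MAX_PARALLEL = 5  # hypothetical cap; choose what your setup and budget tolerate


async def run_in_batches(items, worker, batch_size=MAX_PARALLEL):
    """Run items in fixed-size parallel batches; wait for each batch before the next."""
    results, batch_sizes = [], []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        batch_sizes.append(len(batch))
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results, batch_sizes


async def fake_agent(item: str) -> dict:
    await asyncio.sleep(0.01)                # stand-in for real work
    return {"item": item, "tokens": 40_000}  # hypothetical per-agent usage


items = [f"pdf-{n}" for n in range(12)]
results, batch_sizes = asyncio.run(run_in_batches(items, fake_agent))
total_tokens = sum(r["tokens"] for r in results)
print(batch_sizes, total_tokens)  # [5, 5, 2] 480000
```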

Sub-agent return shapes

Three patterns for what comes back to the parent, in increasing usefulness:

  1. Inline content: the sub-agent returns its full output as the final message. Use this only for short outputs (under 100 lines); longer returns flood the parent's context and undo the benefit of delegating.
  2. File path + summary — sub-agent writes the work to disk, returns the path and a 1-2 line description. The parent decides whether to read the file. Best general pattern.
  3. Decision + supporting file — sub-agent returns a verdict (e.g. "PASS", "BLOCKED", "needs review") plus a file with the detail. Parent uses the verdict to decide whether to read the file. Good for batch grading or quality checks.
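The decision-plus-file shape works best when the return can be checked by a program. The Python sketch below assumes you asked each agent, in its brief, to reply with a single line of the form `VERDICT | path`. That format, the verdict set and the helper names are hypothetical. The parent opens only the files whose verdict is not PASS. Any other reply shape is treated as a failed brief.

```python
from dataclasses import dataclass

VERDICTS = {"PASS", "BLOCKED", "NEEDS_REVIEW"}  # hypothetical verdict vocabulary


@dataclass
class AgentReturn:
    verdict: str
    path: str


def parse_return(message: str) -> AgentReturn:
    """Parse a one-line 'VERDICT | path' reply; reject anything else."""
    verdict, _, path = message.partition("|")
    verdict = verdict.strip().upper().replace(" ", "_")
    path = path.strip()
    if verdict not in VERDICTS or not path:
        raise ValueError(f"unexpected return shape: {message[:60]!r}")
    return AgentReturn(verdict, path)


def files_to_read(messages: list[str]) -> list[str]:
    # Only non-PASS verdicts justify loading the detail file into the parent's context.
    return [r.path for r in map(parse_return, messages) if r.verdict != "PASS"]


replies = ["PASS | out/essay-01.md", "BLOCKED | out/essay-02.md", "needs review | out/essay-03.md"]
print(files_to_read(replies))  # ['out/essay-02.md', 'out/essay-03.md']
```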

Sub-agent types and when to use each

Explore (built-in)

Read-only search agent with no access to edit or write tools. Use it to locate code, find files matching patterns, or answer "where is X defined" questions. It returns succinct results.

Use when: you need to find something in a codebase or document set without taking action on it.

Plan (built-in)

Architectural planning agent. Designs implementation strategies, identifies critical files, considers trade-offs. Returns a step-by-step plan.

Use when: you need a recommendation before committing to an implementation; you want a second opinion on an approach.

general-purpose (built-in)

Catch-all agent for tasks that don't fit a more specific type. Has access to most tools.

Use when: the task is broad and doesn't match a specific agent type you've defined.

Custom agents in .claude/agents/

Define your own for recurring patterns. Common custom agent types in real workspaces:

  • researcher — web search + library lookup, returns a structured summary
  • code-reviewer — reads a diff or file, returns a punch list of issues only (no praise)
  • summariser — reads a long document, returns a structured 3-paragraph summary
  • file-organiser — sorts files according to convention rules, returns the move commands

Each custom agent has its own system prompt, restricted tool list, and optional model override. Defining them once means every future invocation is consistent without having to re-brief the role from scratch.
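Custom agents are plain files, so they can be generated and reviewed like any other config. The Python sketch below writes one in the Markdown-with-YAML-frontmatter layout Claude Code uses for `.claude/agents/` files: `name`, `description`, optional `tools` and `model`, then the system prompt as the body. Treat that field list as an assumption and check the current Claude Code documentation before relying on it. The `code-reviewer` wording is invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def render_agent_file(name: str, description: str, system_prompt: str,
                      tools: Optional[list] = None, model: Optional[str] = None) -> str:
    """Render a custom agent definition: YAML frontmatter, then the system prompt."""
    # Values are written as plain YAML scalars; quote them yourself if they contain ": ".
    lines = ["---", f"name: {name}", f"description: {description}"]
    if tools:  # restricted tool list; assumed to fall back to the default set when omitted
        lines.append("tools: " + ", ".join(tools))
    if model:  # optional per-agent model override
        lines.append(f"model: {model}")
    lines += ["---", "", system_prompt.strip(), ""]
    return "\n".join(lines)


def write_agent(workspace: Path, name: str, **fields) -> Path:
    path = workspace / ".claude" / "agents" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_agent_file(name, **fields), encoding="utf-8")
    return path


workspace = Path(tempfile.mkdtemp())  # stand-in for a real project root
agent_path = write_agent(
    workspace, "code-reviewer",
    description="Reviews a diff or file and returns a punch list of issues",
    system_prompt="Review the given diff. Return a numbered list of issues only. No praise.",
    tools=["Read", "Grep", "Glob"],
)
print(agent_path.read_text(encoding="utf-8"))
```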

Anti-patterns

  • Spawning agents for one-shot tasks. If the parent could do it in 3 tool calls, the sub-agent overhead is more than the saving.
  • Letting sub-agents return long output. Without the write-to-disk constraint, agent returns flood the parent's context. Always specify what to return.
  • Spawning sequentially when parallel would work. Spawn all independent tasks in one response. Parallelism is the biggest payoff of sub-agents; spawning independent tasks one at a time keeps the context saving but throws away the wall-time saving.
  • Briefing without context. Sub-agents don't share the parent's conversation. If the brief assumes context the agent doesn't have, the agent guesses, and the output is wrong.
  • Asking sub-agents for synthesis the parent should do. Don't write "based on your findings, fix the bug" or "based on the research, write the article". The parent should synthesise; sub-agents gather. Synthesis requires the agent to make decisions you can't easily review.
  • Trusting sub-agent claims without verification. An agent's return summary describes what it intended to do, not necessarily what it did. When agents write or edit files, the parent should verify the actual changes before reporting work as done.
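Verification does not need to be elaborate. In a git repository, `git diff` against the pre-delegation state does the job. The standalone Python sketch below shows the same idea without git. It hashes the files before delegating, then checks that every file an agent claims to have written exists, is non-empty, and actually changed. The helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path):
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None


def snapshot(paths):
    """Record file hashes before delegating, so later edits can be detected."""
    return {str(p): digest(Path(p)) for p in paths}


def verify(claimed_paths, before):
    """Return problems with files an agent says it wrote or edited."""
    problems = []
    for p in map(Path, claimed_paths):
        if not p.exists():
            problems.append(f"missing: {p}")
        elif p.stat().st_size == 0:
            problems.append(f"empty: {p}")
        elif before.get(str(p)) == digest(p):
            problems.append(f"unchanged: {p}")
    return problems


root = Path(tempfile.mkdtemp())
edited, untouched, never_written = root / "a.md", root / "b.md", root / "c.md"
edited.write_text("old", encoding="utf-8")
untouched.write_text("old", encoding="utf-8")
claimed = [edited, untouched, never_written]  # what the agent's summary says it changed
before = snapshot(claimed)

edited.write_text("new content", encoding="utf-8")  # the only edit that really happened
print(verify(claimed, before))  # flags b.md as unchanged and c.md as missing
```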

A worked parallel example

Task: "Grade 12 student assignments against the rubric."

  1. Parent reads the rubric and one calibration sample
  2. You and Claude agree on the calibration grade ("a B+ would look like this")
  3. Parent spawns 12 sub-agents in one response — one per student. Each gets the rubric, the calibration sample's grade, and one student's submission. Each writes a feedback document to submissions/[student]-feedback.docx and returns just the file path and a letter grade.
  4. All 12 run in parallel. Wall time: about the time of the slowest one (5-10 minutes typically), not 60-120 minutes serial.
  5. Parent receives 12 lines back, one per student. Builds a grade summary table.
  6. You spot-check a few feedback documents on disk; the parent never had to load them all into context.

That pattern — calibrate centrally, parallelise the per-item work, consolidate from disk — generalises to almost any batch task. It's the highest-leverage use of sub-agents.
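The grading run can be sketched end to end. In this Python sketch, `grade_one` is a hypothetical stand-in for one sub-agent. It applies a placeholder length rule instead of a real rubric and writes Markdown rather than .docx to stay within the standard library. It returns only `path | grade`. The parent builds the summary table from those twelve lines without opening any feedback file.

```python
import asyncio
import tempfile
from pathlib import Path

# Steps 1-2: agreed centrally before any agent is spawned (text is illustrative).
CALIBRATION = "Calibration sample graded B+: clear argument, thin evidence."


async def grade_one(student: str, submission: str, out_dir: Path) -> str:
    """Stand-in for one sub-agent; a real agent would apply the rubric with judgement."""
    await asyncio.sleep(0.01)
    grade = "B+" if len(submission) > 40 else "C"  # placeholder rule, not a rubric
    feedback = out_dir / f"{student}-feedback.md"
    feedback.write_text(f"# {student}\nGrade: {grade}\n{CALIBRATION}\n", encoding="utf-8")
    return f"{feedback} | {grade}"  # the only thing the parent receives


async def grade_all(submissions: dict, out_dir: Path) -> list:
    # Step 3: one job per student, all launched together.
    return await asyncio.gather(*(grade_one(s, text, out_dir) for s, text in submissions.items()))


def summary_table(returns: list) -> str:
    # Step 5: consolidate from the one-line returns; no feedback file is opened.
    rows = [f"{'Student':<12} Grade"]
    for r in returns:
        path, grade = r.rsplit(" | ", 1)
        rows.append(f"{Path(path).name.removesuffix('-feedback.md'):<12} {grade}")
    return "\n".join(rows)


out_dir = Path(tempfile.mkdtemp())
submissions = {f"student{n:02d}": "x" * (30 + 5 * n) for n in range(12)}
returns = asyncio.run(grade_all(submissions, out_dir))
print(summary_table(returns))
# Step 6: spot-check a feedback file on disk.
print((out_dir / "student00-feedback.md").read_text(encoding="utf-8"))
```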