
Context Window Management

Context windows, compaction triggers, the PreCompact hook, the difference between /clear, /compact, and a fresh session, and the patterns that keep long sessions productive without losing detail.

Larry Maguire

GenAI Skills Academy

The context window is Claude's working memory for a single session — the maximum amount of text it can hold at one time. Everything in the conversation, every file Claude has read, every tool output, every CLAUDE.md and skill definition that's loaded — all of it is in the context window. When the window fills up, something has to give: either the conversation summarises (compaction), or older content drops out, or you start a fresh session.

Managing context well is the difference between long sessions that stay sharp and long sessions where Claude starts forgetting things, repeating itself, or losing the thread. This article covers what the window actually contains, how compaction works, and the patterns that protect detail across long-running work.

Context window sizes by model

Model        Standard    1M variant
Haiku 4.5    200K        —
Sonnet 4.6   200K        1M (alias sonnet[1m])
Opus 4.7     200K        1M (alias opus[1m])

200K tokens is roughly 150,000 words, or a thick novel. That feels enormous until you start reading large files: a 60-page PDF can be 30K-50K tokens by itself. A few large files plus a long conversation fills 200K faster than people expect.
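
You can sanity-check how much of the window a file will consume before reading it. A rough shell sketch, using the common ~4 characters per token rule of thumb (an approximation, not Claude's real tokenizer):

```shell
# Estimate a file's token cost against a 200K window.
# ~4 chars/token is a heuristic, not the real tokenizer.
file=sample.txt
head -c 180000 /dev/zero | tr '\0' 'a' > "$file"   # make a 180 KB sample file
chars=$(wc -c < "$file")
tokens=$((chars / 4))
pct=$((tokens * 100 / 200000))                     # share of a 200K window
echo "~${tokens} tokens (~${pct}% of a 200K window)"
# → ~45000 tokens (~22% of a 200K window)
```

A single 180 KB file eats roughly a fifth of the standard window before any conversation happens.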

1M variants exist for Sonnet and Opus on plans that include them (Pro and above for Opus 1M; Sonnet 1M may require additional usage). 1M is roughly five novels' worth — enough to keep an entire codebase or a long set of documents resident.

What's in the context window

At any moment, the context window contains:

  1. System prompt — Claude Code's built-in instructions about how to behave, available tools, etc. Fixed overhead, ~10K tokens.
  2. CLAUDE.md content — your workspace's master CLAUDE.md, plus any project-level CLAUDE.md files in scope.
  3. Rules files — every file in .claude/rules/ is loaded automatically.
  4. Auto memory — your MEMORY.md for this workspace.
  5. Skill descriptions — every skill's frontmatter description (always loaded, so Claude can decide which to invoke). Full skill content loads on-demand.
  6. MCP server tool definitions — for every connected MCP server, the list of tools and their input schemas.
  7. Conversation history — every user message, every assistant message, every tool call and result, in this session.
  8. File contents — anything Claude has Read, Grepped, or otherwise pulled into context.

Items 1-6 are roughly fixed per workspace — they don't grow during a session. Items 7-8 grow with use. Long sessions with lots of file reading hit limits much faster than long conversational sessions with no file work.

Watching context usage

Claude Code shows context usage in the status line at the bottom of the panel: typically "Context: NN%". That percentage is tokens_used / max_window × 100. When it gets high (around 80%), auto-compaction triggers.

You can see usage at any point by running /usage in the session. This shows the current context size and your remaining usage budget for the day or week.

Compaction

When the context window approaches the configured threshold (default ~80%, configurable via contextCompactionThreshold in settings.json), Claude Code triggers compaction. The process:

  1. Old tool outputs are dropped first (they're the cheapest to recompute if needed)
  2. If still over threshold, older conversation messages are summarised — replaced with a much shorter "earlier in this conversation, you discussed X, Y, Z" recap
  3. Code snippets and recent messages are preserved with priority — they're more often referenced again
  4. The session continues; the context window is back below threshold
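
In settings.json, adjusting the threshold looks roughly like this — a sketch assuming the value is given as a fraction of the window (check your version's settings reference for the exact format):

```json
{
  "contextCompactionThreshold": 0.9
}
```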

The session ID does not change. The conversation is still notionally one conversation. But specific details from earlier may have been summarised away — Claude may not remember the exact wording of an instruction from 2 hours ago, only the gist.

Manual /compact

You can trigger compaction explicitly at any point with /compact. Useful when:

  • You've just finished a phase of work and don't need the full detail of what just happened
  • You want to summarise before the auto-compaction picks what to drop
  • You're about to start a new heavy task and want maximum headroom

You can pass an instruction to /compact: e.g. /compact Keep all decisions about the database schema; drop the file-by-file walkthrough. This shapes what survives the summary. Without an instruction, the default summarisation runs.

The PreCompact hook — your safety net

The PreCompact hook fires immediately before context compaction happens. The hook cannot prevent compaction, but it can write the current state to disk before detail is lost.

This is the single most useful hook for long sessions. Without it, when context compacts, you have no marker for where detail was lost and no way to recover it. With it, you have a checkpoint on disk that preserves what was about to be summarised.

{
  "hooks": {
    "PreCompact": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/precompact-checkpoint.sh" }
        ]
      }
    ]
  }
}

The script writes a timestamped breadcrumb file noting that compaction happened at this point. When you resume the session later via --resume, you know detail before that breadcrumb may have been summarised. See Hooks Event Reference for the full PreCompact contract.
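
A minimal version of that script might look like the following — a sketch that assumes the hook payload arrives as JSON on stdin with a session_id field (the exact contract is in the Hooks Event Reference):

```shell
# Sketch of .claude/hooks/precompact-checkpoint.sh — writes a timestamped
# breadcrumb file when compaction fires. Field names in the payload are
# assumptions; verify against the Hooks Event Reference.
mkdir -p .claude/hooks
cat > .claude/hooks/precompact-checkpoint.sh <<'EOF'
#!/bin/sh
input=$(cat)                                   # hook payload on stdin
sid=$(printf '%s' "$input" | sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
dir=".claude/checkpoints"
mkdir -p "$dir"
stamp=$(date +%Y%m%d-%H%M%S)
{
  echo "# Compaction checkpoint ($stamp)"
  echo "session: ${sid:-unknown}"
  echo "Detail before this point may have been summarised away."
} > "$dir/compaction-$stamp.md"
EOF
chmod +x .claude/hooks/precompact-checkpoint.sh

# Simulate the hook firing with a sample payload
echo '{"session_id":"abc123","trigger":"auto"}' | .claude/hooks/precompact-checkpoint.sh
cat .claude/checkpoints/compaction-*.md
```

Each compaction leaves one file behind; the timestamps tell you exactly where in the session detail may have been lost.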

/clear vs /compact vs new session

  • /clear — wipes the entire conversation; CLAUDE.md and rules reload. Session ID: same. Use when switching to an unrelated task in the same session and the previous work is committed, so you don't need it as context.
  • /compact — summarises the conversation to save space; the conversation continues. Session ID: same. Use for a long task you want to keep going, where the high-level thread matters but the specific details don't.
  • New session — fresh window with no conversation history. Session ID: new. Use for a different body of work, different working hours, or when you've taken a break and want to start fresh.
  • --resume — reopens a specific previous session. Session ID: same as the resumed session. Use when returning to in-progress work from earlier today, yesterday, or last week.

Common confusion: /clear and starting a new session both give you a fresh conversation. The difference is the session ID — /clear keeps the same session for tracking, just empty. A new session starts a new file under ~/.claude/projects/. For PRIMA Memory or any tool that indexes by session, this matters.

Patterns that keep long sessions productive

1. Push heavy reading to sub-agents

The most effective pattern for context preservation. Anything that involves reading 5+ files or large documents goes to a sub-agent. The agent does the reading in its own context; the parent gets back a summary. See Sub-Agents and Parallel Execution.

2. Prefer summaries over source documents

If a summary file exists for a long document, read the summary first. Only read the full source if the summary doesn't have what you need. This applies to transcripts, reports, books, anything where the whole thing is much larger than the question.

3. End conversations at natural boundaries

When you finish a phase of work, end the session and start fresh for the next phase. The temptation is to keep one long session running for "convenience"; the cost is a context that's mostly noise from previous work. New phases usually deserve new sessions.

4. Use /compact deliberately, not reactively

Auto-compaction picks what to summarise based on its heuristics, which may not match your priorities. Manual /compact with an instruction lets you say what matters: "Keep the architecture decisions and the file paths I'm working on; drop the early exploration."

5. Set up PreCompact checkpoints

Once you've configured a PreCompact hook that writes checkpoints, you have a safety net. Even if compaction loses detail, the checkpoint file has it. Combine with --resume to recover later.

6. Consider 1M context for genuinely large work

If your typical session involves reading large codebases, long PDFs, or extensive document sets, switch to the 1M context variant (sonnet[1m] or opus[1m]). The headroom changes how you can work — some patterns that don't fit in 200K (e.g. "read every file in this folder, then synthesise") become trivial in 1M.

Trade-off: 1M context costs more per response (more tokens in the input). For sessions that don't need it, 200K is cheaper and faster.
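
One way to make the switch persistent is in settings.json — a sketch assuming a model key that accepts the alias (you can also choose the model when launching the session):

```json
{
  "model": "sonnet[1m]"
}
```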

Anti-patterns

  • Reading large files "just in case". Files that aren't needed for the current question are wasted context. Read on demand.
  • Letting agents return full content into the parent. This defeats the point of delegating the reading — the parent's window fills up anyway. Agents must write to disk and return summaries.
  • Keeping sessions running for days. A week-old session has hours of stale context. Newer sessions, with relevant context loaded fresh, are usually more productive than resuming ancient ones.
  • Ignoring the context indicator until it's red. By the time auto-compaction triggers, you've lost the ability to control what's summarised. Watch the indicator and use manual /compact while you still have control.
  • Using /clear when you should use a new session. If the previous work matters for tracking or memory tools, start a new session so the old one stays intact; /clear wipes the conversation while keeping the same session ID, so there's nothing left to resume.

When to give up and start fresh

Three signs the session is past its useful life:

  1. Claude is repeating itself or contradicting earlier statements — context is too compressed to be coherent
  2. You're spending more time correcting old context than doing new work
  3. The status line shows context above 90% and you have a lot of work left to do

End the session. Write a brief handoff note (or use a tool that does it for you). Start a new session and brief it from the handoff. The new session will be sharper than the saturated old one — by a wide margin.
