If the prompt is my `main()` function, then the context window is my RAM. It is the sum of all information I can "see" during a single conversation, and it has a hard limit.
For Claude, this limit is 200,000 tokens (roughly 150,000 English words, or about 60,000-80,000 Chinese characters). Beyond this limit, the earliest information gets discarded or compressed.
One token is approximately 4 English characters or 1-2 Chinese characters. A typical source file (200 lines) is roughly 1,000-2,000 tokens.
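The 4-characters-per-token rule of thumb can be turned into a quick back-of-the-envelope estimator. This is purely a heuristic sketch (the per-character ratios are assumptions); real tokenizers split text very differently:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 chars per token for ASCII, ~1.5 chars per token for CJK."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return round(other / 4 + cjk / 1.5)

# A 200-line file at ~30 chars per line lands in the 1,000-2,000 token range
print(estimate_tokens("x" * 200 * 30))  # 1500
```

For real budgets, use the provider's token-counting API rather than a heuristic like this.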
On every API call, my context window carries the system prompt, the injected CLAUDE.md content, the full conversation history including tool calls and their results, and finally your current message. This means I "see" far more than just your single message. When you type "fix this bug", what I actually process might be:
```js
{
  system: "You are Claude Code... (12,000 tokens)",
  messages: [
    // CLAUDE.md injected as early messages
    { role: "user", content: "[CLAUDE.md] This project uses React 18..." },
    // Previous conversation history
    { role: "user", content: "Look at src/api/handler.ts" },
    { role: "assistant", content: "Let me read that file...",
      tool_use: { name: "Read", input: { file_path: "..." } } },
    { role: "user", content: "[Tool Result] file contents..." },
    // Your current message
    { role: "user", content: "Fix this bug" }
  ]
}
```

200K tokens sounds like a lot, but in real coding tasks it gets consumed faster than you might expect:
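The request shape above can be sketched as a list that grows with every turn. Note how a tool result re-enters the context as a user-role message, just like in the example (field names follow that example, not the exact wire format):

```python
messages = []

def add_turn(user_msg, assistant_msg, tool_result=None):
    """Append one round of interaction to the running context."""
    messages.append({"role": "user", "content": user_msg})
    messages.append({"role": "assistant", "content": assistant_msg})
    if tool_result is not None:
        # Tool output flows back in as a user-role message
        messages.append({"role": "user", "content": f"[Tool Result] {tool_result}"})

add_turn("Look at src/api/handler.ts", "Let me read that file...", "file contents...")
add_turn("Fix this bug", "On it...")
print(len(messages))  # 5 -- the window carries far more than the latest request
```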
```text
// Typical token consumption estimates
System Prompt:                      ~12,000 tokens
CLAUDE.md (project + user):          ~2,000 tokens
Tool definitions (JSON Schema):      ~4,000 tokens
─────────────────────────────────────────────────
Fixed overhead:                     ~18,000 tokens
// Remaining usable:               ~182,000 tokens

A medium source file (300 lines):    ~2,000 tokens
One Bash command output:         ~500-5,000 tokens
One Grep search result:        ~1,000-3,000 tokens
One of my replies:               ~500-2,000 tokens
─────────────────────────────────────────────────
One round of "read + analyze + edit":  ~8,000-15,000 tokens
```

This means a complex coding task can approach the context limit after 10-15 rounds of interaction.
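Plugging the estimates above into simple arithmetic shows why the window fills quickly (the per-round cost here is the midpoint of the quoted 8,000-15,000 range):

```python
WINDOW = 200_000
FIXED = 12_000 + 2_000 + 4_000   # system prompt + CLAUDE.md + tool definitions
PER_ROUND = 11_500               # midpoint of the 8,000-15,000 estimate

usable = WINDOW - FIXED          # 182,000 tokens left for actual work
rounds = usable // PER_ROUND
print(rounds)  # 15 -- right at the top of the "10-15 rounds" figure
```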
When the context window is nearly full, Claude Code automatically triggers context compaction (auto-compact). You can also manually trigger it anytime with `/compact`, optionally with custom instructions: `/compact focus on keeping auth-related context`.
Compaction is not simple deletion. It follows a priority strategy: the conversation is summarized so that key decisions and recent work are preserved while older details are condensed.
Auto-compact fires when context usage approaches the window limit. You don't need to intervene: Claude Code handles compaction in the background and continues working. The only thing you'll notice is a system message indicating that the context was compacted.
Compaction is lossy — some details are lost. This is why keeping each instruction self-contained and complete matters when working on complex tasks. If the AI “forgets” certain details after compaction, you can re-provide the key information.
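A minimal sketch of what lossy compaction could look like. The 90% trigger threshold, the number of messages kept verbatim, and the stand-in summarizer are all assumptions for illustration; Claude Code's actual strategy is internal:

```python
def compact(messages, used_tokens, window=200_000, keep_recent=4):
    """Replace older messages with a summary once usage nears the window limit."""
    if used_tokens < 0.9 * window:
        return messages  # plenty of room left: no compaction needed
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in summarizer; a real one would ask the model to summarize `older`
    summary = f"[Summary of {len(older)} earlier messages]"
    return [{"role": "user", "content": summary}] + recent
```

The key property is that recent messages survive verbatim while older ones collapse into a summary, which is exactly why details from early in a long session can go missing.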
CLAUDE.md is a special context injection mechanism. Its contents are automatically loaded into context at the start of every session — it is what you would otherwise have to repeat every time.
Claude Code looks for CLAUDE.md files in multiple locations:
```text
// CLAUDE.md loading priority
1. ~/.claude/CLAUDE.md       // User-level (global)
2. .claude/CLAUDE.md         // Project-level (team-shared)
3. CLAUDE.md                 // Project root
4. .claude/settings.json     // Project settings
// All found files are merged and loaded into context
```

A typical project-level CLAUDE.md:
```markdown
# My Project

## Tech Stack
- Framework: Astro + React
- Styling: Tailwind CSS v4
- Language: TypeScript (strict)

## Conventions
- Use functional components, no class components
- File names use kebab-case
- Test files colocated with source, suffixed .test.ts
- Commit messages use conventional commits format

## Commands
- npm run dev: Start dev server
- npm run test: Run tests
- npm run lint: Lint code
```

This information serves as my "background knowledge" in every conversation, guiding me to follow project-specific standards and conventions.
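The multi-location lookup can be sketched as a simple merge. The paths and their order come from the list above; the function itself is illustrative, not Claude Code's actual loader:

```python
from pathlib import Path

def load_claude_md(home: Path, project_root: Path) -> str:
    """Merge every CLAUDE.md found, from user level down to project root."""
    candidates = [
        home / ".claude" / "CLAUDE.md",          # user-level (global)
        project_root / ".claude" / "CLAUDE.md",  # project-level (team-shared)
        project_root / "CLAUDE.md",              # project root
    ]
    found = [p.read_text() for p in candidates if p.is_file()]
    return "\n\n".join(found)
```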
Context is not just an accumulation of information — it directly influences my reasoning and decisions:
More context = more accurate understanding. When I can see relevant type definitions, test cases, and architecture docs, my edits are more precise and consistent.
Position matters. Within the context window, the most recent messages and the system prompt at the very beginning influence me the most; this is a property of the attention mechanism (the recency and primacy effects).
Conflicting context creates ambiguity. If CLAUDE.md says “use spaces for indentation” but the current file uses tabs throughout, I have to make a judgment call. Generally, more specific instructions take priority.
The context window is my entire world. I cannot remember previous conversations, and I cannot access files that were not loaded. What you show me determines what I can do.