Skip to content

Performance Optimization

Maximize speed and minimize costs when using Claude Code.

What You'll Learn

  • Context management strategies
  • Model selection for tasks
  • Prompt efficiency
  • Parallel execution
  • Cost optimization

Understanding Context

What Uses Context

Every conversation consumes context (tokens):

Element Token Usage
Your prompts Medium
Claude's responses Medium-High
File contents read High
Tool results Variable
Conversation history Cumulative

The Context Window

Claude has a limited context window. As it fills: - Response quality may decrease - Older context may be summarized - Costs increase

Context Management Strategies

Use /compact Regularly

The /compact command summarizes the conversation:

> [Long exploration session...]
> [Many files read...]
> [Extensive debugging...]

> /compact

Conversation compacted. Key context preserved.

When to compact: - After completing a major task - When exploring many files - Before starting a new topic - Every 15-20 exchanges

Targeted File Reading

# Inefficient: Reading entire files
> Show me the auth module

# Efficient: Reading specific parts
> Show me the login function in auth.js

Clear Context Signals

Help Claude avoid re-reading:

# Less efficient
> What does getUserById do?
> [Claude reads user.js]
> What about createUser?
> [Claude might re-read user.js]

# More efficient
> Explain getUserById and createUser in user.js
> [Claude reads once, explains both]

Model Selection

Task-Appropriate Models

Different models for different tasks:

Task Recommended Model
Simple edits Haiku (fastest, cheapest)
Code review Sonnet (balanced)
Architecture design Opus (most capable)
Quick questions Haiku
Complex debugging Sonnet or Opus

Switching Models

> /config

# Set default model
# Or use for specific tasks

Cost Comparison

Relative costs (approximate): - Haiku: 1x (baseline) - Sonnet: 5x - Opus: 15x

Use Haiku for routine tasks to save costs.

Prompt Efficiency

Concise Prompts

# Verbose (more tokens)
> I was wondering if you could possibly take a look at the
  authentication module and give me your thoughts on whether
  there might be any security issues that we should be aware of.

# Concise (fewer tokens)
> Review auth module for security issues.

Batch Requests

# Inefficient: Multiple round trips
> Add error handling to login()
> [wait]
> Add error handling to logout()
> [wait]
> Add error handling to refresh()

# Efficient: Single request
> Add error handling to login(), logout(), and refresh() in auth.js

Structured Requests

> Do these in order:
  1. Read the User model
  2. Add email validation
  3. Add a migration for the email_verified field
  4. Update the tests

Parallel Execution

Independent Tasks

Claude can run independent searches in parallel:

> Find all deprecated functions AND check test coverage

Two parallel operations, single response.

Subagent Parallelism

For complex exploration:

> Analyze the authentication system, database layer,
  and API structure of this project.

Claude may spawn multiple agents working simultaneously.

When Parallel Helps

  • Searching multiple directories
  • Analyzing independent modules
  • Fetching from multiple sources

When Sequential is Necessary

  • Changes that depend on previous changes
  • Operations requiring confirmation
  • State-dependent commands

Session Optimization

Plan Before Executing

# Get oriented first
> Give me an overview of this codebase

# Then make targeted changes
> Based on what I see, update the auth middleware

Reduces back-and-forth.

Use Continue Wisely

claude -c  # Continue previous session
  • Good: Resuming work on the same task
  • Bad: Starting unrelated work (start fresh instead)

Fresh Starts

claude  # New session

Start fresh for: - New unrelated tasks - When context is polluted - After major topic changes

Reducing Token Waste

Avoid Redundant Reads

# Don't repeat yourself
> Read package.json
> [Claude reads it]
> Read package.json again and show me the scripts
> [Claude reads it AGAIN]

# Better: Ask for what you need upfront
> Show me the scripts section of package.json

Reference Previous Context

> Using the auth pattern we just discussed, implement
  the same for the payment module.

Claude uses existing context instead of re-deriving.

Limit Output When Possible

# May produce lengthy output
> Show me all the API routes

# More focused
> List API routes in routes/user.js with one-line descriptions

Non-Interactive Mode

For scripts and automation:

# Quick question, minimal overhead
claude -p "What testing framework does this project use"

# Pipe to file, avoid session costs
claude -p "Generate API documentation for routes/" > api-docs.md

Caching Considerations

MCP Server Caching

If using MCP servers that fetch external data:

// In your MCP server
const cache = new Map();
const TTL = 300000; // 5 minutes

server.tool({
  name: 'get_weather',
  handler: async ({ city }) => {
    const cached = cache.get(city);
    if (cached && Date.now() - cached.time < TTL) {
      return cached.data;
    }
    const data = await fetchWeather(city);
    cache.set(city, { data, time: Date.now() });
    return data;
  }
});

Avoiding Repeated Exploration

# If you'll reference files multiple times
> Read and summarize the key configuration files.
  I'll need to reference these settings throughout our session.

Claude keeps the summary in context, avoiding re-reads.

Measuring Performance

Track Token Usage

Monitor your usage in the Anthropic console or via API responses.

Time Operations

time claude -p "what files are in src/"

Compare Approaches

Test different prompting strategies: - Specific vs general prompts - Single vs batch requests - Different models

Quick Reference

┌─────────────────────────────────────────────────────────────┐
│ PERFORMANCE CHECKLIST                                        │
├─────────────────────────────────────────────────────────────┤
│ □ Use /compact after major tasks                            │
│ □ Read specific parts, not whole files when possible        │
│ □ Batch related requests                                    │
│ □ Use Haiku for simple tasks                                │
│ □ Use non-interactive mode for scripts                      │
│ □ Start fresh sessions for new topics                       │
│ □ Reference previous context instead of re-reading          │
│ □ Request focused output                                    │
└─────────────────────────────────────────────────────────────┘

Try It Yourself

Exercise: Optimize a Session

  1. Start a session and work on a task
  2. Note how many exchanges it takes
  3. Start fresh and try to complete the same task in fewer exchanges
  4. Compare the efficiency

Exercise: Model Comparison

  1. Take a simple task (e.g., "explain this function")
  2. Try with Haiku
  3. Try with Sonnet
  4. Note speed, quality, and when each is appropriate

What's Next?

Learn how to integrate Claude Code with n8n for powerful workflow automation in 04-n8n-integration.


Summary: - Context is precious: Use /compact, be specific, batch requests - Right model for the job: Haiku for simple, Sonnet for complex - Efficient prompts: Concise, batched, structured - Parallel when possible: Independent tasks can run simultaneously - Fresh starts for new topics: Don't pollute context


Learning Resources

Edmund Yong: 800+ Hours Claude Code - Optimization (Popular tech channel)

Performance optimization - context management, model selection, and token efficiency.

Additional Resources

Type Resource Description
🎬 Video Claude Code Tips All About AI - Performance tips
📚 Official Docs Memory Management Context and memory documentation
📖 Tutorial Builder.io Guide Optimization techniques
🎓 Free Course Anthropic Academy Performance courses
💼 Commercial Prompt Engineering Token optimization course