Claude Opus 4.6 adds adaptive thinking, 128K output, compaction API, and more
Published by Paul Redmond
Anthropic released Claude Opus 4.6 with adaptive thinking, doubled output tokens (128K), a new Compaction API for long conversations, and data residency controls. The release also brings the effort parameter and fine-grained tool streaming to general availability.
- Adaptive thinking mode
- 128K max output tokens (up from 64K)
- Effort parameter GA with new max level
- Compaction API (beta) for server-side context summarization
- Fine-grained tool streaming GA
- Data residency controls via inference_geo
What's New
Adaptive Thinking Mode
A new thinking: {type: "adaptive"} mode lets Claude decide when and how much to think based on the problem. At the default high effort level, Claude will almost always think. At lower effort levels, it may skip thinking for simpler problems. This replaces the previous budget_tokens approach, which is now deprecated.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
)
```
Adaptive thinking also automatically enables interleaved thinking, removing the need for the interleaved-thinking-2025-05-14 beta header.
128K Output Tokens
Opus 4.6 supports up to 128K output tokens, double the previous 64K limit. This allows for longer thinking budgets and more detailed responses. The SDKs require streaming for requests with large max_tokens values to avoid HTTP timeouts.
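A minimal streaming sketch of a large-output request, assuming an ANTHROPIC_API_KEY in the environment; the prompt and the 128K max_tokens value are illustrative:

```python
import os

# Streaming keeps the HTTP connection alive for long generations,
# which the SDKs require for large max_tokens values.
params = {
    "model": "claude-opus-4-6",
    "max_tokens": 128000,  # new ceiling, up from 64K
    "thinking": {"type": "adaptive"},
    "messages": [{"role": "user", "content": "Write a detailed analysis..."}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    with client.messages.stream(**params) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
```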
Effort Parameter GA
The effort parameter no longer requires a beta header. A new max effort level provides the highest capability on Opus 4.6. Combine it with adaptive thinking for cost-quality tradeoffs.
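A sketch of combining the new max effort level with adaptive thinking. The placement of the effort field is an assumption here, since the post doesn't show the request shape; check the API reference for where it actually lives:

```python
import os

params = {
    "model": "claude-opus-4-6",
    "max_tokens": 32000,
    "effort": "max",  # assumed top-level field; highest capability on Opus 4.6
    "thinking": {"type": "adaptive"},  # Claude decides how much to think
    "messages": [{"role": "user", "content": "Prove this theorem..."}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    response = anthropic.Anthropic().messages.create(**params)
    print(response.content)
```

Lower effort levels trade some quality for cost; with adaptive thinking, Claude may skip thinking entirely on simple prompts at those levels.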
Compaction API (Beta)
A new server-side context summarization feature that enables long-running conversations. When context approaches the window limit, the API automatically summarizes earlier parts of the conversation instead of truncating.
Fine-Grained Tool Streaming GA
Fine-grained tool streaming is now generally available on all models and platforms with no beta header required.
Data Residency Controls
A new inference_geo parameter lets you specify where model inference runs — "global" (default) or "us". US-only inference is priced at 1.1x on Opus 4.6 and newer models.
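A minimal sketch, assuming inference_geo is a top-level request parameter as the post implies; the prompt is illustrative:

```python
import os

params = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "inference_geo": "us",  # "global" is the default; "us" is priced at 1.1x
    "messages": [{"role": "user", "content": "Summarize this contract..."}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    response = anthropic.Anthropic().messages.create(**params)
    print(response.content)
```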
Breaking Changes
Prefill removal: Prefilling assistant messages is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error. Use structured outputs or system prompt instructions instead.
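Where a prefilled assistant message previously forced JSON output, structured outputs via output_config.format can serve the same purpose. A sketch with an illustrative schema:

```python
import os

# Instead of prefilling "{" in an assistant turn, declare the expected
# JSON shape up front; the schema below is a made-up example.
params = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "output_config": {
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {"sentiment": {"type": "string"}},
                "required": ["sentiment"],
            },
        }
    },
    "messages": [{"role": "user", "content": "Classify: 'Great release!'"}],
}

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    response = anthropic.Anthropic().messages.create(**params)
    print(response.content)
```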
output_format renamed: The output_format parameter has moved to output_config.format. The old parameter still works but is deprecated.
```python
# Before
response = client.messages.create(
    output_format={"type": "json_schema", "schema": {...}},
    ...
)

# After
response = client.messages.create(
    output_config={"format": {"type": "json_schema", "schema": {...}}},
    ...
)
```
Deprecations
- thinking: {type: "enabled", budget_tokens: N} — use adaptive thinking instead
- interleaved-thinking-2025-05-14 beta header — no longer needed with adaptive thinking
- output_format — use output_config.format