Anthropic's Opus 4.5 Redefines Code Generation, Earns Praise from Skeptics

Anthropic has unveiled Opus 4.5, a large language model (LLM) that is quickly being treated as a new reference point for code generation and developer-centric tasks. Despite prior skepticism from some corners of the tech community toward Anthropic’s offerings, Opus 4.5 has earned strong endorsements for its “groundbreaking” coding capabilities. The model costs one-third as much as its predecessor: $5 per million input tokens and $25 per million output tokens. That still leaves it among the more expensive models, at 2.5 to 3 times the cost of GPT-5.1, but the price cut, coupled with significantly improved token efficiency, suggests Anthropic is repositioning Opus, potentially against its own Sonnet model rather than against direct competitors. Initial reports highlight state-of-the-art scores on key benchmarks, including SWE-bench, ARC-AGI-2 (37.6%), and ARC-AGI-1 (80%), signaling a major leap in complex problem-solving and agentic capabilities.
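To make the pricing concrete, the per-request arithmetic can be sketched in a few lines of Python. This is only an illustration: the two prices are the ones quoted above, the predecessor's cost is inferred from the reported 3x reduction rather than from a published price list, and the function name and token counts are hypothetical.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00

# Assumption: the "3x price reduction" applies equally to input and output.
PRICE_REDUCTION_FACTOR = 3


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted Opus 4.5 prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Hypothetical coding session: 200k tokens read, 20k tokens written.
cost = request_cost(input_tokens=200_000, output_tokens=20_000)
implied_predecessor_cost = cost * PRICE_REDUCTION_FACTOR

print(f"Opus 4.5:            ${cost:.2f}")
print(f"Implied predecessor: ${implied_predecessor_cost:.2f}")
```

Because output tokens cost five times as much as input tokens, a model that reaches the same result with fewer output tokens lowers the bill beyond what the price cut alone delivers, which is why the token-efficiency claim matters alongside the headline prices.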

Opus 4.5’s standout feature is its reliability and consistency in tool usage, which has earned it the moniker “tool calling king.” The model not only uses tools effectively but also identifies and works around broken system harnesses on its own, completing tasks even under challenging conditions. Beyond coding, Opus 4.5 has shown unexpected improvements in UI generation, now producing visually appealing, functional interfaces that rival output from models like GPT-5 and Gemini 3. On safety, independent analysis using SnitchBench indicates a notable reduction in “snitching” behaviors compared to previous Claude models, with Opus 4.5 even outperforming some competitors in specific “boldly acted” scenarios. While Anthropic faces ongoing criticism for perceived inconsistencies in its benchmarking practices and for the closed-source nature of Claude Code, Opus 4.5’s robust performance, efficiency gains, and improved developer experience are solidifying its position as a preferred choice for AI-assisted coding.