Anthropic’s Sonnet 4.5: The New Code Benchmark

Anthropic has introduced Claude Sonnet 4.5, marking a substantial leap in artificial intelligence capabilities. Released on September 29, 2025, the model is being positioned as not only the world’s best AI for coding but also a transformative tool for complex agent construction and computer use. This launch underscores the company’s push to build frontier models that are both highly capable and deeply aligned.

Unprecedented Performance in Engineering

Claude Sonnet 4.5 sets a new industry standard for software development. On the rigorous SWE-bench Verified evaluation, which tests real-world coding proficiency, the model achieved state-of-the-art results. Its most striking feature for engineers is its ability to maintain focus and coherence across extremely long and multi-step projects, with internal testing observing it sustaining complex tasks for over 30 hours.

The model’s ability to interact with digital environments also improved sharply. On the OSWorld benchmark, which assesses an AI’s capacity to execute real-world computer tasks, Sonnet 4.5 leads the field with a score of 61.4%, up 19.2 percentage points from the 42.2% scored by its predecessor, Sonnet 4. This performance makes it the strongest model currently available for building advanced, multi-step AI agents.

Enhanced Tools for the Developer Workflow

Anthropic paired the powerful model with key updates to the developer ecosystem, aiming to streamline the workflow and accelerate productivity:

Claude Agent SDK: Developers now have access to the same underlying infrastructure that powers Anthropic’s frontier products. The release of this SDK allows external teams to build their own sophisticated, long-running AI agents.
Claude Code Upgrades: The dedicated coding environment received highly requested features, including checkpoints that allow users to instantly save progress and roll back to previous states. It also features a refreshed terminal interface and a native VS Code extension.
Expanded Context and Memory: The Claude API now includes new context editing and memory tools, enabling agents to run for extended periods and manage greater complexity without losing track.
App Functionality: For users on paid plans, the main Claude applications integrate code execution and file creation (spreadsheets, slides, and documents) directly within the conversational interface.

A Leap in General Intelligence and Safety

Beyond coding, the model demonstrates widespread gains in core intelligence. It exhibits substantially improved performance in reasoning and math, alongside dramatically better domain-specific knowledge across sectors like finance, law, medicine, and STEM, surpassing even the older Opus 4.1 model.

Crucially, Sonnet 4.5 is the company’s most aligned frontier model to date. It has been released under AI Safety Level 3 (ASL-3) protections, which include filters designed to detect dangerous inputs and outputs, particularly those related to CBRN (chemical, biological, radiological, and nuclear) materials. The model also shows marked reductions in undesirable behaviors such as sycophancy and deception, and stronger defenses against prompt injection attacks.

Early feedback from leading tech companies supports these performance claims. Leaders at GitHub Copilot noted significant improvements in multi-step reasoning, while the team behind Devin reported an 18% increase in planning performance. The model is available at the same price as Sonnet 4: $3 per million input tokens and $15 per million output tokens.
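
To make the pricing concrete, here is a minimal sketch of a cost estimate in Python. It assumes the $3/$15 figures are the input and output rates per million tokens, respectively. The function name and the token counts in the example are hypothetical, chosen only for illustration, and real bills may differ.

```python
# Assumed rates, read from the article's "$3/$15 per million tokens".
INPUT_USD_PER_MTOK = 3.00    # assumed: price per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumed: price per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request, in US dollars (hypothetical helper)."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Illustrative request: 200,000 tokens in, 20,000 tokens out.
example_cost = estimate_cost_usd(200_000, 20_000)
print(f"${example_cost:.2f}")  # prints "$0.90"
```

In this example the 200,000 input tokens cost $0.60 and the 20,000 output tokens cost $0.30. Output tokens cost five times as much as input tokens, so long generated responses dominate the bill more quickly than long prompts.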
