Claude 3.5 Sonnet: the model that can use your computer and solves 49% of real GitHub bugs

What is Claude 3.5 Sonnet?

Claude 3.5 Sonnet is the Claude 3.5 family model positioned in the maximum capability with reasonable speed tier. Its launch in June 2025 was notable for two reasons: code benchmark performance that beat GPT-4o and Gemini 1.5 Pro at launch time, and the public beta introduction of computer use.

The Claude 3.5 family maintains three levels: Haiku (speed and cost), Sonnet (balance) and Opus (maximum intelligence). Sonnet is the most used in production for its cost-performance ratio in enterprise use cases.

Why it matters

Claude 3.5 Sonnet solved 49% of real problems in the SWE-bench Verified benchmark — authentic issues from production GitHub repositories. GPT-4o was at 33% at launch. It's the first model to surpass the average human at this task.

Computer Use: AI that operates interfaces

Computer use is the ability to perceive screenshots and execute actions on a computer: move the cursor, click, type in forms, navigate the web. It's not integration with specific APIs — the model sees the screen like a person and decides what action to take.

The basic technical flow: the system captures the screen, sends it to the model along with the instruction, the model responds with structured actions (click(x,y), type("text")), the system executes and repeats until the task is complete.

// Typical computer use response
{
  "type": "tool_use",
  "name": "computer",
  "input": {
    "action": "left_click",
    "coordinate": [842, 156]
  }
}

Benchmark performance

49%

SWE-bench Verified
(real GitHub bugs)

92%

HumanEval
(code generation)

LMSYS Chatbot Arena
(human preference)

Lab benchmarks don't always reflect production, but Sonnet's consistency being first in every code category is significant. The jump from Claude 3 Opus is substantial in reasoning and speed simultaneously.

Use cases enabled

Computer use opens three automation categories that previously required traditional RPA or specific APIs: API-less backoffice (legacy ERPs, government portals), automated QA with semantic screen understanding, and software onboarding where you need to configure multiple platforms at once.

At VuraOS we use Claude 3.5 Sonnet as the primary model in customer service agents. The improvement in following complex instructions translates directly to fewer human escalations and greater precision in data extraction from unstructured messages.

Availability and pricing

Claude 3.5 Sonnet is available via Anthropic API ($3 per million input tokens / $15 output), Amazon Bedrock and Google Cloud Vertex AI. Computer use requires explicitly enabling the computer tool in the call to the claude-3-5-sonnet-20241022 model.

Conclusion

Claude 3.5 Sonnet sets a new standard on two fronts: code performance with the highest bar on SWE-bench, and real operational autonomy with computer use. For companies, the question is no longer whether AI can do a certain task, but how to design workflows to leverage these capabilities safely.