
Submission Preview

Link to Story

Claude Opus 4.6 Spends $20K Trying to Write a C Compiler

Accepted submission by upstart at 2026-02-12 05:08:23
News

████ # This file was generated bot-o-matically! Edit at your own risk. ████

Claude Opus 4.6 spends $20K trying to write a C compiler [theregister.com]:

An Anthropic researcher's efforts to get its newly released Opus 4.6 model to build a C compiler left him "excited," "concerned," and "uneasy."

It also left many observers on GitHub skeptical, to say the least.

Nicholas Carlini, a researcher on Anthropic's Safeguards team, detailed the experiment with what he called "agent teams" in a blog [anthropic.com] that coincided with the official release of Opus 4.6.

He said he "tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. After nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V."

With agent teams, he said, "multiple Claude instances work in parallel on a shared codebase without active human intervention."

One key task was getting round the need for "an operator to be online and available to work jointly," which we presume means removing the need for Claude Code to wait for a human to tell it what to do next.

"To elicit sustained, autonomous progress, I built a harness that sticks Claude in a simple loop... When it finishes one task, it immediately picks up the next." Imagine if humans took that sort of approach.
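
Carlini's actual harness isn't published in the article, but the loop he describes can be sketched. Everything below is a hypothetical stand-in: `run_agent` plays the role of a Claude Code session, and the task names are invented for illustration.

```python
from collections import deque

def run_agent(task: str) -> list[str]:
    """Stand-in for a Claude Code session: 'work' on a task and
    report any follow-up tasks it decided are needed next."""
    followups = {"parse C": ["build AST"], "build AST": ["emit x86"]}
    return followups.get(task, [])

def harness(initial_tasks: list[str]) -> list[str]:
    """Keep the agent in a simple loop: when one task finishes,
    it immediately picks up the next -- no human in the loop."""
    queue = deque(initial_tasks)
    done = []
    while queue:
        task = queue.popleft()
        queue.extend(run_agent(task))  # the agent decides what comes next
        done.append(task)
    return done
```

The point of the sketch is that the harness, not the human operator, supplies the "what next" signal, which is what removes the need for someone to be online.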

Carlini continued: "I leave it up to each Claude agent to decide how to act. In most cases, Claude picks up the 'next most obvious' problem." This threw up a number of lessons, including the need to "write extremely high quality tests."

Readers were also advised to "put yourself in Claude's shoes." That means the "test harness should not print thousands of useless bytes" to make it easier for Claude to find what it needs.

Also, "Claude can't tell time and, left alone, will happily spend hours running tests instead of making progress."

Which might make you feel that working with Claude is closer to working with a regular human than you might have thought. But what was the upshot of all this?

"Over nearly 2,000 Claude Code sessions across two weeks, Opus 4.6 consumed 2 billion input tokens and generated 140 million output tokens, a total cost just under $20,000."

This made it "an extremely expensive project" compared to the priciest Claude Max plans, Carlini said. "But that total is a fraction of what it would cost me to produce this myself – let alone an entire team."

Other lessons? "The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler." Moreover, "the generated code is not very efficient."

He added that the Rust code quality is "reasonable but... nowhere near the quality of what an expert Rust programmer might produce."

Carlini concluded: "Agent teams show the possibility of implementing entire, complex projects autonomously."

But as a former pen-tester, he said fully autonomous development posed real risks. "The thought of programmers deploying software they've never personally verified is a real concern." Ultimately, the experiment "excites me, [but] also leaves me feeling uneasy."

Comments on GitHub were less equivocal, not least because they felt the $20K price tag ignored a few other elements, such as the vast amount of other programmers' code the model was trained on in the first place.

As mohswell [github.com] put it: "If I went to the supermarket, stole a bit of every bread they had, and shoved it together, no one would say I made bread from scratch. They'd say I'm a thief. If this is 'from scratch,' then my cooking is farm-to-table."

While Sambit003 [github.com] opined: "The comment section and the issue itself is 'absolute cinema' moment everyone living through😂... the longer the AI generated codes I see... the safer I feel. 😂 Still we have the jobs (for long enough years)... just enjoy the overhyping bruh."

Serkosal [github.com] added plaintively: "okay, nice, could @claude find gf for me? No? I'm not interested." ®


Anthropic: Latest Claude model finds more than 500 vulnerabilities [scworld.com]:

Anthropic claims its latest Claude large language model (LLM), Claude Opus 4.6, discovered more than 500 validated high-severity vulnerabilities, according to a report [anthropic.com] published Thursday.

Claude Opus 4.6, which was released to the public Feb. 5, 2026, reportedly discovered hundreds of vulnerabilities in open-source software while working in a virtual machine with access to utilities and tools, such as coreutils and fuzzers.

Anthropic said Claude worked “out-of-the-box” without custom harnesses to search for memory corruption vulnerabilities in the latest versions of several open-source projects. The bugs discovered by Claude were then reviewed and validated by human researchers to weed out hallucinations and false positives, the company said.

Vulnerabilities reported and resolved by project maintainers include a stack buffer underflow in Ghostscript and buffer overflow vulnerabilities in OpenSC and CGIF.

For the vulnerability in Ghostscript, a utility for processing PostScript and PDF files, Claude first attempted fuzzing and manual analysis, which yielded no results. It ultimately identified the bug by examining previous security-related commits and searching for flaws similar to one that had already been fixed.

Claude also identified the flaw in OpenSC, a command-line utility for processing smart card data, not through fuzzing or manual analysis, but by searching for frequently vulnerable function calls and identifying an area of code with multiple successive strcat operations.
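
That search technique, hunting for clusters of risky calls rather than fuzzing, can be sketched as a simple source scan. This is an illustration of the idea only, not Claude's actual method; the C snippet and thresholds are invented.

```python
import re

# Invented C fragment: repeated strcat calls into one fixed buffer
# are a classic overflow-prone pattern worth flagging for review.
C_SOURCE = '''
void build_path(char *out) {
    strcat(out, prefix);
    strcat(out, "/");
    strcat(out, name);
}
'''

def find_strcat_clusters(source: str, min_calls: int = 2):
    """Return (start_line, run_length) for each run of consecutive
    lines that call strcat -- a crude 'hot spot' detector."""
    hits, run_start, run_len = [], None, 0
    for i, line in enumerate(source.splitlines(), 1):
        if re.search(r'\bstrcat\s*\(', line):
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_calls:
                hits.append((run_start, run_len))
            run_len = 0
    if run_len >= min_calls:
        hits.append((run_start, run_len))
    return hits
```

Running it on the fragment above flags the three-call run, which is exactly the kind of area a reviewer (or model) would then examine for a missing bounds check.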

In CGIF, a library for processing GIF images, Claude discovered an edge case involving LZW compression where a compressed image could be larger than its corresponding uncompressed image. Anthropic noted that validating this flaw required an understanding of LZW compression and a specific sequence of steps that would not be possible to achieve through traditional fuzzing.
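
The CGIF bug's specifics aren't given in the article, but the underlying LZW property, that "compressed" output can exceed the input, is easy to demonstrate with a toy byte-oriented LZW. The fixed 12-bit code width below is an assumption for illustration; GIF's LZW actually uses variable-width codes.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit a code whenever the current phrase
    stops matching a dictionary entry, then extend the dictionary."""
    table = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = len(table)  # dictionary grows with each miss
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

# 256 distinct bytes: nothing ever repeats, so every input byte
# becomes its own code, and at 12 bits per code the "compressed"
# stream is 1.5x the size of the 8-bit-per-byte input.
data = bytes(range(256))
codes = lzw_compress(data)
```

On incompressible input like this, each code covers a single byte but is wider than a byte, so the output expands, which is the kind of edge case that plain fuzzing is unlikely to construct deliberately.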

Anthropic said it is continuing to validate, report and develop patches for Claude-discovered bugs, noting, “Many of these projects are maintained by small teams or volunteers who don’t have dedicated security resources, so finding human-validated bugs and contributing human-reviewed patches goes a long way.”

The company said it is also introducing additional safeguards to prevent the misuse of Claude by cyber threat actors, including cyber-specific “probes” that monitor model activations during response generation to detect potentially harmful responses.

Anthropic said it may also expand the actions it takes in response to misuse, including by implementing real-time intervention to block potentially malicious traffic.

“This will create friction for legitimate research and some defensive work, and we want to work with the security research community to find ways to address it as it arises,” Anthropic said.

The company reported last year [scworld.com] that its Claude Code tool was used in a sophisticated cyberattack campaign by suspected China-sponsored threat actors, targeting about 30 organizations worldwide.

Anthropic joins several companies that have touted the ability of LLMs to speed vulnerability research, with Google debuting its "Big Sleep" agent [scworld.com] in 2024 and Microsoft announcing in April 2025 [scworld.com] that its Security Copilot aided in the discovery of 20 open-source bootloader flaws.


Original Submission