Stories
Slash Boxes
Comments

SoylentNews is people

Submission Preview

Link to Story

Claude's New AI File Creation Feature Ships With Deep Security Risks Built in

Accepted submission by hubie at 2025-09-28 13:02:04
Security

Expert calls security advice "unfairly outsourcing the problem to Anthropic's users" [arstechnica.com]

On Tuesday [September 9, 2025], Anthropic launched [anthropic.com] a new file creation feature for its Claude AI assistant that enables users to generate Excel spreadsheets, PowerPoint presentations, and other documents directly within conversations on the web interface and in the Claude desktop app. While the feature may be handy for Claude users, the company's support documentation also warns [anthropic.com] that it "may put your data at risk" and details how the AI assistant can be manipulated to transmit user data to external servers.

The feature, awkwardly named "Upgraded file creation and analysis," is basically Anthropic's version of ChatGPT's Code Interpreter [arstechnica.com] and an upgraded version of Anthropic's "analysis" tool [anthropic.com]. It's currently available as a preview for Max, Team, and Enterprise plan users, with Pro users scheduled to receive access "in the coming weeks," according to the announcement.

The security issue comes from the fact that the new feature gives Claude access to a sandbox computing environment, which enables it to download packages and run code to create files. "This feature gives Claude Internet access to create and analyze files, which may put your data at risk," Anthropic writes in its blog announcement. "Monitor chats closely when using this feature."

According to Anthropic's documentation, "a bad actor" manipulating this feature could potentially "inconspicuously add instructions via external files or websites" that manipulate Claude into "reading sensitive data from a claude.ai connected knowledge source" and "using the sandbox environment to make an external network request to leak the data."

This describes a prompt injection attack [arstechnica.com], where hidden instructions embedded in seemingly innocent content can manipulate the AI model's behavior—a vulnerability that security researchers first documented [arstechnica.com] in 2022. These attacks represent a pernicious, unsolved security flaw of AI language models, since both data and instructions in how to process it are fed through as part of the "context window" to the model in the same format, making it difficult for the AI to distinguish between legitimate instructions and malicious commands hidden in user-provided content.

[...] Anthropic is not completely ignoring the problem, however. The company has implemented several security measures for the file creation feature. For Pro and Max users, Anthropic disabled public sharing of conversations that use the file creation feature. For Enterprise users, the company implemented sandbox isolation so that environments are never shared between users. The company also limited task duration and container runtime "to avoid loops of malicious activity."

[...] Anthropic's documentation states the company has "a continuous process for ongoing security testing and red-teaming of this feature." The company encourages organizations to "evaluate these protections against their specific security requirements when deciding whether to enable this feature."

[...] That kind of "ship first, secure it later" philosophy has caused frustrations among some AI experts like Willison, who has extensively documented prompt injection vulnerabilities (and coined [arstechnica.com] the term). He recently described [simonwillison.net] the current state of AI security as "horrifying" on his blog, noting that these prompt injection vulnerabilities remain widespread "almost three years after we first started talking about them."

In a prescient warning from September 2022, Willison wrote that "there may be systems that should not be built at all until we have a robust solution." His recent assessment in the present? "It looks like we built them anyway!"


Original Submission