posted by hubie on Sunday December 28, @03:49PM   Printer-friendly

"The vast majority of Codex is built by Codex," OpenAI told us about its new AI coding agent writing code:

With the popularity of AI coding tools rising among some software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. "I think the vast majority of Codex is built by Codex, so it's almost entirely just being used to improve itself," said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user's code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT's web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.
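As a rough sketch of what the CLI surface looks like in practice (the package name and `exec` subcommand are taken from the open-source Codex CLI's public documentation, but invocations may differ by version, and the file path below is purely hypothetical):

```shell
# Install the open-source Codex CLI (distributed via npm).
npm install -g @openai/codex

# Start an interactive agent session in the current repository.
codex

# Non-interactive mode: hand the agent one task and let it propose changes.
codex exec "fix the flaky retry logic in src/net/client.ts and add a test"
```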

The "Codex" name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot's tab completion feature. Embiricos said the name is rumored among staff to be short for "code execution." OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by some who have left the company.

"For many people, that model powering GitHub Copilot was the first 'wow' moment for AI," Embiricos said. "It showed people the potential of what it can mean when AI is able to understand your context and what you're trying to do and accelerate you in doing that."

It's no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic's agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex's design, Embiricos parried the question but acknowledged the competitive dynamic. "It's a fun market to work in because there's lots of great ideas being thrown around," he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic's tool.

OpenAI's customers apparently love the command line version, though. Embiricos said Codex usage among external developers jumped 20 times after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5 Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn't just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI's engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. "I really love this about our team," Embiricos said. "The version of Codex that we use is literally the open source repo. We don't have a different repo that features go in."

[...] The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a "teammate" and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute "decisions" or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

[...] Despite OpenAI's claims of success with Codex in house, it's worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. "You can add Codex, and you can basically assign issues to Codex now," Bayes told Ars. "Codex is literally a teammate in your workspace."

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. "It's basically approximating this kind of coworker and showing up wherever you work," Bayes said.
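The flow Bayes describes can be sketched in miniature. None of the names below come from OpenAI's actual integration; they are hypothetical stand-ins that only illustrate the shape: a Slack mention becomes an agent task, the task yields a pull request, and the reply lands in the same thread.

```python
def handle_mention(event, run_agent):
    """Turn a Slack-style mention of the agent into an in-thread PR reply.

    `event` mimics a Slack message payload; `run_agent` stands in for
    whatever actually dispatches the task to the coding agent.
    """
    task = event["text"].replace("@Codex", "").strip()  # strip the tag
    result = run_agent(task)                            # agent does the work
    return {
        "channel": event["channel"],
        "thread_ts": event["ts"],                       # reply in the same thread
        "text": f"Opened {result['pr_url']} for: {task}",
    }

# A stand-in agent; a real one would run Codex and open an actual PR.
def fake_agent(task):
    return {"pr_url": "https://github.com/example/repo/pull/1"}

reply = handle_mention(
    {"text": "@Codex fix the login redirect", "channel": "C123", "ts": "1.0"},
    fake_agent,
)
```

The key design point is the `thread_ts` field: by replying in the originating thread, review and iteration happen in the same conversation where the feedback was posted.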

[...] Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between "vibe coding," where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls "vibe engineering," where humans stay in the loop. "We see a lot more vibe engineering in our code base," he said. "You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you're in the loop with the model and carefully reviewing its code."

He added that vibe coding still has its place for prototypes and throwaway tools. "I think vibe coding is great," he said. "Now you have discretion as a human about how much attention you wanna pay to the code."

Over the past year, "monolithic" large language models (LLMs) like GPT-4.5 have apparently become something of a dead end for frontier benchmark progress, as AI companies pivot to simulated reasoning models and to agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. "I think we're very far from plateauing," he said. "If you look at the velocity on the research team here, we've been shipping models almost every week or every other week." He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

[...] But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and "there's always a human in the loop because the human can actually read the code." Similarly, the two men don't project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI's walls. Embiricos said the company's long-term vision involves making coding agents useful to people who have no programming experience. "All humanity is not gonna open an IDE or even know what a terminal is," he said. "We're building a coding agent right now that's just for software engineers, but we think of the shape of what we're building as really something that will be useful to be a more general agent."


Original Submission

This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 3, Insightful) by aafcac on Sunday December 28, @09:30PM (2 children)

    by aafcac (17646) on Sunday December 28, @09:30PM (#1428093)

    Can they at least pretend like their favorite food items aren't lead paint and paste? Nothing good can possibly come of allowing AI to help improve itself. One of the few advantages that humans have had over AI up to this point is that the source of the training information has been us. They've burned through more or less all of human history in a matter of a few years, and further advancement would require more of it to be generated. Which was great: it meant that in a sense humans were potentially always a step ahead, or at least a few of us were. I don't even know what to expect the result of this to be, and how would we even properly evaluate the output of such systems?

    • (Score: 2) by gnuman on Monday December 29, @04:22PM (1 child)

      by gnuman (5013) on Monday December 29, @04:22PM (#1428161)

      I'd say these tools were useless until last year. This year they got to the point of being somewhat useful for some things. Today, they are starting to be genuinely useful.

      I've recently used gemini-cli to create a VueJS SPA that I'd wanted to build (it's for my own use, for a video game) but never found the time or motivation for. I had Gemini create 99% of it within 2 days of part-time typing instructions and testing the results. The old gemini-2.5-flash model was a bit... wonky... it reintroduced bugs I had fixed and could not do simple things that require any sort of logic. But the current gemini-3.0-flash is much more up to the task. I ran into the token limits after a few requests, but then I signed up for the free trial (it's part of Google Cloud, and they give you 90 days and $300 of free credit). Needless to say, I've burned about $5 worth of credit in 2 days and have a working prototype.

      Code re-use? Well, no, it generates different code for the same functionality everywhere... I'll try to get it to refactor that later. But the app works.

      Now I'm using it to make my own photo gallery implementation... another SPA. The mode of operation seems to be small changes: it cannot handle a complete application being written from scratch, at least not in my tests so far. But this makes it simpler to review.

      It's useful for what it is. Basically, you can use a hammer, a screwdriver, and a handsaw and build your house with those. Or you can use a nail gun, an electric drill, and a power saw to do those things for you. Heck, you can outsource the design of trusses to experts and just install them, reducing the wood usage -- and how do they do that? Automation! For load calculations, etc. These AI tools are finally some automation for basic coding. You still need a human to look over them and actually put *context* into the program. But not using AI to help you these days is like not using a language server -- your productivity will probably suffer. Until now, we basically had templates... today we have tools that have *some* understanding of what they are doing.

      As to being replaced? Yes, I'm now something like 5 years into being replaced in the next 6 months. I'm not worried. But I do think productivity can be increased with these tools. What I worry a little about is that the next generation of engineers may be hobbled by an inability to do what AI can do -- but I guess most of us don't really implement sorting algorithms today anyway; we just call `sortFunc()` and maybe supply a comparison for the pre-implemented sort algorithm. So I'm not sure much will be lost here.
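The `sortFunc()` point above, concretely: nobody hand-writes quicksort day to day; you hand the built-in sort a key (or comparator) and let the library's pre-implemented algorithm do the rest. A minimal Python illustration:

```python
# Records to sort: (name, score) pairs.
records = [("bob", 42), ("alice", 7), ("carol", 19)]

# Sort by the numeric field via a key function -- no sorting
# algorithm written by hand, just the comparison logic supplied.
by_score = sorted(records, key=lambda r: r[1])
```

This is the division of labor the comment describes: the human contributes the domain-specific comparison; the library contributes the algorithm.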

      • (Score: 2) by aafcac on Monday December 29, @09:16PM

        by aafcac (17646) on Monday December 29, @09:16PM (#1428188)

        Part of the issue is that they're not going to improve much more without a radical change to how they work. And they're destroying the potential careers of most of the people who would create the future material to train on. Anything that's in the chain of things needed to train these AIs really shouldn't be replaced by AI, as you get a nasty circular dependency problem where the thing is essentially training itself and you need somebody around to verify what's going on. But the people who could have done that mostly won't exist, because they either lost their jobs or were never able to get the jobs needed to become capable of doing the work in the first place.

        As much as people like to think that when the robots come to get us it's going to be something out of a James Cameron action movie, the reality is more likely to play out like something out of Douglas Adams, where it's banal things like losing our ability to produce safe food, or to know the difference, or getting redirected via GPS to some place where we get stuck. Things of that nature, where the AI just sabotages the bureaucracy to the point where things look OK-ish but are so badly broken that everything starts to slowly break down as more and more heavy metals make their way back into the food supply and people's brains just generally rot into nothingness.

  • (Score: 3, Insightful) by Anonymous Coward on Sunday December 28, @09:35PM (1 child)

    by Anonymous Coward on Sunday December 28, @09:35PM (#1428094)

    It is shit all the way down

    • (Score: 0) by Anonymous Coward on Monday December 29, @04:24PM

      by Anonymous Coward on Monday December 29, @04:24PM (#1428162)

      It always was!
