In October, OpenAI launched its newest AI image generator—DALL-E 3—into wide release [openai.com] for ChatGPT subscribers. DALL-E can pull off media generation tasks that would have seemed absurd just two years ago—and although it can inspire delight with its unexpectedly detailed creations, it also brings trepidation for some. Science fiction forecast tech like this long ago, but seeing machines upend the creative order feels different when it's actually happening before our eyes.
"It’s impossible to dismiss the power of AI when it comes to image generation," says Aurich Lawson [arstechnica.com], Ars Technica's creative director. "With the rapid increase in visual acuity and ability to get a usable result, there’s no question it’s beyond being a gimmick or toy and is a legit tool."
ChatGPT and DALL-E 3 currently work hand in hand, turning AI art generation into an interactive, conversational experience. You tell ChatGPT (through the GPT-4 [arstechnica.com] large language model) what you'd like it to generate, and it writes detailed prompts for you and submits them to the DALL-E backend. DALL-E returns the images (usually two at a time), and they appear in the ChatGPT interface, whether on the web or in the ChatGPT app [arstechnica.com].
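For readers who think in code, the hand-off described above can be sketched roughly as follows. This is a toy illustration, not OpenAI's implementation: `run_image_turn`, `fake_write_prompts`, and `fake_render` are hypothetical names, and the two stand-in functions only mimic the roles of the language model and the image backend.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GeneratedImage:
    prompt: str   # the expanded prompt the language model wrote
    data: bytes   # placeholder for the image payload


def run_image_turn(
    user_request: str,
    write_prompts: Callable[[str], List[str]],
    render: Callable[[str], bytes],
) -> List[GeneratedImage]:
    """One conversational turn: expand the request into prompts, then render each."""
    prompts = write_prompts(user_request)
    return [GeneratedImage(prompt=p, data=render(p)) for p in prompts]


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_write_prompts(request: str) -> List[str]:
    # Assumption: the chat model returns two prompt variations per request.
    return [f"{request}, detailed illustration", f"{request}, photorealistic"]


def fake_render(prompt: str) -> bytes:
    # A real backend would return image bytes; here we just echo the prompt.
    return prompt.encode("utf-8")


images = run_image_turn("a cat reading a newspaper", fake_write_prompts, fake_render)
for image in images:
    print(image.prompt)
```

The point of the sketch is the division of labor: the person only supplies the short request, while the prompt-writing step and the rendering step are separate components that the chat interface stitches together.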
Captions scraped from the web, written by humans, aren't always detailed or accurate, which leads to faulty associations that reduce an AI model's ability to follow a written prompt.
To get around that problem, OpenAI decided to use AI to improve itself. As detailed in the DALL-E 3 research paper [openai.com], the team at OpenAI trained this new model to surpass its predecessor by using synthetic (AI-written) image captions generated by GPT-4V [arstechnica.com], the visual version of GPT-4. With GPT-4V writing the captions, the team generated far more accurate and detailed descriptions for the DALL-E model to learn from during the training process.
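In rough terms, the recaptioning idea amounts to swapping each scraped caption for a machine-written one before training. The snippet below is a minimal sketch under that assumption only; `recaption` and `toy_captioner` are hypothetical names, the file names and captions are made up for illustration, and the lookup table stands in for a real vision-language captioner.

```python
from typing import Callable, List, Tuple

# (image identifier, caption) pairs; identifiers here are illustrative only.
CaptionedImage = Tuple[str, str]


def recaption(
    pairs: List[CaptionedImage],
    captioner: Callable[[str], str],
) -> List[CaptionedImage]:
    """Replace each scraped caption with a synthetic one from `captioner`."""
    return [(image, captioner(image)) for image, _scraped in pairs]


# Hypothetical stand-in for a captioning model: a fixed lookup table.
def toy_captioner(image: str) -> str:
    synthetic = {
        "dog.jpg": "A golden retriever running across a sunlit grassy field.",
        "mug.jpg": "A white ceramic mug of coffee on a wooden desk beside a laptop.",
    }
    return synthetic[image]


scraped = [("dog.jpg", "IMG_0042"), ("mug.jpg", "my morning")]
training_pairs = recaption(scraped, toy_captioner)
for image, caption in training_pairs:
    print(image, "->", caption)
```

The terse, uninformative scraped captions are discarded here, and the image model would instead learn from the longer descriptions, which is the effect the paragraph above describes.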