https://arstechnica.com/information-technology/2024/09/openais-new-reasoning-ai-models-are-here-o1-preview-and-o1-mini/ [arstechnica.com]
OpenAI finally unveiled its rumored "Strawberry" AI language model on Thursday, claiming significant improvements in what it calls "reasoning" and problem-solving capabilities over previous large language models (LLMs). Formally named "OpenAI o1 [openai.com]," the model family will initially launch in two forms, o1-preview [openai.com] and o1-mini [openai.com], available today for ChatGPT Plus and API users.
[...]
In a rare display of public hype-busting, OpenAI product manager Joanne Jang tweeted [x.com], "There's a lot of o1 hype on my feed, so I'm worried that it might be setting the wrong expectations. what o1 is: the first reasoning model that shines in really hard tasks, and it'll only get better. (I'm personally psyched about the model's potential & trajectory!) what o1 isn't (yet!): a miracle model that does everything better than previous models. you might be disappointed if this is your expectation for today's launch—but we're working to get there!"
[...]
AI benchmarks are notoriously unreliable [themarkup.org] and easy to game; however, independent verification and experimentation from users will show the full extent of o1's advancements over time. On top of that, MIT Research showed [fastcompany.com] earlier this year that some of OpenAI's benchmark claims it touted with GPT-4 last year were erroneous or exaggerated.One of the examples of o1's abilities that OpenAI shared [x.com] is perhaps the least consequential and impressive, but it's the most talked about due to a recurring meme where people ask LLMs to count the number of Rs in the word "strawberry." Due to tokenization, where the LLM processes words in data chunks called tokens, most LLMs are typically blind to character-by-character differences in words.
[...]
It's no secret that some people in tech have issues with anthropomorphizing AI models and using terms like "thinking [www.techpolicy.press]" or "reasoning" to describe the synthesizing and processing operations that these neural network systems perform.Just after the OpenAI o1 announcement, Hugging Face CEO Clement Delangue wrote [x.com], "Once again, an AI system is not 'thinking', it's 'processing', 'running predictions',... just like Google or computers do. Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it's more clever than it is."