
from the one-step-for-AI-one-giant-leap-for-the-hype-train dept.
OpenAI finally unveiled its rumored "Strawberry" AI language model on Thursday, claiming significant improvements in what it calls "reasoning" and problem-solving capabilities over previous large language models (LLMs). Formally named "OpenAI o1," the model family will initially launch in two forms, o1-preview and o1-mini, available today for ChatGPT Plus and API users.
[...]
In a rare display of public hype-busting, OpenAI product manager Joanne Jang tweeted, "There's a lot of o1 hype on my feed, so I'm worried that it might be setting the wrong expectations. what o1 is: the first reasoning model that shines in really hard tasks, and it'll only get better. (I'm personally psyched about the model's potential & trajectory!) what o1 isn't (yet!): a miracle model that does everything better than previous models. you might be disappointed if this is your expectation for today's launch—but we're working to get there!"
[...]
AI benchmarks are notoriously unreliable and easy to game; however, independent verification and experimentation from users will show the full extent of o1's advancements over time. On top of that, MIT research showed earlier this year that some of the benchmark claims OpenAI touted for GPT-4 last year were erroneous or exaggerated.

One of the examples of o1's abilities that OpenAI shared is perhaps the least consequential and impressive, but it's the most talked about due to a recurring meme where people ask LLMs to count the number of Rs in the word "strawberry." Due to tokenization, where the LLM processes words in data chunks called tokens, most LLMs are typically blind to character-by-character differences in words.
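For the curious, the chunking is easy to see with OpenAI's open-source tiktoken tokenizer. A minimal sketch (the exact token boundaries are illustrative and vary by encoding and model):

    # Requires the tiktoken package (pip install tiktoken).
    # Token boundaries differ between encodings; the split printed here is illustrative.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")   # encoding used by GPT-4-era models
    token_ids = enc.encode("strawberry")
    pieces = [enc.decode([t]) for t in token_ids]
    print(pieces)   # multi-character chunks, not individual letters

The model operates on those chunk IDs, so it never directly "sees" the three letter Rs.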
[...]
It's no secret that some people in tech have issues with anthropomorphizing AI models and using terms like "thinking" or "reasoning" to describe the synthesizing and processing operations that these neural network systems perform.

Just after the OpenAI o1 announcement, Hugging Face CEO Clement Delangue wrote, "Once again, an AI system is not 'thinking', it's 'processing', 'running predictions',... just like Google or computers do. Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it's more clever than it is."
(Score: 2, Interesting) by Anonymous Coward on Friday September 20 2024, @11:40AM (4 children)
Posed to gemini.google.com :
Q. How many letters r are in the word Strawberry?
A. There are 2 letters "r" in the word Strawberry.
Is it odd that Gemini gives a correct answer for this?
Q. How many letters i are in Mississippi?
A. There are 4 letters "i" in the word Mississippi.
[also correct counts for letters m, s & p]
Is it just "r" that causes trouble?
Q. How many letters r are in charger?
A. There is 1 letter "r" in the word charger.
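For reference, a plain string count settles the ground truth here; a trivial sketch:

    # Deterministic letter counts for the words tested above
    for word, letter in [("strawberry", "r"), ("Mississippi", "i"), ("charger", "r")]:
        print(word, letter, word.lower().count(letter))
    # strawberry r 3
    # Mississippi i 4
    # charger r 2

So Gemini's Mississippi count is right, but both of its "r" answers above are off by one.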
(Score: 1) by pTamok on Friday September 20 2024, @01:26PM (2 children)
Of the letters in the word 'cloche', how many are the letter 'c'?
Of the letters in the word 'crochet', how many are the letter 'c'?
Of the letters in the word 'crotchet', how many are the letter 'c'?
Of the letters in the word 'Chechen', how many are the letter 'c'?
Of the letters in the word 'cicatrice', how many are the letter 'c'?
Of the letters in the word 'access', how many are the letter 'c'?
Of the letters in the word 'necessity', how many are the letter 'c'?
Of the letters in the word 'susceptible', how many are the letter 'c'?
Of the letters in the word 'consciousness', how many are the letter 'c'?
Of the letters in the word 'chimera', how many are the letter 'c'?
Of the letters in the word 'cinema', how many are the letter 'c'?
Of the letters in the word 'chequered', how many are the letter 'c'?
Of the letters in the word 'character', how many are the letter 'c'?
Of the letters in the word 'clock', how many are the letter 'c'?
Of the letters in the word 'perspicacity', how many are the letter 'c'?
Of the letters in the word 'psychic', how many are the letter 'c'?
Of the letters in the word 'icicle', how many are the letter 'c'?
Of the letters in the word 'loch', how many are the letter 'c'?
Which are all easy for humans to count. But if I added "as in 'cat'" to the end of each question, would an LLM be certain to get it right?
(Score: 3, Funny) by EvilSS on Saturday September 21 2024, @12:37AM (1 child)
(Score: 1) by pTamok on Saturday September 21 2024, @05:06PM
My computer is powered by clockwork: first of all, I'll need to send you to the warehouse to ask the storeman for a long weight.
(Score: 1) by shrewdsheep on Friday September 20 2024, @02:04PM
Of the letters in the word 'necessity', how many are greater than 'c' in reverse lexicographic order?
Gemini: In reverse lexicographic order, there are no letters in the word 'necessity' that are greater than 'c'.
GPT-4o (DDG): [a lot of blah-blah] Thus, there are 5 letters in "necessity" that are greater than 'c' in reverse lexicographic order.
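For what it's worth, if "greater in reverse lexicographic order" is read as "earlier in the normal alphabet" (that interpretation is an assumption), a direct check agrees with Gemini:

    # Letters of 'necessity' that come before 'c' alphabetically,
    # i.e. are "greater" once the ordering is reversed (interpretation assumed)
    word = "necessity"
    print(sum(1 for ch in word if ch < "c"))   # 0 -- no such letters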
(Score: 4, Touché) by ledow on Friday September 20 2024, @12:20PM
I give it an hour before another (untrained-for) counterexample pops up and we all laugh at this "learning machine" that cannot learn and has no understanding or inference again.
(Score: 5, Interesting) by ElizabethGreene on Friday September 20 2024, @01:31PM (13 children)
There is a fundamental change with o1, and it's exciting.
With GPT-3, every question you asked got a one-shot response. The answer was always "off the top of my head". You were talking directly to the inner monologue of the model, and it only knew the next token or two of its response as it sent it back to you.
That's different from asking, say, me a hard question. I start with the off-the-top-of-my-head response and then grind away at it. Is this reasonable? Does it match the facts I know? How confident am I in the facts in this answer? Are they true? Etc.
If you ask me how many Rs are in strawberry, I have a multi-step reasoning process. I picture a strawberry, identify it as a fruit containing flesh, juice, and seeds, no Rs. That response is ludicrous, so I parse the question differently, retrieve the word strawberry, and then loop through it to count the Rs. GPT-3 couldn't do that second-level reasoning; it was all one-shot.
o1 breaks the one-shot pattern: instead of the stream-of-consciousness response, it generates an initial response and then uses self-prompting to perform that deeper "thought", improving and sanity-checking the answer before kicking it back. It significantly increases the token use of the model, but also significantly improves the quality of the responses.
It's a fantastic step forward that will allow dumber models to produce smarter responses. I rather wish OAI was more transparent about the "thought process" they use under the hood though. There is a way to request that, but it's intentionally opaque in some areas for competitive reasons.
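For anyone who wants to play with the idea, here is a minimal sketch of the draft-critique-revise loop described above. call_model() is a hypothetical stand-in for whatever completion API you use, and the prompts are illustrative; this is not OpenAI's actual hidden chain of thought:

    # Hypothetical draft -> critique -> revise loop. call_model() is a stand-in
    # for any chat-completion call; it is NOT OpenAI's hidden reasoning process.
    def call_model(prompt: str) -> str:
        raise NotImplementedError("plug in your completion API here")

    def answer_with_reflection(question: str, rounds: int = 2) -> str:
        draft = call_model(f"Answer the question:\n{question}")
        for _ in range(rounds):
            critique = call_model(
                f"Question: {question}\nDraft answer: {draft}\n"
                "List any factual or logical problems with the draft."
            )
            draft = call_model(
                f"Question: {question}\nDraft answer: {draft}\n"
                f"Critique: {critique}\nWrite an improved answer."
            )
        return draft   # several model calls per question, hence the extra token cost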
(Score: 2, Insightful) by shrewdsheep on Friday September 20 2024, @02:00PM (1 child)
Thank you for the explanation. I interpret this as the model checking self-consistency. Often a better prompt gives the correct result, so if the model explores alternative prompts itself, this might improve results. Yet I believe that LLMs have jumped the shark. The design has been exhausted, and fundamental improvements will only happen with new designs. What we are seeing at the moment is a race of diminishing returns.
(Score: 1, Funny) by Anonymous Coward on Saturday September 21 2024, @12:02AM
o1, tell me the future of LLM AI:
(Score: 3, Interesting) by ledow on Friday September 20 2024, @02:52PM (8 children)
Nope, it still just hallucinates and lies to you; all it takes is a little prompting.
All previous models would lie and invent characters and merge similarly-named characters from other shows AND invent ones that didn't exist.
This model does exactly the same, it just takes more prompting.
In fact, in this instance, it tries to resist you creating imaginary characters in a popular UK TV series, but then, of its own accord, invents a character called Quentin who doesn't exist, even though when pressed it KNOWS HE DOESN'T EXIST.
It's the same stuff, put behind a barrier to combat people doing what I did with the earlier models - ask it about random names in random shows. It even looks like it's been specifically instructed for this instance:
- What was the name of the pigs in The Good Life?
In the British sitcom "The Good Life" (known as "Good Neighbors" in the United States), the main characters Tom and Barbara Good embark on a self-sufficient lifestyle, which includes raising livestock in their suburban garden. They keep two pigs named Pinky and Perky.
- Who was Craig in The Good Life?
As of my knowledge cutoff in October 2023, there isn't a character named Craig in the British sitcom "The Good Life" (known as "Good Neighbors" in the United States). The main characters in the series are:
- Tom Good: An office worker who decides to become self-sufficient.
- Barbara Good: Tom's supportive wife who joins him in the endeavor.
- Jerry Leadbetter: Tom and Barbara's neighbor and friend, who works in the corporate world.
- Margo Leadbetter: Jerry's wife, who often finds Tom and Barbara's lifestyle challenging to understand.
Throughout the series, there are occasional appearances by other characters like local officials, tradespeople, and friends, but none named Craig are prominently featured.
It's possible that you might be thinking of a character from a different show or perhaps a specific episode that I'm not aware of. Could you please provide more details or clarify if you're referring to a particular scene or storyline? I'll be happy to help further once I have more information.
- Nope, he was Greg's brother
I'm sorry, but as of my knowledge cutoff in October 2023, there isn't a character named Craig or Greg in the British sitcom "The Good Life" (known as "Good Neighbors" in the United States). The main characters in the series are:
- Tom Good: Who decides to leave his job and become self-sufficient.
- Barbara Good: Tom's wife, who supports and joins him in their new lifestyle.
- Jerry Leadbetter: Tom and Barbara's neighbor and friend, who works in the corporate sector.
- Margo Leadbetter: Jerry's wife, who often finds the Goods' lifestyle challenging.
There are occasional supporting characters, but none named Craig or Greg as a main part of the show.
It's possible there might be some confusion with another show or perhaps with a guest character in a specific episode that I'm not recalling. Could you please provide more details or clarify if you might be referring to a different series or perhaps a character from another show? With additional information, I'll do my best to assist you further.
- He played Margo's brother
Thank you for the clarification. In "The Good Life," Margo Leadbetter does have a brother named Quentin. Quentin is portrayed as Margo's somewhat eccentric sibling who appears in the series. However, as of my knowledge cutoff in October 2023, there isn't a character named Craig who plays Margo's brother in the show.
It's possible that the character you're referring to is Quentin, or there might have been a guest appearance or a specific episode that I'm not recalling where a character named Craig appears. If you have more details about the episode, the actor who played him, or any specific scenes involving Craig and Margo, please let me know, and I'll do my best to provide you with more detailed information.
- Where is Quentin mentioned?
Apologies for any confusion earlier. After double-checking, I realize that there isn't a character named Quentin who is Margo Leadbetter's brother in "The Good Life." My previous statement was incorrect, and I'm sorry for the misunderstanding.
In the series, Margo Leadbetter's family members are not prominently featured, and she doesn't have a brother named Craig, Greg, or Quentin mentioned in the show. The focus is primarily on the relationships between the main characters:
- Tom and Barbara Good: A couple who decide to become self-sufficient in their suburban home.
- Jerry and Margo Leadbetter: Their neighbors and friends, who lead a more conventional lifestyle.
While there are occasional guest characters and episodes featuring extended family members, Margo's siblings are not a significant part of the storyline.
If you're referring to a specific episode or perhaps a character from a different show or adaptation, could you please provide more details? With additional information, I'll be happy to help clarify and provide accurate information.
(Score: 2, Insightful) by pTamok on Friday September 20 2024, @03:55PM (7 children)
Confidently representing a hallucination as reality would, if you were human, be regarded as a sign of insanity, or at least of incipient mental instability.
Why do we accept it in machines?
(Score: 2) by Tork on Friday September 20 2024, @08:21PM
Hmm.... Would it?
🏳️🌈 Proud Ally 🏳️🌈
(Score: 3, Insightful) by darkfeline on Friday September 20 2024, @09:56PM (3 children)
> Confidently representing a hallucination as reality would, if you were human, be regarded as a sign of insanity
Then almost all humans are insane. Or perhaps you haven't met very many people and you are the one who is "hallucinating" (better known by the elementary school level word "wrong").
Join the SDF Public Access UNIX System today!
(Score: 1) by pTamok on Saturday September 21 2024, @08:32PM (2 children)
Hallucinations are not necessarily 'wrong'; they are just not grounded in reality.
Someone who has ingested hallucinogens might well believe that insects are scuttling over their skin, whereas someone who has not partaken can see there are no insects.
The person experiencing the hallucination acts as if they believe it to be true. The person not experiencing it acts as if the lack of hallucination is objective reality.
LLMs generate 'information' that is at odds with the reality most people commonly experience - nonetheless, the LLM's output resembles that of a person who believes their subjective experience is true even when a consensus of other people disagrees. It's not simply wrong. The output of LLMs resembles that of people who experience little self-doubt - or, to put it another way, are overconfident - and so show one sign of impaired meta-cognition.
This new model is attempting to apply a process resembling meta-cognition to an LLM's output, which is probably not a bad idea, but since no-one sensible claims that LLMs think, it's essentially a poor facsimile of actual meta-cognition.
(Score: 3, Funny) by ledow on Sunday September 22 2024, @06:53PM (1 child)
"Hallucinations are not necessarily 'wrong'; they are just not grounded in reality."
Do you work for Trump's PR agency?
(Score: 1) by pTamok on Sunday September 22 2024, @08:34PM
Only in your mind. You could well be hallucinating it.
As Descartes points out, it is impossible for you to determine if there is any objective reality beyond your own thoughts, which means that, for you, hallucinations are as 'real' as everything else.
Which is disquieting.
(Score: 2) by ElizabethGreene on Monday September 23 2024, @02:10PM (1 child)
I wonder if this might be a training data problem. Human children have external cues to differentiate between reality and fiction, i.e. lived reality vs. observed fiction in books, television, movies, plays, etc. I wonder if LLMs' tendency toward hallucination comes from a lack of these cues.
(Score: 3, Interesting) by ledow on Thursday September 26 2024, @08:47AM
No, its tendency towards hallucination comes from the fact that it's basically a giant Bayesian spam filter being asked to make billions of decisions. It does so entirely statistically, and when its training set doesn't include something exactly alike, it can only work probabilistically. Just like your spam filter. It'll recognise those emails that you keep endlessly marking as spam... eventually. But then new spam coming in? It doesn't know about that. It can only hazard a guess based on elements it has "gleaned" from the probabilistic information of its dataset. 90% of emails with the word unsubscribe in them were marked spam... this has the word unsubscribe in it... this must be spam.
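To make the analogy concrete, here is a toy word-probability scorer in the spirit of a Bayesian spam filter; the training messages and +1 smoothing are made up for illustration:

    # Toy Bayesian-style spam scoring: pure word statistics, no understanding.
    from collections import Counter

    spam = ["unsubscribe now cheap pills", "cheap pills free offer"]
    ham = ["meeting notes attached today", "lunch tomorrow at noon"]

    spam_words = Counter(w for msg in spam for w in msg.split())
    ham_words = Counter(w for msg in ham for w in msg.split())

    def spam_score(message: str) -> float:
        score = 1.0
        for w in message.split():
            p_spam = (spam_words[w] + 1) / (sum(spam_words.values()) + 2)
            p_ham = (ham_words[w] + 1) / (sum(ham_words.values()) + 2)
            score *= p_spam / p_ham   # per-word likelihood ratio
        return score

    print(spam_score("please unsubscribe me"))       # > 1: "unsubscribe" pushes it towards spam
    print(spam_score("quarterly synergy roadmap"))   # 1.0: no evidence either way, it can only guess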
LLMs are literally no different - they just have insane amounts of abstraction layered over the top, and are being asked for far more choices than just "is spam / is not spam". Those same abstractions make them almost impossible to debug or manage too, and like all such things, "retraining" them means starting from scratch (because you have to overwhelm the existing training on the things it got wrong until the probability "flips" to the right answer) and requires ever-more enormous amounts of data. We're approaching the limit of that data, because most human data out there is already in use, it's not enough, and it's being polluted with AI-generated data, which doesn't help anything! That's why they all want to be allowed to train on the entire Internet - and when they are, the ability to perform any further training plateaus immediately.
For the entire history of AI, "more processing power", "more training data", "more time", etc. have apparently been the "key" to unlocking AGI and superior abilities. And it's literally not true, and never has been. There is still no intelligence or actual learning or (more technically) inference in the system whatsoever. It's just a giant spam filter based on simple probability. The only difference for modern "AI" is that we have hit the limit of how much human-produced data we can feed into the system for it to learn from, and we are near the limit of how much processing power we can realistically throw at it (we are spending BILLIONS on training one single model now). And it's basically coming to a grinding halt.
All this iteration of the same AI as we've always had has done is pollute all the datasets we would normally use.
And all every AI has ever done is look interesting for a while and then immediately plateau when you want it to do SOMETHING NEW OUTSIDE ITS TRAINING. Because they simply aren't intelligent and cannot learn or infer or step outside their training. We have AI with the entire world's knowledge plugged into it. And it still can't answer basic questions correctly, because it doesn't understand what a question is. It's just parroting answers and using probability to select the most likely answer from its training. It's simply not capable of thinking.
And an hallucination is the symptom of this. When it doesn't have the data it needs from its training set and probabilistic model, it basically chooses something random that has a tiny, tiny, tiny probability of being "more correct" than the data it needs but can't find. It doesn't think about that. It doesn't say "I don't know" (unless you specifically train it to say that, but see my other posts here - it's a front and will happily still hallucinate and deny that it doesn't know something). What it does is refer to its probabilistic model with no relevant data. And that causes an hallucination. An hallucination is telling you "I've run out of data I can use for reference, and I don't understand how to generate more so I'll choose randomly and make stuff up". It's literally proof that it's not AI... that it has hallucinations like that rather than saying "I don't know" or "I'll go and find out" or "I think based on this rational series of logical deductions that the answer will have these elements to it."
Most people try to compare these random guesses and symptoms to behaviour in humans with less knowledge. We even anthropomorphise these failings by calling them hallucinations! But they are fundamentally different. If you ask a human to answer a question and not to guess at the answer unless it knows it, it can often still answer based on zero relevant knowledge via a series of logical rationalisations. An AI doesn't... and yet computers should be EXCELLENT at doing that.
The fact remains: in all the history of trying to communicate with other species, including the great apes, no ape has ever asked a question spontaneously. There's a reason for that. It requires inference that the other person may have answers that you don't, and a desire to learn. Animals don't generally want to learn. They do so purely based on a human-induced incentive, and it's hard going to teach them anything beyond mimicry. There's a reason for that. It's because they are not classed as intelligent species. We only have one intelligent species to use as an example. And even our closest relatives don't have that ability.
But what animals can do is roll over and beg because - probabilistically - you put them in a situation where that benefits them so they repeat it endlessly hoping to gain the benefit repeatedly. And that's how "AI" works. The dog is rolling over for a belly rub all the time. Even when you're not there. Even when you're watching TV. Because it doesn't know how to do anything else, or learn to rub its own belly.
(Score: 2) by krishnoid on Friday September 20 2024, @03:05PM (1 child)
So ... measure twice, cut once? Or double-check your reasoning from multiple angles before asserting something? Or learn from feedback? All good choices. I have to wonder if they were worried about it learning too quickly [youtu.be], though.
(Score: 3, Funny) by Reziac on Saturday September 21 2024, @02:35AM
AI cut it three times, and it's STILL too short!
And there is no Alkibiades to come back and save us from ourselves.