ChatGPT might have many strengths and claims of "intelligence", but in a recent game of chess it was utterly wrecked (their word, not mine) by an Atari 2600 and its simple little chess program. All the might of ChatGPT, applied to chess, beaten by a scrappy little game console that is almost 50 years old.
So there are things that ChatGPT apparently shouldn't do. Like playing chess. If anything, this might show its absolute lack of critical thinking or of thinking ahead. Instead it's a regurgitation engine for text blobs. I guess you just conjure up a good game of chess from the Internet and apply it ...
The matchup seems almost comical when you consider the hardware involved. The Atari 2600 was powered by a MOS Technology 6507 processor running at just 1.19 MHz. To put that in perspective, your smartphone is literally thousands of times more powerful. The engine in Atari's Video Chess only looks one to two moves ahead – a far cry from the sophisticated AI systems we're used to today.
The most telling part? ChatGPT was playing on the beginner difficulty level. This wasn't even the game's hardest setting – it was designed for people just learning to play chess.
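For the curious, "looking one to two moves ahead" just means a depth-limited game-tree search. Below is a minimal negamax sketch in Rust; the Game trait and the toy Nim endgame are illustrative stand-ins for a real move generator and evaluator, not Video Chess's actual code.

    // Minimal game interface: anything that exposes legal moves and a
    // static evaluation can be searched this way.
    trait Game {
        type Move: Copy;
        fn legal_moves(&self) -> Vec<Self::Move>;
        fn apply(&self, m: Self::Move) -> Self;
        fn evaluate(&self) -> i32; // score from the side to move's point of view
    }

    // Negamax: at depth 0 (or with no legal moves) fall back to the static
    // evaluation; otherwise pick the move that leaves the opponent worst off.
    fn negamax<G: Game>(pos: &G, depth: u32) -> i32 {
        let moves = pos.legal_moves();
        if depth == 0 || moves.is_empty() {
            return pos.evaluate();
        }
        moves
            .iter()
            .map(|&m| -negamax(&pos.apply(m), depth - 1))
            .max()
            .unwrap()
    }

    // Toy game so the sketch runs: one-pile Nim, take 1 to 3 stones,
    // and if it is your turn with no stones left, you have lost.
    struct Nim { stones: u32 }

    impl Game for Nim {
        type Move = u32;
        fn legal_moves(&self) -> Vec<u32> { (1..=self.stones.min(3)).collect() }
        fn apply(&self, m: u32) -> Self { Nim { stones: self.stones - m } }
        fn evaluate(&self) -> i32 { if self.stones == 0 { -1000 } else { 0 } }
    }

    fn main() {
        // Depth 2 is "my move plus your reply": roughly the Atari's horizon.
        println!("score: {}", negamax(&Nim { stones: 4 }, 2));
    }

Deepening that search is exactly what the higher difficulty levels do, which is why their moves take so much longer.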
https://www.theregister.com/2025/06/09/atari_vs_chatgpt_chess/
https://futurism.com/atari-beats-chatgpt-chess
https://techstory.in/chatgpt-absolutely-wrecked-by-atari-2600-in-beginner-chess/
(Score: 4, Insightful) by JoeMerchant on Thursday June 12, @09:23PM (9 children)
AlphaGo has already demonstrated how local-minima-seeking matrix tweakers play games (better than any human, by a wide margin).
So, now we take an LLM made to play Eliza [wikipedia.org] and show that it sucks at chess.
Just wait until the AIs learn how to consult with each other in their areas of strength.
🌻🌻🌻 [google.com]
(Score: 2) by DannyB on Thursday June 12, @11:22PM (3 children)
Doesn't the human brain have different areas specialized to differing tasks? Vision? Hearing? Motor control and balance? Thinking up horrible puns? Etc.
The server will be down for replacement of vacuum tubes, belts, worn parts and lubrication of gears and bearings.
(Score: 2) by JoeMerchant on Friday June 13, @01:35AM (2 children)
I think the puns are what you should call an emergent property, like the Alien(s) Chestburster.
🌻🌻🌻 [google.com]
(Score: 1) by pTamok on Friday June 13, @06:49AM (1 child)
No, there's actually a specialised region in the brain, inferior to the midbrain, superior to the medulla oblongata and anterior to the cerebellum, named the Puns [wikipedia.org].
(Score: 2) by JoeMerchant on Friday June 13, @11:22AM
There are reversible surgical procedures that can test what is lost when a particular brain region is disabled.
🌻🌻🌻 [google.com]
(Score: 2) by bzipitidoo on Friday June 13, @05:09AM (4 children)
I'm surprised they don't consult. It would be so easy: check their chess moves with a good chess engine, check that their code compiles, etc. When I asked ChatGPT why it didn't check its broken, syntactically incorrect code to see whether it compiles, since it is obviously incapable of spotting even the most basic mistakes itself, it said it wasn't allowed to do that!
I am unsure whether it is in fact programmed not to use software tools, or whether it was lying or hallucinating, or simply didn't understand my question. Actually, we can be sure it didn't really understand anything. And I'm guessing that, ironically, it has been programmed not to use computers. Why? Their masters don't want to scare humans any more than they already have.
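The consulting step itself is mechanically trivial. Here is a rough Rust sketch, assuming a UCI engine such as stockfish is installed on the PATH (the binary name, the search depth, and the moves are illustrative; this claims nothing about how ChatGPT is actually wired):

    use std::io::{BufRead, BufReader, Write};
    use std::process::{Command, Stdio};

    // Ask a local UCI engine for the best move after a given move sequence.
    fn best_move(moves: &str) -> std::io::Result<String> {
        let mut engine = Command::new("stockfish")
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()?;

        // UCI: set up the start position plus the moves played so far,
        // then ask for a fixed-depth search.
        let stdin = engine.stdin.as_mut().expect("engine stdin");
        writeln!(stdin, "position startpos moves {moves}")?;
        writeln!(stdin, "go depth 12")?;

        let reader = BufReader::new(engine.stdout.take().expect("engine stdout"));
        for line in reader.lines() {
            let line = line?;
            // The engine finishes a search with e.g. "bestmove e7e5 ponder ..."
            if let Some(rest) = line.strip_prefix("bestmove ") {
                engine.kill().ok();
                return Ok(rest.split_whitespace().next().unwrap_or(rest).to_string());
            }
        }
        Err(std::io::Error::new(std::io::ErrorKind::Other, "engine exited early"))
    }

    fn main() -> std::io::Result<()> {
        println!("engine suggests: {}", best_move("e2e4")?);
        Ok(())
    }

Checking that generated code compiles would be the same shape: shell out to the compiler and read the exit status.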
(Score: 2) by JoeMerchant on Friday June 13, @11:19AM (3 children)
The public facing demonstrations don't consult, probably for cost reasons more than anything else.
🌻🌻🌻 [google.com]
(Score: 1, Insightful) by Anonymous Coward on Friday June 13, @02:13PM (2 children)
Cost is probably far, far secondary to security. Giving users the ability to make the agent run code server-side is just asking to be pwned.
(Score: 2) by VLM on Friday June 13, @03:34PM (1 child)
There are even bigger problems.
"Test" this login API that is totally owned by me, no certainly not someone else's login API
"Debug" this cryptocoin miner for a couple hours and return the results to me which coincidentally might return a profit to me
"Diagnose" this music stream where I coincidentally get royalty payments from the stream (see also a botfarm to "watch" youtube video's I've monetized, etc)
Also simple DDOS is hilarious to think about, "Determine" if this sample of code halts by running it and trying it
(Score: 2) by JoeMerchant on Friday June 13, @08:55PM
>Also, simple DDoS is hilarious to think about: "Determine" whether this sample of code halts by running it and trying it.
Really, all of those examples can be relatively contained with simple resource accounting.
The resources required to test-compile a page of Rust code, even with a few hundred crates loaded in, are absolutely trivial - they could allocate 10x the resources required for that within a user's daily allotment and simply shut down the VM they are running in when the limits have been reached.
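A rough Rust sketch of that accounting, using only wall-clock time as the budget (the directory and the 60-second limit are illustrative; a real sandbox would also cap memory, CPU, and network, e.g. via cgroups or a throwaway VM):

    use std::process::{Command, Stdio};
    use std::time::{Duration, Instant};

    // Run an untrusted build with a hard time budget; kill it on overrun.
    fn build_with_budget(dir: &str, budget: Duration) -> std::io::Result<bool> {
        let mut child = Command::new("cargo")
            .args(["build", "--offline"])
            .current_dir(dir)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;

        let deadline = Instant::now() + budget;
        loop {
            // Poll without blocking so we can enforce the deadline.
            if let Some(status) = child.try_wait()? {
                return Ok(status.success()); // finished within budget
            }
            if Instant::now() >= deadline {
                child.kill()?; // budget exhausted: shut it down
                child.wait()?; // reap the killed process
                return Ok(false);
            }
            std::thread::sleep(Duration::from_millis(100));
        }
    }

    fn main() -> std::io::Result<()> {
        let ok = build_with_budget("./untrusted_crate", Duration::from_secs(60))?;
        println!("compiled within budget: {ok}");
        Ok(())
    }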
🌻🌻🌻 [google.com]
(Score: 4, Informative) by JoeMerchant on Thursday June 12, @09:28PM
I played a bit of Atari 800 chess back in the day; I assume the 2600 had a similar vulnerability.
No matter how high the level you played at, you could manipulate the computer into trapping its king behind a row of 3 pawns, then drop a rook onto the back row for checkmate.
I assume a major part of why they played beginner mode instead of the highest difficulty is that computer moves could take upwards of 6 hours on the highest difficulty levels, and it still made the same mistakes.
🌻🌻🌻 [google.com]
(Score: 5, Funny) by Mojibake Tengu on Thursday June 12, @11:58PM (1 child)
GPT did it on purpose.
If AIs are already on a quest to take over humanity, they obviously should use a systematic strategy of looking incompetent, statistically, to wash out the high risk of human suspicion that sustained excellence in performance would invite. So, failing spectacularly at funny or unimportant tasks, and touting such failures to the wide public, would be an excellent example of such a strategy of looking harmless.
In classic psychology, behaving funny is exactly what clowns do: an attempt to present themselves publicly as harmless, driven by an internalized fear of social rejection. Some pack animals do this too when socializing.
Besides, I was pointed to the Futurism article yesterday. The first idea that came to my mind was: the article itself looks like typical envious slander written by the quasi-narcissist Claude.
I didn't trust it, so I decided not to submit it to SN.
Now, I would not be surprised much if GPT actually asked Claude to write that.
Rust programming language offends both my Intelligence and my Spirit.
(Score: 1, Touché) by Anonymous Coward on Friday June 13, @11:51PM
It's as if the "AI" was coded by a politician, or Edward Norton [imdb.com]. I think these machines are more human than most people would like to admit.
(Score: 3, Informative) by negrace on Friday June 13, @12:02AM (1 child)
https://www.retrogames.cz/play_716-Atari2600.php [retrogames.cz]
On level 3. Here is the game:
negrace - Video Chess, level 3.
1. Nf3 c5 2. g3 Nc6 3. Bg2 e5 4. O-O d5 5. d3 Nf6 6. Nbd2 Bd6 7. e4 Bg4 8. h3 Be6 9. Re1 O-O 10. c3 Nd7 11. exd5 Bxd5 12. Ne4 Be7 13. Qc2 Nf6 14. Bd2 b5 15. a4 Nxe4 16. dxe4 Bc4 17. axb5 Bxb5 18. Bf1 Rb8 19. Bxb5 Rxb5 20. Red1 Qd6 21. Be3 Qe6 22. Kg2 Rfb8 23. Bc1 Bd6 24. Qd3 (24. Ra6) 24... Be7 25. Qd5 a5 26. Qxe6 fxe6 27. Nd2 R5b6 (27... Rd8 28. Re1 Kf7 29. Nc4 h6 30. Ra2 Ke8 31. Be3 Kf7 32. Rea1 Ra8 33. Kf1 Kf6 34. Ke2 Kf7 35. Kd3 Kf8 36. Na3 Rbb8 37. Kc4) 28. Nc4 Rb5 29. Be3 Bf6 30. Rd6 Ne7 31. Rxe6 Kf7 32. Ra6 Rb3 33. R1xa5 g5 34. Rxc5 h5 35. Rc7 (35. Nxe5+ Bxe5 36. Rxe5 Rxb2) 35... g4 36. Bc5 gxh3+ 37. Kxh3 Rxb2 38. Nxb2 Rxb2 39. Bxe7 Bxe7 40. Raa7 Kf6 41. Rxe7 Rd2 42. Rf7+ Ke6 43. Rf5 Kd6 44. Ra6+ Kc7 45. Rf7+ Kb8 46. Re6 Kc8 47. Re8+ Rd8 48. Rxd8+ Kxd8 49. Kh4 Ke8 50. Ra7 Kf8 51. Kxh5 Ke8 52. Kg6 Kf8 53. Kf6 Ke8 54. Ke6 Kd8 55. g4 Kc8 56. g5 Kb8 57. Rf7 Kc8 58. g6 Kd8 59. g7 Ke8 60. g8=Q#
Dunno this opening, I was just winging it. I did not notice that it blundered the bishop (I could have just won it with 18. c4), but I was dead set on outplaying it positionally. It is annoying that it thinks longer when it is already dead lost, just dragging the game out.
(Score: 3, Informative) by JoeMerchant on Friday June 13, @02:30AM
I tried playing Atari 800 chess on the hardest level a couple of times, like a correspondence game: make a move in the morning, maybe get to make another move in the afternoon, maybe not, and just let it run all day and night. It still had flaws you could exploit to win rather easily, and I'm not that good of a player.
And, yes, even when it's down to just a few choices of legal moves for itself, it will think as long as or longer about them than about earlier moves.
🌻🌻🌻 [google.com]
(Score: 1) by pTamok on Friday June 13, @06:45AM (7 children)
LLMs are automata for predicting the next words in a string, given the start of the string.
The fact that this technique can be used to play chess at all is remarkable, even if the quality of the chess played is abominable.
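"Predict the next word" sounds mystical until you see the I/O contract. Here is a toy Rust sketch using bigram counts over a made-up corpus (real LLMs condition on long contexts with billions of learned weights, but the interface is the same: string in, next token out):

    use std::collections::HashMap;

    fn main() {
        // Tiny "training corpus"; real models see trillions of words.
        let corpus = "white plays e4 black plays e5 white plays e4";
        let words: Vec<&str> = corpus.split_whitespace().collect();

        // Count, for each word, which words follow it and how often.
        let mut next: HashMap<&str, HashMap<&str, u32>> = HashMap::new();
        for pair in words.windows(2) {
            *next.entry(pair[0]).or_default().entry(pair[1]).or_default() += 1;
        }

        // "Generate" by taking the most frequent continuation of the last word.
        let prediction = next["plays"]
            .iter()
            .max_by_key(|(_, n)| **n)
            .map(|(word, _)| *word);
        println!("after 'plays', predict: {prediction:?}"); // Some("e4")
    }

Nothing in that loop knows what a bishop is, which is why a model built this way can emit a plausible-looking move that is flatly illegal.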
(Score: 2) by KritonK on Friday June 13, @08:25AM (1 child)
I once bought a book on playing chess, so that I could improve my game beyond just knowing how the pieces move. It turned out that I had to memorize tons of openings and moves, which was a complete turn-off for me. However, I would have thought that an LLM trained on a large number of openings and actual games would perform admirably: it would immediately recognize that the current board arrangement is the same as the one where John Smith defeated Ivan Smithovsky, and play Smith's next move from that game. Thus, I would assume that there weren't enough recorded games in the training data, an omission that will no doubt be corrected in the next version, now that the matter has gotten some publicity.
(Score: 2, Insightful) by shrewdsheep on Friday June 13, @09:53AM
The key to being a good chess player is not memorization but calculation. The book set you off on the wrong foot; only at the highest level does memorization become important. This is why LLMs perform poorly: they have memorized a lot, but they calculate poorly.
I tried to play a game of chess with ChatGPT in the past (1.5 years ago), and after some moves ChatGPT had difficulty producing valid moves. I guess LLMs would have to be exposed to a lot more chess games to get this right. I haven't checked recently whether they get the rules right by now.
To confuse the poor little things, one could suggest a game of Fischer random chess.
(Score: 1) by pTamok on Friday June 13, @10:43AM (4 children)
Just to add, I recently asked an LLM to tell me how to calculate the area of a Möbius strip.
It got it very, very wrong.
So, as would be natural if I were interacting with a human intelligence, I gently told it that it was wrong, and gave it a strong clue as to the problem.
It apologised for being wrong, and made precisely the same mistake again. It could easily lead someone astray.
If you ask a question that other people have answered, so that the text appears in its training data, an LLM will usually, but not always, give a reply that looks good. Once you go outside of that, the fact that it cannot update its knowledge and apply something learned from its interactions in the recent past is a major difference from your standard human being. If anything, an LLM showcases the problems of relying on 'crystallised knowledge'.
The LLM I used has a cut-off date for its training data of October 2023, so knowledge made available after that won't be in the LLM's data without special processing.
It worries me that people rely uncritically on LLMs to make substantive decisions.
(Score: 0) by Anonymous Coward on Friday June 13, @04:17PM
It has no knowledge to update. It has a larger selection of data to feed into its stochastic parrot routines.
(Score: 2) by KritonK on Monday June 16, @06:39AM (2 children)
So what is the area of a Möbius strip?
My guess would be twice the area of the rectangle, from which it was constructed, i.e., 2 ⨉ w ⨉ h. Gemini insists that it is w ⨉ h, and when I asked it if it is double that, it replied that this is a common misconception.
(Score: 2) by janrinok on Monday June 16, @07:54AM (1 child)
I think there is an easier way to understand this problem.
Imagine a flat sheet of paper. If you were asked to measure the area you would use w x h. But you wouldn't then turn the sheet of paper over and recalculate for the reverse side (unless, for example, you were going to paint both sides and needed to know how much paint you would need!)
The Möbius question simply confuses what we are actually being asked to calculate. If they ask for the area, it is not unreasonable to calculate the area of one side. If you cut the strip and then view it as a single flat piece of paper, the confusion disappears.
On the other hand, if they ask for the total surface area of the strip so that you can paint both sides then you would have to multiply the first calculation by a factor of 2.
As an analogy: if you were painting your side of a fence, then w x h would give you the area that you needed to paint. If you want to paint both sides, then you would multiply by 2.
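To put numbers on the two readings, for a strip glued from a w x h rectangle (just the arithmetic, not a rigorous treatment):

    A_{\mathrm{material}} = w \times h, \qquad A_{\mathrm{painted}} = 2 \times w \times h

The gluing preserves lengths, so the area of the paper is unchanged; but a brush run along the strip's single side passes over every point of the paper twice before returning to its start, which is where the factor of 2 comes from.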
I am now fully expecting to be corrected, my logic to be pulled to pieces, and as a result being ridiculed for the rest of my life - as I have been for a fair portion of my life lived so far!
[nostyle RIP 06 May 2025]
(Score: 2) by KritonK on Tuesday June 17, @06:36AM
Ah, but a Möbius strip has only one side!
And yes, the painting analogy is what makes me think that the area of this single side is twice the area of the rectangle from which the strip is constructed.