ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

www.tomshardware.com/tech-industry/artificial-i…

Attempting to badly quote someone on another post: « How can people honestly think a glorified word autocomplete function could understand what a logarithm is? »

You can make external tools available to the LLM and then provide it with instructions for when/how to use them.
So, for example, you'd describe to it that if someone asks it about math or chess, then it should generate JSON text according to a given schema and generate the command text to parametrize a script with it. The script can then e.g. make an API call to Wolfram Alpha or call into Stockfish or whatever.
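
A rough sketch of what that dispatch script could look like, in Python. Everything here is illustrative: the "best_chess_move" tool name and the JSON shape are made up, and it assumes the python-chess package plus a Stockfish binary on your PATH.

import json
import chess
import chess.engine

# You'd tell the LLM: "for chess questions, emit JSON like
# {"tool": "best_chess_move", "fen": "<position>"} instead of answering."
def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)
    if call["tool"] == "best_chess_move":
        board = chess.Board(call["fen"])  # raises ValueError on a garbled FEN
        engine = chess.engine.SimpleEngine.popen_uci("stockfish")
        result = engine.play(board, chess.engine.Limit(time=0.1))
        engine.quit()  # a real script would keep the engine open between calls
        return board.san(result.move)
    return "unknown tool"

# Example: the LLM emits a tool call for the starting position.
print(dispatch(json.dumps({"tool": "best_chess_move",
                           "fen": chess.Board().fen()})))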

This isn't going to be 100% reliable. For example, there's a decent chance of the LLM fucking up when generating the relatively big JSON you need for describing the entire state of the chessboard, especially with general-purpose LLMs which are configured to introduce some amount of randomness in their output.

But ChatGPT in particular just won't have the instructions built in for calling a chess API/program, so for this particular case, it is likely as dumb as autocomplete. It likely does have a math API hooked up, though, so it should be able to calculate a logarithm through such an external tool. Of course, it might still not understand when to use a logarithm, for example.

My €2 calculator obliterates a €200,000 Ferrari at doing multiplications.

Is this just because gibbity couldn't recognize the chess pieces? I'd love to believe this is true otherwise, love my 2600 haha.

At first it blamed its poor performance on the icons used, but then they switched to chess notation and it still failed hard

That's on them for taking on the Atari 2600, where "the games don't get older, they get better!"

If LLMs are statistics-based, wouldn't there be many, many more losing games than perfectly winning ones in the training data? It's like Dr. Strange saying 'this is the only way'.

It's not even that. It's not a chess AI or an AGI (which doesn't exist). It will speak and pretend to play, but it has no memory of the exact position of the pieces nor the capability to plan several steps ahead. For all intents and purposes, it's like asking my toddler what the time is (she always says something that sounds like a time, but doesn't understand the concept of hours or what the time is)

The fact that somebody posted this on LinkedIn and not only wasn't shamed out of his job but there are several articles about it is truly infuriating.

In other news: My toaster makes better toast than my vacuum.

If ChatGPT were marketed as a toaster nobody would bat an eye. The reason so many are laughing is because ChatGPT is marketed as a general intelligence tool.

Do you have any OpenAI material (ad, interview, presentation...) that claims it's AGI? Because I've never seen such a thing, only people hyping it for clicks and ad revenue

I was very careful not to use the term AGI for this reason. "General intelligence tool" isn't the same thing. It's a much weaker claim than AGI, yet it's also a far stronger claim than anyone makes for purpose-built software. The ambiguity is part of their marketing strategy.

Question remains. Any marketing about it being general intelligence? Not general use, but general intelligence.

No, though there’s been plenty of marketing where they claim “we know how to build AGI.”

They have marketed ChatGPT as a general purpose AI from the very beginning, though the question of how to leverage that has remained open.

Your vacuum uses more power than a 150,000-person city just to clean an 8’ square rug?

That does suck.

Heh.

True AI does not and will not exist

This is so stupid and pointless...

"Thing not made to solve spesific task fails against thing made for it.."

This is like saying that a really old hand-pushed lawn mower is better than an SUV at cutting grass...

SUVs aren't marketed as grass mowers. LLMs are marketed as AI with all the answers.

I'd be interested in seeing marketing of ChatGPT as a competitive boardgame player. Is there any?

Not necessarily that AI is marketed as a competitive board game player, but that AI is marketed as intelligence. This helps illustrate how clueless it really is.

There are plenty of geniuses out there who aren't great at board games. Using a tool not fit for task is more of an issue with the person using the wrong tool than an issue with the tool itself.

I do get where you're coming from though. There are definitely people who don't understand why a ChatBot wouldn't be good at chess.

Do you expect rocket scientists to be good at chess?

Intelligence doesn’t mean it’s blanket smart. This is entirely on individual people for this asinine assumption. It’s never been marketed that way, so why in this singular case is the definition suddenly different? The general public understands this isn’t some be all end all. This assumptive attitude that Lemmy has is fucking weird.

The general public understands this isn’t some be all end all.

I disagree.

Got a link? Because people just think it’s cool, not that it’s gonna be this thing that can do everything.

So there must be some place people are getting this "it can do everything" idea from? It's more an anti-AI propaganda angle, and that's prevalent mainly on Lemmy. So, a source to back up this "AI can do anything", please.

I would expect anyone claiming to be intelligent to be able to beat an Atari 2600 set to its very lowest difficulty. This is a task on par with counting the number of Rs in the word 'strawberry', something the intelligent ChatGPT also famously cannot do.

It’s actually not that easy. Fire up an emulator and take it for a spin. Like, you won’t get away with obvious mistakes.

Do you think being good at chess is equivalent to intelligence…?

Those are also vastly different tasks, a toddler can count, while they likely can’t play chess.

You have a very strange notion of what “intelligence” means.

A toddler untrained at counting and untrained at chess would be good at neither. Same goes for adults: you are untrained in rocket physics, so you won't be good at it either. Why are you holding an AI to some weird ungodly bar that doesn't apply to anything else? No one's claimed it to be good at these things. Adults who can't swim and go in the water drown. Why? Because they weren't taught. Notice a pattern yet?

These tools are marketed as replacing lots of jobs that are a hell of a lot more complex than a simple board game.

These tools are marketed as replacing lots of jobs that are a hell of a lot more complex than a simple board game.

There isn't really a single sliding scale of "complexity" when it comes to certain tasks.

Given the appropriate input, a calculator can divide two numbers. But it can't count the number of R's in the word "strawberry".

Meanwhile, a script that could count the number of instances of a letter in a word could count those R's, but it couldn't divide any two numbers.
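
To make that concrete, the letter-counting script really is a one-liner in Python:

print("strawberry".count("r"))  # -> 3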

Similarly, we didn't complain that a typewriter couldn't put pepperoni slices onto a pizza.

Made people click though, didn't it.

Man, all these people coping. I thought ChatGPT was supposed to be a generic one, able to do anything?

It depends. Have you used it? If not - Yes! It does do . . . all the things.

If you have used it, I’m sorry that was incorrect. You simply need to pay for the upgraded subscription. Oh, and as a trusted insider now we can let you in on a secret - the next version of this thing is gonna be, like, wow! Boom shanka! Everyone else will be so far behind!

You know, when you put it like that, it kind of sounds like Scientology...

in other words, a hammer "got absolutely wrecked" by a handsaw in a board-halving competition

When all you have (or you try to convince others that all they need) is a hammer, everything looks like a nail. I guess this shows that it isn't.

Clearly you didn't swing the hammer hard enough

One of those Fisher-Price plastic hammers with the hole in the handle?

What happens if you ask ChatGPT to code you a chess AI though?

It doesn’t work without 200 hours of un-fucking

It probably consumes as much energy as a family house for a day just to come up with that program. That's what happens.

In fact, I did a Google search and didn't have any choice but to get an "AI" answer, even though I didn't want it. Here's what it says:

Each ChatGPT query is estimated to use around 10 times more electricity than a traditional Google search, with a single query consuming approximately 3 watt-hours, compared to 0.3 watt-hours for a Google search. This translates to a daily energy consumption of over half a million kilowatts, equivalent to the power used by 180,000 US households. 

Average daily energy consumption for a family in the US is said to be around 30,000 Wh per day.

That would be about 10,000 ChatGPT queries per day to equal that.

For more reference points, the average energy consumption of an hour playing a AAA computer game can easily be 600-1,000 Wh, depending on the graphics card.
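
For what it's worth, the arithmetic checks out; a quick Python sanity check using the figures quoted above (all in watt-hours):

household_daily_wh = 30_000   # ~30 kWh per day for a US household
chatgpt_query_wh = 3          # the per-query estimate quoted above
print(household_daily_wh / chatgpt_query_wh)   # -> 10000.0 queries per day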

That must be why Google’s greenhouse emissions went up 50% in five years. ChatGPT's legendary efficiency.

Keep defending that power-wasting glorified autocomplete. In no way are we doomed as a species.

We can just continue to pump more and more into the air. "AI" will surely find a solution for that anyway.

Google is not related to ChatGPT. ChatGPT's parent company is OpenAI, which is a competitor of Google.

A more rational explanation is that technology and digital services in general have been growing and are on the rise, both because more and more complex services are being offered and, more importantly, because more people are requesting those services. Whole continents that used not to be covered by digital services are now covered. Generative AI is just a very small part of all that.

The best approach to reducing CO2 emissions is to ask for a reduction in human population. From my point of view it's the only rational approach, as with a growing population there are only two solutions: pollute until we die, or reduce quality of life until life is not worth living. Reducing population allows fewer people to live better lives without destroying the planet.

clop - clop - clop - clop - clop - clop

. . .

*bloop*

. . .

[screen goes black for 20 minutes]

. . .

Hmmmmm.

clop - clop - clop - clop - clop - clop - clop - clop - clop - clop

*bloop*

Little disappointed more people didn’t get this.

Hey I don't mean to ruin your day, but maybe you should Google what you just commented...

There is 100% no chance google knows what that is

Comments from other communities

Anyone who honestly believed a generic word autocompleter would beat classic algorithms wherever possible probably belongs in psychiatric care.

There are a lot of people out there who think LLMs are somehow reasoning. Even "reasoning" models aren't really doing it. It is important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.

It is important to do demonstrations like this in the hopes that the general public will understand the limitations of this tech.

THIS is the thing. The general public's perception of ChatGPT is basically whatever OpenAI's marketing department tells them to believe, plus their single memory of that one time they tested out ChatGPT and it was pretty impressive. Right now, OpenAI is telling everyone that they are a few years away from Artificial General Intelligence. Tests like this one demonstrate how wrong OpenAI is in that assertion.

It's almost as bad as the opposition's comparison of it to Skynet. People are never going to understand technology without applying some fucking nuance.

Stop hyping new technology... in either direction.

I think the problem is that, while the model isn't actually reasoning, it's very good at convincing people it actually is.

I see current LLMs kinda like an RPG character build with all ability points put into Charisma. It's actually not that good at most tasks, but it's so good at convincing people that they start to think it's actually doing a great job.

But the general public (myself included) doesn’t really understand how our own reasoning happens.

Does anyone, really? i.e., am I merely a meat computer that takes in massive amounts of input over a lifetime, builds internal models of the world, tests said models through trial-and-error, and outputs novel combinations of data when said combinations are useful for me in a given context in said world?

Is what I do when I “reason” really all that different from what an LLM does, fundamentally? Do I do more than language prediction when I “think”? And if so, what is it?

This is definitely part of the issue, not sure why people are downvoting this. That's also why tests like this are important, to illustrate that thinking in the way we know it isn't happening in these models.

not sure why people are downvoting this

downvotes are not allowed on beehaw fyi

Downvotes aren't federated but you still see all the downvotes sent from just your own instance

Interesting. I figured since this post is in a Beehaw community they would be invisible to everyone, but good to know.

We understand reasoning enough to know humans (and other animals with complex brains) reason in a way that LLMs cannot.

While our reasoning also works with pattern matching, it incorporates immeasurably more signals than language - language is almost peripheral to it, even in humans. And more importantly, we experience things; everything we do acts as a small training round, not just in language but on every aspect of the task we are performing, and gives us a myriad of patterns to match later.

Until AI can match a fragment of this we are not going to have an AGI. And for the experience aspect, there's no economic incentive under capitalism to achieve it; if it happens, it will come out of an underfunded university.

I think I remember some DOGE goon asking online about using an LLM to parse JSON. Many people don't understand things.

Jesus Christ software’s about to get far, far worse innit?

For us? Not as much; luckily most of us share the sentiment of rejecting anything LLM-made and LLM-supported. But outsiders still have a lot of impact unfortunately, just ask @bagder@mastodon.social

That’s too much critical thinking for most people

A simple calculator will also beat it at math.

Atari game programmed to know chess moves: knight to B4

Chat-GPT: many Redditors have credited Chesster A. Pawnington with inventing the game when he chased the queen across the palace before crushing the king with a castle tower. Then he became the king and created his own queen by playing "The Twist" and "Let's Twist Again" at the same time.

This article makes ChatGPT sound like a deranged blowhard, blaming everything but its own ineptitude for its failure.

So yeah, that tracks.

In a quite unexpected turn of events, it is claimed that OpenAI’s ChatGPT “got absolutely wrecked on the beginner level” while playing Atari Chess.

Who the hell thought this was "unexpected"?

What's next? ChatGPT vs. Microwave to see which can make instant oatmeal the fastest? 😂

Considering how much heat the servers probably generate, ChatGPT might have a decent chance in that competition 😁

This article buries the lede so much that many readers probably miss it completely: the important takeaway here, which is clearer in *The Register's* version of the story, is that ChatGPT **cannot actually play chess**:

“Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were."

To actually use an LLM as a chess engine without the kind of manual intervention that this person did, you would need to combine it with some other software to automate continuing to ask it for a different next move every time it suggests an invalid one. And, if you did that, it would still mostly lose, even to much older chess engines than Atari's Video Chess.

edit: i see now that numerous people have done this; you can find many websites where you can "play chess against chatgpt" (which actually means: with chatgpt and also some other mechanism to enforce the rules). and if you know how to play chess you should easily win :)
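
If you're curious what that rule-enforcing wrapper looks like, here's a minimal sketch in Python with python-chess doing the legality checks. The ask_llm_for_move function is a hypothetical stand-in for whatever LLM API you'd call; it's stubbed with canned moves here so the sketch runs standalone.

import random
import chess

def ask_llm_for_move(fen: str) -> str:
    # Hypothetical LLM call, stubbed with a mix of legal and illegal moves.
    return random.choice(["Qh5#", "Nf3", "Ke8", "e4"])

def next_legal_move(board: chess.Board, max_retries: int = 10) -> chess.Move:
    for _ in range(max_retries):
        try:
            # parse_san raises ValueError if the move is garbled or illegal here
            return board.parse_san(ask_llm_for_move(board.fen()))
        except ValueError:
            continue  # hallucinated move; ask again
    return next(iter(board.legal_moves))  # give up and play any legal move

board = chess.Board()
print(board.san(next_legal_move(board)))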

You probably could train an AI to play chess and win, but it wouldn't be an LLM.

In fact, let's go see...

  • Stockfish: Open-source and regularly ranks at the top of computer chess tournaments. It uses advanced alpha-beta search and a neural network evaluation (NNUE).

  • Leela Chess Zero (Lc0): Inspired by DeepMind’s AlphaZero, it uses deep reinforcement learning and plays via a neural network with Monte Carlo tree search.

  • AlphaZero: Developed by DeepMind, it reached superhuman levels using reinforcement learning and defeated Stockfish in high-profile matches (though not under perfectly fair conditions).

Hmm. Neural networks and reinforcement learning. So, non-LLM AI.

you can play chess against something based on chatgpt, and if you're any good at chess you can win

You don't even have to be good. You can just flat out lie to ChatGPT because fiction and fact are intertwined in language.

"You can't put me in check because your queen can only move 1d6 squares in a single turn."

Isn’t this kind of like ridiculing that same Atari for not being able to form coherent sentences? It’s not all that surprising that a system not designed to play chess loses to a system designed specifically for that purpose.

Pretty much, but the marketers are still trying to tell people it can totally do logic anyway. Hopefully the Apple paper opens some eyes.

A PE teacher got absolutely wrecked by a former Olympic sprinter at a sprint competition.

Change "PE teacher" to "stack of health magazines" and it's a more accurate equivalence.

Well... yeah. That's not what LLMs do. That's like saying "A leafblower got absolutely wrecked by 1998 Dodge Viper in beginner's drag race". It's only impressive if you don't understand what a leafblower is.

People write code with LLMs. A programming language is just a language specialised in precise logic. That's what "AI" is advertised to be good at. How can you be good at that and not the other?

It's not very good at it though, if you've ever used it to code. It automates and eases a lot of mundane tasks, but still requires a LOT of supervision and domain knowledge to not have it go off the rails or hallucinate code that's either full of bugs or will never work. It's not a "prompt and forget" thing, not by a long shot. It's just an easier way to steal code it picked up from Stackoverflow and GitHub.

As a human, I will know to check how much data is going into a fixed-size buffer somewhere and break out of the code if it exceeds it. The LLM will have no qualms about putting buffer overflow vulnerabilities all over your shit, because it doesn't care; it only wants to fulfill the prompt and get something to work.

I’m not saying it’s good at coding, I’m saying it’s specifically advertised as being very good at it.

"Precise logic" is specifically what AI is not any good at whatsoever.

AI might be able to write a program that beats an A2600 in chess, but it should not be expected to win at chess itself.

I await the moment when AI is as confident about communicating that it can't do something as it is about the opposite, because somehow pointing that out seems to be my job.

Yeah, LLMs seem pretty unlikely to do that, though if they figure it out that would be great. That's just not their wheelhouse. You have to know enough about what you're attempting to ask the right questions and recognize bad answers. The thing you're trying to do needs to be within your reach without AI, or you are unlikely to be successful.

I think the problem is more the over-promising what AI can do (or people who don't understand it at all making assumptions because it sounds human-like).

Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

I forget which airline it was, but one of the onboard games in the back of a headrest TV was a game called "Beginners Chess", which was notoriously difficult to beat. So it was tested against other chess engines, and it ranked among like the top five most powerful chess engines ever.

Machine learning has existed for many years, now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as "AI" and attributing every ML win ever to "AI".

ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in the news that "AI helps cure cancer", it makes it sound like it was a lone researcher who spent a few minutes engineering the right prompt for Copilot.

Yes, a specifically designed and finely tuned ML program can now beat the best human chess player, but calling it "AI" and bundling it together with the latest Gemini or Claude iteration's "reasoning capabilities" is intentionally misleading. That's why articles like this one are needed. ML is a useful tool, but far from the "super-human general intelligence" that is meant to replace half of human workers by the power of wishful prompting.

Yeah, it's like judging how great a fish is at climbing a tree. But it does show that it's not real intelligence or reasoning.

Don't call my fish stupid.

Well, can it climb trees?

An LLM is a poor computational/predictive paradigm for playing chess.

This just in: a hammer makes a poor screwdriver.

LLMs are more like a leaf blower though

Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).

I'm impressed, if that's true! In general, an LLM's training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.

Oh yes, the training cost is of course a great loss here; it's not optimized at all, and it's stuck at an average level.

Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when artificially modified, the model takes the modification into account in its play). It was used by a French YouTuber to show how LLMs can somehow have a kind of representation of the world. I can try to dig up the sources if you're interested.

Absolutely interested. Thank you for your time to share that.

My career path in neural networks began as a researcher for cancerous tissue object detection in medical diagnostic imaging. Now it is switched to generative models for CAD (architecture, product design, game assets, etc.). I don't really mess about with fine-tuning LLMs.

However, I do self-host my own LLMs as code assistants. Thus, I'm only tangentially involved with the current LLM craze.

But it does interest me, nonetheless!

Here is the main blog post that I remembered: it has a follow-up, a more scientific version, and uses two other articles as a basis, so you might want to dig around what they mention in the introduction.

It is indeed a quite technical discovery, and it still lacks complete and wider analysis, but it is very interesting in that it kind of invalidates the common gut feeling that LLMs are pure lucky randomness.

The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.

Yes, I agree wholeheartedly with your clarification.

My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.

Thus, large language models are well out of my area of expertise in terms of the architecture of their models.

However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to "win"/"survive".

(I admit that I'm just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don't come off too pompously when talking about this subject to others; forgive my tedium.)

Yeah, a lot of them hallucinate illegal moves.

Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.

and still lose to stockfish even after conjuring 3 queens out of thin air lol

It can't, but that didn't stop a bunch of gushing articles a while back about how it had an Elo of 2400 and other such nonsense. Turns out you could get it to have an Elo of 2400 under a very, very specific set of circumstances that included correcting it every time it hallucinated pieces or attempted to make illegal moves.

ChatGPT must adhere honorably to the rules that it's making up on the spot. That's Dallas

There are custom GPTs which claim to play at a stockfish level or be literally stockfish under the hood (I assume the former is still the latter just not explicitly). Haven't tested them, but if they work, I'd say yes. An LLM itself will never be able to play chess or do anything similar, unless they outsource that task to another tool that can. And there seem to be GPTs that do exactly that.

As for why we need ChatGPT then when the result comes from Stockfish anyway, it's for the natural language prompts and responses.

It's not an LLM, but Stockfish does use AI under the hood and has since 2020. Stockfish uses a classical alpha-beta search strategy (if I recall correctly) combined with a neural network (NNUE) for smarter position evaluation.

There are some engines of comparable strength that are primarily neural-network based. lc0 comes to mind. lc0 placed 2nd in the Top Chess Engine Championships in 9 out of the past 10 seasons. By comparison, Stockfish is currently on a 10-season win streak in the TCEC.

It could always play it if you reminded it of the board state every move. Not well, but at least generally legally. And while I know elite players can play blindfold chess, the average person can't, so it was always kind of harsh to hold it to that standard and criticise it for not being able to remember more than 5 moves when most people can't do that themselves.

Besides that, it was never designed to play chess. It would be like insulting Watson the Jeopardy bot for losing against the Atari chess bot, it's not what it was designed to do.

Ah, you used logic. That's the issue. They don't do that.

Sometimes it seems like most of these AI articles are written by AIs with bad prompts.

Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.

LLMs on the other hand, are very good at producing clickbait articles with low information content.

GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.

This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.

For some things it even works. But calling this intelligence is dubious at best.

I think the biggest problem is its very low "test-time adaptability". Even when combined with a reasoning model outputting into its context, the weights do not learn beyond the immediate context.

I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.

Like humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.

Like humans are way better at answering stuff when it’s a collaboration of more than one person. I suspect the same is true of LLMs.

It is.

It's really common for non-language implementations of neural networks. If you have an NN that's right some percentage of the time, you can often run it through a bunch of copies of the NNs and take the average and that average is correct a higher percentage of the time.
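
You can see the effect with a toy simulation; assume each copy is an independent classifier that's right 70% of the time, and take a majority vote of 15 of them:

import numpy as np

rng = np.random.default_rng(0)
trials = 10_000
single = rng.random(trials) < 0.70                     # one model's correct answers
votes = (rng.random((trials, 15)) < 0.70).sum(axis=1)  # 15 copies voting
ensemble = votes >= 8                                  # majority got it right
print(single.mean(), ensemble.mean())                  # ~0.70 vs ~0.95

(Real copies of a network aren't fully independent, so the gain is smaller in practice, but the direction holds.)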

Aider is an open-source AI coding assistant that lets you use one model to plan the coding and a second one to do the actual coding. It works better than doing it in a single pass, even if you assign the same model to planning and coding.

Because it doesn't have any understanding of the rules of chess or even an internal model of the game state, it just has the text of chess games in its training data and can reproduce the notation, but nothing to prevent it from making illegal moves, trying to move or capture pieces that don't exist, incorrectly declaring check/checkmate, or any number of nonsensical things.

Hallucinating 100% of the time 👌

ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.

So they are both masters of troll chess then?

See: King of the Bridge

In this case it's not even bad prompts, it's a problem domain ChatGPT wasn't designed to be good at. It's like saying modern medicine is clearly bullshit because a doctor loses a basketball game.

I imagine the "author" did something like, "Search http://google.scholar.com/, find a publication where AI failed at something, and write a paragraph about it."

It's not even as bad as the article claims.

Atari isn't great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186

Wouldn't surprise me if an LLM trained on records of chess moves made good chess moves. I just wouldn't expect the deployed version of ChatGPT to generate coherent chess moves based on the general text it's been trained on.

I wouldn't either but that's exactly what lmsys.org found.

That blog post had ratings between 858 and 1169. Those are slightly higher than the average rating of human users on popular chess sites. Their latest leaderboard shows them doing even better.

https://lmarena.ai/leaderboard has one of the Gemini models with a rating of 1470. That's pretty good.


AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.

Something marketed as AGI should be treated as AGI when proving it isn't AGI.

Not to help the AI companies, but why don't they program them to look up math programs and outsource chess to other programs when they're asked for that stuff? It's obvious they're shit at it, why do they answer anyway? It's because they're programmed by know-it-all programmers, isn't it.

From a technology standpoint, nothing is stopping them. From a business standpoint: hubris.

To put time and effort into creating traditional logic-based algorithms to compensate for this generic math model would be to admit what mathematicians and scientists have known for centuries: that models are good at finding patterns, but they do not explain why a relationship exists (if it exists at all). The technology is fundamentally flawed for the use cases OpenAI is trying to claim it can be used in, and programming around it would be to acknowledge that.

why don't they program them to look up math programs and outsource chess to other programs when they're asked for that stuff?

They will, when it makes sense for what the AI is designed to do. For example, ChatGPT can outsource image generation to an AI dedicated to that. It also used to calculate math using python for me, but that doesn't seem to happen anymore, probably due to security issues with letting the AI run arbitrary python code.

ChatGPT however was not designed to play chess, so I don't see why OpenAI should invest resources into connecting it to a chess API.

I think especially since adding custom GPTs, adding this kind of stuff has become kind of unnecessary for base ChatGPT. If you want a chess engine, get a GPT which implements a Stockfish API (there seem to be several GPTs that do). For math, get the Wolfram GPT which uses Wolfram Alpha's API, or a different powerful math GPT.

why don't they program them

AI models aren't programmed traditionally; they're generated by machine learning. Essentially the model is given test prompts and then given a rating on its answer. The model's calculations are adjusted so that its answer to the test prompt will be closer to the expected answer. You repeat this a few billion times with a few billion prompts, and you will have generated a model that scores very high on all the test prompts.
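
A toy version of that loop, with one linear "model" standing in for billions of weights (plain numpy, nothing like a real LLM's scale):

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)            # the model's parameters

for _ in range(10_000):           # "repeat a few billion times"
    x = rng.normal(size=3)        # a test prompt
    target = x.sum()              # the expected (highly rated) answer
    error = x @ w - target        # how far off the model's answer is
    w -= 0.01 * error * x         # nudge the weights toward the expected answer

print(w)  # converges toward [1, 1, 1], the weights that reproduce x.sum()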

Then someone asks it how many R's are in strawberry and it gets the wrong answer. The only way to fix this is to add that as a test prompt and redo the machine learning process which takes an enormous amount of time and computational power each time it's done, only for people to once again quickly find some kind of prompt it doesn't answer well.

There are already AI models that play chess incredibly well. Using machine learning to solve a complexe problem isn't the issue. It's trying to get one model to be good at absolutely everything.

Because they’re fucking terrible at designing tools to solve problems, they are obviously less and less good at pretending this is an omnitool that can do everything with perfect coherency (and if it isn’t working right it’s because you’re not believing or paying hard enough)

Or they keep telling you that you just have to wait it out. It’s going to get better and better!

I think they're trying to do that. But AI can still fail at that lol

This is where MCP comes in. It's a protocol for LLMs to call standard tools. Basically the LLM would figure out the tool to use from the context, then figure out the order of parameters from those the MCP server says is available, send the JSON, and parse the response.
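
Under the hood MCP is JSON-RPC 2.0, so a tool call is just a small JSON message. A sketch (the tool name and arguments here are invented for illustration):

import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "best_chess_move",            # a hypothetical advertised tool
        "arguments": {"fen": "<position>"},   # parameters the server described
    },
}
print(json.dumps(request, indent=2))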

...or a simple counter to count the r's in strawberry. Because that's more difficult than one might think, and they are starting to do this now.

why don't they program them to look up math programs and outsource chess to other programs when they're asked for that stuff?

Because the AI doesn't know what it's being asked, it's just a algorithm guessing what the next word in a reply is. It has no understanding of what the words mean.

"Why doesn't the man in the Chinese room just use a calculator for math questions?"

Because the LLMs are now being used to vibe code themselves.

They are starting to do this. Most new models support function calling and can generate code to come up with math answers etc

If you pay for ChatGPT you can connect it with Wolfram Alpha and it relays the maths to it.

I don't pay for ChatGPT and just used the Wolfram GPT. They made the custom GPTs non-paid at some point.

I don't think AI is being marketed as awesome at everything. It's got obvious flaws. Right now it's not good for stuff like chess, probably not even tic-tac-toe. It's a language model; it's hard for it to keep track of the playing field. But AI is in development; it might not need much to start playing chess.

What the tech is being marketed as and what it’s capable of are not the same, and likely never will be. In fact all things are very rarely marketed how they truly behave, intentionally.

Everyone is still trying to figure out what these Large Reasoning Models and Large Language Models are even capable of; Apple, one of the largest companies in the world just released a white paper this past week describing the “illusion of reasoning”. If it takes a scientific paper to understand what these models are and are not capable of, I assure you they’ll be selling snake oil for years after we fully understand every nuance of their capabilities.

TL;DR Rich folks want them to be everything, so they’ll be sold as capable of everything until we repeatedly refute they are able to do so.

I think in many cases people intentionally or unintentionally disregard the time component here. AI is in development. I think what is being marketed here, just like in the stock market, is a piece of the future. I don't expect the models I use to be perfect and never make mistakes, so I use them accordingly. They are useful for what I use them for, and I wouldn't use them for chess.
I don't expect laundry detergent to be as perfect as in the commercial either.

Marketing does not mean functionality. AI is absolutely being sold to the public and enterprises as something that can solve everything. Obviously it can't, but it's being sold that way. I would bet the average person would be surprised by this headline solely on what they've heard about the capabilities of AI.

I don't think anyone is so stupid to believe current ai can solve everything.

And honestly, I didn't see any marketing material that would claim that.

You are both completely over estimating the intelligence level of "anyone" and not living in the same AI marketed universe as the rest of us. People are stupid. Really stupid.

I don't understand why this is so important, marketing is all about exaggerating, why expect something different here.

The Zoom CEO, that is the video calling software, wanted to train AIs on your work emails and chat messages to create AI personalities you could send to the meetings you're paid to sit through while you drink Corona on the beach and receive a "summary" later.

The Zoom CEO, that is the video calling software, seems like a pretty stupid guy?

Yeah. Yeah, he really does. Really.. fuckin'.. dumb.

Same genius who forced all his own employees back into the office. An incomprehensibly stupid maneuver by an organization that literally owes its success to people working from home.

Really then why are they cramming AI into every app and every device and replacing jobs with it and claiming they’re saving so much time and money and they’re the best now the hardest working most efficient company and this is the future and they have a director of AI vision that’s right a director of AI vision a true visionary to lead us into the promised land where we will make money automatically please bro just let this be the automatic money cheat oh god I’m about to

Those are two different things.

1) They are cramming AI everywhere because nobody wants to miss the boat and because it plays well on the stock market.

2) The people claiming it's awesome and that they're doing who-knows-what with it, replacing people, are mostly influencers and a few deluded people.

AI can help people in many different roles today, so it makes sense to use it. Even in roles where it is not particularly useful, it makes sense to prepare for when it is.

it makes sense to prepare for when it is.

Pfft, okay.

Most people do. It's just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.

Yet even on Lemmy people can't seem to make sense of these terms and are saying things like "LLM's are not AI"

Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "can ChatGPT prove the Riemann hypothesis?"

Even the models that pretend to be AGI are not. It's been proven.

Google Maps doesn't pretend to be good at chess. ChatGPT does.

A toddler can pretend to be good at chess but anybody with reasonable expectations knows that they are not.

Plot twist: the toddler has a multi-year marketing push worth tens if not hundreds of millions, which convinced a lot of people who don't know the first thing about chess that it really is very impressive, and all those chess-types are just jealous.

Have you tried feeding the toddler gallons of baby-food? Maybe then it can play chess

They’ve been feeding the toddler everybody else’s baby food and claiming they have the right to.

"If we have to ask every time before stealing a little baby food, our morbidly obese toddler cannot survive"

You're not wrong, but keep in mind ChatGPT advocates, including the company itself, are referring to it as AI, including in marketing. They're saying it's a complete, self-learning, constantly evolving Artificial Intelligence that has been improving itself since release... And it loses to a 4 KB video game program from 1979 that can only "think" 2 moves ahead.

In all fairness. Machine learning in chess engines is actually pretty strong.

AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).

https://www.chess.com/terms/alphazero-chess-engine

Sure, but machine learning like that is very different to how LLMs are trained and their output.

OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.

https://openai.com/index/planning-for-agi-and-beyond/

https://openai.com/index/elon-musk-wanted-an-openai-for-profit/

Not to even mention all the hype created by the techbros around it.

I think that's generally the point: most people think ChatGPT is this sentient thing that knows everything and... no.

Do they though? No one I've talked to thinks they're sentient: not my coworkers who use it for work, not my friends, not my 72-year-old mother.

Okay, I maybe exaggerated a bit, but a lot of people think it actually knows things, or is actually smart. Which... it's not... at all. It's just pattern recognition. Which was, I assume, the point of showing it can't even beat the goddamn Atari: because it cannot think or reason, it's all just copypasta and pattern recognition.

Articles like this are good because they expose the flaws of the AI and show that it can't be trusted with complex multi-step tasks.

It helps people who think AI is close to human-level see that it's not, and that it's missing critical functionality.


People already think ChatGPT is a general AI. We need more articles like this showing its ineffectiveness at being intelligent. Besides, it helps find the limitations of this technology, so that we can hopefully use them to argue against it being shoved into every single place.

I agree with your general statement, but in theory, since all ChatGPT does is regurgitate information back, and a lot of chess is memorization of historical games and positions, it might actually perform well. No, it can't think, but it can remember everything, so at some point that might tip the results in its favor.

Regurgitating an impression of, not regurgitating verbatim, that's the problem here.

Chess is 100% deterministic, so it falls flat.

I'm guessing it's not even hard to get it to "confidently" violate the rules.

I mean, OpenAI seems to forget it isn't.

I like referring to LLMs as VI (Virtual Intelligence from Mass Effect), since they merely give the impression of intelligence but are little more than search engines. In the end, all one is doing is displaying expected results based on a popularity algorithm. However, they do this inconsistently due to bad data in and limited caching.

LLMs are not built for logic.

And yet everybody is selling them to write code.

The last time I checked, coding required logic.

A lot of writing code is relatively standard patterns and variations on them. For most but the really interesting parts, you could probably write a sufficiently detailed description and get an LLM to produce functional code that does the thing.

Basically for a bunch of common structures and use cases, the logic already exists and is well known and replicated by enough people in enough places in enough languages that an LLM can replicate it well enough, like literally anyone else who has ever written anything in that language.

To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.

So LLM can do some code completion, filling out a bunch of boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages like "does this language want join as a method on a list with a string argument, or vice versa?"

Problem is this can be sometimes more annoying than it's worth, as miscompletions are annoying.

Fair point.

I liked the "upgraded autocompletion", you know, a completion based on the context, just before they pushed it too far with 20 lines of nonsense...

Now I am thinking of a way of doing the thing, and then I receive a 20-line suggestion.

So I am checking whether that makes sense, losing my momentum, only to realize the suggestion is calling shit that doesn't exist...

Screw that.

The amount of garbage it spits out in autocomplete is distracting. If it's constantly making me 5-10% less productive the many times it's wrong, it should save me a lot of time when it is right, and generally, I haven't found it able to do that.

Yesterday I tried to prompt it to change around 20 call sites for a function where I had changed the signature. Easy, boring, and repetitive, something that a junior could easily do. And all the models were absolutely clueless about it (using Copilot).

a decent chunk of coding is stupid boilerplate/minutia that varies

...according to a logic, which means LLMs are bad at it.

I'd say that those details that vary tend not to vary within a language and ecosystem, so a fairly dumb correlative relationship is enough to generally be fine. There's no way to use logic to infer that in language X you need to do mylist.join(string) but in language Y you need to do string.join(mylist), but it's super easy to recognize tokens that suggest those things and correlate them to the vocabulary that matches the context.
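
Concretely, the detail in question:

print(", ".join(["a", "b"]))   # Python: separator first -> 'a, b'
# JavaScript flips it: ["a", "b"].join(", ")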

Rinse and repeat for things like: do I need to specify a type, and what is the vocabulary for the best type for a numeric value; this variable that makes sense is missing a declaration; does this look like an actually new, distinct variable or just a typo of one that was already declared?

But again, I'm thinking mostly of what kind of sort of can work; my experience personally is that it's wrong so often as to be annoying and to get in the way of more traditional completion behaviors that play it safe, even though those help less, particularly for languages like Python or JavaScript.

All these comments asking "why don't they just have chatgpt go and look up the correct answer".

That's not how it works, you buffoons. It trains off of datasets long before it releases. It doesn't think. It doesn't learn after release; it won't remember things you try to teach it.

Really lowering my faith in humanity when even the AI skeptics don't understand that it generates statistical representations of an answer based on answers given in the past.

Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.

is like using a power tool as a table leg.

Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM "AI" tools.

And clearly that will backfire on them and they'll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.

If you believe LLMs are not good at anything then there should be relatively little to worry about in the long-term, but I am more concerned.

It's not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they're being applied to far more use cases than they are likely to be good at -- either because they're overhyped or our corporate lords and masters are just experimenting to find out what they're good at and what not. Some of these cases will be like chess, but others will be like code*.

(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality less-buggy code.)

I believe LLMs are good at some things

The problem is that they're being used for all the things, including a large number of tasks that they are *not* well suited to.

Yeah, we agree on this point. In the short term it's a disaster. In the long term, assuming AI's capabilities don't continue to improve at the rate they have been, our corporate overlords will only replace the people it's actually worth replacing with AI.

I'm often impressed at how good chatGPT is at generating text, but I'll admit it's hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate

ChatGPT is playing Anarchy Chess

Yeah! I’ve loved watching Gothem Chess’ videos on these. Always have been good for a laugh.

It can be bad at the very thing it's designed to do. It can repeat phrases often, something that isn't great for writing. But why wouldn't it? It's all about probability, so commonly said things will pop up more unless you adjust the variables that determine the randomness.

I mean, that 2600 Video Chess was built from the ground up to play a good game of chess with variable difficulty levels. I bet there are days or games when Fischer couldn't have beaten it. Just because a thing is old and less capable than the modern world does not mean it's bad.

A strange game. How about a nice game of Global Thermonuclear War?

No thank you. The only winning move is not to play

I've heard the only way to win is to lock down your shelter and strike first.

Lmao! 🤣 that made me spit!!

Can i fistfight ChatGPT next? I bet I could kick its ass, too :p

There was a chess game for the Atari 2600? :O

I wanna see them W I D E pieces.


I bet that's a slightly unfair representation of what it actually looked like. Graphics back then were purposely designed for how they would look on CRT TVs, which add a lot of specific distortions to the image. So taking a screenshot of a game running in an emulator, without adding a high-quality CRT filter to the image, will be a very untrue representation of what the game actually looked like.

(Don't get me wrong, I'm not saying it actually looked great when displayed correctly, but i am saying it would've looked considerably better than this emulator screenshot)

Those are some funky looking knights lol

There's some very odd pieces on high dollar physical chess sets too.

I'm annoyed the pieces are bottom adjusted...

Can confirm.

And if you play it on expert mode, you can leave for college and get your degree before it’s your turn again.

Here you go (online emulator): https://www.retrogames.cz/play_716-Atari2600.php

WTF? I just played that just long enough for my queen to take their queen, and it turned my queen into a rook?

Is that even a legit rule in any variation of chess rules?

I wasn't aware of that either, now I'm kinda curious to try to find it in my 512 Atari 2600 ROMs archive..

I suppose it's an interesting experiment, but it's not that surprising that a word prediction machine can't play chess.

Because people want to feel superior because a ChatBot they don't know how to use can't count the number of "r"s in the word "strawberry", lol

Yeah, just because I can't count the number of r's in the word strawberry doesn't mean I shouldn't be put in charge of the US nuclear arsenal!

That is more a failure of the person who made that decision than a failing of ChatBots, lol

Agreed, which is why it's important to have articles out in the wild that show the shortcomings of AI. If all people read is all the positive crap coming out of companies like OpenAI then they will make stupid decisions.

Anyone who puts a chatbot anywhere is definitely a failure, yeah.

2025 Mazda MX-5 Miata 'got absolutely wrecked' by Inflatable Boat in beginner's boat racing match — Mazda's newest model bamboozled by 1930s technology.

ChatGPT has been, hands down, the worst AI coding assistant I've ever used.

It regularly suggests code that doesn't compile or isn't even for the language.

It generally suggests a chunk of code that is just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

I don't use it for coding. I use it sparingly really, but want to learn to use it more efficiently. Are there any areas in which you think it excels? Are there others that you'd recommend instead?

Use Gemini (2.5) or Claude (3.7 and up). OpenAI is a shitshow

I find it really hit and miss. Easy, standard operations are fine but if you have an issue with code you wrote and ask it to fix it, you can forget it

I've found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.

Still not perfect, but night and day difference.

I feel like ChatGPT didn't focus on coding and instead focused on mainstream, but I am not an expert.

Gemini will get basic C++, probably the best documented language for beginners out there, right about half of the time.

I think that might even be the problem, honestly: a bunch of new coders post bad code, and it's fixed in the comments, but the LLM CAN'T realize that.

I like tab coding, writing small blocks of code that it thinks I need. It's on point almost all the time. This speeds me up.

Bingo. If anything, what you're finding is that the people bitching are the same people who, if given a bike, wouldn't know how to ride it, which is fair. Some people understand quicker how to use the tools they are given.

Edit - a poor carpenter blames his tools.

It's the ideal help for people who shouldn't be employed as programmers to start with.

I had to explain hexadecimal to somebody the other day. It's honestly depressing.

my favorite thing is to constantly be implementing libraries that don't exist

You're right. That library was removed in ToolName [PriorVersion]. Please try this instead.

*makes up entirely new fictitious library name*

Oh man, I feel this. A couple of times I've had to field questions about some REST API I support and they ask why they get errors when they supply a specific attribute. Now that attribute never existed, not in our code, not in our documentation, we never thought of it. So I say "Well, that attribute is invalid, I'm not sure where you saw to do that". They get insistent that the code is generated by a very good LLM, so we must be missing something...

It's even worse when AI soaks up some project whose APIs are constantly changing. Try using AI to code against jetty for example and you'll be weeping.

All AIs are the same. They're just scraping content from GitHub, Stack Overflow, etc., with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They're super handy for basic code snippets, but anyone using them for anything remotely complex or nuanced will regret it.

I've used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.

I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boat load of time in those instances.

One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:

  • It left database secrets in the code
  • The design of the website meant that it was impossible to operate securely
  • The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
  • It did not break the code into multiple files. It piled everything into a single file

That's because it doesn't know what it's saying. It's just blathering out each word as what it estimates to be the likely next word given past examples in its training data. It's a statistics calculator. It's marginally better than just smashing the auto fill on your cell repeatedly. It's literally dumber than a parrot.

Parrots are actually intelligent though.

Yeah, but not when it comes to understanding human speech. There's a reason that repeating words without really understanding them is called parroting. Gray parrots are the smartest, and some can actually understand language a little bit, making them smarter than chat, which is just high-tech guessing without comprehension.

I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself

I wouldn’t use it to generate stuff though

Hardly surprising. LLMs aren't -thinking-, they're just shitting out the next token for any given input of tokens.

That's exactly what thinking is, though.

An LLM is an ordered series of parameterized/weighted nodes which are fed a bunch of tokens; millions of calculations later, out comes the next token to append, and the process repeats. It's like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness ("temperature") when choosing the next token so the responses are not identical each time.

But it is not thinking. Not even remotely so. It's a simulacrum. If you want to see this, run ollama with the temperature set to 0, e.g.:

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf

You will get the same answer every single time.
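
To make the "temperature" knob above concrete, here is a minimal sketch in Python of how sampling the next token works. The logits are made up for illustration; a real model produces one score per vocabulary token, but the mechanics are the same:

import math, random

def sample_next_token(logits, temperature=0.8):
    # At temperature 0, fall back to greedy decoding: always take the
    # highest-scoring token, which is why a temperature-0 run is deterministic.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise, scale the scores and sample from the softmax distribution.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores for a 4-token vocabulary:
print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0))    # always 0
print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=1.0))  # varies run to run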

I know what an LLM is doing. You don't know what your brain is doing.

I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."

Well, the first and most obvious way to show that AI is bad is to show AI being bad. If it provides this much low-hanging fruit for the demonstration... that just further emphasizes the point.

It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.

It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.

And yet the company offers no explanation for how, exactly, they're going to get wood to do that.

You get 2 triangles in a single square, mate...

CHECKMATE!

The press release where OpenAI said we'd never need chess players again

That's just clickbait in general these days lol

Next, pit ChatGPT against 1K ZX Chess in a ZX81.

If you don't play chess, the Atari is probably going to beat you as well.

LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says "this answer is better than that one" enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.
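
To make that "this answer is better than that one" step concrete, here's a minimal sketch of the pairwise preference loss commonly used to fit reward models. This is the generic Bradley-Terry formulation; nothing here is specific to how OpenAI actually trained ChatGPT:

import math

def preference_loss(score_preferred, score_rejected):
    # The reward model assigns each answer a scalar score; the loss shrinks
    # as the human-preferred answer outscores the rejected one.
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, 0.5))  # ~0.20: model already agrees with the human
print(preference_loss(0.5, 2.0))  # ~1.70: model disagrees, large training signal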

If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.

It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn't compete well with a serious chess solver.
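
As a sketch of what such a tool call could look like on the backend, assuming the python-chess package and a local Stockfish binary (purely illustrative; the article doesn't say any such tooling was used):

import chess
import chess.engine

def best_move(fen, think_time=0.1):
    # The LLM's only job would be emitting something like {"fen": "..."}
    # as a structured tool call; the actual chess comes from the engine.
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return result.move.uci()
    finally:
        engine.quit()

print(best_move(chess.STARTING_FEN))  # e.g. "e2e4"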

This made my day

Get your booty on the floor tonight.

Okay, but could ChatGPT be used to vibe code a chess program that beats the Atari 2600?

no.

the answer is always, no.

The answer might be no today, but always seems like a stretch.

The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!

Neither of those things is marketed as being artificially intelligent.

Marketers aren't intelligent either, so I see no reason to listen to them.

You’re not going to slimeball investors out of three hundred billion dollars with that attitude, mister.

Are either of those marketed as powerful AI?

It's not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy to sell you ads or product-place some shit for you in the future.

You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

No, more like: "Your marketing team, sales team, the news media at large, and random hype men all insist your orange machine works amazingly on any fruit if you know how to use it right. It didn't work on my strawberries even when I gave it all the help I could, and it was outperformed by my 40-year-old strawberry machine. Please stop selling the idea that it works on all fruit."

This study is specifically a counter to the constant hype that these LLMs will revolutionize absolutely everything, and the constant word choices used in discussion of LLMs that imply they have reasoning capabilities.

this is because an LLM is not made for playing chess

Is anyone actually surprised at that?

This isn't the strength of GPT-4o. The model has been optimised for tool use as an agent. That's why it's so good at image gen relative to other models: it uses tools to construct an image piece by piece, similar to a human. Also probably poor system prompting. An LLM is not a universal thinking machine; it's a universal process machine. An LLM understands the process and uses tools to accomplish the process, hence its strengths in writing code (especially as an agent).

It's similar to how a monkey can be far better at remembering a sequence of numbers than a human ever could be, yet is totally incapable of even comprehending writing numbers down.

Do you have a source for that re:monkeys memorizing numerical sequences? What do you mean by that?

LLMs useless, confirmed once again

So, it fares as well as the average schmuck, proving it is human

/s

They used ChatGPT 4o, instead of using o1 or o3.

Obviously it was going to fail.

Other studies (not all chess based or against this old chess AI) show similar lackluster results when using reasoning models.

Edit: When comparing reasoning models to existing algorithmic solutions.

Deleted by moderator

If you're writing a novel simulation for a non-trivial system, it might be best to learn to code so you can identify any issues in the simulation later. It's likely that LLMs do not have the information required to generate good code for this context.

You're right. I'm not relying on this shit. It's a tool. Fucking up the GUI is fine, but making any changes I don't research to my simulator core could fuck up my whole project. It's a tool that likes to cater to you, and you have to work around that; really, not too different from how much pressure you put on a grinder. You gotta learn how to work it. And your sentiment is correct. My lack of programming experience is a big hurdle I have to account for and make safeguards against. It would be a huge help if I started from the basics. But, I mean, I also can't rub two sticks together to heat my home. Doesn't mean I can't use this tool to produce reliable results.

The tough guys and sigma males of yester-year used to say things like "If I were homeless, I would just bathe in the creek using the natural animal fats from the squirrel I caught for dinner as soap, win a new job by explaining my 21-days-in-7 workweek ethos, and buy a new home using my shares in my dad's furniture warehouse as collateral against the loan. It's not impossible to get back on your feet."

But with the advent of AI, which, actually, is supposed to do things for you, it's completely different now.

I also can't rub two sticks together to heat my home.

Dude, that fucking sucks. What is wrong with you?

You're so fucking silly. You gonna study cell theory to see how long you should keep vegetables in your fridge? Go home. Save science for people who understand things.

Save science for people who understand things.

Does this not strike you as the least bit ironic?

Isn't the Atari just a game console, not a chess engine?

Like, Wikipedia doesn't mention anything about the Atari 2600 having a built-in chess engine.

If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.

As it stands, it's just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other one isn't. No matter what you think of ChatGPT, that's not a fair comparison.


Edit: Given the existing replies and downvotes, I think this comment is being misunderstood. I would like to try clarifying again what I meant here.

First of all, I'd like to ask if this article is satire. That's the only way I can make sense of the replies that criticized me on grounds of the marketing aspect of LLMs (when the article never brings up that topic itself, nor did I). If this article is just some tongue-in-cheek thing about holding LLMs to the standards they're advertised at, I can understand both the article and the replies I've gotten. But the article never suggests so itself, so my assumption when writing my comment was that it is serious.

The Atari is hardware. It can't play chess on its own. To be able to, you need a game for it which is inserted. Then the Atari can interface with the cartridge and play the game.

ChatGPT is an LLM. Guess what, it also can't play chess on its own. It also needs to interface with a third party tool that enables it to play chess.

Neither the Atari nor ChatGPT can directly, on their own, play chess. This was my core point.

I merely pointed out that it's unfair that one party in this comparison is given the tool it needs (the cartridge), but the other party isn't.
Unless this is satire, I don't see how marketing plays a role here at all.

GPTs which claim to use a stockfish API

Then the actual chess isn't the LLM. If you're going with Stockfish, then the LLM doesn't add anything; Stockfish is doing everything.

The whole point is that the marketing rage insists LLMs can do all kinds of stuff, doubling down on this with the branding of some approaches as "reasoning" models, which are roughly "similar to 'pre-reasoning', but forcing use of more tokens on disposable intermediate generation steps". With this facet of LLM marketing, the promise would be that the LLM can "reason" itself through a chess game without particular enablement. In practice, people trying to feed gobs of chess data to an LLM end up with an LLM that doesn't even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.

Then the actual chess isn't the LLM.

And neither did the Atari 2600 win against ChatGPT. Whatever game they ran on it did.

That's my point here. The fact that neither Atari 2600 nor ChatGPT are capable of playing chess on their own. They can only do so if you provide them with the necessary tools. Which applies to both of them. Yet only one of them was given those tools here.

Fine: a chess engine capable of running on 1970s electronics (affordable even for the time) will best what marketing folks would have you think is an arbitrarily capable "reasoning" model running on top-of-the-line 2025 hardware.

You can split hairs about "well actually, the 2600 is hardware and a chess engine is the software" but everyone gets the point.

As to assertions that no one should expect an LLM to be a chess engine: well, tell that to the industry that is asserting LLMs are now "reasoning" and provide a basis to replace most of the labor pool. We need stories like this to calibrate expectations in a way common people can understand.

The Atari 2600 is just hardware. The software came on plug-in cartridges. Video Chess was released for it in 1979.

I can't help but think of this:

Conversation with ChatGPT version 3.5. Human: "Best chess player of all time whose name starts with B". ChatGPT: "The greatest chess player of all time whose name starts with 'B' is Bagnus Barlsen."

(There’s a guy named “Magnus Carlsen” who is arguably the best chess player of all time.)

ChatGPT, thinking to itself: "Got'em!"

Edit, because it's a personal peeve: anthropomorphizing statistical models is silly, they don't think or feel, but I thought this was funny.

However, an area where "AI" can beat the Atari is energy consumption: "AI" will consume much more energy only to be mostly wrong. What a feat!

The news flow regarding artificial intelligence seems to swing between extremes. Sometimes AI can astound with its capabilities, and other times it might be laughable, or even dangerously inadequate.

It's funny how all the news outlets (incl. this one) go for the sensational take rather than this somber one. They should just rewrite this paragraph every time and not tack it on at the end.