
Kieran Healy
@kjhealy@mastodon.social

I had the Blueberry talk with GPT5. https://kieranhealy.org/blog/archives/2025/08/07/blueberry-hill/


C++ Wage Slave
@CppGuy@infosec.space

@kjhealy@mastodon.social

The funny thing is that models that are orders of magnitude smaller than GPT5 get this question right. I'll paste in some responses below. They're sorted by size, starting with the smallest. The only outliers are codegemma, which is hopelessly and persistently wrong, and Deepseek-R1, which gets the answer right but endlessly second-guesses itself.

🧡🧡
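For reference, the ground truth the thread is arguing about is trivial to check in code. A minimal Python sketch, assuming the prompt was the usual "how many b's are in blueberry?" form:

```python
# The question the thread assumes: how many b's in "blueberry"?
word = "blueberry"
print(word.count("b"))  # 2 -- one at the start, one in "berry"
```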

ekari
@ekari@ieji.de

@kjhealy@mastodon.social Shit. GPT5 is actually Shodan, that's why it's counting letters funny.

"L-l-l-look at you, hacker..."
"B-blueberry"

We cannot challenge a perfect immortal machine.

Noodlemaz
@noodlemaz@med-mastodon.com

@kjhealy@mastodon.social Now why can't people imagine this problem in everything else you prompt it for?
It doesn't do facts.
It doesn't reproduce info.
It's just a probability machine that makes linguistic sense. People need to stop trusting it, yesterday.

And ideally stop using it.
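A minimal sketch of what "a probability machine" means here: the model repeatedly samples the next token from a probability distribution. The distribution below is invented for illustration; a real LLM computes one over a vocabulary of roughly 100k tokens.

```python
import random

# Hypothetical next-token probabilities after some prompt -- all numbers
# invented for illustration; a real LLM computes these with a neural net.
next_token_probs = {"two": 0.45, "three": 0.40, "3": 0.10, "b's": 0.05}
tokens, weights = zip(*next_token_probs.items())

# Sampling: plausible-sounding output, with no notion of factual truth.
print(random.choices(tokens, weights=weights, k=1)[0])
```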

Sablebadger
@Sablebadger@dice.camp

@kjhealy@mastodon.social The secret is... it's not intelligent. It's not intelligence, artificial or otherwise. It's a big giant spreadsheet of connections that lets it make references between commonly aligned things. It doesn't "know" anything.

Tech bros keep claiming otherwise, but it's just a bunch of connections. It can do some pretty cool things, for sure, and I use it at work from time to time, but I don't pretend it's thinking real thoughts.

The hype is what is killing us. Hype for dollars.
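A toy version of that "spreadsheet of connections" idea, for concreteness: words that appear in similar contexts end up with similar rows, so the system can relate "commonly aligned things" without knowing anything about them. All numbers below are invented.

```python
import math

# Invented co-occurrence counts standing in for the "spreadsheet":
# each row records how often a word appears near some context words.
cooccur = {
    "blueberry":  {"fruit": 8, "pie": 5, "spelling": 0},
    "strawberry": {"fruit": 9, "pie": 6, "spelling": 0},
}

def cosine(a, b):
    """Similarity of two rows: 1.0 means identical 'connections'."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# High similarity from aligned contexts alone -- no "knowing" involved.
print(round(cosine(cooccur["blueberry"], cooccur["strawberry"]), 3))
```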

Le Monolecte 😷🤬🐧 :oc:
@Monolecte@framapiaf.org

@kjhealy@mastodon.social

Conversational AI is just a big, uncultured boor: a liar, and of course completely sure of itself in its bad faith.

You can see who the models are.

gallaugher
@gallaugher@mastodon.world

@kjhealy@mastodon.social @b3ll@mastodon.social Curiously, 4o got it right and 5 got it wrong.

sashk 🇺🇦
@sashk@masto.nyc

@kjhealy@mastodon.social Well, I'm probably too lazy to type a lot of words in prompts, but it found the mistake on the second attempt.

CodingPanic
@codingpanic@mastodon.social

@kjhealy@mastodon.social I wonder if this was fixed? I can't reproduce it…

Justin Derrick
@JustinDerrick@mstdn.ca

@kjhealy@mastodon.social I mean... It had to happen, especially after the strawberry incident.

How can hundreds of billions of dollars be invested in this tripe?

George Liquor, American
@liquor_american@universeodon.com

@sashk@masto.nyc @kjhealy@mastodon.social This is the way the world ends
Not with a bang but a "my bad"

Justin Derrick
@JustinDerrick@mstdn.ca

@kjhealy@mastodon.social I don't know what I was expecting when I tried this for myself with a different LLM...

trenchworms
@trenchworms@eldritch.cafe

@JustinDerrick@mstdn.ca @kjhealy@mastodon.social I suspect that GPT-5 has been explicitly trained to defeat the embarrassing "There are 2 R's in Strawberry" answer, but as a result it is now trained to answer questions of the format "How many _'s are there in _____BERRY?" with "3".

The unintended side effect, of course, is that it will answer "3" incorrectly, and if then asked to justify itself it will pattern-match an attempt to do so, leading to this kind of incomprehensible nonsense. These things are fundamentally just matching patterns in their training data -- nothing more. That's why GPT-4 gets the blueberry question right.

Another example of this: if you asked older models "What weighs more, 1kg of feathers or 10kg of bricks?", they would say "They weigh the same", because the bulk of their training data of the form "What weighs more, X of feathers or X of bricks?" was the riddle they were pretending to answer.
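A concrete mechanism consistent with this: models operate on multi-character tokens, not letters, so letter counts are never directly present in the input. A minimal sketch using OpenAI's tiktoken library (assuming it's installed; GPT-5's actual tokenizer isn't public, so GPT-4's cl100k_base encoding stands in):

```python
import tiktoken

# A BPE tokenizer splits words into multi-letter chunks; a model that
# sees only token IDs never directly "sees" the individual letters.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("blueberry")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in tokens]
print(pieces)  # multi-letter chunks, e.g. ['blue', 'berry']
```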