@mttaggart@infosec.exchange
Brutal:
The findings across task, length, and format generalization experiments converge on a conclusion: [Chain-of-Thought reasoning] is not a mechanism for genuine logical inference but rather a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training. When pushed even slightly beyond this distribution, its performance degrades significantly, exposing the superficial nature of the "reasoning" it produces.
@NosirrahSec@infosec.exchange
@mttaggart@infosec.exchange The thing is, this isn't even surprising to anyone who studies these models.
It's just a simple fact. You can only force math to look like it's "thinking" on the surface; any deeper glance shows it's just a mess.