
Someone asks them to explain what they mean, and they follow up with some mealy-mouthed acknowledgement that they are the same in principle, but LLMs are more complicated because they are the result of "more than 50 years of research in neural architecture" and because obscene amounts of compute/data are thrown at them.

It's a bit like saying that multiplying 1Mx1M matrices is orders of magnitude more complex than multiplying 10x10 ones. (Or substitute your favorite linear-algebra operation.)
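To put rough numbers on that analogy, here is a minimal sketch assuming the plain schoolbook algorithm (the function names are made up for illustration, and real libraries use faster methods). The code is identical at every size; only the operation count changes.

```python
def matmul(a, b):
    """Schoolbook product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scalar_multiplications(n):
    """The schoolbook method does n multiplications for each of n * n output entries."""
    return n ** 3


identity = [[1, 0], [0, 1]]
m = [[1, 2], [3, 4]]
print(matmul(m, identity))                # same function, whatever the size
print(scalar_multiplications(10))         # 1000
print(scalar_multiplications(1_000_000))  # 10**18
```

Far more work at the larger size, but not a different kind of procedure.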

SnoopJ
@SnoopJ@hachyderm.io

I would argue precisely the opposite: Markov language models are more complex, because building one generally means using a more heavily structured corpus representation than what's done with LLMs.

I.e. when you build a Markov n-(word)gram model, you explicitly bake the concept of "word" into your representation. With LLMs, you do no such transformation, and it's generally considered a faux pas to do "feature engineering" with convolutional networks.

The LLM's data model is less complex in this sense.
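For a concrete picture of what "baking in the word" looks like, here is a minimal sketch of a word-bigram Markov model. The function names and the toy corpus are made up for illustration. The `split()` call is where the notion of "word" gets hard-coded.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count word-to-next-word transitions in `text`."""
    # Splitting on whitespace is the modelling decision: the unit is a "word".
    words = text.lower().split()
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions


def generate(transitions, start, length, seed=0):
    """Walk the chain from `start`, sampling each next word by its observed count."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        candidates = transitions.get(output[-1])
        if not candidates:
            break  # dead end: this word was never followed by anything
        choices, weights = zip(*candidates.items())
        output.append(rng.choices(choices, weights=weights)[0])
    return output


# Toy corpus, hypothetical and far too small to produce interesting text.
corpus = "the cat sat on the mat and the cat ate the fish"
model = build_bigram_model(corpus)
sample = generate(model, "the", 8)
print(" ".join(sample))
```

Everything the model can ever say is a chain of word pairs it has already seen, and that constraint comes from the representation chosen up front.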

SnoopJ
@SnoopJ@hachyderm.io

The commenter also says that Markov models are a good way to talk about what LLMs are doing, but even this is something I disagree with.

You don't need to talk about another kind of model at all. You only need to establish that language has statistical patterns (I like Zipf's Law for this), and then point out that with obscene amounts of compute and data you can brute-force solutions to statistical problems. It's one of the original reasons for building computers, for crying out loud.
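If you want to show the Zipf's Law pattern rather than just assert it, a rank-frequency table is enough. This is a minimal sketch with made-up function names. The toy string is only there so the code runs and is far too small to show the pattern, so swap in any large plain-text corpus.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def print_zipf_table(text, top=10):
    """Under Zipf's Law, rank * count stays roughly constant down the table."""
    for rank, word, count in rank_frequency(text)[:top]:
        print(f"{rank:>4}  {word:<12} {count:>6}  rank*count={rank * count}")


# Toy input only; substitute a large corpus to see the pattern.
toy_text = "the cat sat on the mat and the cat ate the fish"
print_zipf_table(toy_text, top=3)
```

On a large enough corpus the last column should hover around the same order of magnitude for the top ranks, which is the statistical regularity the argument needs.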

SnoopJ
@SnoopJ@hachyderm.io

3-panel meme, edits shown in square brackets.

1) Donnie Darko tells his therapist "I made a new [ML technique]"

2) The therapist asks "Real or [massive stochastic gradient descent?]"

3) Donnie responds "[massive stochastic gradient descent]"

SnoopJ
@SnoopJ@hachyderm.io

3-panel meme, edits in square brackets, Scooby Doo and the gang have captured a cloth ghost, onto whose hood is pasted a diagram of a deep non-recurrent neural architecture

1) Fred says "Okay gang, let's see [what deep learning] really is."

2) Fred removes the hood

3) Underneath the hood, a surface plot of a "valley" is shown. The gang says "[Convex Optimization??]"