
SnoopJ
@SnoopJ@hachyderm.io

I would argue precisely the opposite: Markov language models are more complex, because building one generally means using a more structured representation of the corpus than what's done with LLMs.

I.e. when you build a Markov n-(word)gram model, you explicitly bake the concept of "word" into your representation. With LLMs, you do no such transformation, and it's generally considered a faux pas to do that kind of "feature engineering" with deep networks in general.

The LLM's data model is less complex in this sense.
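To make the contrast concrete, here's a minimal sketch of a word-level Markov bigram model (Python is an assumption on my part; the thread names no language). Note how the very first step, splitting on whitespace, hard-codes "word" into the representation before any statistics are gathered:

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    # Splitting on whitespace is the hand-engineered step: the
    # concept of "word" is baked into the representation up front.
    words = text.split()
    model = defaultdict(list)
    for prev, curr in zip(words, words[1:]):
        model[prev].append(curr)
    return model

def generate(model, start, n=10):
    # Walk the chain: sample each next word from the successors
    # observed after the current word in the corpus.
    out = [start]
    for _ in range(n):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(random.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat slept on the mat"
print(generate(build_bigram_model(corpus), "the"))
```

An LLM pipeline, by contrast, learns its subword tokenization from data rather than starting from a hand-chosen notion of "word".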

SnoopJ
@SnoopJ@hachyderm.io

The commenter also says that Markov models are a good way to talk about what LLMs are doing, but even this is something I disagree with.

You don't need to talk about another kind of model at all, you only need to establish that language has statistical patterns (I like Zipf's Law for this), and then point out that with obscene amounts of compute and data you can brute-force solutions to statistical problems. It's one of the original reasons for building computers, for crying out loud.
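As a rough illustration of the Zipf's Law point, here's a short sketch (again assuming Python; "corpus.txt" is a hypothetical filename standing in for any sizable text). Rank words by frequency and the counts fall off roughly as 1/rank, so rank × frequency stays roughly constant down the list:

```python
from collections import Counter

# "corpus.txt" is a placeholder: any sizable plain-text file will do.
with open("corpus.txt", encoding="utf-8") as f:
    counts = Counter(f.read().lower().split())

# Zipf's Law: the rank-r word's frequency falls off roughly as 1/r,
# so the product rank * freq should stay in the same ballpark.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:>2}  {word:<15} freq={freq:<8} rank*freq={rank * freq}")
```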

