Brutkey

SnoopJ
@SnoopJ@hachyderm.io

Someone asks them to explain what they mean, and they follow up with some mealy-mouthed acknowledgement that they are the same in principle, but LLMs are more complicated because they are the result of "more than 50 years of research in neural architecture" and because obscene amounts of compute/data are thrown at them.

It's a bit like saying that multiplying 1Mx1M matrices is orders of magnitude more complex than multiplying 10x10 ones. (Or substitute your favorite linear-algebra operation.)
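A minimal sketch of the analogy, using a hypothetical `matmul` helper written for illustration (not taken from any library): the procedure is identical for any size, and only the amount of work changes.

```python
def matmul(a, b):
    """Naive matrix product of two lists-of-lists; the same code for any size."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    # The algorithm never changes with size; only the loop bounds do.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def identity(n):
    """n x n identity matrix, used here only to build test inputs."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


small = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
larger = matmul(identity(10), identity(10))
```

A 1Mx1M input would run through exactly the same three nested loops; it would just need vastly more time and memory.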

SnoopJ
@SnoopJ@hachyderm.io

I would argue precisely the opposite: Markov language models are more complex, because building one generally means using a more complex corpus representation than what's done with LLMs.

I.e. when you build a Markov n-(word)gram model, you explicitly bake the concept of "word" into your representation. With LLMs, you do no such transformation, and it's generally considered a faux pas to do "feature engineering" with neural networks in general.

The LLM's data model is less complex in this sense.
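To make the "baked-in word" point concrete, here is a rough sketch of a word-bigram Markov model. The function names and the toy corpus are made up for illustration, and splitting on whitespace is just one assumed definition of "word". Note that the very first step is a hand-made decision about what the units of language are.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count which word follows which in the corpus."""
    # The explicit modelling choice: the corpus is a sequence of
    # lowercase, whitespace-delimited "words".
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate(model, start, length, seed=0):
    """Walk the chain, sampling each next word in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return out


corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
sample = generate(model, "sat", 3)
```

Everything the model can ever "know" is expressed in terms of those word units; change the definition of a word and you have a different model.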


SnoopJ
@SnoopJ@hachyderm.io

The commenter also says that Markov models are a good way to talk about what LLMs are doing, but even this is something I disagree with.

You don't need to talk about another kind of model at all, you only need to establish that language has statistical patterns (I like Zipf's Law for this), and then point out that with obscene amounts of compute and data you can brute-force solutions to statistical problems. It's one of the original reasons for building computers, for crying out loud.
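For anyone who wants to see the Zipf's Law pattern for themselves, a small sketch along these lines is enough. The `rank_frequency` helper is hypothetical, and the inline text is only a placeholder: the effect only shows up on a reasonably large corpus, so swap in any long plain-text document you have on hand.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Placeholder input; far too small to show the law itself.
table = rank_frequency("a a a a b b c")

for rank, word, count in table:
    # Under Zipf's Law, rank * count stays roughly constant across the
    # most frequent words of a large natural-language corpus.
    print(rank, word, count, rank * count)
```

No language model of any kind is involved here, just counting.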
