David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca

You are assuming that you are allowed to. Analogies are useful only if they represent the thing that they are being used to explain.

What are the services that you are selling? If you memorise large chunks of existing web sites and store that information in your brain, that's permitted, but if you then reproduce them verbatim or in a form that is a derived work, it is not.

Here are a couple of alternative versions:

I'm not allowed to download everything from the web, create a database of a compressed form of it, and then sell access to that database. Why should I be allowed to if the database is a machine-learning model?

I'm not allowed to download a film and recompress it with a lossy compression algorithm. Why should I be allowed to if the lossy compression algorithm is a machine-learning model?


Erik Jonker
@ErikJonker@mastodon.social

@david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca ...for strictly personal use, downloading/using/analysing public data is allowed, at least in the EU. Selling, of course, is not.

Erik Jonker
@ErikJonker@mastodon.social

@david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca ...in the end it is for judges to decide, but I find it completely irrational that I should not be allowed to process/analyse with algorithms, for my personal (!) use, what I can read publicly on the internet. People with disabilities also use machine readers, and we allow those too.

David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca

I can read a blog post, because it is placed on the Internet with a license that permits me to read it.

I can quote from it if I write something else, because that is fair use.

If I download a copy and store it on my computer, that may not technically be allowed, but if I do not distribute it then there are no statutory damages and it would be hard to argue actual damages, so no one cares.

If I extract a substantial portion of it into some other work, that new work is a derived work of the original and I must have a license to create it.

If I create a new post having read some original and it is clearly (as judged by a court) a copy of the original, then I require a license. For example, Paramount grants an explicit license for non-commercial fan fiction set in the Star Trek universe because it would be illegal to distribute otherwise. This is why Axanar could not be completed: they tried to make it a commercial project and it was not covered by the non-commercial license grant. There is an explicit exemption here for parody, but it must be a parody of the thing being copied.

All of this is well established in copyright law.

The claim by LLM trainers is that training an LLM creates a new creative work (if it does not, then the weights cannot be covered by copyright), but that this work is not a derived work of the original. The last part of this claim is tenuous because LLMs can generate things that are identical to the originals, and they can also output things that a court would regard as derived works.