David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca

I can read a blog post, because it is placed on the Internet with a license that permits me to read it.

I can quote from it if I write something else, because that is fair use.

If I download a copy and store it on my computer, that may not technically be allowed, but if I do not distribute it then there are no statutory damages and it would be hard to argue actual damages, so no one cares.

If I extract a substantial portion of it into some other work, that new work is a derived work of the original and I must have a license to create it.

If I create a new post having read some original and it is clearly (as judged by a court) a copy of the original, then I require a license. For example, Paramount grants an explicit license for non-commercial fan fiction set in the Star Trek universe because it would be illegal to distribute otherwise. This is why Axanar could not be completed: they tried to make it a commercial project and it was not covered by the non-commercial license grant. There is an explicit exemption here for parody, but it must be a parody of the thing being copied.

All of this is well established in copyright law.

The claim by LLM trainers is that training an LLM is creating a new creative work (if it is not, then the weights cannot be covered by copyright), but that it is not a derived work of the original. The last part of this claim is tenuous because LLMs can generate things that are identical to the originals and they can also output things that would be regarded as a derived work by the court.