
Kyle Brown :DBFHBear:
@Wearwolf@kind.social

@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca the argument you would use is actually that it's a transformative work. The process of the content going through the machine makes it unique.

The problem there is that LLMs can't have original ideas. They can't rephrase or paraphrase on their own; all of that transformation comes from combining your words with other people's words.

So the defence is that you have stolen from so many people that it's not obvious what was stolen from whom.

David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@Wearwolf@kind.social @ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca

The legal fig leaf here might be that quoting small amounts of other people's work is protected by fair use (which is an affirmative defence). There are two problems with this:

First, as a few of the lawsuits have shown, the right prompt can make the model reproduce large amounts of the original text, far beyond the amount allowed for quotation. If it can be extracted from the model, then it must be contained within the model, and so the model is a derivative work.

Second, quoting more than a very small amount usually requires attribution. This is where non-US laws may be stricter. In the UK and EU, there is a notion of 'moral rights'. You may have read something like 'the moral rights of the author to be associated with the work...' in the front of a book. Moral rights are somewhat different to copyright: even where you have the right to use something that I have written, you do not have the right to claim that you wrote it. The fact that LLMs are not able to provide attribution risks running afoul of this.