@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca
You are assuming that you are allowed to. Analogies are useful only if they represent the thing that they are being used to explain.
What are the services that you are selling? If you memorise large chunks of existing web sites and store that information in your brain, that's permitted, but if you then reproduce them verbatim, or in a way that constitutes a derived work, that is not.
Here are a couple of alternative versions:
I'm not allowed to download everything from the web, create a database of a compressed form of it, and then sell access to that database. Why should I be allowed to if the database is a machine-learning model?
I'm not allowed to download a film and recompress it with a lossy compression algorithm. Why should I be allowed to if the lossy compression algorithm is a machine-learning model?
@david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca ...for strictly personal use, downloading/using/analysing public data is allowed, at least in the EU. Selling it, of course, is not.
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca here's a hint
COPY right
@Wearwolf@kind.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca for strictly personal, non-commercial use, copyright law gives a lot of freedom.
@ErikJonker@mastodon.social @Wearwolf@kind.social @chris@mstdn.chrisalemany.ca
Try creating copies of music CDs or movie DVDs for strictly personal, non-commercial use and see what the EUCD says about your legal liability.
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca it's not personal, non-commercial use though
LLMs work by learning what words are associated with what context, and then spitting those words out again when prompted with a similar context.
They are regurgitation engines: they can only spit out, i.e. produce a copy of, the content that went into them.
It would be illegal to sell collages of people's Instagram posts without their permission. LLMs are that, but with people's words.
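(A toy illustration of that regurgitation framing, if it helps: the sketch below is a bigram Markov chain in Python, not a real transformer, and the training text is invented, but it shows the basic "learn which words follow which context, then spit them back out" loop.)

import random
from collections import defaultdict

# Toy bigram "language model": it can only ever emit words, and
# word-to-word transitions, that appeared in its training text.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Learn which words follow which context (here: the previous word).
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start, length=8):
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)  # "spit out" a word seen in this context
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the dog sat on the mat" - only ingested words come out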
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca the argument you would use is actually that it's a transformative work. The process of the content going through the machine makes it unique.
The problem there is that LLMs can't have original ideas. They can't rephrase or paraphrase. All of that transformation comes from combining your words with other people's words.
So the defence is that you have stolen from so many people that it's not obvious what was stolen from whom.
@Wearwolf@kind.social @ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca
The legal fig leaf here might be that quoting small amounts of other people's work is protected by fair use (which is an affirmative defence). There are two problems with this:
First, as a few of the lawsuits have shown, the right prompt can reproduce large chunks of the original works, far beyond the amount allowed for quotation. If it can be extracted from the model, then it must be contained within the model, and so the model is a derived work.
Second, quoting more than a very small amount usually requires attribution. This is where non-US laws may be stricter. In the UK and EU, there is a notion of 'moral rights'. You may have read something like 'the moral rights of the author to be associated with the work...' in the front of a book. Moral rights are somewhat different to copyright: even where you have the right to use something that I have written, you do not have the right to claim that you wrote it. The fact that LLMs are not able to provide attribution risks running afoul of this.