Confirmed, Meta has scraped my content for their AI. I have not given them permission to do this; at the same time, I did not explicitly prevent them from doing it. So I am sure that by some stupid lawyer's definition, they can use my content for whatever purpose they want.
F that.
If they want to do that, then the whole notion of 'pirating video' or music, or software, or anything else that ends up in the public domain is right out the window.
On the bright side, #socialbc does not appear on the list.
#meta #copyright #personalproperty
@chris@mstdn.chrisalemany.ca I'm lucky not to be on the list. I do have a tentative block on their AI agent.
Now, I have escalated this to blocking all of their Autonomous System numbers. That should block all of their computers from reaching my equipment.
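In case anyone wants to do something similar, here is a rough sketch of how the rules could be generated. It assumes Meta's main ASN is AS32934, that you have already dumped its announced prefixes into prefixes.txt (one CIDR per line, for example extracted from the route objects returned by a RADB query such as whois -h whois.radb.net -- '-i origin AS32934'), and that your firewall uses an nftables inet filter table with an input chain; all of those names are placeholders to adapt to your own setup.

    # block_asn.py - rough sketch: turn a list of CIDR prefixes into nftables drop rules.
    # prefixes.txt is assumed to hold one prefix per line, however you collected them.
    import ipaddress

    with open("prefixes.txt") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            net = ipaddress.ip_network(line, strict=False)  # validate the prefix
            family = "ip6" if net.version == 6 else "ip"
            # Emit a rule that can be pasted into the inet filter table's input chain.
            print(f"nft add rule inet filter input {family} saddr {net} drop")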
@chris@mstdn.chrisalemany.ca Proving that they hold a valid license is up to the copyright violator.
I guess that scraping and using content for AI training will be considered fair use, because in our corrupt world, the bigger business always wins.
If you create a picture from said legally trained AI, then you'll be liable for copyright violation if it looks similar to a real photo. Bigger business wins.
@chris@mstdn.chrisalemany.ca
My profile contains the following text:
No license, implied or explicit, is granted to use any of my posts for training AI models.
If you have scraped them and used them in training any machine-learning model, this is legal only if there is an explicit law that treats doing so as fair use or similar.
@chris@mstdn.chrisalemany.ca ..I am allowed to read all public websites, learn from them, and sell my services as a consultant without paying anyone from those public websites a fee. Why is an algorithm not allowed to do this?
@chris@mstdn.chrisalemany.ca I never get tired of bringing up #wilhoitslaw
"#Conservatism consists of exactly one proposition, to wit: There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect."
It's a political philosophy that starts from WHO should be in power, not WHAT. And the rule of law is never compatible with it.
@chris@mstdn.chrisalemany.ca https://nightshade.cs.uchicago.edu/whatis.html
@chris@mstdn.chrisalemany.ca sadly they own the places we all put anything on. I found out years ago with Twitter: those of us who used to write prose off each other's written word discovered it was all being stolen. My whole account was even taken, and I had to open another because I suddenly had no access to the original one.
@chris@mstdn.chrisalemany.ca FB and everyone they employ are trash humans. The only way to opt out is to stop posting, unfortunately. At least they all know how much everyone hates them, now.
@forthy42@mastodon.net2o.de @chris@mstdn.chrisalemany.ca I think it's the other way around. The copyright holder has to defend their copyrights and pursue remedies when others use material without a license. Almost no one can afford to do that.
@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca
You are assuming that you are allowed to. Analogies are useful only if they represent the thing that they are being used to explain.
What are the services that you are selling? If you memorise large chunks of existing web sites and store that information in your brain, that's permitted, but if you then reproduce them verbatim, or in a form that is a derived work, that is not.
Here's a couple of alternate versions:
I'm not allowed to download everything from the web, create a database of a compressed form of it, and then sell access to that database. Why should I be allowed to if the database is a machine-learning model?
I'm not allowed to download a film and recompress it with a lossy compression algorithm, so why should I be allowed to if the lossy compression algorithm is a machine-learning model?
@david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca ...for strictly personal use, downloading/using/analysing public data is allowed, at least in the EU. Selling, of course, is not.
@david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca ...in the end it is for judges to decide, but I find it completely irrational that I should not be allowed to process/analyse with algorithms, for my personal (!) use, what I can read publicly on the internet. People with disabilities also use machine readers, and we allow those.
@ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca
I can read a blog post, because it is placed on the Internet with a license that permits me to read it.
I can quote from it if I write something else, because that is fair use.
If I download a copy and store it on my computer, that may not technically be allowed, but if I do not distribute it then there are no statutory damages and it would be hard to argue actual damages, so no one cares.
If I extract a substantial portion of it into some other work, that new work is a derived work of the original and I must have a license to create it.
If I create a new post having read some original and it is clearly (as judged by a court) a copy of the original then I require a license. For example, Paramount grants an explicit license for non-commercial fan fiction set in the Star Trek universe because it would be illegal to distribute otherwise. This is why Axanar was unable to be completed: they tried to make it a commercial project and it was not covered by the non-commercial license grant. There is an explicit exemption here for parody, but it must be a parody of the thing being copied.
All of this is well established in copyright law.
The claim by LLM trainers is that training an LLM creates a new creative work (if it does not, then the weights cannot be covered by copyright), but that this new work is not a derived work of the originals. The last part of this claim is tenuous because LLMs can generate things that are identical to the originals, and they can also output things that a court would regard as derived works.
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca here's a hint
COPY right
@Wearwolf@kind.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca for strictly personal, non-commercial use copyright law gives a lot of freedom.
@ErikJonker@mastodon.social @Wearwolf@kind.social @chris@mstdn.chrisalemany.ca
Try creating copies of music CDs or movie DVDs for strictly personal, non-commercial use and see what the EUCD says about your legal liability.
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca it's not personal, non-commercial use though
LLMs work by learning what words are associated with what context and then spitting those words out again when prompted with a similar context.
They are regurgitation engines. They can only spit out, produce a copy of, the content that went into them.
It would be illegal to sell collages of people's Instagram posts without their permission. LLMs are that but with people's words
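The 'regurgitation' point is easy to see with a toy model. The sketch below is a deliberate oversimplification (a bigram table, nothing like a real transformer), but it shows the mechanism being described: the only words it can ever emit are words that went into it, in contexts it has already seen.

    # toy_regurgitator.py - a bigram "language model" to illustrate the point above.
    # This is not how a real LLM works internally; it is just the simplest possible
    # demonstration of learning word/context associations and spitting them back out.
    import random
    from collections import defaultdict

    training_text = (
        "the quick brown fox jumps over the lazy dog "
        "the lazy dog sleeps while the quick fox runs"
    )

    # Learn which word follows which context (here the context is just the previous word).
    follows = defaultdict(list)
    words = training_text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)

    def generate(prompt, length=10):
        out = [prompt]
        for _ in range(length):
            options = follows.get(out[-1])
            if not options:  # context never seen in training: nothing to say
                break
            out.append(random.choice(options))  # can only emit words it ingested
        return " ".join(out)

    print(generate("the"))  # every token in the output is copied from the training text

With a tiny corpus like this it reproduces whole training phrases verbatim; the argument in the thread is whether the same thing happens, at vastly larger scale, inside an LLM.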
@ErikJonker@mastodon.social @david_chisnall@infosec.exchange @chris@mstdn.chrisalemany.ca the argument you would use is actually that it's a transformative work. The process of the content going through the machine makes it unique.
The problem there is that LLMs can't have original ideas. They can't rephrase or paraphrase. All of that transformation comes from combining your words with other people's words
So the defence is that you have stolen from so many people that it's not obvious what was stolen from whom
@Wearwolf@kind.social @ErikJonker@mastodon.social @chris@mstdn.chrisalemany.ca
The legal fig leaf here might be that quoting small amounts of other people's work is protected by fair use (which is an affirmative defence). There are two problems with this:
First, as a few of the lawsuits have shown, the right prompt can reproduce large amounts of the original works, far beyond the amount allowed for quotation. If it can be extracted from the model, then it must be contained within the model, and so the model is a derived work.
Second, quoting more than a very small amount usually requires attribution. This is where non-US laws may be stricter. In the UK and EU, there is a notion of 'moral rights'. You may have read something like 'the moral rights of the author to be associated with the work...' in the front of a book. Moral rights are somewhat different to copyright: even where you have the right to use something that I have written, you do not have the right to claim that you wrote it. The fact that LLMs are not able to provide attribution risks running afoul of this.