So according to this article: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
#Meta is scraping the media proxies of mstdn, masto and .coffee..
If this is true, it is very worrying and pisses me off very much
No wonder our media loads so crappy if they are constantly tapping in..
Fuck #Meta to hell
@stux@mstdn.social A leak from what orifice?
@stux@mstdn.social
The robots.txt file is just an 'ask' but means nothing.
See Cloudflare and Perplexity pointing at each other.
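To make the "just an ask" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.social` URL are made up for illustration. The check only happens if the crawler chooses to run it; nothing on the server side enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: ask Meta's crawler to stay out,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return what a *polite* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up media proxy URL, not a real instance.
    url = "https://example.social/media_proxy/1/original"
    print(allowed("Meta-ExternalAgent", url))
    print(allowed("SomeOtherBot", url))
```

A crawler that skips this lookup still gets the file, which is why people end up blocking at the web server or firewall instead.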
@stux@mstdn.social "niche forums, personal blogs, and even revenge porn sites"
...so Meta is training its AI to cater to incels, eh?
@stux@mstdn.social lol. They prohibit, shadowban, or actual-ban any acknowledgement of the existence of the fediverse on their own platforms, while also scraping it for content illegally.
Duplicitous and vile.
@stux@mstdn.social does that mean Fedi people can take part in the class action lawsuit that could do "immense harm not only to a single AI company, but to the entire fledgling AI industry and to America's global technological competitiveness." as stated by said industry reps?
https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-largest-copyright-class-action-ever-certified/
via @Lazarou@mastodon.social
https://mastodon.social/@Lazarou/114994400792642672
Does anyone know a good way to block them?
I do not want anything to do with their shady business and they should stay away from ours
Just edited our #nginx configs and added
# Reject requests whose User-Agent contains "Meta-ExternalAgent" (~* matches case-insensitively)
if ($http_user_agent ~* "Meta-ExternalAgent") {
    return 403;
}
to the server block
#AI #NoAI #Meta
@stux@mstdn.social Block their entire ASN? It will at least make them spend money by using another network
@athos@bolha.one @stux@mstdn.social They have a whole page about their crawlers and a handy command to get all their IP address blocks from their ASN: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers
I suggest checking that every so often and updating the address blocks accordingly
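For the "update the address blocks" part, a rough sketch of turning a saved prefix list into nginx `deny` lines. It assumes the output of the command on that page contains lines of the form `route: <prefix>` or `route6: <prefix>`; I have not reproduced the exact command here. The function name `deny_rules` is hypothetical and the sample prefixes are documentation ranges, not Meta's.

```python
import ipaddress


def deny_rules(whois_output: str) -> list[str]:
    """Build nginx `deny` lines from `route:` / `route6:` lines (assumed format)."""
    networks = []
    for line in whois_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("route", "route6"):
            # strict=False tolerates prefixes that have host bits set
            networks.append(ipaddress.ip_network(value.strip(), strict=False))
    # collapse_addresses() cannot mix IPv4 and IPv6, so merge each family separately
    v4 = ipaddress.collapse_addresses(n for n in networks if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in networks if n.version == 6)
    return [f"deny {net};" for net in (*v4, *v6)]


if __name__ == "__main__":
    # Placeholder prefixes from the documentation ranges, for illustration only.
    sample = """\
route:      192.0.2.0/25
route:      192.0.2.128/25
route:      198.51.100.0/24
route6:     2001:db8::/32
"""
    print("\n".join(deny_rules(sample)))
```

The resulting lines could go into a file that the server block pulls in with `include`, regenerated whenever the list is refreshed. Note that `deny` acts on the connecting address, so behind a CDN or reverse proxy the real client IP has to be restored first.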