:pona_plush: #FediPact :pona_plush:
@FediPact@cyberpunk.lol

【LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI (Including Many Fediverse Instances!!!)】

"The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal."

ARTICLE: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower

FULL PDF: https://www.dropsitenews.com/api/v1/file/b3555944-e204-4f5e-9a64-e44281b19a82.pdf

#FediPact #meta #threads #AI

ophiocephalic 🐍
@ophiocephalic@kolektiva.social

@FediPact@cyberpunk.lol

Rather than scraping from sites directly, many of the addresses on Meta’s leaked list belong to Content Delivery Networks (CDNs) that are used by websites to cache and store information to improve site performance.

This is a critical point. An instance or website can defend itself in numerous ways, including actively adversarial strategies, and still succumb to extraction if it's using Cloudflare.

cc: @subMedia@kolektiva.social

ophiocephalic 🐍
@ophiocephalic@kolektiva.social

@FediPact@cyberpunk.lol
Another sickening consideration here: if they're scraping through Cloudflare and other CDNs rather than from the sites directly, it's possible, even likely, that they're extracting not just public posts but all posts, including DMs.

@subMedia@kolektiva.social

Kevin Karhan :verified:
@kkarhan@infosec.space

@ophiocephalic@kolektiva.social @FediPact@cyberpunk.lol @subMedia@kolektiva.social People who have used #CloudFlare since #KiwiFarms became their client (a client they only dropped when bigger corporate clients went "it's us or them!") have already given up on #Hosting, cuz #ClownFlare is a #RogueISP known for willingly hosting #Cybercrime & #Daesh propaganda sites!
#OCILLA only protects their ass until they know about it!