Clive Thompson
@clive@saturation.social

behold the "HTML bomb"

it's a defensive counterattack on AI web-scrapers that persistently scrape and rescrape your web site, even when you tell them not to

the bomb file looks like a tiny HTML page, but when scraped -- or even requested by a regular browser ...

... it unpacks into a huge-ass 10-gig HTML page ...

... which quickly crashes any browser or scraper
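
A minimal sketch of the mechanism, assuming the "bomb" is simply a gzip-compressed page made of highly repetitive content (the post doesn't say how the file is actually built). It is scaled down to 10 MiB instead of 10 gigs, and the function names `build_compressed_page` and `bounded_decompress` are made up for illustration. The second function shows the flip side: a client that caps how much it is willing to decompress doesn't fall over.

```python
import gzip
import io
import zlib


def build_compressed_page(filler_bytes: int, chunk_size: int = 1 << 20) -> bytes:
    """Return a gzip-compressed HTML page padded with `filler_bytes` of 'A'.

    Hypothetical helper: highly repetitive data compresses extremely well,
    which is why the file on disk can be tiny.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        gz.write(b"<html><body><!--")
        chunk = b"A" * chunk_size
        remaining = filler_bytes
        # Write the filler in chunks so the full page never sits in memory.
        while remaining > 0:
            n = min(remaining, chunk_size)
            gz.write(chunk[:n])
            remaining -= n
        gz.write(b"--></body></html>")
    return buf.getvalue()


def bounded_decompress(payload: bytes, limit: int) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Ask for at most limit + 1 bytes; getting that many means we are over.
    out = decomp.decompress(payload, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed size exceeds limit")
    return out


if __name__ == "__main__":
    page = build_compressed_page(10 * 1024 * 1024)
    print("compressed size in bytes:", len(page))
    try:
        bounded_decompress(page, limit=1 << 20)
    except ValueError as err:
        print("client refused the page:", err)
```

A scraper or browser that decompresses without a cap like `limit` is the one that ends up holding the whole expanded page in memory.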

Item #6 in my latest "Linkfest" newsletter, free to read and subscribe to here:
https://buttondown.com/clivethompson/archive/linkfest-37-wind-theft-an-html-bomb-and-the-rice/

adison verlice
@adisonverlice@tweesecake.social

@clive@saturation.social you know what's even better than that? not putting it on the internet!
imo if you want to protect your content from AI, just keep it away from the public. my webstie excepts AI crawlers because i have no problem because, well, it's public. now internal resources i don't except because that is information that is private. long story short, if you want to pprotect your website from AI crawlers, don't upload it to the internet in any way whatsoever. simple as that


David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@adisonverlice@tweesecake.social @clive@saturation.social

I fully endorse training AI on writing as incoherent as yours. Please keep it up!

adison verlice
@adisonverlice@tweesecake.social

@david_chisnall@infosec.exchange @clive@saturation.social in case you didn't understand, let me put it another way.
if you put your stuff, websites, files, on the public internet, it's opened. so if you don't want AI looking at your shit, get off the internet. don't operate a website. don't even operate a computer that connects to the public internet. hackers and intelligence agencies are more likely to pose harm to you than GPT-4 or google gemini

adison verlice
@adisonverlice@tweesecake.social

@david_chisnall@infosec.exchange @clive@saturation.social there is a difference between your browsing history, which AI should not be training on, and a public website. if you wonna operate a public website than you should have abbbbsolutely0problem with training. and if you do, then don't post anything on the internet. go fully off grid. no facebook, no twittter x, not even a mastodon instence. off the internet. everytime i make any post, i am under the acknowledgment that AI can find my posts, and i will allow it to do that because what's wrong with it? is AI murdering people? is AI just randomly uploading malware to your site? has AI itself threatened your family? that's the user who uses it, not the ai. and shame on the user if they do. point is, AI itself is not bad. give me a good explination on why AI is super bad. *AI is changing the weatther, it's menipulating mother nature and causing climate change and it's gonna set up a nucllear bomb using rain and thunder powers* is not a good answer. also, on the copyright issue, again, don't post it at all. not even a physical copy. or only cell it in rom disks that require purchasing. if you put it on the public internet you are asking for AI to come in. but then again, you're already doing that simply by being on mastodon, facebook, et cetera.

David Chisnall (*Now with 50% more sarcasm!*)
@david_chisnall@infosec.exchange

@adisonverlice@tweesecake.social @clive@saturation.social

That’s perfect! Incoherent English, typos, poor grammar, and logical fallacies, all in one post! Keep it up!

Can you generate content like this at scale? It’s ideal for poisoning LLM training!