1.3.6.1.4.1.61513
@xssfox@cloudisland.nz

I'm out of the loop, ai

So I often see AI bots scraping websites at high rates. What's the actual go here? Are they training without storing the scraped data? Or is this after training, when people are using them (requests to supplement a query, e.g. tool/MCP usage)? Or are people inflating "scraping the site 7 times a second" to mean the whole site when it's just one page?

I'm not doubting the extra load, I'm just curious why it's not a scrape-once-and-done kind of deal

networking wizard catgirl
@pearl@fedi.rrr.sh

@xssfox@cloudisland.nz a lot of them will scrape a page, then scrape it again a few hours later in case it changed, and they do that for every page on a website, including ones that are expensive for the server to handle
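
For a rough sense of how "re-scrape every page every few hours" turns into a constant request rate, here is a back-of-the-envelope sketch. The page count, interval, and crawler count are made-up illustration values, not measurements from any real site or bot, and the function name is hypothetical.

```python
def sustained_request_rate(page_count, recrawl_interval_hours, crawler_count=1):
    """Average requests per second if every crawler re-fetches every page
    once per interval, spread evenly over that interval."""
    seconds_per_interval = recrawl_interval_hours * 3600
    return page_count * crawler_count / seconds_per_interval


# Hypothetical numbers: 100,000 URLs, re-crawled every 4 hours.
one_bot = sustained_request_rate(100_000, 4)
three_bots = sustained_request_rate(100_000, 4, crawler_count=3)
print(f"one crawler: {one_bot:.1f} req/s, three crawlers: {three_bots:.1f} req/s")
```

Under those assumptions a single crawler already averages roughly 7 requests per second without ever being "done", and each additional independent crawler adds the same again.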

Michael
@fincham@cloudisland.nz

@xssfox@cloudisland.nz my assumption has been that these scrapers are all the “also ran” companies who don’t give a shit about anything, least of all engineering

Alastair McBain :unverified:
@asmcbain@woof.tech

@xssfox@cloudisland.nz Gonna guess with the rates of people asking about stuff, and what people ask about being all over the place, caching isn't as effective.

Combine the desire to keep data fresh with cache eviction (so the cache doesn't grow to infinite size) and the fact that there are 3+ major companies offering "general purpose" NLP search front-ends in addition to existing search engines (Google, etc.) ...

and it's a mess. one big hype-bubbled mess. The pop is gonna be glorious but ugly.
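
The freshness-versus-cache-size trade-off described above can be sketched as a tiny cache with a time-to-live and a size cap. This is only an illustration under assumed parameters: `TTLCache`, `simulate`, the one-hour TTL, and the two-entry limit are all hypothetical and say nothing about how any real crawler or search front-end is built.

```python
from collections import OrderedDict


class TTLCache:
    """Bounded cache: entries go stale after ttl_seconds, and the least
    recently used entry is evicted once max_entries is exceeded."""

    def __init__(self, max_entries, ttl_seconds):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._entries = OrderedDict()  # url -> (fetched_at, body)

    def get(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None
        fetched_at, body = entry
        if now - fetched_at >= self.ttl_seconds:
            del self._entries[url]  # too old: treat as a miss
            return None
        self._entries.move_to_end(url)  # mark as recently used
        return body

    def put(self, url, body, now):
        self._entries[url] = (now, body)
        self._entries.move_to_end(url)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


def simulate(requests, cache, fetch):
    """Count how many (time, url) requests miss the cache and hit the origin."""
    origin_fetches = 0
    for now, url in requests:
        if cache.get(url, now) is None:
            cache.put(url, fetch(url), now)
            origin_fetches += 1
    return origin_fetches


# Hypothetical traffic: varied URLs, a tiny cache, and a one-hour TTL.
requests = [(0, "/a"), (10, "/a"), (20, "/b"), (30, "/c"), (40, "/a"), (4000, "/c")]
cache = TTLCache(max_entries=2, ttl_seconds=3600)
origin_fetches = simulate(requests, cache, fetch=lambda url: f"body of {url}")
print(f"{origin_fetches} of {len(requests)} requests reached the origin server")
```

With queries spread across many URLs, a small cache, and a short TTL, most requests fall through to the origin, which is the effect the post is guessing at.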