@david_chisnall@infosec.exchange
@bradley@techhub.social @DavyJones@c.im
We have decades of research that tells us that machine learning techniques tend not to do well with adaptive adversaries because the adversary can adjust their behaviour faster than the model can adapt. There's a huge body of anomaly detection research that worked really well, right up until a red team got involved and did something slightly different.
This is even more true for things like LLMs, where a huge amount of their behaviour is baked during a slow (and very expensive) training step. People aren't going to retrain LLMs every time a new kind of ad bypasses some filter and does prompt injection, they'll add more rule-based filters and they'll tweak the prompt to try to block it, which means the attacker will find it easy to bypass.
@pluralistic@mamot.fr
@david_chisnall@infosec.exchange @bradley@techhub.social @DavyJones@c.im
I first wrote about this 20+ years ago:
https://people.well.com/user/doctorow/metacrap.htm
I can't believe I have to restate it in the context of AI:
https://pluralistic.net/2025/08/02/inventing-the-pedestrian/#three-apis-in-a-trenchcoat