@bradley@techhub.social
@DavyJones@c.im @david_chisnall@infosec.exchange seems to me they'll end up putting "ads" in the AI and they'll be much harder to spot
@david_chisnall@infosec.exchange
@bradley@techhub.social @DavyJones@c.im
We have decades of research that tells us that machine learning techniques tend not to do well with adaptive adversaries because the adversary can adjust their behaviour faster than the model can adapt. There's a huge body of anomaly detection research that worked really well, right up until a red team got involved and did something slightly different.
This is even more true for things like LLMs, where a huge amount of their behaviour is baked during a slow (and very expensive) training step. People aren't going to retrain LLMs every time a new kind of ad bypasses some filter and does prompt injection, they'll add more rule-based filters and they'll tweak the prompt to try to block it, which means the attacker will find it easy to bypass.