Rebecca Lynch and Harang from NVIDIA on AI agent exploitation and defense – Lynch & Harang #BHUSA #LivePost
Read whole thread on a single page: https://mastoreader.io/?url=https%3A%2F%2Finfosec.exchange%2F%40darrenpmeyer%2F114983027943858534
Defining an agent as "output is chained back to input for future requests" OR "acts with authorization to perform actions" – Lynch & Harang #BHUSA #LivePost
Fully autonomous agents (Level 3 autonomy) is what most people want agents to do, and it's also the focus of attackers because the LLM plans, processes and executes.
All of these things start with some level of prompt injection – Lynch & Harang #BHUSA #LivePost
Prompt injection is defined as tricking an LLM agent into treating data as instructions or into the content retrieved at time the model makes an inference. – Lynch & Harang #BHUSA #LivePost
Proposing a Universal Antipattern -- untrusted data in, LLM caused to alter to something adversarial, result is passed to some action with the agent's authority – Lynch & Harang #BHUSA #LivePost
As long as step 1 (add untrusted data) is true, attacks are possible. – Lynch & Harang #BHUSA #LivePost
Example: the Copilot RAG vuln #EchoLeak and its precursor – Lynch & Harang #BHUSA #LivePost
Example: instructing #PandasAI to execute code "for analysis" by telling it the code has already been written – Lynch & Harang #BHUSA #LivePost
Example: client agents which use LLM to decide what to execute on client machine — wide open to watering-hole attacks, and effectively an OSS supply-chain attack. – Lynch & Harang #BHUSA #LivePost
Agents like #Cursor can be injected by comments and dotfiles, attacking developer machines – Lynch & Harang #BHUSA #LivePost
Since #cursor has an auto-run mode, vulnerabilities to prompt injections are effectively auto-pwn mode – Lynch & Harang #BHUSA #LivePost
So now how do we secure agents against this sort of thing? – Lynch & Harang #BHUSA #LivePost
We defend using fairly traditional analysis tactics like a kill chain analysis – Lynch & Harang #BHUSA #LivePost
Replit incident mentioned. – Lynch & Harang #BHUSA #LivePost
Level 3 autonomous agents have a feedback loop that means that once something is injected it continues to operate in a compromised mode, even actively collaborating with attackers to make attacks succeed – Lynch & Harang #BHUSA #LivePost
"Always treat LLM output with skepticism; use citations, educate users" – Lynch & Harang #BHUSA #LivePost
"AppSec still exists": we have to do basics like staying on top of vulnerabilities, following secure design principles, and defense in depth. These raise complexity for attackers.
"LLM powered software is still software" – Lynch & Harang #BHUSA #LivePost
Some takeaways – Lynch & Harang #BHUSA #LivePost