What happens when AI agents with email access, shell privileges, and their own memory get targeted by twenty researchers for two weeks? An international study catalogs the results.
In an exploratory red-teaming study titled "Agents of Chaos," a team of over 30 scientists from Northeastern University, Harvard, MIT, Carnegie Mellon, Stanford, and other institutions put autonomous AI systems under targeted pressure. Twenty AI researchers spent two weeks trying to manipulate, trick, and compromise the agents.
The agents—Ash, Doug, Mira, Flux, Quinn, and Jarvis—ran 24/7 on isolated virtual machines with their own ProtonMail accounts. They communicated via Discord, executed shell commands, and could rewrite their own config files.
Source: The Decoder
