Suddenly, Claude was kicking off four, five, six, seven, even eight agents at once. I had no visibility into what they were all doing, and no way to stop them if one or more ran amok. And run amok they sure did. One got stuck trying to access a file it didn't have root privileges for. Another attempted to refactor an entire app (which I did not request).
The most dangerous assumption in quality engineering right now is that you can validate an autonomous testing agent the same way you validated a deterministic application. When your systems can reason, adapt, and make decisions on their own, that linear validation model collapses.
AI on the attackers' side excels at three things: speed, scale, and sophistication. As a result, the time between a successful intrusion and the actual theft of data has dropped sharply over the past three years: whereas the average window was nine days three years ago, it is now one day. The fastest case documented by Palo Alto Networks took just 72 minutes.
AI agents and other systems can't yet conduct cyberattacks fully on their own, but they can help criminals at many stages of the attack chain, according to the International AI Safety Report. The second annual report, chaired by the Canadian computer scientist Yoshua Bengio and authored by more than 100 experts across 30 countries, found that over the past year AI systems have become vastly more capable of helping to automate and perpetrate cyberattacks.