
"Mythos outshined all previous models, becoming the first model to solve TLO from start to finish. The average Mythos Preview run got through 22 of the 32 required infiltration steps, significantly higher than the 16-step average achieved by Claude 4.6."
"AISI points out that the model still struggles with 'Cooling Tower,' an even more difficult seven-step test designed to simulate an attempted disruption of the control software for a power plant."
"Mythos' performance on TLO suggests that the model is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained."
"AISI warns that those designing system protections should similarly utilize AI models to help harden their defenses as future models match or outperform Mythos' capabilities."
Mythos became the first model to solve TLO from start to finish. Its average run completed 22 of the 32 required infiltration steps, compared with a 16-step average for Claude 4.6. It still struggles with the more difficult 'Cooling Tower' test. AISI notes that Mythos can autonomously attack small, weakly defended systems once network access has been gained, but that its evaluations may not reflect real-world conditions because the simulated environments lack active defenses. Because future models may match or surpass Mythos, AISI urges those designing system protections to use AI models to harden their defenses.
Read at Ars Technica