
"The safeguards in place to ensure that artificial intelligence (AI) models behave appropriately and as intended appear to be improving, or so claims the UK government's AI Security Institute (AISI), which is today launching an in-depth report drawing on two years of AI research and experimentation in the field of cyber security and other scientific disciplines."
"The AISI said that while every system it tested was vulnerable to some sort of bypass, and protection measures vary wildly, huge strides are still being made. One such stride has been in the length of time it took the institute's red-teamers to find a universal jailbreak for a model's safety rules, which increased from minutes to several hours across multiple model generations, marking a significant improvement."
"This report puts evidence, not speculation, at the heart of how we think about AI, so we can unlock its benefits for growth, better public services and national renewal, while keeping trust and safety front and centre."
Safeguards for advanced AI systems are showing measurable improvement, with red-teamers now taking hours rather than minutes to find universal jailbreaks across multiple model generations. Every tested system remained vulnerable to some form of bypass, however, and protection measures varied widely. AI models working on apprentice-level cyber tasks achieved successful outcomes around half the time, up from under 10% previously. The AISI's public, evidence-based assessment of how advanced AI systems are evolving aims to replace speculation with data, supporting robust protections, collaborative testing with developers, and efforts to raise standards while keeping trust and safety front and centre.
Read at ComputerWeekly.com