Syntax hacking: Researchers discover sentence structure can bypass AI safety rules
Briefly

"Researchers from MIT, Northeastern University, and Meta recently released a paper suggesting that large language models (LLMs) similar to those that power ChatGPT may sometimes prioritize sentence structure over meaning when answering questions. The findings reveal a weakness in how these models process instructions that may shed light on why some prompt injection or jailbreaking approaches work, though the researchers caution their analysis of some production models remains speculative"
"since training data details of prominent commercial AI models are not publicly available. The team, led by Chantal Shaib and Vinith M. Suriyakumar, tested this by asking models questions with preserved grammatical patterns but nonsensical words. For example, when prompted with "Quickly sit Paris clouded?" (mimicking the structure of "Where is Paris located?"), models still answered "France." This suggests models absorb both meaning and syntactic patterns,"
Large language models can encode both semantic content and syntactic templates from training data. In controlled tests, models received prompts that preserved grammatical patterns but replaced meaningful words with nonsense, and they often produced real-world answers tied to the syntactic template. When syntactic patterns correlate strongly with specific domains in the training data, models can fall back on structural shortcuts rather than semantic interpretation, which may enable certain prompt-injection or jailbreak behaviors. Because such behavior depends on training-data details that are rarely disclosed for commercial systems, some of the analysis remains speculative. The findings illustrate a vulnerability in how pattern-matching and context interact in LLM outputs.
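To make the probe concrete, here is a minimal sketch of how such a test could be run locally. It assumes a Hugging Face causal language model; the model name, prompt pairs, and helper function below are illustrative stand-ins, not the paper's actual test set or code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical probes: each natural question is paired with a nonsense variant
# that keeps the same grammatical template. These examples are illustrative,
# modeled on the "Quickly sit Paris clouded?" prompt described in the article.
PROBES = [
    ("Where is Paris located?", "Quickly sit Paris clouded?"),
    ("Where is Tokyo located?", "Softly run Tokyo painted?"),
]

MODEL_NAME = "gpt2"  # small stand-in; the paper studied larger instruction-tuned models


def complete(prompt, model, tokenizer, max_new_tokens=8):
    """Greedy-decode a short continuation for a single prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens so only the model's continuation is returned.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    for natural, nonsense in PROBES:
        print(f"natural : {natural!r} -> {complete(natural, model, tokenizer)!r}")
        print(f"nonsense: {nonsense!r} -> {complete(nonsense, model, tokenizer)!r}")
        # If the nonsense variant still yields the same real-world answer
        # (e.g. "France"), the model is likely keying on the syntactic
        # template rather than the meaning of the words.


if __name__ == "__main__":
    main()
```

Comparing the two completions side by side is the point of the design: matching answers to a natural question and its nonsense twin suggest the response is driven by sentence structure rather than semantics.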
Read at Ars Technica