LLMs need companion bots to check work, keep them honest
Briefly

"Sikka is a towering figure in AI. He has a PhD in the subject from Stanford, where his student advisor was John McCarthy, the man who in 1955 coined the term "artificial intelligence." Lessons Sikka learned from McCarthy inspired him to team up with his son and write a study, "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," which was published in July."
""We have an example my son came up with of two prompts that have identical tokens and when you run them, the exact same number of operations get performed independent of what the tokens are," he said. "Therein is the entire point, that whether the prompt is expressing the user's desire to perform a particular calculation or the prompt is expressing a user's desire to write a piece of text on something, it does exactly the same number of calculations.""
""When we say, 'Go book a ticket for me and then charge my credit card or deduct the amount from my bank and then send a post to my financial app,' which is what all these agent vendors are kind of saying, you are asking the agents to perform an action which holds a meaning to you, which holds a particular semantic to you, and if it i"
Large language models operate with a fixed computational budget determined by their architecture and training. When a task demands more, or more reliable, computation than that budget supplies, the model starts to hallucinate. Two prompts with the same number of tokens trigger exactly the same number of internal operations regardless of what they ask for, so requesting a semantically harder outcome does not buy any extra computational work. Asking agents to execute real-world tasks (for example, booking tickets and charging credit cards) attaches semantic stakes and computational demands that exceed the model's reliable limits. Companion verification agents, bots that check calculations and actions, can mitigate hallucinations by validating each step or offloading it to exact tools.
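A minimal sketch of what such a companion bot could look like in the simplest case, arithmetic claims: the model's answer is re-derived with an exact evaluator and overridden on mismatch. The `llm_answer` stub and its hallucinated value are hypothetical stand-ins for a real model client.

```python
# Companion-bot sketch: instead of trusting the LLM's arithmetic, a
# verifier re-runs the calculation with an exact tool and only lets
# the answer through if the two agree.

import ast
import operator as op

# Safe evaluator for simple arithmetic expressions (the "exact tool").
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def exact_eval(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def llm_answer(expr: str) -> float:
    """Stub for a model call; a real client would go here."""
    return 8088.0  # imagine the model hallucinated 123 * 67

def verified_answer(expr: str) -> float:
    claimed = llm_answer(expr)
    checked = exact_eval(expr)
    if abs(claimed - checked) > 1e-9:
        # Companion bot overrides (or flags) the hallucinated value.
        return checked
    return claimed

print(verified_answer("123 * 67"))  # 8241, not the model's 8088
```

The same pattern generalizes to agent actions: before a booking or payment step executes, a verifier checks the action against the user's stated intent and the system's actual state, rather than trusting the model's own account of what it did.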
Read at The Register