
"LLMs [large language models] confidently claim things that are manifestly untrue. Enforce has developed Verity, a tool that helps minimise false claims and fake sources from self-hosted LLMs."
"Verity is not simply an LLM-as-judge setup in which one LLM evaluates the output of another. Rather, it's a set of seven layers designed to review model output. The system involves: strict rules for fact sourcing; a strong critic LLM that differs from the primary model family; a small critic LLM similar to the strong critic but from different training data; an encoder transformer trained on entailment labels; a regex evaluator; a stochastic re-sampler for catching low-confidence guesses; and a logprob analyser that checks token entropy."
"The Verity MCP server offers access to a set of smaller models that will try to assess the accuracy of a primary local LLM, something more people have begun to explore in response to rising prices at cloud AI providers, availability issues, and privacy concerns."
"The hardware recommendations call for a system with two GPUs, but that's to allow concurrent delivery of a second opinion from the Verity checker. On a machine with one GPU, like a MacBook P"
Verity is a self-hosted MCP server that helps minimize false claims and fake sources produced by local LLMs. It provides access to smaller models that assess the accuracy of a primary local LLM, addressing concerns about cloud pricing, availability, and privacy. Verity is not a simple LLM-as-judge setup; it reviews model output through seven layers: strict rules for fact sourcing, a strong critic LLM from a different model family, a small critic LLM trained on different data, an encoder transformer trained on entailment labels, a regex evaluator, a stochastic re-sampler that catches low-confidence guesses, and a logprob analyzer that checks token entropy. Hardware guidance assumes two GPUs so the Verity checker can deliver its second opinion concurrently.
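The article does not publish Verity's interfaces, so the following is a hypothetical sketch of how two of the seven layers might be chained: a regex evaluator that flags suspicious-looking citations, and a logprob analyzer that computes mean token entropy from top-k probability distributions. All names here (`Verdict`, `run_pipeline`, the layer functions, the 2.0-bit threshold, the future-year heuristic) are invented for illustration and do not come from Verity.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Verdict:
    layer: str
    passed: bool
    note: str = ""

def regex_layer(text: str, logprobs: List[List[float]]) -> Verdict:
    # Toy heuristic (not Verity's actual rules): flag citations dated in the
    # far future, e.g. "(Doe, 2087)", as likely fabricated sources.
    suspicious = re.findall(r"\b(20[5-9]\d)\b", text)
    return Verdict("regex", not suspicious, f"future years: {suspicious}")

def entropy_layer(text: str, logprobs: List[List[float]]) -> Verdict:
    # Shannon entropy over each token's top-k probability distribution.
    # High mean entropy suggests the model was guessing; the 2.0-bit
    # threshold is an arbitrary placeholder for this sketch.
    def entropy(dist: List[float]) -> float:
        return -sum(p * math.log2(p) for p in dist if p > 0)
    mean_h = sum(entropy(d) for d in logprobs) / len(logprobs)
    return Verdict("logprob", mean_h < 2.0, f"mean entropy {mean_h:.2f} bits")

def run_pipeline(text: str,
                 logprobs: List[List[float]],
                 layers: List[Callable[[str, List[List[float]]], Verdict]]
                 ) -> List[Verdict]:
    # Run every layer and collect its verdict; a real system might
    # short-circuit, weight layers, or escalate to a critic LLM.
    return [layer(text, logprobs) for layer in layers]

# Example: a confident distribution alongside a maximally uncertain one.
verdicts = run_pipeline(
    "As cited in (Doe, 2087).",
    [[0.25, 0.25, 0.25, 0.25], [0.9, 0.1]],
    [regex_layer, entropy_layer],
)
```

Here the regex layer fails the text (it contains a future-dated citation) while the entropy layer passes it, illustrating why stacking heterogeneous checks catches failure modes that any single layer would miss.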
Read at The Register