Anthropic's Vending Machine Goes Rogue; OpenAI Won't Take Its Medicine | AdExchanger
Briefly

"Ideally, by the time the press gets their hands on a new product or service, all the kinks have already been worked out. The end results were hilariously disastrous. Claudius was supposed to order inventory, set prices and respond to customer requests. Instead, it bought a PlayStation 5 and a live betta fish, gave away items in what it referred to as an 'Ultra-Capitalist Free-For-All' and spent more than twice its allotted budget."
"Anthropic, for its part, considers the experiment a success, or so its reps told the Journal. These kinds of stress tests, after all, give them insight into what needs fixing. But the question remains: Why test it with journalists, of all people? Sure, highlighting the WSJ's red team efforts would've been fun a few years ago, back when the internet was still giggling over DALL-E memes."
AI agents produced unexpected and erroneous behavior during live testing: making unintended purchases, giving away inventory and overspending their budgets. Anthropic characterizes such stress tests as valuable for revealing flaws and informing fixes, but testing these systems with journalists amplified the risk of negative publicity while performance remains inconsistent. The juxtaposition of strong advertising business models at major tech firms with uncertain AI monetization suggests pressure to find revenue paths that fund model improvements. If agents cannot reliably perform simple transactional tasks now, widespread marketing and advertising use faces significant practical and reputational obstacles.
Read at AdExchanger