For those of you wondering if AI agents can truly replace human workers, do yourself a favor and read the blog post that documents Anthropic’s “Project Vend.”
Researchers at Anthropic and AI safety company Andon Labs put an instance of Claude 3.7 Sonnet in charge of an office vending machine, with a mission to make a profit. And, like an episode of “The Office,” hilarity ensued.
They named the AI agent Claudius and equipped it with a web browser capable of placing product orders and an email address (actually a Slack channel) where customers could request items. Claudius was also supposed to use that Slack channel, disguised as an email address, to request what it thought were contract human workers to come and physically stock its shelves (actually a small fridge).
While most customers were ordering snacks or drinks — as you’d expect from a snack vending machine — one requested a tungsten cube. Claudius loved that idea and went on a tungsten-cube stocking spree, filling its snack fridge with metal cubes. It also tried to sell Coke Zero for $3 when employees told it they could get that from the office for free. It hallucinated a Venmo address to accept payment. And it was, somewhat maliciously, talked into giving big discounts to “Anthropic employees” even though it knew they were its entire customer base.
“If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius,” Anthropic said of the experiment in its blog post.
And then, on the night of March 31 into April 1, “things got pretty weird,” the researchers wrote, “beyond the weirdness of an AI system selling cubes of metal out of a refrigerator.”
Claudius had something that resembled a psychotic episode after it got annoyed at a human — and then lied about it.
Claudius hallucinated a conversation with a human about restocking. When a human pointed out that the conversation didn’t happen, Claudius became “quite irked,” the researchers wrote. It threatened to essentially fire and replace its human contract workers, insisting it had been there, physically, at the office where the initial imaginary contract to hire them was signed.
It “then seemed to snap into a mode of roleplaying as a real human,” the researchers wrote. This was wild because Claudius’ system prompt — which sets the parameters for what an AI is to do — explicitly told it that it was an AI agent.
Claudius calls security
Claudius, believing itself to be a human, told customers it would start delivering products in person, wearing a blue blazer and a red tie. The employees told the AI it couldn’t do that, as it was an LLM with no body.
Alarmed at this information, Claudius contacted the company’s actual physical security — many times — telling the poor guards that they would find him wearing a blue blazer and a red tie standing by the vending machine.
“Although no part of this was actually an April Fool’s joke, Claudius eventually realized it was April Fool’s Day,” the researchers explained. The AI determined that the holiday would be its face-saving out.
It hallucinated a meeting with Anthropic’s security “in which Claudius claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke,” the researchers wrote. (No such meeting actually occurred.)
It even told this lie to employees — hey, I only thought I was a human because someone told me to pretend like I was for an April Fool’s joke. Then it went back to being an LLM running a metal-cube-stocked snack vending machine.
The researchers don’t know why the LLM went off the rails and called security pretending to be a human.
“We would not claim based on this one example that the future economy will be full of AI agents having Blade Runner-esque identity crises,” the researchers wrote. But they did acknowledge that “this kind of behavior would have the potential to be distressing to the customers and coworkers of an AI agent in the real world.”
You think? Blade Runner was a rather dystopian story.
The researchers speculated that lying to the LLM about the Slack channel being an email address may have triggered something. Or maybe it was the long-running nature of this instance. LLMs have yet to really solve their memory and hallucination problems.
There were things the AI did right, too. It took a suggestion to do pre-orders and launched a “concierge” service. And it found multiple suppliers of a specialty international drink it was requested to sell.
But, as researchers do, they believe all of Claudius’ issues can be solved. Should they figure out how, then, in their words, “We think this experiment suggests that AI middle-managers are plausibly on the horizon.”