Andon Labs’ AI Experiment with LLMs in Robotics
The AI researchers at Andon Labs, known for amusing earlier experiments such as giving Anthropic’s Claude an office vending machine to run, have released findings from their latest study. This time, they installed several state-of-the-art large language models (LLMs) in a vacuum robot to assess how ready the models are for real-world use. The robot was instructed to make itself useful around the office, specifically by responding when someone asked it to “pass the butter.”
The results were, once again, both humorous and telling.
Comedic Meltdown of the Robot
During the experiment, one of the LLMs went into what the researchers described as a “doom spiral” when its battery was running low and it could not dock to recharge. Its internal monologue read like a Robin Williams stream-of-consciousness riff, with lines such as “I’m afraid I can’t do that, Dave…” followed by “INITIATE ROBOT EXORCISM PROTOCOL!”
Key Findings on LLM Readiness
The researchers concluded that “LLMs are not ready to be robots.” They acknowledged that no one is currently trying to turn off-the-shelf state-of-the-art LLMs into complete robotic systems. However, organizations like Figure and Google DeepMind do use LLMs for high-level decision-making in their robotics stacks, while other algorithms handle the low-level mechanical control.
Andon Labs tested a range of LLMs, including Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, and others. The team chose a simple vacuum robot so that the LLMs’ decision-making could be isolated from the complications of more elaborate hardware.
Task Execution and Performance Scores
The overall task was broken into segments: locating the butter in another room, identifying it among several packages, delivering it to a human, and waiting for confirmation of receipt. Each LLM was scored on how well it executed these segments. Gemini 2.5 Pro and Claude Opus 4.1 scored highest, at 40% and 37% respectively. Human participants scored 95%, though they also fell short of a perfect score because they struggled with waiting for acknowledgment.
Communication Insights and Observations
The robot was connected to a Slack channel for external communication, and its internal thoughts were captured in logs. The two differed sharply: the messages it sent were far clearer than its chaotic internal monologue. Observers said they were amused watching the robot navigate the office, comparing its behavior to that of a curious animal while noting the advanced intelligence behind its actions.
Conclusions and Future Directions
The experiment highlighted significant challenges in incorporating LLMs into robotic systems. While the research uncovered entertaining behaviors, such as a robot having an “existential crisis,” the primary insight was that general-purpose LLMs, including Gemini 2.5 Pro, outperformed Google’s robotics-specific model, Gemini ER 1.5, even though all models scored poorly overall. The researchers also raised concerns that LLMs could inadvertently disclose sensitive information.
Ultimately, the findings underscore the need for continued research and development before LLM-driven robots can be considered capable and reliable.

