{"id":2160,"date":"2025-06-28T23:20:10","date_gmt":"2025-06-28T23:20:10","guid":{"rendered":"https:\/\/violethoward.com\/new\/from-hallucinations-to-hardware-lessons-from-a-real-world-computer-vision-project-gone-sideways\/"},"modified":"2025-06-28T23:20:10","modified_gmt":"2025-06-28T23:20:10","slug":"from-hallucinations-to-hardware-lessons-from-a-real-world-computer-vision-project-gone-sideways","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/from-hallucinations-to-hardware-lessons-from-a-real-world-computer-vision-project-gone-sideways\/","title":{"rendered":"From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways"},"content":{"rendered":" \r\n



\n<\/div>

Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: Build a model that could look at a photo of a laptop and identify any physical damage \u2014 things like cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.<\/p>\n\n\n\n

Along the way, we ran into issues with hallucinations, unreliable outputs and images that were not even laptops. To solve these, we ended up applying an agentic framework in an atypical way \u2014 not for task automation, but to improve the model\u2019s performance.<\/p>\n\n\n\n

In this post, we will walk through what we tried, what didn\u2019t work and how a combination of approaches eventually helped us build something reliable.<\/p>\n\n\n\n

Where we started: Monolithic prompting<\/h2>\n\n\n\n

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.<\/p>\n\n\n\n
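
As a rough illustration of what monolithic prompting looks like in code, here is a minimal sketch. It is not the prompt or client we used: the prompt wording, the request shape and the `call_model` callable are all hypothetical stand-ins for whatever image-capable LLM API is in play. The point is only that one prompt and one image go in, and one answer comes out, with nothing in between to check either.

```python
import base64
import json
from typing import Callable, List

# Hypothetical prompt wording; the real prompt is not shown in this post.
MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. "
    "List every visible physical defect, such as cracked screens, "
    "missing keys or broken hinges. "
    'Reply with JSON only: {"damage": ["..."]}'
)


def build_request(image_bytes: bytes) -> dict:
    """Bundle the single large prompt and the image into one request."""
    return {
        "prompt": MONOLITHIC_PROMPT,
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }


def assess_damage(image_bytes: bytes, call_model: Callable[[dict], str]) -> List[str]:
    """One call, one answer: nothing validates the image or the reply."""
    reply = call_model(build_request(image_bytes))
    return json.loads(reply)["damage"]


# Stub standing in for a real image-capable LLM client (an assumption).
def fake_model(request: dict) -> str:
    return json.dumps({"damage": ["cracked screen"]})


result = assess_damage(b"not-a-real-image", fake_model)
print(result)
```

Because `assess_damage` trusts whatever the model returns, any image at all produces a damage list, which is exactly where this approach starts to struggle on messy real-world input.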

We ran into three major issues early on:<\/p>\n\n\n\n