\n\t\t\t\t

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders.<\/em> Subscribe Now<\/em><\/p>\n\n\n\n

\n<\/div>
Researchers have published the most comprehensive survey to date of so-called \u201cOS Agents\u201d \u2014 artificial intelligence systems that can autonomously control computers, mobile phones and web browsers by directly interacting with their interfaces. The 30-page academic review, accepted for publication at the prestigious Association for Computational Linguistics conference, maps a rapidly evolving field that has attracted billions in investment from major technology companies.<\/p>\n\n\n\n
\u201cThe dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations,\u201d the researchers write. \u201cWith the evolution of (multimodal) large language models ((M)LLMs), this dream is closer to reality.\u201d<\/p>\n\n\n\n
The survey, led by researchers from Zhejiang University and OPPO AI Center, comes as major technology companies race to deploy AI agents that can perform complex digital tasks. OpenAI recently launched \u201cOperator,\u201d Anthropic released \u201cComputer Use,\u201d Apple introduced enhanced AI capabilities in \u201cApple Intelligence,\u201d and Google unveiled \u201cProject Mariner\u201d \u2014 all systems designed to automate computer interactions.<\/p>\n\n\n\n
$\"\"$
OS agents work by observing computer screens and system data, then executing actions like clicks and swipes across mobile, desktop and web platforms. The systems must understand interfaces, plan multi-step tasks and translate those plans into executable code. (Credit: GitHub)<\/figcaption><\/figure>\n\n\n\n
Tech giants rush to deploy AI that controls your desktop<\/h2>\n\n\n\n
The speed at which academic research has transformed into consumer-ready products is unprecedented, even by Silicon Valley standards. The survey reveals a research explosion: over 60 foundation models and 50 agent frameworks developed specifically for computer control, with publication rates accelerating dramatically since 2023.<\/p>\n\n\n\n
\n
\n\n\n\n
AI Scaling Hits Its Limits<\/strong><\/p>\n\n\n\n
Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:<\/p>\n\n\n\n
\n
Turning energy into a strategic advantage<\/li>\n\n\n\n
Architecting efficient inference for real throughput gains<\/li>\n\n\n\n
Unlocking competitive ROI with sustainable AI systems<\/li>\n<\/ul>\n\n\n\n
Secure your spot to stay ahead<\/strong>: https:\/\/bit.ly\/4mwGngO<\/p>\n\n\n\n
\n<\/div>
This isn\u2019t just incremental progress. We\u2019re witnessing the emergence of AI systems that can genuinely understand and manipulate the digital world the way humans do. Current systems work by taking screenshots of computer screens, using advanced computer vision to understand what\u2019s displayed, then executing precise actions like clicking buttons, filling forms, and navigating between applications.<\/p>\n\n\n\n
\u201cOS Agents can complete tasks autonomously and have the potential to significantly enhance the lives of billions of users worldwide,\u201d the researchers note. \u201cImagine a world where tasks such as online shopping, travel arrangements booking, and other daily activities could be seamlessly performed by these agents.\u201d<\/p>\n\n\n\n
The most sophisticated systems can handle complex multi-step workflows that span different applications \u2014 booking a restaurant reservation, then automatically adding it to your calendar, then setting a reminder to leave early for traffic. What took humans minutes of clicking and typing can now happen in seconds, without human intervention.<\/p>\n\n\n\n
$\"\"$
The development of AI agents requires a complex training pipeline that combines multiple approaches, from initial pre-training on screen data to reinforcement learning that optimizes performance through trial and error. (Credit: arxiv.org)<\/figcaption><\/figure>\n\n\n\n
Why security experts are sounding alarms about AI-controlled corporate systems<\/h2>\n\n\n\n
For enterprise technology leaders, the promise of productivity gains comes with a sobering reality: these systems represent an entirely new attack surface that most organizations aren\u2019t prepared to defend.<\/p>\n\n\n\n
The researchers dedicate substantial attention to what they diplomatically term \u201csafety and privacy\u201d concerns, but the implications are more alarming than their academic language suggests. \u201cOS Agents are confronted with these risks, especially considering its wide applications on personal devices with user data,\u201d they write.<\/p>\n\n\n\n
The attack methods they document read like a cybersecurity nightmare. \u201cWeb Indirect Prompt Injection\u201d allows malicious actors to embed hidden instructions in web pages that can hijack an AI agent\u2019s behavior. Even more concerning are \u201cenvironmental injection attacks\u201d where seemingly innocuous web content can trick agents into stealing user data or performing unauthorized actions.<\/p>\n\n\n\n
Consider the implications: an AI agent with access to your corporate email, financial systems, and customer databases could be manipulated by a carefully crafted web page to exfiltrate sensitive information. Traditional security models, built around human users who can spot obvious phishing attempts, break down when the \u201cuser\u201d is an AI system that processes information differently.<\/p>\n\n\n\n
The survey reveals a concerning gap in preparedness. While general security frameworks exist for AI agents, \u201cstudies on defenses specific to OS Agents remain limited.\u201d This isn\u2019t just an academic concern \u2014 it\u2019s an immediate challenge for any organization considering deployment of these systems.<\/p>\n\n\n\n
The reality check: Current AI agents still struggle with complex digital tasks<\/h2>\n\n\n\n
Despite the hype surrounding these systems, the survey\u2019s analysis of performance benchmarks reveals significant limitations that temper expectations for immediate widespread adoption.<\/p>\n\n\n\n
Success rates vary dramatically across different tasks and platforms. Some commercial systems achieve success rates above 50% on certain benchmarks \u2014 impressive for a nascent technology \u2014 but struggle with others. The researchers categorize evaluation tasks into three types: basic \u201cGUI grounding\u201d (understanding interface elements), \u201cinformation retrieval\u201d (finding and extracting data), and complex \u201cagentic tasks\u201d (multi-step autonomous operations).<\/p>\n\n\n\n
The pattern is telling: current systems excel at simple, well-defined tasks but falter when faced with the kind of complex, context-dependent workflows that define much of modern knowledge work. They can reliably click a specific button or fill out a standard form, but struggle with tasks that require sustained reasoning or adaptation to unexpected interface changes.<\/p>\n\n\n\n
This performance gap explains why early deployments focus on narrow, high-volume tasks rather than general-purpose automation. The technology isn\u2019t yet ready to replace human judgment in complex scenarios, but it\u2019s increasingly capable of handling routine digital busywork.<\/p>\n\n\n\n
$\"\"$
OS agents rely on interconnected systems for perception, planning, memory and action execution. The complexity of coordinating these components helps explain why current systems still struggle with sophisticated tasks. (Credit: arxiv.org)<\/figcaption><\/figure>\n\n\n\n
What happens when AI agents learn to customize themselves for every user<\/h2>\n\n\n\n
Perhaps the most intriguing \u2014 and potentially transformative \u2014 challenge identified in the survey involves what researchers call \u201cpersonalization and self-evolution.\u201d Unlike today\u2019s stateless AI assistants that treat every interaction as independent, future OS agents will need to learn from user interactions and adapt to individual preferences over time.<\/p>\n\n\n\n
\u201cDeveloping personalized OS Agents has been a long-standing goal in AI research,\u201d the authors write. \u201cA personal assistant is expected to continuously adapt and provide enhanced experiences based on individual user preferences.\u201d<\/p>\n\n\n\n
This capability could fundamentally change how we interact with technology. Imagine an AI agent that learns your email writing style, understands your calendar preferences, knows which restaurants you prefer, and can make increasingly sophisticated decisions on your behalf. The potential productivity gains are enormous, but so are the privacy implications.<\/p>\n\n\n\n
The technical challenges are substantial. The survey points to the need for better multimodal memory systems that can handle not just text but images and voice, presenting \u201csignificant challenges\u201d for current technology. How do you build a system that remembers your preferences without creating a comprehensive surveillance record of your digital life?<\/p>\n\n\n\n
For technology executives evaluating these systems, this personalization challenge represents both the greatest opportunity and the largest risk. The organizations that solve it first will gain significant competitive advantages, but the privacy and security implications could be severe if handled poorly.<\/p>\n\n\n\n
The race to build AI assistants that can truly operate like human users is intensifying rapidly. While fundamental challenges around security, reliability, and personalization remain unsolved, the trajectory is clear. The researchers maintain an open-source repository tracking developments, acknowledging that \u201cOS Agents are still in their early stages of development\u201d with \u201crapid advancements that continue to introduce novel methodologies and applications.\u201d<\/p>\n\n\n\n
The question isn\u2019t whether AI agents will transform how we interact with computers \u2014 it\u2019s whether we\u2019ll be ready for the consequences when they do. The window for getting the security and privacy frameworks right is narrowing as quickly as the technology is advancing.<\/p>\n
\n
\n
Daily insights on business use cases with VB Daily<\/strong><\/p>\n
If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.<\/p>\n
Read our Privacy Policy<\/p>\n
\n\t\t\t\t\tThanks for subscribing. Check out more VB newsletters here.\n\t\t\t\t<\/p>\n
An error occured.<\/p>\n<\/p><\/div>\n
\n\t\t\t\t\t $\"\"\/$ \n\t\t\t\t<\/div>\n<\/p><\/div>\n<\/div>\t\t\t<\/div>\r\n
\r\n
Source link <\/a>","protected":false},"excerpt":{"rendered":"
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Researchers have published the most comprehensive survey to date of so-called \u201cOS Agents\u201d \u2014 artificial intelligence systems that can autonomously control computers, mobile phones and web browsers by directly […]<\/p>\n","protected":false},"author":1,"featured_media":3104,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3103","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/08\/nuneybits_Vector_art_of_human_and_AI_sharing_keyboard_9a3adda9-66ab-4482-8716-8d7eda0c5b72.webp.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3103"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3103\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3104"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}