{"id":3384,"date":"2025-08-27T05:01:41","date_gmt":"2025-08-27T05:01:41","guid":{"rendered":"https:\/\/violethoward.com\/new\/how-procedural-memory-can-cut-the-cost-and-complexity-of-ai-agents\/"},"modified":"2025-08-27T05:01:41","modified_gmt":"2025-08-27T05:01:41","slug":"how-procedural-memory-can-cut-the-cost-and-complexity-of-ai-agents","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/how-procedural-memory-can-cut-the-cost-and-complexity-of-ai-agents\/","title":{"rendered":"How procedural memory can cut the cost and complexity of AI agents"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>A new technique from Zhejiang University and Alibaba Group gives large language model (LLM) agents a dynamic memory, making them more efficient and effective at complex tasks. The technique, called Memp, provides agents with a \u201cprocedural memory\u201d that is continuously updated as they gain experience, much like how humans learn from practice.<\/p>\n\n\n\n<p>Memp creates a lifelong learning framework where agents don\u2019t have to start from scratch for every new task. Instead, they become progressively better and more efficient as they encounter new situations in real-world environments, a key requirement for reliable enterprise automation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-case-for-procedural-memory-in-ai-agents\">The case for procedural memory in AI agents<\/h2>\n\n\n\n<p>LLM agents hold promise for automating complex, multi-step business processes. In practice, though, these long-horizon tasks can be fragile. 
The researchers point out that unpredictable events like network glitches, user interface changes or shifting data schemas can derail the entire process. For current agents, this often means starting over every time, which can be time-consuming and costly.<\/p>\n\n\n\n<p>Meanwhile, many complex tasks, despite surface differences, share deep structural commonalities. Instead of relearning these patterns every time, an agent should be able to extract and reuse its experience from past successes and failures, the researchers point out. This requires a specific \u201cprocedural memory,\u201d which in humans is the long-term memory responsible for skills like typing or riding a bike, that become automatic with practice.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" height=\"591\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?w=800\" alt=\"\" class=\"wp-image-3016096\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png 997w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=300,222 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=768,568 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=800,591 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=400,296 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=750,554 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=578,427 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_8ff64a.png?resize=930,687 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>Starting from scratch (top) vs using procedural memory (bottom) (source: arXiv)<\/em><\/figcaption><\/figure>\n\n\n\n<p>Current agent systems often lack this capability. Their procedural knowledge is typically hand-crafted by developers, stored in rigid prompt templates or embedded within the model\u2019s parameters, which are expensive and slow to update. 
Even existing memory-augmented frameworks provide only coarse abstractions and don\u2019t adequately address how skills should be built, indexed, corrected and eventually pruned over an agent\u2019s lifecycle.<\/p>\n\n\n\n<p>Consequently, the researchers note in their paper, \u201cthere is no principled way to quantify how efficiently an agent evolves its procedural repertoire or to guarantee that new experiences improve rather than erode performance.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-memp-works\">How Memp works<\/h2>\n\n\n\n<p>Memp is a task-agnostic framework that treats procedural memory as a core component to be optimized. It consists of three key stages that work in a continuous loop: building, retrieving, and updating memory.<\/p>\n\n\n\n<p>Memories are built from an agent\u2019s past experiences, or \u201ctrajectories.\u201d The researchers explored storing these memories in two formats: as verbatim, step-by-step actions, or distilled into higher-level, script-like abstractions. For retrieval, the agent searches its memory for the most relevant past experience when given a new task. The team experimented with different methods, such as vector search to match the new task\u2019s description to past queries, or keyword extraction to find the best fit.<\/p>\n\n\n\n<p>The most critical component is the update mechanism. Memp introduces several strategies to ensure the agent\u2019s memory evolves. 
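The build, retrieve and update loop can be sketched in a few lines of code. The following is a minimal, hypothetical illustration, not the authors' implementation: every class and function name is invented, and simple keyword overlap stands in for the vector search and LLM-based distillation a real system would use.

```python
# Hypothetical sketch of a Memp-style procedural memory store.
# Names are invented for illustration; keyword overlap stands in
# for the embedding-based vector search described in the paper.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list          # the actions the agent took
    success: bool

@dataclass
class ProceduralMemory:
    entries: list = field(default_factory=list)

    def build(self, traj: Trajectory) -> None:
        """Store a trajectory verbatim; a real system might instead
        distill it into a script-like abstraction with an LLM."""
        self.entries.append(traj)

    def retrieve(self, task: str):
        """Return the most relevant past experience for a new task.
        Keyword overlap is a toy stand-in for vector similarity."""
        def overlap(t: Trajectory) -> int:
            return len(set(task.lower().split()) & set(t.task.lower().split()))
        best = max(((overlap(t), t) for t in self.entries),
                   key=lambda p: p[0], default=(0, None))
        return best[1] if best[0] > 0 else None

    def update(self, traj: Trajectory) -> None:
        """Success-filtered update: keep only successful outcomes.
        The strongest variant in the paper also revises memories
        after reflecting on failures."""
        if traj.success:
            self.build(traj)

memory = ProceduralMemory()
memory.update(Trajectory("clean the kitchen counter", ["goto kitchen", "wipe counter"], True))
memory.update(Trajectory("boil an egg", ["goto stove", "boil"], False))  # filtered out
hit = memory.retrieve("clean the bathroom counter")
print(hit.task)  # the kitchen-counter trajectory is the closest match
```

In a production version of this idea, `retrieve` would embed the task description and run a nearest-neighbor search over stored memories, and `update` would also correct or prune entries over time.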
As an agent completes more tasks, its memory can be updated by simply adding the new experience, filtering for only successful outcomes or, most effectively, reflecting on failures to correct and revise the original memory.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" height=\"450\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?w=800\" alt=\"\" class=\"wp-image-3016097\" style=\"width:840px;height:auto\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png 997w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=300,169 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=768,432 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=800,450 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=400,225 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=750,422 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=578,325 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_84f2ca.png?resize=930,523 930w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>Memp framework (source: arXiv)<\/em><\/figcaption><\/figure>\n\n\n\n<p>This focus on dynamic, evolving memory places Memp within a growing field of research aimed at making AI agents more reliable for long-term tasks. The work parallels other efforts, such as Mem0, which consolidates key information from long conversations into structured facts and knowledge graphs to ensure consistency. 
Similarly, A-MEM enables agents to autonomously create and link \u201cmemory notes\u201d from their interactions, forming a complex knowledge structure over time.<\/p>\n\n\n\n<p>However, co-author Runnan Fang highlights a critical distinction between Memp and other frameworks.<\/p>\n\n\n\n<p>\u201cMem0 and A-MEM are excellent works\u2026 but they focus on remembering salient content <em>within<\/em> a single trajectory or conversation,\u201d Fang commented to VentureBeat. In essence, they help an agent remember \u201cwhat\u201d happened. \u201cMemp, by contrast, targets cross-trajectory procedural memory.\u201d It focuses on \u201chow-to\u201d knowledge that can be generalized across similar tasks, preventing the agent from re-exploring from scratch each time.<\/p>\n\n\n\n<p>\u201cBy distilling past successful workflows into reusable procedural priors, Memp raises success rates and shortens steps,\u201d Fang added. \u201cCrucially, we also introduce an update mechanism so that this procedural memory keeps improving\u2014 after all, practice makes perfect for agents too.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-overcoming-the-cold-start-problem\">Overcoming the \u2018cold-start\u2019 problem<\/h2>\n\n\n\n<p>While the concept of learning from past trajectories is powerful, it raises a practical question: How does an agent build its initial memory when there are no perfect examples to learn from? The researchers address this \u201ccold-start\u201d problem with a pragmatic approach.<\/p>\n\n\n\n<p>Fang explained that developers can first define a robust evaluation metric instead of requiring a perfect \u201cgold\u201d trajectory upfront. This metric, which can be rule-based or even another LLM, scores the quality of an agent\u2019s performance. \u201cOnce that metric is in place, we let state-of-the-art models explore within the agent workflow and retain the trajectories that achieve the highest scores,\u201d Fang said. 
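The bootstrapping recipe Fang outlines, explore with a strong model, score each trajectory with a metric, and keep the best, can be sketched as follows. Everything here (the function names, the toy explorer and the toy metric) is invented for illustration; in practice the metric could be a rule-based scorer or an LLM judge.

```python
# Illustrative sketch of cold-start bootstrapping (hypothetical names,
# not the authors' code): explore tasks, score each resulting trajectory,
# and seed the procedural memory with the highest-scoring ones.
def bootstrap_memory(explore, tasks, metric, keep_top=3):
    """explore(task) -> trajectory; metric(task, trajectory) -> float.
    Returns the keep_top highest-scoring (score, task, trajectory) triples."""
    scored = []
    for task in tasks:
        traj = explore(task)
        scored.append((metric(task, traj), task, traj))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:keep_top]

# Toy stand-ins: an "explorer" that succeeds on some tasks, and a
# rule-based metric that rewards trajectories reaching a "done" state.
def toy_explore(task):
    return ["search", "act", "done"] if "find" in task else ["search", "give-up"]

def toy_metric(task, traj):
    return 1.0 if traj[-1] == "done" else 0.0

seeds = bootstrap_memory(toy_explore,
                         ["find keys", "find phone", "fly to mars"],
                         toy_metric, keep_top=2)
print([task for _, task, _ in seeds])  # the two successful "find" explorations
```

Only the retained trajectories enter memory, which is how an agent gets useful procedural priors before any "gold" demonstrations exist.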
This process rapidly bootstraps an initial set of useful memories, allowing a new agent to get up to speed without extensive manual programming.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-memp-in-action\">Memp in action<\/h2>\n\n\n\n<p>To test the framework, the team implemented Memp on top of powerful LLMs like GPT-4o, Claude 3.5 Sonnet and Qwen2.5, evaluating them on complex tasks like household chores in the ALFWorld benchmark and information-seeking in TravelPlanner. The results showed that building and retrieving procedural memory allowed an agent to distill and reuse its prior experience effectively.<\/p>\n\n\n\n<p>During testing, agents equipped with Memp not only achieved higher success rates but became much more efficient. They eliminated fruitless exploration and trial-and-error, leading to a substantial reduction in both the number of steps and the token consumption required to complete a task.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" height=\"396\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?w=800\" alt=\"\" class=\"wp-image-3016098\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png 997w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=300,149 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=768,381 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=800,396 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=100,50 100w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=400,198 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=750,372 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=578,286 578w, 
https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_0fe908.png?resize=930,461 930w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>Using procedural memory (right) helps agents accomplish tasks in fewer steps and using fewer tokens (source: arXiv)<\/em><\/figcaption><\/figure>\n\n\n\n<p>One of the most significant findings for enterprise applications is that procedural memory is transferable. In one experiment, procedural memory generated by the powerful GPT-4o was given to a much smaller model, Qwen2.5-14B. The smaller model saw a significant boost in performance, improving its success rate and reducing the steps needed to complete tasks. <\/p>\n\n\n\n<p>According to Fang, this works because smaller models often handle simple, single-step actions well but falter when it comes to long-horizon planning and reasoning. The procedural memory from the larger model effectively fills this capability gap. This suggests that knowledge can be acquired using a state-of-the-art model, then deployed on smaller, more cost-effective models without losing the benefits of that experience.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-toward-truly-autonomous-agents\">Toward truly autonomous agents<\/h2>\n\n\n\n<p>By equipping agents with memory-update mechanisms, the Memp framework allows them to continuously build and refine their procedural knowledge while operating in a live environment. The researchers found this endowed the agent with a \u201ccontinual, almost linear mastery of the task.\u201d<\/p>\n\n\n\n<p>However, the path to full autonomy requires overcoming another hurdle: Many real-world tasks, such as producing a research report, lack a simple success signal. To continuously improve, an agent needs to know if it did a good job. Fang says the future lies in using LLMs themselves as judges. 
<\/p>\n\n\n\n<p>\u201cToday we often combine powerful models with hand-crafted rules to compute completion scores,\u201d he notes. \u201cThis works, but hand-written rules are brittle and hard to generalize.\u201d <\/p>\n\n\n\n<p>An LLM-as-judge could provide the nuanced, supervisory feedback needed for an agent to self-correct on complex, subjective tasks. This would make the entire learning loop more scalable and robust, marking a critical step toward building the resilient, adaptable and truly autonomous AI workers needed for sophisticated enterprise automation.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/how-procedural-memory-can-cut-the-cost-and-complexity-of-ai-agents\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. 
Subscribe Now A new technique from Zhejiang University and Alibaba Group gives large language model (LLM) agents a dynamic memory, making them more efficient and effective at complex tasks. The technique, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3385,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3384","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/08\/llm-agent-memory.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3384","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3384"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3384\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3385"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3384"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3384"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3384"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}