{"id":3213,"date":"2025-08-19T12:08:54","date_gmt":"2025-08-19T12:08:54","guid":{"rendered":"https:\/\/violethoward.com\/new\/gepa-optimizes-llms-without-costly-reinforcement-learning\/"},"modified":"2025-08-19T12:08:54","modified_gmt":"2025-08-19T12:08:54","slug":"gepa-optimizes-llms-without-costly-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/gepa-optimizes-llms-without-costly-reinforcement-learning\/","title":{"rendered":"GEPA optimizes LLMs without costly reinforcement learning"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders.<\/em> <em>Subscribe Now<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>Researchers from the University of California, Berkeley, Stanford University and Databricks have introduced a new AI optimization method called GEPA that significantly outperforms traditional reinforcement learning (RL) techniques for adapting large language models (LLMs) to specialized tasks.<\/p>\n\n\n\n<p>GEPA removes the popular paradigm of learning through thousands of trial-and-error attempts guided by simple numerical scores. Instead, it uses an LLM\u2019s own language understanding to reflect on its performance, diagnose errors, and iteratively evolve its instructions. 
In addition to being more accurate than established techniques, GEPA is significantly more efficient, achieving superior results with up to 35 times fewer trial runs.<\/p>\n\n\n\n<p>For businesses building complex AI agents and workflows, this translates directly into faster development cycles, substantially lower computational costs, and more performant, reliable applications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-high-cost-of-optimizing-modern-ai-systems\">The high cost of optimizing modern AI systems<\/h2>\n\n\n\n<p>Modern enterprise AI applications are rarely a single call to an LLM. They are often \u201ccompound AI systems,\u201d complex workflows that chain multiple LLM modules, external tools such as databases or code interpreters, and custom logic to perform sophisticated tasks, including multi-step research and data analysis.<\/p>\n\n\n\n<p>A popular way to optimize these systems is through reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), a technique employed in popular reasoning models, including DeepSeek-R1. 
This method treats the system as a black box; it runs a task, gets a simple success metric (a \u201cscalar reward,\u201d like a score of 7\/10), and uses this feedback to slowly nudge the model\u2019s parameters in the right direction.<\/p>\n\n\n\n<p>The major drawback of RL is its sample inefficiency. To learn effectively from these sparse numerical scores, RL methods often require tens of thousands, or even hundreds of thousands, of trial runs, known as \u201crollouts.\u201d For any real-world enterprise application that involves expensive tool calls (e.g., API queries, code compilation) or uses powerful proprietary models, this process is prohibitively slow and costly.<\/p>\n\n\n\n<p>As Lakshya A Agrawal, co-author of the paper and doctoral student at UC Berkeley, told VentureBeat, this complexity is a major barrier for many companies. \u201cFor many teams, RL is not practical due to its cost and complexity\u2014and their go-to approach so far would often just be prompt engineering by hand,\u201d Agrawal said. 
He noted that GEPA is designed for teams that need to optimize systems built on top-tier models that often can\u2019t be fine-tuned, allowing them to improve performance without managing custom GPU clusters.<\/p>\n\n\n\n<p>The researchers frame this challenge as follows: \u201cHow can we extract maximal learning signal from every expensive rollout to enable effective adaptation of complex, modular AI systems in low-data or budget-constrained settings?\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-an-optimizer-that-learns-with-language\">An optimizer that learns with language<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" height=\"464\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?w=800\" alt=\"\" class=\"wp-image-3015654\" style=\"width:840px;height:auto\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png 1434w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=300,174 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=768,446 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=800,464 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=400,232 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=750,435 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=578,335 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_b1ddfc.png?resize=930,540 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>GEPA framework Source: arXiv<\/em><\/figcaption><\/figure>\n\n\n\n<p>GEPA (Genetic-Pareto) is a prompt optimizer that tackles this challenge by replacing sparse rewards with rich, natural language feedback. 
It leverages the fact that the entire execution of an AI system (including its reasoning steps, tool calls, and even error messages) can be serialized into text that an LLM can read and understand. GEPA\u2019s methodology is built on three core pillars.<\/p>\n\n\n\n<p>First is \u201cgenetic prompt evolution,\u201d where GEPA treats a population of prompts like a gene pool. It iteratively \u201cmutates\u201d prompts to create new, potentially better versions. This mutation is an intelligent process driven by the second pillar: \u201creflection with natural language feedback.\u201d After a few rollouts, GEPA provides an LLM with the full execution trace (what the system tried to do) and the outcome (what went right or wrong). The LLM then \u201creflects\u201d on this feedback in natural language to diagnose the problem and write an improved, more detailed prompt. For instance, instead of just seeing a low score on a code generation task, it might analyze a compiler error and conclude the prompt needs to specify a particular library version.<\/p>\n\n\n\n<p>The third pillar is \u201cPareto-based selection,\u201d which ensures smart exploration. Instead of focusing only on the single best-performing prompt, which can lead to getting stuck in a suboptimal solution (a \u201clocal optimum\u201d), GEPA maintains a diverse roster of \u201cspecialist\u201d prompts. It tracks which prompts perform best on different individual examples, creating a list of top candidates. 
By sampling from this diverse set of winning strategies, GEPA ensures it explores more solutions and is more likely to discover a prompt that generalizes well across a wide range of inputs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" height=\"412\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?w=800\" alt=\"\" class=\"wp-image-3015655\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png 912w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=300,155 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=768,396 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=800,412 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=400,206 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=750,387 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_25fed4.png?resize=578,298 578w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>Selecting a single best candidate (left) can result in models getting stuck in local minima while Pareto selection (right) can explore more options and find optimal solutions Source: arXiv<\/em><\/figcaption><\/figure>\n\n\n\n<p>The effectiveness of this entire process hinges on what the researchers call \u201cfeedback engineering.\u201d Agrawal explains that the key is to surface the rich, textual details that systems already produce but often discard. \u201cTraditional pipelines often reduce this detail to a single numerical reward, obscuring why particular outcomes occur,\u201d he said. 
\u201cGEPA\u2019s core guidance is to structure feedback that surfaces not only outcomes but also intermediate trajectories and errors in plain text\u2014the same evidence a human would use to diagnose system behavior.\u201d<\/p>\n\n\n\n<p>For example, for a document retrieval system, this means listing which documents were retrieved correctly and which were missed, rather than just calculating a final score.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-gepa-in-action\">GEPA in action<\/h2>\n\n\n\n<p>The researchers evaluated GEPA across four diverse tasks, including multi-hop question answering (HotpotQA) and privacy-preserving queries (PUPA). They used both open-source (Qwen3 8B) and proprietary (GPT-4.1 mini) models, comparing GEPA against the RL-based GRPO and the state-of-the-art prompt optimizer MIPROv2.<\/p>\n\n\n\n<p>Across all tasks, GEPA substantially outperformed GRPO, achieving up to a 19% higher score while using up to 35 times fewer rollouts. Agrawal provided a concrete example of this efficiency gain: \u201cWe used GEPA to optimize a QA system in ~3 hours versus GRPO\u2019s 24 hours\u2014an 8x reduction in development time, while also achieving 20% higher performance,\u201d he explained. 
\u201cRL-based optimization of the same scenario in our test cost about $300 in GPU time, while GEPA cost less than $20 for better results\u201415x savings in our experiments.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" height=\"289\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?w=800\" alt=\"\" class=\"wp-image-3015656\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png 1446w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=300,108 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=768,277 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=800,289 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=400,144 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=750,271 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=578,209 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/08\/image_1b53c1.png?resize=930,336 930w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>GEPA outperforms other baselines on key benchmarks Source: arXiv<\/em><\/figcaption><\/figure>\n\n\n\n<p>Beyond raw performance, the researchers found that GEPA-optimized systems are more reliable when faced with new, unseen data. This is measured by the \u201cgeneralization gap\u201d (the difference between performance on training data and final test data). Agrawal hypothesizes that this is because GEPA learns from richer feedback. \u201cGEPA\u2019s smaller generalization gap may stem from its use of rich natural-language feedback on each outcome\u2014what worked, what failed, and why\u2014rather than relying solely on a single scalar reward,\u201d he said. 
\u201cThis may encourage the system to develop instructions and strategies grounded in a broader understanding of success, instead of merely learning patterns specific to the training data.\u201d For enterprises, this improved reliability means less brittle, more adaptable AI applications in customer-facing roles.<\/p>\n\n\n\n<p>A major practical benefit is that GEPA\u2019s instruction-based prompts are up to 9.2 times shorter than prompts produced by optimizers like MIPROv2, which include many few-shot examples. Shorter prompts decrease latency and reduce costs for API-based models. This makes the final application faster and cheaper to run in production.<\/p>\n\n\n\n<p>The paper also presents promising results for utilizing GEPA as an \u201cinference-time\u201d search strategy, transforming the AI from a single-answer generator into an iterative problem solver. Agrawal described a scenario where GEPA could be integrated into a company\u2019s CI\/CD pipeline. When new code is committed, GEPA could automatically generate and refine multiple optimized versions, test them for performance, and open a pull request with the best-performing variant for engineers to review. \u201cThis turns optimization into a continuous, automated process\u2014rapidly generating solutions that often match or surpass expert hand-tuning,\u201d Agrawal noted. In their experiments on CUDA code generation, this approach boosted performance on 20% of tasks to an expert level, compared to 0% for a single-shot attempt from GPT-4o.<\/p>\n\n\n\n<p>The paper\u2019s authors believe GEPA is a foundational step toward a new paradigm of AI development. 
But beyond creating more human-like AI, its most immediate impact may be in who gets to build high-performing systems.<\/p>\n\n\n\n<p>\u201cWe expect GEPA to enable a positive shift in AI system building\u2014making the optimization of such systems approachable by end-users, who often have the domain expertise relevant to the task, but not necessarily the time and willingness to learn complex RL specifics,\u201d Agrawal said. \u201cIt gives power directly to the stakeholders with the exact task-specific domain knowledge.\u201d<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/gepa-optimizes-llms-without-costly-reinforcement-learning\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. 
Subscribe Now Researchers from the University of California, Berkeley, Stanford University and Databricks have introduced a new AI optimization method called GEPA that significantly outperforms traditional reinforcement learning (RL) techniques for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3214,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3213","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/08\/prompt-optimization.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3213","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3213"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3213\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3214"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3213"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3213"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3213"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is 
-->