{"id":391,"date":"2025-03-04T02:16:27","date_gmt":"2025-03-04T02:16:27","guid":{"rendered":"https:\/\/violethoward.com\/new\/less-is-more-how-chain-of-draft-could-cut-ai-costs-by-90-while-improving-performance\/"},"modified":"2025-03-04T02:16:27","modified_gmt":"2025-03-04T02:16:27","slug":"less-is-more-how-chain-of-draft-could-cut-ai-costs-by-90-while-improving-performance","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/less-is-more-how-chain-of-draft-could-cut-ai-costs-by-90-while-improving-performance\/","title":{"rendered":"Less is more: How &#8216;chain of draft&#8217; could cut AI costs by 90% while improving performance"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computational resources needed for AI systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI at scale.<\/p>\n\n\n\n<p>The method, called chain of draft (CoD), enables large language models (LLMs) to solve problems with minimal words \u2014 using as little as 7.6% of the text required by current methods while maintaining or even improving accuracy. 
The findings were published in a paper last week on the research repository arXiv.<\/p>\n\n\n\n<p>\u201cBy reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT (chain-of-thought) in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,\u201d write the authors, led by Silei Xu, a researcher at Zoom.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"921\" height=\"958\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39%E2%80%AFPM.png?w=577\" alt=\"\" class=\"wp-image-2998438\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png 921w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=300,312 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=768,799 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=577,600 577w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=400,416 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=750,780 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/Screenshot-2025-03-03-at-12.20.39\u202fPM.png?resize=578,601 578w\" sizes=\"(max-width: 921px) 100vw, 921px\"\/><figcaption class=\"wp-element-caption\"><em>Chain of draft (red) maintains or exceeds the accuracy of chain-of-thought (yellow) while using dramatically fewer tokens across four reasoning tasks, demonstrating how concise AI reasoning can cut costs without sacrificing performance. 
(Credit: arxiv.org)<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-less-is-more-transforms-ai-reasoning-without-sacrificing-accuracy\">How \u2018less is more\u2019 transforms AI reasoning without sacrificing accuracy<\/h2>\n\n\n\n<p>CoD draws inspiration from how humans solve complex problems. Rather than articulating every detail when working through a math problem or logical puzzle, people typically jot down only essential information in abbreviated form.<\/p>\n\n\n\n<p>\u201cWhen solving complex tasks \u2014 whether mathematical problems, drafting essays or coding \u2014 we often jot down only the critical pieces of information that help us progress,\u201d the researchers explain. \u201cBy emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.\u201d<\/p>\n\n\n\n<p>The team tested their approach on numerous benchmarks, including arithmetic reasoning (GSM8K), commonsense reasoning (date understanding and sports understanding) and symbolic reasoning (coin-flip tasks).<\/p>\n\n\n\n<p>In one striking example, in which Claude 3.5 Sonnet processed sports-related questions, the CoD approach reduced the average output from 189.4 tokens to just 14.3 tokens \u2014 a 92.4% reduction \u2014 while simultaneously improving accuracy from 93.2% to 97.3%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-slashing-enterprise-ai-costs-the-business-case-for-concise-machine-reasoning\">Slashing enterprise AI costs: The business case for concise machine reasoning<\/h2>\n\n\n\n<p>\u201cFor an enterprise processing 1 million reasoning queries monthly, CoD could cut costs from $3,800 (CoT) to $760, saving over $3,000 per month,\u201d AI researcher Ajith Vallath Prabhakar writes in an analysis of the paper.<\/p>\n\n\n\n<p>The research comes at a critical time for enterprise AI deployment. 
As companies increasingly integrate sophisticated AI systems into their operations, computational costs and response times have emerged as significant barriers to widespread adoption.<\/p>\n\n\n\n<p>Current state-of-the-art reasoning techniques like chain of thought (CoT), introduced in 2022, have dramatically improved AI\u2019s ability to solve complex problems by breaking them down into step-by-step reasoning. But this approach generates lengthy explanations that consume substantial computational resources and increase response latency.<\/p>\n\n\n\n<p>\u201cThe verbose nature of CoT prompting results in substantial computational overhead, increased latency and higher operational expenses,\u201d writes Prabhakar.<\/p>\n\n\n\n<p>What makes CoD particularly noteworthy for enterprises is its simplicity of implementation. Unlike many AI advancements that require expensive model retraining or architectural changes, CoD can be deployed immediately with existing models through a simple prompt modification.<\/p>\n\n\n\n<p>\u201cOrganizations already using CoT can switch to CoD with a simple prompt modification,\u201d Prabhakar explains.<\/p>\n\n\n\n<p>The technique could prove especially valuable for latency-sensitive applications like real-time customer support, mobile AI, educational tools and financial services, where even small delays can significantly impact user experience.<\/p>\n\n\n\n<p>Industry experts suggest that the implications extend beyond cost savings, however. By making advanced AI reasoning more accessible and affordable, CoD could democratize access to sophisticated AI capabilities for smaller organizations and resource-constrained environments.<\/p>\n\n\n\n<p>As AI systems continue to evolve, techniques like CoD highlight a growing emphasis on efficiency alongside raw capability. 
For enterprises navigating the rapidly changing AI landscape, such optimizations could prove as valuable as improvements in the underlying models themselves.<\/p>\n\n\n\n<p>\u201cAs AI models continue to evolve, optimizing reasoning efficiency will be as critical as improving their raw capabilities,\u201d Prabhakar concluded.<\/p>\n\n\n\n<p>The research code and data have been made publicly available on GitHub, allowing organizations to implement and test the approach with their own AI systems.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/less-is-more-how-chain-of-draft-could-cut-ai-costs-by-90-while-improving-performance\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. 
Learn More A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computational resources needed for AI systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":392,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-391","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/03\/nuneybits_Vector_art_of_a_retro_computer_spitting_out_dollar_bi_5d1a2373-4901-4ffb-a09c-7a4df993eb0b.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=391"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/391\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/392"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=391"}],"curies":
[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. Learn more: https://airlift.net. Template:. Learn more: https://airlift.net. Template: 69b0ea1f46fa5c3231e56837. Config Timestamp: 2026-03-11 04:05:51 UTC, Cached Timestamp: 2026-04-08 03:20:06 UTC -->