{"id":2266,"date":"2025-07-04T02:57:59","date_gmt":"2025-07-04T02:57:59","guid":{"rendered":"https:\/\/violethoward.com\/new\/sakana-ais-treequest-deploy-multi-model-teams-that-outperform-individual-llms-by-30\/"},"modified":"2025-07-04T02:57:59","modified_gmt":"2025-07-04T02:57:59","slug":"sakana-ais-treequest-deploy-multi-model-teams-that-outperform-individual-llms-by-30","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/sakana-ais-treequest-deploy-multi-model-teams-that-outperform-individual-llms-by-30\/","title":{"rendered":"Sakana AI&#8217;s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a \u201cdream team\u201d of AI agents. The method, called Multi-LLM AB-MCTS, enables models to perform trial-and-error and combine their unique strengths to solve problems that are too complex for any individual model.<\/p>\n\n\n\n<p>For enterprises, this approach provides a means to develop more robust and capable AI systems. Instead of being locked into a single provider or model, businesses could dynamically leverage the best aspects of different frontier models, assigning the right AI for the right part of a task to achieve superior results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-power-of-collective-intelligence\">The power of collective intelligence<\/h2>\n\n\n\n<p>Frontier AI models are evolving rapidly. 
However, each model has its own distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding, while another excels at creative writing. Sakana AI\u2019s researchers argue that these differences are not a bug, but a feature.<\/p>\n\n\n\n<p>\u201cWe see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,\u201d the researchers state in their blog post. They believe that just as humanity\u2019s greatest achievements come from diverse teams, AI systems can also achieve more by working together. \u201cBy pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-thinking-longer-at-inference-time\">Thinking longer at inference time<\/h2>\n\n\n\n<p>Sakana AI\u2019s new algorithm is an \u201cinference-time scaling\u201d technique (also referred to as \u201ctest-time scaling\u201d), an area of research that has become very popular in the past year. While most of the focus in AI has been on \u201ctraining-time scaling\u201d (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is already trained.\u00a0<\/p>\n\n\n\n<p>One common approach involves using reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate a variety of potential solutions, similar to a brainstorming session. 
Sakana AI\u2019s work combines and advances these ideas.<\/p>\n\n\n\n<p>\u201cOur framework offers a smarter, more strategic version of Best-of-N (aka repeated sampling),\u201d Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. \u201cIt complements reasoning techniques like long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-adaptive-branching-search-works\">How adaptive branching search works<\/h2>\n\n\n\n<p>The core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to effectively perform trial-and-error by intelligently balancing two different search strategies: \u201csearching deeper\u201d and \u201csearching wider.\u201d Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch. AB-MCTS combines these approaches, allowing the system to improve a good idea but also to pivot and try something new if it hits a dead end or discovers another promising direction.<\/p>\n\n\n\n<p>To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind\u2019s AlphaGo. 
At each step, AB-MCTS uses probability models to decide whether it\u2019s more strategic to refine an existing solution or generate a new one.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" height=\"422\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?w=800\" alt=\"\" class=\"wp-image-3013775\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png 3130w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=300,158 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=768,406 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=800,422 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=1536,811 1536w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=2048,1082 2048w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=400,211 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=750,396 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=578,305 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_ac751e.png?resize=930,491 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>Different test-time scaling strategies Source: Sakana AI<\/em><\/figcaption><\/figure>\n\n\n\n<p>The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides \u201cwhat\u201d to do (refine vs. generate) but also \u201cwhich\u201d LLM should do it. At the start of a task, the system doesn\u2019t know which model is best suited for the problem. 
It begins by trying a balanced mix of available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-putting-the-ai-dream-team-to-the-test\">Putting the AI \u2018dream team\u2019 to the test<\/h2>\n\n\n\n<p>The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, making it notoriously difficult for AI.\u00a0<\/p>\n\n\n\n<p>The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro, and DeepSeek-R1.<\/p>\n\n\n\n<p>The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" height=\"414\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?w=800\" alt=\"AB-MCTS vs individual models (source: Sakana AI)\" class=\"wp-image-3013776\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png 2048w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=300,155 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=768,398 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=800,414 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=1536,796 1536w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=400,207 400w, 
https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=750,389 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=578,299 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_fae32e.png?resize=930,482 930w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>AB-MCTS vs individual models Source: Sakana AI<\/em><\/figcaption><\/figure>\n\n\n\n<p>More impressively, the team observed instances where the models solved problems that were previously impossible for any single one of them. In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which were able to analyze the error, correct it, and ultimately produce the right answer.\u00a0<\/p>\n\n\n\n<p>\u201cThis demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence,\u201d the researchers write.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" height=\"170\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?w=800\" alt=\"AB-MCTS can select different models at different stages of solving a problem (source: Sakana AI)\" class=\"wp-image-3013777\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png 2048w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=300,64 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=768,163 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=800,170 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=1536,326 1536w, 
https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=400,85 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=750,159 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=578,122 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/image_579cf6.png?resize=930,197 930w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\"\/><figcaption class=\"wp-element-caption\"><em>AB-MCTS can select different models at different stages of solving a problem Source: Sakana AI<\/em><\/figcaption><\/figure>\n\n\n\n<p>\u201cIn addition to the individual pros and cons of each model, the tendency to hallucinate can vary significantly among them,\u201d Akiba said. \u201cBy creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-from-research-to-real-world-applications\">From research to real-world applications<\/h2>\n\n\n\n<p>To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license (usable for commercial purposes). 
TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their own tasks with custom scoring and logic.<\/p>\n\n\n\n<p>\u201cWhile we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas,\u201d Akiba said.\u00a0<\/p>\n\n\n\n<p>Beyond the ARC-AGI-2 benchmark, the team was able to successfully apply AB-MCTS to tasks like complex algorithmic coding and improving the accuracy of machine learning models.\u00a0<\/p>\n\n\n\n<p>\u201cAB-MCTS could also be highly effective for problems that require iterative trial-and-error, such as optimizing performance metrics of existing software,\u201d Akiba said. \u201cFor example, it could be used to automatically find ways to improve the response latency of a web service.\u201d<\/p>\n\n\n\n<p>The release of a practical, open-source tool could pave the way for a new class of more powerful and reliable enterprise AI applications.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/sakana-ais-treequest-deploy-multi-model-teams-that-outperform-individual-llms-by-30\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a \u201cdream team\u201d of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2267,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-2266","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/07\/ChatGPT-Image-Jul-3-2025-09_58_19-PM.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2266","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=2266"}],"version-history"
:[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2266\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/2267"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=2266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=2266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=2266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}