{"id":847,"date":"2025-03-28T15:00:05","date_gmt":"2025-03-28T15:00:05","guid":{"rendered":"https:\/\/violethoward.com\/new\/the-tao-of-data-how-databricks-is-optimizing-ai-llm-fine-tuning-without-data-labels\/"},"modified":"2025-03-28T15:00:05","modified_gmt":"2025-03-28T15:00:05","slug":"the-tao-of-data-how-databricks-is-optimizing-ai-llm-fine-tuning-without-data-labels","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/the-tao-of-data-how-databricks-is-optimizing-ai-llm-fine-tuning-without-data-labels\/","title":{"rendered":"The TAO of data: How Databricks is optimizing AI LLM fine-tuning without data labels"},"content":{"rendered":" \r\n<div>\n\t\t\t\t<p>AI models perform only as well as the data used to train or fine-tune them.<\/p>\n\n\n\n<p>Labeled data, information tagged to help AI models understand context during training, has been a foundational element of machine learning (ML) and generative AI for much of their history.<\/p>\n\n\n\n<p>As enterprises race to implement AI applications, the hidden bottleneck often isn\u2019t technology \u2013 it\u2019s the months-long process of collecting, curating and labeling domain-specific data. This \u201cdata labeling tax\u201d has forced technical leaders to choose between delaying deployment or accepting suboptimal performance from generic models.<\/p>\n\n\n\n<p>Databricks is taking direct aim at that challenge.<\/p>\n\n\n\n<p>This week, the company released research on a new approach called Test-time Adaptive Optimization (TAO). 
The basic idea behind the approach is to enable enterprise-grade large language model (LLM) tuning using only input data that companies already have \u2013 no labels required \u2013 while achieving results that outperform traditional fine-tuning on thousands of labeled examples. Databricks started as a data lakehouse platform vendor and has increasingly focused on AI in recent years. Databricks acquired MosaicML for $1.3 billion and is steadily rolling out tools that help developers create AI apps rapidly. The Mosaic research team at Databricks developed the new TAO method.<\/p>\n\n\n\n<p>\u201cGetting labeled data is hard and poor labels will directly lead to poor outputs, this is why frontier labs use data labeling vendors to buy expensive human-annotated data,\u201d Brandon Cui, reinforcement learning lead and senior research scientist at Databricks, told VentureBeat. \u201cWe want to meet customers where they are, labels were an obstacle to enterprise AI adoption, and with TAO, no longer.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-technical-innovation-how-tao-reinvents-llm-fine-tuning\">The technical innovation: How TAO reinvents LLM fine-tuning<\/h2>\n\n\n\n<p>At its core, TAO shifts the paradigm of how developers personalize models for specific domains.<\/p>\n\n\n\n<p>Rather than the conventional supervised fine-tuning approach, which requires paired input-output examples, TAO uses reinforcement learning and systematic exploration to improve models using only example queries.<\/p>\n\n\n\n<p>The technical pipeline employs four distinct mechanisms working in concert:<\/p>\n\n\n\n<p><strong>Exploratory response generation<\/strong>: The system takes unlabeled input examples and generates multiple potential responses for each using advanced prompt engineering techniques that explore the solution space.<\/p>\n\n\n\n<p><strong>Enterprise-calibrated reward 
modeling<\/strong>: Generated responses are evaluated by the Databricks Reward Model (DBRM), which is specifically engineered to assess performance on enterprise tasks with emphasis on correctness.<\/p>\n\n\n\n<p><strong>Reinforcement learning-based model optimization<\/strong>: The model parameters are then optimized through reinforcement learning, which essentially teaches the model to generate high-scoring responses directly.<\/p>\n\n\n\n<p><strong>Continuous data flywheel<\/strong>: As users interact with the deployed system, new inputs are automatically collected, creating a self-improving loop without additional human labeling effort.<\/p>\n\n\n\n<p>Test-time compute is not a new idea. OpenAI used test-time compute to develop the o1 reasoning model, and DeepSeek applied similar techniques to train the R1 model. What distinguishes TAO from other test-time compute methods is that while it uses additional compute during training, the final tuned model has the same inference cost as the original model. This offers a critical advantage for production deployments where inference costs scale with usage.<\/p>\n\n\n\n<p>\u201cTAO only uses additional compute as part of the training process; it does not increase the model\u2019s inference cost after training,\u201d Cui explained. \u201cIn the long run, we think TAO and test-time compute approaches like o1 and R1 will be complementary\u2014you can do both.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-benchmarks-reveal-surprising-performance-edge-over-traditional-fine-tuning\">Benchmarks reveal surprising performance edge over traditional fine-tuning<\/h2>\n\n\n\n<p>Databricks\u2019 research reveals TAO doesn\u2019t just match traditional fine-tuning \u2013 it surpasses it. 
Across multiple enterprise-relevant benchmarks, Databricks claims the approach is better despite using significantly less human effort.<\/p>\n\n\n\n<p>On FinanceBench (a financial document Q&amp;A benchmark), TAO improved Llama 3.1 8B performance by 24.7 percentage points and Llama 3.3 70B by 13.4 points. For SQL generation using the BIRD-SQL benchmark adapted to Databricks\u2019 dialect, TAO delivered improvements of 19.1 and 8.7 points, respectively.<\/p>\n\n\n\n<p>Most remarkably, the TAO-tuned Llama 3.3 70B approached the performance of GPT-4o and o3-mini across these benchmarks\u2014models that typically cost 10-20x more to run in production environments.<\/p>\n\n\n\n<p>This presents a compelling value proposition for technical decision-makers: the ability to deploy smaller, more affordable models that perform comparably to their premium counterparts on domain-specific tasks, without the traditionally required extensive labeling costs.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1448\" height=\"870\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?w=800\" alt=\"\" class=\"wp-image-3002219\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png 1448w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=300,180 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=768,461 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=800,481 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=400,240 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=750,451 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=578,347 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/03\/tao-data.png?resize=930,559 930w\" sizes=\"(max-width: 1448px) 100vw, 
1448px\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-tao-enables-time-to-market-advantage-for-enterprises\">TAO enables time-to-market advantage for enterprises<\/h2>\n\n\n\n<p>While TAO delivers clear cost advantages by enabling the use of smaller, more efficient models, its greatest value may be in accelerating time-to-market for AI initiatives.<\/p>\n\n\n\n<p>\u201cWe think TAO saves enterprises something more valuable than money: it saves them time,\u201d Cui emphasized. \u201cGetting labeled data typically requires crossing organizational boundaries, setting up new processes, getting subject matter experts to do the labeling and verifying the quality. Enterprises don\u2019t have months to align multiple business units just to prototype one AI use case.\u201d<\/p>\n\n\n\n<p>This time compression creates a strategic advantage. For example, a financial services company implementing a contract analysis solution could begin deploying and iterating using only sample contracts, rather than waiting for legal teams to label thousands of documents. Similarly, healthcare organizations could improve clinical decision support systems using only physician queries, without requiring paired expert responses.<\/p>\n\n\n\n<p>\u201cOur researchers spend a lot of time talking to our customers, understanding the real challenges they face when building AI systems, and developing new technologies to overcome those challenges,\u201d Cui said. \u201cWe are already applying TAO across many enterprise applications and helping customers continuously iterate and improve their models.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-this-means-for-technical-decision-makers\">What this means for technical decision-makers<\/h2>\n\n\n\n<p>For enterprises looking to lead in AI adoption, TAO represents a potential inflection point in how specialized AI systems are deployed. 
Achieving high-quality, domain-specific performance without extensive labeled datasets removes one of the most significant barriers to widespread AI implementation.<\/p>\n\n\n\n<p>This approach particularly benefits organizations with rich troves of unstructured data and domain-specific requirements but limited resources for manual labeling \u2013 precisely the position in which many enterprises find themselves.<\/p>\n\n\n\n<p>As AI becomes increasingly central to competitive advantage, technologies that compress the time from concept to deployment while simultaneously improving performance will separate leaders from laggards. TAO appears poised to be such a technology, potentially enabling enterprises to implement specialized AI capabilities in weeks rather than months or quarters.<\/p>\n\n\n\n<p>Currently, TAO is only available on the Databricks platform and is in private preview.<\/p>\n\t\t\t<\/div>\r\n<a href=\"https:\/\/venturebeat.com\/data-infrastructure\/the-tao-of-data-how-databricks-is-optimizing-ai-llm-fine-tuning-without-data-labels\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More AI models perform only as well as the data used to train or fine-tune them. Labeled data has been a foundational element of machine learning (ML) and generative AI for much of their history. Labeled data [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":848,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-847","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/03\/enterprise_ai_data_labeling-smk.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=847"}],"version-history":[{"c
ount":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/847\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/848"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}