{"id":1984,"date":"2025-06-20T15:26:56","date_gmt":"2025-06-20T15:26:56","guid":{"rendered":"https:\/\/violethoward.com\/new\/googles-gemini-transparency-cut-leaves-enterprise-developers-debugging-blind\/"},"modified":"2025-06-20T15:26:56","modified_gmt":"2025-06-20T15:26:56","slug":"googles-gemini-transparency-cut-leaves-enterprise-developers-debugging-blind","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/googles-gemini-transparency-cut-leaves-enterprise-developers-debugging-blind\/","title":{"rendered":"Google&#8217;s Gemini transparency cut leaves enterprise developers &#8216;debugging blind&#8217;"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy.\u00a0Learn more<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>Google\u2018s recent decision to hide the raw reasoning tokens of its flagship model, Gemini 2.5 Pro, has sparked a fierce backlash from developers who have been relying on that transparency to build and debug applications.\u00a0<\/p>\n\n\n\n<p>The change, which echoes a similar move by OpenAI, replaces the model\u2019s step-by-step reasoning with a simplified summary. 
The backlash highlights a critical tension between creating a polished user experience and providing the observable, trustworthy tools that enterprises need.<\/p>\n\n\n\n<p>As businesses integrate large language models (LLMs) into more complex and mission-critical systems, the debate over how much of the model\u2019s internal workings should be exposed is becoming a defining issue for the industry.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-fundamental-downgrade-in-ai-transparency\">A \u2018fundamental downgrade\u2019 in AI transparency<\/h2>\n\n\n\n<p>To solve complex problems, advanced AI models generate an internal monologue, also referred to as the \u201cChain of Thought\u201d (CoT). This is a series of intermediate steps (e.g., a plan, a draft of code, a self-correction) that the model produces before arriving at its final answer. For example, it might reveal how it is processing data, which pieces of information it is using and how it is evaluating its own code.\u00a0<\/p>\n\n\n\n<p>For developers, this reasoning trail often serves as an essential diagnostic and debugging tool. When a model provides an incorrect or unexpected output, the thought process reveals where its logic went astray. This visibility was also one of the key advantages of Gemini 2.5 Pro over OpenAI\u2019s o1 and o3.\u00a0<\/p>\n\n\n\n<p>In Google\u2019s AI developer forum, users called the removal of this feature a \u201cmassive regression.\u201d Without it, developers are left in the dark. As one user on the Google forum said, \u201cI can\u2019t accurately diagnose any issues if I can\u2019t see the raw chain of thought like we used to.\u201d Another described being forced to \u201cguess\u201d why the model failed, leading to \u201cincredibly frustrating, repetitive loops trying to fix things.\u201d<\/p>\n\n\n\n<p>Beyond debugging, this transparency is crucial for building sophisticated AI systems. 
Developers rely on the CoT to fine-tune prompts and system instructions, which are the primary ways to steer a model\u2019s behavior. The feature is especially important for creating agentic workflows, where the AI must execute a series of tasks. One developer noted, \u201cThe CoTs helped enormously in tuning agentic workflows correctly.\u201d\u00a0<\/p>\n\n\n\n<p>For enterprises, this move toward opacity can be problematic. Black-box AI models that hide their reasoning introduce significant risk, making it difficult to trust their outputs in high-stakes scenarios. This trend, started by OpenAI\u2019s o-series reasoning models and now adopted by Google, creates a clear opening for open-source alternatives such as DeepSeek-R1 and QwQ-32B.\u00a0<\/p>\n\n\n\n<p>Models that provide full access to their reasoning chains give enterprises more control and transparency over the model\u2019s behavior. The decision for a CTO or AI lead is no longer just about which model has the highest benchmark scores. It is now a strategic choice between a top-performing but opaque model and a more transparent one that can be integrated with greater confidence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-google-s-response-nbsp\">Google\u2019s response\u00a0<\/h2>\n\n\n\n<p>In response to the outcry, members of the Google team explained their rationale. Logan Kilpatrick, a senior product manager at Google DeepMind, clarified that the change was \u201cpurely cosmetic\u201d and does not impact the model\u2019s internal performance. He noted that for the consumer-facing Gemini app, hiding the lengthy thought process creates a cleaner user experience. 
\u201cThe % of people who will or do read thoughts in the Gemini app is very small,\u201d he said.<\/p>\n\n\n\n<p>For developers, the new summaries were intended as a first step toward programmatically accessing reasoning traces through the API, which wasn\u2019t previously possible.\u00a0<\/p>\n\n\n\n<p>The Google team acknowledged the value of raw thoughts for developers. \u201cI hear that you all want raw thoughts, the value is clear, there are use cases that require them,\u201d Kilpatrick wrote, adding that bringing the feature back to the developer-focused AI Studio is \u201csomething we can explore.\u201d\u00a0<\/p>\n\n\n\n<p>Google\u2019s reaction to the developer backlash suggests a middle ground is possible, perhaps through a \u201cdeveloper mode\u201d that re-enables raw thought access. The need for observability will only grow as AI models evolve into more autonomous agents that use tools and execute complex, multi-step plans.\u00a0<\/p>\n\n\n\n<p>As Kilpatrick concluded in his remarks, \u201c\u2026I can easily imagine that raw thoughts becomes a critical requirement of all AI systems given the increasing complexity and need for observability + tracing.\u201d\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-are-reasoning-tokens-overrated\">Are reasoning tokens overrated?<\/h2>\n\n\n\n<p>However, experts suggest there are deeper dynamics at play than just user experience. Subbarao Kambhampati, an AI professor at Arizona State University, questions whether the \u201cintermediate tokens\u201d a reasoning model produces before the final answer can be used as a reliable guide for understanding how the model solves problems. A paper he recently co-authored argues that anthropomorphizing \u201cintermediate tokens\u201d as \u201creasoning traces\u201d or \u201cthoughts\u201d can have dangerous implications.\u00a0<\/p>\n\n\n\n<p>Models often wander off in endless, unintelligible directions in their reasoning process. 
Several experiments show that models trained on false reasoning traces and correct results can learn to solve problems just as well as models trained on well-curated reasoning traces. Moreover, the latest generation of reasoning models is trained through reinforcement learning algorithms that only verify the final result and don\u2019t evaluate the model\u2019s \u201creasoning trace.\u201d\u00a0<\/p>\n\n\n\n<p>\u201cThe fact that intermediate token sequences often reasonably look like better-formatted and spelled human scratch work\u2026 doesn\u2019t tell us much about whether they are used for anywhere near the same purposes that humans use them for, let alone about whether they can be used as an interpretable window into what the LLM is \u2018thinking,\u2019 or as a reliable justification of the final answer,\u201d the researchers write.<\/p>\n\n\n\n<p>\u201cMost users can\u2019t make out anything from the volumes of the raw intermediate tokens that these models spew out,\u201d Kambhampati told VentureBeat. \u201cAs we mention, DeepSeek R1 produces 30 pages of pseudo-English in solving a simple planning problem! 
A cynical explanation of why o1\/o3 decided not to show the raw tokens originally was perhaps because they realized people will notice how incoherent they are!\u201d<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Maybe there is a reason why even after capitulation OAI is putting out only the &#8220;summaries&#8221; of intermediate tokens (presumably appropriately white washed)..<\/p>\u2014 Subbarao Kambhampati (\u0c15\u0c02\u0c2d\u0c02\u0c2a\u0c3e\u0c1f\u0c3f \u0c38\u0c41\u0c2c\u0c4d\u0c2c\u0c3e\u0c30\u0c3e\u0c35\u0c41) (@rao2z) <a href=\"https:\/\/twitter.com\/rao2z\/status\/1887880905626382799?ref_src=twsrc%5Etfw\">February 7, 2025<\/a><\/blockquote>\n<\/div><\/figure>\n\n\n\n<p>That said, Kambhampati suggests that summaries or post-facto explanations are likely to be more comprehensible to the end users. \u201cThe issue becomes to what extent they are actually indicative of the internal operations that LLMs went through,\u201d he said. \u201cFor example, as a teacher, I might solve a new problem with many false starts and backtracks, but explain the solution in the way I think facilitates student comprehension.\u201d<\/p>\n\n\n\n<p>The decision to hide CoT also serves as a competitive moat. Raw reasoning traces are incredibly valuable training data. As Kambhampati notes, a competitor can use these traces to perform \u201cdistillation,\u201d the process of training a smaller, cheaper model to mimic the capabilities of a more powerful one. Hiding the raw thoughts makes it much harder for rivals to copy a model\u2019s secret sauce, a crucial advantage in a resource-intensive industry.<\/p>\n\n\n\n<p>The debate over Chain of Thought is a preview of a much larger conversation about the future of AI. 
There is still a lot to learn about the internal workings of reasoning models, how we can leverage them, and how far model providers are willing to go to enable developers to access them.<\/p>\n\n\n\n\n\t\t\t<\/div><a href=\"https:\/\/venturebeat.com\/ai\/googles-gemini-transparency-cut-leaves-enterprise-developers-debugging-blind\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Join the event trusted by enterprise leaders for nearly two decades. 
VB Transform brings together the people building real enterprise AI strategy.\u00a0Learn more Google\u2018s recent decision to hide the raw reasoning tokens of its flagship model, Gemini 2.5 Pro, has sparked a fierce backlash from developers who have been relying on that transparency to build [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1985,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-1984","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/06\/LLM-reasoning.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1984","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=1984"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1984\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/1985"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=1984"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=1984"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=1984"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templat
ed":true}]}}