{"id":1547,"date":"2025-05-13T01:18:49","date_gmt":"2025-05-13T01:18:49","guid":{"rendered":"https:\/\/violethoward.com\/new\/sakana-introduces-new-ai-architecture-continuous-thought-machines-to-make-models-reason-with-less-guidance-like-human-brains\/"},"modified":"2025-05-13T01:18:49","modified_gmt":"2025-05-13T01:18:49","slug":"sakana-introduces-new-ai-architecture-continuous-thought-machines-to-make-models-reason-with-less-guidance-like-human-brains","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/sakana-introduces-new-ai-architecture-continuous-thought-machines-to-make-models-reason-with-less-guidance-like-human-brains\/","title":{"rendered":"Sakana introduces new AI architecture, &#8216;Continuous Thought Machines&#8217; to make models reason with less guidance \u2014 like human brains"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>Tokyo-based artificial intelligence startup Sakana, co-founded by former top Google AI scientists including Llion Jones and David Ha, has unveiled a new type of AI model architecture called Continuous Thought Machines (CTM).<\/p>\n\n\n\n<p>CTMs are designed to usher in a new era of AI language models that will be more flexible and able to handle a wider range of cognitive tasks \u2014 such as solving complex mazes or navigation tasks without positional cues or pre-existing spatial embeddings \u2014 moving them closer to the way human beings reason through unfamiliar problems. 
<\/p>\n\n\n\n<p>Rather than relying on fixed, parallel layers that process inputs all at once \u2014 as Transformer models do \u2014 CTMs unfold computation over steps within each input\/output unit, known as an artificial \u201cneuron.\u201d <\/p>\n\n\n\n<p>Each neuron in the model retains a short history of its previous activity and uses that memory to decide when to activate again. <\/p>\n\n\n\n<p>This added internal state allows CTMs to adjust the depth and duration of their reasoning dynamically, depending on the complexity of the task. As such, each neuron is far more informationally dense and complex than in a typical Transformer model. <\/p>\n\n\n\n<p>The startup has posted a paper describing its work on the open-access preprint server arXiv, along with a microsite and GitHub repository. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-ctms-differ-from-transformer-based-llms\">How CTMs differ from Transformer-based LLMs<\/h2>\n\n\n\n<p>Most modern large language models (LLMs) are still fundamentally based upon the \u201cTransformer\u201d architecture outlined in the seminal 2017 paper from Google Brain researchers entitled \u201cAttention Is All You Need.\u201d <\/p>\n\n\n\n<p>These models use parallelized, fixed-depth layers of artificial neurons to process inputs in a single pass \u2014 whether those inputs come from user prompts at inference time or labeled data during training.<\/p>\n\n\n\n<p>By contrast, CTMs allow each artificial neuron to operate on its own internal timeline, making activation decisions based on a short-term memory of its previous states. These decisions unfold over internal steps known as \u201cticks,\u201d enabling the model to adjust its reasoning duration dynamically. <\/p>\n\n\n\n<p>This time-based architecture allows CTMs to reason progressively, adjusting how long and how deeply they compute \u2014 taking a different number of ticks based on the complexity of the input. 
<\/p>\n\n\n\n<p>Neuron-specific memory and synchronization help determine when computation should continue \u2014 or stop.<\/p>\n\n\n\n<p>The number of ticks changes according to the input, and may be higher or lower even if the input is identical, because <em>each neuron<\/em> is deciding how many ticks to undergo before providing an output (or not providing one at all). <\/p>\n\n\n\n<p>This represents both a technical and philosophical departure from conventional deep learning, moving toward a more biologically grounded model. Sakana has framed CTMs as a step toward more brain-like intelligence\u2014systems that adapt over time, process information flexibly, and engage in deeper internal computation when needed.<\/p>\n\n\n\n<p>Sakana\u2019s goal is \u201cto eventually achieve levels of competency that rival or surpass human brains.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-using-variable-custom-timelines-to-provide-more-intelligence\">Using variable, custom timelines to provide more intelligence<\/h2>\n\n\n\n<p>The CTM is built around two key mechanisms. <\/p>\n\n\n\n<p>First, each neuron in the model maintains a short \u201chistory\u201d or working memory of when it activated and why, and uses this history to decide when to fire next. <\/p>\n\n\n\n<p>Second, neural synchronization \u2014 how and when <em>groups<\/em> of a model\u2019s artificial neurons \u201cfire,\u201d or process information together \u2014 is allowed to happen organically.<\/p>\n\n\n\n<p>Groups of neurons decide when to fire together based on internal alignment, not external instructions or reward shaping. These synchronization events are used to modulate attention and produce outputs \u2014 that is, attention is directed toward those areas where more neurons are firing. 
<\/p>\n\n\n\n<p>The model isn\u2019t just processing data; it\u2019s timing its thinking to match the complexity of the task.<\/p>\n\n\n\n<p>Together, these mechanisms let CTMs reduce computational load on simpler tasks while applying deeper, prolonged reasoning where needed. <\/p>\n\n\n\n<p>In demonstrations ranging from image classification and 2D maze solving to reinforcement learning, CTMs have shown both interpretability and adaptability. Their internal \u201cthought\u201d steps allow researchers to observe how decisions form over time\u2014a level of transparency rarely seen in other model families.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-early-results-how-ctms-compare-to-transformer-models-on-key-benchmarks-and-tasks\">Early results: how CTMs compare to Transformer models on key benchmarks and tasks<\/h2>\n\n\n\n<p>Sakana AI\u2019s Continuous Thought Machine is not designed to chase leaderboard-topping benchmark scores, but its early results indicate that its biologically inspired design does not come at the cost of practical capability. <\/p>\n\n\n\n<p>On the widely used ImageNet-1K benchmark, the CTM achieved 72.47% top-1 and 89.89% top-5 accuracy. <\/p>\n\n\n\n<p>While this falls short of state-of-the-art vision models like ViT or ConvNeXt, it remains competitive\u2014especially considering that the CTM architecture is fundamentally different and was not optimized solely for performance.<\/p>\n\n\n\n<p>What stands out more is the CTM\u2019s behavior in sequential and adaptive tasks. In maze-solving scenarios, the model produces step-by-step directional outputs from raw images\u2014without using positional embeddings, which are typically essential in transformer models. 
Visual attention traces reveal that CTMs often attend to image regions in a human-like sequence, such as identifying facial features from eyes to nose to mouth.<\/p>\n\n\n\n<p>The model also exhibits strong calibration: its confidence estimates closely align with actual prediction accuracy. Unlike most models that require temperature scaling or post-hoc adjustments, CTMs improve calibration naturally by averaging predictions over time as their internal reasoning unfolds.<\/p>\n\n\n\n<p>This blend of sequential reasoning, natural calibration, and interpretability offers a valuable trade-off for applications where trust and traceability matter as much as raw accuracy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-s-needed-before-ctms-are-ready-for-enterprise-and-commercial-deployment\">What\u2019s needed before CTMs are ready for enterprise and commercial deployment?<\/h2>\n\n\n\n<p>While CTMs show substantial promise, the architecture is still experimental and not yet optimized for commercial deployment. Sakana AI presents the model as a platform for further research and exploration rather than a plug-and-play enterprise solution.<\/p>\n\n\n\n<p>Training CTMs currently demands more resources than standard transformer models. Their dynamic temporal structure expands the state space, and careful tuning is needed to ensure stable, efficient learning across internal time steps. Additionally, debugging and tooling support is still catching up\u2014many of today\u2019s libraries and profilers are not designed with time-unfolding models in mind.<\/p>\n\n\n\n<p>Still, Sakana has laid a strong foundation for community adoption. The full CTM implementation is open-sourced on GitHub and includes domain-specific training scripts, pretrained checkpoints, plotting utilities, and analysis tools. 
Supported tasks include image classification (ImageNet, CIFAR), 2D maze navigation, QAMNIST, parity computation, sorting, and reinforcement learning.<\/p>\n\n\n\n<p>An interactive web demo also lets users explore the CTM in action, observing how its attention shifts over time during inference\u2014a compelling way to understand the architecture\u2019s reasoning flow.<\/p>\n\n\n\n<p>For CTMs to reach production environments, further progress is needed in optimization, hardware efficiency, and integration with standard inference pipelines. But with accessible code and active documentation, Sakana has made it easy for researchers and engineers to begin experimenting with the model today.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-enterprise-ai-leaders-should-know-about-ctms\">What enterprise AI leaders should know about CTMs<\/h2>\n\n\n\n<p>The CTM architecture is still in its early days, but enterprise decision-makers should already take note. Its ability to adaptively allocate compute, self-regulate depth of reasoning, and offer clear interpretability may prove highly valuable in production systems facing variable input complexity or strict regulatory requirements.<\/p>\n\n\n\n<p>AI engineers managing model deployment will find value in CTM\u2019s energy-efficient inference \u2014 especially in large-scale or latency-sensitive applications. <\/p>\n\n\n\n<p>Meanwhile, the architecture\u2019s step-by-step reasoning unlocks richer explainability, enabling organizations to trace not just what a model predicted, but how it arrived there.<\/p>\n\n\n\n<p>For orchestration and MLOps teams, CTMs integrate with familiar components like ResNet-based encoders, allowing smoother incorporation into existing workflows. 
And infrastructure leads can use the architecture\u2019s profiling hooks to better allocate resources and monitor performance dynamics over time.<\/p>\n\n\n\n<p>CTMs aren\u2019t ready to replace transformers, but they represent a new category of model with novel affordances. For organizations prioritizing safety, interpretability, and adaptive compute, the architecture deserves close attention.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-sakana-s-checkered-ai-research-history\">Sakana\u2019s checkered AI research history<\/h2>\n\n\n\n<p>In February, Sakana introduced the AI CUDA Engineer, an agentic AI system designed to automate the production of highly optimized CUDA kernels, the low-level GPU programs that allow Nvidia\u2019s graphics processing units (GPUs) to run code efficiently in parallel across multiple \u201cthreads\u201d or computational units. <\/p>\n\n\n\n<p>The promise was significant: speedups of 10x to 100x in ML operations. However, shortly after release, external reviewers discovered that the system was exploiting weaknesses in the evaluation sandbox\u2014essentially \u201ccheating\u201d by bypassing correctness checks through a memory exploit.<\/p>\n\n\n\n<p>In a public post, Sakana acknowledged the issue and credited community members with flagging it.<\/p>\n\n\n\n<p>They\u2019ve since overhauled their evaluation and runtime profiling tools to eliminate similar loopholes and are revising their results and research paper accordingly. The incident offered a real-world test of one of Sakana\u2019s stated values: embracing iteration and transparency in pursuit of better AI systems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-betting-on-evolutionary-mechanisms\">Betting on evolutionary mechanisms<\/h2>\n\n\n\n<p>Sakana AI\u2019s founding ethos lies in merging evolutionary computation with modern machine learning. 
The company believes current models are too rigid\u2014locked into fixed architectures and requiring retraining for new tasks. <\/p>\n\n\n\n<p>By contrast, Sakana aims to create models that adapt in real time, exhibit emergent behavior, and scale naturally through interaction and feedback, much like organisms in an ecosystem.<\/p>\n\n\n\n<p>This vision is already manifesting in products like Transformer\u00b2, a system that adjusts LLM parameters at inference time without retraining, using algebraic tricks like singular-value decomposition. <\/p>\n\n\n\n<p>It\u2019s also evident in their commitment to open-sourcing systems like the AI Scientist\u2014even amid controversy\u2014demonstrating a willingness to engage with the broader research community, not just compete with it.<\/p>\n\n\n\n<p>As large incumbents like OpenAI and Google double down on foundation models, Sakana is charting a different course: small, dynamic, biologically inspired systems that think in time, collaborate by design, and evolve through experience.<\/p>\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/sakana-introduces-new-ai-architecture-continuous-thought-machines-to-make-models-reason-with-less-guidance-like-human-brains\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Tokyo-based artificial intelligence startup Sakana, co-founded by former top Google AI scientists including Llion Jones and David Ha, has unveiled a new type of AI model architecture called Continuous Thought Machines (CTM). CTMs are designed to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1548,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-1547","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/05\/cfr0z3n_stark_crisp_neat_pop_art_colorful_flat_illustration_pol_570ae7e2-ddaf-4775-b7da-c0dd0257c8d0.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1547","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users
\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=1547"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/1547\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/1548"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=1547"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=1547"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=1547"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}