{"id":2908,"date":"2025-08-01T11:52:05","date_gmt":"2025-08-01T11:52:05","guid":{"rendered":"https:\/\/violethoward.com\/new\/deep-cogito-v2-open-source-models-have-self-improving-intuition\/"},"modified":"2025-08-01T11:52:05","modified_gmt":"2025-08-01T11:52:05","slug":"deep-cogito-v2-open-source-models-have-self-improving-intuition","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/deep-cogito-v2-open-source-models-have-self-improving-intuition\/","title":{"rendered":"Deep Cogito v2 open source models have self-improving intuition"},"content":{"rendered":" \r\n
Deep Cogito, a lesser-known AI research startup based in San Francisco and founded by ex-Googlers, has released four new open-ish large language models (LLMs) that attempt something few others do: learning to reason more effectively over time, and getting better at it on their own.

The models, released as part of Cogito's v2 family, range from 70 billion to 671 billion parameters and are available for AI developers and enterprises to use under a mix of limited and fully open licensing terms. They include:

  • Cogito v2-70B (Dense)
  • Cogito v2-109B (Mixture-of-experts)
  • Cogito v2-405B (Dense)
  • Cogito v2-671B (MoE)

Dense and MoE models are each suited to different needs. The dense 70B and 405B variants activate all parameters on every forward pass, making them more predictable and easier to deploy across a wide range of hardware.

They're ideal for low-latency applications, fine-tuning and environments with limited GPU capacity. MoE models, such as the 109B and 671B versions, use a sparse routing mechanism to activate only a few specialized "expert" subnetworks at a time, allowing for much larger total model sizes without proportional increases in compute cost.


This makes them well-suited for high-performance inference tasks, research into complex reasoning or serving frontier-level accuracy at lower runtime expense. In Cogito v2, the 671B MoE model serves as the flagship, leveraging its scale and routing efficiency to match or exceed leading open models on benchmarks while using significantly shorter reasoning chains.
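To make the sparse-routing idea concrete, here is a minimal sketch of top-k expert gating in Python. The expert count, top-k value and dimensions are illustrative assumptions, not Cogito's actual architecture.

```python
import numpy as np

# Minimal sketch of sparse mixture-of-experts routing (illustrative only;
# expert count, top-k and dimensions are assumptions, not Cogito's config).
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating network

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                # score each expert for this token
    top = np.argsort(logits)[-top_k:]  # keep only the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts
    # Only k of the n experts run, so compute cost stays flat as n grows.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,)
```

The design point: total parameter count can grow with the number of experts while per-token compute stays roughly constant, which is why the 671B MoE can be served at a far lower cost than a dense model of the same size.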

The models are available now on Hugging Face for download and usage by enterprises, and on Unsloth for local usage. For those who can't host the model inferences on their own hardware, they are also accessible through application programming interfaces (APIs) from Together AI, Baseten and RunPod.
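For the API route, these hosted endpoints are OpenAI-compatible, so a call via Together AI might look like the following sketch. The exact model identifier is an assumption and should be checked against the provider's catalog.

```python
# Minimal sketch of calling a hosted Cogito v2 model through Together AI's
# OpenAI-compatible API. The model identifier below is an assumption; check
# the provider's model catalog for the exact name.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="YOUR_TOGETHER_API_KEY",
)

response = client.chat.completions.create(
    model="deepcogito/cogito-v2-preview-llama-70B",  # assumed identifier
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(response.choices[0].message.content)
```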

There's also a quantized 8-bit floating point (FP8) version of the 671B model, which shrinks the numbers used to represent the model's parameters from 16 bits to 8 bits, helping users run the massive model faster, cheaper and on more accessible hardware, often while retaining roughly 95 to 99% of full-precision quality. However, quantization can slightly degrade accuracy, especially on tasks requiring fine-grained precision, such as some math or reasoning problems.
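To see what that tradeoff looks like, here is a toy sketch that simulates 8-bit weight quantization and measures the round-trip error. It uses a simple integer scheme as a stand-in, since true FP8 formats (E4M3/E5M2) need library or hardware support, and all sizes here are illustrative.

```python
import numpy as np

# Toy illustration of 8-bit weight quantization (integer scheme as a stand-in
# for FP8, which requires hardware/library support). Sizes are illustrative.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float16)  # fake 16-bit weights

scale = np.abs(weights).max() / 127.0           # map the value range onto int8
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print(f"memory: {weights.nbytes/1e6:.1f} MB -> {quantized.nbytes/1e6:.1f} MB")
print(f"mean abs error: {np.abs(weights - dequantized).mean():.5f}")
```

Memory halves, and the small reconstruction error is the source of the occasional accuracy loss on precision-sensitive tasks.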

All four Cogito v2 models are designed as hybrid reasoning systems: they can respond immediately to a query or, when needed, reflect internally before answering.
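In practice, the caller flips a switch between the two modes. Cogito v1 toggled reflection with the system prompt "Enable deep thinking subroutine."; assuming v2 keeps that convention, the two modes differ only in the message payload, as in this sketch:

```python
# Sketch: the same question asked in direct vs. reflective mode. Cogito v1
# toggled reflection with the system prompt below; we assume v2 keeps it.
direct = [
    {"role": "user", "content": "What is 17 * 24?"},
]
reflective = [
    {"role": "system", "content": "Enable deep thinking subroutine."},
    {"role": "user", "content": "What is 17 * 24?"},
]
# Passed as `messages` to the same chat-completions call shown earlier, the
# second payload makes the model emit its reasoning before the final answer.
```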

Crucially, that reflection is not just runtime behavior; it's baked into the training process itself.

These models are trained to internalize their own reasoning. That means the very paths they take to arrive at answers, the mental steps, so to speak, are distilled back into the models' weights.

Over time, they learn which lines of thinking actually matter and which don't.

As Deep Cogito's blog post notes, the researchers "disincentivize the model from 'meandering more' to be able to arrive at the answer, and instead develop a stronger intuition for the right search trajectory for the reasoning process."

The result, Deep Cogito claims, is faster, more efficient reasoning and a general improvement in performance, even in so-called "standard" mode.

Self-improving AI

While many in the AI community are just encountering the company, Deep Cogito has been quietly building for over a year.

It emerged from stealth in April 2025 with a series of open-source models trained on Meta's Llama 3.2. Those early releases showed promising results.

As VentureBeat previously reported, the smallest Cogito v1 models (3B and 8B) outperformed Llama 3 counterparts across several benchmarks, sometimes by wide margins.

Deep Cogito CEO and co-founder Drishan Arora, previously a lead LLM engineer at Google, described the company's long-term goal as building models that can reason and improve with each iteration, much like how AlphaGo refined its strategy through self-play.

Deep Cogito's core method, iterated distillation and amplification (IDA), replaces hand-written prompts or static teachers with the model's own evolving insights.

What is 'machine intuition'?

With Cogito v2, the team took that loop to a much larger scale. The central idea is simple: reasoning shouldn't just be an inference-time tool; it should be part of the model's core intelligence.

So, the company implemented a system where the model runs reasoning chains during training, and is then trained on its own intermediate thoughts.
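Deep Cogito has not published its training code, but the loop it describes, amplify by reasoning at length, then distill the useful steps back into the weights, can be sketched as follows. Every helper here is a hypothetical stand-in.

```python
# Conceptual sketch of an iterated distillation and amplification (IDA) round,
# based on Deep Cogito's public description. Every helper is a hypothetical
# stand-in; the company has not released its actual training code.

def is_correct(answer, prompt) -> bool:   # stand-in verifier
    return True

def shorten(chain: str) -> str:           # stand-in: drop meandering steps
    return chain

def ida_round(model, prompts, trainer):
    distilled = []
    for prompt in prompts:
        # Amplify: let the model reason at length to search for an answer.
        chain, answer = model.generate_with_reasoning(prompt)
        if not is_correct(answer, prompt):
            continue  # keep only trajectories that actually reach the answer
        # Distill: keep the useful steps, discouraging "meandering" so the
        # model develops intuition for short, direct search trajectories.
        distilled.append((prompt, shorten(chain), answer))
    # Fine-tune on the distilled traces, folding reasoning into the weights;
    # the next round starts from this stronger model (AlphaGo-style self-play).
    return trainer.finetune(model, distilled)
```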

This process yields concrete improvements, according to internal benchmarks. The flagship 671B MoE model outperforms DeepSeek R1 on reasoning tasks, matching or beating DeepSeek's latest 0528 model while using 60% shorter reasoning chains.

    \"\"<\/figure>\n\n\n\n

On MMLU, GSM8K and MGSM, Cogito 671B MoE's performance was roughly on par with top open models like Qwen1.5-72B and DeepSeek v3, and approached the performance tier of closed models like Claude 4 Opus and o3.

Specifically: