{"id":1124,"date":"2025-04-09T09:33:27","date_gmt":"2025-04-09T09:33:27","guid":{"rendered":"https:\/\/violethoward.com\/new\/new-open-source-ai-company-deep-cogito-releases-first-models-and-theyre-already-topping-the-charts\/"},"modified":"2025-04-09T09:33:27","modified_gmt":"2025-04-09T09:33:27","slug":"new-open-source-ai-company-deep-cogito-releases-first-models-and-theyre-already-topping-the-charts","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/new-open-source-ai-company-deep-cogito-releases-first-models-and-theyre-already-topping-the-charts\/","title":{"rendered":"New open source AI company Deep Cogito releases first models and they&#8217;re already topping the charts"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity is-style-wide\"\/>\n<\/div><p>Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito<em> <\/em>v1, a new line of open source large language models (LLMs) fine-tuned from Meta\u2019s Llama 3.2 and equipped with hybrid reasoning capabilities \u2014 the ability to answer quickly and immediately, or \u201cself-reflect\u201d like OpenAI\u2019s \u201co\u201d series and DeepSeek R1.<\/p>\n\n\n\n<p>The company aims to push the boundaries of AI beyond current human-overseer limitations by enabling models to iteratively refine and internalize their own improved reasoning strategies. 
<p>It's ultimately on a quest toward developing superintelligence, AI smarter than all humans in all domains, yet the company says that "all models we create will be open sourced."</p>

<p>Deep Cogito's CEO and co-founder Drishan Arora, a former senior software engineer at Google who says he led large language model (LLM) modeling for Google's generative search product, also said in a post on X that the new models are "the strongest open models at their scale – including those from LLaMA, DeepSeek, and Qwen."</p>

<p>The initial model lineup includes five base sizes: 3 billion, 8 billion, 14 billion, 32 billion, and 70 billion parameters. They are available now on the AI code sharing community Hugging Face, on Ollama, and through application programming interfaces (APIs) on Fireworks AI and Together AI.</p>

<p>The models are released under the Llama licensing terms, which allow commercial usage, so third-party enterprises can put them to work in paid products, for up to 700 million monthly users; beyond that, a paid license from Meta is required.</p>

<p>The company plans to release even larger models, up to 671 billion parameters, in the coming months.</p>

<p>Arora describes the company's training approach, iterated distillation and amplification (IDA), as a novel alternative to traditional reinforcement learning from human feedback (RLHF) or teacher-model distillation.</p>

<p>The core idea behind IDA is to allocate more compute so a model can generate improved solutions, then distill the improved reasoning process back into the model's own parameters, effectively creating a feedback loop for capability growth.</p>
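Deep Cogito has not published IDA's implementation details, but the amplify-then-distill loop just described can be sketched on a toy problem. Everything in this illustration is a stand-in of my own: a lookup table plays the role of model parameters, best-of-N sampling plays the role of extra reasoning compute, and a distance score plays the role of a judgment of answer quality.

```python
import random

def amplify(policy, task, scorer, n_samples=16):
    """Amplification: spend extra compute by sampling several candidate
    answers near the policy's current answer and keeping the best one."""
    best = policy[task]
    for _ in range(n_samples):
        candidate = policy[task] + random.choice([-2, -1, 1, 2])
        if scorer(task, candidate) > scorer(task, best):
            best = candidate
    return best

def distill(policy, task, improved_answer):
    """Distillation: fold the improved answer back into the policy's
    own parameters (here, just a lookup table)."""
    policy[task] = improved_answer

def ida_loop(tasks, scorer, iterations=50, seed=0):
    """Alternate amplification and distillation so each round of search
    starts from the stronger policy produced by the previous round."""
    random.seed(seed)
    policy = {t: 0 for t in tasks}  # start from a weak policy
    for _ in range(iterations):
        for t in tasks:
            distill(policy, t, amplify(policy, t, scorer))
    return policy

# Toy objective: each "task" has a hidden target number; the scorer
# rewards answers close to it (a stand-in for a reasoning-quality judge).
targets = {"a": 17, "b": -5}
scorer = lambda t, ans: -abs(targets[t] - ans)
print(ida_loop(list(targets), scorer))
```

Each pass spends extra compute to find a better answer than the current policy produces, then overwrites the policy with it, so the next round of amplification searches from a stronger baseline. This is the feedback loop the article describes, reduced to its simplest possible form.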
<p>Arora likens this approach to Google DeepMind's AlphaGo and its self-play strategy, applied to natural language.</p>

<h2 class="wp-block-heading" id="h-benchmarks-and-evaluations">Benchmarks and evaluations</h2>

<p>The company shared a broad set of evaluation results comparing Cogito models to open source peers across general knowledge, mathematical reasoning, and multilingual tasks. Highlights include:</p>

<ul class="wp-block-list">
<li><strong>Cogito 3B (Standard)</strong> outperforms <em>LLaMA 3.2 3B</em> on MMLU by 6.7 percentage points (65.4% vs. 58.7%) and on Hellaswag by 18.8 points (81.1% vs. 62.3%).</li>

<li>In <strong>reasoning mode</strong>, <em>Cogito 3B</em> scores 72.6% on MMLU and 84.2% on ARC, exceeding its own standard-mode performance and showing the effect of IDA-based self-reflection.</li>

<li><strong>Cogito 8B (Standard)</strong> scores 80.5% on MMLU, outperforming <em>LLaMA 3.1 8B</em> by 12.8 points. It also leads by over 11 points on MMLU-Pro and achieves 88.7% on ARC.</li>

<li>In <strong>reasoning mode</strong>, <em>Cogito 8B</em> achieves 83.1% on MMLU and 92.0% on ARC. It surpasses <em>DeepSeek R1 Distill 8B</em> in nearly every category except the MATH benchmark, where Cogito scores significantly lower (60.2% vs. 80.6%).</li>

<li><strong>Cogito 14B and 32B</strong> models outperform their <em>Qwen2.5</em> counterparts by around 2–3 percentage points on aggregate benchmarks, with <em>Cogito 32B (Reasoning)</em> reaching 90.2% on MMLU and 91.8% on the MATH benchmark.</li>

<li><strong>Cogito 70B (Standard)</strong> outperforms <em>LLaMA 3.3 70B</em> on MMLU by 6.4 points (91.7% vs. 85.3%) and exceeds <em>LLaMA 4 Scout 109B</em> on aggregate benchmark scores (54.5% vs. 53.3%).</li>

<li>Against <em>DeepSeek R1 Distill 70B</em>, <em>Cogito 70B (Reasoning)</em> posts stronger results on general and multilingual benchmarks, with a notable 91.0% on MMLU and 92.7% on MGSM.</li>
</ul>

<p>Cogito models generally show their highest performance in reasoning mode, though some trade-offs emerge, particularly in mathematics.</p>

<p>For instance, while Cogito 70B (Standard) matches or slightly exceeds peers on MATH and GSM8K, Cogito 70B (Reasoning) trails DeepSeek R1 on MATH by over five percentage points (83.3% vs. 89.0%).</p>

<p>In addition to general benchmarks, Deep Cogito evaluated its models on native tool-calling performance, a growing priority for agents and API-integrated systems.</p>

<ul class="wp-block-list">
<li>Cogito 3B supports four tool-calling tasks natively (simple, parallel, multiple, and parallel-multiple), whereas <em>LLaMA 3.2 3B</em> does not support tool calling.</li>

<li>Cogito 3B scores 92.8% on simple tool calls and over 91% on multiple tool calls.</li>

<li>Cogito 8B scores over 89% across all tool call types, significantly outperforming <em>LLaMA 3.1 8B</em>, which ranges between 35% and 54%.</li>
</ul>

<p>These improvements are attributed not only to model architecture and training data, but also to task-specific post-training, which many baseline models currently lack.</p>

<h2 class="wp-block-heading" id="h-looking-ahead">Looking ahead</h2>

<p>Deep Cogito plans to release larger-scale models in the coming months, including mixture-of-experts variants at 109B, 400B, and 671B parameter scales. The company will also continue updating its current model checkpoints with extended training.</p>

<p>The company positions its IDA methodology as a long-term path toward scalable self-improvement, removing dependence on human overseers or static teacher models.</p>
<p>Arora emphasizes that while performance benchmarks are important, real-world utility and adaptability are the true tests for these models, and that the company is just at the beginning of what it believes is a steep scaling curve.</p>

<p>Deep Cogito's research and infrastructure partnerships include teams from Hugging Face, RunPod, Fireworks AI, Together AI, and Ollama. All released models are open source and available now.</p>

<p><a href="https://venturebeat.com/ai/new-open-source-ai-company-deep-cogito-releases-first-models-and-theyre-already-topping-the-charts/">Source link</a></p>