{"id":2257,"date":"2025-07-03T16:15:45","date_gmt":"2025-07-03T16:15:45","guid":{"rendered":"https:\/\/violethoward.com\/new\/holy-smokes-a-new-200-faster-deepseek-r1-0528-variant-appears-from-german-lab-tng-technology-consulting-gmbh\/"},"modified":"2025-07-03T16:15:45","modified_gmt":"2025-07-03T16:15:45","slug":"holy-smokes-a-new-200-faster-deepseek-r1-0528-variant-appears-from-german-lab-tng-technology-consulting-gmbh","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/holy-smokes-a-new-200-faster-deepseek-r1-0528-variant-appears-from-german-lab-tng-technology-consulting-gmbh\/","title":{"rendered":"HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH"},"content":{"rendered":" \r\n<br><div>\n\t\t\t\t<p>It\u2019s been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released R1-0528, the latest version of its hit open source model DeepSeek R1.<\/p>\n\n\n\n<p>Like its predecessor, DeepSeek-R1 \u2014 which rocked the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, all available to developers and enterprises for free \u2014 R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive Apache 2.0 license.<\/p>\n\n\n\n<p>This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the latest model in its Chimera large language model (LLM) family. 
R1T2 delivers a notable boost in efficiency and speed, achieving upwards of <strong>90% of R1-0528\u2019s intelligence benchmark scores<\/strong>, while generating answers with <strong>less than 40% of R1-0528\u2019s output token count<\/strong>. <\/p>\n\n\n\n<p>That means it produces shorter responses, translating directly into <strong>faster inference and lower compute costs<\/strong>. On the model card TNG released for its new R1T2 on the AI code sharing community Hugging Face, the company states that it is \u201cabout 20% faster than the regular R1\u201d (the one released back in January) \u201cand more than twice as fast as R1-0528\u201d (the May official update from DeepSeek).<\/p>\n\n\n\n<p>Already, the response has been incredibly positive from the AI developer community. \u201cDAMN! DeepSeek R1T2 \u2013 200% faster than R1-0528 &amp; 20% faster than R1,\u201d wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. \u201cSignificantly better than R1 on GPQA &amp; AIME 24, made via Assembly of Experts with DS V3, R1 &amp; R1-0528 \u2014 and it\u2019s MIT-licensed, available on Hugging Face.\u201d<\/p>\n\n\n\n<p>This gain is made possible by TNG\u2019s Assembly-of-Experts (AoE) method \u2014 a technique for building LLMs by selectively merging the weight tensors (internal parameters) from multiple pre-trained models, which TNG described in a paper published in May on arXiv, the open access preprint repository. <\/p>\n\n\n\n<p>A successor to the original R1T Chimera, R1T2 introduces a new \u201cTri-Mind\u201d configuration that integrates three parent models: DeepSeek-R1-0528, DeepSeek-R1, and DeepSeek-V3-0324. The result is a model engineered to maintain high reasoning capability while significantly reducing inference cost.<\/p>\n\n\n\n<p>R1T2 is constructed without further fine-tuning or retraining. 
It inherits the reasoning strength of R1-0528, the structured thought patterns of R1, and the concise, instruction-oriented behavior of V3-0324 \u2014 delivering a more efficient, yet capable model for enterprise and research use.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-assembly-of-experts-aoe-differs-from-mixture-of-experts-moe\">How Assembly-of-Experts (AoE) Differs from Mixture-of-Experts (MoE)<\/h2>\n\n\n\n<p>Mixture-of-Experts (MoE) is an architectural design in which different components, or \u201cexperts,\u201d are conditionally activated per input. In MoE LLMs like DeepSeek-V3 or Mixtral, only a subset of the model\u2019s expert layers (e.g., 8 out of 256) are active during any given token\u2019s forward pass. This allows very large models to achieve higher parameter counts and specialization while keeping inference costs manageable \u2014 because only a fraction of the network is evaluated per token.<\/p>\n\n\n\n<p>Assembly-of-Experts (AoE) is a model merging technique, not an architecture. It\u2019s used to create a new model from multiple pre-trained MoE models by selectively interpolating their weight tensors. <\/p>\n\n\n\n<p>The \u201cexperts\u201d in AoE refer to the model components being merged \u2014 typically the routed expert tensors within MoE layers \u2014 not experts dynamically activated at runtime.<\/p>\n\n\n\n<p>TNG\u2019s implementation of AoE focuses primarily on merging routed expert tensors \u2014 the part of a model most responsible for specialized reasoning \u2014 while often retaining the more efficient shared and attention layers from faster models like V3-0324. 
This approach enables the resulting Chimera models to inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-performance-and-speed-what-the-benchmarks-actually-show\">Performance and Speed: What the Benchmarks Actually Show<\/h2>\n\n\n\n<p>According to benchmark comparisons presented by TNG, R1T2 achieves between <strong>90% and 92%<\/strong> of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured by AIME-24, AIME-25, and GPQA-Diamond test sets. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" height=\"598\" width=\"800\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?w=800\" alt=\"\" class=\"wp-image-3013743\" srcset=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg 1337w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=300,224 300w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=768,574 768w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=800,598 800w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=400,299 400w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=750,560 750w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=578,432 578w, https:\/\/venturebeat.com\/wp-content\/uploads\/2025\/07\/Gu4d8kzWoAA9ohx.jpg?resize=930,695 930w\" sizes=\"(max-width: 800px) 100vw, 800px\"\/><\/figure>\n\n\n\n<p>However, unlike DeepSeek-R1-0528 \u2014 which tends to produce long, detailed answers due to its extended chain-of-thought reasoning \u2014 R1T2 is designed to be much more concise. 
It delivers similarly intelligent responses while using significantly fewer words.<\/p>\n\n\n\n<p>Rather than focusing on raw processing time or tokens-per-second, TNG measures \u201cspeed\u201d in terms of <strong>output token count per answer<\/strong> \u2014 a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates responses using <strong>approximately 40% of the tokens<\/strong> required by R1-0528. <\/p>\n\n\n\n<p>That translates to a <strong>60% reduction in output length<\/strong>, which directly reduces inference time and compute load and underpins TNG\u2019s claim that R1T2 is more than twice as fast as R1-0528.<\/p>\n\n\n\n<p>When compared to the original DeepSeek-R1, R1T2 is also around <strong>20% more concise on average<\/strong>, offering meaningful gains in efficiency for high-throughput or cost-sensitive deployments.<\/p>\n\n\n\n<p>This efficiency does not come at the cost of intelligence. As shown in the benchmark chart presented in TNG\u2019s technical paper, R1T2 sits in a desirable zone on the intelligence vs. output cost curve. It preserves reasoning quality while minimizing verbosity \u2014 an outcome critical to enterprise applications where inference speed, throughput, and cost all matter.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-deployment-considerations-and-availability\">Deployment Considerations and Availability<\/h2>\n\n\n\n<p>R1T2 is released under a permissive MIT License and is available now on Hugging Face, meaning it is open source and available to be used and built into commercial applications. <\/p>\n\n\n\n<p>TNG notes that while the model is well-suited for general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, due to limitations inherited from its DeepSeek-R1 lineage. These may be addressed in future updates.<\/p>\n\n\n\n<p>The company also advises European users to assess compliance with the EU AI Act, which comes into effect on August 2, 2025. 
<\/p>\n\n\n\n<p>Enterprises operating in the EU should review relevant provisions or consider halting model use after that date if requirements cannot be met.<\/p>\n\n\n\n<p>However, U.S. companies operating domestically and serving U.S.-based users or users in other nations are <em>not<\/em> subject to the terms of the EU AI Act, which should give them considerable flexibility when using and deploying this free, speedy open source reasoning model. If they serve users in the EU, some provisions of the EU AI Act will still apply. <\/p>\n\n\n\n<p>TNG has already made prior Chimera variants available through platforms like OpenRouter and Chutes, where they reportedly processed billions of tokens daily. The release of R1T2 represents a further evolution in this public availability effort.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-about-tng-technology-consulting-gmbh\">About TNG Technology Consulting GmbH<\/h2>\n\n\n\n<p>Founded in January 2001, TNG Technology Consulting GmbH is based in Bavaria, Germany, and employs over 900 people, with a high concentration of PhDs and technical specialists. <\/p>\n\n\n\n<p>The company focuses on software development, artificial intelligence, and DevOps\/cloud services, serving major enterprise clients across industries such as telecommunications, insurance, automotive, e-commerce, and logistics.<\/p>\n\n\n\n<p>TNG operates as a values-based consulting partnership. Its unique structure, grounded in operational research and self-management principles, supports a culture of technical innovation. 
<\/p>\n\n\n\n<p>It actively contributes to open-source communities and research, as demonstrated through public releases like R1T2 and the publication of its Assembly-of-Experts methodology.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-it-means-for-enterprise-technical-decision-makers\">What It Means for Enterprise Technical Decision-Makers<\/h2>\n\n\n\n<p>For CTOs, AI platform owners, engineering leads, and IT procurement teams, R1T2 introduces tangible benefits and strategic options:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Lower Inference Costs<\/strong>: With fewer output tokens per task, R1T2 reduces GPU time and energy consumption, translating directly into infrastructure savings \u2014 especially important in high-throughput or real-time environments.<\/li>\n\n\n\n<li><strong>High Reasoning Quality Without Overhead<\/strong>: It preserves much of the reasoning power of top-tier models like R1-0528, but without their long-windedness. This is ideal for structured tasks (math, programming, logic) where concise answers are preferable.<\/li>\n\n\n\n<li><strong>Open and Modifiable<\/strong>: The MIT License allows full deployment control and customization, enabling private hosting, model alignment, or further training within regulated or air-gapped environments.<\/li>\n\n\n\n<li><strong>Emerging Modularity<\/strong>: The AoE approach suggests a future where models are built modularly, allowing enterprises to assemble specialized variants by recombining strengths of existing models, rather than retraining from scratch.<\/li>\n\n\n\n<li><strong>Caveats<\/strong>: Enterprises relying on function-calling, tool use, or advanced agent orchestration should note current limitations, though future Chimera updates may address these gaps.<\/li>\n<\/ul>\n\n\n\n<p>TNG encourages researchers, developers, and enterprise users to explore the model, test its behavior, and provide feedback. 
The R1T2 Chimera is available at huggingface.co\/tngtech\/DeepSeek-TNG-R1T2-Chimera, and technical inquiries can be directed to <strong>research@tngtech.com<\/strong>.<\/p>\n\n\n\n<p>For technical background and benchmark methodology, TNG\u2019s research paper is available at arXiv:2506.14794.<\/p>\n\n\n\n\n\t\t\t<\/div>\r\n<br>\r\n<br><a href=\"https:\/\/venturebeat.com\/ai\/holy-smokes-a-new-200-faster-deepseek-r1-0528-variant-appears-from-german-lab-tng-technology-consulting-gmbh\/\">Source link <\/a>","protected":false},"excerpt":{"rendered":"<p>Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. 
Subscribe Now It\u2019s been a little more than a month since Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, released the latest version of its hit open [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2258,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-2257","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/07\/cfr0z3n_golden_age_sci-fi_comic_splash_page_minimalist_retro__9424f51d-4e5a-413e-a4af-911889cbe2c2_2.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2257","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=2257"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/2257\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/2258"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=2257"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=2257"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=2257"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]
}}