{"id":4672,"date":"2025-12-02T06:44:39","date_gmt":"2025-12-02T06:44:39","guid":{"rendered":"https:\/\/violethoward.com\/new\/arcee-aims-to-reboot-u-s-open-source-ai-with-new-trinity-models-released-under-apache-2-0\/"},"modified":"2025-12-02T06:44:39","modified_gmt":"2025-12-02T06:44:39","slug":"arcee-aims-to-reboot-u-s-open-source-ai-with-new-trinity-models-released-under-apache-2-0","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/arcee-aims-to-reboot-u-s-open-source-ai-with-new-trinity-models-released-under-apache-2-0\/","title":{"rendered":"Arcee aims to reboot U.S. open source AI with new Trinity models released under Apache 2.0"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/5eaxHEnHdxoMHz7f1nqoI3\/5f5ae76e1d9082742a5ed0c82d818fd4\/JIw2KAqLv9DVkYKstOJ2Q.jpg?w=300&amp;q=30\" \/><\/p>\n<p>For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou.<\/p>\n<p>Chinese research labs including Alibaba&#x27;s Qwen, DeepSeek, Moonshot and Baidu have rapidly set the pace in developing large-scale, open Mixture-of-Experts (MoE) models \u2014 often with permissive licenses and leading benchmark performance. While OpenAI fielded its own open source, general purpose LLM this summer as well \u2014 gpt-oss-20B and 120B \u2014 the uptake has been slowed by so many equally or better performing alternatives. <\/p>\n<p>Now, one small U.S. company is pushing back.<\/p>\n<p>Today, Arcee AI announced the release of Trinity Mini and Trinity Nano Preview, the first two models in its new \u201cTrinity\u201d family\u2014an open-weight MoE model suite fully trained in the United States. <\/p>\n<p>Users can try the former directly for themselves in a chatbot format on Acree&#x27;s new website, chat.arcee.ai, and developers can download the code for both models on Hugging Face and run it themselves, as well as modify them\/fine-tune to their liking \u2014 all for free under an enterprise-friendly Apache 2.0 license.  <\/p>\n<p>While small compared to the largest frontier models, these releases represent a rare attempt by a U.S. startup to build end-to-end open-weight models at scale\u2014trained from scratch, on American infrastructure, using a U.S.-curated dataset pipeline.<\/p>\n<p>&quot;I&#x27;m experiencing a combination of extreme pride in my team and crippling exhaustion, so I&#x27;m struggling to put into words just how excited I am to have these models out,&quot; wrote Arcee Chief Technology Officer (CTO) Lucas Atkins in a post on the social network X (formerly Twitter). &quot;Especially Mini.&quot;<\/p>\n<div><\/div>\n<p>A third model, Trinity Large, is already in training: a 420B parameter model with 13B active parameters per token, scheduled to launch in January 2026.<\/p>\n<p>\u201cWe want to add something that has been missing in that picture,\u201d Atkins wrote in the Trinity launch manifesto published on Arcee&#x27;s website. \u201cA serious open weight model family trained end to end in America\u2026 that businesses and developers can actually own.\u201d<\/p>\n<h3><b>From Small Models to Scaled Ambition<\/b><\/h3>\n<p>The Trinity project marks a turning point for Arcee AI, which until now has been known for its compact, enterprise-focused models. 
The company has raised $29.5 million in funding to date, including a $24 million Series A in 2024 led by Emergence Capital. Its previous releases include AFM-4.5B, a compact instruct-tuned model released in mid-2025, and SuperNova, an earlier 70B-parameter instruction-following model designed for in-VPC enterprise deployment.

Both were aimed at solving the regulatory and cost issues that have hampered adoption of proprietary LLMs in the enterprise.

With Trinity, Arcee is aiming higher: not just instruction tuning or post-training, but full-stack pretraining of open-weight foundation models, built for long-context reasoning, synthetic data adaptation, and future integration with live retraining systems.

Originally conceived as stepping stones to Trinity Large, both Mini and Nano emerged from early experimentation with sparse modeling and quickly became production targets in their own right.

### Technical Highlights

Trinity Mini is a 26B-parameter model with 3B parameters active per token, designed for high-throughput reasoning, function calling, and tool use. Trinity Nano Preview is a 6B-parameter model with roughly 800M active non-embedding parameters: a more experimental, chat-focused model with a stronger personality but lower reasoning robustness.

Both models use Arcee's new Attention-First Mixture-of-Experts (AFMoE) architecture, a custom MoE design that blends global sparsity, local/global attention, and gated attention techniques.

Inspired by recent advances from DeepSeek and Qwen, AFMoE departs from traditional MoE by tightly integrating sparse expert routing with an enhanced attention stack, including grouped-query attention, gated attention, and a local/global pattern that improves long-context reasoning.

Think of a typical MoE model as a call center with 128 specialized agents (called "experts"), where only a few are consulted for each call, depending on the question. This saves time and energy, since not every expert needs to weigh in.

What makes AFMoE different is how it decides which agents to call and how it blends their answers. Most MoE models use a standard approach that picks experts based on a simple ranking. AFMoE, by contrast, uses a smoother method (called sigmoid routing) that works more like adjusting a volume dial than flipping a switch, letting the model blend multiple perspectives more gracefully.
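To make the contrast concrete, here is a minimal sketch of the two routing styles. Arcee has not published AFMoE's router internals, so the expert counts, normalization, and tensor shapes below are illustrative assumptions rather than the actual implementation.

```python
# Sketch: conventional top-k softmax routing vs. sigmoid-style routing.
# Illustrative only; not AFMoE's real router.
import torch
import torch.nn.functional as F

def topk_softmax_routing(router_logits: torch.Tensor, k: int = 8):
    """Classic MoE routing: experts compete in one softmax, keep the top-k."""
    probs = F.softmax(router_logits, dim=-1)
    weights, expert_ids = torch.topk(probs, k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the survivors
    return weights, expert_ids

def sigmoid_routing(router_logits: torch.Tensor, k: int = 8):
    """Sigmoid routing: each expert gets an independent 0-1 'volume dial',
    so picking one expert does not automatically suppress the others."""
    gates = torch.sigmoid(router_logits)
    weights, expert_ids = torch.topk(gates, k, dim=-1)
    weights = weights / (weights.sum(dim=-1, keepdim=True) + 1e-9)
    return weights, expert_ids

# Example: 4 tokens routed across 128 experts, 8 active per token.
logits = torch.randn(4, 128)
w, idx = sigmoid_routing(logits)
print(w.shape, idx.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

In Trinity Mini's case, the published configuration corresponds to 128 experts with 8 active per token, plus one always-on shared expert that bypasses the router entirely (more on that below).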
The "attention-first" part means the model focuses heavily on how it pays attention to different parts of the conversation. Imagine reading a novel and remembering some parts more clearly than others based on importance, recency, or emotional impact; that is attention. AFMoE improves on this by combining local attention (focusing on what was just said) with global attention (remembering key points from earlier), alternating between the two in a rhythm that keeps things balanced.

Finally, AFMoE introduces gated attention, which acts like a volume control on each attention output, helping the model emphasize or dampen different pieces of information as needed, much as you might adjust how much weight you give each voice in a group discussion.

All of this is designed to make the model more stable during training and more efficient at scale, so it can follow longer conversations, reason more clearly, and run faster without needing massive computing resources.

Unlike many existing MoE implementations, AFMoE emphasizes stability at depth and training efficiency, using techniques such as sigmoid-based routing without an auxiliary loss and depth-scaled normalization to support scaling without divergence.
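The attention-side ideas, output gating and a local/global rhythm, can also be sketched briefly. This is a generic rendering of those techniques as they are commonly described, not AFMoE's actual layers; the gate placement, window size, and dimensions are assumptions.

```python
# Sketch: gated attention output plus a sliding-window (local) mask.
# Illustrative only; not AFMoE's real attention stack.
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate_proj = nn.Linear(d_model, d_model)  # per-channel "volume control"

    def forward(self, x, attn_mask=None):
        attn_out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        gate = torch.sigmoid(self.gate_proj(x))  # values in (0, 1)
        return x + gate * attn_out               # dampen or pass each channel

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal sliding-window mask: each token sees only the previous `window`
    tokens. Alternating layers with this mask and a full causal mask gives the
    local/global rhythm described above."""
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]
    allowed = (dist >= 0) & (dist < window)
    mask = torch.full((seq_len, seq_len), float("-inf"))
    mask[allowed] = 0.0
    return mask

x = torch.randn(2, 16, 512)  # (batch, sequence, hidden)
layer = GatedSelfAttention()
y = layer(x, attn_mask=local_attention_mask(16, window=4))
print(y.shape)  # torch.Size([2, 16, 512])
```

Stacking layers that alternate this kind of local mask with full global attention is what gives the architecture its long-context efficiency, per the description above.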
### Model Capabilities

Trinity Mini adopts an MoE architecture with 128 experts, 8 active per token, and 1 always-on shared expert. Context windows reach up to 131,072 tokens, depending on the provider.

Benchmarks show Trinity Mini performing competitively with larger models across reasoning tasks, including outperforming gpt-oss on SimpleQA (which tests factual recall and whether the model admits uncertainty), MMLU zero-shot (which measures broad academic knowledge and reasoning across many subjects without examples), and BFCL V3 (which evaluates multi-step function calling and real-world tool use):

- **MMLU (zero-shot):** 84.95
- **Math-500:** 92.10
- **GPQA-Diamond:** 58.55
- **BFCL V3:** 59.67

Latency and throughput numbers across providers such as Together and Clarifai show more than 200 tokens per second of throughput with sub-three-second end-to-end latency, making Trinity Mini viable for interactive applications and agent pipelines.

Trinity Nano, while smaller and less stable on edge cases, demonstrates that a sparse MoE architecture is viable at under 1B active parameters per token.

### Access, Pricing, and Ecosystem Integration

Both Trinity models are released under the permissive, enterprise-friendly **Apache 2.0 license**, allowing unrestricted commercial and research use. Trinity Mini is available via:

- Hugging Face
- OpenRouter
- chat.arcee.ai

API pricing for Trinity Mini via OpenRouter:

- $0.045 per million input tokens
- $0.15 per million output tokens
- A free tier is available for a limited time on OpenRouter

The model is already integrated into apps including Benchable.ai, Open WebUI, and SillyTavern, and it is supported in Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.
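Because OpenRouter exposes the model through its OpenAI-compatible API, calling it from code is straightforward. The snippet below is a hedged example: the base URL is OpenRouter's standard endpoint, but the model identifier is a guess, so check OpenRouter's model listing (or Arcee's Hugging Face page) for the exact slug before using it.

```python
# Hedged example: querying Trinity Mini through OpenRouter's OpenAI-compatible API.
# The model slug below is assumed, not confirmed -- verify it on openrouter.ai.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",  # placeholder identifier
    messages=[
        {"role": "user", "content": "In two sentences, what is a Mixture-of-Experts model?"},
    ],
)

print(response.choices[0].message.content)
```

At the rates listed above, a workload consuming one million input tokens and one million output tokens would cost roughly $0.045 + $0.15, or about $0.20.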
### Data Without Compromise: DatologyAI's Role

Central to Arcee's approach is control over training data, a sharp contrast to many open models trained on web-scraped or legally ambiguous datasets. That's where DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays a critical role.

DatologyAI's platform automates data filtering, deduplication, and quality enhancement across modalities, ensuring Arcee's training corpus avoids the pitfalls of noisy, biased, or copyright-risk content.

For Trinity, DatologyAI helped construct a 10-trillion-token curriculum organized into three phases: 7T tokens of general data, 1.8T of high-quality text, and 1.2T of STEM-heavy material, including math and code.

This is the same partnership that powered Arcee's AFM-4.5B, but scaled significantly in both size and complexity. According to Arcee, it was Datology's filtering and data-ranking tools that allowed Trinity to scale cleanly while improving performance on tasks like mathematics, QA, and agent tool use.

Datology's contribution also extends into synthetic data generation. For Trinity Large, the company has produced over 10 trillion synthetic tokens, paired with 10T curated web tokens, to form a 20T-token training corpus for the full-scale model now in progress.

### Building the Infrastructure to Compete: Prime Intellect

Arcee's ability to execute full-scale training in the U.S. is also thanks to its infrastructure partner, Prime Intellect. The startup, founded in early 2024, began with a mission to democratize access to AI compute by building a decentralized GPU marketplace and training stack.

While Prime Intellect made headlines with its distributed training of INTELLECT-1, a 10B-parameter model trained across contributors in five countries, its more recent work, including the 106B INTELLECT-3, acknowledges the tradeoffs of scale: distributed training works, but for 100B+ models, centralized infrastructure is still more efficient.

For Trinity Mini and Nano, Prime Intellect supplied the orchestration stack, a modified TorchTitan runtime, and the physical compute environment: 512 H200 GPUs in a custom bf16 pipeline, running high-efficiency HSDP parallelism. It is also hosting the 2,048-GPU B300 cluster used to train Trinity Large.
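Arcee's actual run used a modified TorchTitan stack, which is not public here, but the two techniques named, bf16 precision and hybrid sharded data parallelism (HSDP), can be illustrated with stock PyTorch FSDP. The snippet below is a generic sketch under that assumption; the model, sizes, and launch details are placeholders, not Arcee's configuration.

```python
# Generic illustration of HSDP (hybrid sharded data parallelism) with bf16 using
# stock PyTorch FSDP. Not Arcee's TorchTitan setup; model and sizes are placeholders.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

def wrap_hsdp(model: torch.nn.Module) -> FSDP:
    # HYBRID_SHARD: shard parameters, gradients, and optimizer state within a node,
    # replicate across nodes -- keeping the heavy collectives on fast intra-node links.
    bf16 = MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
        mixed_precision=bf16,
        device_id=torch.cuda.current_device(),
    )

if __name__ == "__main__":
    dist.init_process_group("nccl")  # one process per GPU, e.g. launched via torchrun
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16).cuda()
    model = wrap_hsdp(model)         # ready for the usual training loop
```

In Arcee's case the same idea is applied across 512 H200 GPUs (and 2,048 B300s for Trinity Large), orchestrated by Prime Intellect's modified TorchTitan stack rather than a vanilla FSDP wrapper like this one.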
The collaboration shows the difference between branding and execution. While Prime Intellect's long-term goal remains decentralized compute, its short-term value for Arcee lies in efficient, transparent training infrastructure that remains under U.S. jurisdiction, with known provenance and security controls.

### A Strategic Bet on Model Sovereignty

Arcee's push into full pretraining reflects a broader thesis: that the future of enterprise AI will depend on owning the training loop, not just fine-tuning. As systems evolve to adapt from live usage and interact with tools autonomously, compliance and control over training objectives will matter as much as performance.

"As applications get more ambitious, the boundary between 'model' and 'product' keeps moving," Atkins noted in Arcee's Trinity manifesto. "To build that kind of software you need to control the weights and the training pipeline, not only the instruction layer."

This framing sets Trinity apart from other open-weight efforts. Rather than patching someone else's base model, Arcee has built its own, from data to deployment and infrastructure to optimizer, alongside partners who share that vision of openness and sovereignty.

### Looking Ahead: Trinity Large

Training is currently underway for Trinity Large, Arcee's 420B-parameter MoE model, which uses the same AFMoE architecture scaled to a larger expert set.

The dataset includes 20T tokens, split evenly between synthetic data from DatologyAI and curated web data.

The model is expected to launch next month, in January 2026, with a full technical report to follow shortly thereafter.

If successful, Trinity Large would be one of the only fully open-weight, U.S.-trained frontier-scale models, positioning Arcee as a serious player in the open ecosystem at a time when most American LLM efforts are either closed or based on non-U.S. foundations.

### A recommitment to U.S. open source

In a landscape where the most ambitious open-weight models are increasingly shaped by Chinese research labs, Arcee's Trinity launch signals a rare shift in direction: an attempt to reclaim ground for transparent, U.S.-controlled model development.

Backed by specialized partners in data and infrastructure, and built from scratch for long-term adaptability, Trinity is a bold statement about the future of U.S. AI development, showing that small, lesser-known companies can still push boundaries and innovate in the open even as the industry becomes increasingly productized and commoditized.

What remains to be seen is whether Trinity Large can match the capabilities of its better-funded peers. But with Mini and Nano already in use, and a strong architectural foundation in place, Arcee may already be proving its central thesis: that model sovereignty, not just model size, will define the next era of AI.

[Source link](https://venturebeat.com/ai/arcee-aims-to-reboot-u-s-open-source-ai-with-new-trinity-models-released)