{"id":3780,"date":"2025-10-08T17:06:48","date_gmt":"2025-10-08T17:06:48","guid":{"rendered":"https:\/\/violethoward.com\/new\/ai21s-jamba-reasoning-3b-redefines-what-small-means-in-llms-250k-context-on-a-laptop\/"},"modified":"2025-10-08T17:06:48","modified_gmt":"2025-10-08T17:06:48","slug":"ai21s-jamba-reasoning-3b-redefines-what-small-means-in-llms-250k-context-on-a-laptop","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/ai21s-jamba-reasoning-3b-redefines-what-small-means-in-llms-250k-context-on-a-laptop\/","title":{"rendered":"AI21\u2019s Jamba Reasoning 3B Redefines What \u201cSmall\u201d Means in LLMs \u2014 250K Context on a Laptop"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/5oeaGc6bdeKn9zv5ZBduGS\/f5fd953849eaf0b642bec868df68b07e\/crimedy7_illustration_of_dozens_of_small_robots_working_on_a__4b54aeff-0eaf-4af3-abfa-c8fe2d943247_2.png\" \/><\/p>\n<p>The latest addition to the small model wave for enterprises comes from <u>AI21 Labs<\/u>, which is betting that bringing models to devices will free up traffic in data centers.\u00a0<\/p>\n<p>AI21\u2019s Jamba Reasoning 3B is a \u201ctiny\u201d open-source model that can run extended reasoning, generate code and respond based on ground truth. 
Jamba Reasoning 3B handles a context window of more than 250,000 tokens and can run inference on edge devices.\u00a0<\/p>\n<p>The company said Jamba Reasoning 3B works on devices such as laptops and mobile phones.\u00a0<\/p>\n<p>Ori Goshen, co-CEO of AI21, told VentureBeat that the company sees more enterprise use cases for small models, mainly because moving most inference to devices frees up data centers.\u00a0<\/p>\n<p>\u201cWhat we&#x27;re seeing right now in the industry is an economics issue where there are very expensive data center build-outs, and the revenue that is generated from the data centers versus the depreciation rate of all their chips shows the math doesn&#x27;t add up,\u201d Goshen said.\u00a0<\/p>\n<p>He added that in the future \u201cthe industry by and large would be hybrid in the sense that some of the computation will be on devices locally and other inference will move to GPUs.\u201d<\/p>\n<h2>Tested on a MacBook<\/h2>\n<p>\nJamba Reasoning 3B combines the Mamba architecture with Transformer layers, allowing it to run a 250K-token context window on devices. AI21 said it achieves 2-4x faster inference speeds. Goshen said the Mamba architecture significantly contributed to the model\u2019s speed.\u00a0<\/p>\n<p>Jamba Reasoning 3B\u2019s hybrid architecture also reduces its memory requirements, thereby reducing its computing needs.\u00a0<\/p>\n<p>AI21 tested the model on a standard MacBook Pro and found that it can process 35 tokens per second.\u00a0<\/p>\n<p>Goshen said the model works best for tasks involving function calling, policy-grounded generation and tool routing. He said that simple requests, such as asking for information about a forthcoming meeting and asking the model to create an agenda for it, could be done on devices. 
More complex reasoning tasks can be saved for GPU clusters.\u00a0<\/p>\n<h2>Small models in enterprise<\/h2>\n<p>Enterprises have been interested in using a mix of small models, some designed specifically for their industry and others that are condensed versions of larger LLMs.\u00a0<\/p>\n<p>In September, <u>Meta<\/u> released <u>MobileLLM-R1, a family of reasoning models<\/u> ranging from 140M to 950M parameters. These models are designed for math, coding and scientific reasoning rather than chat applications. MobileLLM-R1 can run on compute-constrained devices.\u00a0<\/p>\n<p><u>Google<\/u>\u2019s <u>Gemma<\/u> was one of the first small models to come to market, designed to run on portable devices like laptops and mobile phones. Gemma has since <u>been expanded<\/u>.\u00a0<\/p>\n<p>Companies like <u>FICO<\/u> have also begun building their own models. <u>FICO launched<\/u> its FICO Focused Language and FICO Focused Sequence small models, which will only answer finance-specific questions.\u00a0<\/p>\n<p>Goshen said the big difference AI21\u2019s model offers is that it is even smaller than most small models, yet it can still run reasoning tasks without sacrificing speed.\u00a0<\/p>\n<h2>Benchmark testing\u00a0<\/h2>\n<p>In benchmark testing, Jamba Reasoning 3B demonstrated strong performance compared to other small models, including <u>Qwen<\/u> 4B, <u>Meta<\/u>\u2019s Llama 3.2 3B, and Phi-4-Mini from <u>Microsoft<\/u>.\u00a0<\/p>\n<p>It outperformed all three models on the IFBench test and Humanity\u2019s Last Exam, although it came in second to Qwen 4B on MMLU-Pro.\u00a0<\/p>\n<p>Goshen said another advantage of small models like Jamba Reasoning 3B is that they are highly steerable and provide better privacy options to enterprises, because inference is not sent to a server elsewhere.\u00a0<\/p>\n<p>\u201cI do believe there\u2019s a world where you can optimize for the needs and the experience of the customer, and the models that will be kept on devices are a 
large part of it,\u201d he said.\u00a0<\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/venturebeat.com\/ai\/ai21s-jamba-reasoning-3b-redefines-what-small-means-in-llms-250k-context-on\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The latest addition to the small model wave for enterprises comes from AI21 Labs, which is betting that bringing models to devices will free up traffic in data centers.\u00a0 AI21\u2019s Jamba Reasoning 3B is a \u201ctiny\u201d open-source model that can run extended reasoning, generate code and respond based on ground truth. Jamba Reasoning 3B handles more [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3781,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3780","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/10\/crimedy7_illustration_of_dozens_of_small_robots_working_on_a__4b54aeff-0eaf-4af3-abfa-c8fe2d943247_2.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3780","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3780"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3780\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3781"}],
"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3780"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3780"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3780"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}