{"id":4854,"date":"2025-12-16T08:51:21","date_gmt":"2025-12-16T08:51:21","guid":{"rendered":"https:\/\/violethoward.com\/new\/bolmos-architecture-unlocks-efficient-byte-level-lm-training-without-sacrificing-quality\/"},"modified":"2025-12-16T08:51:21","modified_gmt":"2025-12-16T08:51:21","slug":"bolmos-architecture-unlocks-efficient-byte-level-lm-training-without-sacrificing-quality","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/bolmos-architecture-unlocks-efficient-byte-level-lm-training-without-sacrificing-quality\/","title":{"rendered":"Bolmo\u2019s architecture unlocks efficient byte\u2011level LM training without sacrificing quality"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/3vX2sUyJCdUl0o3HxcSPiI\/71266904e616238751a36b1e7e0aef79\/crimedy7_illustration_of_robots_as_bytes_--ar_169_--v_7_97b9db0b-c676-4b02-82c9-af958afc193e_3.png?w=300&amp;q=30\" \/><\/p>\n<p>Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche \u2014 and make it practical at scale \u2014 the Allen Institute of AI (Ai2) introduced Bolmo, a new family of models that leverage its <u>Olmo 3 models<\/u> by \u201cbytefiying\u201d them and reusing their backbone and capabilities. <\/p>\n<p>The company launched two versions, Bolmo 7B and Bolmo 1B, which are \u201cthe first fully open byte-level language model,\u201d according to Ai2. The company said the two models performed competitively with \u2014 and in some cases surpassed \u2014 other byte-level and character-based models.<\/p>\n<div><\/div>\n<p>Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages, and unconventional text more reliably \u2014 key requirements for moderation, edge deployments, and multilingual applications.<\/p>\n<p>For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. Ai2\u2019s Bolmo is an attempt to make that approach practical at scale \u2014 without retraining from scratch.<\/p>\n<h2>How Bolmo works and how it was built\u00a0<\/h2>\n<p>Ai2 said it trained the Bolmo models using its Dolma 3 data mix, which helped train its <u>Olmo flagship models<\/u>, and some open code datasets and character-level data.<\/p>\n<p>The company said its goal \u201cis to provide a reproducible, inspectable blueprint for byteifying strong subword language models in a way the community can adopt and extend.\u201d To meet this goal, Ai2 will release its checkpoints, code, and <u>a full paper<\/u> to help other organizations build byte-level models on top of its Olmo ecosystem.\u00a0<\/p>\n<p>Since training a byte-level model completely from scratch can get expensive, Ai2 researchers instead chose an existing Olmo 3 7B checkpoint to byteify in two stages.\u00a0<\/p>\n<p>In the first stage, Ai2 froze the Olmo 3 transformer so that they only train certain parts, such as the local encoder and decoder, the boundary predictor, and the language modeling head. This was designed to be \u201ccheap and fast\u201d and requires just 9.8 billion tokens.\u00a0<\/p>\n<p>The next stage unfreezes the model and trains it with additional tokens.\u00a0Ai2 said the byte-level approach allows Bolmo to avoid the vocabulary bottlenecks that limit traditional subword models.<\/p>\n<h2>Strong performance among its peers<\/h2>\n<p>Byte-level language models are not as mainstream as small language models or LLMs, but this is a growing field in research. <u>Meta released its BLT architecture<\/u> research last year, aiming to offer a model that is robust, processes raw data, and doesn\u2019t rely on fixed vocabularies.\u00a0<\/p>\n<p>Other research models in this space <u>include ByT5<\/u>, <u>Stanford\u2019s MrT5<\/u>, and <u>Canine<\/u>.\u00a0\u00a0<\/p>\n<p>Ai2 evaluated Bolmo using its evaluation suite, covering math, STEM reasoning, question answering, general knowledge, and code.\u00a0<\/p>\n<p>Bolmo 7B showed strong performance, outperforming character-focused benchmarks like CUTE and EXECUTE, and also improving accuracy over the base LLM Olmo 3.\u00a0<\/p>\n<p>Bolmo 7B outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding.\u00a0<\/p>\n<h2>Why enterprises may choose byte-level models<\/h2>\n<p>Enterprises find value in a hybrid model structure, using a mix of models and model sizes.\u00a0<\/p>\n<p>Ai2 makes the case that organizations should also consider byte-level models not only for robustness and multilingual understanding, but because it \u201cnaturally plugs into an existing model ecosystem.\u201d<\/p>\n<p>\u201cA key advantage of the dynamic hierarchical setup is that compression becomes a toggleable knob,\u201d the company said. <\/p>\n<p>For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, Ai2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.<\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/venturebeat.com\/ai\/bolmos-architecture-unlocks-efficient-byte-level-lm-training-without\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche \u2014 and make it practical at scale \u2014 the Allen Institute of AI (Ai2) introduced Bolmo, a new family of models that leverage its Olmo 3 models by \u201cbytefiying\u201d [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4855,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-4854","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/12\/crimedy7_illustration_of_robots_as_bytes_-ar_169_-v_7_97b9db0b-c676-4b02-82c9-af958afc193e_3.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4854","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=4854"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/4854\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/4855"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=4854"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=4854"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=4854"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}<!-- This website is optimized by Airlift. Learn more: https://airlift.net. Template:. Learn more: https://airlift.net. Template: 69d79d7d46fa5cbf45858bd1. Config Timestamp: 2026-04-09 12:37:16 UTC, Cached Timestamp: 2026-04-29 17:42:56 UTC -->