{"id":3858,"date":"2025-10-13T22:41:20","date_gmt":"2025-10-13T22:41:20","guid":{"rendered":"https:\/\/violethoward.com\/new\/researchers-find-that-retraining-only-small-parts-of-ai-models-can-cut-costs-and-prevent-forgetting\/"},"modified":"2025-10-13T22:41:20","modified_gmt":"2025-10-13T22:41:20","slug":"researchers-find-that-retraining-only-small-parts-of-ai-models-can-cut-costs-and-prevent-forgetting","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/researchers-find-that-retraining-only-small-parts-of-ai-models-can-cut-costs-and-prevent-forgetting\/","title":{"rendered":"Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting"},"content":{"rendered":"<p> <br \/>\n<br \/><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/wvhlzgUGzNikEzmtJxvNB\/ed69fb909b090e1e7e6b81ff61abf8b0\/crimedy7_illustration_of_a_sculptor_creating_a_robot_from_a_p_501bf165-0b44-4bb1-9608-1025a42400b7_1.png\" \/><\/p>\n<p>Enterprises often find that when <u>they fine-tune models<\/u>, one effective approach to making a large language model (LLM) fit for purpose and grounded in data is to have the model lose some of its abilities. After fine-tuning, some models \u201cforget\u201d how to perform certain tasks or other tasks they already learned.\u00a0<\/p>\n<p>Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids \u201ccatastrophic forgetting,\u201d in which the model loses some of its prior knowledge. The paper focuses on two specific LLMs that generate responses from images: LLaVA and Qwen 2.5-VL.<\/p>\n<p>The approach encourages enterprises to retrain only narrow parts of an LLM to avoid retraining the entire model and incurring a significant increase in compute costs. The team claims that catastrophic forgetting isn\u2019t true memory loss, but rather a side effect of bias drift.\u00a0<\/p>\n<p>\u201cTraining a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of CO2, so finding ways to more efficiently and effectively update existing models is a pressing concern,\u201d the team wrote in the <u>paper<\/u>. \u201cGuided by this result, we explore tuning recipes that preserve learning while limiting output shift.\u201d<\/p>\n<p>The researchers focused on a multi-layer perceptron (MLP), the model&#x27;s internal decision-making component.\u00a0\n<\/p>\n<h2>Catastrophic forgetting\u00a0<\/h2>\n<p>The researchers wanted first to verify the existence and the cause of catastrophic forgetting in models.\u00a0<\/p>\n<p>To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned and evaluated to determine whether they led to substantial forgetting. But as the process went on, the researchers found that the models were recovering some of their abilities.\u00a0<\/p>\n<p>\u201cWe also noticed a surprising result, that the model performance would drop significantly in held out benchmarks after training on the counting task, it would mostly recover on PathVQA, another specialized task that is not well represented in the benchmarks,\u201d they said. \u201cMeanwhile, while performing the forgetting mitigation experiments, we also tried separately tuning only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. 
The researchers said they believe that "what looks like forgetting or interference after fine-tuning on a narrow target task is actually bias in the output distribution due to the task distribution shift."

Narrow retraining

That finding turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of "outputting numeric tokens and a highly correlated drop in held out task accuracy." This suggests that when a model appears to forget some of its knowledge, the effect is a temporary output bias rather than a long-term loss.

"To avoid biasing the output distribution, we tune the MLP up/gating projections while keeping the down projection frozen, and find that it achieves similar learning to full MLP tuning with little forgetting," the researchers said.
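Here is a rough sketch of that up/gate-only recipe, under the same naming assumptions as before (mlp.up_proj, mlp.gate_proj, and mlp.down_proj are typical LLaMA/Qwen module names, not identifiers taken from the paper):

```python
import torch

# Assumed module names; adjust for the actual architecture.
MLP_TUNE_KEYS = ("mlp.up_proj", "mlp.gate_proj")

def tune_mlp_up_gate_only(model: torch.nn.Module) -> None:
    """Unfreeze only the MLP up/gating projections; the down projection
    (and everything else) stays frozen."""
    for name, param in model.named_parameters():
        # Evaluates to False for mlp.down_proj and all non-MLP weights.
        param.requires_grad = any(key in name for key in MLP_TUNE_KEYS)
```

One way to read the design: the down projection is what writes each MLP layer's result back into the residual stream, so keeping it frozen limits how much fine-tuning can shift the output distribution (for example, toward numeric tokens after the counting task), while the up/gating projections remain free to learn the new task.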
This allows for a simpler and more reproducible method for fine-tuning a model.

By focusing on a narrow segment of the model rather than retraining it wholesale, enterprises can cut compute costs. It also gives them better control over output drift.

However, the research covers only two models, both dealing with vision and language. The researchers noted that, due to limited resources, they were unable to repeat the experiment with other models.

Their findings, they suggest, could nevertheless extend to other LLMs, including those built for other modalities.

Source: https://venturebeat.com/ai/researchers-find-that-retraining-only-small-parts-of-ai-models-can-cut-costs