{"id":3866,"date":"2025-10-14T04:04:10","date_gmt":"2025-10-14T04:04:10","guid":{"rendered":"https:\/\/violethoward.com\/new\/self-improving-language-models-are-becoming-reality-with-mits-updated-seal-technique\/"},"modified":"2025-10-14T04:04:10","modified_gmt":"2025-10-14T04:04:10","slug":"self-improving-language-models-are-becoming-reality-with-mits-updated-seal-technique","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/self-improving-language-models-are-becoming-reality-with-mits-updated-seal-technique\/","title":{"rendered":"Self-improving language models are becoming reality with MIT&#039;s updated SEAL technique"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/images.ctfassets.net\/jdtwqhzvc2n1\/4816Y0YfsXKIENLGFsuaG6\/a4620bd99d25c8fe32ab054bd16ff390\/cfr0z3n_a_cybernetic_seal_looks_up_with_cute_alert_eyes_under_a_a6f43d56-7792-4d4f-bc1e-18b6dd2f5e4e.png\" \/><\/p>\n<p>Researchers at the Massachusetts Institute of Technology (MIT) are gaining renewed attention for developing and open-sourcing a technique that allows large language models (LLMs) \u2014 like those underpinning ChatGPT and most modern AI chatbots \u2014 to improve themselves by generating synthetic data to fine-tune on. <\/p>\n<p>The technique, known as SEAL (Self-Adapting LLMs), was first described in a paper published back in June and covered by VentureBeat at the time.<\/p>\n<p>A significantly expanded and updated version of the paper was released last month, along with open source code posted to GitHub (under an MIT License, allowing for commercial and enterprise use), and the work is making new waves among AI power users on the social network X this week.<\/p>\n<p>SEAL allows LLMs to autonomously generate and apply their own fine-tuning strategies. 
Unlike conventional models that rely on fixed external data and human-crafted optimization pipelines, SEAL enables models to evolve by producing their own synthetic training data and corresponding optimization directives.<\/p>\n<p>The development comes from a team affiliated with MIT\u2019s Improbable AI Lab, including Adam Zweiger, Jyothish Pari, Han Guo, Ekin Aky\u00fcrek, Yoon Kim, and Pulkit Agrawal. Their research was recently presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).<\/p>\n<h3><b>Background: From \u201cBeyond Static AI\u201d to Self-Adaptive Systems<\/b><\/h3>\n<p>Earlier this year, VentureBeat first reported on SEAL as an early-stage framework that allowed language models to generate and train on their own synthetic data \u2014 a potential remedy for the stagnation of pretrained models once deployed. <\/p>\n<p>At that stage, SEAL was framed as a proof-of-concept that could let enterprise AI agents continuously learn in dynamic environments without manual retraining.<\/p>\n<p>Since then, the research has advanced considerably. The new version expands on the prior framework by demonstrating that SEAL\u2019s self-adaptation ability scales with model size, integrates reinforcement learning more effectively to reduce catastrophic forgetting, and formalizes SEAL\u2019s dual-loop structure (inner supervised fine-tuning and outer reinforcement optimization) for reproducibility. <\/p>\n<p>The updated paper also introduces evaluations across different prompting formats, improved stability during learning cycles, and a discussion of practical deployment challenges at inference time.<\/p>\n<h3><b>Addressing the Limitations of Static Models<\/b><\/h3>\n<p>While LLMs have demonstrated remarkable capabilities in text generation and understanding, their adaptation to new tasks or knowledge is often manual, brittle, or dependent on context. 
<\/p>\n<p>SEAL challenges this status quo by equipping models with the ability to generate what the authors call \u201cself-edits\u201d \u2014 natural language outputs that specify how the model should update its weights.<\/p>\n<p>These self-edits may take the form of reformulated information, logical implications, or tool configurations for augmentation and training. Once generated, the model fine-tunes itself based on these edits. The process is guided by reinforcement learning, where the reward signal comes from improved performance on a downstream task.<\/p>\n<p>The design mimics how human learners might rephrase or reorganize study materials to better internalize information. This restructuring of knowledge before assimilation serves as a key advantage over models that passively consume new data \u201cas-is.\u201d<\/p>\n<h3><b>Performance Across Tasks<\/b><\/h3>\n<p>SEAL has been tested across two main domains: knowledge incorporation and few-shot learning.<\/p>\n<p>In the knowledge incorporation setting, the researchers evaluated how well a model could internalize new factual content from passages similar to those in the SQuAD dataset, a benchmark reading comprehension dataset introduced by Stanford University in 2016, consisting of over 100,000 crowd-sourced question\u2013answer pairs based on Wikipedia articles (Rajpurkar et al., 2016). <\/p>\n<p>Rather than fine-tuning directly on passage text, <b>the model generated synthetic implications of the passage<\/b> and then fine-tuned on them. <\/p>\n<p>After two rounds of reinforcement learning, the model improved question-answering accuracy from 33.5% to 47.0% on a no-context version of SQuAD \u2014 surpassing results obtained using synthetic data generated by GPT-4.1.<\/p>\n<p>In the few-shot learning setting, SEAL was evaluated using a subset of the ARC benchmark, where tasks require reasoning from only a few examples. Here, SEAL generated self-edits specifying data augmentations and hyperparameters. 
<\/p>\n<p>After reinforcement learning,<b> the success rate in correctly solving held-out tasks jumped to 72.5%, up from 20% using self-edits generated without reinforcement learning. <\/b>Models that relied solely on in-context learning without any adaptation scored 0%.<\/p>\n<h3><b>Technical Framework<\/b><\/h3>\n<p>SEAL operates using a two-loop structure: an inner loop performs supervised fine-tuning based on the self-edit, while an outer loop uses reinforcement learning to refine the policy that generates those self-edits.<\/p>\n<p>The reinforcement learning algorithm used is based on ReSTEM, which combines sampling with filtered behavior cloning. During training, only self-edits that lead to performance improvements are reinforced. This approach effectively teaches the model which kinds of edits are most beneficial for learning.<\/p>\n<p>For efficiency, SEAL applies LoRA-based fine-tuning rather than full parameter updates, enabling rapid experimentation and low-cost adaptation.<\/p>\n<h3><b>Strengths and Limitations<\/b><\/h3>\n<p>The researchers report that SEAL can produce high-utility training data with minimal supervision, outperforming even large external models like GPT-4.1 in specific tasks. <\/p>\n<p>They also demonstrate that SEAL generalizes beyond its original setup: it continues to perform well when scaling from single-pass updates to multi-document continued pretraining scenarios.<\/p>\n<p>However, the framework is not without limitations. One issue is catastrophic forgetting, where updates to incorporate new information can degrade performance on previously learned tasks. <\/p>\n<p>In response to this concern, co-author Jyo Pari told VentureBeat via email that reinforcement learning (RL) appears to mitigate forgetting more effectively than standard supervised fine-tuning (SFT), citing a recent paper on the topic. 
He added that combining this insight with SEAL could lead to new variants where SEAL learns not just training data, but reward functions.<\/p>\n<p>Another challenge is computational overhead: evaluating each self-edit requires fine-tuning and performance testing, which can take 30\u201345 seconds per edit \u2014 significantly more than standard reinforcement learning tasks. <\/p>\n<p>As Jyo explained, \u201cTraining SEAL is non-trivial because it requires 2 loops of optimization, an outer RL one and an inner SFT one. At inference time, updating model weights will also require new systems infrastructure.\u201d He emphasized the need for future research into deployment systems as a critical path to making SEAL practical.<\/p>\n<p>Additionally, SEAL\u2019s current design assumes the presence of paired tasks and reference answers for every context, limiting its direct applicability to unlabeled corpora. However, Jyo clarified that as long as there is a downstream task with a computable reward, SEAL can be trained to adapt accordingly\u2014even in safety-critical domains. In principle, a SEAL-trained model could learn to avoid training on harmful or malicious inputs if guided by the appropriate reward signal.<\/p>\n<h3><b>AI Community Reactions<\/b><\/h3>\n<p>The AI research and builder community has reacted with a mix of excitement and speculation to the SEAL paper. On X, formerly Twitter, several prominent AI-focused accounts weighed in on the potential impact.<\/p>\n<p>User @VraserX, a self-described educator and AI enthusiast, called SEAL \u201cthe birth of continuous self-learning AI\u201d and predicted that models like OpenAI&#x27;s GPT-6 could adopt similar architecture. <\/p>\n<p>In their words, SEAL represents \u201cthe end of the frozen-weights era,\u201d ushering in systems that evolve as the world around them changes. 
<\/p>\n<p>They highlighted SEAL&#x27;s ability to form persistent memories, repair knowledge, and learn from real-time data, comparing it to a foundational step toward models that don\u2019t just use information but absorb it.<\/p>\n<p>Meanwhile, @alex_prompter, co-founder of an AI-powered marketing venture, framed SEAL as a leap toward models that literally rewrite themselves. \u201cMIT just built an AI that can rewrite its own code to get smarter,\u201d he wrote. <b>Citing the paper\u2019s key results \u2014 a 40% boost in factual recall and outperforming GPT-4.1 using self-generated data <\/b>\u2014 he described the findings as confirmation that \u201cLLMs that finetune themselves are no longer sci-fi.\u201d<\/p>\n<p>The enthusiasm reflects a broader appetite in the AI space for models that can evolve without constant retraining or human oversight \u2014 particularly in rapidly changing domains or personalized use cases.<\/p>\n<h3><b>Future Directions and Open Questions<\/b><\/h3>\n<p>In response to questions about scaling SEAL to larger models and tasks, Jyo pointed to experiments (Appendix B.7) showing that as model size increases, so does their self-adaptation ability. He compared this to students improving their study techniques over time \u2014 larger models are simply better at generating useful self-edits.<\/p>\n<p>When asked whether SEAL generalizes to new prompting styles, he confirmed it does, citing Table 10 in the paper. However, he also acknowledged that the team has not yet tested SEAL\u2019s ability to transfer across entirely new domains or model architectures. <\/p>\n<p>\u201cSEAL is an initial work showcasing the possibilities,\u201d he said. \u201cBut it requires much more testing.\u201d He added that generalization may improve as SEAL is trained on a broader distribution of tasks.<\/p>\n<p>Interestingly, the team found that only a few reinforcement learning steps already led to measurable performance gains. 
\u201cThis is exciting,\u201d Jyo noted, \u201cbecause it means that with more compute, we could hopefully get even more improvements.\u201d He suggested future experiments could explore more advanced reinforcement learning methods beyond ReSTEM, such as Group Relative Policy Optimization (GRPO).<\/p>\n<h3><b>Toward More Adaptive and Agentic Models<\/b><\/h3>\n<p>SEAL represents a step toward models that can autonomously improve over time, both by integrating new knowledge and by reconfiguring how they learn. The authors envision future extensions where SEAL could assist in self-pretraining, continual learning, and the development of agentic systems \u2014 models that interact with evolving environments and adapt incrementally.<\/p>\n<p>In such settings, a model could use SEAL to synthesize weight updates after each interaction, gradually internalizing behaviors or insights. This could reduce the need for repeated supervision and manual intervention, particularly in data-constrained or specialized domains.<\/p>\n<p>As public web text becomes saturated and further scaling of LLMs becomes bottlenecked by data availability, self-directed approaches like SEAL could play a critical role in pushing the boundaries of what LLMs can achieve.<\/p>\n<p>You can access the SEAL project, including code and further documentation, at: https:\/\/jyopari.github.io\/posts\/seal<\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/venturebeat.com\/ai\/self-improving-language-models-are-becoming-reality-with-mits-updated-seal\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Researchers at the Massachusetts Institute of Technology (MIT) are gaining renewed attention for developing and open sourcing a technique that allows large language models (LLMs) \u2014 like those underpinning ChatGPT and most modern AI chatbots \u2014 to improve themselves by generating synthetic data to fine-tune upon. 
The technique, known as SEAL (Self-Adapting LLMs), was first [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3867,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[33],"tags":[],"class_list":["post-3866","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-automation"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/violethoward.com\/new\/wp-content\/uploads\/2025\/10\/cfr0z3n_a_cybernetic_seal_looks_up_with_cute_alert_eyes_under_a_a6f43d56-7792-4d4f-bc1e-18b6dd2f5e4e-scaled.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/comments?post=3866"}],"version-history":[{"count":0,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/posts\/3866\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media\/3867"}],"wp:attachment":[{"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/media?parent=3866"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/categories?post=3866"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/violethoward.com\/new\/wp-json\/wp\/v2\/tags?post=3866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}