Enterprises often find that fine-tuning, one effective approach to making a large language model (LLM) fit for purpose and grounded in data, comes with a trade-off: after fine-tuning, some models “forget” how to perform tasks they had already learned.
Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids “catastrophic forgetting,” in which a model loses some of its prior knowledge. The paper focuses on two vision-language models that generate responses from images: LLaVA and Qwen 2.5-VL.
The approach encourages enterprises to retrain only narrow parts of an LLM to avoid retraining the entire model and incurring a significant increase in compute costs. The team claims that catastrophic forgetting isn’t true memory loss, but rather a side effect of bias drift.
“Training a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of CO2, so finding ways to more efficiently and effectively update existing models is a pressing concern,” the team wrote in the paper. “Guided by this result, we explore tuning recipes that preserve learning while limiting output shift.”
The researchers focused on the multi-layer perceptron (MLP), the feed-forward block inside each transformer layer, often described as the model’s internal decision-making component.
Catastrophic forgetting
The researchers first set out to verify the existence and the cause of catastrophic forgetting in these models.
To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned on each task and evaluated to determine whether the process led to substantial forgetting. But as the experiments progressed, the researchers found that the models recovered some of their abilities.
“We also noticed a surprising result, that while the model performance would drop significantly in held out benchmarks after training on the counting task, it would mostly recover on PathVQA, another specialized task that is not well represented in the benchmarks,” they said. “Meanwhile, while performing the forgetting mitigation experiments, we also tried separately tuning only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result – that tuning only self-attention projection layers led to very good learning of the target tasks with no drop in performance in held out tasks, even after training all five target tasks in a sequence.”
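That recipe – tuning only the self-attention projection layers while the rest of the network stays frozen – can be approximated in a few lines of PyTorch. The sketch below is illustrative rather than the paper’s code, and it assumes LLaMA/Qwen-style parameter names (q_proj, k_proj, v_proj, o_proj); other architectures may name these layers differently.

```python
import torch

def freeze_all_but_self_attention(model: torch.nn.Module) -> None:
    """Leave only the self-attention projection weights trainable."""
    attn_keys = ("q_proj", "k_proj", "v_proj", "o_proj")  # assumed naming
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in attn_keys)

# Only the unfrozen parameters are handed to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-5
# )
```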
The researchers said they believe that “what looks like forgetting or interference after fine-tuning on a narrow target task is actually bias in the output distribution due to the task distribution shift.”
Narrow retraining
That finding turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of “outputting numeric tokens and a highly correlated drop in held out task accuracy.” In other words, what looks like a model forgetting some of its knowledge is a temporary bias in its outputs, not a lasting loss.
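One way to see that bias directly is to compare how much next-token probability mass a model puts on numeric tokens before and after fine-tuning. The sketch below is a rough illustration, not the paper’s evaluation protocol; it assumes a Hugging Face-style causal language model and tokenizer, and the digit-only filter is a deliberate simplification.

```python
import torch

def numeric_token_mass(model, tokenizer, prompt: str) -> float:
    """Probability mass assigned to digit-only tokens for the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]        # next-token logits
    probs = torch.softmax(logits, dim=-1)
    digit_ids = [
        idx for tok, idx in tokenizer.get_vocab().items()
        if tok.lstrip("Ġ▁").isdigit()                 # crude digit filter
    ]
    return probs[digit_ids].sum().item()

# Comparing the value before and after fine-tuning on a counting task gives
# a rough read on whether the output distribution has drifted toward numbers.
```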
“To avoid biasing the output distribution, we tune the MLP up/gating projections while keeping the down projection frozen, and find that it achieves similar learning to full MLP tuning with little forgetting,” the researchers said.
This allows for a more straightforward and more reproducible method for fine-tuning a model.
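In code, that recipe mirrors the earlier sketch: only the up and gate projections of each MLP block are left trainable, while the down projections and everything else stay frozen. Again, this is a hedged sketch assuming LLaMA/Qwen-style parameter names (gate_proj, up_proj, down_proj), not the authors’ implementation.

```python
import torch

def freeze_for_mlp_up_gate_tuning(model: torch.nn.Module) -> None:
    """Train only the MLP up/gate projections; keep down projections frozen."""
    trainable_keys = ("gate_proj", "up_proj")  # assumed naming; down_proj stays frozen
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in trainable_keys)

def trainable_parameter_count(model: torch.nn.Module) -> int:
    """Quick check of how small the update actually is."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

The parameter count makes the efficiency argument tangible: only a fraction of the weights receive gradient updates, so optimizer state and weight-gradient memory shrink accordingly.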
By focusing on a narrow segment of the model rather than retraining it wholesale, enterprises can cut compute costs. The approach also gives them better control over output drift.
However, the research covers only two models, both of which deal with vision and language. The researchers noted that, due to limited resources, they were unable to run the experiments on other models.
They suggest, however, that their findings could extend to other LLMs, including models built for different modalities.