Sakana AI, an artificial intelligence research laboratory focused on nature-inspired algorithms, recently introduced Transformer² (Transformer-squared), an adaptive language model that can dynamically adapt to new tasks during inference without expensive fine-tuning, marking an important step in the development of large language model (LLM) technology.
The core innovation of Transformer² is its two-step dynamic weight adjustment mechanism. First, it analyzes the incoming user request to understand the task's requirements; then it uses singular value decomposition (SVD) to align the model's weights with those requirements. By selectively adjusting key components of the weights, Transformer² can optimize its performance in real time without time-consuming retraining. This contrasts sharply with traditional fine-tuning, which leaves parameters static after training, and with methods such as low-rank adaptation (LoRA), which modify only a small set of parameters.
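To make the idea concrete, here is a minimal sketch of SVD-based weight adjustment, assuming the simplest possible form: each singular value of a weight matrix is rescaled by a learned per-task factor. This is an illustration of the general technique, not Sakana AI's actual implementation; the matrix shapes and the `z` scaling range are arbitrary.

```python
# Sketch: adapting a weight matrix by rescaling its singular values,
# the core idea behind SVD-based weight adjustment (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one weight matrix of a pretrained model.
W = rng.standard_normal((64, 32))

# Decompose the weights once, offline: W = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# A hypothetical learned scaling vector: one multiplicative factor per
# singular value, expressing a task-specific adjustment.
z = rng.uniform(0.8, 1.2, size=s.shape)

# At inference time, reassemble the weights with the scaled singular
# values instead of retraining the full matrix.
W_adapted = (U * (s * z)) @ Vt

print(W.shape, W_adapted.shape)  # same shape, task-adapted weights
```

Because only a vector of scaling factors changes per task, the adjustment is cheap compared with updating the full weight matrix.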
Transformer² training and inference (Source: arXiv)
To achieve this dynamic adjustment, the researchers developed a method called singular value fine-tuning (SVF). During training, SVF learns a set of skill representations, called z-vectors, from the SVD components of the model's weight matrices. At inference time, Transformer² analyzes the prompt to determine the required skills and then applies the corresponding z-vectors, enabling responses tailored to each prompt.
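The two-pass flow at inference time might look roughly like the sketch below. The skill names, the keyword-based `classify_skill` stand-in, and the `adapt_weights` helper are hypothetical illustrations of the dispatch idea, not Sakana AI's API; the paper's actual adaptation strategies for identifying the needed skill are more sophisticated.

```python
# Sketch of the two-pass inference flow: (1) identify the skill a prompt
# needs, (2) apply that skill's z-vector to the decomposed weights.
import numpy as np

rng = np.random.default_rng(1)

# Pretend we trained one z-vector per skill with SVF (values arbitrary).
Z_VECTORS = {
    "math":   rng.uniform(0.8, 1.2, size=32),
    "coding": rng.uniform(0.8, 1.2, size=32),
}

def classify_skill(prompt: str) -> str:
    """First pass: toy stand-in for asking the model which skill the
    prompt requires (keyword matching here, purely for illustration)."""
    return "coding" if "def " in prompt or "bug" in prompt else "math"

def adapt_weights(U, s, Vt, z):
    """Second pass setup: rebuild the weights with singular values
    scaled by the selected skill's z-vector."""
    return (U * (s * z)) @ Vt

# Offline SVD of one layer's weights, as in the previous sketch.
W = rng.standard_normal((64, 32))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

prompt = "Fix the bug in this function: def add(a, b): return a - b"
skill = classify_skill(prompt)
W_task = adapt_weights(U, s, Vt, Z_VECTORS[skill])
print(f"selected skill: {skill}; adapted weight shape: {W_task.shape}")
```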
Test results show that Transformer² outperforms LoRA on tasks spanning mathematics, coding, reasoning, and visual question answering, while using fewer parameters. Even more remarkably, the approach supports knowledge transfer: z-vectors learned on one model can be applied to another, suggesting potential for broad application.
Transformer² (SVF in the table) compared with the base model and LoRA (Source: arXiv)
Sakana AI has released the training code for Transformer²'s components on its GitHub page, opening the door for other researchers and developers to build on the work.
As enterprises continue to explore LLM applications, inference-time customization is emerging as a mainstream trend. Transformer², along with other technologies such as Google's Titans, is changing how LLMs are deployed, allowing users to dynamically adapt models to their specific needs without retraining. Advances in this area will make LLMs more useful and practical across a wider range of fields.
Researchers at Sakana AI say Transformer² represents a bridge between static AI and living intelligence, laying the foundation for efficient, personalized and fully integrated AI tools.