Sakana AI, an artificial intelligence research laboratory focused on nature-inspired algorithms, recently introduced Transformer² (Transformer-squared), an adaptive language model that can dynamically adapt to new tasks during inference without expensive fine-tuning, marking an important step in the development of large language model (LLM) technology.
The core innovation of Transformer² is its two-step dynamic weight adjustment mechanism. First, it analyzes the incoming user request to understand the task's requirements; then it uses singular value decomposition (SVD) to align the model's weights with those requirements. By selectively adjusting key components of the weights, Transformer² can optimize its performance in real time without time-consuming retraining. This contrasts sharply with traditional fine-tuning, which leaves parameters static after training, and with methods such as low-rank adaptation (LoRA), which modify only a small set of parameters.
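To make the idea concrete, here is a minimal sketch of SVD-based weight adjustment, assuming the simplest possible form: each singular value of a weight matrix is rescaled by a learned per-task factor. This is an illustration of the general technique, not Sakana AI's actual implementation; the matrix shapes and the `z` scaling range are arbitrary.

```python
# Sketch: adapting a weight matrix by rescaling its singular values,
# the core idea behind SVD-based weight adjustment (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one weight matrix of a pretrained model.
W = rng.standard_normal((64, 32))

# Decompose the weights once, offline: W = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# A hypothetical learned scaling vector: one multiplicative factor per
# singular value, expressing a task-specific adjustment.
z = rng.uniform(0.8, 1.2, size=s.shape)

# At inference time, reassemble the weights with the scaled singular
# values instead of retraining the full matrix.
W_adapted = (U * (s * z)) @ Vt

print(W.shape, W_adapted.shape)  # same shape, task-adapted weights
```

Because only a vector of scaling factors changes per task, the adjustment is cheap compared with updating the full weight matrix.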
Transformer² training and inference (Source: arXiv)
To achieve this dynamic adjustment, the researchers developed a method called singular value fine-tuning (SVF). During training, SVF learns a set of skill representations, called z-vectors, from the SVD components of the model's weight matrices. At inference time, Transformer² analyzes the prompt to determine the required skills and then applies the corresponding z-vectors, enabling responses tailored to each prompt.
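The two-pass flow at inference time might look roughly like the sketch below. The skill names, the keyword-based `classify_skill` stand-in, and the `adapt_weights` helper are hypothetical illustrations of the dispatch idea, not Sakana AI's API; the paper's actual adaptation strategies for identifying the needed skill are more sophisticated.

```python
# Sketch of the two-pass inference flow: (1) identify the skill a prompt
# needs, (2) apply that skill's z-vector to the decomposed weights.
import numpy as np

rng = np.random.default_rng(1)

# Pretend we trained one z-vector per skill with SVF (values arbitrary).
Z_VECTORS = {
    "math":   rng.uniform(0.8, 1.2, size=32),
    "coding": rng.uniform(0.8, 1.2, size=32),
}

def classify_skill(prompt: str) -> str:
    """First pass: toy stand-in for asking the model which skill the
    prompt requires (keyword matching here, purely for illustration)."""
    return "coding" if "def " in prompt or "bug" in prompt else "math"

def adapt_weights(U, s, Vt, z):
    """Second pass setup: rebuild the weights with singular values
    scaled by the selected skill's z-vector."""
    return (U * (s * z)) @ Vt

# Offline SVD of one layer's weights, as in the previous sketch.
W = rng.standard_normal((64, 32))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

prompt = "Fix the bug in this function: def add(a, b): return a - b"
skill = classify_skill(prompt)
W_task = adapt_weights(U, s, Vt, Z_VECTORS[skill])
print(f"selected skill: {skill}; adapted weight shape: {W_task.shape}")
```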
Test results show that Transformer² outperforms LoRA on tasks spanning mathematics, coding, reasoning, and visual question answering, while using fewer parameters. Even more remarkably, the approach supports knowledge transfer: z-vectors learned on one model can be applied to another, suggesting potential for broad application.
Transformer² (SVF in the table) compared with the base model and LoRA (Source: arXiv)
Sakana AI has released the training code for Transformer²'s components on its GitHub page, opening the door for other researchers and developers to build on the work.
As enterprises continue to explore LLM applications, inference-time customization is emerging as a mainstream trend. Transformer², along with other technologies such as Google's Titans, is changing how LLMs are deployed, allowing users to dynamically adapt models to their specific needs without retraining. Advances in this area will make LLMs more useful and practical across a wider range of fields.
Researchers at Sakana AI say Transformer² represents a bridge between static AI and living intelligence, laying the foundation for efficient, personalized and fully integrated AI tools.