Traditional large language model (LLM) fine-tuning methods are often computationally intensive and handle diverse tasks in a static way. To address these challenges, Sakana AI has introduced a new adaptive framework called Transformer². Transformer² can adjust an LLM's weights in real time during inference, letting it adapt to a variety of unseen tasks with octopus-like flexibility.
The core of Transformer² lies in a two-stage mechanism:
In the first stage, a dispatch system analyzes the incoming user query and identifies the properties of the task.
In the second stage, the system dynamically mixes several "expert" vectors. Each vector is trained with reinforcement learning to specialize in a particular type of task, and their combination tailors the model's behavior to the query at hand.
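To make the flow concrete, here is a minimal Python sketch of this two-pass loop under toy assumptions: a keyword lookup stands in for the dispatch step, a hard-coded table stands in for the trained expert vectors, and the second pass is a stub. None of these names or helpers come from the paper.

```python
# Toy sketch of Transformer²-style two-pass inference.
# All names here are illustrative stand-ins, not the paper's actual API.
import numpy as np

# Hypothetical SVF expert vectors, one per task category (in the real
# system these are learned with reinforcement learning).
EXPERT_VECTORS = {
    "math":  np.array([1.2, 0.9, 1.1, 0.8]),
    "code":  np.array([0.7, 1.3, 1.0, 1.1]),
    "other": np.ones(4),
}

def dispatch(prompt: str) -> str:
    """First pass: identify the task category from the user query.
    (In the paper this can be the LLM itself or a classification expert.)"""
    text = prompt.lower()
    if any(w in text for w in ("integral", "solve", "equation")):
        return "math"
    if any(w in text for w in ("python", "function", "bug")):
        return "code"
    return "other"

def second_pass(prompt: str, z: np.ndarray) -> str:
    """Second pass: answer with weights modulated by the expert vector z.
    Here we only report which expert was applied."""
    return f"answered {prompt!r} with expert vector {np.round(z, 2)}"

def transformer2_infer(prompt: str) -> str:
    task = dispatch(prompt)        # pass 1: identify task properties
    z = EXPERT_VECTORS[task]       # select the matching expert vector
    return second_pass(prompt, z)  # pass 2: answer with adapted weights

print(transformer2_infer("Solve the equation x**2 - 4 = 0"))
```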
This method uses fewer parameters and is more efficient than traditional fine-tuning methods such as LoRA. Transformer² also demonstrates strong adaptability across different LLM architectures and modalities, including vision-language tasks.
Key Technologies of Transformer²
Singular value fine-tuning (SVF): a novel parameter-efficient fine-tuning method that works by extracting the singular values of the model's weight matrices and learning to rescale them. This reduces the risk of overfitting, lowers compute requirements, and makes the resulting updates inherently composable. Training with reinforcement learning on narrow datasets yields effective sets of domain-specific "expert" vectors that directly optimize task performance on individual topics.
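As a rough illustration of the core operation (not the authors' code), the PyTorch snippet below factors one weight matrix with SVD and rescales its singular values with a learnable vector z; the matrix shape and the choice of layer are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)

# Stand-in weight matrix from some linear layer of the model.
W = torch.randn(64, 32)

# Factor the frozen weight once: W = U @ diag(S) @ Vh.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# The SVF "expert" is one scale per singular value -- far fewer
# parameters than a LoRA update for the same matrix.
z = torch.nn.Parameter(torch.ones_like(S))

def adapted_weight() -> torch.Tensor:
    """Rebuild the weight with its singular values rescaled by z."""
    return U @ torch.diag(S * z) @ Vh

x = torch.randn(8, 32)
y = x @ adapted_weight().T  # forward pass through the adapted layer
print(y.shape, "-", z.numel(), "trainable parameters for this matrix")
```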
Adaptive strategies: at inference time, Transformer² combines the SVF-trained expert vectors using three different adaptation strategies (prompt-based adaptation, a classification expert, and few-shot adaptation). These strategies adjust the LLM's weights dynamically according to the conditions observed at test time, achieving self-adaptation.
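The sketch below illustrates the spirit of the few-shot variant under toy assumptions: a cross-entropy-method (CEM) search over mixing coefficients αk, with a placeholder score function standing in for "performance on the few-shot examples". The mixed vector z = Σk αk·zk would then be plugged into the same weight reconstruction as in the previous snippet; the paper's actual search procedure may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts = 3  # e.g. math / code / reasoning experts

def score(alpha: np.ndarray) -> float:
    """Toy stand-in for 'how well the model does on the few-shot prompts'
    when run with the mixed vector z = sum_k alpha_k * z_k. Here we simply
    reward closeness to a hidden 'ideal' mixture."""
    ideal = np.array([0.7, 0.1, 0.2])
    return float(-np.sum((alpha - ideal) ** 2))

# Cross-entropy method: sample mixtures, keep the elites, refit, repeat.
mean = np.full(num_experts, 1.0 / num_experts)
std = np.full(num_experts, 0.3)
for _ in range(20):
    samples = rng.normal(mean, std, size=(64, num_experts))
    scores = np.array([score(a) for a in samples])
    elites = samples[np.argsort(scores)[-8:]]  # keep the top 8 mixtures
    mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3

print("found alpha ~", np.round(mean, 2))
```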
Transformer² Advantages
Dynamic adaptability: Transformer² can evaluate and modify its behavior in response to changes in its operating environment or internal state, without external intervention.
Parameter efficiency: SVF uses fewer parameters than methods such as LoRA while delivering higher performance.
Modularity: expert vectors act as composable modules, and the adaptation strategies dynamically determine and combine the most appropriate vectors for each incoming task.
Reinforcement learning optimization: expert vectors are trained with reinforcement learning to optimize task performance directly, without relying on expensive fine-tuning procedures or large datasets (a minimal sketch follows this list).
Cross-model compatibility: SVF expert vectors can be transferred between different LLMs thanks to the inherent ordering of their singular components.
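To illustrate the reinforcement-learning point, the toy loop below trains a single SVF vector z with REINFORCE against a task reward. The task, reward, and baseline here are invented for the example, and the paper's actual recipe reportedly also regularizes the policy (e.g. with a KL penalty toward the base model); this is only a sketch of the loop's shape.

```python
import torch

torch.manual_seed(0)

# Frozen base "model": a single linear map from 16 features to 4 classes.
W = torch.randn(4, 16)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# The only trainable parameters: one SVF scale per singular value.
z = torch.nn.Parameter(torch.ones_like(S))
opt = torch.optim.Adam([z], lr=0.05)

def make_batch(n: int = 32):
    """Toy 'task': the correct class is determined by a hidden rule."""
    x = torch.randn(n, 16)
    y = (x[:, 0] > 0).long() * 2 + (x[:, 1] > 0).long()  # classes 0..3
    return x, y

baseline = 0.0
for step in range(200):
    x, y = make_batch()
    logits = x @ (U @ torch.diag(S * z) @ Vh).T
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()
    reward = (actions == y).float()  # task reward, not a supervised loss
    # REINFORCE: push up log-probs of actions that beat the running baseline.
    loss = -((reward - baseline) * dist.log_prob(actions)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    baseline = 0.9 * baseline + 0.1 * reward.mean().item()

print("final mean reward:", round(baseline, 2))
```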
Experimental Results
Experiments on multiple LLMs and tasks show that SVF consistently outperforms traditional fine-tuning strategies such as LoRA.
Transformer²'s adaptation strategies show significant improvements across a variety of unseen tasks.
Using a classification expert for task identification achieves higher accuracy than prompt engineering alone.
The contribution of the adaptation coefficient (αk) is uneven across different model and task combinations.
Future Outlook
While Transformer² marks significant progress, there is still room for improvement. Future research could explore model-merging techniques that combine different specialized models into a single, more capable one, and could examine how the CEM (cross-entropy method) approach can be extended to more specialized domains.
All in all, Transformer² represents a major step forward for adaptive LLMs, paving the way toward building truly dynamic, self-organizing AI systems.
Paper address: https://arxiv.org/pdf/2501.06252