In recent years, training large language models (LLMs) has grown so expensive and complex that only a handful of large technology companies can afford the necessary computing resources. Google, however, recently introduced a method called SALT (Small model Aided Large model Training), an innovation that may reshape the landscape of AI training.
According to a recent research paper from Google Research and DeepMind, "A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs," SALT introduces a two-stage training process that is not only more efficient but also more practical than conventional pretraining.
The first stage of SALT is knowledge distillation. Here, a small language model (SLM) acts as a teacher, passing its understanding on to the larger model. The small model shares what it has learned through "soft labels" (probability distributions over next tokens rather than single hard answers), helping the large model master basic concepts early in training. This stage is especially effective on "easy" examples, where the small model makes predictions with high confidence.
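To make the soft-label idea concrete, here is a minimal sketch of a distillation loss: the student is trained to match the teacher's full probability distribution rather than a single correct token. The function names and the temperature value are illustrative assumptions, not the paper's exact formulation, and a real implementation would operate on tensors of logits from both models.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's soft labels.

    A higher temperature softens both distributions, exposing the teacher's
    relative preferences among wrong answers, not just its top pick.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Hypothetical next-token logits: the teacher is confident in token 0.
teacher = [4.0, 1.0, 0.5]
aligned_student = [3.5, 1.2, 0.4]   # already mirrors the teacher
confused_student = [0.5, 1.0, 4.0]  # prefers a different token

# The loss is lower when the student's distribution matches the teacher's.
assert distillation_loss(teacher, aligned_student) < distillation_loss(teacher, confused_student)
```

The key property shown here is that soft labels carry more signal than hard labels: the student is rewarded for matching the teacher's entire distribution, which is where a confident small model can transfer "easy" knowledge cheaply.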
The second stage is self-supervised learning. Here the large model begins to learn independently, focusing on the more complex patterns and challenging examples that exceed the small model's ability. The handoff between stages relies on carefully designed schedules, such as linear decay and linear ratio decay of the distillation weight, which let the large model transition smoothly while gradually reducing its dependence on the small model.
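A linear-decay handoff can be sketched as a blended training objective whose distillation weight shrinks to zero over the first stage. The function names and the two-term mixing form are illustrative assumptions; the paper's exact schedules may differ in detail.

```python
def kd_weight(step, kd_steps):
    """Linearly decay the distillation weight from 1.0 to 0.0 over the first stage."""
    if step >= kd_steps:
        return 0.0  # stage two: pure self-supervised learning
    return 1.0 - step / kd_steps

def blended_loss(distill_loss, self_supervised_loss, step, kd_steps):
    """Mix the teacher-guided loss with the model's own self-supervised loss.

    Early on the teacher dominates; by the end of the schedule the large
    model trains entirely on its own objective.
    """
    w = kd_weight(step, kd_steps)
    return w * distill_loss + (1.0 - w) * self_supervised_loss

# Hypothetical loss values at three points in a 1000-step distillation stage:
print(blended_loss(2.0, 4.0, step=0, kd_steps=1000))    # all teacher: 2.0
print(blended_loss(2.0, 4.0, step=500, kd_steps=1000))  # even mix: 3.0
print(blended_loss(2.0, 4.0, step=1000, kd_steps=1000)) # all self-supervised: 4.0
```

The design point is that the transition is gradual rather than a hard switch, so the large model is never abruptly cut off from the teacher's guidance.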
In their experiments, Google researchers found that using a 1.5-billion-parameter small model to help train a 2.8-billion-parameter large model cut training time on the Pile dataset by 28%. After fine-tuning, the large model's accuracy on math problems rose from 31.84% to 34.87%, and its reading-comprehension accuracy rose from 63.7% to 67%. The method therefore improves training efficiency and delivers measurable gains in performance at the same time.
SALT is expected to lower the barrier to entry for AI development, allowing smaller research institutions and companies previously constrained by resources to participate in building AI models. Broader access to model development could, in turn, lead to more specialized and distinctive AI solutions, driving innovation and new applications across related fields.