The AI field has seen another stir. Moonshot AI has open-sourced a new version of the Muon optimizer that reaches roughly twice the computational efficiency of the widely used AdamW. The release coincides with DeepSeek's planned open-sourcing of several code repositories, and it has drawn considerable attention and discussion in the industry.
The Muon optimizer was originally proposed in 2024 by OpenAI researcher Keller Jordan and others, and it performed well when training small-scale models. As model size grew, however, the original Muon ran into bottlenecks and its gains diminished. To address this, the team at Moonshot AI (whose Chinese name translates as "Dark Side of the Moon") made two main technical improvements: adding weight decay, and keeping the root mean square (RMS) of the updates consistent across parameters. Together these let Muon be used in large-scale training without hyperparameter tuning.
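The two changes described above can be illustrated with a minimal NumPy sketch. This is not Moonshot AI's released code. The Newton-Schulz coefficients, the 0.2 RMS scale factor, the default hyperparameters, and the function names `newton_schulz` and `muon_step` are all assumptions, based on the publicly available Muon reference implementation and the Moonlight paper as commonly described. Nesterov momentum is omitted for brevity.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz iteration.

    The coefficients below are assumed from the public Muon reference code.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius normalization keeps every singular value at or below 1.
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the wide orientation so X @ X.T is the smaller product
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum_buf, lr=0.02, mu=0.95,
              weight_decay=0.1, rms_scale=0.2):
    """One simplified Muon update for a 2D weight matrix (hypothetical helper).

    Assumed form: W <- W - lr * (rms_scale * sqrt(max(rows, cols)) * O + weight_decay * W),
    where O is the orthogonalized momentum.
    """
    momentum_buf = mu * momentum_buf + grad
    update = newton_schulz(momentum_buf)
    # An orthogonal matrix has entry-wise RMS near 1/sqrt(max(rows, cols)).
    # Rescaling brings the update RMS to roughly rms_scale for any matrix shape.
    update = update * rms_scale * np.sqrt(max(W.shape))
    # Decoupled weight decay, applied in the same way AdamW applies it.
    W = W - lr * (update + weight_decay * W)
    return W, momentum_buf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 256))
    grad = rng.standard_normal((64, 256))
    W_new, buf = muon_step(W, grad, np.zeros_like(W), weight_decay=0.0)
    step = (W - W_new) / 0.02
    print("update RMS:", np.sqrt(np.mean(step ** 2)))
```

The rescaling is what makes the step size independent of matrix shape. Without it, wide and square layers would receive updates of different magnitude under the same learning rate, which is one plausible reason the original Muon needed retuning at scale.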
The new Muon optimizer has been used to train the latest Moonlight model, a Mixture-of-Experts (MoE) model with 3B activated and 16B total parameters. Trained on 5.7 trillion tokens, the model shows markedly improved performance and advances the current Pareto frontier. In other words, for a given training budget, Moonlight achieves better performance than the prior models it was compared against.
Moonshot AI has also open-sourced its Muon implementation and released the pretrained and intermediate checkpoints, giving researchers valuable resources for follow-up work. According to the team's experiments, Muon needs only about 52% of the training FLOPs that AdamW requires to reach comparable performance. This is the basis for the roughly 2x efficiency figure, and it supports Muon's efficiency in large-scale language model training.
Moonshot AI's version of Muon not only outperforms traditional optimizers in training efficiency, but, by being open source, also gives the wider AI community something to build on. As more researchers and developers adopt and test it, the optimizer is expected to drive further advances in AI technology.
Paper address: https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf