Microsoft releases Phi-4 multimodal and Phi-4 mini models: speech, vision, and text processing upgraded

Author: LoRA Time: 27 Feb 2025

Microsoft recently expanded the Phi-4 family with two new models: Phi-4 multimodal and Phi-4 mini. Both are intended to bring stronger processing capabilities to a range of AI applications.

The Phi-4 multimodal model is Microsoft's first model to integrate speech, vision, and text processing in a single unified architecture, with 5.6 billion parameters. It performs well on multiple benchmarks, outperforming a number of competing models such as Google's Gemini 2.0 series. It is particularly strong in automatic speech recognition (ASR) and speech translation (ST) tasks, where it outperforms specialized speech models such as WhisperV3 and SeamlessM4T-v2-Large, and it ranks first on the Hugging Face OpenASR leaderboard with a word error rate of 6.14%.
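For readers unfamiliar with the metric, word error rate (WER) is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the leaderboard's actual evaluation code; the function name `word_error_rate` is illustrative, and real evaluations usually normalize text (casing, punctuation) before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One missing word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 6.14% corresponds to roughly six word errors per hundred reference words, averaged over the evaluation data.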

In visual processing, the Phi-4 multimodal model also performs well, particularly in document understanding, chart understanding, and optical character recognition (OCR). Compared with popular models such as Gemini-2-Flash-lite-preview and Claude-3.5-Sonnet, it performs comparably and in some cases better.

The other newly released model, Phi-4 mini, focuses on text processing tasks and has 3.8 billion parameters. In text reasoning, mathematics, programming, and instruction following, Phi-4 mini performs strongly, outperforming a number of popular large language models. To ensure the safety and reliability of the new models, Microsoft invited internal and external security experts to test them comprehensively and optimized them according to the standards of Microsoft's Artificial Intelligence Red Team (AIRT).

Both new models can be deployed on a range of devices via ONNX Runtime, making them suitable for low-cost, low-latency applications. They are available to developers in Azure AI Foundry, Hugging Face, and the NVIDIA API Catalog. The new Phi-4 models mark a notable advance in Microsoft's work on efficient AI, opening up new possibilities for future artificial intelligence applications.