Hugging Face, the AI development platform, recently released two new AI models, SmolVLM-256M and SmolVLM-500M. The team claims they are the smallest AI models to date that can analyze images, short videos, and text, and says they are designed for constrained devices such as laptops with less than about 1GB of RAM. According to the team, the models also suit developers who need to process large amounts of data at low cost.
The two models have 256 million and 500 million parameters, respectively. Parameter count roughly corresponds to a model's problem-solving ability, and models with more parameters generally, though not always, perform better than those with fewer. The SmolVLM models can perform tasks such as describing images or video clips and answering questions about PDF documents and their contents, including scanned text and charts. This gives them potential applications in education, research, and other fields.
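To see how these parameter counts relate to a memory budget of around 1GB, a rough back-of-the-envelope estimate multiplies the number of parameters by the bytes used to store each one. The sketch below is illustrative only: it assumes the nominal counts in the model names (256 million and 500 million), the helper name weight_memory_gb is hypothetical, and it covers weights alone, not activations, image inputs, or runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts taken from the model names;
# actual memory use also includes activations and runtime overhead.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

MODELS = {
    "SmolVLM-256M": 256_000_000,
    "SmolVLM-500M": 500_000_000,
}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{name} weights in {dtype}: ~{size:.2f} GB")
```

Under these assumptions, the weights of the smaller model at 16-bit precision come to roughly half a gigabyte, which is consistent with the sub-1GB target, while full 32-bit precision for the larger model would not fit in that budget.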
To train the models, the Hugging Face team used The Cauldron, a collection of 50 high-quality image and text datasets, and Docmatix, a dataset that pairs document scans with detailed captions. Both were created by Hugging Face's M4 team, which develops multimodal AI technology. The team claims that both SmolVLM-256M and SmolVLM-500M outperform a much larger model, Idefics 80B, on benchmarks including AI2D, which tests a model's ability to analyze grade-school-level science diagrams.
However, while small models are inexpensive and versatile, they may not perform as well as larger models on complex reasoning tasks. A study from Google DeepMind, Microsoft Research, and the Mila research institute in Quebec found that many small models perform worse than expected on such tasks. The researchers speculate that this may be because small models tend to pick up on surface-level patterns in the data and struggle to apply that knowledge in new contexts.
Hugging Face's SmolVLM models are compact, yet they handle a range of image, video, and document tasks. That makes them worth considering for developers who want efficient data processing at low cost, provided the workload does not depend on complex reasoning.