CompassArena, a large-model evaluation platform jointly launched by the Shanghai Artificial Intelligence Laboratory's Sinan OpenCompass team and ModelScope, has recently received a major upgrade intended to give users a more rigorous and comprehensive model evaluation experience.
Since its launch, CompassArena has attracted a large number of community users and continues to improve the platform using the data they contribute. The highlights of this upgrade are as follows:
Judge Copilot is the core new feature of this upgrade. Built on the evaluation model Compass-Judger-1-32B-Instruct, it gives users comparative analysis to help them evaluate dialogue models accurately and efficiently. The advantages of Judge Copilot are:
Multi-dimensional evaluation: assesses model performance from several perspectives to give a more objective evaluation.
Real-time comparison analysis: supports instant comparison between multiple models to help users choose quickly.
Intelligent decision-making assistance: recommends the best model based on the evaluation results, making evaluation decisions more rigorous and efficient.
This upgrade also overhauls the platform's ranking algorithm. Building on the original Bradley-Terry algorithm, it adds control variables to reduce the interference of confounding factors, making the model rankings more accurate, more representative, and better aligned with real-world use.
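The article does not publish the platform's actual implementation, so the following is only a minimal sketch of the general idea: a Bradley-Terry model with one control variable, fitted as a logistic regression by gradient ascent. The function names `fit_bradley_terry` and `simulate_battles`, the single covariate `x` (which could stand for something like a response-length difference), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def simulate_battles(true_strengths, gamma, n_battles, seed=0):
    """Generate synthetic battles (hypothetical helper, illustration only).

    Each battle is (i, j, x, won): model i vs. model j, a control
    covariate x, and won = 1 if model i was preferred.
    """
    rng = np.random.default_rng(seed)
    n_models = len(true_strengths)
    battles = []
    for _ in range(n_battles):
        i, j = rng.choice(n_models, size=2, replace=False)
        x = rng.normal()
        logit = true_strengths[i] - true_strengths[j] + gamma * x
        p_win = 1.0 / (1.0 + np.exp(-logit))
        battles.append((int(i), int(j), float(x), int(rng.random() < p_win)))
    return battles


def fit_bradley_terry(battles, n_models, n_steps=2000, lr=0.5, ridge=1e-3):
    """Fit P(i beats j) = sigmoid(beta_i - beta_j + gamma * x).

    Returns (strengths, gamma): centered model strengths and the
    coefficient of the control covariate.
    """
    # Design matrix: +1 for model i, -1 for model j, last column = covariate.
    X = np.zeros((len(battles), n_models + 1))
    y = np.zeros(len(battles))
    for row, (i, j, x, won) in enumerate(battles):
        X[row, i] = 1.0
        X[row, j] = -1.0
        X[row, n_models] = x
        y[row] = won

    w = np.zeros(n_models + 1)
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        # Gradient of the mean log-likelihood, with a small ridge penalty
        # so the solution stays well defined.
        grad = X.T @ (y - p) / len(y) - ridge * w
        w += lr * grad

    # Strengths are only identified up to a constant, so center them.
    strengths = w[:n_models] - w[:n_models].mean()
    return strengths, w[n_models]


if __name__ == "__main__":
    battles = simulate_battles(np.array([1.0, 0.0, -1.0]), gamma=0.8, n_battles=4000)
    strengths, gamma_hat = fit_bradley_terry(battles, n_models=3)
    print("ranking (best first):", np.argsort(-strengths))
    print("odds ratio of the control variable:", np.exp(gamma_hat))
```

Because the covariate absorbs the part of each outcome that it explains, the fitted strengths reflect model quality with that factor held fixed, which is the sense in which a control variable reduces confounding.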
This upgrade of CompassArena also adds more than 20 new models, covering domestic and foreign commercial models as well as open source models, which greatly enriches the platform's model battles. New models include:
Domestic commercial models: such as 360gpt2-pro, deep-seek-v2.5-chat, doubao-pro-32k-240828, etc.
Foreign commercial models: such as claude-3.5-sonnet-20241022, gemini-exp-1121, etc.
Open source models: the platform also introduces a series of open source models, further improving the diversity and comparability of models.
The new models come from organizations including 360, DeepSeek, and Doubao, giving users more battle options for different application scenarios.
In this upgrade, CompassArena has strengthened the user feedback mechanism for the Judge model. Users can rate it directly by clicking the "Like" and "Dislike" buttons, which helps the platform further improve the model's performance. In addition, by fitting a Bradley-Terry statistical model that includes control variables, the platform can estimate the impact of external factors on the evaluation results and display that impact as odds ratios.
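As a hedged illustration of where such odds ratios come from (the article does not give the platform's exact formulation, so every symbol here is an assumption): let \beta_i be the strength of model i, x the vector of control variables recorded for a battle, and \gamma their coefficients.

```latex
\begin{align}
\Pr(i \succ j \mid x) &= \frac{1}{1 + \exp\!\bigl(-(\beta_i - \beta_j + \gamma^{\top} x)\bigr)}, \\
\log \frac{\Pr(i \succ j \mid x)}{1 - \Pr(i \succ j \mid x)} &= \beta_i - \beta_j + \gamma^{\top} x, \\
\mathrm{OR}_k &= \exp(\gamma_k).
\end{align}
```

Under this formulation, raising the k-th control variable by one unit multiplies the odds that model i wins by \mathrm{OR}_k, so a value near 1 indicates that the factor has little influence on the outcome.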
To try the newly upgraded CompassArena, visit the platform: CompassArena experience address
This upgrade marks further progress for CompassArena in large-model evaluation. It improves the accuracy and rigor of evaluation, broadens users' choices, and further promotes the adoption of artificial intelligence technology.