CompassArena large model evaluation platform upgrades and launches new Judge Copilot function

Author: LoRA | Time: 20 Dec 2024, 10:13


CompassArena, a large-model evaluation platform jointly launched by Shanghai Artificial Intelligence Laboratory's OpenCompass (Sinan) team and ModelScope, has recently received a major upgrade that aims to give users a more rigorous and comprehensive model evaluation experience.

Since its launch, CompassArena has attracted a large community of users, and the data they contribute continues to drive improvements to the platform. The highlights of this upgrade are as follows:

New Judge Copilot feature

Judge Copilot is the core new feature of this upgrade. Built on the evaluation model CompassJudger-1-32B-Instruct, it provides comprehensive comparative analysis that helps users evaluate the performance of dialogue models accurately and efficiently. Its main advantages are:

Multi-dimensional evaluation: assesses model performance from several perspectives to produce a more objective judgment.
Real-time comparative analysis: supports instant comparison between multiple models, helping users choose quickly.
Intelligent decision support: recommends the best model based on the evaluation results, making evaluation decisions more rigorous and efficient.

Optimized leaderboard algorithm

This upgrade also overhauls the platform's ranking algorithm. Building on the original Bradley-Terry model, it adds control variables to reduce the influence of confounding factors, so the resulting leaderboard is more accurate and better reflects real-world application needs.
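The post does not describe the fitting procedure in detail. As a minimal sketch, assuming each battle records the two models, which one won (ties ignored), and a list of hypothetical control features expressed as A-minus-B differences (for example, a normalized answer-length difference), a Bradley-Terry model with control variables can be fit as a logistic regression: P(A beats B) = σ(θ_A − θ_B + β·x). The function name, features, and optimizer settings below are illustrative assumptions, not CompassArena's code.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One battle: (model_a, model_b, control_features, a_won).
# control_features are hypothetical A-minus-B differences between the two
# answers (e.g. a normalized length difference); a_won is 1 or 0.
Battle = Tuple[str, str, Sequence[float], int]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_bt_with_controls(
    battles: List[Battle],
    models: List[str],
    n_controls: int,
    lr: float = 0.1,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> Tuple[Dict[str, float], List[float]]:
    """Fit P(A beats B) = sigmoid(theta_A - theta_B + beta . x) by gradient
    ascent on the L2-regularized mean log-likelihood (illustrative sketch)."""
    idx = {m: i for i, m in enumerate(models)}
    theta = [0.0] * len(models)  # model strengths on the log-odds scale
    beta = [0.0] * n_controls    # effect of each control feature
    n = len(battles)
    for _ in range(epochs):
        g_theta = [0.0] * len(models)
        g_beta = [0.0] * n_controls
        for a, b, x, a_won in battles:
            z = theta[idx[a]] - theta[idx[b]] + sum(bk * xk for bk, xk in zip(beta, x))
            err = a_won - _sigmoid(z)  # d(log-likelihood)/dz
            g_theta[idx[a]] += err
            g_theta[idx[b]] -= err
            for k, xk in enumerate(x):
                g_beta[k] += err * xk
        theta = [t + lr * (g / n - l2 * t) for t, g in zip(theta, g_theta)]
        beta = [bk + lr * (g / n - l2 * bk) for bk, g in zip(beta, g_beta)]
    # Bradley-Terry strengths are only identified up to a shift; center at zero.
    mean = sum(theta) / len(theta)
    return {m: theta[i] - mean for m, i in idx.items()}, beta


if __name__ == "__main__":
    # Toy data: the answer with the larger control feature tends to win.
    battles = [
        ("model-x", "model-y", [1.0], 1),
        ("model-x", "model-y", [-1.0], 0),
        ("model-y", "model-x", [1.0], 1),
        ("model-y", "model-x", [0.0], 0),
        ("model-x", "model-y", [0.0], 1),
    ]
    ratings, coefs = fit_bt_with_controls(battles, ["model-x", "model-y"], n_controls=1)
    print("ratings:", ratings)
    print("control coefficients:", coefs)
```

Because β absorbs the part of each outcome explained by the control features, the θ values estimate model strength with those factors held fixed. A production system would more likely use an off-the-shelf regularized logistic-regression solver than plain gradient ascent.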

Added more than 20 new models

This upgrade also adds more than 20 new models, spanning domestic and foreign commercial models as well as open-source models, which greatly broadens the range of battles available on the platform. The new models include:

Domestic commercial models: for example 360gpt2-pro, deep-seek-v2.5-chat, and doubao-pro-32k-240828.
Foreign commercial models: for example claude-3.5-sonnet-20241022 and gemini-exp-1121.
Open-source models: a series of open-source models has also been added, increasing the diversity and comparability of the models on offer.

The new models come from organizations including 360, DeepSeek, and Doubao, giving users more varied battle options for different application scenarios.

User engagement and feedback

This upgrade also strengthens user feedback on the Judge model: users can rate its output directly by clicking the "Like" and "Dislike" buttons, which helps the platform further improve it. In addition, by fitting a Bradley-Terry model with control variables, the platform can estimate how strongly external factors influence the evaluation results and displays each factor's effect as an odds ratio.
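The post does not say exactly how the odds ratios are computed. In a logistic (Bradley-Terry) model, a control coefficient β is a change in log-odds, so the conventional odds ratio is exp(β): the factor by which the odds of winning are multiplied for each one-unit increase in that control feature. A minimal sketch, with hypothetical function names and coefficient values that are not CompassArena results:

```python
import math
from typing import Dict


def odds_ratios(coefficients: Dict[str, float]) -> Dict[str, float]:
    """Map each control variable's log-odds coefficient beta to exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def even_match_win_prob(beta: float) -> float:
    """P(A beats B) for two equally rated models when A's answer is one unit
    higher on the control feature: sigmoid(beta) = OR / (1 + OR)."""
    ratio = math.exp(beta)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    coefs = {"length_diff": 0.25, "markdown_diff": -0.10}
    for name, ratio in odds_ratios(coefs).items():
        print(f"{name}: odds ratio {ratio:.3f}, "
              f"win prob vs. equal model {even_match_win_prob(coefs[name]):.3f}")
```

An odds ratio above 1 means the factor favors the answer that scores higher on it, below 1 means it works against that answer, and exactly 1 means no effect.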

Try it out

To try the newly upgraded CompassArena, visit the platform here: CompassArena experience link

This upgrade is a further step forward for CompassArena in large-model evaluation: it improves the accuracy and rigor of evaluations, gives users more models to choose from, and helps promote the adoption and application of AI technology.
