On April 2, 2025, OpenAI announced the launch of PaperBench, a new benchmark for evaluating AI agents' ability to replicate cutting-edge AI research. PaperBench requires agents to replicate 20 ICML 2024 Spotlight and Oral papers from scratch. The tasks include understanding each paper's contributions, developing the corresponding codebase, and successfully executing the experiments.
Among the agents tested on PaperBench, Claude 3.5 Sonnet (New), paired with open-source scaffolding, performed best, with an average replication score of 21.0%. Despite that result, OpenAI found that the models have not yet surpassed the human baseline. That baseline came from a further test in which top machine learning PhDs attempted a subset of PaperBench, and the comparison shows that agents' replication ability still has considerable room for improvement.
According to overseas media reports, OpenAI's ChatGPT now has more than 20 million paying subscribers, up roughly 30% from 15.5 million at the end of 2024.
ChatGPT is bringing in at least $415 million in monthly revenue, or about $5 billion on an annualized basis. OpenAI is also promoting its $200-per-month Pro tier, so actual revenue may be higher.
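The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The helper names are illustrative, and the per-subscriber figure rests on the assumption, not stated in the report, that all of the monthly revenue comes from paid subscriptions.

```python
# Figures quoted in the report.
subscribers_end_2024 = 15_500_000
subscribers_now = 20_000_000
monthly_revenue_usd = 415_000_000  # reported as a lower bound ("at least")


def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


def annualize(monthly: float) -> float:
    """Simple run-rate: twelve times the monthly figure."""
    return monthly * 12


growth_pct = percent_increase(subscribers_end_2024, subscribers_now)
annual_revenue_usd = annualize(monthly_revenue_usd)

# Assumption: all monthly revenue is subscription revenue (not stated in the report).
implied_revenue_per_subscriber = monthly_revenue_usd / subscribers_now

print(f"Subscriber growth: {growth_pct:.1f}%")                        # 29.0%
print(f"Annualized revenue: ${annual_revenue_usd / 1e9:.2f} billion")  # $4.98 billion
print(f"Implied revenue per subscriber: ${implied_revenue_per_subscriber:.2f}/month")  # $20.75/month
```

The results line up with the rounded figures in the report: growth of about 29% is described as 30%, and a run-rate of $4.98 billion is described as about $5 billion.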