Recently, OpenAI released a study on its latest reasoning model, o3, showing how large language models (LLMs) can progress from beginner-level competitive programmers to some of the best in the world. o3 achieved a rating of 2724 on the well-known programming platform CodeForces, placing it in the 99.8th percentile, and performed at gold-medal level on the 2024 International Olympiad in Informatics (IOI).
The research shows that o3 surpasses o1-ioi, a model specifically fine-tuned for the IOI competition, which suggests that capabilities gained through reinforcement learning can outperform hand-designed solutions. At IOI 2024, o3 competed under standard conditions and cleared the gold-medal threshold. It also ranks among the top 200 programmers in the world on CodeForces and can compete with top human programmers.
“The general reasoning capabilities developed through reinforcement learning are now outperforming well-designed domain-specific solutions. Rather than building a dedicated system for a specific task, it is better to use stronger reasoning ability to enable large general-purpose models to achieve better results.”
This study is part of OpenAI's evaluation of its models' performance in competitive programming and the wider software engineering field. Separately, another company, Anthropic, released a report this Monday on the impact of AI on the workplace. The report notes that in about 36% of occupations AI is used for at least 25% of work tasks, and that 57% of AI usage augments human capabilities while 43% focuses on automation. However, in only about 4% of occupations is AI used for at least 75% of work tasks.
The report also shows that software development and technical writing are the main areas where AI is applied, and that AI plays a relatively small role in tasks involving physical interaction with the environment.