
OpenAI research reveals: state-of-the-art AI still struggles with complex coding tasks

Author: LoRA Time: 24 Feb 2025

Recently, OpenAI researchers acknowledged in a newly released paper that although current AI technology is quite advanced, these models are still no match for human programmers. OpenAI CEO Sam Altman has said that AI is expected to outperform "low-level" software engineers by the end of this year, but the research results show that these AI models still face significant challenges.


In the study, the OpenAI team used a new benchmark called SWE-Lancer, which consists of more than 1,400 software engineering tasks drawn from the freelance platform Upwork. The test measured the coding capabilities of three large language models (LLMs): OpenAI's o1 reasoning model, its flagship GPT-4o, and Anthropic's Claude 3.5 Sonnet.

The models were asked to complete two types of tasks: individual tasks, which mainly involve fixing bugs in a program, and management tasks, which require the model to make higher-level decisions. During testing, the models had no access to the Internet, meaning they could not simply look up answers online.

Although the tasks the models took on were worth hundreds of thousands of dollars in total, the models could only fix surface-level problems and struggled to find deeper bugs and their root causes in complex projects. This will be familiar to anyone who has used AI: it can quickly generate information that looks correct, but its shortcomings often show up under closer scrutiny.

The paper points out that while the three LLMs work through tasks far faster than humans, they often fail to grasp the full scope and context of a bug, so the solutions they produce are often incorrect or incomplete. The researchers said that Claude 3.5 Sonnet performed better than OpenAI's two models and earned more money, but its answers were still not reliably correct.
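
To make the idea of a model "earning" money concrete: the article indicates that the benchmark's tasks carry dollar values, so a model's result can be read as the payout it collects on the tasks it actually solves. The Python sketch below illustrates only that idea. The `Task` class, the `earnings` function, and every task name and dollar amount are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One freelance task (hypothetical structure, for illustration only)."""
    name: str
    payout_usd: float  # price attached to the task
    solved: bool       # whether the model's solution was accepted


def earnings(tasks):
    """Return (earned, available) in dollars; a payout counts only if solved."""
    earned = sum(t.payout_usd for t in tasks if t.solved)
    available = sum(t.payout_usd for t in tasks)
    return earned, available


# Made-up results for a made-up model; these are not figures from the study.
results = [
    Task("fix-login-typo", 250.0, True),
    Task("repair-payment-flow", 4000.0, False),
    Task("choose-best-proposal", 1000.0, True),
]

earned, available = earnings(results)
print(f"earned ${earned:,.0f} of ${available:,.0f} ({earned / available:.0%})")
```

In this made-up case the model solves two of three tasks but collects less than a quarter of the available money, because the one task it misses is the expensive one.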

The research shows that although these advanced AI models can work quickly on certain specific tasks, their overall software engineering capabilities remain insufficient, and they are far from able to replace human programmers. Even so, this has not stopped some companies from replacing human programmers with immature AI models.