OpenAI is reportedly about to launch an AI tool called "Operator" that can take control of a user's personal computer and perform tasks on the user's behalf. Software engineer Tibor Blaho shared the news on social media, saying he had found new clues about the tool. Several outlets, including Bloomberg, had previously reported on "Operator", describing it as able to autonomously complete tasks such as writing code and booking travel.
According to Blaho, OpenAI plans to release "Operator" in January 2025. He found that OpenAI's ChatGPT client for macOS has gained hidden options for defining keyboard shortcuts to "toggle Operator" and "force quit Operator". References to "Operator" have also appeared on OpenAI's website, although they are not yet publicly visible.
Blaho also said that OpenAI's website contains tables comparing the performance of "Operator" with that of other computer-using AI systems, though these tables may be only placeholders. If the figures in them are accurate, they suggest that "Operator" is not always reliable and that its performance depends on the task.
On the OSWorld benchmark, the "OpenAI Computer Use Agent (CUA)" scored 38.1%. That is higher than Anthropic's computer-controlling model but still far below the human score of 72.4%. Operator outperformed humans on the WebVoyager benchmark but fell short of human performance on WebArena. On some seemingly simple tasks, such as signing up with a cloud service provider and launching a virtual machine, Operator's success rate was only 60%, and on the task of creating a Bitcoin wallet it was only 10%.
OpenAI's entry into the AI agent market comes as competitors such as Anthropic and Google are also rushing to launch similar technologies. Although AI agents are still in their infancy, market analysis firm Markets and Markets predicts that the AI agent market will be worth $47.1 billion by 2030.
Although current AI agent technology is still relatively basic, some experts have raised concerns about its potential safety risks. The data disclosed by Blaho shows that Operator performed well in some safety evaluations, including tests that tried to get the system to carry out "illegal activities" or search for "sensitive personal data." Safety testing is reportedly one of the reasons for Operator's long development cycle.
Wojciech Zaremba, a co-founder of OpenAI, has criticized Anthropic on social media for releasing an agent that he said lacks safety protections. He added that a similar release from OpenAI would likely draw a negative reaction.