OpenAI’s operator tool could soon take control of your PC
OpenAI may be on the brink of unveiling its much-anticipated AI tool, known as Operator, designed to take control of personal computers and perform actions autonomously on behalf of users. This development comes on the heels of allegations by Tibor Blaho, a software engineer with a track record of accurately leaking information about forthcoming AI products. According to Blaho, newly uncovered code suggests that the Operator tool is not only imminent but that it is also making significant strides towards functionality.
Multiple publications, including Bloomberg, have previously reported on Operator, describing it as an “agentic” system capable of independently executing a variety of tasks, such as writing code and making travel arrangements, without real-time human intervention. Reports suggest a January release target, a timeline reinforced by Blaho’s findings this past weekend, which included hidden options within OpenAI’s ChatGPT client for macOS to define keyboard shortcuts for “Toggle Operator” and “Force Quit Operator.” Blaho also noted that OpenAI’s website contains references to Operator that are not yet publicly visible.
According to Blaho, OpenAI’s website also hosts comparison tables measuring Operator’s performance against other AI systems designed for computer tasks. Although these tables may be placeholders, they suggest that Operator’s effectiveness is not uniform and varies by task.
In a benchmark simulating a real computer environment, known as OSWorld, early indications show that the tool—tentatively branded as the “OpenAI Computer Use Agent (CUA)”—achieved a score of 38.1%. This score is positioned above Anthropic’s computer-controlling model yet significantly lags behind the human score of 72.4%. Interestingly, while the OpenAI CUA pulls ahead in tasks related to website navigation and interaction as gauged by the WebVoyager benchmark, it does not fare as well in another evaluation known as WebArena. This raises questions about the extent of Operator’s capabilities and highlights potential gaps in its application, specifically in tasks that a human user would find straightforward.
For instance, in a test that asked Operator to sign up with a cloud provider and launch a virtual machine, the tool succeeded only 60% of the time. In a more complex task, creating a Bitcoin wallet, its success rate fell to just 10%. Such figures invite scrutiny of the tool’s reliability and overall effectiveness.
As news of Operator’s anticipated launch circulates, it enters a competitive landscape in which rival tech giants such as Anthropic and Google are also vying for dominance in the emerging segment of AI agents. While the concept of AI agents is still in its infancy, industry experts assert that they could represent the next significant advancement in artificial intelligence. According to research firm MarketsandMarkets, the sector could reach a market valuation of $47.1 billion by 2030, illustrating the commercial potential of such innovations.
Despite the optimistic commercial forecasts, the capabilities of AI agents, including Operator, are currently regarded as somewhat primitive. Safety concerns have also arisen about the operational risks these technologies pose, particularly if they evolve rapidly without adequate oversight or regulation. One of the leaked charts, however, suggests that Operator performs strongly in safety evaluations designed to detect illicit activities and safeguard sensitive personal data. This focus on safety is cited as a primary reason for the tool’s drawn-out development timeline.
OpenAI co-founder Wojciech Zaremba, in a recent statement on the social platform X, articulated his apprehensions regarding the reckless release of AI agents that lack stringent safety measures, referring specifically to Anthropic’s recent product. Zaremba expressed that any similar release by OpenAI would likely provoke significant backlash from the community.
However, OpenAI has faced criticism from AI researchers and former employees who claim the organization may be prioritizing speed of development over adequate safety protocols. As the launch of Operator approaches, many will be watching closely to see how OpenAI navigates both the promise of its new tool and the broader challenges of the fast-evolving AI landscape. While Operator represents a compelling step forward in AI technology, it also carries significant implications, particularly around reliability and safety, that must be carefully considered before deployment.
Until more definitive information emerges from OpenAI regarding the Operator tool and its capabilities, the technology community remains on alert, eager to understand the real-world applications of this innovative advancement in artificial intelligence.
