The AI community has recently raised eyebrows over Epoch AI, a nonprofit organization that develops mathematical benchmarks for artificial intelligence, after it was revealed that the organization had received financial support from OpenAI. The disclosure, made on December 20, 2024, prompted allegations of a lack of transparency from contributors and observers in the industry, many of whom felt the funding arrangement should have been made known earlier.
Epoch AI is primarily supported by Open Philanthropy, a research and grantmaking foundation. The organization developed FrontierMath, a benchmark of expert-level problems designed to assess the mathematical reasoning of AI systems. The benchmark took on particular prominence as OpenAI prepared to showcase its forthcoming flagship model, o3. According to information released in conjunction with the o3 announcement, OpenAI had considerable access to the problems and solutions that make up the FrontierMath dataset. That relationship, however, was not disclosed until the model was unveiled, raising concerns about potential bias in the benchmarking process.
A contributor to FrontierMath, writing on the forum LessWrong under the name “Meemi,” lamented the lack of communication about OpenAI’s financial contribution. Meemi said many of the people who helped develop the benchmark were unaware of OpenAI’s backing until the official announcement. “The communication about this has been non-transparent,” they wrote. “Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information regarding their work and its potential implications.” The sentiment echoed across forums and social media, with many fearing that the undisclosed funding could undermine FrontierMath’s credibility as a neutral test.
Carina Hong, a PhD student in mathematics at Stanford, amplified these concerns in a post on X, saying that several mathematicians who played significant roles in creating FrontierMath were unaware that OpenAI would have exclusive access to the benchmark. The revelation caused discontent among contributors, several of whom suggested they might not have participated had they known of OpenAI’s influence and access.
In response to the growing discontent, Tamay Besiroglu, Epoch AI’s associate director and a co-founder of the organization, acknowledged that it had mishandled communication about its relationship with OpenAI. “We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight, we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” he stated. Besiroglu said that while OpenAI had access to the FrontierMath benchmark, there was a verbal agreement that it would not use the problem set to train its AI, which would amount to teaching to the test. He added that Epoch AI maintains a separate holdout dataset intended to allow independent verification of FrontierMath results and guard against bias or manipulation.
The matter remains complicated, however. Elliot Glazer, the lead mathematician at Epoch AI, noted in a Reddit post that the organization had not yet been able to independently verify OpenAI’s results. Though Glazer expressed confidence that the results were legitimate, he added, “We can’t vouch for them until our independent evaluation is complete.” The episode illustrates the broader dilemma facing organizations that aim to build empirical benchmarks for AI evaluation while relying on funding from the companies being evaluated, without creating the perception of a conflict of interest.
As AI technology continues to evolve and reach into more sectors, transparency, integrity, and objectivity in evaluating these systems matter more than ever. Epoch AI’s experience is a reminder that benchmarking practices need rigorous standards to retain the trust of contributors and the broader community, and that funding and access relationships must be properly disclosed to safeguard the credibility of evaluation efforts.