In the three years since ChatGPT's explosive debut, OpenAI's technology has upended a remarkable range of everyday activities at home, at work, and in schools, anywhere people have a browser open or a phone out, which is everywhere.
Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models could help scientists and to tweaking its tools to support them.
The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI's GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.
And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind's CEO and cofounder Demis Hassabis in 2023 about that team, he told me: "That is the reason I started DeepMind … In fact, it's why I've worked my whole career in AI.")
So why now? How does a push into science fit with OpenAI's wider mission? And what exactly is the firm hoping to achieve?
I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.
On mission
Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after being head of product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: "I thought I was going to be a physics professor for the rest of my life," he says. "I still read math books on vacation."
Asked how OpenAI for Science fits with the firm's existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: "The mission of OpenAI is to try to build artificial general intelligence and, you know, make it useful for all of humanity."
The impact on science of future versions of this technology could be amazing, he says: new medicines, new materials, new devices. "Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we're going to see from AGI will actually be from its ability to accelerate science."
He adds, "With GPT-5, we saw that becoming possible."
As Weil tells it, LLMs are now good enough to be useful scientific collaborators, spitballing ideas, suggesting novel directions to explore, and finding fruitful parallels between a scientist's question and obscure research papers published decades ago or in foreign languages.
That wasn't the case a year or so ago. Since it introduced its first reasoning model, o1, in December 2024, OpenAI has been pushing the envelope of what the technology can do. "You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT," says Weil.
But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance in the International Mathematical Olympiad, one of the hardest math contests in the world. "These models are no longer just better than 90% of grad students," says Weil. "They're really at the frontier of human abilities."
That's a big claim, and it comes with caveats. Still, there's little doubt that GPT-5 is a big improvement on GPT-4 when it comes to difficult problem-solving. GPT-5 includes a so-called reasoning model, a type of LLM that can break a problem down into multiple steps and work through them one by one. This approach has made LLMs far better at solving math and logic problems than they used to be.
Measured against an industry benchmark known as GPQA, which includes more than 400 multiple-choice questions testing PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%.
Overhyped
The excitement is obvious, and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that what GPT-5 appeared to have done was in fact dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn't the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.
Now Weil is more careful. It's often enough to find answers that exist but have been forgotten, he says: "We collectively stand on the shoulders of giants, and if LLMs can sort of accumulate that knowledge so that we don't spend time struggling on a problem that's already solved, that's an acceleration all of its own."
He plays down the idea that LLMs are about to come up with a game-changing new discovery. "I don't think models are there yet," he says. "Maybe they'll get there. I'm optimistic that they will."
But, he insists, that's not the mission: "Our mission is to accelerate science. And I don't think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field."
For Weil, the question is this: "Does science actually happen faster because scientists plus models can do much more, and do it more quickly, than scientists alone? I think we're already seeing that."
In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, illustrating how they had used GPT-5 and how it had helped. "Most of the cases were scientists that were already using GPT-5 directly in their research and had come to us one way or another saying, 'Look at what I'm able to do with these tools,'" says Weil.
The key things GPT-5 seems to be good at are finding references and connections to existing work that scientists weren't aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.
"GPT-5.2 has read essentially every paper written in the last 30 years," says Weil. "And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields."
"That's incredibly powerful," he continues. "You can always find a human collaborator in an adjacent field, but it's difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And in addition to that, I can work with the model late at night (it doesn't sleep), and I can ask it 10 things in parallel, which is kind of awkward to do to a human."
Solving problems
Many of the scientists OpenAI reached out to back up Weil's position.
Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, had only played around with ChatGPT for fun ("I used it to rewrite the theme song for Gilligan's Island in the style of Beowulf, which it did very well," he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he'd been working on.
Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI's $200-a-month premium subscription. "It managed to solve a problem that I and my graduate student couldn't solve despite working on it for several months," says Scherrer.
It's not perfect, he says: "GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber." And yet it keeps getting better, he says: "If current trends continue (and that's a big if), I think that all scientists will be using LLMs soon."
Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously looked at. The model came up with fresh insights and interpretations.
"LLMs are already essential for scientists," he says. "When you can complete analysis of data sets that used to take months, not using them isn't an option anymore."
Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.
Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he didn't know about. "I believe that LLMs are becoming an essential technical tool for scientists, much like computers and the internet did before," he says. "I expect a long-term disadvantage for those who don't use them."
But he doesn't expect LLMs to make novel discoveries anytime soon. "I've seen very few genuinely fresh ideas or arguments that would be worth a publication on their own," he says. "So far, they seem to mostly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches."
I also contacted a handful of scientists who are not affiliated with OpenAI.
Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. "We have not found, yet, that LLMs are fundamentally changing the way that science is done," he says. "But our recent results suggest that they do have a place."
Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says that his team doesn't use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system in which an LLM can help direct robots, for example.
"My guess is that LLMs might stick more in robotic workflows, at least initially, because I'm not sure that people are ready to be told what to do by an LLM," says Cooper. "I'm certainly not."
Making mistakes
LLMs may be becoming more and more useful, but caution is still advised. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that made its way into a scientific journal. "OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea, possibly the first peer-reviewed paper where an LLM generated the core contribution," Oppenheim posted on X. "One small problem: GPT-5's idea tests the wrong thing."
He continued: "GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Similar-sounding, but different. It's like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox."
It's clear that lots of scientists are finding innovative and intuitive ways to engage with LLMs. It's also clear that the technology makes mistakes that can be so subtle even experts miss them.
Part of the problem is the way ChatGPT can flatter you into letting down your guard. As Oppenheim put it: "A core issue is that LLMs are being trained to validate the user, whereas science needs tools that challenge us." In an extreme case, one person (who was not a scientist) was persuaded by ChatGPT into thinking for months that he'd invented a new branch of mathematics.
Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.
"One of my teammates here, an ex math professor, said something that stuck with me," says Weil. "He said: 'When I'm doing research, if I'm bouncing ideas off a colleague, I'm wrong 90% of the time, and that's kind of the point. We're both spitballing ideas and trying to find something that works.'"
"That's actually an interesting place to be," says Weil. "You say enough wrong things, and then somebody stumbles on a grain of truth, and then the other person seizes on it and says, 'Oh, yeah, that's not quite right, but what if we…' You gradually sort of find your path through the woods."
This is Weil's core vision for OpenAI for Science. GPT-5 is good, but it's not an oracle. The value of this technology is in pointing people in new directions, not in coming up with definitive answers, he says.
In fact, one of the things OpenAI is now looking at is making GPT-5 dial down its confidence when it delivers a response. Instead of saying "Here's the answer," it might tell scientists: "Here's something to consider."
"That's actually something that we're spending a bunch of time on," says Weil. "Trying to make sure that the model has some kind of epistemological humility."
Another thing OpenAI is looking at is how to use GPT-5 to fact-check GPT-5. It's often the case that if you feed one of GPT-5's answers back into the model, it will pick it apart and highlight errors.
"You can sort of hook the model up as its own critic," says Weil. "Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it could improve, then it passes it back to the original model and says, 'Hey, wait a minute: this part wasn't right, but this part was interesting. Keep it.' It's almost like a couple of agents working together, and you only see the output once it passes the critic."
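The generator-critic workflow Weil describes can be sketched in a few lines of code. This is a minimal illustration of the loop's structure only: `generate` and `critique` here are hypothetical stand-ins for model calls, not real OpenAI API methods.

```python
from typing import Optional

def generate(prompt: str, feedback: str = "") -> str:
    # Hypothetical stand-in for the "thinking" model producing a draft answer.
    base = f"draft answer to: {prompt}"
    return f"{base} (revised per: {feedback})" if feedback else base

def critique(draft: str) -> Optional[str]:
    # Hypothetical stand-in for a second model reviewing the draft.
    # Returns feedback when it spots a problem, or None to approve.
    return None if "revised" in draft else "this part wasn't right; keep the rest"

def answer_with_critic(prompt: str, max_rounds: int = 3) -> str:
    # Pass the draft back and forth until the critic approves (or rounds
    # run out); the user only sees output once it clears the critic.
    draft = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            return draft
        draft = generate(prompt, feedback)
    return draft

print(answer_with_critic("propose a test for nonlinear theories"))
```

In a real system, each stand-in would be a call to a separate model instance, and the critic's feedback would be structured rather than a single string, but the control flow is the same.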
What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the LLM Gemini inside a wider system that filtered the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.
OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that's the case, why should scientists use GPT-5 instead of Gemini or Anthropic's Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come.
"I think 2026 will be for science what 2025 was for software engineering," says Weil. "At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas 12 months later, if you're not using AI to write most of your code, you're probably falling behind. We're now seeing those same early flashes for science as we did for code."
He continues: "I think that in a year, if you're a scientist and you're not heavily using AI, you'll be missing an opportunity to increase the quality and pace of your thinking."

