Can AI Do Scientific Research?
It sounds like a simple question, one that should be met with a resounding YES! By all accounts, this is exactly what we assume (and hope) AI is best at. But understanding how AI actually performs in scientific research matters, so this week we're taking a closer look.
Back in 2024, the Nobel Prize in Chemistry was shared by two researchers at Google DeepMind, Demis Hassabis and John Jumper, with David Baker of the University of Washington. Hassabis and Jumper were honored for building AlphaFold, an AI system that predicted the 3D shapes of nearly every known protein. That's around 200 million structures, a problem biologists had been chipping away at for fifty years. AlphaFold cracked it, and more than two million researchers across 190 countries now use it.
That’s a real Nobel, for real science, done with AI. Not a toy demo, but the kind of thing that has the potential to reshape drug discovery pipelines, biotech companies, and eventually what treatments show up in your local hospital.
The natural follow-up question is whether AI can actually do scientific research more broadly, not just one heroic task.
The honest answer is yes, no, and it depends on what part of science you mean.
Researchers tend to sort AI’s role in the lab into three tiers, and walking through them is the cleanest way to see where things stand in 2026.
Tier 1, AI as a tool
This is the mature, working tier: narrow AI systems that speed up specific tasks inside an existing research workflow. AlphaFold lives here. So does PaperQA2 from a nonprofit called FutureHouse, which can search and summarize scientific papers more accurately than PhD-level researchers in head-to-head tests.
Math and computer science benefit the most. In late 2023, DeepMind's FunSearch found new solutions to open math problems. In 2025, a follow-up called AlphaEvolve discovered matrix multiplication algorithms that improved on Volker Strassen's landmark result from 1969. It also wrote a better scheduling algorithm for Google's data centers that continuously recovers about 0.7% of Google's worldwide compute. That sounds small until you do the math. Outside analysts estimate the value at somewhere between $50 million and a few hundred million dollars per year, roughly enough to cover the training cost of one flagship Gemini model. Around the same time, Gemini Deep Think earned a gold-medal score at the 2025 International Math Olympiad, solving five of the six problems perfectly.
The reason AI does so well in math and code is that the feedback is clean. A proof either works or it doesn’t. Code either runs or it crashes. The AI can try millions of options and the computer instantly tells it which ones are right. If you work with formulas, code, or data, that’s the part to pay attention to: anywhere reality gives fast, unambiguous feedback, AI tools are getting very good, very quickly. Wet‑lab biology doesn’t work that way.
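To make the "clean feedback" point concrete, here is a toy sketch in Python of the propose-and-verify loop. Everything in it is hypothetical and chosen for illustration: the hidden rule, the `EXAMPLES` list, and the `verify` and `search` functions. Real systems such as FunSearch and AlphaEvolve use a language model to propose candidate programs rather than blind enumeration, so treat this as an illustration of the check-everything idea, not as how those systems are built.

```python
from itertools import product

# Hypothetical data: input/output pairs produced by an unknown quadratic rule.
EXAMPLES = [(0, 3), (1, 6), (2, 13), (3, 24), (-1, 4)]


def verify(a: int, b: int, c: int) -> bool:
    """Exact, instant check: does a*x**2 + b*x + c reproduce every example?"""
    return all(a * x * x + b * x + c == y for x, y in EXAMPLES)


def search(limit: int = 5):
    """Toy 'proposer': enumerate integer coefficients in [-limit, limit]
    and return the first candidate the verifier accepts, plus how many
    candidates were tried before it was found."""
    tried = 0
    for a, b, c in product(range(-limit, limit + 1), repeat=3):
        tried += 1
        if verify(a, b, c):
            return (a, b, c), tried
    return None, tried


if __name__ == "__main__":
    solution, tried = search()
    print(f"found {solution} after checking {tried} candidates")
    # prints: found (2, 1, 3) after checking 922 candidates
```

The point is the asymmetry: producing candidates is cheap and dumb, but because `verify` gives an unambiguous yes or no, wrong guesses cost almost nothing. A wet-lab experiment has no equivalent of that one-line check.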
Tier 2, AI as a collaborator
This is where things get interesting. Here, AI works alongside scientists, generating hypotheses, suggesting experiments, and helping interpret results, while humans still call the shots.
A few recent examples stand out. In October 2025, a Google AI model called C2S-Scale worked with researchers at Yale and predicted that combining a drug called silmitasertib with interferon would help immune cells better spot cancer. The hypothesis was new. The Yale team tested it in human cells and it worked, boosting cancer cell visibility to the immune system by about 50%.
Then in May 2026, DeepMind published its AI Co-Scientist in Nature. Built on Gemini, it generates and critiques research ideas using a tournament-style debate among multiple AI agents. It has already proposed drug candidates for acute myeloid leukemia that researchers validated in the lab, and identified new targets for liver fibrosis where an existing FDA-approved drug knocked down disease markers by 91% in human organ tissue models.
Over at FutureHouse, a system called Robin recently suggested repurposing a glaucoma drug for dry macular degeneration. Researchers tested it in patient cells. It worked.
These aren’t small wins. These are AI systems proposing things humans hadn’t tried, and those things turning out to be right. The collaborator tier is no longer hypothetical.
The catch is that hypothesis generation is only part of science. AI can suggest a thousand interesting ideas, but separating real insight from clever recombination of existing knowledge is still hard. A 2025 Stanford study found that AI-generated hypotheses tend to oversell their own importance and underestimate how hard the experiments will actually be. In practice, that looks like grand claims on thin evidence and very optimistic timelines, a pattern familiar to anyone who's read enough startup decks. Humans still have to do the judgment work.
Tier 2 really works, but only in labs that treat AI as a brainstorming partner with a good memory and fast hands, not as an oracle.
Tier 3, AI as an autonomous scientist
This is the ambitious tier: AI systems that run the whole research cycle from start to finish, with little or no human input. And this is where the gap between hype and reality gets widest.
Sakana AI built a system called the AI Scientist that writes its own research papers. One of them passed peer review at a machine learning workshop in 2025. That sounds impressive until you look closer. An independent evaluation found that 42% of its experiments failed due to coding errors, it cited a median of only five papers per write-up (most of them outdated), and several papers contained made-up numerical results. One reviewer called the work “a rushed undergraduate paper.”
The starker cautionary tale is Berkeley's A-Lab, an autonomous chemistry system that made headlines in 2023 for synthesizing 41 supposedly new materials in 17 days. Chemists from Princeton and University College London later looked at the data and found that the materials weren't new at all. They already existed in standard chemistry databases. In January 2026, Nature formally corrected the paper.
Meanwhile, a 2026 audit identified roughly 147,000 fake AI-generated citations across major science archives in 2025 alone. The pace of AI-generated "science" is starting to outrun the systems we use to check it.
The pattern in all three stories is similar. AI can crank out scientific-looking artifacts at scale, but our usual filters (peer review, domain expertise, citation checks) weren't designed for that volume or that style of error. The risks aren't just theoretical; they're showing up as retractions, bogus references, and wasted lab time.
The big disagreement
Even the experts don't agree on where this is headed. Demis Hassabis, who runs DeepMind, has said AI could "cure all diseases" within a decade, a statement that works more as an ambition than a forecast. Yoshua Bengio, one of the founding figures of modern AI, is building a system he calls Scientist AI that helps researchers without taking actions on its own, because he worries about safety.
On the more skeptical side, Yann LeCun left Meta in late 2025 and raised over a billion dollars to build a different kind of AI based on world models, not language models. His argument is that today’s AI doesn’t really understand physical reality, so it can’t do science that depends on physical reality. Cognitive scientist Gary Marcus agrees, pointing out that AI systems still hallucinate at high rates on basic verifiable questions.
The split isn’t about whether AI is useful for science. Everyone agrees it is. The split is about whether today’s systems are on the path to genuine scientific judgment, or whether they’re powerful tools that still need new architectures and stronger checks. Put less technically: should we treat these systems like very fast graduate students who sometimes make things up, or like industrial equipment that needs guardrails, audits, and clear operating procedures?
Where this leaves us
If you zoom out, the picture is pretty clear. AI is already doing real science in places where the answer can be checked quickly. It’s an emerging research partner in areas where humans can still validate its ideas. And it’s mostly an aspiration in domains where the messy physical world has the final say.
Over the next five years, the most likely outcome is AI as a standard collaborator. Not a replacement, not a fraud, but a serious member of the research team. The math, code, structural biology, and computational chemistry folks already work this way. The wet‑lab folks are catching up as robots, assays, and compute get cheaper and more tightly integrated.
For the rest of us, the practical filter is straightforward. When you read that AI "discovered" something new, ask which tier the work belongs to and whether anyone independent has validated the result in the real world. The wins are real: you can see them in proteins, proofs, and pipelines. The corrections are real too, and they're a useful reminder that good science still needs skepticism, replication, and time.