
Open Source AI in 2025

Kai Gray

You're reading this on a device that probably runs on open source software, even if you've never thought about it that way.

Mac? That's BSD Unix from the 1970s under the hood. Android phone? Linux. Your smart TV, router, car's entertainment system? Linux again. Most of the world's infrastructure—web servers, databases, development tools—runs on software that anyone can inspect, modify, and share.

We take this for granted now, but it wasn't always obvious that giving away valuable software would work. Microsoft's Steve Ballmer called Linux "a cancer" in 2001. He was spectacularly wrong. Open source became the foundation of modern computing.

We're watching the same debate play out with AI now. Should the most powerful AI models be controlled by a handful of tech giants, or should they be open for everyone to use, study, and improve? On standard benchmarks, the performance gap between proprietary models and their open source counterparts shrank from roughly 8% to under 2% over the past year. But unlike early open source software, where "open" had a clear meaning, AI has sparked a fight over what "open" even means.

The Open Source Initiative spent two years defining "open source AI." They landed on three requirements: the model's architecture and code, the trained parameters, and detailed information about the training data.

That third requirement caused problems. Some argue you need the actual training data, not just a description. Others point out that's legally impossible for healthcare data or copyrighted books that companies are already being sued for using.

Most AI models everyone calls "open source" don't actually meet this definition. Meta's Llama models have been downloaded over 650 million times and are widely described as open source, but they fail the test. What we're really talking about are "open weights" models: you get the final trained model, which you can run and fine-tune, and companies report saving 40-60% on costs compared with proprietary APIs. But you don't get the training data or the full recipe, so you can't reproduce the model from scratch or audit it for bias. You're getting the finished cake without the recipe.

The models making the biggest impact tell an interesting story. Meta's Llama 3.1 with 405 billion parameters matches GPT-4 on several benchmarks. A model you can download and run yourself performing at OpenAI's level changes the economics completely. Meta isn't doing this from charity. They're trying to commoditize the model layer and compete on infrastructure where they're strong.

DeepSeek might be the most important story of 2025. This Chinese company claims they built a reasoning model matching OpenAI's performance for just $5.6 million in training costs instead of $100 million. These frontier models—the most advanced AI systems at the cutting edge of what's currently possible—typically cost hundreds of millions to train. The figure is disputed, but if even partially accurate, it proves you don't need infinite money to build cutting-edge AI. The model was downloaded over a million times within weeks.

About 89% of organizations using AI incorporate open source models somewhere, and 63% run them in production serving real customers. The reasons are straightforward: massive cost savings, keeping sensitive data on their own infrastructure, and the ability to customize models for specific domains. Most companies use both open and closed models strategically. Closed models for customer-facing chatbots where polish matters. Open models for internal tools and high-volume processing.
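The hybrid strategy described above can be sketched as a simple routing rule. Everything in this sketch is illustrative: the model names, the per-token costs, and the decision criteria are assumptions standing in for whatever a real deployment would use, not actual products or prices.

```python
from dataclasses import dataclass

# Illustrative catalog: model names and per-1k-token costs are
# assumptions for the sketch, not real products or real pricing.
@dataclass(frozen=True)
class Model:
    name: str
    open_weights: bool          # runs on your own infrastructure
    cost_per_1k_tokens: float   # hypothetical USD

CLOSED = Model("proprietary-api", open_weights=False, cost_per_1k_tokens=0.03)
OPEN = Model("open-weights-llm", open_weights=True, cost_per_1k_tokens=0.005)

def route(customer_facing: bool, sensitive_data: bool) -> Model:
    """Route a request along the lines the article describes:
    closed models where customer-facing polish matters, open models
    for internal tools, high-volume work, and any data that must
    stay on local infrastructure."""
    if sensitive_data:
        return OPEN       # data never leaves your own servers
    if customer_facing:
        return CLOSED     # polish matters most here
    return OPEN           # internal / high-volume: optimize for cost
```

Under this rule, a public support chatbot (`route(customer_facing=True, sensitive_data=False)`) lands on the closed API, while an internal batch job over sensitive records (`route(customer_facing=False, sensitive_data=True)`) stays on the open-weights model.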

The safety debate gets uncomfortable. Open models are more vulnerable to jailbreaking and adversarial attacks. You can't patch an open model once it's released. Anyone can strip out safety features. But closed models aren't exactly safe either. GPT-4 gets jailbroken successfully 87.2% of the time in certain tests. Open models offer transparency. Security researchers can study them, find vulnerabilities, and develop defenses. More access means more potential for misuse but also more transparency and collective oversight.

The money situation is tricky. Training frontier AI models costs a fortune, and that fortune keeps growing. Since 2020, closed-source AI companies have raised $37.5 billion, while open-source alternatives got $14.9 billion. Among open developers, Meta is nearly alone in having the resources to sustainably build frontier models as a strategic investment. Smaller developers face tough questions about funding development while giving their models away.

The legal landscape adds complexity. The EU AI Act creates exemptions for open source AI, but most practical applications fall into categories that aren't actually exempt. In the US, there's no comprehensive federal framework yet. China has emerged as a major player, adding international complications.

You might wonder why this matters if you're not training AI models yourself.

It matters because AI is becoming infrastructure like electricity or the internet. If AI remains controlled by a handful of companies, those companies decide what applications are allowed, what content gets filtered, what data gets collected, what prices get charged.

Open source AI distributes that power. Researchers can study these systems independently. Small companies can build competitive products. Developing countries can access cutting-edge technology without dependence on Silicon Valley. Communities can create AI serving their specific needs and languages. But it also means less centralized safety controls and more potential for misuse.

Open source AI has moved from experimental curiosity to production infrastructure. Enterprises depend on it. Researchers rely on it. Communities worldwide build on it. The tension between openness and control, innovation and safety, access and concentration won't resolve cleanly. We're learning to live with that tension, building systems that balance competing values rather than choosing one side absolutely.

The open source AI movement in 2025 isn't about one paradigm defeating another. It's about expanding possibilities so no single approach monopolizes our most transformative technology. The choices we make in the next few years will determine whether that expansion serves broad human interests or narrow ones.
