
The unbearable cheapness of open weight models

jamesoclaire.com

Comments

It would not be surprising if GPT and Claude get cheaper too as inference gets cheaper. Two years ago, o1 was the strongest model and cost much more than Fable, while being nowhere near as smart as a Qwen 3.6 35B that you can now run on a DGX Spark without much trouble.
True, outside of the dark tactics I imagined in the article, they will have to compete at lower costs. It's just that the current iteration does not feel cost competitive yet.
Probably they will, unless Claude and GPT become luxury brands like Gucci. Currently it makes no sense for them to invest in efficiency. They need to put everything into competing for the top spot as long as they still have a shot.
> It would not be surprising if GPT and Claude get cheaper too as inference gets cheaper

No, because the biggest factor in their current price is VC subsidization, which has likely peaked if OpenAI is now serving ads and Anthropic has increased their API pricing.

This is what concerns me about how AI giants are planning to make money. Their product has already been commoditized at prices which for them are still subsidized to grab market share. Unless the giants invent a technological leap, their prices are going to be dragged down by open weight models and I don't see how they'll turn a profit.
Reach AGI to leapfrog whoever is behind. Burn everything to get there faster.
'Reach AGI', the same way SpaceX will put data centers in orbit. A pipe dream.
I'm currently writing a blog post about data centres in orbit, and my current conclusion is that even though they can build one, they definitely can't put 1 million up there and would have better things to do if they could.

AGI? Too loosely defined. They lack a lot of competences which humans recognise when we see them but find it hard to put into words; on the other hand what they can do they already do faster than any human (and have greater breadth than any single human, but this usually doesn't matter because "coder" and "economist" and "translator" gets solved in human teams by hiring three people).

I do not think current ML has the tools to solve for quality. But we know it's possible for a really mediocre intelligence to make human level intelligence, because evolution made us, so for me the question of AGI is more a practical one: is it affordable?

(I also think not at the present time, but that's an "I think" not "I am analyzing it carefully").

Maybe you missed the part where starlink / orbiting datacenters don't really have to even make money as long as they partially fund rocket launch tests.

Or maybe you don't take Elon seriously when he talks about Mars.

> Maybe you missed the part where starlink / orbiting datacenters don't really have to even make money as long as they partially fund rocket launch tests.

I am only dismissing the orbital data centres, I do see a future for Starlink. One with competition, but a future nonetheless.

I'm old enough to remember the dot.com bubble and "we lose money on each unit and make up for it in scale":

If they don't make sense, they don't help. Putting a single one in space, or even a handful, is physically possible! But even optimistic Alphabet researchers (and Alphabet owns more of SpaceX than the entire IPO) say this only makes sense at $200/kg, while early Starship launch costs, while they sort out reusability, will be at best $400/kg, and the researchers don't expect $200/kg until the mid-2030s even with a high launch rate:

  If the learning rate is sustained—which would require∼180 Starship launches/year—launch prices could fall to <$200/kg by∼2035
- section 2.4, https://arxiv.org/abs/2511.19468

Using the payload estimates elsewhere in the paper (the learning rate is based on mass rather than launch count), they'd need to launch 370,000 tons (section 4.4, ibid.); even at the "good enough" cost of $200/kg, that means spending $200/kg * 3.7e8 kg = $7.4e10. That's a hell of an R&D spend for the next 10 years of a company whose lifetime revenue (not profit) is reportedly $4.6e10.
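
As a quick check of that arithmetic, here is a minimal Python sketch. The three figures are the ones quoted in this comment; the variable names are mine, and I'm assuming "tons" means metric tonnes.

```python
# Figures as quoted in the comment; "tons" is assumed to mean metric tonnes.
PRICE_PER_KG_USD = 200          # the "good enough" launch price
MASS_TONNES = 370_000           # cumulative mass to orbit cited from the paper
LIFETIME_REVENUE_USD = 4.6e10   # reported lifetime revenue (not profit)

mass_kg = MASS_TONNES * 1_000
launch_spend_usd = PRICE_PER_KG_USD * mass_kg
ratio = launch_spend_usd / LIFETIME_REVENUE_USD

print(f"mass to orbit: {mass_kg:.2e} kg")
print(f"launch spend:  ${launch_spend_usd:.2e}")
print(f"spend / lifetime revenue: {ratio:.2f}x")
```

So under those numbers the launch bill alone is larger than the reported lifetime revenue.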

My current draft has a few thousand words of additional problems, plus a bunch of things which I mention only to say why they are not problems, and some more where I say the research has yet to be done.

> Or maybe you don't take Elon seriously when he talks about Mars.

Used to, not any more. Has been too slow with Starship even before the fact that iteration with hardware is necessarily slowed down by a 2-year gap between launch windows.

There's not even been any news about demonstration models of either Mars-rated or Starship-rated Sabatier processors, which would be an easy win and also win points for both environmentalism and energy independence viz. Iran/Hormuz.

Ok so you're ignoring the entire thing. Sigh.
On the contrary: I've paid a lot of attention, causing me to look at it closely and determine it is a terrible idea worthy of an illustrated 5,000 word blog post explaining exactly how terrible.

If you build the DC satellites as currently specified, you're strictly better off not launching them. That's how bad the idea is.

> will put data centers in orbit. A pipe dream.

Cheap access to space was once a pipe dream.

Reusable boosters were once a pipe dream.

A new player beating Boeing to the ISS was once a pipe dream.

LEO constellations were once a pipe dream.

Launching thousands of satellites was once a pipe dream.

You should know that a) they are already running "AI" chips on their current sats, and b) they are already producing kW of power on orbit and have ~10k sats on orbit. You can watch Scott Manley's video on it, where he does some rough calculations and explains the overall architecture. There is nothing stopping them from doing this, from an engineering perspective. Whether it makes commercial sense is another question, but 5-10-20 years in the future things might change there as well.

I don't think people's argument is that it's impossible to put data centers into space. The argument is that the downsides (radiation, cooling, maintenance, power) are so severe that it is pointless to do it at scale.
Go back to the megathreads when this came up. Even here on HN. Plenty of people used the argument that it can't be done, for various reasons.

And my point was that at one point or another there were many "downsides" for all the tech that SpaceX already has. Reusable boosters were seen as "uneconomical" and "pointless unless they can fly 10 times" by industry experts. They're now flying 30+ times a booster.

LEO constellations were similarly "full of downsides" plus "all the companies that tried it went bankrupt in the 90s", so "it's pointless". And so on.

Reusable boosters have clear upsides, though.

Pretty much everything about data centers in space is worse than having them on Earth. Apart from niche use cases, the only reason you'd talk about data centers in space is if you had a company with rocket ships and needed a story to tie your rocket ships to the current AI craze.

And you had a lot of stock to sell to bagholders.
Yet SpaceX is losing money … only Starlink is profitable.
These guys aren't aware of all the "impossible" problems Elon already solved. They're too invested in the propaganda about him being a big dumb idiot who accidentally fell backwards into a pile of 1 trillion dollars.
If just Elon was talking about data centers in space, you could take it with a grain of salt. But there are other serious players talking about it, like Google and Blue Origin, so it should be pretty clear it can't just be dismissed with "you didn't think about cooling!"
Yeah, and there's already been tech demonstrators for this. Starcloud-1 launched in '25 (on a F9) and demoed a CotS H100 in a ~60kg bus w/ 1kW of power. They ran inference on a "gemini" model (probably something small) and trained a GPT2 version LLM as a tech demonstrator.
Google also wanted to deliver internet from balloons and put everyone's real name on their YouTube comments. Not all their ideas are winners.
Microsoft tried to put datacenters into the ocean [1] and then shelved the idea, because even though you have fewer failures, you still have failures and somebody has to go there and fix them. Which turns out to be a problem.

And in the ocean you don't have to solve for radiation or cooling.

[1] https://www.tomshardware.com/desktops/servers/microsoft-shel...

> You can watch Scott Manley's video on it, where he does some rough calculations and explains the overall architecture.

I'm currently writing a blog post, and there's one big thing everyone, including Scott Manley, missed.

Once I realised it, I wondered what took me so long to spot this issue.

care to share the one glaring obstacle ?

slightly related .. I saw a talk on DCs in space, and it said median Earth orbit had a latency of 500ms .. but back of envelope seems to be : 15,000km above Earth would have around 100ms latency, comparable to internet ping times.

Not an expert, feel free to weigh in.

> care to share the one glaring obstacle ?

I'm still working on the blog, but as a quickie: it's the lesson of the Datasaurus dozen, that sometimes you need to look at the actual distribution rather than statistics.

Here's what the safety exclusion zone around a million of them in orbit looks like, if arranged something like the current plan: https://raw.githubusercontent.com/BenWheatley/blog/refs/head...

There's no (safe) gaps. Plenty of physical space, but the safety margin eats it all up. Nothing else is allowed to use those orbital shells or anything between them.

Also, this is what happens if you put them all in a single orbit at the same altitude:

https://raw.githubusercontent.com/BenWheatley/blog/refs/head...

> slightly related .. I saw a talk on DCs in space, and it said median Earth orbit had a latency of 500ms .. but back of envelope seems to be : 15,000km above Earth would have around 100ms latency, comparable to internet ping times.

500ms means ~150,000 km travel distance; for that distance as round-trip time from origin to destination and back again means the one-way distance is 75,000 km, so if it's via a single satellite bounce then the average distance to the satellite would be 37,500 km: [You]-37.5Mm-[Satellite]-37.5Mm-[Them]-37.5Mm-[Satellite]-37.5Mm-[You].

I think they must be assuming all comms are via geostationary satellites. In some talks, this is what the speaker actually meant, though they may not have been clear about it; other times, there are talks from people who copied the former but perhaps didn't understand.

For DCs in space, even in GEO, it would be half the distance because you're communicating with the satellite itself not with someone else somewhere else on the ground.
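
A rough Python sketch of that back-of-envelope reasoning, for anyone who wants to plug in other altitudes. It assumes the satellite is directly overhead, counts only light-travel time (no routing or processing delay), and uses the standard GEO altitude of about 35,786 km; the function name and the `legs` parameter are just for illustration.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786   # geostationary altitude above the equator

def round_trip_ms(altitude_km: float, legs: int = 2) -> float:
    """Light-travel time in milliseconds over `legs` ground<->satellite hops.

    legs=2: you -> satellite -> you (talking to the satellite itself).
    legs=4: you -> satellite -> them -> satellite -> you (single-bounce relay).
    Assumes the satellite is directly overhead and ignores processing delays.
    """
    return legs * altitude_km / C_KM_PER_S * 1_000

for name, altitude in [("15,000 km", 15_000), ("GEO", GEO_ALTITUDE_KM)]:
    direct = round_trip_ms(altitude)
    relayed = round_trip_ms(altitude, legs=4)
    print(f"{name}: direct {direct:.0f} ms, relayed {relayed:.0f} ms")
```

Under those assumptions 15,000 km comes out at about 100 ms to the satellite and back, and a single-bounce relay via GEO lands a little under 480 ms, which fits the "they must mean GEO" reading of the 500 ms figure.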

My gut says another obstacle is maintenance. How long can a datacenter on the ground run without maintenance? How will this be affordable in orbit?
People already talk about that, so I wouldn't be adding much new. That said, I had already put in a bit about the cost of launching.

TL;DR: Alphabet researchers (and Alphabet owns more of SpaceX than the entire IPO, so if anything they're biased to optimism) reckon it will take SpaceX launching about 370,000 tons to orbit before they've even figured out how to get the costs down to the point it makes sense to put these in orbit.

Sure, you can do it. I bet humans could fly to Mars if we invested a massive amount of resources into it. Why, though? That’s the “easy” part: if you throw sufficient amounts of money at problems over sufficient periods of time, you can solve them.

If you don’t care about making any money from it, that is. How exactly would datacenters in space be more profitable than those on earth?

I think it's such a vague term. If you showed someone in 2010 what we have now they would say it's science fiction.
If Anthropic announced AGI tomorrow, how much better would that model be than Fable 5? It's looking like the road to AGI is gradual and moat-less. Models seem capable of improving other models, and even without illegal distillations many are nipping at the heels of Anthropic.
Yeah, I think we're learning that we overestimated the relevance of recursive self-improvement in a singularity/intelligence takeoff scenario. We thought that once an AI could start improving itself, it would cause an exponential, self-reinforcing intelligence explosion.

Turns out that scaling up compute is much more important and also limits the upper end of intelligence.

The bigger mistake is assuming it would be better at everything all at once.

Suppose it can do 80% of what the 20th percentile human can do. That's a huge advance and very useful, but it means there are still things it's not very good at. If any of those things is (or becomes) a bottleneck, you're not getting the hockey stick graph.

Why would the creator of AGI sell it to anyone, when they could keep it to themselves and corner dozens of markets?
What is an "illegal" distillation? Terms of service are not laws, and clearly copyright laws are no barriers to developing AI models.
If AGI = Data from Star Trek, it would be a huge leap. Frankly, anything less I wouldn’t consider as AGI.
If you used a time machine to go back to 2021 and showed someone the best open source LLMs from 2026 they would surely say “yeah that’s AGI”
With cache hit rates being effectively free, harnesses like Reasonix have let me do a month of work for less than 2 dollars. It's not even the subsidies making it cheap, American providers like Digital Ocean or Cloudflare host the same model with similar pricing.
I think this is very likely and something that everyone seems to be missing when valuing these AI firms. AI is not the new industrial revolution, it's the new cloud VM: a very useful commodity software offering.
The parallels to the Industrial Revolution are so close that we even have a new generation of Luddites. (Not saying they don’t have some valid points; so did the original group.)

The reason it’s like the Industrial Revolution is simply that there’s no question it’s going to completely transform jobs. It can make a very similar difference to the difference between a craftsman and a factory worker. The latter is massively more productive.

Cloudflare's Deepseek V4 Pro prices are 4x more than Deepseek's for input and output tokens, and 100x more for cached input tokens, which is crucial for the tool uses of agents which cause multi-turn conversations.
Cache hit is less than a cent with Deepseek Flash and 3 cents with Cloudflare, it's free vs almost free. Where are you finding the statistics on Deepseek Pro? I don't see Cloudflare as a provider on openrouter for Pro, only flash.
How does caching help here? How much repetition is there in queries?
It probably depends on what you're doing, but imagine you're something in the shape of a search engine. How many user queries are unique vs. the same thing someone else searched for an hour ago?
Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is included in every model request. So long as it's at the start of the input and doesn't change, it will be able to hit the KV cache (assuming the model provider actually has the prefix in cache).

This only works because prompt caching is done by matching prefixes, not the entire input.

In a typical agent loop your N-th LLM request naturally becomes the prefix for the (N+1)-th request. As the thread grows longer, the cache hit rate converges to 100%, and unit pricing for cached tokens is 10-100x cheaper.
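
To make the convergence concrete, here is a toy Python model of an agent loop. It is only a sketch under stated assumptions: an ideal prefix cache with no eviction, a fixed system prompt, a fixed number of new tokens appended per turn, and a cached-token discount taken from the 10-100x range mentioned above. The function names and token counts are made up for illustration and do not describe any particular provider.

```python
def cached_fraction(system_tokens: int, tokens_per_turn: int, turns: int) -> float:
    """Fraction of all input tokens served from a prefix cache over an agent loop.

    Request n sends the system prompt plus everything from turns 1..n.
    Everything except the newest turn was already sent in request n-1,
    so it is assumed to be a cache hit (ideal cache, no eviction).
    """
    total = cached = 0
    for n in range(1, turns + 1):
        request_tokens = system_tokens + n * tokens_per_turn
        # The first request has nothing cached; later ones reuse the previous request as prefix.
        hit = 0 if n == 1 else system_tokens + (n - 1) * tokens_per_turn
        total += request_tokens
        cached += hit
    return cached / total


def blended_cost(fraction_cached: float, cached_discount: float) -> float:
    """Input cost relative to an uncached run, with cached tokens `cached_discount` times cheaper."""
    return fraction_cached / cached_discount + (1 - fraction_cached)


for turns in (2, 10, 50):
    f = cached_fraction(system_tokens=2_000, tokens_per_turn=500, turns=turns)
    print(f"{turns:>3} turns: {f:.1%} cached, relative input cost {blended_cost(f, 10):.2f}")
```

The cached share climbs toward 100% as the loop gets longer, so the blended input price drifts toward the cached-token price.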
The giants knew this was coming, and soon 95% of AI tasks will be able to be done by open models (coding, research, cowork style work). So why pay a premium? Why use them at all? This leaves the labs with two options:

1) push the frontier in a way only massive scale can, and cash in on it (mythos level cyber security, recursive training, frontier science work). There’s big money for never before possible capabilities.

2) own the app layer with their edge in reputation and powered by their infrastructure. Be apple where everyone else is Linux. Do design, coding, research, SMBs, legal, finance, healthcare and more (they are doing all of this).

Will it be enough to justify a Google level valuation? We’ll see how fast they can push it.

Won’t all they need to do is say “best in class, latest models, fastest” and wine and dine a few execs and those enterprise deals will be signed?

In this case the people tasked with using the product won’t actually mind.

Yes, exactly that. Be Azure and Office 365 and Sharepoint and AWS where everyone else is Debian Stable on a USB thumbdrive.
Office 365? Ew, Google docs, please.
No one is getting fired for using SotA.
If the price difference is 2x? Sure.

If the price difference is 50x? No way.

So long as the benefit:cost ratio is still sufficiently high, I don't think anyone gets fired for not scrimping. Better to encourage positive EV behaviour by your employees than to scare them away by firing them for not being perfectly optimal.
The CEO won't get in trouble, but the employee who can't justify a bad result/prompt?
Tell that to Oracle
Accenture says "yeah totally CEOs will pay a lot for literal nothing"
Laughs in 2005-era VMWare and EMC...
Well, getting laid off during the bankruptcy spiral is a form of firing.

But that is months away, so not my problem?

> own the app layer with their edge in reputation and powered by their infrastructure. Be apple where everyone else is Linux. Do design, coding, research, SMBs, legal, finance, healthcare and more (they are doing all of this).

The problem with this is that there are incumbents in all those spaces building their own AI agents / platforms, and they're the ones choosing the models they use internally and sell to their own customers. The margins and the possibility to fine-tune using open weight models, as well as the guarantee they'll keep running at predictable costs (no US orders yanking access), make them a very appealing option.

And if you're a company that needs an AI powered legal software, would you buy it from OpenAI/Anthropic, or from someone who you've already bought legal software from before and has the domain knowledge?

3) Buy all the RAM, increasing the barrier to entry to push back the tide a bit, in time for a juicy IPO.
4) Make it illegal to use anything but regulated models.
a: If making it illegal fails, make it a Federal procurement requirement to use regulated models. Come up with an audit standard that only fits regulated models. Watch the preference trickle down.
License the training corpus and encourage copyright suits against outputs from models trained on unlicensed corpora.
This won't work if the courts decide that training is fair use, which certainly seems the direction they are going.
Output is a separate issue from training. Courts will never decide that an identical copy spit out by an LLM is non-infringing simply because it went through an LLM stage. Copyright laundering is wishful thinking by tech folks.
I like to think of llms as seamless plagiarism machines.
pretty much what altman and amodei mean when they say 'safety'.
Then they will leave the huge advantage in cost to the competition, I mean their customers' competitors. Hard to fathom how US companies will not want to use the cheaper option when EU and Asian companies can.
Why illegal, just pass these 3000 pages FAA-level certification, export controls and KYC. We're free country, after all!
Buying all the RAM can't work forever. Scarcity increases prices, high prices increase supply, improves RAM R&D budgets, and forces users to find ways to economize around low RAM availability.
It doesn't need to work forever. You just need to delay your competitors long enough that you can IPO to great fanfare, and then leave retail investors holding the bag. Founders and big investors get to cash out, everyone else gets screwed.
I doubt that works today. Look at SpaceX: the fanfare lasted 3 days before most of the insiders could offload to the retail bag holders. That AI company had the benefit of being attached to the largest technical moat.

The existing AI companies can't even prevent their moat from being distilled by the Chinese token reselling industry.

This is what it feels like they went with.
Google already owns the app layer, and hardware, and they are a frontier-level AI research firm.

I don't see how Anthropic or OpenAI survives being eaten by DeepSeek et al from the bottom of the stack and Google from the top.

The only reason people use google apps is because they are cheap and reliable. The user experience is awful. Have you ever tried to find a document you had open yesterday in drive?
Uh? Recently and frequently opened documents always show up on the first screen as soon as I open the app or website.
I used their enterprise chat the other week coz one of the clients used it

It is truly amazing how bad it is. Made me miss using MS Teams. No software should make anyone miss using MS Teams

Anthropic is at least renting their datacenters, not owning, so all the capital accounting bullshit is getting laundered by someone else, who will wind up holding that bag.

And Anthropic is currently cornering the enterprise coding market, and they were smart to avoid video. Under current economic conditions they're a lot closer to being profitable than anyone else, and they can take advantage of crashing prices for compute if we hit a datacenter-buildout-glut.

Mythos was outperformed by small, specific local models in multiple OSS projects.
i'd love to hear about this! do you have examples?
I see "LLM discovers vulnerability in curl" and I get skeptical, given how Daniel Stenberg has talked about the flood of claimed vulnerabilities that weren't real issues once he looked into them (as most HN readers already know, I'm sure). But it looks like these 6 were real issues, that curl patched once they received the reports. Five ended up rated low and one medium, but given the amount of attention curl gets, I'd honestly be surprised if there were any high-severity issues; in fact, having even one medium-severity issue remaining is slightly surprising to me.
To be fair (and for people who didn't click the link), I think most of the vulns were in libcurl, not in curl itself.
It might be kind of overlooked when people read about the big scary results from mythos; the real breakthrough was probably just as much the application of the (very decent) model through a well engineered wrapper (harness). Other models including codex or glm result in significant findings as well.

Harness example: https://github.com/evilsocket/audit

> Be apple where everyone else is Linux.

Apple and Linux barely even compete in the same markets. Linux runs on the servers and embedded devices, Apple on the smartphones. Android is technically Linux but not in the "is a good analogy for open weight models" sense because Android is so deeply under the thumb of Google. The main place Linux and Apple actually compete is for PCs and laptops, and that's the market where the thing with 65% market share is Microsoft.

Apple tried to make servers (they were awesome btw) but lost to Linux.

Linux is on more phones than iOS.

you're missing the point entirely and opted to entertain your own framework
It's meaningless to suggest doing what Apple does when faced with Linux when the vast majority of Apple's business isn't competing with Linux. The majority of Apple's revenue is from hardware when Linux is software -- that can run on Apple's hardware.