Previewing GPT‑5.6 Sol: a next-generation model

1,133 points · 108 visible top comments · 2026-06-26 17:06:55 UTC

openai.com

System card: https://deploymentsafety.openai.com/gpt-5-6-preview

Comments

ChrisArchitect · 2026-06-26 17:09:21 UTC

Pre-official discussions:

https://news.ycombinator.com/item?id=48678789

https://news.ycombinator.com/item?id=48683021

rvz · 2026-06-26 17:11:06 UTC

Other than the worst naming I have ever seen (Sol / Terra / Luna), the pricing is still expensive:

> GPT‑5.6 is priced per 1M tokens across three model sizes:

> Sol is $5 input / $30 output;

> Terra is $2.50 input / $15 output

> Luna is $1 input / $6 output.

The OpenAI casino has never been more ready to take your money on gambling even more tokens.

minimaxir · 2026-06-26 17:15:21 UTC

Note that GPT 5.5 currently is $5 input / $30 output (short context), so Sol is in the same class, while Terra, if the benchmarks are as claimed, is indeed a half-price GPT 5.5 at comparable performance.

andrethegiant · 2026-06-26 17:15:56 UTC

What don't you like about the naming?

lwansbrough · 2026-06-26 17:20:51 UTC

I feel like going with Space + Latin is LLM-level creativity.

Edit: yeah. https://claude.ai/share/06fefe02-4299-44da-8c5a-42607f54ca77

arikrahman · 2026-06-26 17:17:58 UTC

Can't buy cheaper as a selling point when DeepSeek is basically free when hitting cache? Unsubsidized too, Cloudflare and DigitalOcean can be the model provider for similar pricing.

Stitch4223 · 2026-06-26 17:18:41 UTC

With the $200/month plan I’ve never run into any limits or issues. The product can be used every day for extensive sessions and development. What is everyone doing that makes them talk about tokens versus dollars?

minimaxir · 2026-06-26 17:20:14 UTC

If you've never hit the limits, why not do the $100/mo plan?

nsingh2 · 2026-06-26 17:33:01 UTC

From my own experience, and from what's on their checkout page, $100 is 5x base usage and $200 is 20x. If $100 was 10x, then I personally would drop down. They want people to go to the highest tier.

aeonik · 2026-06-26 21:08:35 UTC

You can hit limits with $100 if you use it all day.

You can do it easily if you use in fast mode.

I bet you could hit the limits of the $200/month using fast mode if you were using multiple sessions at the same time all day on fast mode.

The OpenAI tiers seem pretty well tuned.

I used to use the plus ($20/month), and that was good for a few sessions every once in a while.

But now that I'm using it to configure my network, monitoring, maintenance, I'm using it every day and I'm on the $100 plan. And I do pretty consistently hit the limits, but it's easy to pace myself.

I'm thinking about upgrading to $200/month though. It would be nice not to have to ration it.

ai_slop_hater · 2026-06-26 17:26:22 UTC

I ran out of usage using GPT-5.5 and had to buy a second subscription. I've now switched to GPT-5.4, which is basically 2x usage.

fph · 2026-06-26 18:51:22 UTC

But let's put it in perspective: what you're paying them is more than the average salary in many poorer countries.

Stitch4223 · 2026-06-26 20:53:43 UTC

Fair. From a business perspective said amount is very reasonable in Europe / USA. For personal use it’s already different. Sometimes the answer is simple, thanks.

kingstnap · 2026-06-26 21:25:25 UTC

Don't forget this.

> For GPT‑5.6 and later models, cache writes are billed at 1.25x the model’s uncached input rate

Charging for cache writes is cringe and literally only Anthropic did it. Anyway this does mean the "real" prices are +25% on top of what you wrote there.
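To put a number on the "+25%" claim above, here is a minimal sketch using only the list prices and the 1.25x cache-write multiplier quoted in this thread. The function name `request_cost` and the example token counts are hypothetical, and the sketch assumes the worst case where every input token in the request is billed as a cache write. Cache-read pricing is not quoted in the thread, so it is not modeled.

```python
from decimal import Decimal

# Per-1M-token list prices quoted in the thread (USD).
PRICES = {
    "sol": {"input": Decimal("5"), "output": Decimal("30")},
    "terra": {"input": Decimal("2.50"), "output": Decimal("15")},
    "luna": {"input": Decimal("1"), "output": Decimal("6")},
}

# Multiplier from the quoted line: cache writes cost 1.25x the uncached input rate.
CACHE_WRITE_MULTIPLIER = Decimal("1.25")
MILLION = Decimal(1_000_000)


def request_cost(model, input_tokens, output_tokens, cache_write=False):
    """Cost in USD of one request (hypothetical helper).

    Assumption: when cache_write is True, ALL input tokens are billed
    at the cache-write rate. Real requests may mix cached and uncached input.
    """
    price = PRICES[model]
    multiplier = CACHE_WRITE_MULTIPLIER if cache_write else Decimal(1)
    input_rate = price["input"] * multiplier
    return (input_tokens * input_rate + output_tokens * price["output"]) / MILLION


# Illustrative request: 100k input tokens, 10k output tokens on Sol.
plain = request_cost("sol", 100_000, 10_000)
cached = request_cost("sol", 100_000, 10_000, cache_write=True)
print(plain, cached)
```

Under these assumptions the surcharge applies to the input side only, so the total for a request rises by less than 25% whenever there are output tokens, and how much less depends on the input/output mix.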

loufe · 2026-06-26 17:13:07 UTC

"Next generation model"

If it was the next generation, why isn't it a major version change..?

ryangst_1 · 2026-06-26 17:17:14 UTC

LLM devs can't do version control

psychoslave · 2026-06-26 17:18:13 UTC

Semantic is passé, word models moved to the next generation.

dominotw · 2026-06-26 17:19:00 UTC

vibe versioning

cruffle_duffle · 2026-06-26 17:32:29 UTC

To be fair, versioning has always been vibes based.

appplication · 2026-06-26 17:19:59 UTC

Honestly LLMs are the ideal candidate for CalVer. It’s not like there’s any real API so there’s no backwards compatibility to maintain.

Even Apple adopted and standardized on it for their latest platform releases.

andy12_ · 2026-06-26 17:45:27 UTC

I think it makes more sense to make it so that major versions are different pretraining runs, and minor versions are simply the same pretraining run that was finetuned to different degrees. But it seems that that isn't cool anymore.

Kiro · 2026-06-26 20:13:03 UTC

LLM versioning is entirely feelings driven. The ideal versioning is probably just names.

kaizenite · 2026-06-26 17:23:51 UTC

Because if it sucks, they can just default to "It was a minor version change anyways"

goldenarm · 2026-06-26 17:51:30 UTC

They could hold the GPT-6 name for the IPO

GTP · 2026-06-26 17:53:59 UTC

Some assume it was to try to slip under the radar and avoid being limited by the government as they did with Fable.

therepanic · 2026-06-26 17:59:28 UTC

By all appearances, they did not succeed in doing so.

HarHarVeryFunny · 2026-06-26 18:07:45 UTC

AFAIK there is no difference between "generation" and "version". Version naming/numbering depends on how good it turns out to be, and competition. If the competition releases something then you need to push something out too.

Calling it 5.6 creates the least possible expectations, and therefore more potential for positive feedback.

The Sol/Terra/Luna naming is interesting. I wonder what Anthropic are considering for their next models? "Terminator", "Armageddon"?

wincy · 2026-06-26 18:26:42 UTC

You gotta check out the new ChatGPT 6.3 Betelgeuse bro

rolph · 2026-06-26 19:28:21 UTC

Heliopause

cyral · 2026-06-26 19:10:27 UTC

If they called it 6.0 and it wasn't AGI, you'd see a lot of complaining here too

tasuki · 2026-06-26 19:39:29 UTC

What is AGI? (I know what the shortcut expands to, I'm curious about your definition. Don't the current models fit?)

ChrisLTD · 2026-06-26 17:13:19 UTC

If it's a new generation why isn't it GPT-6?

win311fwg · 2026-06-26 17:20:29 UTC

It does not introduce incompatibilities with earlier 5.x models? Frontier models are at a point now that there will never be a need for another major version bump, aside from those chasing marketing gimmicks. They are smart enough to adapt.

ChrisLTD · 2026-06-26 17:26:05 UTC

What would it mean to be incompatible with the other 5.x models?

paxys · 2026-06-26 17:31:53 UTC

New request/response schema, new capabilities, or really anything that would break your existing workflows if you changed “5.5” to “5.6” in your application.

There have been many leaps forward in the past - tool calling, reasoning, agentic loops etc. 5.6 doesn’t have any of this. More intelligence doesn’t necessarily warrant a major version bump.

jurgenburgen · 2026-06-26 17:32:56 UTC

Only speaks Klingon

peab · 2026-06-26 17:27:36 UTC

not true. multimodality is still far from being solved

malnourish · 2026-06-26 17:27:46 UTC

A major bump will be warranted if/when we can truly separate prompt from data.

win311fwg · 2026-06-26 17:33:07 UTC

That is a different product line. It may be recorded as a version bump for marketing purposes, as already mentioned, but semantically begins at 0.

charcircuit · 2026-06-26 19:42:36 UTC

Why would incompatibilities have anything to do with a major version bump?

alcasa · 2026-06-26 17:24:27 UTC

They forgot how to do pretraining.

cleaning · 2026-06-26 17:47:11 UTC

5.5 was a new pretraining run.

paxys · 2026-06-26 19:17:01 UTC

Given the expectations everyone has created GPT-6 has to pretty much be AGI.

tasuki · 2026-06-26 19:36:32 UTC

What is your definition of AGI that the current LLMs don't fit?

paxys · 2026-06-26 19:47:43 UTC

As the old saying goes, I’ll know it when I see it. The current 5.x generation isn’t it.

gordonhart · 2026-06-26 19:53:54 UTC

Autonomously Generating Income (which is why it will never be released to the general public)

koolala · 2026-06-27 05:24:04 UTC

Hopefully it stands for AC Generation Improvements. If it prioritizes income it will bleed the planet dry. It needs to solve how expensive our cost is on the planet first or its entire existence was a mistake.

ThrowawayTestr · 2026-06-26 22:45:55 UTC

When it understands why 6 7 is funny

isomorphic_duck · 2026-06-26 23:01:09 UTC

Continual Learning? Why is this even a question? Isn’t it a well-known glaring issue with the current models? They cannot learn/adapt to new skills (in any permanent sense) once they are deployed.

FromTheFirstIn · 2026-06-26 23:19:51 UTC

You’d have to really stretch the definition of AGI to make the current models fit

LordDragonfang · 2026-06-27 02:53:32 UTC

The definition has already been stretched to not fit the previous models. There is no meaningful, static definition that significantly predates current capabilities.

There's a reason why ai xrisk doomers had to come up with the term ASI.

I would seriously suggest that everyone take a look at the wikipedia page for AGI from the month before ChatGPT was released, compare it to the current version, and not come to that conclusion.

https://en.wikipedia.org/w/index.php?title=Artificial_genera...

FromTheFirstIn · 2026-06-27 03:34:49 UTC

The first sentence is “understand or learn any intellectual task that a human can.” Whatever you think of the benefits of LLMs, they don’t understand and they can only learn during the training period and with very minor adjustments in post training. So, no I don’t think any of these models are generally intelligent.

LordDragonfang · 2026-06-27 07:41:54 UTC

> they don’t understand

I have not seen any instance of this frequently-made assertion which is at all justified. It seems to rely on a definition of "understand" which is more about spirituality than actual observable evidence (they clearly can comprehend even complex tasks well enough to execute on them, and if you won't call that "understanding", you're playing word games rather than stating an objective fact).

Likewise, agents can literally come to a greater understanding of a problem through trial and error, and there are plenty of mechanisms to retain that knowledge. If you don't want to call that "learning", you're just making a choice to define it in a way more restrictive than how we use it for humans, and intentionally making communication more difficult.

mellosouls · 2026-06-27 10:11:44 UTC

> It seems to rely on a definition of "understand" which is more about spirituality than actual observable evidence

"Understanding" has enough philosophical leeway in its use to allow at least the possibility of sentience as a prerequisite.

This is where the discussion about LLM capabilities becomes genuinely difficult, and dismissing that difficulty as "word games" or "spirituality vs evidence" is not helpful.

LordDragonfang · 2026-06-29 04:56:57 UTC

Considering that "sentience" has enough "philosophical leeway" that it's just as reasonable to assert that LLMs are sentient (and at extremes, that they have been sentient for years) -- especially if we are, as you suggest, supposed to include any philosophically possible definition -- I don't think that's a meaningful rebuttal. If no one can agree on whether it's sentient, it's bad faith to choose a fringe definition that hands off its definition to such a nebulous term.

In fact, I'd argue that statements about what "is" and "is not" sentient relies on even more spirituality and word games for anything that isn't a terran tetrapod.

For a meaningful -- "helpful" -- discussion on such things, one has to assume that everyone is choosing a definition which is closer to the median usage and relies on not being totally subjective. Furthermore, given the breadth of options, it should be assumed to be a definition which permits the form of the question to be meaningful, rather than begging the question -- if your definition is tautological enough that non-biological entities can't have understanding, you're just expressing dogma rather than having a discussion.

Anything else is bad faith, or assuming bad faith on the part of the participants.

mellosouls · 2026-06-29 10:37:32 UTC

> it's bad faith to choose a fringe definition that hands off its definition to such a nebulous term.

I do not think it is at all unreasonable or "fringe" to regard understanding as involving intentionality: ie a directedness of thought toward the object-relations being "grasped". That may not be the only possible conception of understanding but it is a mainstream philosophical idea.

> In fact, I'd argue that statements about what "is" and "is not" sentient relies on even more spirituality and word games for anything that isn't a terran tetrapod.

Then you seem to be confusing "hard to understand" with "meaningless".

> you're just expressing dogma rather than having a discussion.

> Anything else is bad faith, or assuming bad faith on the part of the participants

Have a think about that (repeated) tone before responding.

Fwiw I am a long-time believer in consciousness being fully realisable in machines; I think the jury is still out on LLMs.

FromTheFirstIn · 2026-06-27 12:07:09 UTC

Agents are always combining the same underlying weights to their inputs, relying on the same maps of semi-semantic space and the relationships between those that it was leaning towards at training time. The fact that it’s successful in making lots of people have an Eliza effect doesn’t make it understand something. It’s simulating understanding based on an enormous corpus of text, much of which is people working through things or sharing an understanding of something. Unless you believe that all intellectual activity is about finding the space between words you shouldn’t believe LLMs have any chance at understanding anything.

knollimar · 2026-06-27 13:57:05 UTC

The "it's not X it's Y" where Y and X are the same indicates a lack of understanding.

LordDragonfang · 2026-06-29 04:32:43 UTC

Considering the number of humans I've seen make statements that fit that description (about AI, no less!), I don't think that's a strong argument against it.

FromTheFirstIn · 2026-06-30 00:51:54 UTC

Would you say those humans understand what they’re talking about?

LordDragonfang · 2026-06-30 15:42:24 UTC

Touché

mellosouls · 2026-06-27 10:04:06 UTC

From that same page:

> Various criteria for intelligence have been proposed (most famously the Turing test) but to date, there is no definition that satisfies everyone

0x696C6961 · 2026-06-26 23:35:18 UTC

Always one goalpost away from what we have.

UltraSane · 2026-06-27 02:07:54 UTC

AGI should be able to do every job a human can do using a computer at least as well as the average human.

LordDragonfang · 2026-06-27 02:52:10 UTC

That's already been true for a while, you're overestimating the average human. They just have different failure modes.

UltraSane · 2026-06-29 17:14:38 UTC

It isn't even close to true. The biggest problem is that humans' performance improves over time.

https://www.linkedin.com/pulse/announcing-aa-briefcase-bench...

AA-Briefcase is a new benchmark for testing models on realistic knowledge work tasks in complex projects built by industry experts. Models are evaluated on multi-week knowledge work projects, each with many linked tasks and thousands of input source files. AA-Briefcase combines rubric and pairwise grading to evaluate verifiable task success, analytical quality, and presentation quality, giving a holistic view of overall agentic capability in knowledge work.

Tasks with many messy input files, conflicting information, and complex deliverables remain difficult for all models. Under a strict all-or-nothing grading scheme per task, Claude Fable 5 leads overall, but achieves a perfect task score on only 3% of tasks. On 31 of 91 tasks, no model scores above 50%.

Davidzheng · 2026-06-27 02:53:40 UTC

And what is it worse at than an average human today that can be done on a computer?

UltraSane · 2026-06-27 04:14:24 UTC

almost everything? AGI has to be able to completely replace a human in any information worker role indefinitely.

virgildotcodes · 2026-06-27 07:58:31 UTC

I think you're speeding past the word "average" in the sentence. I'd argue that current frontier models already exceed the abilities of average humans across the majority of tasks you can do on a computer, although you might be able to argue that they tend to be a bit slower?

That latter part is debatable though - have you seen a non-technical person try to figure out something new on a computer?

UltraSane · 2026-06-27 08:23:33 UTC

"I'd argue that current frontier models already exceed the abilities of average humans" for things that fit in their context window, sure, but LLMs can't learn over time the way humans can. One example is LLMs are very good at writing a few thousand lines of code but they absolutely cannot write coherent million-line codebases. By average human I meant the average skill level for the job. AGI would need to be able to pass an interview and get hired and then perform well enough to not get fired.

Davidzheng · 2026-06-27 10:21:28 UTC

Yeah, it's not true that for every job, it is better than the median worker of that job. But it is conceivable that for almost all jobs it is already better than the median human (not just workers of that job).

isomorphic_duck · 2026-06-27 14:19:10 UTC

You have to understand that the median human is terrible at (almost) everything. Humans, the only examples of general intelligence we know, are economically valuable precisely because they can train themselves to specialise at a (relatively) narrow task over time. You don’t measure how good a coding model is by how well it programs relative to Doctors, or how well it can prove theorems relative to baristas, or how well it can write coherent novels relative to programmers. That would be a dumb metric.

tasuki · 2026-06-27 19:23:18 UTC

> Humans, the only examples of general intelligence we know

Our intelligence only seems "general" to us, because we're viewing it through our own eyes. Our "intelligence" is specialized to our survival, and we're terrible at most tasks outside that scope.

isomorphic_duck · 2026-06-28 11:48:09 UTC

We operate and think about subjects like Higher Topos Theory, Information Geometry and Algebraic Topology, which are several layers of abstractions removed from anything that can be termed as a skill “specialised to our survival”.

Davidzheng · 2026-06-27 10:22:53 UTC

But in any case, I think more than 10% of information workers today can be replaced by current-generation models indefinitely.

ChrisLTD · 2026-06-27 15:23:04 UTC

It's decent at rote coding tasks, but I haven't seen these things be reliable enough outside of that specific task to make the claim that it can do the work of any information worker.

UltraSane · 2026-06-27 22:09:06 UTC

https://www.linkedin.com/pulse/announcing-aa-briefcase-bench...

UltraSane · 2026-06-27 22:09:32 UTC

https://www.linkedin.com/pulse/announcing-aa-briefcase-bench...

leumon · 2026-06-26 17:13:43 UTC

> We plan to make them more broadly available to people using ChatGPT, Codex, and the API soon.

I hope this means Fable will then also get released again.

lanthissa · 2026-06-26 17:27:05 UTC

why would it? if you're the us gov, sam & greg are your good boys giving you 25m

and dario's your naughty boy who you don't agree with politically.

Let 5.6 free, keep fable chained and anthropic instantly sees rev loss and has to cave.

osti · 2026-06-26 17:14:02 UTC

Sol? Looks like OpenAI is jealous of Anthropic's good model naming ability and wants to emulate it.

dominotw · 2026-06-26 17:21:42 UTC

sol has no soul

taytus · 2026-06-26 17:25:27 UTC

It's missing u

alcasa · 2026-06-26 17:25:43 UTC

They should have used fighter jet codenames instead. The MiG-15 one has a nice ring to it.

arizen · 2026-06-26 17:59:39 UTC

Sol Goodman

MrCheeze · 2026-06-26 17:53:17 UTC

TBF, they did it first with ada/babbage/curie/davinci. "Sol" is a much weaker branding, though.

ddp26 · 2026-06-26 17:14:06 UTC

I'm going to pre-register my prediction that GPT-5.6 Sol is significantly behind Claude Fable 5, as evaluated by general consensus once time has passed for people to get familiar with both.

hmate9 · 2026-06-26 17:15:06 UTC

What is this prediction based on?

gpm · 2026-06-26 17:16:46 UTC

I suspect the same just based on their versioning scheme fwiw.

jstummbillig · 2026-06-26 17:19:26 UTC

solid