GPT-5.5 Codex reasoning-token clustering at 516/1034/1552 may be leading to degraded performance on complex tasks

Source: https://github.com/openai/codex/issues/30364


openai/codex (Public)


Status: Open

Labels: `bug` (Something isn't working), `model-behavior` (Issues related to behaviors exhibited by the model), `rate-limits` (Issues related to rate limits, quotas, and token usage reporting)

Description

vguptaa45 opened this issue on Jun 27, 2026

Summary

I found an aggregate pattern in Codex `token_count` metadata: `gpt-5.5` responses disproportionately land at exactly `reasoning_output_tokens = 516`, with additional fixed-boundary spikes around `1034` and `1552`.

This appears model-specific and coincides with lower overall reasoning-token intensity, which may help explain degraded performance on complex/high-stakes Codex tasks.

This is related to #29353, which reported a task-level reproduction where `gpt-5.5` runs ending at exactly 516 reasoning tokens returned the wrong answer. This issue adds aggregate evidence across a larger Feb-Jun window.

I am not claiming this proves hidden chain-of-thought truncation. The narrower claim is that Codex telemetry shows a GPT-5.5-specific fixed-token clustering anomaly that looks consistent with thresholded reasoning-budget behavior.

Environment

- Product: Codex

- Model most implicated: `gpt-5.5`

- Data source: Codex `token_count` metadata

- Time window analyzed: Feb 1-Jun 27, 2026 UTC

Evidence

| Metric | Value |
| --- | ---: |
| Response-level token records analyzed | 390,195 |
| Sessions represented | 865 |
| Exact `reasoning_output_tokens = 516` events | 3,363 |
| GPT-5.5 share of all responses | 19.3% |
| GPT-5.5 share of exact-516 events | 82.0% |
| GPT-5.5 exact-516 / >=516 ratio | 44.0% |
| Non-GPT-5.5 exact-516 / >=516 ratio | 1.3% |

Model-level result:

| Model | Response records | Exact 516 / >=516 |
| --- | ---: | ---: |
| `gpt-5.5` | 75,401 | 44.0% |
| `gpt-5.4` | 25,214 | 19.8% |
| `gpt-5.2` | 247,575 | 0.34% |
| `gpt-5.3-codex` | 13,333 | 0.0% |
| `gpt-5.3-codex-spark` | 26,179 | 0.0% |

Monthly exact-516 clustering increased sharply:

| Month | Exact 516 / >=516 |
| --- | ---: |
| Feb 2026 | 0.11% |
| Mar 2026 | 2.45% |
| Apr 2026 | 4.25% |
| May 2026 | 53.30% |
| Jun 2026 | 35.84% |

At the same time, overall reasoning-token intensity decreased:

| Month | Mean reasoning tokens | P90 reasoning tokens |
| --- | ---: | ---: |
| Feb 2026 | 268.1 | 772 |
| Mar 2026 | 256.8 | 723 |
| Apr 2026 | 228.7 | 669 |
| May 2026 | 106.9 | 344 |
| Jun 2026 | 168.5 | 515 |
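A minimal sketch of how monthly mean and P90 figures like these could be computed from response records. The issue does not describe its pipeline, so two things here are assumptions: each record is a dict with hypothetical keys `timestamp` (a UTC `datetime`) and `reasoning_output_tokens`, and P90 uses the nearest-rank method (other percentile definitions give slightly different values).

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def p90_nearest_rank(values):
    """90th percentile by the nearest-rank method (assumed definition)."""
    ordered = sorted(values)
    # ceil(0.9 * n) in integer arithmetic, giving a 1-based rank.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def monthly_intensity(records):
    """Map 'YYYY-MM' -> (mean, P90) of reasoning_output_tokens in that month.

    Records use hypothetical keys: 'timestamp' (UTC datetime) and
    'reasoning_output_tokens' (int).
    """
    by_month = defaultdict(list)
    for rec in records:
        month = rec["timestamp"].strftime("%Y-%m")
        by_month[month].append(rec["reasoning_output_tokens"])
    return {
        month: (mean(vals), p90_nearest_rank(vals))
        for month, vals in sorted(by_month.items())
    }


# Tiny synthetic example (not data from the issue).
sample = [
    {"timestamp": datetime(2026, 5, 3, tzinfo=timezone.utc), "reasoning_output_tokens": t}
    for t in (0, 0, 120, 516, 516, 800)
]
print(monthly_intensity(sample))
```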

Why this looks suspicious

The anomaly is not simply higher reasoning-token usage overall. Mean and P90 reasoning-token intensity fell from February-April to May-June, while exact-516 clustering rose sharply.

The clustering is also not evenly distributed across models. `gpt-5.5` accounts for only 19.3% of responses but 82.0% of exact-516 events. Its exact-516 / >=516 ratio is about 33.6x higher than the non-GPT-5.5 baseline.
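The multiples in this paragraph can be re-derived from the rounded figures in the Evidence table. This is plain arithmetic on the published values, not a re-analysis of the underlying data. From the rounded inputs the ratio multiple comes out near 33.8x rather than the stated 33.6x; the gap presumably reflects the author working from unrounded counts.

```python
# Rounded figures copied from the Evidence table (percentages).
gpt55_share_of_responses = 19.3  # GPT-5.5 share of all responses
gpt55_share_of_516 = 82.0        # GPT-5.5 share of exact-516 events
gpt55_ratio = 44.0               # GPT-5.5 exact-516 / >=516
baseline_ratio = 1.3             # non-GPT-5.5 exact-516 / >=516

# How over-represented GPT-5.5 is among exact-516 events relative to its traffic.
overrepresentation = gpt55_share_of_516 / gpt55_share_of_responses
# Ratio of the GPT-5.5 exact-516 rate to the non-GPT-5.5 baseline rate.
ratio_multiple = gpt55_ratio / baseline_ratio

print(f"over-representation: {overrepresentation:.2f}x")  # over-representation: 4.25x
print(f"ratio multiple: {ratio_multiple:.1f}x")            # ratio multiple: 33.8x
```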

The fixed values are also notable: `516`, `1034`, and `1552` look like repeated threshold boundaries rather than a naturally varying reasoning-token distribution.
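One way to quantify "threshold boundary versus natural variation" is to compare the count at each suspect value with the counts at neighbouring values: a smooth distribution scores near 1, while a hard cutoff scores far above it. The three values are also evenly spaced (1034 - 516 = 1552 - 1034 = 518), which would fit a repeated fixed-size step, though the issue does not establish that. The sketch below is a hypothetical check, not the method used in the report; `spike_scores` and its `window` parameter are illustrative names.

```python
from collections import Counter


def spike_scores(token_counts, values=(516, 1034, 1552), window=5):
    """Ratio of the count at each value to the mean count of its neighbours.

    Neighbours are the `window` integer values on each side of the value
    (the value itself is excluded). Scores near 1 suggest a smooth
    distribution; scores far above 1 suggest clustering at that exact value.
    """
    hist = Counter(token_counts)  # missing keys read as 0
    scores = {}
    for v in values:
        neighbours = [hist[v + d] for d in range(-window, window + 1) if d != 0]
        baseline = sum(neighbours) / len(neighbours)
        if baseline > 0:
            scores[v] = hist[v] / baseline
        else:
            # No neighbours observed: any hit at v is an unbounded spike.
            scores[v] = float("inf") if hist[v] else 0.0
    return scores


# Synthetic check: a flat histogram over 500..1599 plus 49 extra hits at 516.
synthetic = list(range(500, 1600)) + [516] * 49
print(spike_scores(synthetic))  # {516: 50.0, 1034: 1.0, 1552: 1.0}
```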

Expected behavior

Reasoning-token counts for complex Codex tasks should vary naturally with task complexity and should not disproportionately cluster at exact fixed values for one model family.

Actual behavior

`gpt-5.5` responses cluster heavily at exactly 516 reasoning tokens, with related spikes around 1034 and 1552. This pattern is much weaker or absent in several other models.

Ask

Could the Codex team investigate whether `gpt-5.5` has a reasoning-budget, routing, truncation, fallback, or scheduler behavior that causes responses to terminate around 516/1034/1552 reasoning tokens?

If this is expected behavior, it would be useful to know whether exact 516 indicates a normal stopping point, a budget cap, a degraded tier, or another internal threshold.

Useful internal validation checks:

1. Query `token_count` events with `reasoning_output_tokens` by model.
2. Compare exact-value counts for `0`, `516`, `1034`, and `1552`.
3. Compute `count(reasoning_output_tokens = 516) / count(reasoning_output_tokens >= 516)` by model and day.
4. Compare `gpt-5.5` against `gpt-5.2`, `gpt-5.4`, and Codex-specific variants.
5. Replay matched complex tasks across GPT-5.2 and GPT-5.5 with quality evals, especially separating exact-516 responses from longer-reasoning responses.
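Checks 2 and 3 (and the per-model comparison in check 4) can be sketched over exported records. This assumes each record is a dict with hypothetical keys `model`, `timestamp` (a UTC `datetime`), and `reasoning_output_tokens`; the real `token_count` event schema may differ, and the function names are illustrative only.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

WATCHED_VALUES = (0, 516, 1034, 1552)


def exact_value_counts(records, values=WATCHED_VALUES):
    """Check 2: per-model counts of responses landing exactly on each watched value."""
    counts = defaultdict(Counter)
    for rec in records:
        tokens = rec["reasoning_output_tokens"]
        if tokens in values:
            counts[rec["model"]][tokens] += 1
    return counts


def exact_516_ratio(records, key=lambda r: (r["model"], r["timestamp"].date())):
    """Check 3: count(tokens == 516) / count(tokens >= 516) per group.

    Groups default to (model, UTC day); pass key=lambda r: r["model"] for a
    per-model ratio (check 4). Groups with no response >= 516 are omitted.
    """
    exact = Counter()
    at_least = Counter()
    for rec in records:
        tokens = rec["reasoning_output_tokens"]
        if tokens >= 516:
            group = key(rec)
            at_least[group] += 1
            if tokens == 516:
                exact[group] += 1
    return {group: exact[group] / n for group, n in at_least.items()}


# Synthetic example (not data from the issue).
ts = datetime(2026, 5, 1, tzinfo=timezone.utc)
records = [
    {"model": m, "timestamp": ts, "reasoning_output_tokens": t}
    for m, t in [("gpt-5.5", 516), ("gpt-5.5", 516), ("gpt-5.5", 900), ("gpt-5.5", 100),
                 ("gpt-5.2", 516), ("gpt-5.2", 700), ("gpt-5.2", 800), ("gpt-5.2", 0)]
]
print(exact_516_ratio(records, key=lambda r: r["model"]))
```

Grouping through a `key` function keeps one implementation for the per-day, per-model, and per-month views the checks ask for.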


Activity

github-actions added the labels `bug`, `model-behavior`, and `rate-limits` on Jun 27, 2026

github-actions (bot, Contributor) commented on Jun 27, 2026 with GitHub Actions

Potential duplicates detected. Please review them and close your issue if it is a duplicate.

- gpt-5.5 xhigh sometimes short-circuits with reasoning_output_tokens=516 and wrong final_answer in Codex Desktop #29353

_Powered by Codex Action_


revantmalani commented on Jun 28, 2026

I've been facing the same issue and am very frustrated as well

bluecat1997 commented on Jun 28, 2026

I'm running into the same problem and hope OpenAI will respond!

bluecat1997 commented on Jun 28, 2026

> Potential duplicates detected. Please review them and close your issue if it is a duplicate.
>
> - gpt-5.5 xhigh sometimes short-circuits with reasoning_output_tokens=516 and wrong final_answer in Codex Desktop #29353
>
> _Powered by Codex Action_

This is a much more data-driven report than the previous one.

vguptaa45 (Author) commented on Jun 28, 2026

> > Potential duplicates detected. Please review them and close your issue if it is a duplicate.
> >
> > - gpt-5.5 xhigh sometimes short-circuits with reasoning_output_tokens=516 and wrong final_answer in Codex Desktop #29353
> >
> > _Powered by Codex Action_
>
> This is a much more data driven report than the previous one

I agree; the previous one was closed for no reason. I hope this one gets their attention.

Lionel233 commented on Jun 28, 2026 (edited)

Exactly — this matches what I found, and it's clearly not an isolated case. I shared my initial finding on Reddit earlier (Half of Your High-Stakes Codex Requests May Be Silently Downgraded by Truncated Reasoning), and it's great that you've now dug deeper with model-specific and monthly data.

I've added a link to this GitHub issue in that Reddit post, so readers can cross-reference and upvote here.

Thanks for the thorough testing!


loner2403 commented on Jun 28, 2026

Same issue

partment commented on Jun 28, 2026

Same issue

Suvmaker commented on Jun 28, 2026

same problem

lujunjiehhh commented on Jun 28, 2026

Same issue

haowang02 commented on Jun 28, 2026

same issue

owiofwm2i commented on Jun 28, 2026

> > > Potential duplicates detected. Please review them and close your issue if it is a duplicate.
> > >
> > > - gpt-5.5 xhigh sometimes short-circuits with reasoning_output_tokens=516 and wrong final_answer in Codex Desktop #29353
> > >
> > > _Powered by Codex Action_
> >
> > This is a much more data driven report than the previous one
>
> I agree, the previous one was closed for no reason. I hope this takes their attention

You should post this on the official OpenAI forums too:

https://forum.openai.com/

https://community.openai.com/

Those are more likely than GitHub to be seen by people who actually have the authority to investigate or escalate model-quality issues internally.

14 remaining items

MioQuispe commented on Jun 30, 2026

It's completely unusable! It can't be right that I pay $200 a month and select XHigh, yet it gets outperformed by a free-tier Chinese model...

I don't want any resets. They are collecting dust because of this.

Stop handing them out if you can't handle the capacity.

I'd rather have 50% less usage if it meant the model actually spent any time thinking at all.

DanielMulec commented on Jun 30, 2026

Same issue

github-actions mentioned this on Jun 30, 2026:

- Wrong answer in turn #30731

white54503 commented on Jun 30, 2026

Same issue.

92645417d9e5c763259dbebc306e3e commented on Jun 30, 2026

An effective mitigation is to change the system-prompt instruction ("Share intermediary updates in `commentary` channel.") to:

- Do not send optional commentary.

- You do not need to use the commentary channel to report progress to me.

- Use tools normally.

- Put user-facing text in final only.

Alternatively, use the gpt-5.2-codex prompt directly.

insanowsky commented on Jun 30, 2026

> an effective mitigation measure is to modify the system prompt(Share intermediary updates in `commentary` channel.) to
>
> - Do not send optional commentary.
> - You do not need to use the commentary channel to report progress to me.
> - Use tools normally.
> - Put user-facing text in final only.
>
> or directly use the gpt-5.2-codex prompt

The system prompt does not control the juice value.

github-actions mentioned this on Jul 1, 2026:

- 📊 AI CLI Tools Community Activity Daily Report 2026-07-01 zx0828/big_model_radar#204

016 commented on Jul 1, 2026

Same issue. And this is what we get for $200?

MaShouo commented on Jul 1, 2026

Same issue. Editing the system prompt helps a bit.

github-actions mentioned this in 3 issues on Jul 2, 2026:

- 📰 Hacker News AI Digest 2026-07-02 kakapez/agents-radar#589

- 📰 Hacker News AI Community Activity Daily Report 2026-07-02 96loveslife/big_model_radar#94

- 📰 Hacker News AI Community Activity Daily Report 2026-07-02 litang9/big_model_radar#153

haowang02 commented on Jul 2, 2026 (edited)

Because of this issue, I haven't been able to use Codex for any real work for quite a while now.

If this doesn't get fixed, I don't see any reason to keep paying for the subscription.

It's pretty disappointing that GPT-5.5 xhigh is now delivering a worse experience than some budget open-source models.

**This is outright fraud!!!**


vguptaa45 (Author) commented on Jul 2, 2026

> Because of this issue, I haven't been able to use Codex for any real work for quite a while now.
>
> If this doesn't get fixed, I don't see any reason to keep paying for the subscription.
>
> It's pretty disappointing that GPT-5.5 xhigh is now delivering a worse experience than some budget open-source models.

YES, LITERALLY THIS. GPT-5.5 xhigh regularly thinking for only 30 seconds is abysmal. I'm holding out for gpt-5.6 to see if this issue is resolved; otherwise I'll move my team too.

MaShouo commented on Jul 2, 2026

The same test question achieves 100% accuracy when asked in Opencode, but in Codex the accuracy drops to nearly 0%. I don't know what the OpenAI team is doing. This is clearly not an issue of model intelligence; a single system prompt can ruin an entire model.

Metadata

- Assignees: No one assigned
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests
- Participants: BobDLA, MioQuispe, partment, Lionel233, 016, and 21 others
