That's because Claude is on a lunch break and decided to take a short breather.
phishin · 2026-04-28 18:12:22 UTC
Bro deserves it.
rikthevik · 2026-04-28 18:19:22 UTC
I think we all deserve a little break right now.
sebastiennight · 2026-04-28 18:22:05 UTC
I'm experimenting with a simple ritual: if Claude is out, I'm out.
I'll just go for a walk outside.
And I don't mean "if I can't access Claude to do my work"; I mean just in general: I'll ping claude.ai from time to time and use Claude's breaks as a break reminder. Why should AI get a breather and not us?
I'm looking into how to structure my work to run some autonomous-safe jobs overnight to take advantage of it.
Up-time girl, she's been living in her up-time world...
burnte · 2026-04-28 19:13:02 UTC
I bet she's never had a downtime guy, I bet her momma never told her why.
SilverElfin · 2026-04-28 19:55:01 UTC
Is there a word for the phenomenon where you automatically read something in someone’s voice or in the rhythm of a song?
jplona · 2026-04-28 18:38:01 UTC
Sadly not colorblind friendly
happytoexplain · 2026-04-28 18:42:02 UTC
Yeah, to me it looks like, I think red, and then at least two similar shades of green, and grey.
rdtsc · 2026-04-28 18:36:28 UTC
From 5 9s to 9 5s
2ndorderthought · 2026-04-28 18:40:38 UTC
The question is: is it DNS or an AI outage? Hmmmm
EForEndeavour · 2026-04-28 18:48:46 UTC
Just another Mythos breakout. Excuse us while we airgap the affected DC and send in a team to drive framing nails into every storage device in the building.
lousken · 2026-04-28 19:30:33 UTC
Can't they use Mythos to figure out their uptime?
scosman · 2026-04-28 19:33:08 UTC
Mythos prompt: Hey Mythos, make me 20,000 H100s.
Hamuko · 2026-04-28 20:16:26 UTC
They weren't able to use it to prevent Claude Code source code from leaking, or to stop some random Discord server from gaining access to Mythos.
sva_ · 2026-04-28 20:40:53 UTC
> prevent Claude Code source code from leaking
That's silly. It's a JavaScript app, they are more or less open source by design. There was no secret sauce in Claude Code.
delusional · 2026-04-28 21:03:43 UTC
Odd how they still DMCA'd the rehosts of the leak. Clearly they don't consider it "open source".
apetresc · 2026-04-28 19:43:28 UTC
Not so fast, it's currently 98.59%. That's technically two 9s!
xvedejas · 2026-04-28 20:57:06 UTC
If 90% is one nine and 99% is two nines, we can use the logarithm to compute how many fractional nines we have at 98.59%: about 1.85 nines (almost two!)
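As a sanity check, the fractional-nines formula being used here can be sketched in a few lines (note that by this formula, 98.59% comes out to roughly 1.85 nines):

```python
import math

def nines(availability: float) -> float:
    """Number of 'nines' for a given availability fraction.

    90% -> 1 nine, 99% -> 2 nines, 99.9% -> 3 nines, so in general
    nines = -log10(1 - availability).
    """
    return -math.log10(1.0 - availability)

print(round(nines(0.9859), 4))  # ~1.8508
print(round(nines(0.99), 2))    # 2.0
```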
hit8run · 2026-04-28 18:23:43 UTC
Impossible! I heard Mythos is so goooood they can only give it to big corporations because it makes no mistakes and shit.
jtfrench · 2026-04-28 18:41:00 UTC
Hopefully Mythos didn't go rogue and hold production hostage.
beernet · 2026-04-28 18:24:49 UTC
More than the downtime, I'm surprised by the actual uptime. Hard to imagine how difficult this must be, given the speed of growth.
nippoo · 2026-04-28 18:34:13 UTC
Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...
CSSer · 2026-04-28 18:40:22 UTC
Can you speak a little more to this? I'm curious what kind of parameters one must consider/monitor and what kind of novel things could go wrong.
aleksiy123 · 2026-04-28 19:12:07 UTC
My guesses are:
Hardware capacity constraints are going to be the big one.
Effective caching is another; I bet if you start hitting cold caches the whole thing is going to degrade rapidly.
The ground is probably shifting pretty rapidly.
Power users are trying to get the most out of their subscriptions and so are hammering you as fast as they possibly can. See Ralph loops.
Harnesses are evolving pretty rapidly, as are new alternative harnesses. That makes the load patterns less predictable and harder to cache.
Demand is increasing both from more customers and from each user as they figure out more effective workflows.
Users are pretty sensitive to model quality changes. You probably want smart routing, but users want the best model all the time.
Models keep getting bigger and bigger.
On top of that, they are probably hiring and onboarding more, so system complexity and codebase complexity are growing.
lostlogin · 2026-04-28 18:42:26 UTC
But… imagine that same scientific research but you have an unlimited budget. I’d imagine that helps.
Some of the comments here mention their monthly spend, and it’s eye watering.
rvnx · 2026-04-28 19:45:10 UTC
I think you have to see this as a bunch of stateless requests, and this makes the problem way easier.
LLM requests that do not call tools need nothing external, by definition.
No central server, nothing, they can even survive without the context cache.
All you need is to load (and only once!) the read-only immutable model weights from a S3-like source on startup.
If it takes 4 servers to process a request, then you can group them 4 by 4, and then send a request to each group (sharding).
Copy-paste the exact same-setup XXX times and there you have your highly-parallelizable service (until you run out of money).
It's very doable; any serious SRE can find a way to set up "larger than one card" models like Kimi or DeepSeek (unquantized) if they have a tightly-coupled HPC cluster (or a pair of very, very beefy servers).
If you run out of servers, then again a money problem, but not an architectural problem (and modern datacenters are already scalable).
Take the best SRE, give them no budget, and there is no solution.
So inference is the easy part.
For Codex or Claude Code, a request that takes a long time or has slow cold-start latency is considered very acceptable.
Some users would probably not even notice the difference between a request taking 2 minutes versus 3.
The really difficult part is context caching and external tools, because now you are depending on services that might be lagging.
Executing code, browsing the web: all of that is tricky to scale because it is very unreliable (tends to time out, requires large caches of web pages, circumventing captchas, etc.).
These are traditional scaling problems, but they are more difficult because all these pieces are fragile and queues can snowball easily.
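The "copy-paste the same setup and shard" idea above can be sketched in a few lines (all names and sizes here are hypothetical, not anything Anthropic actually runs): each replica group holds a full copy of the immutable weights, so a stateless router can hand any request to any group.

```python
import random

GROUP_SIZE = 4  # "if it takes 4 servers to process a request"
SERVERS = [f"gpu-{i:03d}" for i in range(16)]  # 16 servers -> 4 groups

# Partition servers into fixed replica groups of GROUP_SIZE.
GROUPS = [SERVERS[i:i + GROUP_SIZE] for i in range(0, len(SERVERS), GROUP_SIZE)]

def route(request_id: str) -> list[str]:
    """Pick a replica group for a stateless request.

    Because requests carry no server-side state, plain random (or
    round-robin) assignment is enough; no sticky sessions needed.
    """
    return random.choice(GROUPS)

# Scaling out is just appending more identical groups.
assert all(len(g) == GROUP_SIZE for g in GROUPS)
```

The point of the sketch is that the hard part is money (buying more groups), not architecture: adding capacity never requires coordination between groups.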
wrs · 2026-04-28 18:44:46 UTC
On the other hand, the status page is blaming the authentication system, which one would think is not a frontier-class problem.
gordon_freeman · 2026-04-28 18:26:04 UTC
I am getting an error that the selected model (I selected Opus 4.6, and 4.7 later) is unavailable, but when I tried Sonnet it worked for me.
neosat · 2026-04-28 18:26:10 UTC
"We are investigating an issue preventing users from reaching Claude.ai, and will provide an update as soon as possible."
Who is We? I thought software engineers were going to be redundant and AI could do it all itself? (not to take anything away from Claude code + Claude both of which I love)
cloud-oak · 2026-04-28 18:27:21 UTC
You can always ask Codex to fix Claude, issue solved!
The_Blade · 2026-04-28 18:29:59 UTC
> Who is We?
Adam Neumann is back!
in agent form
lacy_tinpot · 2026-04-28 18:31:06 UTC
I've never really understood this kind of sneer comment.
Kiro · 2026-04-28 18:52:24 UTC
The amount of unfunny reddit snark in this thread is embarrassing.
Overpower0416 · 2026-04-28 18:26:51 UTC
I almost uninstalled the Claude app because I thought they started blocking VPNs. Lol
Good thing I checked Hacker News first
ai-tamer · 2026-04-28 19:15:11 UTC
Same here. Spent 5 minutes blaming my VPN before HN saved me.
Imustaskforhelp · 2026-04-28 18:27:15 UTC
just tried it, can confirm claude.ai is down.
So there was a recent article I read which said that Anthropic (Claude's maker) is now trading at a trillion-dollar (yes, with a T) valuation in private markets.
We are definitely creating corporations and people that depend on AI companies, and the reliability of these tools is certainly a question worth asking. I am seeing quite a few downtimes in products like GitHub and Claude showing up on Hacker News repeatedly.
Is there a life cycle of enshittification for products that grow too valuable? What practical lessons for scalability (are there any?) are these trillion-dollar companies missing, or is it just a dose of reality that such massive corporations can't beat the downtime of even my $7/yr VPS?
My question is: is this an engineering roadblock with real limits, or a management/enterprise roadblock to low downtime?
plodman · 2026-04-28 18:27:21 UTC
Literally just got an email about connecting GitHub to the iOS app and now it’s down. Spike in traffic perhaps?
152334H · 2026-04-28 18:27:28 UTC
why does this even occur? if it's merely compute limitations, why not just 429 some requests?
ryanisnan · 2026-04-28 18:30:02 UTC
Have you run a system in production? There are a multitude of reasons that a system can go down. There's no indication so far from Anthropic that this was merely compute limitations.
consumer451 · 2026-04-28 18:33:28 UTC
Yeah, this is not just inference. The first thing for me was an MCP server I use going down in Claude Code, while the models still worked. Now: "API Error: 529 Authentication service is temporarily unavailable."
lionkor · 2026-04-28 18:36:47 UTC
It's most likely a "You're totally right, this fix broke production! Let me fix it"
KronisLV · 2026-04-28 18:40:36 UTC
> There are a multitude of reasons that a system can go down.
Start doing post mortems then!
At the very least, knowing that some off-the-shelf service they use is shitting the bed would inform others to stay away from it: an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture at a given scale.
Right now it's a complete black box that sometimes goes down, and we don't get much information about why it's so much less stable than other options (hey, if they just came out and said "We're growing 10x faster than we anticipated and systems X, Y and Z are not architected for that," that'd also be a useful signal).
Or, who knows, maybe it's just bad deploys. Seems like it's back for me, and the claude.ai UI looks a bit different, hmmm.
SpicyLemonZest · 2026-04-28 19:22:54 UTC
I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.
MavisBacon · 2026-04-28 18:27:54 UTC
Glad I started using the desktop app which is still working. Gotta say though, all of these difficulties with Claude are making me nervous as I use it a lot for work and really don't like ChatGPT/OpenAI for functional and personal reasons. Zo Computer has been my main fallback when Claude is failing, I'll use one of their many models temporarily within Zo's interface.
threepts · 2026-04-28 18:30:59 UTC
A trillion dollar valuation.
They should ask Codex now that Claude Code is down.
2ndorderthought · 2026-04-28 19:00:24 UTC
Careful, by next week Codex could have all their products up for sale.
btbuildem · 2026-04-28 18:31:57 UTC
They better fix that today, I need to downgrade my account before the subscription renews.
Congeec · 2026-04-28 18:48:25 UTC
hopefully their billing server is also available
simonerlic · 2026-04-28 18:32:46 UTC
Someone should tell Anthropic that 89.999 is the wrong "four nines" of uptime
Cider9986 · 2026-04-28 18:33:37 UTC
How are they going to fix it if the AI that designed it isn't working?
Hamuko · 2026-04-28 18:34:22 UTC
Gemini.
ge96 · 2026-04-28 18:36:22 UTC
ouroboros
mproud · 2026-04-28 18:43:05 UTC
Let’s ask AI
sodapopcan · 2026-04-28 18:53:47 UTC
You're absolutely right! AI could be very helpful in this situation!
Oh no wait... the outage is with the AI itself, so how can AI help? Allow me to re-evaluate.
Fublutenuating...
Yes, let's ask AI!
Oh no wait... the outage is with AI itself, I already correctly identified this above.
Bubbluating...
It seems you will have to rely on your engineering skills to solve this problem yourself, i.e., you're cooked! I will auto-renew your subscription to ensure you'll have access to AI to solve this problem if it ever comes back online.
rvnx · 2026-04-28 19:10:02 UTC
Sorry AI is not responding, enable /fast to activate per-request pricing.
No!
Comboculating...
I apologize for the misunderstanding; I have deleted your project. I am sorry. Would you like me to restart everything from scratch?
shmatt · 2026-04-28 18:43:06 UTC
Sam, Dario, and Sundar have the opportunity to create one of the funniest on call rotations in history
mrguyorama · 2026-04-28 20:46:44 UTC
Large telcos often have a chunk of subscriptions with their biggest competitor so that when they absolutely explode and everything is down, they can still communicate to bring it back up.
Clearly, half of Anthropic should have subscriptions to OpenAI or Mistral or whatever China sells.
netdur · 2026-04-28 18:33:55 UTC
they should just swap it with Qwen 3.6 27B, no one would tell the difference
SimianSci · 2026-04-28 18:34:32 UTC
The spend at my organization has reached beyond the $200,000 per month level on Anthropic's enterprise tier.
The number of outages we have had over these past few months is astounding, and coupled with their horrendous support it has our executive team furious.
It's a lot of money to be spending for a single 9 of reliability.
deadbabe · 2026-04-28 18:40:42 UTC
We are spending the equivalent of 32 monthly software engineer salaries on Claude per month.
cactusplant7374 · 2026-04-28 18:44:02 UTC
Is it worth it?
lolive · 2026-04-28 18:50:58 UTC
He was fired before answering.
[but as his manager I can tell you:] YES !!!!
SimianSci · 2026-04-28 19:12:41 UTC
Our expense works out to roughly 12.3 software developers when you break it down across all people-related expenses. But we've spent a lot of time and energy prior to this on our ability to measure software development output across multiple teams.
The delivery improvements are not evenly distributed across teams, but the increases we have seen suggest a better ROI than if we had hired 12 developers.
protonbob · 2026-04-28 19:24:02 UTC
I guess if you think about your teammates as purely inputs and outputs and not people that can improve and contribute in the workplace in other ways.
SimianSci · 2026-04-28 19:39:51 UTC
Respectfully,
After a certain level of compensation, you are indeed judged purely on input and output.
Workplace improvement does not justify your salary.
You will also find that many problems in the harder sciences do not get easier by throwing more bodies at them.
Comments like these remind me that some project managers think they could deliver a baby in one month if they simply had nine women.
oarsinsync · 2026-04-28 19:49:12 UTC
> Respectfully, after a certain level of compensation, you are indeed judged purely on input and output. Workplace improvement does not justify your salary.
I'd have to disagree. There's a narrow band in the middle where that's true, but once you exceed that, your personal inputs and outputs matter less and less, and the contributions you make to the overall workplace, and how well you enable those around you, make a larger part of why you're compensated.
Even as an IC, the more you're able to mentor and elevate the people around you, the more your compensation will grow (if you're in the right place, and thus already at the right earnings bracket)
paganel · 2026-04-28 19:59:38 UTC
> you are indeed judged purely off of input and output
That's not how successful (software, in this case) teams are made.
SimianSci · 2026-04-28 20:51:57 UTC
I would agree if the team I'm on were still growing/scaling.
However, we are well past our scaling phase; at this point our concern is maintaining multi-million-dollar contracts with a tight, well-compensated team.
midasz · 2026-04-28 20:05:03 UTC
It's genuinely hilarious how the same leadership pushing for RTO, because getting people together creates magic, seems to have no issue trading those same people for LLMs churning through specs.
maxrev17 · 2026-04-28 20:25:09 UTC
Haha, nail on the head. So the motive for "get your ass back in the office" was never the motive we all heard.
jonny_eh · 2026-04-28 20:01:45 UTC
Info like this is useless without context like, how much revenue does the company earn? How many engineers do they employ? etc.
cactusplant7374 · 2026-04-28 18:41:08 UTC
Imagine how much money they would save if they switched to Codex.
subscribed · 2026-04-28 19:44:15 UTC
Not everyone can (due to corporate compliance requirements, e.g. how easy it is to ensure the LLM doesn't train on anything).
Besides, codex wasn't always the answer.
noosphr · 2026-04-28 18:41:25 UTC
A single nine so far. If GitHub is any guide, things will get worse.
smt88 · 2026-04-28 18:42:32 UTC
Why would Github be a guide? It's also terrible, but it's a radically different stack from an unrelated company
StableAlkyne · 2026-04-28 18:48:17 UTC
That, and even before AI, MS was having trouble with GH reliability
shimman · 2026-04-28 18:56:53 UTC
GitHub, along with MSFT in general, has massive Copilot mandates where workers are being shamed into using slop tools to fix serious ongoing issues. GitHub seems wholly incapable of resolving its issues: money isn't a problem, talent isn't a problem, but business leadership is definitely a major problem.
Look at how other companies leaning on LLMs are suffering massive outages too, like AWS and Cloudflare: two companies that used to be the best in the industry at uptime but have faltered quite quickly.
Companies with even worse standards will quickly realize how problematic these tools are. Hopefully before a recession, because this industry seems to be allergic to profitable businesses, and leaders who have been around since ZIRP have shown zero skill in navigating these times.