p.enthalabs

GLM 5.2 beats Claude in our benchmarks

semgrep.dev

Comments

One can also try https://neuralwatt.com using it in opencode.

I think they give $5 in trial credits to test with any of the open weight models.

Initially, I was confused where to find their open weight model offering. It's here: https://portal.neuralwatt.com
You can use GLM in OpenCode with a z.ai subscription by default as well. Also it'd be good if you mentioned you were involved with nemesis8.
I think it would be good not to suggest someone run a new Chinese agent on their bare metal.

When I posted the comment I was both the first commenter and the first person to upvote the submission. That matters. My name is ALSO on the open source repo that allows OpenCode to be run in a container.

That's transparency, maybe not here, but on a clickthrough to GitHub it is immediately obvious.

> I think it would be good not to suggest someone run a new Chinese agent on their bare metal.

Not sure a project nobody knows or uses is much better in this regard?

GLM export controls incoming? I predict Commerce will force OpenRouter, HuggingFace to take some open models down within the next few months.

Not that it would make any sense.

>GLM export controls incoming?

US imposing export restrictions on a model from China?

While unlikely, it is not without precedent: there are restrictions on ASML, a Dutch company, selling EUV machines.
ASML complies as an ally, why would China comply?

The weights are already available and downloaded, is it going to be a crime to have them, run them, make them available? Constitutional rights still exist (I hope)

> is it going to be a crime to have them, run them, make them available?

Now you're getting it! Commerce will call it a munition and those harboring it as harboring illegal/foreign munitions.

No business will take the hit, so they will quickly deplatform the models.

No end user has the GPU capacity to use GLM 5.2 or similar models at full precision so the government will call the problem "mostly solved." But they might choose to "make examples" out of a few people using p2p software to download the weights if they choose to.
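For a rough sense of the "no end user has the GPU capacity" point, here is a minimal back-of-the-envelope sketch. The parameter count is a placeholder assumption (no figure for GLM 5.2 is given in this thread), the 24 GB card size is likewise an assumption, and the calculation covers weights only, ignoring KV cache and activations.

```python
import math

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# ASSUMPTION: hypothetical parameter count, not a published GLM 5.2 figure.
HYPOTHETICAL_PARAMS = 700e9
# ASSUMPTION: a high-end consumer GPU with 24 GB of VRAM.
CONSUMER_GPU_GB = 24

for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, precision)
    gpus = math.ceil(gb / CONSUMER_GPU_GB)
    print(f"{precision}: {gb:.0f} GB of weights, at least {gpus} x 24 GB cards")
```

Under these assumed numbers even aggressive quantization leaves the weights well beyond a single consumer card, which is the commenter's point about hosts being the practical choke point.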

Or we use the models to work on fixing vulns and stop over-blowing the doom scenarios. Gotta save the kids and kill the terrorists though!

I'm for making software better instead of banning it based on what the rich and powerful claim.

I suspect the real fear is that open weight models undermine the financials and token prices they thought were going to pay off their ludicrous spending because they have all raced and raised hardware prices.

> making software better instead of banning it

That would be the rational thing to do.

> financials and token prices

I do not think the government thinks this deeply. Market manipulation might be a rational, if unethical reason to ban open source models.

But this admin banned Anthropic models to "own the libs." They will continue to ban what they want for whatever reason they want. I don't think those reasons will be particularly coherent.

Yeah, the current admin is reactionary; they appear to put little thought in, or at least disregard input they dislike. I don't think Ant's ban was about "owning the libs" as much as it was asserting dominance over someone who spoke up counter to the admin's aims and claims. They do listen to money, which is where I see Big AI paying for executive orders (because the admin forgot what it means to compromise as part of legislating for all Americans).
> making software better instead of banning it

We're still in the middle of the cambrian explosion.

If Anthropic was capable of developing Opus 4.49-4.5 2H 2025.... then any company with a research team capable of reading all the papers and press releases will be capable of producing Opus 4.8 by the end of 2027, either raw model competency, or in a harness like claude code (or better with both). I guess what I am trying to say is that Opus 4.5 does not represent the edge of agentic capability, merely somewhere in the thick meaty layer of "functional and achievable".

We can draw the line at Sonnet 4.6 in the US but much like encryption export restrictions in the 1980s, the line drawn will be laughably low within a few years and simply unthinkable in a decade.

> it going to be a crime to have them, run them, make them available?

Yeah. Illegal numbers.

DeCSS was short enough to fit on a T-shirt. Americans are larger these days, but not by enough to fit a decent LLM's weights on an XXXXL shirt, even double sided.
That too has precedent: there is a long history of controls on cryptographic algorithms up until the 90s. It wasn't abstract either; older greybeards will remember that browsers like Netscape had two versions, International and U.S., for this reason.

If you classify AI as a weapon, which seems to be the direction we are all heading in, then yes, First Amendment rights likely won't apply.

That’s because the Department of Energy originally funded and contributed IP to the EUV Corp joint venture between several semiconductor companies (including ASML and Intel). Their ability to export control EUV was part of that original agreement that the entire technology is built on.
It’d be restrictions on Americans and American companies, and probably also pressure on America’s allies.
Token smuggler sounds like a profession coming soon. For distillation and stuff.
I mean, there are already places where you can buy tokens at 10% of their original cost.
How would that even work for an open-weight model?
Go after the hosts, 99% of people won't be able to run this locally even if they wanted to.
They can easily issue an order for any American company to stop hosting/serving the models. If the model was a threat to national security because of its capabilities then a lot of other countries would follow, including China. No nation will allow some vibe coder with a rogue AI to pose a threat to their systems.

The reason GLM-5.2 hasn't been banned is that despite these cherry-picked use cases, GLM-5.2 isn't even close to Opus across all use cases. These vibe benchmarks are run by companies that are not part of the cyber services offered by Anthropic and OpenAI, where the models can be used without the safeguards and refusals so their actual cyber capabilities can be utilized.

These guys that wrote the article compared a gimped Opus to GLM-5.2, knew full well it's misleading, and got the clicks regardless. They don't have enough clout to be a part of something like Project Glasswing, GPT Cyber, etc.

If that happens it'll be an absolute disaster. Imagine a scenario where Anthropic and OpenAI prohibit most US companies from using their latest models because of safety.. And meanwhile attackers use equivalent open source models to attack US companies.

Any prohibition on open source models will do nothing to fix the problem.. since attackers will never feel bound to the law. All advanced models must be available for defensive purposes.

Right, but is there any evidence of intelligence behind any of these (government) decisions? It’s just regulatory capture + marketing (plus some people living out an imaginary fantasy that they’re in Neuromancer or something), absolutely no reason to think they won’t try and target open models as part of this.
There's at least one reason: much harder to make a profit in policing non-american companies and open-source models without huge (or even any) MRR.

If the real motive is profit, then open source models are likely simply not a viable means to that end.

> since attackers will never feel bound to the law.

But that's the whole point.

Fall out of favor with the admin and you lose access to the good American models, aren't allowed to use Chinese ones, and fall prey to the attackers and behind your competitors.

It'd be less about "safety" and more "we've spent trillions developing these AI tools only to have the Chinese, once again, copy them and offer them for pennies on the dollar, and no one seems to care about the impact that has on the long-term sustainability of this sector of the American economy as a whole, so we're yanking the models."
"I'm going to take this box razor and make some really deep cuts around the middle of my face because my tech sector is too good and that's actually a bad thing because $foreigners."
I'm not saying it's necessarily a good thing. I'm also not saying it's about foreigners at this point. It's about seeing a bet through. They've burned a metric crapload of capital on developing AI models and the infrastructure to host them. They want that money back and then some. Remember, the fine shareholders of OpenAI think that 100x returns just aren't reasonable and want that cap lifted. If this kind of thing continues, they'd be lucky to make their money back at all, let alone 100x.

Which would be fine, but as we know, people securitize the crap out of their investments these days, and at least some people probably leveraged themselves on some US AI companies, so now the risk is spreading outside of the sector to the economy in general, which is made worse by the sheer amount of spending on AI.

And someone will start a competing company in a sane environment.
OpenAI and Anthropic are already unable to make SOTA models generally available (and support this, oddly enough).

If huggingface or whatever is forced to take down open source licensed weights, there’s always bittorrent.

Export controls are one thing, but the US doesn’t really have import controls, and there’s no copyright issue, so DMCA, etc don’t come into play.

It’d take the courts years to decide how to contort the law to ban open weight models, and by then, it’ll be too late (and also pointless).

They did the same by banning strong encryption. Never underestimate the stupidity of politicians.
I think state-of-the-art AI is going to be defense industry only from now on. We can have our toy drones but not the Predators and Reapers.
The things that empower modern toy drones were export restricted for years beforehand.
Turns out toy drones are more useful in war than multi million dollar planes anyway.
Reaper and Predator are both drones and there’s really no comparison to toy drones in terms of sheer destruction and capabilities in general, the comparison is actually quite apt imo.
Which ones are the ones Ukraine has used to bomb Moscow?
You're right. Toy drones have proven vastly more effective IRL.

The others are a waste of taxpayer money. Extraordinarily low return on investment (kill-on-investment?)

The Americans may ban the use of the Chinese models in America. But like the Chinese car ban, everyone else will use them.
That's not necessarily a good thing for everyone else, mind.

Yes, you get your free model, but the cost of this is not developing your own capability and tying your fate to a country which may or may not have your best interests as a nation in mind.

This is just the deindustrialization that occurred in my home region (the American Midwest) playing out on a global scale in different sectors. It was originally driven by the Japanese, who, to their credit, acted more as partners than competition. Eventually that desire for larger margins went to China, and now you basically can't build anything of consequence without at least some Chinese parts, because there's "no economic case" for it. This means that you have to play Beijing's game if you want access to any sort of modern market.

You see this happening with Volkswagen's restructuring, next you'll see it with non-American, non-Chinese AI.

It's not really the same because we already have the model. If China stopped letting us have it tomorrow it doesn't matter because... We have it already
So... how's that any different from using American stuff for those of us in the rest of the world?

Over the last decade, the US has been way more unreliable than China. There's been a near constant negative impact from the US doing something.

At least with China, we are very good at winning trade wars with them here in Australia.

You might feel differently if you were a Filipino or Vietnamese fisherman whose family relied on the income from the stocks of the South China Sea, or a Uighur person living in Western China, or a Ukrainian soldier who has to deal with drones built with Chinese components, or a democracy advocate in Hong Kong, or arguably, a person who had plans for 2020-2021.

Or, on a more local note, an Australian automotive worker who worked for a company that figured out 10 years ago that they wouldn't be able to pay him a decent wage, compete with the then-upcoming Chinese EVs, and remain profitable.

You might feel different if you're a Palestinian who's getting American bombs dropped on him, or Afghan collateral damage or...

There are no good guys in general, and whataboutism and making the scope bigger doesn't help.

The thing is that if the models you are building on are open source, whether hosted on a Chinese / American / whatever service, they at least give you the option to switch providers more easily, vs. a Fable / ChatGPT 5.6 that gets banned for non-Americans etc...

2 years ago America would have had the branding/perception advantage but right now that is well and truly gone...

> Palestinian who's getting American bombs dropped on him

So far as I know, there have not been any offensive American operations in the West Bank or Gaza. Do you have sources for any?

> There are no good guys in general, and whataboutism and making the scope bigger doesn't help. The thing is that if the models you are building on are open source, whether hosted on a Chinese / American / whatever service, they at least give you the option to switch providers more easily, vs. a Fable / ChatGPT 5.6 that gets banned for non-Americans etc...

Then it'd make far more sense to use a native open-source AI project than to use Chinese AI. Personally, I've been looking into Mistral AI for my own uses.

> 2 years ago America would have had the branding/perception advantage but right now that is well and truly gone...

Hardly. Let's not pretend that much of the world, particularly Europe, hasn't had a long and storied history of staring down its nose at Americans. At a certain point you just stop caring, regardless of your political persuasion.

More whataboutism: American Indians, Aborigines, Māori, Sami, New Caledonia, the Kanak people, what do they all have in common? Sent to re-education camps at some point in time, some of them sterilized, and all treated as second-class citizens. One of the reasons most countries are relatively quiet about the Chinese is that so many other countries have indigenous people that were treated pretty much the same at some point in time in their history…

Stop pretending there’s some type of moral high ground there isn’t. Disgusting.

> Or, on a more local note, an Australian automotive worker who worked for a company that figured out 10 years ago that they wouldn't be able to pay him a decent wage, compete with the then-upcoming Chinese EVs, and remain profitable.

I don't understand what your point is? This seems like a perfect example of comparative advantage - Australia can produce iron ore cheaper than anywhere else in the world and even when China launched a trade war against Australia the Australian economy kept growing.

There wasn't even any bump in unemployment from the closing of the car industry.

Once that trade war was settled, Australia got cheaper cars, China got cheaper iron ore and both economies won.

The rational behavior on both parts there is in stark contrast to current US policy, which is unpredictable and capricious.

> You might feel differently if you were a Filipino or Vietnamese fisherman whose family relied on the income from the stocks of the South China Sea, or a Uighur person living in Western China, or a Ukrainian soldier who has to deal with drones built with Chinese components, or a democracy advocate in Hong Kong, or arguably, a person who had plans for 2020-2021.

This seems like a random list of complaints about China and I agree with them in general.

I think you'll find most major powers have similar complaints. There certainly are against the US - I think you might find that both the Philippines and Vietnam(!) have fairly mixed feelings about the US for example.

> or a Ukrainian soldier who has to deal with drones built with Chinese components

man you're gonna be disappointed when you learn where the components for Ukrainian drones come from (spoiler alert: it's China. 95% of Ukrainian drone manufacturers use Chinese components. Both Ukrainian and Russian drones are Chinese components glued together; the vendors in China literally stagger Russian and Ukrainian buyers on the factory floors so they don't run into each other). The largest trade partner of Vietnam and the Philippines is China.

The kind of thinking that assumes that rivalry implies deglobalization or bloc politics is exactly what's 30 years out of date. It's projecting how Americans think on the entire world, but that's not how the world works any more. The rest of the world continues to globalize, even through war.

America is undergoing Sovietization and erecting an Iron Curtain, and China ironically enough is simply doing what the US used to do. If Americans think the rest of the world will follow them into isolation they're going to make the same discovery the Russians did in the last century.

Technically speaking, Chinese cars have not been banned. They are subject to a 100% tariff. They’d still be price competitive, but the manufacturers haven’t bothered jumping through the regulatory hoops.

I’ll happily pay a 100% tariff on open weight models, and there are no regulatory hurdles for them to jump through (yet).

Cool then everyone will just change their config to route through a provider overseas for an added 50-100ms latency. Who cares.
Countries and businesses that don't want to be sanctioned by the US government or the US financial system care - so all western countries and corporations.
> GLM export controls incoming? I predict Commerce will force OpenRouter, HuggingFace to take some open models down within the next few months.

I’m sceptical they could find the legal framework to do this even if they wanted to

They have legal authority to (a) prevent export of US goods/services; (b) ban imports of physical goods; (c) ban transactions (including purchasing services or license agreements) with foreign firms

But I’m not aware of any legal authority which lets them ban US firms from running a Chinese-developed open source AI model in the United States, if they are at arms length from the vendor, and aren’t using it for government contracts or regulated applications

Possibly they could order HuggingFace/etc to suspend Chinese accounts. But if someone in the US (or a third country) downloads the model from China then reuploads it to a US server, completely independently of the vendor - where is the legal hook to prohibit that?

They could ban payment processors from processing payments to any hosts of GLM 5.2. Despite the open weights, the vast majority of people will be using cloud providers to get access, since it is too heavy to host for 99% of people.

This would be extremely heavy handed and would probably end up accelerating the loss of the virtual US monopoly on payment networks. The rest of the world isn't going to let the US dictate that only they get the frontier models, whether they're US made or otherwise.

> They could ban payment processors from processing payments to any hosts of GLM 5.2

Can they actually though? Do they have legal authority to tell a payment processor that it has to block transactions of a legal US company, just because the company is hosting a Chinese-developed open source model? I’m sceptical

And what about companies (e.g. AWS) that let you “bring your own model”?

It would be extremely heavy handed but the administration has sanctioned the International Criminal Court judges such that they basically have no access to the West's modern financial system. I think domestic US providers would have to be dealt with in different ways, but someone like Hetzner could easily be cut off from the financial system if the administration doesn't feel that they are adequately blocking the model.
Swapping the footgun for a huge long-range boomerang doesn’t mean it’s not going to eventually swing around and whack you in the back of the head.
100% agree and don't think it will come to that but I won't completely put it past this administration
> It would be extremely heavy handed but the administration has sanctioned the International Criminal Court judges

That's sanctioning specific individuals for specific acts they performed which the US claims contravene its interests and those of its allies.

I don't agree with the ICC sanctions, but it really can't be compared with the proposal "sanction any company, even US domestic entities, which use a Chinese-developed open source model".

In fact, I think part of what enables the US to sanction them (under US law) is the fact they are neither US citizens nor residents; if they were US citizens living in the United States, I don't think the President would have the legal authority to impose those kinds of sanctions.

They could sanction Hetzner–because it is a German firm based in Germany. I don't see how they could sanction a US firm based in the US whose owners and staff were US citizens.

Also, the 5th Circuit Court of Appeals decision Van Loon v. Treasury (Nov 2024) is relevant: it held that IEEPA (the law used to sanction ICC officials) couldn't be used to sanction the Tornado Cash smart contract system, since open source code wasn't "foreign property" under IEEPA.

Label AI as porn and the payment processors will cut their ties automatically.
> I’m sceptical they could find the legal framework to do this even if they wanted to

I agree, my only caveat is that the current administration has shown it's willing to go beyond aggressive regulatory interpretations to questionable and outright implausible interpretations. As we've seen recently, the federal courts and SCOTUS are overturning most of these but that can take a year or more to resolve. The one positive light is they seem to push the hardest on certain culture war issues (immigration, voting, districting, etc). AI doesn't seem like a core hot button issue for the White House and there is a strong pro-AI / business faction.

OpenRouter or Huggingface should consider moving to Switzerland
Obvious answer: build all your open source LLMs into firearms, get the SC to grant 2A protections.
Here, it appears they compare a single prompt "find IDOR", against a multi-agent system. However, one can also start far more sophisticated skills that spin up subagents and mostly do the same in Claude Code, Codex, OpenCode, Pi, etc.

Which I guess makes what semgrep sells obsolete. Unless they have built a pareto-optimal point in terms of capabilities and token usage maybe?

I think the point is less "how can we throw shade on the OP" and more "a harness can enable a lot of models to do very serious cybersec, glm 5.2 is one of them"
Are you replying to a response to the original comment? I looked but I didn't see anyone saying he's throwing shade.
You have to forgive the GLM bot. It's not very good.
It reads like an ad.

Secondly these are "just" IDORs, arguably the easiest class of vulnerabilities.

Thirdly it compares to GPT 5.5 and Opus 4.8.

No, we don't have Mythos at home.

>Thirdly it compares to GPT 5.5

mythos is <10% ahead of gpt 5.5 on all benchmarks, which it gains by being several times the size of opus. had it been economical to provide, it would've been released to the public on day one instead of the marketing circus those effective altruism clowns had exhibited. admitting that it costs >1000% to run inference on a <10% better model would've been very damning.

> it costs >1000% to run inference

do you have a source for this claim? i thought LLM providers earn high margins from inference (charged by token). is this no longer the case?

This was just theorised. The leaked OpenAI financials suggest otherwise (because of shady naming of losses)

The only ones who seem to profit are the ones running smaller Chinese models. Even NVIDIA seems to have to "reinvest" their profits into sponsoring companies to buy their cards now.

if a $6000000 cabinet can generate 10000/s tokens of Opus but only 1000/s tokens of Mythos, then Mythos costs 1000% to run no matter the markup.

no one has a source, because no one knows closed model parameter counts. we have only heuristics which strongly indicate that Mythos is simply a big fucking model that any other lab could make an equivalent of.
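To spell out the arithmetic in that argument, here is a minimal sketch. The cabinet price and the two throughput figures are the commenter's hypothetical numbers, not measurements; the three-year amortization window is an extra assumption added here, and it cancels out of the ratio.

```python
def hardware_cost_per_million_tokens(cabinet_cost_usd: float,
                                     tokens_per_second: float,
                                     lifetime_seconds: float) -> float:
    """Hardware cost amortized over every token produced in the lifetime."""
    total_tokens = tokens_per_second * lifetime_seconds
    return cabinet_cost_usd / total_tokens * 1_000_000

CABINET_COST_USD = 6_000_000             # hypothetical figure from the comment
LIFETIME_SECONDS = 3 * 365 * 24 * 3600   # ASSUMPTION: 3-year amortization

# Hypothetical throughputs from the comment: 10000 tok/s vs 1000 tok/s.
opus = hardware_cost_per_million_tokens(CABINET_COST_USD, 10_000, LIFETIME_SECONDS)
mythos = hardware_cost_per_million_tokens(CABINET_COST_USD, 1_000, LIFETIME_SECONDS)
ratio = mythos / opus

print(f"Opus:   ${opus:.2f} per 1M tokens")
print(f"Mythos: ${mythos:.2f} per 1M tokens")
print(f"Ratio:  {ratio:.1f}x")  # the "1000%" in the comment
```

The ratio depends only on the throughput gap, which is why the claim holds "no matter the markup": whatever margin is charged on top, the same hardware serves a tenth of the tokens.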

In my experience, GLM 5.2 is extremely good at finding vulnerabilities, and more importantly, unlike Opus, I've never seen it refuse a command. It genuinely is a very strong model for finding and fixing vulnerabilities.
More importantly, unlike Mythos and Fable, you can actually use GLM 5.2! It's not just marketingware that got its founder in hot water with the government.
Yeah they straight up say that their criteria is narrow and primarily important for their specific use case. Never let rationality cause your pitchfork to be cast away though!
Technically we don't have Mythos at all? You guys have access. This tells me we have Opus at home (open weights).
> Thirdly it compares to GPT 5.5 and Opus 4.8.

> No, we don't have Mythos at home.

That's still useful. To paraphrase the kids these days, GLM5.2 is in the room with us, today. Mythos is not. And for us in the EU, it's even more complicated, as Mythos might be with us in the room one day, and go poof the next day, on the whims of political entities that we have 0 control over.

Knowing where open, accessible, local models are is important. We know they're behind. But there comes a time when "good enough" is useful. Even if they're "just IDORs" today, and even if they're behind SotA today.

As someone else said above, GLM5.2 (and other models in the same tier like kimi, dsv4, etc) is / are slowly becoming "good enough" to assist in automated repo preparation work (download, install, test, edit, re-test, etc). And that translates into RL traces ready to be trained into the next generations. That might be more important than being x% behind on benchmarks.

> beats Claude in our Cyber Benchmarks

Beats which model in Claude? Whenever a "benchmark" doesn't put precise model numbers in their headlines I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).

It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.

Opus 4.8 according to TFA. Whether or not the safety guardrails were responsible for the difference is an open question but for a dev who wants to secure their software who doesn’t work at one of the blessed Glasswing companies it doesn’t really matter why, it matters what the best tool you actually have is.
They say "Claude Opus 4.8" in the first paragraph.
We're supposed to read the article?

How are we supposed to stay skeptical of everything if we read anything!?

Anthropic's own models perform differently under the same version depending on how much they've decided to quietly downgrade them.
These numbers seem pretty low compared to what I was able to achieve, specifically around the Windows kernel, win32k<->win32u to be exact. It honestly wouldn't surprise me anymore if China started surpassing the models that the US makes public, at least in specific categories such as cyber.

GLM 5.2 is already capable enough to assist in self-training which is similar to what we saw happen with frontier models and they appear to be getting there at a significantly lower cost than openai/anthropic.

It will almost for sure surpass the models which Trump will allow US "allies" (which he just considers client states) to use. This, together with China's growing dominance in PV, rechargeable batteries, EV, could really be the nail in the coffin for the post WWII economic world order.
Honestly, it's becoming increasingly hard to disagree with such sentiment when China is preparing itself to lead in energy, manufacturing, research, chip production and so on while there's an entire group of people trying to put datacenters in space.
You are delusional if you think China is going to let Europe have access to Mythos level models for free.
To hurt the US, maybe. I have not tried it, but GLM here seems already pretty capable.
What does "free" have to do with anything?