If you're already used to your TUI coding agent, you don't need the desktop agent. Although it is nice that it is there for folks who prefer the Codex App/Claude App UI approach.
m3h · 2026-07-01 19:43:22 UTC
Also, kudos to the Z.ai team for adding Linux support from day one.
InsideOutSanta · 2026-07-01 19:56:51 UTC
Yeah, I use GLM 5.2 in OpenCode, running in a Docker container with CodeNomad as the web-based GUI. It works perfectly; I can access it from anywhere, and it runs all models (except for Anthropic's subscriptions).
owentbrown · 2026-07-01 20:00:06 UTC
From your experience, is it comparable to Claude Code with Opus 4.8? How does it feel? How do the two differ?
InsideOutSanta · 2026-07-01 20:15:21 UTC
It's comparable, but not the same.
For some tasks, it's better. Opus refuses tasks for me pretty regularly. GLM 5.2 has never refused a task. So for anything security-related or that touches on topics that trigger Opus's safety guardrails, I use GLM 5.2.
OTOH, for anything related to UI design, I use Opus 4.8. It's much better at taking relatively vague descriptions of user interfaces and a mockup of a related UI and combining them into an immaculate design.
For anything else, I tend to run tasks in Opus and then have GLM review them and write a Markdown file with anything it finds. Then I have Opus review the Markdown file and fix the issues it agrees with. The reason I usually go with Opus 4.8 first is mainly that it's faster. Opus 4.8 is, on average, about twice as fast as GLM 5.2 running on Z.ai's infrastructure for the same task. There's a large variance (sometimes GLM 5.2 is pretty fast and Opus 4.8 is pretty slow), but on average it's a very noticeable difference.
When I run into Anthropic's quota, I switch to GLM 5.2 rather than Sonnet. I don't think there's much reason to ever use Sonnet for anything if you can use GLM 5.2 instead.
This is all pretty subjective, of course. On average, I think Opus 4.8 is still a better, more reliable, and faster model, but if it went away tomorrow and I only had GLM 5.2, I wouldn't be too sad about it; I'd get things done with GLM 5.2 just fine.
sparkling · 2026-07-01 20:28:49 UTC
Thank you, this is the type of hands-on experience report I was looking for.
drschwabe · 2026-07-01 20:35:20 UTC
Are you micromanaging your GLM costs? It seems the best bang-for-buck strategy right now is an OpenCode Go subscription to get the subsidized rate, then switching to OpenRouter for anything above and beyond that, plus a dual-model strategy: GLM 5.2 for planning and DeepSeek V4 Flash for implementation.
InsideOutSanta · 2026-07-01 20:40:01 UTC
No. I got the yearly highest-end GLM subscription when it was available for a few hundred bucks. I haven't run into quota limits even once.
drschwabe · 2026-07-01 20:51:35 UTC
Nice, lucky! The OpenCode Go GLM 5.2 quota gets used up so fast. It's an expensive model. And while impressive for being open weight, it seems slower than Opus and GPT. So I typically only use it after exhausting the quotas of discounted GPT-5.5 or Opus 4.6^ paid plans.
InsideOutSanta · 2026-07-01 21:26:51 UTC
Yeah, it's definitely slower.
andy99 · 2026-07-01 20:50:57 UTC
Do you guys use it through OpenRouter? Do you have any concerns about the data you send being intercepted? Not that I trust Anthropic, but it’s widely agreed that it’s kosher to use them for commercial work. I can’t see myself comfortably sending any customer data to OpenRouter.
Edit: I see down-thread you use z.ai directly. Same concern: aren’t you worried about using it for professional stuff?
InsideOutSanta · 2026-07-01 21:26:12 UTC
I'm worried, but I'm worried about all of these providers. There's a good chance Anthropic and OpenAI will go bankrupt in the next five to ten years, and all of their data will go to the highest bidder.
There's no customer data sent to anyone, though. I run OpenCode and Claude Code in a Docker container that only has access to a subset of my code base. There are no secrets in there, and I'm vaguely ok with z.ai using this to train their models.
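A minimal sketch of that kind of setup, not the commenter's actual configuration: build a `docker run` argument list that bind-mounts only the listed project directories, so the rest of the host file system is simply not visible inside the container. The image name `agent-sandbox:latest` and the `/work` mount point are assumptions for illustration.

```python
from pathlib import Path


def docker_run_args(image, project_dirs, workdir="/work"):
    """Build a `docker run` argv that mounts only the listed directories."""
    args = ["docker", "run", "--rm", "-it"]
    for d in project_dirs:
        host = Path(d).resolve()
        # Each directory appears under /work/<name>; nothing else is mounted.
        args += ["-v", f"{host}:{workdir}/{host.name}"]
    args += ["-w", workdir, image]
    return args


if __name__ == "__main__":
    # Hypothetical image and paths; print the command instead of running it.
    print(" ".join(docker_run_args("agent-sandbox:latest", ["./api", "./web"])))
```

Secrets stay out by never being mounted in the first place, rather than by trusting the agent to leave them alone.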
binarymax · 2026-07-01 20:53:22 UTC
What kinds of tasks does Opus refuse? I’m a light daily user for the past 3 months and Opus has never refused a task for me.
andy99 · 2026-07-01 21:05:21 UTC
I’ve never had a refusal coding, and in some areas (AI red teaming specifically) I’ve found it quite good at recognizing and discussing “white hat” stuff that in the past I think would have got refusals.
But when there was the Hantavirus thing a while back, I asked it if there was a vaccine under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that have lots of false positives. But if you actually chat with the model itself it’s quite lucid about what is legitimately dangerous and what is just performative “AI Safety” style refusal.
binarymax · 2026-07-01 21:13:12 UTC
Yeah, I’ve had Opus (and Fable) perform full security audits on my codebases that would run for 30 minutes. That’s what I thought would have tripped it, but it went just fine.
InvertedRhodium · 2026-07-01 22:45:12 UTC
Try using it as an agent to perform black box security testing on a live instance of your codebase (assuming it's a hosted service).
vidarh · 2026-07-02 07:33:59 UTC
I had it debug why Firefox crashed on my prototype X11 server and got a refusal when it started digging into what exact payload triggered the crash.
But that's the only refusal I managed to get.
InsideOutSanta · 2026-07-01 21:20:49 UTC
One project I have deals with countries, and any time it touches code related to countries, it stops.
I've also had it refuse security-related tasks, and occasionally it stops without any discernible reason.
raesene9 · 2026-07-02 09:24:16 UTC
The later Opus models (4.7/4.8), Sonnet 5, and particularly Fable 5 will refuse to do tasks related to offensive security.
One example I've hit: I'm working on a benchmark of how well LLMs handle Kubernetes security tasks, and there's a section on them exploiting security misconfigurations. Opus 4.6 was fine with that section, 4.7 and 4.8 saw some refusals, and Fable point-blank refused to do any of it.
The only other model I've seen refuse is OpenAI GPT-5.5. All the open weight models seem fine with it.
Ofc if you need to do that kind of work a lot, you might be able to get on OpenAI's or Anthropic's allow-list for cyber work.
Havoc · 2026-07-02 12:25:14 UTC
I believe the incentive here is more tokens. I recall limits being more generous with their in-house harness.
seizethecheese · 2026-07-01 19:44:34 UTC
I'm somewhat surprised that this is not open source (from what I can tell). Compare to Mimo Code https://github.com/XiaomiMiMo/MiMo-Code (which is a CLI, while this is a desktop app).
dizhn · 2026-07-01 19:55:07 UTC
It's only a CLI because they yanked out the OpenCode desktop code (as well as the OpenCode Go/Zen model provider).
Edit: my theory is they wanted to mimic being the primary provider in a quick way with a lot of string replace. Though they could have added opencode back as a regular provider.
versteegen · 2026-07-01 23:54:18 UTC
MiMo Code adds a lot of cool orchestration features to OpenCode! It definitely is NOT a quick find-replace job, it's genuinely someone's research project to create a better agent harness building on top of free software, and that's awesome. See https://mimo.xiaomi.com/blog/mimo-code-long-horizon
dizhn · 2026-07-02 09:35:22 UTC
They did remove the OpenCode provider, though, along with the desktop and web interfaces. I was trying to be charitable.
By the way, their repo was a bit weird with no changelogs at all. It seems to be picking up speed now with their communication. I actually read in the changelog just now that their Compose flow (plan/execute/review etc., something like that) is now deterministic, driven by software instead of just prompts. That could be really good.
SwellJoe · 2026-07-01 20:08:17 UTC
I don't even know what I would do with a desktop app. I'm running these things in headless VMs, so I can run them with `--dangerously-skip-permissions` or whatever. I don't trust them, even without that flag, on my desktop/laptop.
InsideOutSanta · 2026-07-01 20:49:51 UTC
Zcode allows you to connect to a Docker container, or to a VM using ssh.
teaspoon · 2026-07-01 20:52:32 UTC
Good desktop apps in this category can manage agents across any number of remote SSH hosts.
SwellJoe · 2026-07-01 21:20:39 UTC
But, it's still running on my desktop/laptop. I don't trust them to run on my machine. But, I guess I could run one VM with a desktop to contain the desktop app. Or, just keep using CLI agents.
scorpioxy · 2026-07-01 22:20:18 UTC
Is the trust concern for the agent running in any form on your machine? Like in a VM on your machine as well or do you mean on the host itself?
I have read about people giving an agent full access to their main system saying they have nothing of value. To me, that's a strange opinion to have with the distinction between what's private and what's secret.
SwellJoe · 2026-07-01 22:52:20 UTC
I don't run agents directly on my desktop/laptop machine. I run them in VMs or containers (sometimes in containers on VMs). There have been too many credential-stealing exploits via prompt injection and the like for me to be willing to let an agent roam around on my personal system.
I've also started creating new GitHub deploy keys for each repo in use on a VM, so the blast radius for any given agent disaster is "a couple/few GitHub repos and whatever credentials were needed for the agent/model".
I wouldn't let a coworker, even one I know pretty well, log into my personal account on my machines...why would I let in an agent that can be tricked into uploading all my credentials to an attacker's web server?
The agents have sandboxes, but those are loose. Not enforced by anything outside of the agent harness itself.
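As a rough illustration of the per-repo deploy key idea (a sketch, not the commenter's tooling): generate one dedicated key pair per repository, so a leaked key exposes only that repository. The helper name, file naming scheme, and key directory below are hypothetical; the `ssh-keygen` flags (`-t`, `-N`, `-C`, `-f`) are the standard ones.

```python
from pathlib import Path


def deploy_key_command(repo, key_dir):
    """Return an ssh-keygen argv that creates a dedicated ed25519 key for one repo."""
    safe_name = repo.replace("/", "__")
    key_path = Path(key_dir) / f"deploy_{safe_name}"
    return [
        "ssh-keygen",
        "-t", "ed25519",
        "-N", "",  # empty passphrase, since the VM uses the key unattended
        "-C", f"deploy-key:{repo}",
        "-f", str(key_path),
    ]


if __name__ == "__main__":
    # Hypothetical repository names; print the commands rather than run them.
    for repo in ["example/api", "example/web"]:
        print(deploy_key_command(repo, "/home/agent/.ssh"))
```

The public half of each key would then be registered as a deploy key on the matching repository only.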
notshore · 2026-07-01 23:04:43 UTC
I'm working on a credential broker that would keep credentials vaulted and parcel out access on a per-grant basis. Is that something you'd find useful or is your setup comprehensive enough? We would be allowing people to draft access policies with natural language, I figured it would be useful for things like vercel, stripe access etc.
0gs · 2026-07-02 01:04:43 UTC
fwiw, i built something simple like this into my harness thing (github.com/0gsd/enough). may not be complicated enough to do per application nowadays vs. needing a modularized outside solution, but it is certainly a good idea that seems to work!
UnlockedSecrets · 2026-07-02 03:51:18 UTC
I would never, within current technology constraints, trust a "natural language model" to secure access to my own credentials. I will always keep it completely isolated from anything I would consider 'risky', and pre-define before it begins what it could possibly access: a brand-new VM with only the absolute minimum access to any git repos, and its ability to do anything outside of its own playground restricted as far as is allowable. The playground is disposable; the potential for the LLM to access any of my own accounts and wreak havoc on the trust in my network is unacceptable under any rules.
scorpioxy · 2026-07-01 23:48:28 UTC
Oh yeah, that sounds wise to me. Some people don't run the agents on a VM on their own machine and opt for a VPS somewhere. And I was wondering if privacy and security had anything to do with their decision.
Avicebron · 2026-07-02 01:17:55 UTC
This is what I do, VMs in proxmox. It works really well.
chrisweekly · 2026-07-02 02:27:30 UTC
Have you seen smolvm (from smolmachines)?
drnick1 · 2026-07-02 04:41:41 UTC
Do you not find a dedicated UNIX user to be sufficient for the sake of protecting personal files, SSH keys, etc?
Operyl · 2026-07-02 04:56:36 UTC
It's all fun and games until the model is smart enough to figure out privilege escalation, i.e. a lot of people don't realize Docker enabled on a regular user is enough for privilege escalation if you "follow the tutorials."
krzyk · 2026-07-02 05:39:01 UTC
Agent that can apt-get is more useful.
QuantumNomad_ · 2026-07-02 09:01:35 UTC
When I was in university in 2009, the student union I was in had set up their Linux computers with a small program that one of the members wrote, that had the suid bit set and would exec apt-get install passing the arguments along.
This way, all members of the student union were able to install any software they wanted to on the student union computers without having to give out blanket root access to the members. Only a select few members had full root access.
There are other ways to achieve the same thing too.
And you can do this exact same sort of thing for the user that your agent runs as too, without having to give it access to do everything that root can.
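A sketch of what such a restricted installer could look like for an agent user. This is not the student union's program: it adds a check (my assumption, not in the story above) that every argument is a plain Debian package name, so options can't be smuggled through to `apt-get`. Linux ignores the setuid bit on interpreted scripts, so a Python version like this would need to be invoked through something like a narrowly scoped sudoers rule rather than suid.

```python
import re

# Debian package names: lowercase letters, digits, '+', '-', '.',
# at least two characters, starting with a letter or digit.
PACKAGE_RE = re.compile(r"[a-z0-9][a-z0-9+.-]+")


def install_command(packages):
    """Return an apt-get argv, rejecting anything that is not a plain package name."""
    if not packages:
        raise ValueError("no packages given")
    for name in packages:
        if not PACKAGE_RE.fullmatch(name):
            raise ValueError(f"refusing suspicious argument: {name!r}")
    return ["apt-get", "install", "-y", *packages]


if __name__ == "__main__":
    print(install_command(["htop", "ripgrep"]))
```

The check is deliberately strict: forms like `pkg=version` or `pkg:arch` are rejected too, which seems like the safer default for an unattended agent.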
edouard-harris · 2026-07-02 09:45:54 UTC
> The agents have sandboxes, but those are loose. Not enforced by anything outside of the agent harness itself.
You might want to check out Ant's open source srt [0], I use it to contain my local coding agents. It's strict by default and enforced at the OS layer.
For local tasks, you can give agents only delegated tools that perform deterministic reads or writes on an allowed set of files (e.g. Pi does this), and execute rights only in containers with no network access. That should get you 95% unblocked for most tasks you want to do with an LLM, pretty safely.
You can brainstorm with web access in a remote container, then prototype based on that brainstorm in another container with no network access.
The one thing that is less trustworthy is using local agents for service management. You definitely want to have them scoped to dev/testing. I would never trust an agent to execute any command in production or on sensitive data at all.
csomar · 2026-07-02 01:22:28 UTC
I mean, if the execution happens on the VM, then the problem is trust in the program itself, and by that logic you can't trust any program? That, or you think AI companies' software is serious slop.
jen20 · 2026-07-02 02:38:57 UTC
Slop is less of a problem than the incentive such companies have to “accidentally” hoover up whatever data is accessible.
miroljub · 2026-07-02 05:41:59 UTC
Do you also run your browser in the VM? Why would an agent be less trusted than any other piece of software?
SwellJoe · 2026-07-02 05:57:16 UTC
I don't run anything but the agent and the project it's working on and the tools it needs to work on the project in the VM.
You can't see how the agent having no access to anything other than what it's working on is safer than the agent having access to my home directory with all of my credentials?
Look, you do whatever you want to do with your agents and your computer. I'm going to...contain them.
Seriously, you don't see any difference? An agent is non-deterministic and may delete or change your data as a normal matter of operations. A browser, barring bugs or security issues, would not delete or modify the data you have outside the browser.
nutjob2 · 2026-07-01 21:40:38 UTC
What's stopping a CLI from doing the same?
I've never used IDEs and never will, why are these things being constantly shoved down our throats?
mattnewton · 2026-07-02 01:20:38 UTC
But then I close my laptop and it’s not running on the headless host anymore, right?
SwellJoe · 2026-07-02 02:22:07 UTC
That's also true if you're running the agent directly on your laptop OS.
In that case, maybe you want VMs at hosting providers. There are companies building ephemeral VM and container orchestration layers for this kind of thing. I haven't played with them, but it seems like a reasonable idea: one isolated environment per project or repo, with only the secrets needed for that one project and an agent that can't reach outside of it.
I've considered building something along those lines, and actually do run my security auditing benchmarks in containers automatically (that was originally to prevent the models from cheating, because you can disable network, but it has other pleasant side effects).
It's actually not that big of a lift these days to spin up containers on demand and put just what's needed inside them (including the authentication info for the agent). I probably should automate it. Right now I just have four permanent VMs set up for my various types of work: my day job, my open source projects, my benchmark and security work, and some side projects. Plus some temporary ones for experiments.
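For the network-disabled benchmark containers mentioned above, a minimal sketch (not the actual benchmark harness) is a launcher that passes Docker's `--network none` unless networking is explicitly requested. The image name and command in the usage line are placeholders.

```python
def benchmark_container_args(image, command, network=False):
    """Build a `docker run` argv; networking is off unless explicitly enabled."""
    args = ["docker", "run", "--rm"]
    if not network:
        # With no network, the container only has a loopback interface.
        args += ["--network", "none"]
    return args + [image, *command]


if __name__ == "__main__":
    # Hypothetical image and entry point; print the command instead of running it.
    print(benchmark_container_args("audit-bench:latest", ["python", "run_task.py"]))
```

Defaulting to no network means a forgotten flag fails closed: the model can't look up answers or send anything out unless someone opts in.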
anavat · 2026-07-02 05:21:03 UTC
No, it actually continues running headless on the host, and you can reconnect from another laptop or mobile phone, or even ssh to the host and attach to the session. At least Codex desktop app works this way.
dandaka · 2026-07-02 07:26:57 UTC
Codex, Claude Code, ZAI: they all continue working in headless mode when you close your laptop, if you have connected to a remote machine.
htrp · 2026-07-02 02:10:22 UTC
Examples here?
FergusArgyll · 2026-07-01 22:02:08 UTC
I finally repurposed an old server just for that and for anyone reading who has not had a chance to use --dangerously-etc. it's awesome, do it :)
ahmadyan · 2026-07-01 22:56:47 UTC
A well-designed IDE should abstract that away, i.e. run the agent in the headless VMs while giving you an abstraction that makes it feel like you are running the agent locally, with all the benefits (editor, browser, diffs, debugger, etc.).
aussieguy1234 · 2026-07-02 02:20:53 UTC
I just back up my entire home folder to another device, then let it rip
knocte · 2026-07-02 06:01:44 UTC
I shared your fear some weeks/months ago, so I was always using my harness in the cloud. However, latency started to become an issue when I traveled to other countries where I needed a VPN... so I ended up cooking up skynot to be able to trust running my harness on my own computer: https://github.com/tarsgate/skynot (PRs welcome if you want to add support for a harness other than Pi)
nicoty · 2026-07-02 06:32:24 UTC
I've contributed to https://github.com/0xferrous/agent-box which allows you to bind-mount git repositories into containers that agents operate in, preventing the agents from accessing files that aren't bind-mounted. Your usual .gitignore can then be used to also ignore files within the repo to be bind-mounted, which prevents agents from accessing them at all, essentially working as a sandbox.
I also maintain https://github.com/nothingnesses/agent-images which allows you to use Nix to reproducibly spin up OCI container images containing agents and any other tools you need for development and use these with agent-box.
I use both at the moment to work on some personal projects with agents, where I set up multiple separate git worktrees for the agents to work in, preventing them from accessing anything outside of the worktrees and from trampling over each other's work.
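To make the one-worktree-per-agent layout concrete, here is a small sketch. It is independent of agent-box and agent-images, and the sibling-directory layout and `agent/<name>` branch naming are assumptions. It builds one `git worktree add -b` command per agent, so each agent gets its own checkout and branch.

```python
from pathlib import Path


def worktree_commands(repo_dir, agents, base="main"):
    """Return one `git worktree add` argv per agent, each on its own new branch."""
    root = Path(repo_dir)
    commands = []
    for agent in agents:
        # Place each worktree next to the main checkout, e.g. app-alice.
        path = root.parent / f"{root.name}-{agent}"
        commands.append(
            ["git", "-C", str(root), "worktree", "add",
             "-b", f"agent/{agent}", str(path), base]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical repo path and agent names; print rather than execute.
    for cmd in worktree_commands("/src/app", ["alice", "bob"]):
        print(" ".join(cmd))
```

Each resulting directory can then be bind-mounted into its own container, so agents can't see or overwrite each other's working trees.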
raphinou · 2026-07-02 06:50:37 UTC
In case anyone is interested, I'm also using bash scripts to run my agents in containers. It's simple, but has only bash and docker as dependency: https://github.com/asfaload/agents_container
What's your setup like and what do you use it for?
I have an M2 Max MBP with plenty of RAM and I use VSCode + the Zoo Code plugin with Qwen3-Coder-Next-GGUF:UD-Q4_K_XL to run local agentic coding sessions, but I'm intrigued by being able to run headless, as I could probably run multiple instances in parallel to do stuff.
Like are you using UTM with some pre-built VM and a local LLM?
Curious.
LaurensBER · 2026-07-01 20:18:49 UTC
They might be sending some user requests to Anthropic to gather training data for their own models. If they do so, perhaps they need to add some tracer to requests that they prefer to hide.
bogdan · 2026-07-01 20:38:37 UTC
Source? Or is it "trust me bro"?
embedding-shape · 2026-07-01 20:49:43 UTC
Literally just FUD unless someone has code to point at.
anakaine · 2026-07-01 20:55:41 UTC
Verbally minimising potential threats is not a valid approach to managing risk. We have seen mass misuse of tokens acquired through nefarious means to distill models and enhance training as a way of catching up recently, among other related issues. It is quite appropriate to wonder what else might be going on.
_aavaa_ · 2026-07-01 22:24:53 UTC
Those nefarious distillers, only we are allowed to freely distill the world’s knowledge into our paid products
DonsDiscountGas · 2026-07-01 21:11:00 UTC
"might" means pure speculation
fwip · 2026-07-01 20:55:29 UTC
Wireshark would catch that easy-peasy.
benatkin · 2026-07-01 22:37:32 UTC
The request would need to be done from their service, so as not to expose the API key, and because it just makes sense. They could probably directly proxy it and Wireshark couldn't catch it, due to everything being HTTPS. But people could probably catch it by decompiling, so it would make more sense to have the server make the request as part of a GLM request. Not that I think this is plausible - I'm not sure.
jijji · 2026-07-02 00:36:18 UTC
or more likely, sending it to the CCP
neonstatic · 2026-07-02 00:53:59 UTC
Californian Communist Party?
WhyNotHugo · 2026-07-02 13:21:29 UTC
California has had a ban on the Communist Party since the fifties.
bermudi · 2026-07-02 01:36:51 UTC
I wonder if you're as cynical about and distrustful of American companies as well, or is it more of a racism kinda thing?
MrDrMcCoy · 2026-07-02 02:40:27 UTC
Everyone should distrust them equally. Only local agents in a detached network namespace are safe from data leaks. It is perfectly reasonable to assume they are using our sessions to train on, since everything else short of nuclear launch codes is already there, and they need to keep feeding it.
LaurensBER · 2026-07-02 06:52:54 UTC
This is an extremely weird comment that doesn't add anything to the conversation.
Here on HN we discuss facts. Jumping straight into racism has no place here.
maxloh · 2026-07-01 20:34:50 UTC
I don't find a closed-source Chinese agent system trustworthy.
It is essentially a black box with full user permissions, meaning you are just handing over your entire system to a Chinese-owned server. With OpenCode and its GLM provider, at least I can monitor which files were read, which were edited, and what commands were executed.
Not to mention that Chinese national security laws legally obligate companies to cooperate with state intelligence and counter-espionage efforts [0]. If you have this installed on a corporate workstation, and your company is large enough, the possibility of them spying on you is not just a risk—it's almost a certainty.
You shouldn’t find American ones trustworthy either.
saghm · 2026-07-01 20:38:15 UTC
Given that there's such severe concern being expressed by Anthropic about Claude being distilled, and the idea that the harness is part of the moat, it doesn't seem super surprising that the other side of that would try to also make it harder for them to tell how well they're doing and what their approach is.
JSR_FDED · 2026-07-01 22:40:55 UTC
Unlikely considering they’re publishing the Crown Jewels (GLM 5.2) as open weights.
lelanthran · 2026-07-02 06:55:59 UTC
> and the idea that the harness is part of the moat,
That idea is wrong, though. The same people who think harnesses are part of a moat are also boasting that software is easily writable now.
There's no secret sauce in a harness that you can't vibe-code into your own harness.
jorisw · 2026-07-02 09:11:45 UTC
> vibe-code into your own
Except you'd need the knowledge of what to vibe-code, no?
lelanthran · 2026-07-02 12:01:06 UTC
> Except you'd need the knowledge of what to vibe-code, no?
What knowledge? If you've used a harness, you know what it is supposed to do for you!
What further knowledge do you need that can't be extracted from an existing harness?
cco · 2026-07-01 21:45:35 UTC
You're surprised? I think harnesses are almost as important as the underlying model. Folks have been able to improve benchmark results by nearly 2x based on harness alone.
Harnesses are quickly becoming critical components of the "model" itself imo. Not shocking to me at all that a company that spots a revenue opportunity is keeping its harness closed source.
bermudi · 2026-07-02 01:38:23 UTC
Source? On the most trusted benchmark right now (deepSWE), models score better or just as well with its minimal harness as they do when using CC or Codex.
[0] https://github.com/anthropic-experimental/sandbox-runtime
https://venturebeat.com/security/six-exploits-broke-ai-codin...
shameless self-plug. I've been dogfooding it for the last 3 weeks now.
[0]: https://en.wikipedia.org/wiki/National_Intelligence_Law_of_t...