I’ve been daily-driving Hermes Agent on z.ai’s GLM-5.1 (their Coding Plan endpoint). The model is genuinely great for agentic coding work - strong tool use, reasonable price, and it’s the model behind a SOTA SWE-Bench Pro score.

But it kept timing out. Mid-task. Especially on big tool calls like writing a long file or running a complex shell command. The error wasn’t the kind of thing where you go “oh, my context is too big” - it would just die after about 30 seconds, every time, like clockwork.

Here’s how I tracked it down, what the actual fix was, and the upstream PR I sent to NousResearch.

The Symptom

Running on:

  • Model: glm-5.1 via z.ai Coding Plan
  • Endpoint: https://api.z.ai/api/coding/paas/v4
  • Hermes Agent on main

Most turns worked fine. Then, semi-randomly, a turn that involved generating a large tool call (file write, big bash command, anything where the JSON args would be many KB) would hang for ~30 seconds and then die. The Hermes log would show a stale-stream kill or a generic timeout. Retry? Same thing.

It felt like a network problem until I noticed the timeout was exactly 30 seconds. Every time. Networks aren’t that punctual.

Where I Looked First (and Was Wrong)

Hermes has several timeout knobs, and I assumed I’d find the bug there:

VariableDefaultWhat it controls
HERMES_API_TIMEOUT1800sOverall httpx timeout
HERMES_STREAM_READ_TIMEOUT120sSocket read timeout per chunk
HERMES_STREAM_STALE_TIMEOUT180sKills stream if no chunks arrive in N seconds
HERMES_API_CALL_STALE_TIMEOUT300sSame idea for non-streaming calls

None of those is 30 seconds. So whatever was killing the connection at 30s wasn’t Hermes’ own logic. I assumed it was on z.ai’s end.

I cranked all the stale timeouts to 5+ minutes anyway. No effect. Connection still died at exactly 30 seconds. Bummer.

That ruled out the client-side timeouts. Something on the server side was actively closing the connection. So I was definitely onto something.

The Real Cause

Two things had to be true at the same time, and once I saw them, the picture clicked.

Fact 1: api.z.ai enforces a 30 second idle timeout on streaming connections.

This is documented (well, kind of, by community discovery) in vercel/ai#12949 and opencode#15350. If your SSE stream doesn’t receive any chunks for 30 seconds, the edge proxy resets the connection with ECONNRESET. Z.AI’s mainland China endpoint (open.bigmodel.cn) doesn’t have this timeout, but the global one does.

Fact 2: GLM models batch tool-call argument generation.

Most models stream tool-call args incrementally - you get {"path": then "src/foo. then py" etc. as the model generates. GLM (4.6, 4.7, 5, 5.1) doesn’t. By default it generates the entire arguments JSON internally and emits it as one chunk at the end.

For a small tool call ({"command": "ls"}), this is fine - it shows up in well under a second. For a large tool call ({"path": "src/api.ts", "content": "...3KB of TypeScript..."}), the model can chew on the args for 30+ seconds before emitting anything.

Combine the two: Stream sits silent while GLM generates a big tool call → Z.AI’s edge proxy hits 30s → connection reset → Hermes sees a dropped stream and reports a timeout.

The maddening part is that nothing is technically broken. The model is working, the network is fine, Hermes is patient - but the proxy in the middle isn’t.

The Fix

Z.AI actually documents the fix. Set tool_stream=true alongside stream=true and the model emits tool args incrementally instead of in one batch. No more 30-second silence, no more reset.

In Hermes terms, that means injecting one extra field into the API request payload when the provider is z.ai. Here’s the patch I added to _build_api_kwargs in run_agent.py:

# Z.AI / GLM: enable incremental tool-call argument streaming.
# GLM models (4.6, 4.7, 5, 5.1) batch tool_call args in one chunk by
# default, producing 30+ second silence gaps that trip api.z.ai's
# 30s server-side idle timeout (vercel/ai#12949, opencode#15350).
# Setting tool_stream=true makes the model stream tool args
# incrementally, eliminating the silence gap.
# Docs: https://docs.z.ai/guides/capabilities/stream-tool
_provider_lower = (self.provider or "").lower()
_is_zai_endpoint = (
    _provider_lower == "zai"
    or "z.ai" in self._base_url_lower
    or "bigmodel.cn" in self._base_url_lower
)
if _is_zai_endpoint and self.tools:
    extra_body.setdefault("tool_stream", True)

Three guardrails worth pointing out:

  1. Provider/URL gated. Only fires for z.ai endpoints. Won’t affect OpenRouter, Anthropic, OpenAI, Ollama, or anything else.
  2. Tools gated. No tools bound? No tool args to stream - the parameter would be wasted. Skip it.
  3. setdefault. If a user explicitly sets extra_body["tool_stream"], theirs wins.

I also added five unit tests covering the cases: zai provider, api.z.ai URL, bigmodel.cn URL, no-tools omission, and non-zai omission. Eight tests total in the GLM area, all green.

What About Just Bumping Timeouts?

I kept these as a belt-and-suspenders backup, since longer-thinking models on big-context turns can also push past defaults:

# in ~/.hermes/.env
HERMES_STREAM_STALE_TIMEOUT=300
HERMES_API_CALL_STALE_TIMEOUT=600

Important caveat: this alone doesn’t fix the bug. Hermes can be infinitely patient and the Z.AI edge proxy still resets the connection at 30s. Only the tool_stream=true patch fixes the root cause. The env vars just keep Hermes from killing slow-but-healthy connections of its own accord on long-thinking turns.

Why Not Just Use the China Endpoint?

The China-region endpoint at open.bigmodel.cn doesn’t have the 30s idle timeout. That sounds tempting - until you realize it requires a separate Zhipu account on the China platform, which most international users don’t have. If you do, great, you can sidestep this entirely. Otherwise the tool_stream fix is the path.

Living With a Local Patch

Hermes Agent is a live git checkout in ~/.hermes/hermes-agent, not a pip-installed package. So a local edit to run_agent.py would conflict with git pull next time upstream changes anything nearby. Three options:

  1. Edit and commit on a feature branch - git pull becomes git pull --rebase. Conflicts surface clearly if upstream ever fixes the same spot themselves.
  2. Monkey-patch via sitecustomize.py in the venv - survives git pull but not a venv rebuild.
  3. Just submit the PR upstream and run on your branch until it merges.

I went with #1 + #3. Branch aaron/glm-zai-tool-stream carries the patch locally, and PR #12758 is in flight upstream. When (if) it merges, I’ll rebase onto main and delete the local branch.

If you want to apply this yourself before the PR lands:

cd ~/.hermes/hermes-agent
git fetch origin
git checkout -b glm-tool-stream origin/main
# apply the patch above to run_agent.py around the
# extra_body assembly in _build_api_kwargs
git add run_agent.py
git commit -m "fix(zai): enable tool_stream for GLM endpoints"

Then to update later:

git fetch origin
git rebase origin/main   # resolve any conflicts in run_agent.py

Lessons

1. “Timeout” is a symptom, not a diagnosis. The interesting question is always which timeout, and whose. Stale-stream timeout? Read timeout? Connection timeout? Server-side idle timeout? Reverse proxy keepalive timeout? They all show up the same way at the application layer.

2. Exact-second timeouts are a tell. If something dies at exactly 30.0s every time, that’s a configured limit somewhere. Real network issues are messy.

3. Tool-call streaming is not free. Most providers emit tool args incrementally for a reason - to keep the connection warm. Models that batch (GLM, some DeepSeek variants) can hit edge-proxy timeouts that nobody warned you about.

4. Read the provider docs for capability flags you didn’t know existed. I wouldn’t have found tool_stream=true by debugging Hermes alone. It’s in the Z.AI docs, but you have to know to look.

5. Send the PR. This bug affects every Hermes user on a z.ai endpoint who runs long tool calls. No reason to fix it only locally.

Status

  • Local patch: applied, working - no more 30s timeouts on big tool calls
  • Tests: 5 new + all pre-existing GLM/zai tests green
  • Upstream PR: NousResearch/hermes-agent#12758

If you’re hitting the same thing on z.ai and don’t want to wait for the merge, the patch is ~12 lines and self-contained. Drop it in, commit on a branch, and you’re done.


Running into similar issues with Hermes Agent?

Let me know on X at https://x.com/aarongxa