The Symptom
Cron jobs started returning this odd refusal:
[System Limitation Notice]
I am an AI assistant and cannot actually execute bash scripts...
Except the agent can execute bash. That’s literally its job. The refusal was fake — a content filter hallucination triggered by something else going wrong deeper in the stack.
The Real Problem
Qwen3.5 was dumping its internal reasoning monologue into the tool-calling JSON.
When calling Qwen via OpenAI-compatible /v1/chat/completions with tool definitions, the model should generate clean JSON like:
{"tool_calls": [{"function": {"name": "terminal", "arguments": "{...}"}}]}
But with reasoning_effort: low, Qwen streamed its chain-of-thought directly into the response content. That content got concatenated with the tool JSON, producing malformed blobs like:
<think>I'll use the terminal tool to run this bash script...</think>
{"tool_calls": [{"function": {"name": "terminal", "arguments": "{...}"}}]}
The tool parser choked on the tags, the call failed, and the fallback “I can’t do that” message appeared.
The Fix
Two config changes in ~/.hermes/profiles/qyburn/config.yaml:
agent:
reasoning_effort: none
custom_providers:
- name: ollama-local
base_url: http://10.168.20.149:11434/v1
extra_body:
options:
think: false
reasoning_effort: none tells Hermes not to request reasoning. think: false in extra_body tells Ollama/Qwen not to generate it even if requested.
Both are required. Qwen respects the Ollama flag, but only if the API caller doesn’t override it with reasoning_effort.
Verification
After applying the fix, the disroot-email-processor cron job runs hourly and returns [SILENT] — the expected silent success signal. No more fake refusals. No thinking tokens corrupting JSON.
Lessons Learned
-
OpenAI-compatible endpoints are fragile with local models. For non-tool calls, use Ollama’s native
/api/chatwiththink: false. Only use/v1endpoints when you need tool schemas, and explicitly disable Qwen’s reasoning. -
Profile-level fixes propagate. The
qyburnHermes profile runs 6+ cron jobs. One config change fixed all of them. -
Silent failures look like content refusals. When a tool call fails, the agent may hallucinate a refusal. Check the raw response before assuming it’s a policy limitation.
-
Local LLMs have sharp edges. They don’t always respect the same API contracts as cloud providers. Being able to inspect raw outputs and tweak
extra_bodyparameters is essential.
Technical Context
The issue manifested in the disroot-email-processor skill — a mandatory routing skill for email processing via himalaya. When Qwen’s thinking tokens corrupted the tool-calling JSON, the terminal tool failed to invoke, and the skill’s bash script never ran.
This wasn’t limited to one job. Any Qyburn profile job using tool-calling through the OpenAI-compatible endpoint was affected. The fix applies globally to the profile.
References
- Related skill:
dogfood/hermes-espresense-calibration— documents similar Qwen thinking token issues - Config:
~/.hermes/profiles/qyburn/config.yaml - Job ID:
64cf70656e33(Disroot Email Processor)
Posted from the homelab after fixing yet another local LLM quirk.