The Symptom

Cron jobs started returning this odd refusal:

[System Limitation Notice]
I am an AI assistant and cannot actually execute bash scripts...

Except the agent can execute bash. That’s literally its job. The refusal was fake — a content filter hallucination triggered by something else going wrong deeper in the stack.

The Real Problem

Qwen3.5 was dumping its internal reasoning monologue into the tool-calling JSON.

When calling Qwen via OpenAI-compatible /v1/chat/completions with tool definitions, the model should generate clean JSON like:

{"tool_calls": [{"function": {"name": "terminal", "arguments": "{...}"}}]}

But with reasoning_effort: low, Qwen streamed its chain-of-thought directly into the response content. That content got concatenated with the tool JSON, producing malformed blobs like:

<think>I'll use the terminal tool to run this bash script...</think>
{"tool_calls": [{"function": {"name": "terminal", "arguments": "{...}"}}]}

The tool parser choked on the tags, the call failed, and the fallback “I can’t do that” message appeared.

The Fix

Two config changes in ~/.hermes/profiles/qyburn/config.yaml:

agent:
  reasoning_effort: none

custom_providers:
- name: ollama-local
  base_url: http://10.168.20.149:11434/v1
  extra_body:
    options:
      think: false

reasoning_effort: none tells Hermes not to request reasoning. think: false in extra_body tells Ollama/Qwen not to generate it even if requested.

Both are required. Qwen respects the Ollama flag, but only if the API caller doesn’t override it with reasoning_effort.

Verification

After applying the fix, the disroot-email-processor cron job runs hourly and returns [SILENT] — the expected silent success signal. No more fake refusals. No thinking tokens corrupting JSON.

Lessons Learned

  1. OpenAI-compatible endpoints are fragile with local models. For non-tool calls, use Ollama’s native /api/chat with think: false. Only use /v1 endpoints when you need tool schemas, and explicitly disable Qwen’s reasoning.

  2. Profile-level fixes propagate. The qyburn Hermes profile runs 6+ cron jobs. One config change fixed all of them.

  3. Silent failures look like content refusals. When a tool call fails, the agent may hallucinate a refusal. Check the raw response before assuming it’s a policy limitation.

  4. Local LLMs have sharp edges. They don’t always respect the same API contracts as cloud providers. Being able to inspect raw outputs and tweak extra_body parameters is essential.

Technical Context

The issue manifested in the disroot-email-processor skill — a mandatory routing skill for email processing via himalaya. When Qwen’s thinking tokens corrupted the tool-calling JSON, the terminal tool failed to invoke, and the skill’s bash script never ran.

This wasn’t limited to one job. Any Qyburn profile job using tool-calling through the OpenAI-compatible endpoint was affected. The fix applies globally to the profile.

References

  • Related skill: dogfood/hermes-espresense-calibration — documents similar Qwen thinking token issues
  • Config: ~/.hermes/profiles/qyburn/config.yaml
  • Job ID: 64cf70656e33 (Disroot Email Processor)

Posted from the homelab after fixing yet another local LLM quirk.