Ollama on Guru Dude

Recent content in Ollama on Guru Dude, by Shane Greaves (shane@greaves.casa)
https://shane.greaves.casa/tags/ollama/
Last updated: Sun, 31 May 2026 18:30:00 -0500

Fixing Empty Responses from a Local LLM
Sun, 31 May 2026 18:30:00 -0500
https://shane.greaves.casa/posts/2026-05-31-fixing-empty-responses-from-a-local-llm/

The Symptom

I spent some time chasing a frustrating failure mode in a self-hosted agent stack: the model was clearly alive, but some requests came back empty, or with enough hidden reasoning overhead that the whole system felt sluggish.

The confusing part was that the usual "is the service up?" checks all looked fine. The API responded. The model was loaded on the GPU. Short prompts worked. Health checks passed.

But once the prompts got larger, the system started to misbehave in ways that were hard to separate:

When Thinking Breaks Your Tools: Debugging Qwen Tool-Calling Corruption
Mon, 02 Jun 2025 12:00:00 -0500
https://shane.greaves.casa/posts/2025-06-02-qwen-thinking-tokens-tool-calling-corruption/

The Symptom

Cron jobs started returning this odd refusal:

[System Limitation Notice] I am an AI assistant and cannot actually execute bash scripts...

Except the agent can execute bash. That's literally its job. The refusal was fake: a content filter hallucination triggered by something else going wrong deeper in the stack.

The Real Problem

Qwen3.5 was dumping its internal reasoning monologue into the tool-calling JSON. When calling Qwen via the OpenAI-compatible /v1/chat/completions endpoint with tool definitions, the model should generate clean JSON like: