<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Infrastructure on Guru Dude</title>
    <link>https://shane.greaves.casa/categories/infrastructure/</link>
    <description>Recent content in Infrastructure on Guru Dude</description>
    <generator>Hugo -- 0.131.0</generator>
    <language>en</language>
    <managingEditor>shane@greaves.casa (Shane Greaves)</managingEditor>
    <webMaster>shane@greaves.casa (Shane Greaves)</webMaster>
    <lastBuildDate>Sun, 31 May 2026 18:30:00 -0500</lastBuildDate>
    <atom:link href="https://shane.greaves.casa/categories/infrastructure/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Fixing Empty Responses from a Local LLM</title>
      <link>https://shane.greaves.casa/posts/2026-05-31-fixing-empty-responses-from-a-local-llm/</link>
      <pubDate>Sun, 31 May 2026 18:30:00 -0500</pubDate><author>shane@greaves.casa (Shane Greaves)</author>
      <guid>https://shane.greaves.casa/posts/2026-05-31-fixing-empty-responses-from-a-local-llm/</guid>
      <description>The Symptom: I spent some time chasing a frustrating failure mode in a self-hosted agent stack. The model was clearly alive, but some requests came back empty, or carried enough hidden reasoning overhead that the whole system felt sluggish.
The confusing part was that the usual “is the service up?” checks all looked fine.
The API responded. The model was loaded on the GPU. Short prompts worked. Health checks passed. But once the prompts got larger, the system started to misbehave in ways that were hard to separate:</description>
    </item>
  </channel>
</rss>
