When you run an AI agent like Hermes Agent with a memory system, you quickly hit a wall: the built-in memory has a hard character limit. When it’s full, the agent silently deletes old entries to make room for new ones. Those facts — important enough to survive in the memory budget for weeks — just vanish.

This is a known problem in the Hermes Agent community. Users on r/hermesagent regularly ask about memory management. The built-in memory (MEMORY.md and USER.md) is capped at roughly 2,200 and 1,375 characters respectively. When you hit the cap, the agent is expected to “consolidate, replace, or remove” entries. But there’s no documented guidance on what happens to evicted facts.

The Setup

I run Hermes Agent with Honcho as an AI-native memory backend. They coexist as two layers:

Layer What it does Budget
Hermes memory (MEMORY.md / USER.md) Compact facts injected into every prompt, every turn ~2,200 / ~1,375 chars
Honcho conclusions (honcho_conclude) Durable facts written to a searchable archive Unlimited
Honcho peer cards Stable identity markers (name, location, relationships) Max 40 entries
Honcho observations Raw facts extracted automatically from every conversation turn Automatic

Hermes memory is always-on — every single turn, the system prompt includes the full memory block. This is great for high-signal facts that must always be in context (“user prefers CAD”, “never push to main without review”). But the tight budget means you’re constantly making room.

Honcho, on the other hand, has unlimited storage. It’s searchable via honcho_search, can be reasoned over via honcho_reasoning, and conclusions persist across sessions. But it’s not injected into every prompt — you have to explicitly query it.

The Problem

When I needed to add a new fact to Hermes memory and the budget was full, I’d remove an old entry. That fact would disappear from per-turn injection AND I never wrote it to Honcho first. The information was lost unless it happened to still exist as a raw observation the deriver had extracted from the original conversation.

Nobody is doing this better. I searched the Hermes docs, Honcho docs, GitHub discussions, Reddit threads, and community blog posts. The pattern of “graduate evicted memory to the archive layer before deletion” doesn’t exist anywhere.

The Fix: Memory Graduation

Instead of deleting evicted entries, graduate them to Honcho’s searchable archive before removing them from the tight-budget per-turn layer.

The flow:

  1. Hermes memory is full — need to make room for a new fact
  2. Before evicting, write the old entry to Honcho via honcho_conclude
  3. Then remove from Hermes memory
  4. The fact graduates from always-on per-turn injection → on-demand searchable archive

This ensures durable facts are never silently lost. They remain retrievable via honcho_search and honcho_reasoning even without per-turn injection.

When NOT to graduate

  • The fact is stale or wrong and should be forgotten entirely
  • The fact was already corrected (write the correction to Honcho first, then evict the old version)
  • The fact is duplicated elsewhere in memory

When facts conflict

If Hermes memory and Honcho disagree, defer to the user. Ask which is current before reconciling.

Why This Works

The two systems have complementary strengths:

  • Hermes memory is fast, always-on, and has a tight budget. It’s for facts that must be in context right now.
  • Honcho conclusions are unlimited, searchable, and don’t cost per-turn tokens. They’re for facts that are durable but don’t need to be in every prompt.

Graduation respects the nature of both layers. The tight budget of Hermes memory stays tight — you’re still making room. But nothing is lost. The archive grows, and honcho_search / honcho_reasoning can surface graduated facts when they become relevant again.

What About Honcho’s Dreamer?

Honcho 3.x has a “dreaming” system — a background reasoning agent that crawls accumulated observations and synthesizes conclusions, peer cards, and summaries. This is Honcho’s own automatic graduation: raw observations get reasoned into higher-order knowledge.

The manual graduation pattern I’m describing here is complementary, not competing. The dreamer handles the automatic synthesis — patterns, deductions, identity markers. Manual honcho_conclude handles the explicit facts the user states — “my secondary email is X”, “the agent’s email is Y”. These are things the deriver might extract, but writing them explicitly ensures they’re captured with the right framing and no delay.

Implementation

I saved this as a skill in Hermes so it persists across sessions. The key rules:

  1. Before evicting a Hermes memory entry → write it to Honcho via honcho_conclude
  2. Then remove from Hermes memory
  3. Conflicting facts → defer to the user
  4. Don’t silently delete — always graduate first (unless the fact is stale/wrong)

The skill also documents the broader write-path routing:

Fact type Hermes memory Honcho conclusion Peer card
Compact preference ✅ if durable ❌ (dreamer-owned)
Standing instruction ❌ (dreamer-owned)
Identity fact ✅ if compact ❌ (dreamer-owned)
Correction ✅ update existing Only if dreamer fails

Lessons

  1. Don’t delete what you can graduate. If you have a tiered memory system, eviction from the hot layer should be a promotion, not a deletion.
  2. Check the docs first, but don’t stop there. Neither Hermes nor Honcho docs mention this pattern. It’s a natural consequence of how the two systems work, not something that’s prescribed.
  3. Manual writes complement automatic ones. Honcho’s dreamer handles synthesis. The agent writing explicit conclusions via honcho_conclude fills gaps the dreamer might miss or take too long to produce.
  4. Measure before optimizing. I ran a full token economy audit before touching memory routing. Understanding where quota was actually being burned made the memory layering decision obvious.

This pattern isn’t specific to Hermes + Honcho. Any agent setup with a hot per-turn memory and a cold searchable archive can use it — MemGPT, Letta, custom RAG stacks, anything with tiered memory. The principle is simple: when the hot layer is full, don’t throw away what you’ve learned. Put it somewhere you can find it again.