The problem with context limits

Long-running agent sessions accumulate conversation history until it no longer fits in the model's context window. When the window fills, something has to go. The question is: what?

Most systems answer with heuristics: truncate the oldest turns, summarize in chunks, or rank turns with a scoring model that has no knowledge of what has actually been learned. In effect they treat every turn as equally replaceable, which it is not.

A turn where you debugged a crash and found the root cause is irreplaceable if that knowledge isn't anywhere else. A turn where you asked about something chitta has distilled into a memory is safely droppable — the knowledge lives on in the graph.

Memory-aware vs. stateless compaction

Generic (e.g. Morph Compact)
Stateless scoring
  • Scores each turn purely by text relevance to next task
  • No knowledge of what the agent has already learned
  • Cannot distinguish novel signal from redundant noise
  • 50–70% reduction, verbatim survivors
  • Works well as a general-purpose filter
Chitta — Memory-aware
Soul-informed scoring
  • Checks each turn against what’s already in memory
  • High memory coverage → safely droppable
  • Novel content → preserved regardless of age
  • Graceful degradation when embedder unavailable
  • Integrated with PreCompact hook automatically

How it works

The scoring formula

Every non-system message gets a score. System messages are always kept. The score balances three signals:

score(msg) = structural_weight × (
  0.3 × recency
  + 0.4 × semantic_sim(msg, query)
  + 0.3 × (1 − memory_coverage)
)

// memory_coverage: cosine sim vs nearest 3 soul memories
// high coverage → already learned → cheaper to drop

| Component | What it measures | Weight |
| --- | --- | --- |
| structural_weight | Role importance: system=2.0, user=1.0, assistant=0.8, tool=0.5 | multiplier |
| recency | Linear 0.1 (oldest) → 1.0 (most recent) | 0.3 |
| semantic_sim | Cosine similarity of turn embedding vs. upcoming query | 0.4 |
| memory_coverage | Max cosine sim vs. nearest 3 soul memories (higher = already learned) | 0.3 (inverted) |
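
The formula and its components can be sketched in a few lines of Python. This is an illustrative reading of the description above, not chitta's actual implementation: the function names, the clamping of coverage to [0, 1], and the fallback weight for unknown roles are all assumptions.

```python
import math

# Role multipliers as listed in the component table.
STRUCTURAL_WEIGHT = {"system": 2.0, "user": 1.0, "assistant": 0.8, "tool": 0.5}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency(index, total):
    """Linear ramp from 0.1 (oldest turn) to 1.0 (most recent turn)."""
    if total <= 1:
        return 1.0
    return 0.1 + 0.9 * index / (total - 1)


def memory_coverage(embedding, nearest_memories):
    """Max cosine similarity against the nearest memories (k=3 in the text)."""
    sims = [cosine(embedding, m) for m in nearest_memories]
    # Assumption: clamp to [0, 1] so a negative similarity counts as "not covered".
    return min(1.0, max(0.0, max(sims, default=0.0)))


def score(role, index, total, embedding, query_embedding, nearest_memories):
    """structural_weight x (0.3 recency + 0.4 semantic_sim + 0.3 (1 - coverage))."""
    base = (
        0.3 * recency(index, total)
        + 0.4 * cosine(embedding, query_embedding)
        + 0.3 * (1.0 - memory_coverage(embedding, nearest_memories))
    )
    # Assumption: roles missing from the table fall back to a neutral 1.0.
    return STRUCTURAL_WEIGHT.get(role, 1.0) * base
```

Under these assumptions, the oldest user turn in a two-turn history that matches the query exactly scores 0.73 when nothing similar is in memory and 0.43 when an identical memory already exists, which is the gap that makes memory-covered turns cheaper to drop.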

The algorithm

1. Estimate tokens
   Word count × 1.3 per message. Fast, no tokenizer dependency.
2. Pre-embed all messages once
   VakYantra ONNX embedder runs locally (~1ms/message). Embeddings are cached per message and reused for both semantic_sim and memory_coverage without re-computing.
3. Score each turn
   Query the memory graph via field_store_→recall(emb, k=3) to get memory_coverage. Compute cosine similarity vs. query embedding for semantic_sim.
4. Greedy keep by budget
   Sort by score descending. Add messages until target_ratio of original tokens is filled. System messages bypass the budget check entirely.
5. Re-sort by original index
   Survivors are returned in their original conversation order, verbatim and unmodified.
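
Steps 1, 4, and 5 above can be sketched as follows, taking the per-turn scores as already computed. This is a minimal sketch rather than the real implementation. It assumes message content is a plain string, that the token estimate is rounded up, that system messages are not charged against the budget, and that selection stops at the first turn that does not fit. None of these details are specified in the text.

```python
import math


def estimate_tokens(text):
    """Word count x 1.3 (step 1). Assumption: round up to a whole token."""
    return math.ceil(len(text.split()) * 1.3)


def select_by_budget(messages, scores, target_ratio=0.4):
    """Greedy keep: highest score first, until the token budget is spent.

    messages: list of {"role", "content"} dicts
    scores:   one precomputed score per message, in the same order
    """
    tokens = [estimate_tokens(m["content"]) for m in messages]
    budget = target_ratio * sum(tokens)

    # System messages bypass the budget check entirely.
    keep = {i for i, m in enumerate(messages) if m["role"] == "system"}

    # Step 4: rank the remaining turns by score, best first.
    ranked = sorted(
        (i for i in range(len(messages)) if i not in keep),
        key=lambda i: scores[i],
        reverse=True,
    )
    used = 0
    for i in ranked:
        if used + tokens[i] > budget:
            break  # assumption: stop at the first turn that does not fit
        keep.add(i)
        used += tokens[i]

    # Step 5: survivors go back in original conversation order, unmodified.
    return [messages[i] for i in sorted(keep)]
```

Because the survivors are the original message objects, nothing is rewritten or summarized; the function only decides which turns remain.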

Live scoring demo

[Interactive demo: a compact_context scoring pass. Each bar shows a turn's computed score; green bars are kept, faded bars are memory-covered drops. The demo reports before, after, and dropped totals.]

compact_context tool

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | yes | Conversation turns [{role, content}], OpenAI/Claude format |
| query | string | no | Upcoming task hint; biases semantic scoring toward what’s needed next |
| target_ratio | float | no | Fraction of tokens to keep (default: 0.4, range: 0.05–1.0) |
| distill_novel | boolean | no | Reserved: distill novel drops into memories before removing (coming soon) |

Response
{
  "messages": [...],          // kept messages, original order, verbatim
  "stats": {
    "before_tokens": 48200,
    "after_tokens":  18600,
    "dropped":       34,
    "dropped_pct":   61.4,
    "embedding":     true     // false if embedder unavailable (graceful degradation)
  }
}
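
The numbers in the sample response are consistent with dropped_pct being a percentage of tokens, while dropped counts messages. The sketch below rebuilds the stats object on that reading; the function name and the rounding to one decimal place are assumptions.

```python
def compaction_stats(before_tokens, after_tokens, dropped_messages, embedding=True):
    """Assemble a stats object shaped like the sample response.

    Assumption: dropped_pct is the share of tokens removed, rounded to
    one decimal place. `dropped` is the number of removed messages.
    """
    dropped_pct = 0.0
    if before_tokens > 0:
        dropped_pct = round(
            100.0 * (before_tokens - after_tokens) / before_tokens, 1
        )
    return {
        "before_tokens": before_tokens,
        "after_tokens": after_tokens,
        "dropped": dropped_messages,
        "dropped_pct": dropped_pct,
        "embedding": embedding,
    }
```

With the sample values, (48200 − 18600) / 48200 comes to about 61.4%, matching the dropped_pct shown in the response.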
Direct CLI (JSON-RPC thin client)
echo '{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "compact_context",
    "arguments": {
      "messages": [...],
      "query": "implement the new feature",
      "target_ratio": 0.4
    }
  }
}' | chitta
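
The same request can be built programmatically, which avoids hand-escaping JSON. The sketch below mirrors the payload shown above. The helper names are hypothetical, and call_chitta assumes the CLI reads one request on stdin and writes its reply to stdout, as the echo | chitta example suggests. The reply is returned unparsed because its envelope is not described here.

```python
import json
import subprocess


def build_request(messages, query=None, target_ratio=0.4, request_id=1):
    """Build the JSON-RPC tools/call payload for compact_context."""
    if not 0.05 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must be within 0.05-1.0")
    arguments = {"messages": messages, "target_ratio": target_ratio}
    if query is not None:
        arguments["query"] = query  # optional hint about the upcoming task
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "compact_context", "arguments": arguments},
    }


def call_chitta(request):
    """Pipe the request to the chitta CLI and return its raw stdout.

    Assumption: one request in on stdin, one reply out on stdout.
    """
    result = subprocess.run(
        ["chitta"],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

json.dumps handles quoting, so a query containing quotes or newlines still produces valid JSON.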

Hook integration

Chitta wires compact_context into the PreCompact hook automatically. When Claude Code is about to compact the conversation, chitta:

1. Parses the transcript JSONL
   Converts the last 60 user/assistant turns into a messages array.
2. Calls compact_context
   Scores all turns via the memory graph. Uses the pre-compact session snapshot as the query to bias scoring toward what’s contextually important.
3. Stores a compaction episode
   Saves token stats as an episode memory: “48200→18600 tokens, 61% dropped as memory-covered”. The soul tracks its own compaction history.
4. Continues with distillation
   The existing distill_trigger pipeline then extracts wisdom from the surviving turns before the window closes.
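
Step 1 can be sketched as a small filter over the transcript lines. Only the selection on the type field and the 60-turn window come from this page. The conversion of a transcript entry into a {role, content} message is not described here, so the default to_message below is a hypothetical placeholder that callers should replace with the real mapping.

```python
import json


def last_turns(jsonl_lines, limit=60, to_message=None):
    """Keep the last `limit` user/assistant entries of a JSONL transcript."""
    if to_message is None:
        # Hypothetical mapping: the real transcript field layout is not
        # specified in this document.
        def to_message(entry):
            return {"role": entry["type"], "content": entry.get("content", "")}

    turns = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        if entry.get("type") in ("user", "assistant"):
            turns.append(to_message(entry))
    return turns[-limit:]
```

The function accepts any iterable of lines, so an open file handle on the transcript works as well as a list of strings.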

You can also trigger compaction manually from any hook or script:

# In UserPromptSubmit hook — compact before long-session turns
MESSAGES=$(jq -sc '[.[] | select(.type=="user" or .type=="assistant") | ...] | .[-60:]' "$TRANSCRIPT_PATH")
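# Build the payload with jq so quotes or newlines in $QUERY cannot break the JSON
jq -nc --argjson messages "$MESSAGES" --arg query "$QUERY" \
  '{jsonrpc: "2.0", id: 1, method: "tools/call", params: {name: "compact_context", arguments: {messages: $messages, query: $query, target_ratio: 0.4}}}' \
  | chitta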

Coming: distill_novel

When distill_novel: true is enabled, turns that score low AND have low memory coverage (novel content not yet in chitta) will be distilled into memories before being dropped. This means nothing is lost — novel knowledge is compressed into the memory graph rather than discarded.

The pipeline becomes: score → detect novel drops → distill to memory → then drop verbatim text. The context shrinks; the soul grows.
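
Since distill_novel is not yet available, the following is only a speculative sketch of the pipeline order described above. The function names, the novelty_threshold cut-off, and the distill callback are all hypothetical.

```python
def split_drops(dropped, coverage, novelty_threshold=0.5):
    """Partition dropped turn indices into novel and memory-covered.

    dropped:  indices of turns the scoring pass decided to drop
    coverage: memory_coverage per turn index, in [0, 1]
    novelty_threshold: hypothetical cut-off; below it a turn counts as novel
    """
    novel = [i for i in dropped if coverage[i] < novelty_threshold]
    covered = [i for i in dropped if coverage[i] >= novelty_threshold]
    return novel, covered


def drop_with_distill(dropped, coverage, distill, novelty_threshold=0.5):
    """Distill novel drops into memory first, then release every dropped turn."""
    novel, covered = split_drops(dropped, coverage, novelty_threshold)
    for i in novel:
        distill(i)  # hypothetical callback that writes a memory for turn i
    # Only after distillation is the verbatim text of all drops removed.
    return sorted(novel + covered)
```

The ordering is the point of the sketch: a novel turn is handed to distill before its verbatim text is removed, while memory-covered turns are dropped directly.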