The problem with context limits
Long-running agent sessions accumulate conversation history until it no longer fits in the model's context window. When the window fills, something has to go. The question is: what?
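Most systems answer with heuristics: truncate the oldest turns, summarize in chunks, or rank turns with a scoring model that has no knowledge of what the agent has already learned. None of these can tell a turn whose knowledge is already stored elsewhere from one that holds the only record of a discovery, so they weigh every turn as if its content existed nowhere else.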
A turn where you debugged a crash and found the root cause is irreplaceable if that knowledge isn't anywhere else. A turn where you asked about something chitta has distilled into a memory is safely droppable — the knowledge lives on in the graph.
Memory-aware vs. stateless compaction
Stateless compaction:
- Scores each turn purely by text relevance to the next task
- No knowledge of what the agent has already learned
- Cannot distinguish novel signal from redundant noise
- 50–70% reduction, verbatim survivors
- Works well as a general-purpose filter
Memory-aware compaction (chitta):
- Checks each turn against what’s already in memory
- High memory coverage → safely droppable
- Novel content → preserved regardless of age
- Graceful degradation when the embedder is unavailable
- Integrated with the PreCompact hook automatically
How it works
Every non-system message gets a score; system messages are always kept. The score is a role-based structural weight multiplied by a weighted sum of three signals:
```text
score = structural_weight × (
    0.3 × recency
  + 0.4 × semantic_sim(msg, query)
  + 0.3 × (1 − memory_coverage)
)
// memory_coverage: max cosine sim vs nearest 3 soul memories
// high coverage → already learned → cheaper to drop
```
| Component | What it measures | Weight |
|---|---|---|
| structural_weight | Role importance: system=2.0, user=1.0, assistant=0.8, tool=0.5 | multiplier |
| recency | Linear 0.1 (oldest) → 1.0 (most recent) | 0.3 |
| semantic_sim | Cosine similarity of turn embedding vs. upcoming query | 0.4 |
| memory_coverage | Max cosine sim vs. nearest 3 soul memories (higher = already learned) | 0.3 (inverted) |
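The formula and table can be read as the following Python sketch. It assumes embeddings arrive as plain float lists and that the caller supplies the nearest memory embeddings (standing in for the recall(emb, k=3) lookup); all function names here are hypothetical, not chitta's API.

```python
import math
from typing import Sequence

Vector = Sequence[float]

# Role multipliers from the table above (structural_weight).
STRUCTURAL_WEIGHT = {"system": 2.0, "user": 1.0, "assistant": 0.8, "tool": 0.5}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency(index: int, total: int) -> float:
    """Linear ramp from 0.1 (oldest turn, index 0) to 1.0 (most recent)."""
    if total <= 1:
        return 1.0
    return 0.1 + 0.9 * index / (total - 1)


def memory_coverage(msg_emb: Vector, nearest_memories: Sequence[Vector]) -> float:
    """Max cosine similarity vs. the nearest memories (up to 3).

    nearest_memories stands in for what recall(emb, k=3) would return;
    with no memories at all, coverage is 0.0 (fully novel).
    """
    return max((cosine(msg_emb, m) for m in nearest_memories), default=0.0)


def score(role: str, index: int, total: int, msg_emb: Vector,
          query_emb: Vector, nearest_memories: Sequence[Vector]) -> float:
    signal = (
        0.3 * recency(index, total)
        + 0.4 * cosine(msg_emb, query_emb)
        + 0.3 * (1.0 - memory_coverage(msg_emb, nearest_memories))
    )
    # Unknown roles get a neutral 1.0 multiplier (an assumption of this sketch).
    return STRUCTURAL_WEIGHT.get(role, 1.0) * signal
```

For two otherwise identical, most-recent user turns that match the query exactly, this gives 1.0 for the novel one and 0.7 for one already covered by a memory, so the covered turn is the first to go.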
Each turn's embedding is passed to field_store_->recall(emb, k=3) to get memory_coverage, and its cosine similarity against the query embedding gives semantic_sim. Non-system messages are then kept, highest score first, until target_ratio of the original tokens is filled. System messages bypass the budget check entirely.
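A minimal sketch of the budget step in Python, under stated assumptions: a whitespace word count stands in for the real tokenizer, system messages do not count against the budget, and a turn too large for the remaining budget is skipped rather than ending the fill. The function names are illustrative only.

```python
from typing import Callable, Dict, List, Sequence

Message = Dict[str, str]


def whitespace_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_within_budget(
    messages: Sequence[Message],
    scores: Sequence[float],
    target_ratio: float = 0.4,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> List[Message]:
    """Keep the highest-scoring messages until target_ratio of the original tokens is used.

    System messages are always kept and, in this sketch, do not consume budget.
    A turn that would overflow the budget is skipped; smaller turns may still fit.
    The result preserves the original message order.
    """
    tokens = [count_tokens(m["content"]) for m in messages]
    budget = target_ratio * sum(tokens)
    keep = {i for i, m in enumerate(messages) if m["role"] == "system"}
    candidates = sorted(
        (i for i in range(len(messages)) if i not in keep),
        key=lambda i: scores[i],
        reverse=True,
    )
    used = 0
    for i in candidates:
        if used + tokens[i] <= budget:
            keep.add(i)
            used += tokens[i]
    # Sorting the kept indices restores the conversation's original order.
    return [messages[i] for i in sorted(keep)]
```

Selecting by index and sorting at the end is what lets the survivors come back verbatim and in their original order, matching the messages field of the tool's response.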
compact_context tool
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | array | yes | Conversation turns [{role, content}] — OpenAI/Claude format |
| query | string | no | Upcoming task hint — biases semantic scoring toward what’s needed next |
| target_ratio | float | no | Fraction of tokens to keep (default: 0.4, range: 0.05–1.0) |
| distill_novel | boolean | no | Reserved — distill novel drops into memories before removing (coming soon) |
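A client-side helper that assembles the JSON-RPC request from these parameters might look like this Python sketch. The range check mirrors the table; whether chitta itself rejects or clamps out-of-range values is not stated, so rejecting them here is an assumption. distill_novel is left out because it is reserved, and build_compact_request is a hypothetical name.

```python
def build_compact_request(messages, query=None, target_ratio=0.4, request_id=1):
    """Build a JSON-RPC tools/call request for compact_context.

    messages is required, query is optional, and target_ratio must lie
    in [0.05, 1.0], following the parameter table.
    """
    if not isinstance(messages, list):
        raise TypeError("messages must be a list of {role, content} objects")
    if not 0.05 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must be between 0.05 and 1.0")
    arguments = {"messages": messages, "target_ratio": target_ratio}
    if query is not None:
        # Only send the hint when the caller has one.
        arguments["query"] = query
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "compact_context", "arguments": arguments},
    }
```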
Response:
```jsonc
{
  "messages": [...],  // kept messages, original order, verbatim
  "stats": {
    "before_tokens": 48200,
    "after_tokens": 18600,
    "dropped": 34,
    "dropped_pct": 61.4,
    "embedding": true  // false if embedder unavailable (graceful degradation)
  }
}
```
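The example stats are internally consistent if dropped counts messages and dropped_pct measures tokens: 18600 of 48200 tokens is about 38.6% kept (inside the default 0.4 target), so about 61.4% of tokens were removed. This Python sketch recomputes the figure under that reading; compaction_stats is a hypothetical helper.

```python
def compaction_stats(before_tokens: int, after_tokens: int, dropped: int,
                     embedding: bool = True) -> dict:
    """Rebuild the stats block; dropped_pct is the share of tokens removed."""
    if before_tokens:
        dropped_pct = round(100.0 * (1 - after_tokens / before_tokens), 1)
    else:
        dropped_pct = 0.0
    return {
        "before_tokens": before_tokens,
        "after_tokens": after_tokens,
        "dropped": dropped,  # number of messages removed, not tokens
        "dropped_pct": dropped_pct,
        "embedding": embedding,
    }
```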
Example call:
```bash
echo '{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "compact_context",
    "arguments": {
      "messages": [...],
      "query": "implement the new feature",
      "target_ratio": 0.4
    }
  }
}' | chitta
```
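The same call from Python, as a sketch that assumes chitta reads one JSON-RPC request on stdin and writes its response as a JSON line on stdout, which is what the shell pipe above suggests. call_chitta is a hypothetical helper, and the command is a parameter so it can be swapped out for testing.

```python
import json
import subprocess


def call_chitta(request: dict, cmd=("chitta",)) -> dict:
    """Pipe one JSON-RPC request to cmd and parse the first non-empty stdout line."""
    proc = subprocess.run(
        list(cmd),
        input=json.dumps(request) + "\n",  # json.dumps emits a single line
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    for line in proc.stdout.splitlines():
        if line.strip():
            return json.loads(line)
    raise RuntimeError("no response on stdout")
```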
Hook integration
Chitta wires compact_context into the PreCompact hook automatically. When Claude Code is about to compact the conversation, chitta:
- Runs compact_context on the conversation, with a query to bias scoring toward what’s contextually important.
- Lets the distill_trigger pipeline extract wisdom from the surviving turns before the window closes.
You can also trigger compaction manually from any hook or script:
```bash
# In UserPromptSubmit hook: compact before long-session turns (TRANSCRIPT_PATH and QUERY set earlier in the script)
MESSAGES=$(jq -sc '[.[] | select(.type=="user" or .type=="assistant") | ...] | .[-60:]' "$TRANSCRIPT_PATH")
# Build the request with jq so quotes in $QUERY cannot break the JSON
jq -nc --argjson msgs "$MESSAGES" --arg q "$QUERY" \
  '{jsonrpc: "2.0", id: 1, method: "tools/call",
    params: {name: "compact_context", arguments: {messages: $msgs, query: $q, target_ratio: 0.4}}}' \
  | chitta
```
Coming: distill_novel
When distill_novel is set to true, turns that score low AND have low memory coverage (novel content not yet in chitta) will be distilled into memories before being dropped. Nothing novel is simply discarded: it is compressed into the memory graph instead.
The pipeline becomes: score → detect novel drops → distill to memory → then drop verbatim text. The context shrinks; the soul grows.
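Since distill_novel is not released, the following Python sketch only restates the described pipeline. The distill callback and the 0.5 novelty threshold are hypothetical placeholders; the source does not say how "low memory coverage" will be defined.

```python
from typing import Callable, Dict, List, Sequence


def compact_with_distill(
    messages: Sequence[Dict[str, str]],
    kept: Sequence[bool],
    coverage: Sequence[float],
    distill: Callable[[Dict[str, str]], None],
    novelty_threshold: float = 0.5,
) -> List[Dict[str, str]]:
    """score -> detect novel drops -> distill to memory -> drop verbatim text.

    kept: per-message result of the scoring and budget step.
    coverage: per-message memory_coverage.
    distill: hypothetical callback that writes a turn into memory.
    """
    for msg, keep, cov in zip(messages, kept, coverage):
        if not keep and cov < novelty_threshold:
            distill(msg)  # novel drop: store its knowledge before the text goes
    return [m for m, keep in zip(messages, kept) if keep]
```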