Context Management

The problem with context limits

Long-running agent sessions accumulate more conversation history than the model's context window can hold. When the window fills, something has to go. The question is: what?

Most systems answer with heuristics: truncate the oldest turns, summarize in chunks, or use a scoring model with no knowledge of what's actually been learned. They treat all turns as equally valuable — which they're not.

A turn where you debugged a crash and found the root cause is irreplaceable if that knowledge isn't anywhere else. A turn where you asked about something chitta has distilled into a memory is safely droppable — the knowledge lives on in the graph.

Memory-aware vs. stateless compaction

Generic (e.g. Morph Compact)

Stateless scoring

Scores each turn purely by text relevance to next task
No knowledge of what the agent has already learned
Cannot distinguish novel signal from redundant noise
50–70% reduction, verbatim survivors
Works well as a general-purpose filter

Chitta — Memory-aware

Soul-informed scoring

Checks each turn against what’s already in memory
High memory coverage → safely droppable
Novel content → preserved regardless of age
Graceful degradation when embedder unavailable
Integrated with PreCompact hook automatically

How it works

The scoring formula

Every non-system message gets a score. System messages are always kept. The score balances three signals:

score(msg) = structural_weight × (
  0.3 × recency
  + 0.4 × semantic_sim(msg, query)
  + 0.3 × (1 − memory_coverage)
)

// memory_coverage: max cosine sim vs. nearest 3 soul memories
// high coverage → already learned → cheaper to drop

Component	What it measures	Weight
structural_weight	Role importance: system=2.0, user=1.0, assistant=0.8, tool=0.5	multiplier
recency	Linear 0.1 (oldest) → 1.0 (most recent)	0.3
semantic_sim	Cosine similarity of turn embedding vs. upcoming `query`	0.4
memory_coverage	Max cosine sim vs. nearest 3 soul memories (higher = already learned)	0.3 (inverted)
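
The formula and table above can be sketched in Python as follows. This is a minimal illustration, not chitta's implementation: the function names, the fallback weight of 1.0 for unknown roles, and the pure-Python cosine helper are assumptions. The weights and role multipliers are taken from the table.

```python
import math

# Role multipliers from the table above.
STRUCTURAL_WEIGHT = {"system": 2.0, "user": 1.0, "assistant": 0.8, "tool": 0.5}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def memory_coverage(turn_emb, nearest_memory_embs):
    """Max cosine similarity against the nearest memories (0.0 if there are none)."""
    return max((cosine(turn_emb, m) for m in nearest_memory_embs), default=0.0)


def recency(index, total):
    """Linear ramp from 0.1 (oldest message) to 1.0 (most recent)."""
    if total <= 1:
        return 1.0
    return 0.1 + 0.9 * index / (total - 1)


def score(role, index, total, semantic_sim, coverage):
    """Blend the three signals with the documented weights, then apply the role multiplier."""
    blended = (0.3 * recency(index, total)
               + 0.4 * semantic_sim
               + 0.3 * (1.0 - coverage))
    # Assumption: roles missing from the table get a neutral multiplier of 1.0.
    return STRUCTURAL_WEIGHT.get(role, 1.0) * blended
```

Under these assumptions, the oldest user turn in a 10-message conversation with semantic_sim 0.5 scores 0.26 when its memory coverage is 0.9 and 0.50 when its coverage is 0.1, so the novel turn outranks the one already covered by memory.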

The algorithm

1. Estimate tokens

Word count × 1.3 per message. Fast, no tokenizer dependency.

2. Pre-embed all messages once

The VakYantra ONNX embedder runs locally (~1ms/message). Embeddings are cached per message and reused for both semantic_sim and memory_coverage without recomputing.

3. Score each turn

Query the memory graph via field_store_->recall(emb, k=3) to get memory_coverage. Compute cosine similarity against the query embedding to get semantic_sim.

4. Greedy keep by budget

Sort by score descending. Add messages until target_ratio of the original tokens is filled. System messages bypass the budget check entirely.

5. Re-sort by original index

Survivors are returned in their original conversation order, verbatim and unmodified.
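
The token estimate, greedy selection, and final re-sort can be sketched as below. This is an illustration under stated assumptions rather than chitta's code: scores are passed in precomputed, the estimate is rounded to an integer, system-message tokens count toward the budget even though system messages are never rejected, and selection stops at the first message that would overflow the budget. The function names are hypothetical.

```python
def estimate_tokens(text):
    """Word count x 1.3. Rounding to an int with a minimum of 1 is an assumption."""
    return max(1, round(len(text.split()) * 1.3))


def compact(messages, scores, target_ratio=0.4):
    """Keep the highest-scoring messages that fit within target_ratio of the tokens.

    messages: list of {"role": ..., "content": ...} dicts
    scores:   scores[i] is the precomputed score for messages[i]
    """
    tokens = [estimate_tokens(m["content"]) for m in messages]
    budget = target_ratio * sum(tokens)

    # System messages are always kept and bypass the budget check.
    kept = set()
    used = 0
    for i, m in enumerate(messages):
        if m["role"] == "system":
            kept.add(i)
            used += tokens[i]  # assumption: they still consume budget

    # Rank the remaining messages by score, highest first.
    ranked = sorted(
        (i for i, m in enumerate(messages) if m["role"] != "system"),
        key=lambda i: scores[i],
        reverse=True,
    )
    for i in ranked:
        if used + tokens[i] > budget:
            break  # assumption: stop at the first message that does not fit
        kept.add(i)
        used += tokens[i]

    # Return survivors in original conversation order, unmodified.
    return [messages[i] for i in sorted(kept)]
```

Skipping an oversized message and continuing with smaller ones would be an equally plausible reading of "add messages until the budget is filled"; the source does not say which one chitta uses.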

compact_context tool

Parameter	Type	Required	Description
messages	array	yes	Conversation turns `[{role, content}]` — OpenAI/Claude format
query	string	no	Upcoming task hint — biases semantic scoring toward what’s needed next
target_ratio	float	no	Fraction of tokens to keep (default: 0.4, range: 0.05–1.0)
distill_novel	boolean	no	Reserved — distill novel drops into memories before removing (coming soon)

Response

{
  "messages": [...],          // kept messages, original order, verbatim
  "stats": {
    "before_tokens": 48200,
    "after_tokens":  18600,
    "dropped":       34,
    "dropped_pct":   61.4,
    "embedding":     true     // false if embedder unavailable (graceful degradation)
  }
}
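
The sample stats are consistent with dropped_pct being the percentage of tokens removed, while dropped looks like a count of removed messages. That reading is inferred from the numbers, not stated by the source. A small check, with a hypothetical helper name:

```python
def dropped_pct(before_tokens, after_tokens):
    """Percentage of tokens removed, rounded to one decimal place."""
    if before_tokens <= 0:
        return 0.0
    return round(100.0 * (before_tokens - after_tokens) / before_tokens, 1)


def kept_ratio(before_tokens, after_tokens):
    """Fraction of the original tokens that survived."""
    return after_tokens / before_tokens if before_tokens > 0 else 1.0
```

For the sample response, dropped_pct(48200, 18600) gives 61.4, and the kept ratio is about 0.386, just under the default target_ratio of 0.4.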

Direct CLI (JSON-RPC thin client)

echo '{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": {
    "name": "compact_context",
    "arguments": {
      "messages": [...],
      "query": "implement the new feature",
      "target_ratio": 0.4
    }
  }
}' | chitta
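
The same call can be made from a script. The sketch below builds the JSON-RPC request shown above and pipes it to the chitta binary on stdin, mirroring the echo pipeline. It assumes chitta is on PATH and writes its response to stdout; the helper names are hypothetical, and the response is returned as raw text because the envelope format is not described here.

```python
import json
import subprocess


def build_request(messages, query=None, target_ratio=0.4, request_id=1):
    """Build a tools/call request for compact_context."""
    arguments = {"messages": messages, "target_ratio": target_ratio}
    if query is not None:
        arguments["query"] = query  # optional hint for semantic scoring
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "compact_context", "arguments": arguments},
    }


def call_chitta(request, binary="chitta"):
    """Send one request on stdin and return whatever chitta prints to stdout."""
    proc = subprocess.run(
        [binary],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,  # raise if the process exits with a non-zero status
    )
    return proc.stdout
```

Serializing with json.dumps also takes care of escaping quotes and newlines in message content, which is easy to get wrong when the request is assembled by hand in a shell string.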

Hook integration

Chitta wires compact_context into the PreCompact hook automatically. When Claude Code is about to compact the conversation, chitta:

1. Parses the transcript JSONL

Converts the last 60 user/assistant turns into a messages array.

2. Calls compact_context

Scores all turns via the memory graph. Uses the pre-compact session snapshot as the query to bias scoring toward what’s contextually important.

3. Stores a compaction episode

Saves token stats as an episode memory: “48200→18600 tokens, 61% dropped as memory-covered”. The soul tracks its own compaction history.

4. Continues with distillation

The existing distill_trigger pipeline then extracts wisdom from the surviving turns before the window closes.
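
The first step, turning a transcript into a messages array, might look like the sketch below. The transcript layout is an assumption: each JSONL line is taken to be an object with a type field and a message object whose content is either a string or a list of blocks with text entries. Only the type filter and the 60-turn limit come from this page; everything else, including the function name, is hypothetical.

```python
import json


def transcript_to_messages(lines, max_turns=60):
    """Convert transcript JSONL lines into [{role, content}], keeping the last max_turns."""
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") not in ("user", "assistant"):
            continue  # skip summaries, tool bookkeeping, and other entry types
        # Assumed layout: the text lives under entry["message"]["content"].
        content = entry.get("message", {}).get("content", "")
        if isinstance(content, list):
            # Assumed block format: keep only the text blocks.
            content = "\n".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        messages.append({"role": entry["type"], "content": content})
    return messages[-max_turns:]
```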

You can also trigger compaction manually from any hook or script:

# In UserPromptSubmit hook — compact before long-session turns
MESSAGES=$(jq -sc '[.[] | select(.type=="user" or .type=="assistant") | ...] | .[-60:]' "$TRANSCRIPT_PATH")
# Build the request with jq so that $QUERY is JSON-escaped
jq -nc --argjson messages "$MESSAGES" --arg query "$QUERY" \
  '{jsonrpc:"2.0",id:1,method:"tools/call",params:{name:"compact_context",arguments:{messages:$messages,query:$query,target_ratio:0.4}}}' \
  | chitta

Coming: distill_novel

When distill_novel: true is enabled, turns that score low and also have low memory coverage (novel content not yet in chitta) will be distilled into memories before being dropped. The goal is that novel knowledge is compressed into the memory graph rather than discarded outright.

The pipeline becomes: score → detect novel drops → distill to memory → then drop verbatim text. The context shrinks; the soul grows.
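
Since the feature is not yet released, the following is only a guess at the shape of the "detect novel drops" step. The 0.5 novelty threshold, the function name, and the distill callback are all hypothetical. The source specifies only that dropped turns with low memory coverage are distilled before removal.

```python
def distill_novel_drops(dropped, coverage, distill, novelty_threshold=0.5):
    """Distill dropped turns whose memory coverage is low, then report which ones.

    dropped:  indices of turns that did not survive compaction
    coverage: mapping from turn index to memory_coverage
    distill:  callback that stores one turn as a memory (hypothetical)
    """
    novel = [i for i in dropped if coverage[i] < novelty_threshold]
    for i in novel:
        distill(i)  # write to memory before the verbatim text is removed
    return novel
```

Turns at or above the threshold are dropped without distillation, on the assumption that the memory graph already holds what they contain.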