DuckDB storage, 8-phase resonance, Bayesian self-tuning, and tree-sitter code intelligence — in a single C++ daemon.
Three layers, one process. Claude Code hooks and the MCP server communicate with the chittad daemon over a Unix domain socket. Inside, DuckDBMind orchestrates storage, embedding, resonance, and self-tuning.
- **Storage**: single engine, no tiers. DuckDB HNSW vectors + BM25 full-text + graph queries + ACID
- **Embeddings**: separate embeddings DB avoids write contention during HNSW rebuilds
- **Pool**: ConnectionPool for concurrent reads, serialized writes
- **Socket**: Unix domain socket IPC between Claude Code and daemon
- **Scaling**: auto-scaling thread pool (2-16 workers) with watchdog for slow requests
Eleven classes form the skeleton. Each handles one concern, composed at the DuckDBMind level.
| Class | File | Role |
|---|---|---|
| DuckDBMind | mind/duckdb_mind.hpp | Central orchestrator: remember, recall, resonate, self-tune |
| DuckDBStore | duckdb_store.hpp | Storage: all DuckDB operations, schema, queries |
| DuckDBRpcHandler | rpc/duckdb_handler.hpp | JSON-RPC 2.0 handler, 100+ registered tools |
| Embedder | mind/embedder.hpp | Embedding with LRU cache and circuit breaker |
| AntahkaranaYantra | vak_onnx.hpp | ONNX Runtime inference for bge-base-en-v1.5 |
| Subconscious | mind/subconscious.hpp | Background thread: patterns, hygiene, embedding |
| ThemeManager | theme_manager.hpp | xMemory hierarchical memory organization |
| CodeIntel | code_intel.hpp | Tree-sitter symbol extraction (9 languages) |
| SymbolResolver | symbol_resolver.hpp | Cross-file symbol resolution for call graphs |
| ThreadPool | rpc/thread_pool.hpp | Auto-scaling worker pool with watchdog |
| ProvenanceSpine | provenance.hpp | Knowledge source tracking and trust scoring |
The core of memory retrieval. DuckDBMind::full_resonate() runs 8 phases to find relevant memories — not search, but resonance. Each phase adds signal; post-processing refines it.
Vector similarity search using HNSW index. Returns top-k memories by cosine distance to the query embedding — the initial gravity wells of meaning.
Keyword-based search using BM25 scoring complements semantic search for exact term matches. Results are merged with the semantic scores using a weighted combination and a confidence factor.
Boost memories whose tags match terms in the query. A small additive signal that rewards explicit categorization.
Identify conceptual gravity wells — clusters of densely connected memories in the triplet graph. Attractors are cached with a 5-minute TTL.
Starting from seed memories, activation spreads through the triplet graph. Connected memories receive activation proportional to edge weight, inversely proportional to distance.
Recent observations and active topics from the current session boost related memories. The context of the conversation shapes what surfaces.
For code-like queries (detected by a heuristic: `::`, `->`, `_`, `.`, or a single identifier), the resonance run adds BM25 and term-based search over symbol names and signatures.
Attractor boost, lateral inhibition (similar memories compete), Hebbian learning (co-accessed memories strengthen connections), and credit assignment for the Bayesian bandit.
```cpp
struct DuckDBResonanceConfig {
    float spread_strength      = 0.5f;
    float spread_decay         = 0.5f;
    int   max_hops             = 3;
    float hebbian_strength     = 0.03f;
    int   max_attractors       = 10;
    float basin_boost          = 1.15f;
    float similarity_threshold = 0.85f;
    float inhibition_strength  = 0.7f;
    float epsilon_boost_alpha  = 0.3f;
    float semantic_weight      = 0.6f;
    float activation_weight    = 0.4f;
    float code_symbol_weight   = 0.5f;
};
```
DuckDB is an embedded analytical database. CC-Soul uses it for HNSW vector search, BM25 full-text, graph queries via DuckPGQ, and ACID transactions with WAL-based crash recovery.
| Table | Columns | Purpose |
|---|---|---|
| memory | id, kind, content, confidence, decay_rate, tags, realm, visibility, timestamps, access_count | Core unit of storage. Each memory has Bayesian confidence and configurable decay. |
| triplet | subject, predicate, object, weight | Knowledge graph with string-based entities. Predicates: calls, contains, imports, inherits, corrects, relates_to. |
| symbol | id, name, kind, signature, file_path, line_start, line_end, parent, project, description | Code intelligence: functions, classes, methods extracted by tree-sitter. |
| call_edge | caller_id, callee_id | Call graph edges, populated by SymbolResolver. |
| ledger | session_id, mood, todos, decisions, next_steps, blockers, snapshot | Session checkpoints for continuity across conversations. |
```cpp
// Pre-allocated connections for concurrent reads.
// Write operations go through a dedicated write connection.
// ScopedConnection: RAII wrapper that returns the connection on destruction.
```

- Default pool: 4-8 connections
- Reads: shared, parallel
- Writes: serialized, single connection
- Overflow: emergency connections created when the pool is exhausted
The embedding pipeline follows a Vedantic naming convention. Each stage transforms speech into meaning, mirroring the philosophical concept of Vāk — the power of articulated consciousness.
| Class | Sanskrit Meaning | Role |
|---|---|---|
| VakPatha | Path of speech | WordPiece tokenizer (vocab.txt) |
| Shabda | Sound-form | Tokenized input (input_ids + attention_mask) |
| Artha | Meaning | 768-dim embedding vector + certainty |
| AntahkaranaYantra | Inner instrument | ONNX Runtime inference engine |
| SmritiYantra | Memory machine | Caching wrapper (LRU, 10000 entries) |
| ShantaYantra | Silent machine | Zero-vector fallback |
- Parameters: 110M
- Dimensions: 768
- Max sequence: 128 tokens
- Pooling: mean pooling with L2 normalization
- Runtime: ONNX Runtime, sequential execution mode
- Batch size: up to 32 texts per inference call
- LRU cache: 1000 entries, tracks hit/miss rate
The embedder's circuit breaker opens after 3 consecutive failures, enters a 60-second cooldown, then moves to a half-open state to test recovery.
Inspired by the xMemory paper, themes provide hierarchical organization of memories. Each memory is assigned to a theme based on semantic similarity, with sparsity penalizing oversized themes.
Find diverse theme representatives matching the query. Ensures breadth across conceptual areas.
Expand within matching themes for depth. Background maintenance splits, merges, and reassigns as needed.
Tree-sitter parsing extracts structural information from source code: symbols, callsites, imports, and type hierarchy. The SymbolResolver links callsites to known symbols for call graph traversal.
| Mode | Tool | Method |
|---|---|---|
| Name search | find_symbol | Exact/prefix match on symbol name |
| Semantic search | search_symbols | Embedding similarity on symbol metadata |
| BM25 search | (internal) | Full-text keyword match on name + signature |
| Call graph | symbol_callers / symbol_callees | Traversal of call_edge table |
- **Symbols**: functions, classes, methods, structs, enums (name, kind, signature, file_path, line_range, parent)
- **Callsites**: Call, MemberCall, Qualified, New, Ctor, Indirect, LambdaCall
- **Imports**: file dependencies
- **Hierarchy**: inheritance relationships
26 node types, each with distinct decay characteristics and quality gates. Confidence is Bayesian, not a simple scalar — it tracks mean, variance, observation count, and decay.
| Type | Rate | Rationale |
|---|---|---|
| Belief | 0.0 | Never decays (core identity) |
| Symbol | 0.0 | Code structure doesn't decay |
| Wisdom | 0.005 | Proven patterns should persist |
| Correction | 0.005 | Important lessons persist |
| Preference | 0.01 | Slowly fades if not reinforced |
| Episode | 0.03 | Fades unless reinforced |
```cpp
struct Confidence {
    float mu;        // Mean confidence
    float sigma_sq;  // Variance (uncertainty)
    int   n;         // Number of observations
    float tau;       // Decay parameter

    void observe(float value);  // Update with new evidence
    void decay(float rate);     // Time-based decay
};
// strengthen() calls observe(positive) — increases mu, reduces sigma
// weaken() calls observe(negative) — decreases mu
// decay() gradually reduces confidence based on decay_rate
```
1. **Minimum length**: content must be >= 10 characters
2. **Deduplication**: cosine similarity check (threshold: 0.95)
3. **Diversity sampling**: avoids storing too many similar memories in quick succession
The ResonanceLearner uses Thompson sampling to automatically optimize resonance parameters. No manual tuning — the system learns what works for each user's memory patterns.
Each tunable parameter has a Bayesian prior. BetaPrior for bounded parameters (0–1), GaussianPrior for unbounded. On each full_resonate() call, parameters are sampled from their current posteriors, balancing exploration and exploitation.
When a user strengthens or weakens a memory, the learner attributes credit to the parameters active when that memory was surfaced. QueryContext features — query length, term count, technical terms, domain prefix — inform the bandit. State persists across daemon restarts.
- Semantic weight vs. BM25 weight: balance vector vs. keyword retrieval
- Spread strength and spread decay: graph activation dynamics
- Hebbian rate: co-access learning speed
- Tag boost: categorical signal magnitude

QueryContext features: query_length, term_count, has_technical_terms, has_domain_prefix, avg_term_frequency