DuckDB storage, 8-phase resonance, Bayesian self-tuning, and tree-sitter code intelligence — in a single C++ daemon.
Three layers, one process. Claude Code hooks and the MCP server communicate with the chittad daemon over a Unix domain socket. Inside, DuckDBMind orchestrates storage, embedding, resonance, and self-tuning.
- **Storage**: single engine, no tiers. DuckDB HNSW vectors + BM25 full-text + graph queries + ACID
- **Embeddings**: separate embeddings DB avoids write contention during HNSW rebuilds
- **Pool**: ConnectionPool for concurrent reads, serialized writes
- **Socket**: Unix domain socket IPC between Claude Code and daemon
- **Scaling**: auto-scaling thread pool (2-16 workers) with watchdog for slow requests
Eleven classes form the skeleton. Each handles one concern, composed at the DuckDBMind level.
| Class | File | Role |
|---|---|---|
| DuckDBMind | mind/duckdb_mind.hpp | Central orchestrator: remember, recall, resonate, self-tune |
| DuckDBStore | duckdb_store.hpp | Storage: all DuckDB operations, schema, queries |
| DuckDBRpcHandler | rpc/duckdb_handler.hpp | JSON-RPC 2.0 handler, 100+ registered tools |
| Embedder | mind/embedder.hpp | Embedding with LRU cache and circuit breaker |
| AntahkaranaYantra | vak_onnx.hpp | ONNX Runtime inference for bge-base-en-v1.5 |
| Subconscious | mind/subconscious.hpp | Background thread: patterns, hygiene, embedding |
| ThemeManager | theme_manager.hpp | xMemory hierarchical memory organization |
| CodeIntel | code_intel.hpp | Tree-sitter symbol extraction (9 languages) |
| SymbolResolver | symbol_resolver.hpp | Cross-file symbol resolution for call graphs |
| ThreadPool | rpc/thread_pool.hpp | Auto-scaling worker pool with watchdog |
| ProvenanceSpine | provenance.hpp | Knowledge source tracking and trust scoring |
The core of memory retrieval. DuckDBMind::full_resonate() runs 8 phases to find relevant memories — not search, but resonance. Each phase adds signal; post-processing refines it.
Vector similarity search using HNSW index. Returns top-k memories by cosine distance to the query embedding — the initial gravity wells of meaning.
Keyword-based search using BM25 scoring complements semantic search for exact term matches. Results are merged with the semantic scores using a weighted combination and a confidence factor.
Boost memories whose tags match terms in the query. A small additive signal that rewards explicit categorization.
Identify conceptual gravity wells — clusters of densely connected memories in the triplet graph. Attractors are cached with a 5-minute TTL.
Starting from seed memories, activation spreads through the triplet graph. Connected memories receive activation proportional to edge weight, inversely proportional to distance.
Recent observations and active topics from the current session boost related memories. The context of the conversation shapes what surfaces.
For code-like queries (detected by a heuristic: `::`, `->`, `_`, `.`, or a single identifier), the resonance run adds BM25 and term-based search over symbol names and signatures.
Attractor boost, lateral inhibition (similar memories compete), Hebbian learning (co-accessed memories strengthen connections), and credit assignment for the Bayesian bandit.
```cpp
struct DuckDBResonanceConfig {
    float spread_strength      = 0.5f;
    float spread_decay         = 0.5f;
    int   max_hops             = 3;
    float hebbian_strength     = 0.03f;
    int   max_attractors       = 10;
    float basin_boost          = 1.15f;
    float similarity_threshold = 0.85f;
    float inhibition_strength  = 0.7f;
    float epsilon_boost_alpha  = 0.3f;
    float semantic_weight      = 0.6f;
    float activation_weight    = 0.4f;
    float code_symbol_weight   = 0.5f;
};
```
DuckDB is an embedded analytical database. CC-Soul uses it for HNSW vector search, BM25 full-text, graph queries via DuckPGQ, and ACID transactions with WAL-based crash recovery.
| Table | Columns | Purpose |
|---|---|---|
| memory | id, kind, content, confidence, decay_rate, tags, realm, visibility, timestamps, access_count | Core unit of storage. Each memory has Bayesian confidence and configurable decay. |
| triplet | subject, predicate, object, weight | Knowledge graph with string-based entities. Predicates: calls, contains, imports, inherits, corrects, relates_to. |
| symbol | id, name, kind, signature, file_path, line_start, line_end, parent, project, description | Code intelligence: functions, classes, methods extracted by tree-sitter. |
| call_edge | caller_id, callee_id | Call graph edges, populated by SymbolResolver. |
| ledger | session_id, mood, todos, decisions, next_steps, blockers, snapshot | Session checkpoints for continuity across conversations. |
```cpp
// Pre-allocated connections for concurrent reads.
// Write operations go through a dedicated write connection.
// ScopedConnection: RAII wrapper that returns the connection on destruction.
```

- Default pool: 4-8 connections
- Reads: shared, parallel
- Writes: serialized, single connection
- Overflow: emergency connections created when the pool is exhausted
The embedding pipeline follows a Vedantic naming convention. Each stage transforms speech into meaning, mirroring the philosophical concept of Vāk — the power of articulated consciousness.
| Class | Sanskrit Meaning | Role |
|---|---|---|
| VakPatha | Path of speech | WordPiece tokenizer (vocab.txt) |
| Shabda | Sound-form | Tokenized input (input_ids + attention_mask) |
| Artha | Meaning | 768-dim embedding vector + certainty |
| AntahkaranaYantra | Inner instrument | ONNX Runtime inference engine |
| SmritiYantra | Memory machine | Caching wrapper (LRU, 10000 entries) |
| ShantaYantra | Silent machine | Zero-vector fallback |
- Parameters: 110M
- Dimensions: 768
- Max sequence: 128 tokens
- Pooling: mean pooling with L2 normalization
- Runtime: ONNX Runtime, sequential execution mode
- Batch size: up to 32 texts per inference call
- LRU cache: 1000 entries, tracks hit/miss rate
The embedder's circuit breaker opens after 3 consecutive failures, enters a 60-second cooldown, then moves to a half-open state to test recovery.
Inspired by the xMemory paper, themes provide hierarchical organization of memories. Each memory is assigned to a theme based on semantic similarity, with sparsity penalizing oversized themes.
Find diverse theme representatives matching the query. Ensures breadth across conceptual areas.
Expand within matching themes for depth. Background maintenance splits, merges, and reassigns as needed.
Tree-sitter parsing extracts structural information from source code: symbols, callsites, imports, and type hierarchy. The SymbolResolver links callsites to known symbols for call graph traversal.
| Mode | Tool | Method |
|---|---|---|
| Name search | find_symbol | Exact/prefix match on symbol name |
| Semantic search | search_symbols | Embedding similarity on symbol metadata |
| BM25 search | (internal) | Full-text keyword match on name + signature |
| Call graph | symbol_callers / symbol_callees | Traversal of call_edge table |
- **Symbols**: functions, classes, methods, structs, enums (name, kind, signature, file_path, line_range, parent)
- **Callsites**: Call, MemberCall, Qualified, New, Ctor, Indirect, LambdaCall
- **Imports**: file dependencies
- **Hierarchy**: inheritance relationships
26 node types, each with distinct decay characteristics and quality gates. Confidence is Bayesian, not a simple scalar — it tracks mean, variance, observation count, and decay.
| Type | Rate | Rationale |
|---|---|---|
| Belief | 0.0 | Never decays (core identity) |
| Symbol | 0.0 | Code structure doesn't decay |
| Wisdom | 0.005 | Proven patterns should persist |
| Correction | 0.005 | Important lessons persist |
| Preference | 0.01 | Slowly fades if not reinforced |
| Episode | 0.03 | Fades unless reinforced |
```cpp
struct Confidence {
    float mu;        // Mean confidence
    float sigma_sq;  // Variance (uncertainty)
    int   n;         // Number of observations
    float tau;       // Decay parameter

    void observe(float value);  // Update with new evidence
    void decay(float rate);     // Time-based decay
};
// strengthen() calls observe(positive) — increases mu, reduces sigma
// weaken() calls observe(negative) — decreases mu
// decay() gradually reduces confidence based on decay_rate
```
1. **Minimum length**: content must be >= 10 characters
2. **Deduplication**: cosine similarity check (threshold: 0.95)
3. **Diversity sampling**: avoids storing too many similar memories in quick succession
The ResonanceLearner uses Thompson sampling to automatically optimize resonance parameters. No manual tuning — the system learns what works for each user's memory patterns.
Each tunable parameter has a Bayesian prior. BetaPrior for bounded parameters (0–1), GaussianPrior for unbounded. On each full_resonate() call, parameters are sampled from their current posteriors, balancing exploration and exploitation.
When a user strengthens or weakens a memory, the learner attributes credit to the parameters active when that memory was surfaced. QueryContext features — query length, term count, technical terms, domain prefix — inform the bandit. State persists across daemon restarts.
- Semantic weight vs. BM25 weight: balance vector vs. keyword retrieval
- Spread strength and spread decay: graph activation dynamics
- Hebbian rate: co-access learning speed
- Tag boost: categorical signal magnitude

QueryContext features: query_length, term_count, has_technical_terms, has_domain_prefix, avg_term_frequency