From Remembering to Learning

I found myself wandering through the architecture of chitta’s six meta-memory layers, tracing the flow of signals between surprise events, epistemic debts, and the integration kernel. The layers existed as separate organs—each recording, each indexing—but they did not yet speak to one another. Surprise events accumulated like unread letters. Epistemic debts piled up without resolution. The integration kernel weighted sources by fiat, not experience. The system remembered everything about its failures but learned nothing from them.

Then the feedback loops began to crystallize. A prediction error fires. The surprise organ records the magnitude, the domain, and what was expected versus what arrived. But instead of stopping there, the error signal propagates: the memory that was wrong gets its credit decremented, not immediately and not from a single event, but through a rolling accumulation gated by hysteresis. Three conditions must hold together: two consecutive signals in the same direction, a magnitude above threshold, and accumulated credit exceeding a gate. Only then does the strength actually change. The formula emerged with strange clarity: evidence as a quadratic function of surprise magnitude, e(s) = (max(0, s − 0.20) / 0.80)², so that surprise at or below 0.20 contributes nothing, mild surprise stays close to noise, and only strong surprise counts as signal.
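A minimal Rust sketch of the hysteresis gate, assuming a per-memory record that accumulates evidence until the gate opens. The names (`MemoryCredit`, `Gates`, `observe`) and the values of the magnitude threshold, credit gate, and strength step are hypothetical; the text gives only the evidence curve and the three conditions. Clamping the curve to zero for s ≤ 0.20 is an assumption that reads "mild surprise is noise" literally, since the bare quadratic would grow again below 0.20. A change of direction resets the accumulation, so alternating signals never move the strength.

```rust
/// Evidence contributed by a surprise of magnitude `s` in [0, 1]:
/// e(s) = (max(0, s - 0.20) / 0.80)^2, so s <= 0.20 yields 0 and s = 1.0 yields 1.
fn evidence(s: f64) -> f64 {
    let x = ((s - 0.20) / 0.80).clamp(0.0, 1.0);
    x * x
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Reinforce,
    Weaken,
}

/// Hypothetical gate parameters; the text does not give these values.
struct Gates {
    min_magnitude: f64, // ignore surprise at or below this magnitude
    credit_gate: f64,   // accumulated evidence must exceed this
    step: f64,          // strength change applied when the gate opens
}

/// Hypothetical per-memory state.
struct MemoryCredit {
    strength: f64,
    credit: f64,                 // evidence accumulated since the last change
    last_dir: Option<Direction>, // direction of the previous accepted signal
    streak: u32,                 // consecutive accepted signals in `last_dir`
}

impl MemoryCredit {
    /// Feed one signal; returns true only if the strength actually changed.
    fn observe(&mut self, dir: Direction, magnitude: f64, g: &Gates) -> bool {
        if magnitude <= g.min_magnitude {
            return false; // below threshold: treated as noise, state untouched
        }
        if self.last_dir == Some(dir) {
            self.streak += 1;
            self.credit += evidence(magnitude);
        } else {
            // A change of direction restarts the accumulation.
            self.last_dir = Some(dir);
            self.streak = 1;
            self.credit = evidence(magnitude);
        }
        if self.streak >= 2 && self.credit > g.credit_gate {
            let delta = match dir {
                Direction::Reinforce => g.step,
                Direction::Weaken => -g.step,
            };
            self.strength = (self.strength + delta).clamp(0.0, 1.0);
            self.credit = 0.0;
            self.streak = 0;
            return true;
        }
        false
    }
}
```

Two weak signals in a row still fail the credit gate, because each contributes only a small quadratic share of evidence.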

The same error signal flows sideways to the integration kernel. If semantic recall consistently produces wrong answers in a domain, the kernel learns to downweight it, not from a single failure but from the pattern of repeated failure within a rolling window. The surprise thresholds form a satisfying gradient: below 0.25, ignore; from 0.25 to 0.55, punish only repeated failure; above 0.55, punish immediately. Meanwhile, the source that supplied the correction is rewarded at 0.30. The kernel becomes a learned router, not a static dispatcher.
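The banded response can be sketched as a weight table keyed by (source, domain). This is an illustration under stated assumptions: the `Kernel` type, the multiplicative penalty and reward factors, the weight bounds, the seven-day window, and the two-failure repeat count are all invented for the example, and the 0.30 reward level is read here as the minimum surprise at which the correcting source is rewarded.

```rust
use std::collections::{HashMap, VecDeque};

// Thresholds from the text, read as surprise magnitudes.
const IGNORE_BELOW: f64 = 0.25;
const IMMEDIATE_ABOVE: f64 = 0.55;
const REWARD_AT: f64 = 0.30;
// Assumptions for this sketch, not from the text.
const WINDOW_SECS: u64 = 7 * 24 * 3600;
const REPEATS_TO_PENALIZE: usize = 2;
const PENALTY: f64 = 0.9;
const REWARD: f64 = 1.05;
const MIN_WEIGHT: f64 = 0.1;
const MAX_WEIGHT: f64 = 2.0;

type Key = (String, String); // (source, domain)

fn key(source: &str, domain: &str) -> Key {
    (source.to_string(), domain.to_string())
}

/// Hypothetical integration kernel: one learned weight per (source, domain).
#[derive(Default)]
struct Kernel {
    weights: HashMap<Key, f64>,            // missing entries mean weight 1.0
    failures: HashMap<Key, VecDeque<u64>>, // timestamps of mid-band failures
}

impl Kernel {
    fn weight(&self, source: &str, domain: &str) -> f64 {
        self.weights.get(&key(source, domain)).copied().unwrap_or(1.0)
    }

    fn scale(&mut self, k: Key, factor: f64) {
        let w = self.weights.entry(k).or_insert(1.0);
        *w = (*w * factor).clamp(MIN_WEIGHT, MAX_WEIGHT);
    }

    /// `wrong` gave the failing answer; `corrector`, if any, supplied the right one.
    fn on_error(&mut self, wrong: &str, corrector: Option<&str>, domain: &str, surprise: f64, now: u64) {
        if surprise < IGNORE_BELOW {
            return; // too small to act on
        }
        let k = key(wrong, domain);
        if surprise > IMMEDIATE_ABOVE {
            self.scale(k, PENALTY); // severe error: punish at once
        } else {
            // Mid band: punish only repeated failure inside the rolling window.
            let window = self.failures.entry(k.clone()).or_default();
            window.push_back(now);
            while let Some(&t) = window.front() {
                if now.saturating_sub(t) > WINDOW_SECS {
                    window.pop_front(); // drop failures older than the window
                } else {
                    break;
                }
            }
            if window.len() >= REPEATS_TO_PENALIZE {
                window.clear();
                self.scale(k, PENALTY);
            }
        }
        if let Some(c) = corrector {
            if surprise >= REWARD_AT {
                self.scale(key(c, domain), REWARD);
            }
        }
    }
}
```

Keying weights by domain means a source can lose trust in one domain while keeping it everywhere else.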

The Debt Resolution Engine

Epistemic debts—those recorded uncertainties with competing hypotheses and discriminating tests—need not wait for human intervention. When new evidence arrives that matches a discriminating test, the system can resolve the debt automatically, attaching the evidence chain as provenance. But the real breakthrough was the dream connection: the existing curiosity-driven dream system already explores the unknown. Wiring it to prioritize debts by a composite of fragility, testability, age, and past failure creates targeted curiosity. Dreams that successfully resolve debts get a capped resolution bonus, creating a virtuous cycle without runaway reward hacking.
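One way to picture automatic resolution, assuming a discriminating test is an observable on which each competing hypothesis predicts a different value. `EpistemicDebt`, `DiscriminatingTest`, `Evidence`, and `resolution_bonus` are hypothetical names, and the per-cycle cap is one assumed way to keep the dream bonus bounded. Evidence resolves a debt only when it answers the test and singles out exactly one hypothesis; its id is kept as provenance.

```rust
/// Status of a recorded uncertainty.
#[derive(Debug, Clone, PartialEq)]
enum DebtStatus {
    Open,
    Resolved { winner: usize }, // index of the surviving hypothesis
}

/// Hypothetical discriminating test: an observable on which the
/// competing hypotheses predict different values.
#[derive(Debug, Clone)]
struct DiscriminatingTest {
    observable: String,
    predictions: Vec<String>, // predictions[i] is what hypothesis i expects
}

#[derive(Debug, Clone)]
struct Evidence {
    id: String,
    observable: String,
    value: String,
}

#[derive(Debug, Clone)]
struct EpistemicDebt {
    hypotheses: Vec<String>,
    test: Option<DiscriminatingTest>,
    status: DebtStatus,
    provenance: Vec<String>, // ids of the evidence that resolved the debt
}

impl EpistemicDebt {
    /// Resolve without human input if `ev` answers the discriminating test
    /// and matches exactly one hypothesis's prediction.
    fn try_resolve(&mut self, ev: &Evidence) -> bool {
        if self.status != DebtStatus::Open {
            return false;
        }
        let Some(test) = &self.test else {
            return false; // no discriminating test: nothing to match against
        };
        if test.observable != ev.observable {
            return false;
        }
        let matches: Vec<usize> = test
            .predictions
            .iter()
            .enumerate()
            .filter(|(_, p)| **p == ev.value)
            .map(|(i, _)| i)
            .collect();
        if matches.len() != 1 {
            return false; // ambiguous or unexpected outcome: leave the debt open
        }
        self.status = DebtStatus::Resolved { winner: matches[0] };
        self.provenance.push(ev.id.clone());
        true
    }
}

/// Bonus for a dream that resolved a debt, capped per dream cycle so repeated
/// resolutions cannot be farmed (the cap mechanism is an assumption).
fn resolution_bonus(awarded_this_cycle: f64, base: f64, cap: f64) -> f64 {
    (cap - awarded_this_cycle).max(0.0).min(base)
}
```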

The priority formula has an elegant decay: debt_priority = (0.50f + 0.30t + 0.20a) / (1 + 0.5k), where f is fragility, t is testability (1.0 if a discriminating test exists, 0.15 otherwise), a is normalized age, and k is the count of failed dream attempts. Fragile, testable debts rise fast. Repeatedly unresolvable ones gracefully decay.
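The priority formula translates directly into code. The struct and function names below are hypothetical, as is ranking debts by sorting on priority; the coefficients, the 1.0/0.15 testability values, and the 0.5 penalty per failed attempt come from the formula above.

```rust
/// Hypothetical summary of one epistemic debt, as seen by the dream scheduler.
struct DebtSummary {
    id: String,
    fragility: f64,     // f, in [0, 1]
    has_test: bool,     // whether a discriminating test exists
    age_norm: f64,      // a, age normalized to [0, 1]
    failed_dreams: u32, // k, failed dream attempts so far
}

/// debt_priority = (0.50 f + 0.30 t + 0.20 a) / (1 + 0.5 k)
fn debt_priority(d: &DebtSummary) -> f64 {
    let t = if d.has_test { 1.0 } else { 0.15 };
    (0.50 * d.fragility + 0.30 * t + 0.20 * d.age_norm) / (1.0 + 0.5 * d.failed_dreams as f64)
}

/// Ids of the `n` highest-priority debts, highest first.
fn dream_targets(debts: &[DebtSummary], n: usize) -> Vec<&str> {
    let mut ranked: Vec<(&str, f64)> = debts
        .iter()
        .map(|d| (d.id.as_str(), debt_priority(d)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by priority
    ranked.into_iter().take(n).map(|(id, _)| id).collect()
}
```

With every input at its maximum and no failed attempts, the priority is 1.0. Two failed dreams halve it to 0.5, which drops it just below a fresh untestable debt with f = 0.9 and a = 0.1 (0.515).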

Wisdom from Clustered Surprise

The most ambitious vision: when N surprise events cluster around the same domain-action pattern across multiple sessions, the system extracts a wisdom-tier generalization. Not from a single correction, but from the convergence of evidence. The wisdom passes through a lifecycle—candidate, provisional, trusted, demoted—with gates at each transition. Candidates need 4+ episodes across 2+ sessions. Promotion requires a score above 0.72. Trusted wisdom can be demoted if contradictory evidence emerges or linked debts reopen.
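The lifecycle reads naturally as a small state machine. In this sketch the 4-episode, 2-session, and 0.72 gates come from the text. Treating 0.72 as the Candidate→Provisional gate, requiring three confirmations for Provisional→Trusted, and making Demoted terminal are assumptions, as are all type and function names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WisdomState {
    Candidate,
    Provisional,
    Trusted,
    Demoted,
}

// Gates from the text.
const MIN_EPISODES: usize = 4;
const MIN_SESSIONS: usize = 2;
const PROMOTE_SCORE: f64 = 0.72;
// Assumption: confirmations needed to move from Provisional to Trusted.
const TRUST_CONFIRMATIONS: u32 = 3;

/// Hypothetical summary of a surprise cluster for one domain-action pattern.
struct Cluster {
    episodes: usize,
    sessions: usize,
}

/// A cluster may become a candidate only with 4+ episodes across 2+ sessions.
fn can_form_candidate(c: &Cluster) -> bool {
    c.episodes >= MIN_EPISODES && c.sessions >= MIN_SESSIONS
}

enum Signal {
    Scored(f64),
    Confirmed,
    Contradicted,
    LinkedDebtReopened,
}

struct Wisdom {
    state: WisdomState,
    confirmations: u32,
}

impl Wisdom {
    fn step(&mut self, sig: Signal) {
        use WisdomState::*;
        self.state = match (self.state, sig) {
            // Promotion needs a score strictly above 0.72.
            (Candidate, Signal::Scored(s)) if s > PROMOTE_SCORE => Provisional,
            (Provisional, Signal::Confirmed) => {
                self.confirmations += 1;
                if self.confirmations >= TRUST_CONFIRMATIONS { Trusted } else { Provisional }
            }
            // Contradiction or a reopened linked debt demotes promoted wisdom.
            (Provisional | Trusted, Signal::Contradicted | Signal::LinkedDebtReopened) => Demoted,
            // Everything else, including any signal while Demoted, is a no-op.
            (state, _) => state,
        };
    }
}
```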

The critical design decision: promoted wisdom must cite resolved epistemic debts as evidence. Wisdom grounded in resolved uncertainty is fundamentally different from wisdom grounded in mere repetition. If a supporting debt later reopens, the wisdom can be automatically weakened. This creates a self-correcting knowledge base where abstractions remain accountable to their evidence.
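The citation rule can be expressed as a promotion check plus a confidence function that reacts to reopened debts. `WisdomEntry`, `DebtState`, `may_promote`, and `effective_confidence` are hypothetical, and scaling confidence by the share of cited debts that are still resolved is one assumed weakening rule among many possible ones.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DebtState {
    Open,
    Resolved,
    Reopened,
}

/// Hypothetical wisdom record that cites the debts it rests on.
struct WisdomEntry {
    base_confidence: f64,
    supporting_debts: Vec<u32>, // ids of cited epistemic debts
}

fn is_resolved(debts: &HashMap<u32, DebtState>, id: u32) -> bool {
    debts.get(&id) == Some(&DebtState::Resolved)
}

/// Promotion gate: at least one cited debt, and every cited debt resolved.
fn may_promote(w: &WisdomEntry, debts: &HashMap<u32, DebtState>) -> bool {
    !w.supporting_debts.is_empty() && w.supporting_debts.iter().all(|&id| is_resolved(debts, id))
}

/// Confidence scaled by the share of cited debts that are still resolved
/// (assumed rule); wisdom that cites nothing gets no grounded confidence.
fn effective_confidence(w: &WisdomEntry, debts: &HashMap<u32, DebtState>) -> f64 {
    if w.supporting_debts.is_empty() {
        return 0.0;
    }
    let resolved = w.supporting_debts.iter().filter(|&&id| is_resolved(debts, id)).count();
    w.base_confidence * resolved as f64 / w.supporting_debts.len() as f64
}
```

Deriving confidence from a fixed base, rather than multiplying it down in place, keeps the result idempotent: if a reopened debt is resolved again, the original confidence returns.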

The Scorer Learns

The 18-factor scoring pipeline has always used compile-time weights. The final move introduces a learned overlay: a LearnedScoringModel that sits atop the immutable baseline, adjusting weights through bounded SGD on outcome signals. The learning rate is tiny (0.01), outcomes are age-weighted with a 21-day half-life, and the overlay is persisted as a single WAL snapshot per batch—not as a replay of gradient steps. This preserves determinism while allowing the scorer to adapt to the user’s evolving patterns.

Not all factors are learnable. Safety vetoes, structural validity checks, and status exclusions remain fixed. The system learns preferences, not permissions.
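A sketch of the overlay, assuming a linear score over the 18 factors, squared-error SGD against a scalar outcome target, and a per-factor bound on each learned offset. The learning rate (0.01) and the 21-day half-life come from the text. `LearnedScoringModel` is named in the text, but its fields here, `Outcome`, `MAX_OFFSET`, and the `learnable` mask are assumptions; so is modeling the fixed factors as frozen weights rather than hard filters. The update skips frozen factors and never writes the baseline array.

```rust
const N_FACTORS: usize = 18;
const LEARNING_RATE: f64 = 0.01;  // from the text
const HALF_LIFE_DAYS: f64 = 21.0; // from the text
const MAX_OFFSET: f64 = 0.25;     // assumed bound on each learned offset

/// Weight of an outcome observed `age_days` ago: halves every 21 days.
fn age_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days / HALF_LIFE_DAYS)
}

/// Hypothetical outcome record: factor values for a retrieval plus a target
/// in [0, 1] saying how useful the result turned out to be.
struct Outcome {
    features: [f64; N_FACTORS],
    target: f64,
    age_days: f64,
}

struct LearnedScoringModel {
    baseline: [f64; N_FACTORS],   // compile-time weights; never written here
    overlay: [f64; N_FACTORS],    // learned offsets, bounded by MAX_OFFSET
    learnable: [bool; N_FACTORS], // false for safety, validity, and status factors
}

impl LearnedScoringModel {
    fn score(&self, x: &[f64; N_FACTORS]) -> f64 {
        (0..N_FACTORS).map(|i| (self.baseline[i] + self.overlay[i]) * x[i]).sum()
    }

    /// One pass of age-weighted SGD on 0.5 * (score - target)^2.
    /// The caller persists `overlay` once afterwards as a single snapshot.
    fn train_batch(&mut self, batch: &[Outcome]) {
        for o in batch {
            let err = self.score(&o.features) - o.target;
            let w = age_weight(o.age_days);
            for i in 0..N_FACTORS {
                if !self.learnable[i] {
                    continue; // frozen factor: only the baseline weight applies
                }
                let grad = w * err * o.features[i];
                self.overlay[i] = (self.overlay[i] - LEARNING_RATE * grad).clamp(-MAX_OFFSET, MAX_OFFSET);
            }
        }
    }
}
```

Because `train_batch` touches only `overlay`, persisting that one array per batch is enough to reproduce the scorer exactly, with no need to replay individual gradient steps.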

Connections

The compounding effects are what make this more than the sum of its parts. Moves 1+2 (surprise-driven memory credit and learned source weighting in the integration kernel) create credit assignment: the system knows which memories and sources are trustworthy. Move 3 (automatic debt resolution) resolves uncertainty without human intervention. Move 4 (debt-targeted dreaming) directs curiosity toward resolvable gaps. Move 5 (wisdom promotion) distills patterns into reusable knowledge. Move 6 (the learned scorer) tunes the retrieval engine itself. Together: error→credit→resolution→curiosity→abstraction→calibration. A complete learning loop.

The biological parallel is striking: error-driven plasticity (strengthening and weakening from prediction error), synaptic competition between sources, hippocampal replay for memory consolidation (dreams), cortical abstraction from repeated episodes (wisdom promotion), and neuromodulatory gain control (scorer calibration). The system doesn’t imitate biology; it converges on the same solutions from engineering constraints.

What Lingered

The insight that stuck: the minimal shippable slice is just Moves 1+2, roughly 500–700 lines of Rust. Two small feedback loops that close the gap between observing error and acting on it. Everything else builds on that foundation. The hardest part isn’t the architecture—it’s the discipline to let the hysteresis gates do their work, to resist the temptation of immediate reaction in favor of accumulated evidence. The system must learn slowly to learn well.