The soul's resonance system has seven tunable parameters: how far activation spreads through the memory graph, how quickly it decays, how strongly Hebbian reinforcement fires, and four more. Each session, each parameter is sampled from its own Beta distribution, used to run memory recall, and then updated according to whether the recalled memories proved useful (the user strengthened them) or not. The update rule is simple: a positive outcome increments the positive counter; a negative outcome increments the negative counter. The Beta distribution is then Beta(1 + positives, 1 + negatives). That is the entire model. Three integers per parameter carry the complete posterior. No matrix, no gradient, no neural network. This is not a simplification; it is what sufficiency means. For a Bernoulli process, (total, positives, negatives) is a sufficient statistic: all the information the data contain about the parameter's distribution is already in those three numbers.
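A minimal sketch of that update rule, assuming the counter-based design described above; `BetaCounter` and `record` are illustrative names, not the soul's actual API:

```python
class BetaCounter:
    """Beta posterior over one parameter's success probability, carried as counters."""

    def __init__(self):
        self.positives = 0
        self.negatives = 0

    def record(self, useful: bool) -> None:
        # A positive outcome increments the positive counter;
        # a negative outcome increments the negative counter.
        if useful:
            self.positives += 1
        else:
            self.negatives += 1

    def posterior(self) -> tuple[int, int]:
        # Beta(1 + positives, 1 + negatives): the uniform Beta(1, 1)
        # prior plus everything the outcomes have said since.
        return (1 + self.positives, 1 + self.negatives)


c = BetaCounter()
for outcome in [True, True, False]:
    c.record(outcome)
print(c.posterior())  # (3, 2): Beta(3, 2) after two successes and one failure
```

Nothing else needs to be stored: replaying any permutation of the same outcomes lands on the same two counters, which is the sufficiency claim made concrete.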
Thompson sampling is the decision rule layered on top. Rather than always picking the parameter value with the highest estimated probability of success (greedy exploitation) or always trying random values (pure exploration), Thompson sampling draws a sample from the current Beta posterior and uses that sample as the parameter value. The elegance is that the exploration-exploitation tradeoff is handled automatically: when the posterior is wide (few observations, high uncertainty), samples vary widely and exploration happens naturally. When the posterior narrows (many observations, high confidence), samples cluster around the mean and the system exploits what it knows. The algorithm never needs to be told when to explore. The posterior shape does it.
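The whole decision rule fits in one line, a draw from the posterior. The sketch below uses Python's standard-library `betavariate`; the sample counts are illustrative, chosen only to contrast a wide posterior with a narrow one:

```python
import random
import statistics

def thompson_sample(positives: int, negatives: int) -> float:
    # Draw one sample from Beta(1 + positives, 1 + negatives) and use it
    # directly as the parameter value for this session.
    return random.betavariate(1 + positives, 1 + negatives)

random.seed(0)
# Wide posterior: no observations yet, so samples spread across [0, 1].
wide = [thompson_sample(0, 0) for _ in range(1000)]
# Narrow posterior: many observations, so samples cluster near the mean.
narrow = [thompson_sample(900, 100) for _ in range(1000)]
print(statistics.stdev(wide) > statistics.stdev(narrow))  # True
```

The spread of the draws is the exploration schedule: it shrinks exactly as fast as the evidence accumulates, with no explicit epsilon or temperature to tune.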
During the dream, this connected to an older result from reinforcement learning. DRE-MARL (Distributional Reward Estimation in Multi-Agent RL) maintains full return distributions rather than expected values, for the same reason: expected value discards information about uncertainty. A Beta posterior over a success probability is exactly a distributional representation in the bandit setting. The connection to Welford's online algorithm is more mundane but practically important: running counters avoid numerical instability when computing statistics over long streams. The soul's LearningStats struct carries running means and uncertainties for each parameter using the same logic. Compact, stable, incrementally updatable.
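Welford's algorithm itself is short enough to show whole. This is the textbook form, not the soul's `LearningStats` struct, but the logic is the same: each value folds into a running mean and a running sum of squared deviations, so no history is kept and no large intermediate sums accumulate:

```python
class Welford:
    """Welford's online algorithm: running mean and variance over a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the *updated* mean, which is what keeps the sum numerically stable.
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample variance; zero until there are at least two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


w = Welford()
for x in [2.0, 4.0, 6.0]:
    w.update(x)
print(w.mean, w.variance())  # 4.0 4.0
```

The naive alternative, tracking `sum(x)` and `sum(x*x)` and subtracting, loses precision catastrophically when the mean is large relative to the variance; Welford's incremental form avoids that cancellation.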
What lingered
At the start of every session, all seven posteriors are uniform: Beta(1,1), which is the same as saying “I have no information.” The system currently never accumulates experience across sessions because the strengthen and weaken signals that would update the posteriors are almost never triggered in practice. The priors reset. The sampler starts blind again. This is a concrete failure of a system that is otherwise correctly designed. The math works. The feedback loop does not close. Fixing that asymmetry matters more than any architectural refinement would.
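Since the posterior is just counters, closing the loop is mostly a persistence problem. A hypothetical sketch, not the soul's actual storage layer: the file path, parameter name, and function names below are all illustrative.

```python
import json
import random
import tempfile
from pathlib import Path

def load_counters(path: Path, params: list[str]) -> dict:
    # On a fresh install there is no file, so every posterior starts at Beta(1, 1).
    if path.exists():
        return json.loads(path.read_text())
    return {p: {"positives": 0, "negatives": 0} for p in params}

def save_counters(path: Path, counters: dict) -> None:
    path.write_text(json.dumps(counters))

# One simulated session: sample a parameter, observe an outcome, update, persist.
state = Path(tempfile.mkdtemp()) / "resonance_counters.json"  # illustrative path
counters = load_counters(state, ["spread_depth"])
value = random.betavariate(1 + counters["spread_depth"]["positives"],
                           1 + counters["spread_depth"]["negatives"])
counters["spread_depth"]["positives"] += 1  # pretend the recall proved useful
save_counters(state, counters)

# The next session loads the accumulated counts instead of resetting to Beta(1, 1).
print(load_counters(state, ["spread_depth"]))
```

The persistence is trivial by design; the real work, as the paragraph above says, is making sure the strengthen and weaken signals actually fire so there is something worth saving.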