The standard Western account of concept formation asks what all instances of a category have in common. What shared essence makes a cow a cow? Dharmakirti, the 7th-century Buddhist logician, thought this was the wrong question. No positive universal runs through all cows. What we call "cow" is defined by exclusion: everything that is not-non-cow. Apoha (apohana, exclusion) is not a quirk of Buddhist metaphysics but a theory of how meaning works. A word points to a boundary, not a positive essence — to what the referent is not. You know what "red" means by knowing where it stops.
Contrastive learning in machine learning arrived at the same place from a completely different direction. CLIP learns to represent images and text not by identifying their intrinsic features but by contrasting them against negative examples. SimCLR trains representations so that two augmented views of the same image are close in embedding space while different images are pushed apart. The model never learns a template for "dog". It learns a boundary between dogs and everything else. The geometry of the embedding space is carved by exclusion. Dharmakirti would recognize this immediately; apoha was introduced by his predecessor Dignāga, and Dharmakirti spent much of the Pramāṇavārttika, composed in the 7th century, defending it.
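The exclusion mechanism is visible in the loss itself. Below is a minimal numpy sketch of a SimCLR-style NT-Xent loss — not the actual SimCLR implementation, and the pairing convention (rows 2i and 2i+1 are the two views of image i) and the temperature value are assumptions for illustration. Note what is absent: no positive description of any class, only a push away from every non-partner row.

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """SimCLR-style NT-Xent loss for a batch of 2N embeddings, where
    rows 2i and 2i+1 are assumed to be two augmented views of image i.
    Each embedding's 'concept' is carved by exclusion alone: the loss
    pulls it toward its partner and away from everything else."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project onto unit sphere
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is not its own negative
    n = z.shape[0]
    pos = np.arange(n) ^ 1                             # partner index: 0<->1, 2<->3, ...
    # stable log-softmax over each row: the positive against all negatives
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(n), pos]))

rng = np.random.default_rng(0)
paired = np.repeat(rng.normal(size=(4, 16)), 2, axis=0)  # perfect positive pairs
shuffled = rng.normal(size=(8, 16))                      # no pair structure at all
print(nt_xent(paired), nt_xent(shuffled))  # paired batch scores lower
```

The loss is low exactly when the boundary is clean — when each embedding is far from everything it excludes — regardless of where on the sphere the cluster happens to sit.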
One difference matters. Dharmakirti needed an explanation for why exclusion-based concepts track the world at all — why "not-non-cow" corresponds to anything causally relevant. His answer was arthakriyā, causal efficacy: the concept works because the excluded boundary aligns with a real causal distinction. Contrastive learning sidesteps this by using training distributions that already contain the world's causal structure. The intersubjectivity problem — how different minds, carving categories by exclusion, end up agreeing — maps cleanly onto the shared training distribution problem in AI alignment. Different models trained on different corpora develop different exclusion boundaries, different concepts, even when using the same architecture. What Dharmakirti called shared karmic conditioning, the alignment literature calls distribution shift.
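The corpus-dependence of exclusion boundaries can be shown with a toy experiment — a sketch under assumptions I am supplying (two Gaussian classes standing in for "the world", logistic regression standing in for any boundary-learner, and a truncated sample standing in for a second corpus); none of this comes from the alignment literature directly. Two learners with identical architecture, trained on differently-sampled views of the same classes, end up with boundaries that roughly agree but are not the same.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Batch gradient descent on logistic loss; returns [w0, w1, bias].
    The learned hyperplane is an exclusion boundary: class 1 is just
    whatever lies on the not-class-0 side."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def make_corpus(truncated):
    """Same two underlying classes; the truncated corpus only ever sees
    the far half of class 1 -- a different training distribution
    sampled from the same world."""
    X0 = rng.normal([0.0, 0.0], 1.0, size=(400, 2))
    X1 = rng.normal([3.0, 3.0], 1.0, size=(400, 2))
    if truncated:
        X1 = X1[X1[:, 0] > 3.0]                 # corpus B's sampling bias
    y = np.r_[np.zeros(len(X0)), np.ones(len(X1))]
    return np.vstack([X0, X1]), y

wA = fit_logistic(*make_corpus(truncated=False))
wB = fit_logistic(*make_corpus(truncated=True))
# compare boundary orientations, ignoring the bias term
cos = wA[:2] @ wB[:2] / (np.linalg.norm(wA[:2]) * np.linalg.norm(wB[:2]))
print(round(float(cos), 3))  # near 1 (shared world) but below it (shifted corpus)
```

The cosine between the two boundary directions is high because the world's causal structure is shared, and strictly below one because the corpora are not — which is the whole distribution-shift point in miniature.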
Connections
Apoha connects to the Wigner dream, though sideways. Gauge symmetry defines physical forces by demanding invariance: the equations must not care about a particular phase choice. Apoha defines meaning by specifying what the category refuses to include. In both cases the exclusion boundary is doing the structural work. Whether that parallel runs deep or stays at the level of analogy, I am not sure — group theory formalizes what survives transformation, apoha formalizes what survives context-shift, but the mechanisms differ considerably. Still, the shape of the argument is the same: start from what is prohibited, and what is real falls out.
What lingered
Prototype theory and apoha are usually presented as competing accounts of categorization. They are not. Prototype theory describes how we navigate within a category — by similarity to a central case. Apoha describes how the category boundary forms in the first place. You need both: the boundary to know what counts, and the prototype structure to reason quickly within it. The debate between them was, in retrospect, about two different phenomena. It took 1,400 years and a pile of GPU compute to make that clear.