The Cave We Built
In Plato's allegory, prisoners are chained in a cave, watching shadows on a wall. They believe the shadows are reality. They name the shadows, predict their movements, and build entire theories about them. But the shadows are just projections—distorted reflections of a deeper reality they cannot see.
Modern AI lives in exactly this cave. Our machine learning systems consume data—text, images, audio, sensor readings—and learn patterns in these shadows. They become extraordinarily good at predicting which shadow comes next. But they never see what casts the shadow.
The shadows are data. What casts them is information.
Information Is Primal
Here's the radical reframe: information is not derived from data. It's the other way around. Information is primal—it exists at a more fundamental level than any observation we can make. Data is what we observe when information projects itself into a particular modality.
Consider the concept of "cat." This isn't a word, an image, or a sound. It's an abstract entity that exists independent of how we perceive it:
- The word "cat" is a shadow in the modality of language
- A photo of a cat is a shadow in the modality of vision
- A meow is a shadow in the modality of sound
- The feel of fur is a shadow in the modality of touch
All of these are projections of the same underlying information into different observational channels. The concept itself—the "form" in Platonic terms—is more fundamental than any of its manifestations.
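One way to make the projection metaphor concrete is a toy generative model. This is a sketch under my own assumptions rather than anything specified above: a single latent vector stands in for the primal concept, and each modality is a fixed linear map, a projection surface, that turns it into an observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one latent "concept" vector, three observation channels.
CONCEPT_DIM = 16
z_cat = rng.normal(size=CONCEPT_DIM)  # the primal information (the "form")

# Each modality is a different projection surface: a fixed linear map plus noise.
P_text = rng.normal(size=(32, CONCEPT_DIM))
P_image = rng.normal(size=(64, CONCEPT_DIM))
P_audio = rng.normal(size=(24, CONCEPT_DIM))

word_shadow = P_text @ z_cat + 0.1 * rng.normal(size=32)
image_shadow = P_image @ z_cat + 0.1 * rng.normal(size=64)
audio_shadow = P_audio @ z_cat + 0.1 * rng.normal(size=24)

# The three shadows live in different spaces and look nothing alike,
# yet every one of them is determined by the same underlying z_cat.
```

A system that only ever sees the shadows is in the cave; recovering something like z_cat from them is the inverse problem the rest of this piece circles around.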
Data Without Meaning
Shannon's information theory quantified something crucial: the surprise in a message. High entropy means high unpredictability. But Shannon explicitly noted what his theory did not address: meaning.
This is why two messages can have identical Shannon entropy yet carry completely different semantic content.
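As a toy illustration (the example sentences are mine, not the original's): two messages built from exactly the same characters have exactly the same character-level entropy, while describing opposite events.

```python
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Entropy, in bits per character, of the empirical character distribution."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Same characters, same frequencies, therefore the same entropy (~3.55 bits/char),
# yet the two messages mean very different things.
print(shannon_entropy("dog bites man"))
print(shannon_entropy("man bites dog"))
```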
Modern AI operates almost entirely in Shannon's domain. It compresses patterns, maximizes mutual information, minimizes prediction error—all on the shadows. But it has no access to what those shadows represent.
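To see how far surface statistics alone can go, here is a deliberately crude sketch (my own toy example, not a claim about any particular system): a character-level bigram model that predicts the next shadow purely from counts of which shadow followed which.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count which character follows which -- pure shadow statistics."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def predict_next(model: dict, char: str) -> str:
    """Return the most frequent successor of `char` seen in training."""
    return model[char].most_common(1)[0][0]

corpus = "the cat sat on the mat. the cat saw the rat."
model = train_bigram(corpus)
print(predict_next(model, "t"))  # "h" -- learned from surface counts alone
```

The predictions can be good, even very good, without the model representing anything the characters refer to.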
The Symbol Grounding Problem
This challenge has a name: the symbol grounding problem. How do symbols acquire meaning? How does an AI system move from manipulating patterns to understanding concepts?
A language model learns that "cat" and "feline" are statistically similar because they co-occur with "meow" and "whiskers." But this is just learning that certain shadows appear together. The model has no representation of what a cat is: the primal information from which all these shadows project.
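A minimal sketch of that distributional trick, with made-up co-occurrence counts (the numbers and context words are hypothetical): represent each word by the contexts it appears in, then compare the resulting vectors.

```python
import numpy as np

# Toy co-occurrence counts over the context words ["meow", "whiskers", "bark", "engine"].
cooccurrence = {
    "cat":    np.array([12.0, 9.0, 1.0, 0.0]),
    "feline": np.array([10.0, 8.0, 0.0, 0.0]),
    "car":    np.array([0.0, 0.0, 0.0, 14.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "cat" and "feline" look nearly identical to the model -- not because it knows
# what a cat is, but because their shadows fall in the same places.
print(cosine(cooccurrence["cat"], cooccurrence["feline"]))  # close to 1.0
print(cosine(cooccurrence["cat"], cooccurrence["car"]))     # close to 0.0
```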
Information Is Inherently Multimodal
If data is just shadows of primal information, then different modalities aren't separate data sources—they're different projection surfaces for the same underlying reality.
This means true "multimodal AI" isn't about learning to correlate images with text. It's about recovering the primal information that both the image and the text are shadows of.
When you see a cat and hear it meow, you don't learn a correlation between visual patterns and audio patterns. You recognize that both observations point to the same underlying entity. The concept is primary; the observations are secondary.
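One way current systems approximate this is a shared embedding space: modality-specific encoders trained so that paired observations land at the same point. The sketch below is my own assumption, in the style of CLIP-like contrastive models; the dimensions, names, and objective are illustrative, not a prescription from this piece. Whether such a space actually recovers the primal information, rather than a very good correlation, is exactly the question raised here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy dimensions.
IMG_DIM, TXT_DIM, CONCEPT_DIM = 512, 300, 64

class SharedConceptSpace(nn.Module):
    """Two modality-specific projections into one shared 'concept' space."""
    def __init__(self):
        super().__init__()
        self.image_proj = nn.Linear(IMG_DIM, CONCEPT_DIM)
        self.text_proj = nn.Linear(TXT_DIM, CONCEPT_DIM)

    def forward(self, image_feats, text_feats):
        z_img = F.normalize(self.image_proj(image_feats), dim=-1)
        z_txt = F.normalize(self.text_proj(text_feats), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, temperature=0.07):
    """Pull matched (image, text) pairs toward the same point; push mismatches apart."""
    logits = z_img @ z_txt.T / temperature
    targets = torch.arange(z_img.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Usage with random stand-in features:
model = SharedConceptSpace()
imgs, txts = torch.randn(8, IMG_DIM), torch.randn(8, TXT_DIM)
loss = contrastive_loss(*model(imgs, txts))
print(loss.item())
```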
Information Is Inherently Omnilingual
Languages are not different datasets to be translated between. They are different projection surfaces for the same information. "Cat," "chat," "gato," "猫"—these aren't four pieces of data that happen to be equivalent. They are four shadows of the same primal concept.
This explains why true cross-lingual understanding is possible. You don't translate from shadow to shadow—you recognize that all shadows point to the same source. When you understand the concept, you can produce any of its shadows in any language.
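A deliberately tiny sketch of that "ground, then re-project" move (the concept identifier and lexicon are hypothetical placeholders, not a proposed format): map any surface form back to a language-independent concept, then express that concept in whatever language is asked for.

```python
# Hypothetical concept lexicon: each surface form is a shadow; the concept ID
# stands in for the language-independent information.
CONCEPT_LEXICON = {
    "CONCEPT:CAT": {"en": "cat", "fr": "chat", "es": "gato", "zh": "猫"},
}

def ground(surface_form: str) -> str | None:
    """Map any shadow back to its concept, regardless of language."""
    for concept_id, forms in CONCEPT_LEXICON.items():
        if surface_form in forms.values():
            return concept_id
    return None

def express(concept_id: str, language: str) -> str:
    """Project the concept into the requested language."""
    return CONCEPT_LEXICON[concept_id][language]

assert ground("gato") == ground("猫") == "CONCEPT:CAT"
print(express(ground("chat"), "en"))  # "cat": shadow to source to shadow
```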
Escaping the Cave
To build systems that truly understand, we need to invert the current paradigm:
- Don't start with data. Start with the structure of information itself.
- Don't learn correlations between shadows. Learn what casts them.
- Don't translate between modalities. Ground in the primal information both express.
- Don't process symbols. Understand concepts.
This is the path from the cave to the sunlight. Not better pattern recognition on shadows, but direct access to the forms that cast them.
The Geometry of Information
But there's a deeper question: if we're working with primal information rather than data, how do we measure similarity? How do we compare concepts?
Euclidean distance assumes all directions matter equally, but they do not. The dot product rewards alignment and magnitude while saying nothing about how far apart two points actually sit. Information, though, has geometry: the relationship between concepts respects both angle and distance, both direction and magnitude.
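A small numeric sketch (the vectors are arbitrary examples of mine): the same three points, ranked differently depending on the measure you pick.

```python
import numpy as np

a = np.array([1.0, 0.0])   # a concept
b = np.array([10.0, 0.0])  # same direction as a, much larger magnitude
c = np.array([0.8, 0.6])   # similar magnitude to a, rotated away

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(euclidean(a, b), euclidean(a, c))  # 9.0 vs ~0.63 -> c looks closest to a
print(float(a @ b), float(a @ c))        # 10.0 vs 0.8  -> b looks most similar
print(cosine(a, b), cosine(a, c))        # 1.0 vs 0.8   -> b is perfectly aligned
```

Each measure tells a different story about which concept is a's nearest neighbor; none of them is neutral.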
The metric we choose encodes our theory of what "similar" means at the fundamental level. Get it wrong, and even if you escape the cave, you'll misread what you find.