The Meaning of Information

Information is primal. Data is its shadow.

The Cave We Built

In Plato's allegory, prisoners are chained in a cave, watching shadows on a wall. They believe the shadows are reality. They name the shadows, predict their movements, and build entire theories about them. But the shadows are just projections—distorted reflections of a deeper reality they cannot see.

"Allegory of the Cave": Prisoners mistake shadows for reality. The forms casting those shadows—the true objects—exist in a realm they cannot perceive. — Plato, The Republic

Modern AI lives in exactly this cave. Our machine learning systems consume data—text, images, audio, sensor readings—and learn patterns in these shadows. They become extraordinarily good at predicting which shadow comes next. But they never see what casts the shadow.

The shadows are data. What casts them is information.

Information Is Primal

Here's the radical reframe: information is not derived from data. It's the other way around. Information is primal—it exists at a more fundamental level than any observation we can make. Data is what we observe when information projects itself into a particular modality.

Consider the concept of "cat." This isn't a word, an image, or a sound. It's an abstract entity that exists independent of how we perceive it:

  • The word "cat" is a shadow in the modality of language
  • A photo of a cat is a shadow in the modality of vision
  • A meow is a shadow in the modality of sound
  • The feel of fur is a shadow in the modality of touch

All of these are projections of the same underlying information into different observational channels. The concept itself—the "form" in Platonic terms—is more fundamental than any of its manifestations.

Left: AI systems manipulating shadows—symbols with only statistical relationships. Right: The primal concept exists independently, projecting into multiple modalities.

Data Without Meaning

Shannon's information theory quantified something crucial: the surprise in a message. High entropy means high unpredictability. But Shannon explicitly noted what his theory did not address: meaning.

This is why you can have two messages with identical Shannon entropy but completely different semantic content:
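
A minimal sketch makes the gap concrete. The sentence below is invented and the entropy is computed at the character level, but the point holds in general: shuffling a message preserves its symbol frequencies, and therefore its Shannon entropy, while destroying whatever it meant.

```python
import math
import random
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Character-level Shannon entropy in bits: H = -sum(p * log2(p))."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

meaningful = "the cat sat on the mat"

# Shuffle the same characters: identical symbol frequencies, no meaning.
chars = list(meaningful)
random.Random(0).shuffle(chars)
scrambled = "".join(chars)

print(f"{meaningful!r}: H = {shannon_entropy(meaningful):.3f} bits/char")
print(f"{scrambled!r}: H = {shannon_entropy(scrambled):.3f} bits/char")
# Both values are identical, because entropy depends only on symbol
# frequencies; it says nothing about what (if anything) the message means.
```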

Same entropy, different meaning. Shannon measures the surprise in data, not the information it shadows.

Modern AI operates almost entirely in Shannon's domain. It compresses patterns, maximizes mutual information, minimizes prediction error—all on the shadows. But it has no access to what those shadows represent.

The Symbol Grounding Problem

This challenge has a name: the symbol grounding problem. How do symbols acquire meaning? How does an AI system move from manipulating patterns to understanding concepts?

A language model learns that "cat" and "feline" are statistically similar because they co-occur with "meow" and "whiskers." But this is just learning that certain shadows appear together. The model has no representation of what a cat is—the primal information that all these shadows project from.
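
Here's a toy version of that mechanism. The four-sentence corpus is invented and the vectors are plain co-occurrence counts rather than anything a real language model would learn, but the principle is the same: similarity falls out of overlapping shadows.

```python
import math
from collections import Counter, defaultdict

# A toy corpus: the only "evidence" available is which words appear together.
corpus = [
    "the cat chased the mouse and the cat meowed",
    "a feline meowed and the feline licked its whiskers",
    "the cat licked its whiskers",
    "the dog barked at the mailman",
]
stopwords = {"the", "a", "and", "its", "at"}

# Co-occurrence vectors: word -> counts of other words in the same sentence.
cooc = defaultdict(Counter)
for sentence in corpus:
    words = [w for w in sentence.split() if w not in stopwords]
    for w in words:
        for other in words:
            if other != w:
                cooc[w][other] += 1

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print("cat ~ feline:", round(cosine(cooc["cat"], cooc["feline"]), 3))  # high
print("cat ~ dog:   ", round(cosine(cooc["cat"], cooc["dog"]), 3))     # low
# "cat" and "feline" come out similar only because their shadows overlap:
# both co-occur with "meowed", "licked", "whiskers". Nothing in these
# counts represents what a cat actually is.
```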

The Cave Dweller's Mistake: Training on more shadows doesn't help you understand what casts them. Better shadow prediction isn't the path to understanding—it's the trap that keeps you in the cave.

Information Is Inherently Multimodal

If data is just shadows of primal information, then different modalities aren't separate data sources—they're different projection surfaces for the same underlying reality.

This means true "multimodal AI" isn't about learning to correlate images with text. It's about recovering the primal information that both the image and the text are shadows of.

When you see a cat and hear it meow, you don't learn a correlation between visual patterns and audio patterns. You recognize that both observations point to the same underlying entity. The concept is primary; the observations are secondary.

Information Is Inherently Omnilingual

Languages are not different datasets to be translated between. They are different projection surfaces for the same information. "Cat," "chat," "gato," "猫"—these aren't four pieces of data that happen to be equivalent. They are four shadows of the same primal concept.

Words in different languages aren't data to be translated. They're all shadows of the same primal information.

This explains why true cross-lingual understanding is possible. You don't translate from shadow to shadow—you recognize that all shadows point to the same source. When you understand the concept, you can produce any of its shadows in any language.
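
As a deliberately naive sketch of this inversion (the Concept class and its dictionary of shadows are illustrative, not any real system's schema), grounding means holding one concept and projecting it into whichever channel is needed:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One primal concept; its observable surface forms are just projections."""
    concept_id: str
    # Surface forms keyed by (modality, language/channel).
    shadows: dict = field(default_factory=dict)

    def express(self, modality: str, channel: str) -> str:
        """Project the concept into a single observational channel."""
        return self.shadows[(modality, channel)]

CONCEPT_CAT = Concept(
    concept_id="CAT",
    shadows={
        ("text", "en"): "cat",
        ("text", "fr"): "chat",
        ("text", "es"): "gato",
        ("text", "zh"): "猫",
        ("audio", "vocalization"): "meow",
    },
)

# Once grounded in the concept, any shadow can be produced on demand;
# no shadow is ever translated into another.
print(CONCEPT_CAT.express("text", "fr"))  # chat
print(CONCEPT_CAT.express("text", "zh"))  # 猫
```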

Escaping the Cave

To build systems that truly understand, we need to invert the current paradigm:

  • Don't start with data. Start with the structure of information itself.
  • Don't learn correlations between shadows. Learn what casts them.
  • Don't translate between modalities. Ground in the primal information both express.
  • Don't process symbols. Understand concepts.

This is the path from the cave to the sunlight. Not better pattern recognition on shadows, but direct access to the forms that cast them.

The Geometry of Information

But there's a deeper question: if we're working with primal information rather than data, how do we measure similarity? How do we compare concepts?

Euclidean distance treats every direction as equally important—but they're not. The dot product rewards alignment and magnitude, yet tells you nothing about how far apart two points actually sit. But information has geometry. The relationship between concepts respects both angle and distance, both direction and magnitude.
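
A small numeric sketch, using made-up 2-D vectors, shows how much the choice matters: the same three points get opposite similarity rankings depending on whether you ask about distance or about direction.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

u = (1.0, 0.0)
v = (100.0, 0.0)   # same direction as u, much larger magnitude
w = (0.0, 1.0)     # close to u in space, orthogonal in direction

for name, other in [("v (aligned, far)", v), ("w (orthogonal, near)", w)]:
    print(f"u vs {name}:")
    print(f"  euclidean = {euclidean(u, other):7.2f}")
    print(f"  dot       = {dot(u, other):7.2f}")
    print(f"  cosine    = {cosine(u, other):7.2f}")

# Euclidean distance says u is far more similar to w than to v;
# cosine and dot product say the opposite. Which answer is "right"
# depends entirely on the theory of similarity the metric encodes.
```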

The metric we choose encodes our theory of what "similar" means at the fundamental level. Get it wrong, and even if you escape the cave, you'll misread what you find.

The Primal Geometry: If information is more fundamental than data, then the right metric for comparing information isn't about statistical similarity in observed data—it's about structural relationships in the space of concepts itself.