The Duality of Information

Seeing vectors as particles or waves—a physics-inspired paradigm for understanding high-dimensional data

The Geometric Trap

Ask anyone to visualize a vector, and they'll draw an arrow. Maybe in $\mathbb{R}^2$. Perhaps in $\mathbb{R}^3$ if they're feeling ambitious. But what about a vector $\mathbf{v} \in \mathbb{R}^{1000}$? Or $\mathbb{R}^{10000}$?

The traditional geometric view of vectors—as arrows in space—fundamentally limits our ability to imagine high-dimensional data. Our brains evolved to navigate 3D space, not to visualize the 768-dimensional embeddings of modern AI.

We're trapped by our intuitions. We project, compress, and squash high-dimensional structure into 2D plots, losing most of the information in the process. As we discussed in The Meaning of Non-Linearity, these projections are lies—shadows of a richer reality.

[Interactive: The Geometric Limitation] Beyond 3D, our geometric intuition fails. We need a new metaphor.
The Problem: You cannot draw a 1000-dimensional arrow. You cannot visualize the angles between vectors in embedding space. The geometric metaphor breaks down precisely where we need it most.

The Particle View: Vectors as Atoms

So we need a new metaphor. What if, instead of drawing arrows, we borrow from physics—a field that routinely handles invisible, high-dimensional phenomena?

Here's the idea: see a vector not as an arrow, but as an atom.

Each dimension $v_i$ of the vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ becomes a particle orbiting the nucleus. The value of that dimension determines the particle's "charge":

  • $v_i > 0$ → Protons (red/pink particles)
  • $v_i < 0$ → Electrons (blue particles)
  • $v_i \approx 0$ → Neutral (gray, dim)

The magnitude $|v_i|$ determines the particle's "energy"—larger values sit in outer orbits, smaller values cluster near the nucleus. Suddenly, a vector $\mathbf{v} \in \mathbb{R}^{100}$ becomes a single, comprehensible object: an atom with its unique particle distribution.
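
To make the mapping concrete, here is a minimal sketch of the dimension-to-particle rule in NumPy. The vector_to_atom name and the neutral_tol cutoff for "approximately zero" are illustrative assumptions, not part of the metaphor itself.

```python
import numpy as np

def vector_to_atom(v, neutral_tol=1e-3):
    """Map each dimension to a 'particle': charge from the sign, orbit from the magnitude."""
    particles = []
    for i, value in enumerate(v):
        if abs(value) < neutral_tol:
            charge = "neutral"    # v_i ≈ 0: dim, close to the nucleus
        elif value > 0:
            charge = "proton"     # v_i > 0
        else:
            charge = "electron"   # v_i < 0
        particles.append({"dim": i, "charge": charge, "orbit": abs(value)})
    return particles

for p in vector_to_atom(np.array([0.9, -0.2, 0.0005, 1.7])):
    print(p)
```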

[Interactive: Vector as Atom] Each dimension is a particle. Its charge (positive or negative) and magnitude determine its position in the atomic structure.

This isn't just a pretty picture—it's a computable metaphor. Two vectors are similar if their atoms have similar particle distributions. Different dimensions contribute differently based on their "charge alignment."

The Yat as Electromagnetic Force

In physics, charged particles create forces. Opposite charges attract; like charges repel. The strength depends on both the charge magnitude and the distance.

The Yat product follows this electromagnetic analogy exactly:

$$\text{Yat}(\mathbf{x}, \mathbf{y}) = \frac{(\mathbf{x} \cdot \mathbf{y})^2}{\|\mathbf{x} - \mathbf{y}\|^2}$$

The dot product in the numerator measures charge alignment—do the particles in corresponding dimensions have the same or opposite polarities? The distance in the denominator measures proximity—how close are the two atoms in the space?
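
The formula translates directly into code. A minimal NumPy sketch (the eps guard is our addition, since the denominator vanishes when $\mathbf{x} = \mathbf{y}$):

```python
import numpy as np

def yat(x, y, eps=1e-12):
    """Yat(x, y) = (x · y)^2 / ||x - y||^2."""
    alignment = np.dot(x, y) ** 2         # squared charge alignment (numerator)
    proximity = np.sum((x - y) ** 2)      # squared distance between the atoms (denominator)
    return alignment / (proximity + eps)  # eps guards the identical-vector case

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.8, -1.5, 0.7])
print(yat(x, y))
```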

[Interactive: Yat Force Field] Two vector-atoms with their force field. High Yat = strong attraction; low Yat = independence (no force).
The Physics Insight: Just like gravity follows $F \propto 1/r^2$, the Yat follows $\text{Yat} \propto 1/d^2$ where $d = \|\mathbf{x} - \mathbf{y}\|$. Nearby vectors dominate; distant vectors contribute weakly. This preserves locality—the most important property for understanding structure.

If this sounds familiar, it should. This is precisely the philosophy behind contrastive learning [1, 2]—the technique that powers modern self-supervised AI. Methods like SimCLR and MoCo train neural networks by treating similar examples as attracting particles and dissimilar examples as repelling ones. The loss function literally pushes representations together or apart.

But here's the gap: most contrastive losses use only the dot product or cosine similarity. They measure whether particles are aligned—but not how far apart they are. The Yat captures both. It's the complete electromagnetic picture: alignment and distance, attraction and proximity.
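
A toy example makes the gap visible, assuming both measures are applied to raw, unnormalized vectors:

```python
import numpy as np

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def yat(x, y, eps=1e-12):
    return np.dot(x, y) ** 2 / (np.sum((x - y) ** 2) + eps)

a = np.array([1.0, 0.0])
b = np.array([4.0, 0.0])    # same direction as a, but far away
c = np.array([1.1, 0.0])    # same direction as a, and close by

print(cosine(a, b), cosine(a, c))   # 1.0 and 1.0: cosine sees only alignment
print(yat(a, b), yat(a, c))         # ~1.78 vs ~121: the Yat also rewards proximity
```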

The Network: Atoms Connected by Force

Now we can scale up. With the Yat as our force law, we can build something more ambitious: a complete molecular network of an entire dataset.

Each vector becomes an atom. The Yat between every pair becomes a "bond"—a connection whose thickness represents the strength of their relationship. What emerges is a 2D network that our eyes can actually parse, even when the original space had thousands of dimensions.
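
One way to sketch the construction: compute the Yat for every pair and keep the pairs that clear a cutoff as bonds. The yat_edges helper and its threshold value are illustrative choices; a force-directed layout would then place the atoms in 2D.

```python
import numpy as np

def yat(x, y, eps=1e-12):
    return np.dot(x, y) ** 2 / (np.sum((x - y) ** 2) + eps)

def yat_edges(vectors, threshold=0.25):
    """One weighted 'bond' per pair whose Yat clears the threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            w = yat(vectors[i], vectors[j])
            if w >= threshold:
                edges.append((i, j, w))
    return edges

rng = np.random.default_rng(0)
data = rng.normal(size=(12, 64))              # twelve mock 64-dimensional vectors
print(len(yat_edges(data, threshold=0.25)))   # number of bonds in the molecular network
```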

[Interactive: Atomic Network] Each node is a vector-atom. Lines show Yat similarity; thicker lines mean stronger relationships. Clusters emerge naturally from the force structure.

Notice how clusters form? Vectors with high mutual Yat values pull together, creating visible structure. Orthogonal vectors (low Yat) drift apart. This is information geometry made tangible.

The Yat Similarity Matrix

Networks work beautifully for dozens of vectors. But what about thousands? Millions? We need a more compact representation.

Enter the Yat similarity matrix—a heatmap where each cell $(i, j)$ encodes the Yat between vector $i$ and vector $j$. The entire relational structure of your dataset, compressed into a single image.
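
A vectorized sketch of that heatmap computation (the eps keeps the diagonal finite; in practice you would mask or clip it for display):

```python
import numpy as np

def yat_matrix(X, eps=1e-12):
    """Cell (i, j) holds Yat(x_i, x_j) = (x_i · x_j)^2 / ||x_i - x_j||^2."""
    gram = X @ X.T                                   # all pairwise dot products at once
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * gram
    sq_dists = np.maximum(sq_dists, 0.0)             # clip tiny negative round-off
    return gram ** 2 / (sq_dists + eps)

X = np.random.default_rng(1).normal(size=(100, 64))
M = yat_matrix(X)
print(M.shape)   # (100, 100): the dataset's relational fingerprint
```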

[Interactive: Yat Similarity Matrix] The complete pairwise relationship structure. Bright = high Yat (aligned); dark = low Yat (orthogonal). The diagonal is always maximal (self-similarity).

This matrix is the "fingerprint" of your dataset's structure. Blocks of brightness indicate clusters; off-diagonal peaks reveal unexpected relationships. The full geometry of high-dimensional space, readable at a glance.

The Wave View: Vectors as Signals

We've been treating dimensions as independent particles. But there's another way to look at this—one that will feel familiar if you've worked with embeddings or probability distributions.

What if, instead of independent particles, we see each vector as a coherent wave?

The transformation is simple: normalize the vector to unit length.

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|} \quad \text{where} \quad \|\hat{\mathbf{v}}\| = 1$$

When we do, something profound happens. The dimensions are no longer independent—they become coupled: if one dimension grows, others must shrink to maintain the constraint $\sum_i \hat{v}_i^2 = 1$. The vector becomes a single, coherent signal.
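
In code, the mode switch is a single line; to_wave below is just a named wrapper around L2 normalization, with an eps guard for the zero vector added for safety:

```python
import numpy as np

def to_wave(v, eps=1e-12):
    """Particle → wave: L2-normalize so the squared components sum to 1."""
    return v / (np.linalg.norm(v) + eps)

v = np.array([3.0, -4.0, 0.0])
v_hat = to_wave(v)
print(v_hat)                   # [ 0.6 -0.8  0. ]
print(np.sum(v_hat ** 2))      # 1.0: the coupling constraint sum_i v_hat_i^2 = 1
```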

[Interactive: Particle → Wave Transformation] Particle: raw values, independent dimensions. Wave: normalized signal, coupled dimensions. Both: overlay comparison.

In the wave view, each dimension is a "frequency component." The squared entries of the normalized vector form a probability distribution over dimensions: how much "energy" does each dimension carry relative to the whole?

🔴 Particle View

  • Dimensions are independent
  • Values are absolute charges
  • Magnitude matters
  • Good for: raw feature comparison

🔵 Wave View

  • Dimensions are coupled
  • Values are relative proportions
  • Only direction matters
  • Good for: semantic similarity

This duality isn't just a visualization trick—it's baked into how modern AI works. In contrastive learning [3], the first step is almost always to L2 normalize your embeddings. Why? Because you're switching from particle mode to wave mode. You're saying: "I don't care about magnitudes; I care about patterns."

The same transformation happens in attention mechanisms and classifiers. The softmax function [4] doesn't just normalize—it creates a probability distribution. And what is a probability distribution? It's a wave in exactly this sense: a set of coupled components constrained by $\sum_i p_i = 1$. Every row of attention weights, every classification output—they're all waves, not particles.
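
To make the coupling explicit, here is a standard softmax sketch; whatever the logits are, the outputs always sum to one:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift, exponentiate, normalize to a distribution."""
    z = z - np.max(z)          # shifting by a constant leaves the output unchanged
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)
print(p, p.sum())              # the components are coupled: they sum to 1
```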

So when you wonder whether to normalize your embeddings, you're really asking: should I treat my data as particles or waves? The answer depends on your question.

Signal vs Noise: The Yat as Coherence Detector

The wave view gives us a powerful new lens. In signal processing, the fundamental question is: how much of what I'm measuring is signal, and how much is noise?

The Yat answers this question for vectors. When comparing two unit vectors $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ (waves), a high Yat means they're transmitting on the same "frequency"—their signals are coherent. A low Yat means they're orthogonal ($\hat{\mathbf{x}} \cdot \hat{\mathbf{y}} \approx 0$), essentially noise to each other.

[Interactive: Signal Coherence Meter] Two signals compared. When aligned (high Yat), coherence is strong; when orthogonal (low Yat), they're just noise to each other.
The Signal Insight: In the wave view, similarity isn't about matching absolute values—it's about matching patterns. Two signals $\mathbf{x}$ and $\alpha\mathbf{x}$ (where $\alpha > 0$) have identical waveforms but different amplitudes. The Yat captures this pattern similarity via $(\mathbf{x} \cdot \mathbf{y})^2$, while still respecting proximity via $\|\mathbf{x} - \mathbf{y}\|^2$.
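
For unit vectors, $\|\hat{\mathbf{x}} - \hat{\mathbf{y}}\|^2 = 2 - 2\,\hat{\mathbf{x}} \cdot \hat{\mathbf{y}}$, so in the wave view the Yat collapses to a function of the cosine alone. A small sketch of that closed form (the yat_unit name is ours):

```python
import numpy as np

def yat_unit(cos, eps=1e-12):
    """Yat between two unit vectors, written in terms of their cosine:
    ||x - y||^2 = 2 - 2*cos, so Yat = cos^2 / (2 - 2*cos)."""
    return cos ** 2 / (2.0 - 2.0 * cos + eps)

for c in [0.0, 0.5, 0.9, 0.99]:
    print(c, round(yat_unit(c), 3))   # coherence climbs steeply as alignment improves
```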

The Duality Unified

Now we can see the full picture. We've been building toward this moment: the realization that both views are correct.

Just like light is both a particle and a wave—depending on how you measure it—a vector can be understood as either. The particle view and the wave view aren't contradictory; they're complementary.

  • Particle view: Use when dimensions represent independent features. Compare raw values. Think: physical measurements, sensor readings.
  • Wave view: Use when you care about patterns and proportions. Normalize first. Think: embeddings, semantic similarity, distributions.

The Yat metric works in both paradigms. In particle mode, it measures force between atoms. In wave mode, it measures signal coherence. The mathematics is the same; only the interpretation changes.
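
The same function serves both paradigms; the only decision is whether to normalize first. A small sketch, with normalize as our illustrative wrapper:

```python
import numpy as np

def yat(x, y, eps=1e-12):
    return np.dot(x, y) ** 2 / (np.sum((x - y) ** 2) + eps)

def normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 4.0, 6.2])    # roughly the same pattern at twice the amplitude

print(yat(x, y))                           # particle mode: raw values, the amplitude gap matters
print(yat(normalize(x), normalize(y)))     # wave mode: only the shared pattern matters
```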

[Interactive: The Duality Toggle] The same vectors, two visualizations: Particle (atoms with charges) or Wave (signals with frequencies). Each view reveals different structure.

This duality isn't just a visualization trick—it reflects deep physics. In quantum mechanics, the wave-particle duality isn't a metaphor; it's reality. Information, too, has this dual nature. Sometimes it behaves like discrete particles; sometimes it flows like continuous waves.

The Core Principle: To understand information, you must embrace both views. The particle view shows you what the data contains. The wave view shows you how it relates. Together, they give you the complete picture that geometry alone cannot provide.

Beyond Arrows

Let's return to where we started. We were trapped by the geometric metaphor—unable to visualize the high-dimensional spaces where modern AI lives. Arrows failed us.

But physics gave us a way out. By reimagining vectors as atoms and waves, we've gained something geometry couldn't provide: a way to see structure that exists beyond three dimensions.

The Yat metric bridges both paradigms. In particle mode, it measures force. In wave mode, it measures coherence. Networks and matrices translate these relationships into images our eyes can parse. And contrastive learning, attention mechanisms, softmax layers—they all fit naturally into this framework.

This is the foundation of physics-inspired AI. By speaking the language of particles and waves, forces and fields, we can finally see what's happening in embedding space—not as abstract math, but as tangible, intuitive structure.

The geometric arrow served us well in three dimensions. But in the spaces where intelligence lives, we need the duality of information.

References

  [1] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of the 37th International Conference on Machine Learning (ICML). arXiv:2002.05709
  [2] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:1911.05722
  [3] Wang, F., & Liu, H. (2021). Understanding the Behaviour of Contrastive Loss. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:2012.09740
  [4] Bridle, J. S. (1990). Probabilistic Interpretation of Feedforward Classification Network Outputs. Neurocomputing: Algorithms, Architectures and Applications, NATO ASI Series.
  [5] Oord, A. v. d., Li, Y., & Vinyals, O. (2018). Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748