What universal laws govern the eigenvalues of random matrices — and what can they reveal about the signal hidden in the noise of the stock market?
Fill a matrix with random numbers. Compute its eigenvalues. You might expect the result to be featureless — random input producing random output. But this is not what happens. The eigenvalues of random matrices follow universal laws of extraordinary precision, laws that do not depend on the distribution of the entries, the size of the matrix, or anything else you might think would matter.
These laws were first discovered by nuclear physicists trying to understand the energy levels of heavy atoms. They have since appeared in number theory, quantum chaos, wireless communications, ecology, and — most surprisingly — in the noise of financial markets. This article is about what those laws are, why they emerge, and how we can use them to separate signal from noise in real stock market data.
In the 1950s, Eugene Wigner was studying the energy levels of heavy nuclei. The Hamiltonian of a uranium atom has thousands of interacting parts — far too complex to solve exactly. Wigner's radical idea: model the Hamiltonian as a random symmetric matrix and hope that the statistical properties of its eigenvalues capture something universal about quantum systems.
Take an N×N symmetric matrix M where each entry above the diagonal is drawn independently from a standard normal distribution (and M_ij = M_ji). This is the Gaussian Orthogonal Ensemble (GOE). As N grows, the distribution of eigenvalues (scaled by 1/√N) converges to a beautiful, parameter-free shape: Wigner's semicircle.
Try it yourself. Use the slider to change N and watch the histogram converge to the semicircle:
The convergence is astonishing. At N=10, the histogram is noisy but roughly semicircular. At N=500, it's nearly indistinguishable from the theoretical curve. And here's the deepest fact: the semicircle law holds regardless of the distribution of the matrix entries — Gaussian, uniform, Bernoulli, anything with finite variance. This is universality, one of the great themes of random matrix theory.
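If you'd like to reproduce the experiment outside the interactive demo, here is a minimal numpy sketch of the same construction (the matrix size, seed, and bin count are arbitrary choices of mine, not values taken from the demo):

```python
import numpy as np
import matplotlib.pyplot as plt

def goe_eigenvalues(n, rng):
    """Sample an n x n GOE matrix and return its eigenvalues, scaled by 1/sqrt(n)."""
    a = rng.standard_normal((n, n))
    m = (a + a.T) / np.sqrt(2)          # symmetrize: off-diagonal entries are N(0, 1)
    return np.linalg.eigvalsh(m) / np.sqrt(n)

rng = np.random.default_rng(0)
n = 500
eigs = goe_eigenvalues(n, rng)

# Wigner semicircle density, supported on [-2, 2] with this scaling
x = np.linspace(-2, 2, 400)
semicircle = np.sqrt(4 - x**2) / (2 * np.pi)

plt.hist(eigs, bins=40, density=True, alpha=0.6, label=f"GOE eigenvalues, N={n}")
plt.plot(x, semicircle, "k-", label="Wigner semicircle")
plt.legend()
plt.show()
```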
The semicircle tells you where the eigenvalues live. But it says nothing about how they arrange themselves at the finest scale. This is where random matrices become truly strange.
If you scattered N points randomly on a line (a Poisson process), the gaps between consecutive points would follow an exponential distribution — many tiny gaps, some large ones. But the eigenvalues of a GOE matrix behave very differently. They repel each other: small gaps are strongly suppressed. The probability of finding two eigenvalues within a small distance s of each other is not just unlikely — it vanishes quadratically in s, because the density of spacings itself goes to zero as the gap shrinks.
Wigner derived a beautiful approximation for the distribution of nearest-neighbor spacings (with s measured in units of the mean spacing), now called the Wigner surmise:

P(s) = (π s / 2) exp(−π s² / 4)
Compare this to the Poisson (uncorrelated) spacing distribution P(s) = exp(−s). The key difference: the Wigner surmise vanishes linearly as s→0, meaning eigenvalues actively avoid each other. This is level repulsion, and it's one of the most distinctive fingerprints of random matrix statistics.
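The surmise happens to be exact for 2×2 GOE matrices (with the standard convention that diagonal entries have variance 2), which makes a quick numerical check easy. A minimal sketch, with the sample size and seed chosen arbitrarily:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_samples = 100_000

# 2x2 GOE: diagonal entries with variance 2, off-diagonal entry with variance 1
a = rng.normal(0, np.sqrt(2), n_samples)
c = rng.normal(0, np.sqrt(2), n_samples)
b = rng.normal(0, 1, n_samples)

# eigenvalue gap of [[a, b], [b, c]] in closed form
s = np.sqrt((a - c)**2 + 4 * b**2)
s /= s.mean()                        # normalize so the mean spacing is 1

grid = np.linspace(0, 4, 200)
wigner = (np.pi * grid / 2) * np.exp(-np.pi * grid**2 / 4)   # Wigner surmise
poisson = np.exp(-grid)                                       # uncorrelated (Poisson) spacings

plt.hist(s, bins=80, density=True, alpha=0.6, label="2x2 GOE spacings")
plt.plot(grid, wigner, "k-", label="Wigner surmise")
plt.plot(grid, poisson, "r--", label="Poisson")
plt.legend()
plt.show()
```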
Level repulsion is not a mathematical curiosity. It reveals something profound: the eigenvalues of random matrices are a correlated system, even though the matrix entries are independent. The correlation structure that emerges is a kind of "crystallization" — the eigenvalues space themselves out more regularly than randomness alone would produce.
This distinction — GOE spacing vs. Poisson spacing — turns out to be one of the most powerful diagnostic tools in physics. Quantum systems whose classical counterpart is chaotic show GOE-type repulsion. Integrable systems show Poisson statistics. The spacing distribution is, quite literally, a signature of chaos.
So far we've been studying symmetric matrices with random entries. But in the real world, we usually encounter a different kind of random matrix: the sample covariance matrix.
Suppose you observe T measurements of N variables. Your data is a T×N matrix X, and the sample correlation matrix is C = XᵀX / T (after standardizing each column). If the variables are truly independent and the entries of X are iid, what do the eigenvalues of C look like?
If T were infinite (infinite data), C would be the identity matrix — all eigenvalues equal to 1. But with finite data, sampling noise spreads the eigenvalues. The question is: how much?
Marchenko and Pastur answered this in 1967. As N and T both grow with their ratio Q = T/N held fixed, the eigenvalue distribution of C converges to:

ρ(λ) = (Q / 2πλ) √((λ+ − λ)(λ − λ−))

where the eigenvalues are confined to the interval [λ−, λ+] with:

λ± = (1 ± 1/√Q)²
This is the Marchenko-Pastur distribution. It tells you exactly how much "fake" structure sampling noise creates in a correlation matrix. Any eigenvalue inside the MP bounds is indistinguishable from noise. Any eigenvalue outside those bounds carries genuine signal.
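To see the claim numerically, here is a short sketch that builds a correlation matrix from data with no true correlations at all and overlays the Marchenko-Pastur density (T, N, and the seed are arbitrary choices, not tied to the market data below):

```python
import numpy as np
import matplotlib.pyplot as plt

def marchenko_pastur_pdf(lam, q):
    """Marchenko-Pastur density for unit-variance data with Q = T/N."""
    lam_minus = (1 - 1 / np.sqrt(q))**2
    lam_plus = (1 + 1 / np.sqrt(q))**2
    pdf = np.zeros_like(lam)
    inside = (lam > lam_minus) & (lam < lam_plus)
    pdf[inside] = q * np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus)) \
        / (2 * np.pi * lam[inside])
    return pdf

rng = np.random.default_rng(2)
T, N = 1000, 100                      # Q = 10
X = rng.standard_normal((T, N))       # iid entries: no true correlations, unit variance
C = X.T @ X / T                       # sample correlation matrix of the noise data
eigs = np.linalg.eigvalsh(C)

lam = np.linspace(0.01, 2.5, 400)
plt.hist(eigs, bins=30, density=True, alpha=0.6, label="noise eigenvalues")
plt.plot(lam, marchenko_pastur_pdf(lam, T / N), "k-", label="Marchenko-Pastur")
plt.legend()
plt.show()
```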
Now let's apply these ideas to real data. I downloaded daily returns for 99 of the largest S&P 500 stocks over approximately four years (January 2022 to December 2025 — 1,001 trading days). This gives us Q = T/N ≈ 10.1.
The empirical correlation matrix C is 99×99. If stocks were truly uncorrelated, its eigenvalues would follow the Marchenko-Pastur distribution with λ+ ≈ 1.73. Let's see what actually happens:
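The computation behind that comparison fits in a few lines. Here is a sketch of it; the file name and loading step are placeholders of mine, standing in for however you obtain the T×N matrix of daily returns:

```python
import numpy as np

# Placeholder: load your own T x N array of daily returns (T ≈ 1001, N = 99 in the text).
returns = np.loadtxt("returns.csv", delimiter=",")   # hypothetical file, for illustration only

T, N = returns.shape
Q = T / N

# Standardize each column, then form the empirical correlation matrix.
Z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
C = Z.T @ Z / T

eigs = np.linalg.eigvalsh(C)
lam_plus = (1 + 1 / np.sqrt(Q))**2      # ≈ 1.73 when Q ≈ 10.1
signal = eigs[eigs > lam_plus]

print(f"Q = {Q:.1f}, lambda_+ = {lam_plus:.2f}")
print(f"{signal.size} of {N} eigenvalues sit above the Marchenko-Pastur edge")
```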
We've now seen how random matrix theory separates signal from noise in a correlation matrix. But there's a deeper question: does the noise itself follow the universal laws of random matrices?
If the noise eigenvalues (those below the Marchenko-Pastur bound) genuinely represent random correlations, their spacing statistics should follow the Wigner surmise — showing the characteristic level repulsion of random matrices. If instead they were somehow structured (perhaps hiding weak signals below our detection threshold), their spacing might look more Poisson.
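One simple way to run that test (a sketch of a common recipe, not necessarily the only or best unfolding scheme): map each noise eigenvalue through the Marchenko-Pastur cumulative distribution so the mean spacing becomes 1, then look at the gaps. The demo below runs on pure noise; with the market data you would pass in the eigenvalues below λ+ instead.

```python
import numpy as np

def mp_pdf(lam, q):
    """Marchenko-Pastur density for unit-variance data with Q = T/N."""
    lo, hi = (1 - 1/np.sqrt(q))**2, (1 + 1/np.sqrt(q))**2
    out = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    out[inside] = q * np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (2*np.pi*lam[inside])
    return out

def unfolded_spacings(noise_eigs, q):
    """Map eigenvalues through the MP cumulative distribution ('unfolding') so the
    mean spacing is 1, then return the gaps between consecutive unfolded levels."""
    lo, hi = (1 - 1/np.sqrt(q))**2, (1 + 1/np.sqrt(q))**2
    grid = np.linspace(lo, hi, 2000)
    cdf = np.cumsum(mp_pdf(grid, q))
    cdf /= cdf[-1]
    unfolded = np.interp(np.sort(noise_eigs), grid, cdf) * len(noise_eigs)
    return np.diff(unfolded)

# Demo on pure noise; for the market data, pass the eigenvalues below lambda_+ instead.
rng = np.random.default_rng(3)
T, N = 1000, 100
X = rng.standard_normal((T, N))
eigs = np.linalg.eigvalsh(X.T @ X / T)
s = unfolded_spacings(eigs, T / N)

print(f"fraction of spacings below 0.25: {(s < 0.25).mean():.3f}")
print("Wigner surmise predicts ~0.048 below 0.25; Poisson would predict ~0.221")
```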
The application of RMT to finance, developed by Laloux, Cizeau, Bouchaud, and Potters in the late 1990s, reversed the usual logic of statistical analysis. Normally, you build a model of your signal and test whether data fits it. RMT instead builds a precise model of noise — and anything that doesn't fit the noise model must be signal.
This is a profound shift. You don't need to know what the signal looks like. You just need to know what pure randomness looks like — and random matrix theory tells you that with extraordinary precision. The Marchenko-Pastur distribution, the Wigner surmise, the Tracy-Widom distribution for the largest eigenvalue — these are all exact, universal predictions that don't depend on any details of the underlying distribution.
The cleaned correlation matrix — rebuilt from only the signal eigenvalues — is a better predictor of future correlations than the raw empirical matrix. This has been confirmed repeatedly in out-of-sample tests and is now standard practice in quantitative finance.
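One widely used cleaning recipe, often called eigenvalue clipping, keeps the signal eigenvalues, replaces every noise eigenvalue with their average (so the trace is preserved), and rebuilds the matrix. A sketch of that idea, not necessarily the exact procedure used in the out-of-sample studies:

```python
import numpy as np

def clip_correlation_matrix(C, lam_plus):
    """Keep eigenvalues above lam_plus, flatten the rest to their average, rebuild C."""
    eigvals, eigvecs = np.linalg.eigh(C)
    noise = eigvals <= lam_plus
    cleaned = eigvals.copy()
    if noise.any():
        cleaned[noise] = eigvals[noise].mean()     # averaging preserves the trace
    C_clean = eigvecs @ np.diag(cleaned) @ eigvecs.T
    d = np.sqrt(np.diag(C_clean))
    return C_clean / np.outer(d, d)                # restore an exact unit diagonal

# Usage, with C and lam_plus computed as in the sketch above:
# C_clean = clip_correlation_matrix(C, lam_plus)
```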
But the deepest lesson of random matrix theory goes beyond finance. It's this: even in pure noise, there is structure. The eigenvalues of random matrices are not random in the colloquial sense. They repel each other, follow universal distributions, and exhibit a hidden order that emerges from the mathematics of high-dimensional geometry. Randomness, at this level, is anything but random.
The semicircle, the Marchenko-Pastur curve, the Wigner surmise — these are the shapes of chance itself. And the fact that they are universal, appearing in nuclear physics, number theory, wireless communications, and financial markets alike, suggests that they are telling us something fundamental about the geometry of high-dimensional spaces that we are only beginning to understand.