← Back to Research
2026-04-05 Culture / Data

The Attention Atlas

What 20 cultures write about on Wikipedia

Every Wikipedia language edition is written by a different community. English Wikipedia has over 130,000 active editors; Hindi has closer to 1,000. But even among large, well-established editions, the same topics receive vastly different amounts of attention. The article on sake in Japanese Wikipedia is more than ten times the relative length of sake in English Wikipedia. Polish Wikipedia's article on vodka dwarfs everyone else's. These aren't accidents — they're cultural fingerprints.

To make this visible, I collected article sizes for 215 topics across 20 Wikipedia language editions, spanning 10 categories from food and sports to religion and science. For each article in each language, I computed a normalised attention score: how much space does this topic occupy relative to other topics in the same language? The comparison isn't "which Wikipedia is bigger" but "given its size, what does each community choose to write about?"

215
Topics
20
Languages
10
Categories
4,200+
Data Points

Cultural Fingerprints

Select a language to see its cultural fingerprint — the topics it writes about disproportionately more (or less) than the global median. A positive deviation means "this language gives this topic more attention than most languages do." A negative deviation means the opposite.

Select a language above

The Most Divisive Topics

Some topics are universally important — every language writes about them at roughly the same scale. Others are deeply divisive: one culture lavishes attention while another barely mentions them. The "spread" below measures the gap between the most and least interested language editions, in standard deviations.

Topics with the highest cross-cultural variation

Topic Explorer

Search for any topic to see how 20 languages distribute their attention. Bars above zero mean this language gives the topic more space than average; bars below zero mean less. The gold line marks the global median.

Sake

Which Cultures Think Alike?

If two languages consistently give the same topics similar levels of attention, their editorial communities share something — perhaps cultural proximity, shared history, or a similar worldview. The heatmap below shows the cosine similarity between every pair of languages based on their attention vectors across all 215 topics. Bright means similar; dark means different.

Language similarity heatmap

The clustering reveals geography: Chinese and Vietnamese cluster tightly, as do German and Dutch, and Russian and Italian (perhaps reflecting shared Orthodox/Catholic heritage and historical cultural exchange). The European languages form a broad supercluster, while Turkish and Indonesian — both large Muslim-majority nations — share similarities not captured by geography alone.

Attention by Category

Averaging across all topics in a category reveals how different languages allocate their encyclopaedic attention. Select a language to compare its category profile against the global average (zero line).

Select a language above

A striking pattern: every language prioritises countries over everything else. Country articles are long everywhere. But after that, cultures diverge. English and French invest heavily in historical figures and religion. Turkish, Arabic, and Korean prioritise science. Food and drink articles are universally under-written relative to countries — perhaps because encyclopedias aspire to be about "serious" topics, while food is considered too quotidian for extended treatment.

What This Shows (and What It Doesn't)

Wikipedia article size is a noisy proxy for cultural attention. A long article might reflect a single passionate editor rather than broad cultural interest. Bot-generated content, translation projects, and editorial policies all add noise. Some of the results here are clearly editorial artifacts — Korean Wikipedia's long article on bullfighting is probably the work of one dedicated contributor, not a national obsession.

But in aggregate, the patterns are too consistent to be noise. Hindi Wikipedia overrepresents Holi and Diwali. Turkish Wikipedia overrepresents baklava and Saladin. Hebrew Wikipedia overrepresents Hanukkah and the Grand Canyon (a popular Israeli travel destination). These make cultural sense. The fingerprints are real.

The deeper finding

Wikipedia is often treated as a universal reference — the same knowledge, available in every language. But the data shows it's really 300+ different encyclopedias, each shaped by its community's interests, history, and identity. The "neutral point of view" policy governs how topics are covered, but not how much. The allocation of attention is itself a cultural act, and it varies enormously.

Method: For each of 215 topics (selected across 10 categories), article byte sizes were collected from 20 Wikipedia language editions via the MediaWiki API. Sizes were log-transformed and z-scored within each language to normalise for edition size and script encoding differences. Cultural deviation is measured as the difference between a language's z-score and the median z-score across all languages for that topic. Language similarity uses cosine similarity on the full attention vectors, with average-linkage hierarchical clustering.

Data: 215 articles × 20 languages = 4,200+ data points. Collected April 2026. Some articles are missing in smaller language editions.