Interactive atlas

Clusters, flows & the deep-time frontier

The atlas has three toggleable layers. Population clusters are spatio-temporal groups found by DBSCAN, sized by sample count and coloured by region. Migration flows are the top modelled directional links between an older source and a younger, genetically similar neighbour. Pleistocene samples are the rare Ice-Age genomes (≥15,000 years old).

The findings

Top five migration events

Ranked for global significance, prioritising the Out-of-Africa dispersal. Each event pairs the genomic signal with the environmental and technological pressures that drove it.

1

The Out-of-Africa dispersal

~70,000 – 40,000 years ago · NE Africa → Arabia → all of Eurasia

Modern humans expanded from Africa along a southern coastal arc and a northern continental route, reaching Siberia (Ust'-Ishim, ~45 ka), Europe (Zlatý kůň, Bacho Kiro) and China (Tianyuan, ~40 ka) — meeting and interbreeding with Neanderthals on the way.

Drivers: MIS 3 climate swings and humid "Green Arabia" pulses opened and closed corridors; fully modern cognition powered rapid range expansion.

Confidence: Moderate (outcome confirmed, route inferred)

2

Anatolian farmers settle Europe

~8,500 – 6,000 years ago · Çatalhöyük → Danube & Mediterranean → Iberia

The first farmers spread agriculture across Europe in a wave of people, not just ideas. The diagnostic Y-haplogroup G2a dominates early European farmers and is nearly absent in the hunter-gatherers they replaced.

Drivers: early-Holocene warming made rain-fed cereal farming viable; agriculture's higher carrying capacity fuelled demic diffusion.

Confidence: High

3

The Steppe (Yamnaya) expansion

~5,000 – 4,500 years ago · Pontic-Caspian steppe → central & northern Europe

Pastoralists from the steppe (Khvalynsk → Yamnaya, carrying R1b/R1a) swept west as Corded Ware and Bell Beaker, in places replacing up to ~75% of the local gene pool — a second remaking of Europe.

Drivers: wagons, horses and dairy pastoralism; the 5.2 ka and 4.2 ka aridification events favoured mobile herders; possibly early plague.

Confidence: High

4

Peopling of the Americas

~16,000 – 13,000 years ago · NE Siberia → Beringia → the Americas

A founding population crossed the Bering land bridge and spread the length of two continents with astonishing speed. Every Indigenous American cluster carries the Beringian founder signature (Y-hg Q1b), and Anzick-1 (~12.7 ka) anchors the lineage.

Drivers: low glacial sea level exposed Beringia; post-LGM deglaciation opened both coastal and interior routes after a genetic "standstill".

Confidence: High

5

The Austronesian / Lapita expansion

~3,500 – 700 years ago · Island SE Asia / Taiwan → Remote Oceania

Seafarers carried an Asian genetic signature (Y-hg O) across the Pacific to Vanuatu, the Marianas (Guam Latte / Chamoru) and ultimately Polynesia, later mixing with Papuan populations.

Drivers: outrigger and double-hulled canoes plus open-ocean navigation; a farming-and-fishing package that made distant islands habitable.

Confidence: High

Other waves worth noting: the Sintashta–Andronovo Indo-Iranian expansion eastward into Central & South Asia (~4–3 ka, carrying R1a-Z93); late-Holocene movements along the East African / Swahili coast; and Bronze-Age gene flow within the Levant. These appear in the cluster data but sit just outside the top five.

An honest map of our ignorance

The data is Eurocentric and Holocene-heavy

A responsible analysis foregrounds its own gaps. Ancient DNA survives best in cool climates and in recent remains, and is recovered mostly from well-excavated sites, which badly skews what we can see.

[Figure: histogram of ancient-DNA sample ages, overwhelmingly from the last 10,000 years]
Sampling through time. 99.6% of ancient samples are younger than 30,000 years; the Out-of-Africa window holds just 68 samples. The deep past is a data desert.

[Figure: world map of Pleistocene ancient-DNA samples, all in Eurasia, none in Africa]
Every Ice-Age genome (≥15 ka) sits in Eurasia. Africa, the source of all dispersals, has no ancient DNA this old, because warm climates degrade it.

Flagged limitations. (1) Africa contributes only ~1% of samples, none older than ~18.5 ka, which inverts its true weight in human demographic history. (2) The raw genotype matrix was not processed; "genetic similarity" uses curated population labels and uniparental haplogroups as proxies. (3) The climate layer is a date-keyed Quaternary chronology, not measurements from cores matched to each sample, so climate is context rather than a tested cause. Throughout, confirmed findings rest on in-hand aDNA; inferred findings are reasoned from downstream evidence and labelled as such.
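To make limitation (3) concrete, here is a minimal sketch of what a date-keyed climate layer does, assuming it simply assigns each sample the marine isotope stage (MIS) its date falls in. The stage boundaries (14, 29, 57, 71 and 130 ka, roughly the LR04 boundaries), the labels and the function name `climate_context` are illustrative assumptions, not the pipeline's actual chronology. Because the lookup takes only a date, two samples of the same age on different continents receive identical climate context, which is exactly why climate here is context rather than a tested cause.

```python
from bisect import bisect_right

# Assumed marine isotope stage boundaries in ka (roughly LR04), youngest first.
MIS_BOUNDARIES_KA = [14, 29, 57, 71, 130]
MIS_LABELS = ["MIS 1", "MIS 2", "MIS 3", "MIS 4", "MIS 5", "MIS 6 or older"]

def climate_context(age_bp: float) -> str:
    """Map a sample age (years BP) to a climate stage using its date alone."""
    return MIS_LABELS[bisect_right(MIS_BOUNDARIES_KA, age_bp / 1000)]

# Two hypothetical samples of the same age, thousands of km apart.
site_a = {"lat": 57.0, "lon": 71.0, "age_bp": 45_000}
site_b = {"lat": 32.0, "lon": 35.0, "age_bp": 45_000}
# Location is never consulted, so both get the same context.
print(climate_context(site_a["age_bp"]), climate_context(site_b["age_bp"]))  # → MIS 3 MIS 3
```

Merging real core measurements by position as well as date, as proposed in the next steps, is what would break this tie.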
For a mixed audience of geneticists & archaeologists

Reading the human journey from a partial archive

The story this dataset tells is, at first, a story about its own gaps. We assembled 19,029 ancient individuals from the Allen Ancient DNA Resource, each carrying a place, a date, a curated ancestry label and, often, a paternal and maternal haplogroup. Yet when we plot those samples through time, 99.6 percent of them are younger than 30,000 years, and more than two-thirds come from Europe. The very window we set out to prioritise — the Out-of-Africa dispersal between 70,000 and 30,000 years ago — contains just sixty-eight individuals, and the African homeland that launched every later migration is represented by fewer samples than a single medieval European cemetery. Any honest reconstruction must therefore separate what the genomes confirm from what we infer around their silences.

What the genomes confirm is remarkable enough. Unsupervised DBSCAN clustering on a combined geographic-and-temporal metric recovered 226 spatio-temporal populations whose haplogroup signatures align closely with the known archaeological record. We can watch a single thread run from Çatalhöyük's farmers in Anatolia (Y-haplogroup G2a) out across Europe as the Linearbandkeramik, their agricultural package advancing in step with early-Holocene warming and largely overwriting the resident hunter-gatherers. We can watch a second thread gather on the Pontic-Caspian steppe (Khvalynsk, then Yamnaya, marked by R1b) and burst westward around 5,000 years ago on the back of wagons, horses and dairying, remaking the European gene pool a second time as Corded Ware and Bell Beaker. The same steppe engine turns east as Sintashta and Andronovo, carrying R1a-Z93 and chariot technology toward Central and South Asia. Each of these events surfaced not as an assumption but as a top-ranked edge in our migration-flow model, which links an older population to a younger, genetically similar, geographically reachable neighbour.
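The flow model just described can be sketched as a scoring function over ordered pairs of clusters. This is a minimal sketch under stated assumptions: the methods table names the three factors (haplogroup cosine similarity, proximity, recency), but the exponential decay forms, the 2,000 km and 2,000-year scales, the haversine distance and the `Cluster` fields are hypothetical choices made here for illustration. "Recency" is read as a penalty on the time gap between source and target.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Cluster:
    """Hypothetical summary of one spatio-temporal cluster."""
    name: str
    lat: float            # mean latitude, degrees
    lon: float            # mean longitude, degrees
    age_bp: float         # mean age, years before present
    haplogroups: Counter  # counts of uniparental haplogroup labels

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two haplogroup count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(h)))

def flow_score(src, dst, dist_scale_km=2000.0, gap_scale_yr=2000.0):
    """Score the directed edge src -> dst; zero unless src is strictly older."""
    if src.age_bp <= dst.age_bp:
        return 0.0
    similarity = cosine(src.haplogroups, dst.haplogroups)
    proximity = math.exp(-great_circle_km(src.lat, src.lon, dst.lat, dst.lon) / dist_scale_km)
    recency = math.exp(-(src.age_bp - dst.age_bp) / gap_scale_yr)  # penalise long time gaps
    return similarity * proximity * recency

def top_flows(clusters, k=5, **scales):
    """Return the k highest-scoring (score, source, target) edges with positive score."""
    edges = [(flow_score(a, b, **scales), a.name, b.name)
             for a in clusters for b in clusters if a is not b]
    return sorted((e for e in edges if e[0] > 0), reverse=True)[:k]

# Toy clusters with invented values, for illustration only.
a = Cluster("A", 40.0, 30.0, 8000, Counter({"G2a": 4}))
b = Cluster("B", 48.0, 10.0, 7000, Counter({"G2a": 3, "I2a": 1}))
c = Cluster("C", 47.0, 45.0, 5000, Counter({"R1b": 5}))
for score, src, dst in top_flows([a, b, c]):
    print(f"{src} -> {dst}: {score:.3f}")
```

Because the score is a product, a pair sharing no haplogroups scores zero however close it is, so in the toy run only A -> B survives; proximity and recency can rank edges but never create one.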

Beyond Eurasia, the founder signatures are equally clean. Every Indigenous American cluster, from Anzick-1 in Pleistocene Montana to the Classic Maya, carries the Beringian Q1b paternal lineage: the fingerprint of a small population that paused in a glacial refugium while sea levels were low, then spread the length of two continents once deglaciation opened the coastal and interior routes. In Remote Oceania, the Mariana Islands' Latte-period people and the settlers of Vanuatu carry East Asian O lineages, the maritime calling card of the Austronesian expansion that outrigger canoes made possible. In each case the driver is legible: a climate door opening, a technology unlocking new range, or a demographic engine (farming's carrying capacity) pushing one people into another's land.

The deepest and most important migration, however, we can only see in silhouette. The Out-of-Africa dispersal left no in-window genomes in its African or Arabian source. We infer it instead from its descendants: Ust'-Ishim in Siberia and Zlatý kůň and Bacho Kiro in Europe, all near 45,000 years old, with the Oase individual preserving a Neanderthal ancestor only a handful of generations back. From these downstream points, and from the climate record of MIS 3 with its intermittently "green" Arabia, we reconstruct a dispersal threading humid corridors out of Africa during favourable pulses — a confident outcome resting on an inferred route.

The methodological honesty matters as much as the findings. We did not process the multi-gigabyte genotype matrix; our "genetic similarity" rests on curated labels and uniparental markers, and our climate layer is a date-keyed chronology rather than measurements from cores matched to each sample. The clustering's resolution depends on a tuned space-time exchange rate (2.5 km of distance counted as equivalent to one year). None of this overturns the signal, but all of it bounds our certainty. The clearest scientific message is therefore double-edged: ancient DNA has turned the last ten thousand years of Eurasian prehistory into something we can read almost like a ledger, while the older, tropical and African chapters (the ones that actually made us a global species) remain frustratingly, instructively blank.

Reproducibility

Methods, data & downloads

The full pipeline and every intermediate output are available below.

[Figure: world map of population clusters and modelled migration flows]
The full analysis map: 154 mapped population clusters (coloured by region, sized by sample count) and the 55 highest-scoring modelled migration flows.

Step | Method | Output
Source | AADR v66.p1 annotation file (Harvard Dataverse, doi:10.7910/DVN/FFIDCW) | 23,089 individuals → 19,029 ancient samples
Clean & unify | Filter to dated, geolocated ancient samples; derive region & climate context | unified_samples.csv
Cluster | DBSCAN in 3-D Earth-Cartesian geography + time (2.5 km/yr exchange rate, eps = 360 km) | 226 clusters · cluster_summary.csv
Model flows | Directed edges older → younger, scored by genetic similarity (haplogroup cosine) × proximity × recency | 8,205 flows · migration_flows.csv
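The clustering step can be sketched as follows, as a minimal pure-Python illustration rather than the pipeline's own code. Each sample becomes a 4-D point: Earth-centred Cartesian coordinates in kilometres plus its age multiplied by the 2.5 km/yr exchange rate, so a single Euclidean distance mixes space and time in one unit. Spatially this is straight-line (chord) distance, which is very close to great-circle distance at the 360 km scale. The `eps` of 360 km comes from the table; `min_samples=5`, the 6,371 km Earth radius and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)
KM_PER_YEAR = 2.5         # space-time exchange rate, from the methods table
EPS_KM = 360.0            # DBSCAN neighbourhood radius, from the methods table

def embed(lat, lon, age_bp, km_per_year=KM_PER_YEAR):
    """Map a sample to 4-D: Earth-centred Cartesian km plus age scaled to km."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (
        EARTH_RADIUS_KM * math.cos(phi) * math.cos(lam),
        EARTH_RADIUS_KM * math.cos(phi) * math.sin(lam),
        EARTH_RADIUS_KM * math.sin(phi),
        age_bp * km_per_year,
    )

def dbscan(points, eps=EPS_KM, min_samples=5):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    # Precompute neighbourhoods; each point counts as its own neighbour.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep growing the cluster
    return labels

# Three nearby samples a few decades apart, plus one distant sample (invented values).
samples = [(45.0, 10.0, 5000), (45.1, 10.1, 5050), (45.0, 10.2, 5100), (30.0, 31.0, 5000)]
print(dbscan([embed(*s) for s in samples], min_samples=2))  # → [0, 0, 0, -1]
```

The exchange rate fixes the clustering's temporal resolution: two samples from the same spot are direct neighbours only if their dates differ by at most 360 / 2.5 = 144 years, and any spatial separation shrinks that allowance. At the scale of ~19,000 samples the quadratic neighbour search here is slow; scikit-learn's `DBSCAN(eps=360, min_samples=...)` applied to the same embedded array is the usual practical route.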

Suggested next steps

  • Ingest the genotype matrix and run genome-wide PCA / qpAdm / F-statistics to replace haplogroup proxies and quantify admixture per migration.
  • Merge real paleoclimate series (NOAA/PANGAEA cores) by space-time nearest-neighbour to test climate causation statistically.
  • Target the gaps: prioritise African, Arabian and South/SE Asian Pleistocene sampling; integrate sedimentary aDNA.
  • Apply spatially explicit coalescent / diffusion models to estimate migration rates and directions with uncertainty.
  • Overlay pathogen aDNA (e.g. Yersinia pestis) to test disease as a co-driver of population turnovers.
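The second suggestion, merging real paleoclimate series by space-time nearest neighbour, could reuse the clustering's embedding. The sketch below is a hedged illustration: the record fields (`lat`, `lon`, `age_bp`, `value`), the reuse of the 2.5 km/yr exchange rate for matching, and the 1,000 km cutoff beyond which a sample gets no climate value are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

def embed(lat, lon, age_bp, km_per_year=2.5):
    """Earth-centred Cartesian km plus age scaled by the exchange rate."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (
        EARTH_RADIUS_KM * math.cos(phi) * math.cos(lam),
        EARTH_RADIUS_KM * math.cos(phi) * math.sin(lam),
        EARTH_RADIUS_KM * math.sin(phi),
        age_bp * km_per_year,
    )

def merge_climate(samples, records, km_per_year=2.5, max_km=1000.0):
    """Attach the nearest record's value to each (lat, lon, age_bp) sample, or None.

    records: non-empty list of dicts with hypothetical keys lat, lon, age_bp, value.
    """
    points = [(embed(r["lat"], r["lon"], r["age_bp"], km_per_year), r["value"]) for r in records]
    merged = []
    for lat, lon, age_bp in samples:
        target = embed(lat, lon, age_bp, km_per_year)
        # Linear scan for the nearest record in the combined space-time metric.
        dist, value = min(((math.dist(target, p), v) for p, v in points), key=lambda dv: dv[0])
        merged.append(value if dist <= max_km else None)  # refuse far-away matches
    return merged

# Toy core records with invented values.
cores = [
    {"lat": 45.0, "lon": 10.0, "age_bp": 5000, "value": 1.5},
    {"lat": -10.0, "lon": 120.0, "age_bp": 5000, "value": -0.3},
]
print(merge_climate([(45.5, 10.5, 5100), (0.0, -70.0, 50000)], cores))  # → [1.5, None]
```

The cutoff matters for the gaps flagged earlier: without it, an African or South Asian sample would silently inherit a value from a distant Eurasian core. For real data volumes, `scipy.spatial.cKDTree` built on the embedded points would replace the linear scan.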