
How to Find the Neanderthal DNA Hiding in Your Raw Data

A consumer DNA test leaves you with a plain text file of about 600,000 genetic markers. Buried inside are fragments you inherited from Neanderthals — and with a little Python, you can map exactly where they sit on your chromosomes.

The short answer

Everyone whose ancestors left Africa carries roughly 1–2% Neanderthal DNA, broken into short segments scattered across the genome. You can find those segments yourself: download your raw DNA file, compare it against sequenced Neanderthal genomes and African allele frequencies, then slide a window along each chromosome to "paint" the archaic stretches. It's a satisfying weekend approximation — not a clinical measurement.

If your ancestors left Africa, you are part Neanderthal — not metaphorically, but measurably. Roughly 1–2% of the genome of a typical person of non-African descent was inherited from Homo neanderthalensis, a cousin lineage that vanished around 40,000 years ago. Most people meet that number as a single line in a testing company's dashboard: "You have more Neanderthal DNA than 60% of customers." Tidy, but abstract.

Here's the part the dashboards don't advertise: the same raw file those companies let you download contains enough information to find the actual Neanderthal segments yourself. I spent a weekend building a small tool to do exactly that — to turn "you're about 2% Neanderthal" into a literal picture of which stretches of which chromosomes came from an archaic ancestor. This is how it works, and how you can try it on your own file.

Where the Neanderthal DNA in your genome comes from

Around 50,000–60,000 years ago, modern humans expanding out of Africa walked into a Eurasia that Neanderthals had occupied for hundreds of thousands of years. The two groups met, and they interbred — more than once, in more than one place. When the first draft of the Neanderthal genome was published in 2010, it revealed the smoking gun: people outside Africa carry a small but unmistakable slice of Neanderthal ancestry.

Because that mixing happened after the migration out of Africa, the pattern is geographic. Sub-Saharan Africans carry very little Neanderthal DNA, while Europeans, East Asians, and their descendants carry the most — with East Asians averaging slightly more than Europeans. A second archaic group, the Denisovans, also interbred with modern humans, leaving their largest footprint in the people of Melanesia, New Guinea, and Aboriginal Australia.

Over the roughly 2,000 generations since, recombination — the reshuffling of DNA each generation — chopped that archaic contribution into thousands of short segments scattered across the genome. Add the fragments up and you get the familiar 1–2%. The distribution isn't random, either: there are "archaic deserts," regions almost scrubbed clean of Neanderthal ancestry (parts of the X chromosome, and stretches around genes tied to fertility and the brain), where natural selection weeded archaic variants out. Elsewhere, some Neanderthal variants stuck around because they were useful — several relate to immune response and skin biology.
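
As a rough back-of-envelope check on why the segments are so short, here is a small sketch. It assumes a single pulse of interbreeding, a 25-year generation time, and the common rule of thumb of about 1 centimorgan per megabase; none of these figures is exact, and all the names are illustrative. Under that simple model, the expected length of a surviving segment after t generations is about 1/t Morgans.

```python
# Back-of-envelope estimate of the typical archaic segment length.
# All constants below are simplifying assumptions, not measured values.
YEARS_SINCE_ADMIXTURE = 50_000   # lower end of the 50,000-60,000 year range
YEARS_PER_GENERATION = 25        # assumed average generation time
CM_PER_MB = 1.0                  # rule-of-thumb human recombination rate

generations = YEARS_SINCE_ADMIXTURE / YEARS_PER_GENERATION

# Single-pulse model: mean segment length is roughly 1/t Morgans = 100/t cM.
mean_len_cm = 100 / generations
mean_len_kb = mean_len_cm / CM_PER_MB * 1000

print(f"{generations:.0f} generations, about {mean_len_kb:.0f} kb per segment")
```

Segments tens of kilobases long are small next to chromosomes of tens to hundreds of megabases, which is why the painted picture ends up as thin, scattered bands.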

Approximate archaic DNA by population

Population              Neanderthal   Denisovan
Sub-Saharan African     ~0–0.5%       ~0%
European                ~1.8–2.4%     trace
East Asian              ~2.3–2.6%     small
Papuan / Melanesian     ~1.5–2%       ~3–5%

These ranges are approximate and shift depending on the study and the method used — which is exactly why a homemade estimate should be read with humility, as we'll see.

What's actually in a raw DNA file

When you download "raw data" from 23andMe, AncestryDNA, or MyHeritage, you get a text file with one row per genetic marker. Each row lists a marker ID (usually an rsID), a chromosome, a position, and your genotype: two letters like AG, one for each copy of the chromosome. The exact layout differs by company, so check the header of your own file. There are usually 600,000–700,000 rows. That's a sparse sample of a genome three billion letters long, but it's a clever sample: it concentrates on positions where humans are known to differ, which is exactly where archaic ancestry leaves its fingerprints.
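
To make that concrete, here is a minimal loading sketch. It assumes a tab-separated export with '#' comment lines and four columns, which is one common layout; other companies' files will need different parsing. The sample rows, marker IDs, and the `load_raw` helper are all made up for illustration.

```python
import io
import pandas as pd

# A tiny stand-in for a raw-data export. The rsIDs and positions are invented.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t10583\tAG\n"
    "rs0000002\t1\t752566\t--\n"     # no-call
    "rs0000003\tX\t2700157\tC\n"     # single-letter call on a sex chromosome
    "rs0000004\t2\t45895\tTT\n"
)

def load_raw(handle):
    """Read the export and keep only clean two-letter autosomal genotypes."""
    df = pd.read_csv(
        handle, sep="\t", comment="#", header=None,
        names=["rsid", "chrom", "pos", "genotype"],
        dtype={"rsid": str, "chrom": str, "pos": int, "genotype": str},
    )
    autosomes = {str(n) for n in range(1, 23)}
    clean = df["genotype"].str.fullmatch("[ACGT]{2}", na=False)
    return df[df["chrom"].isin(autosomes) & clean].reset_index(drop=True)

raw = load_raw(io.StringIO(SAMPLE))
print(raw)
```

Dropping no-calls and the sex chromosomes up front is a simplification that keeps the later steps easy to reason about; for a real file you would pass an open file handle or a path in place of the `StringIO` object.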

How the chromosome painting works

The trick is to compare your genotypes against two reference datasets and hunt for one specific signature. It comes down to three moves.

1. Build a list of archaic-informative markers

The Max Planck Institute for Evolutionary Anthropology has published high-quality sequences of several Neanderthals and a Denisovan. From these you can find positions where a Neanderthal carried a particular allele. Then you cross-reference the 1000 Genomes Project (a catalogue of modern human variation) and keep only the positions where that allele is rare or absent in African populations but present in non-Africans. Because Africans carry very little Neanderthal ancestry, an allele that matches the Neanderthal genome yet is nearly missing in Africa is a clue that you inherited it from a Neanderthal ancestor rather than from the ancestral population shared by all modern humans. These filtered positions are your "archaic-informative markers."
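
The filter itself can be sketched in a few lines. This assumes you have already joined the archaic genotypes and the 1000 Genomes frequencies into one table; the column names, the toy rows, and both frequency cut-offs are illustrative assumptions rather than published values.

```python
import pandas as pd

# Hypothetical pre-joined table: one row per site, with the allele seen in the
# Neanderthal genome and that allele's frequency in modern populations.
sites = pd.DataFrame({
    "chrom": ["1", "1", "1", "2"],
    "pos": [1000, 2000, 3000, 4000],
    "archaic_allele": ["A", "G", "T", "C"],
    "afr_freq": [0.00, 0.30, 0.005, 0.00],
    "non_afr_freq": [0.04, 0.35, 0.00, 0.10],
})

MAX_AFR_FREQ = 0.01       # "rare in Africans" (assumed threshold)
MIN_NON_AFR_FREQ = 0.01   # "present in non-Africans" (assumed threshold)

def archaic_informative(sites, max_afr=MAX_AFR_FREQ, min_non_afr=MIN_NON_AFR_FREQ):
    """Keep sites whose archaic allele is rare in Africa but seen elsewhere."""
    keep = (sites["afr_freq"] <= max_afr) & (sites["non_afr_freq"] >= min_non_afr)
    return sites[keep].reset_index(drop=True)

markers = archaic_informative(sites)
print(markers[["chrom", "pos", "archaic_allele"]])
```

Tightening the African cut-off reduces false positives at the cost of fewer markers, so it is worth trying more than one setting and seeing how stable the result is.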

2. Score your own genotypes

For each marker, count how many copies of the Neanderthal allele you carry: zero, one, or two. Most will be zero — these alleles are rare by design — but inside a genuine inherited segment, they cluster together.
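
A minimal sketch of that count, assuming your genotypes and the marker list are pandas tables that share `chrom` and `pos` columns and are on the same strand and genome build. The toy rows and the `score` helper are made up for illustration.

```python
import pandas as pd

# Toy genotypes and a toy marker list; positions and alleles are invented.
raw = pd.DataFrame({
    "chrom": ["1", "1", "1", "2"],
    "pos": [1000, 2000, 3000, 4000],
    "genotype": ["AG", "GG", "TT", "CC"],
})
markers = pd.DataFrame({
    "chrom": ["1", "1", "1"],
    "pos": [1000, 2000, 3000],
    "archaic_allele": ["A", "A", "T"],
})

def score(raw, markers):
    """Intersect genotypes with markers and count archaic allele copies (0-2)."""
    merged = raw.merge(markers, on=["chrom", "pos"], how="inner")
    merged["dosage"] = [
        genotype.count(allele)
        for genotype, allele in zip(merged["genotype"], merged["archaic_allele"])
    ]
    return merged

scored = score(raw, markers)
print(scored[["chrom", "pos", "genotype", "archaic_allele", "dosage"]])
```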

3. Slide a window and paint

Move a window along each chromosome and measure how enriched each stretch is for Neanderthal alleles. Where the signal stays elevated across a run of markers, flag it as a candidate Neanderthal segment and colour it on a diagram of your chromosomes. The result is a picture where most of your genome is blank, with a scattering of short bands marking your deep archaic inheritance.
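
One simple way to implement the window is a centred rolling mean over a fixed number of markers. The sketch below takes that approach; the window size, the threshold, and the `paint` helper are assumptions chosen for a toy example, not tuned values, and this is far cruder than the statistical models used in published methods.

```python
import pandas as pd

WINDOW = 5        # markers per window (assumed)
THRESHOLD = 0.5   # share of markers in the window carrying an archaic allele

def paint(scored, window=WINDOW, threshold=THRESHOLD):
    """Return candidate segments as rows of chrom, start, end, n_markers."""
    segments = []
    for chrom, grp in scored.groupby("chrom", sort=False):
        grp = grp.sort_values("pos").reset_index(drop=True)
        hit = (grp["dosage"] > 0).astype(float)
        # Windows that run off either end of the chromosome give NaN -> False.
        rolling = hit.rolling(window, center=True, min_periods=window).mean()
        enriched = rolling >= threshold
        # Give each consecutive run of identical flags its own label.
        run_id = (enriched != enriched.shift()).cumsum()
        for _, run in grp[enriched].groupby(run_id[enriched]):
            segments.append({
                "chrom": chrom,
                "start": int(run["pos"].iloc[0]),
                "end": int(run["pos"].iloc[-1]),
                "n_markers": len(run),
            })
    return pd.DataFrame(segments, columns=["chrom", "start", "end", "n_markers"])

# Toy data: twelve evenly spaced markers with a cluster of archaic alleles.
toy = pd.DataFrame({
    "chrom": ["1"] * 12,
    "pos": list(range(100, 1300, 100)),
    "dosage": [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})
segments = paint(toy)
print(segments)
```

Note that the flagged run is narrower than the cluster of archaic alleles that triggered it. That is the kind of fuzzy boundary a window-based approach produces, and each row of `segments` can then be drawn as a coloured band with matplotlib.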

What the result actually means

When I ran this on my own file, the total landed right where the textbooks say it should: somewhere in the ~1–2% range typical of people whose ancestry traces outside Africa. I'm deliberately not quoting a number to the decimal — a homemade pipeline like this can't honestly justify that kind of precision.
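
For reference, a total like that can be computed by adding up the painted segments and dividing by the stretch of genome your markers cover. The sketch below is one way to do it, assuming a segment table with a hypothetical `copies` column (1 if the segment sits on one chromosome copy, 2 if on both); the toy numbers are invented.

```python
import pandas as pd

def archaic_fraction(segments, markers):
    """Estimate the share of ancestry painted as archaic."""
    length = segments["end"] - segments["start"]
    # A segment on one of your two chromosome copies counts for half its length.
    painted = (length * segments["copies"] / 2).sum()
    # Denominator: the span actually covered by markers on each chromosome.
    span = markers.groupby("chrom")["pos"].agg(lambda p: p.max() - p.min()).sum()
    return painted / span

markers = pd.DataFrame({"chrom": ["1"] * 3, "pos": [0, 500_000, 1_000_000]})
segments = pd.DataFrame({
    "chrom": ["1"], "start": [100_000], "end": [130_000], "copies": [1],
})

fraction = archaic_fraction(segments, markers)
print(f"{fraction:.1%}")
```

Ignoring the copy count and simply summing painted lengths would roughly double the estimate for segments carried on only one chromosome copy, which is one more reason not to over-read the final figure.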

That's the honest framing: a result like this is a confirmation, not a measurement. It agrees with the population baseline partly because it's built to detect exactly that signal, and the segment boundaries it draws are fuzzy approximations. Professional methods — tools like SPrime and IBDmix — use phased haplotypes, recombination maps, and proper statistical models to do this rigorously; a weekend script does not. Treat your painting as a window onto the idea, not a clinical readout. It's a genealogical, educational exercise — not a medical or diagnostic test.

Try it yourself: the short version

  1. Export your raw data. In your testing account's settings, find "download raw DNA data." Note the genome build — most consumer files are GRCh37 / build37, which matters for the next step.
  2. Gather the reference data. Download archaic genomes from the Max Planck archaic genome resources (Altai or Vindija Neanderthal, plus Denisova) and African allele frequencies from the 1000 Genomes Project. Fair warning: these files are large — gigabytes per chromosome.
  3. Build your marker list. Keep the positions where the archaic allele is rare in Africans but present elsewhere — your archaic-informative markers.
  4. Run the painting. A short Python script (pandas, NumPy, matplotlib) intersects your genotypes with the markers, scores the archaic alleles, slides a window along each chromosome, and renders the coloured map.
  5. Sanity-check. Your total should land near that 1–2% baseline. If it's wildly off, suspect a genome-build or strand mismatch before you believe anything exciting — that's the most common DIY pitfall.
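
For the sanity check in step 5, a quick strand audit can be sketched as follows. It assumes you know the two expected alleles at each marker from the reference data; the `strand_status` helper and its labels are illustrative, not part of any library.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def strand_status(genotype, ref, alt):
    """Classify a genotype call against the two alleles the reference expects."""
    expected = {ref, alt}
    # A/T and C/G sites look identical on both strands, so they cannot be checked.
    if expected in ({"A", "T"}, {"C", "G"}):
        return "ambiguous"
    observed = set(genotype)
    if observed <= expected:
        return "match"
    if {base.translate(COMPLEMENT) for base in observed} <= expected:
        return "flipped"
    return "mismatch"

for genotype, ref, alt in [("AG", "A", "G"), ("TC", "A", "G"),
                           ("AT", "A", "T"), ("AC", "A", "G")]:
    print(genotype, ref, alt, strand_status(genotype, ref, alt))
```

If more than a small share of markers come back "flipped" or "mismatch", that points to a strand or genome-build problem rather than to anything interesting about your ancestry. Dropping the "ambiguous" sites entirely is a cautious default.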

Why bother?

Because it makes deep time personal. A band sitting on chromosome 1 or chromosome 8 isn't a statistic — it's a physical inheritance from someone who lived roughly 2,000 generations ago, in an ice-age Eurasia, in a population that no longer exists anywhere except, in scattered fragments, inside us. The same out-of-Africa migration that seeded the entire planet is the reason that DNA is in your file at all. Seeing it drawn on your own chromosomes turns an abstract fact into something you can point at.

Sources & further reading
  1. Green, R. E. et al. (2010). "A Draft Sequence of the Neandertal Genome." Science 328. science.org
  2. Prüfer, K. et al. (2014). "The complete genome sequence of a Neanderthal from the Altai Mountains." Nature 505. nature.com
  3. Sankararaman, S. et al. (2014). "The genomic landscape of Neanderthal ancestry in present-day humans." Nature 507. nature.com
  4. 1000 Genomes Project Consortium (2015). "A global reference for human genetic variation." Nature 526. internationalgenome.org
  5. Max Planck Institute for Evolutionary Anthropology — archaic genome resources. eva.mpg.de