The Genetic Map of India: How Ancient Migrations Made Us Who We Are
India is one of the most genetically diverse countries on Earth. With over 1.4 billion people speaking more than 780 languages across dozens of distinct ethnic and tribal groups, the subcontinent is a living archive of human migration history stretching back tens of thousands of years. But what makes India truly extraordinary from a genetic perspective is not just the diversity itself -- it is the fact that every Indian alive today carries DNA from multiple ancient migration waves, layered one upon another like geological strata.
Over the past two decades, advances in ancient DNA extraction and computational genomics have allowed scientists to reconstruct the genetic map of India with remarkable precision. Landmark studies -- including Narasimhan et al. 2019, Shinde et al. 2019, and Reich et al. 2009 -- have identified at least three major ancestral populations whose mixing, in different proportions and at different times, produced the genetic diversity we see across the subcontinent today.
This article traces those migration layers from the earliest to the most recent, explains how they combine differently across Indian regions, and shows what modern genetics reveals about the deep ancestry that every Indian carries in their DNA.
Layer 1: The First Indians -- Out of Africa (~65,000 Years Ago)
The oldest layer of Indian ancestry traces back to the great Out of Africa migration. Sometime around 65,000 to 70,000 years ago, a small group of anatomically modern humans crossed from the Horn of Africa into the Arabian Peninsula and began moving eastward along the southern Asian coastline. Archaeological evidence from sites like Jwalapuram in Andhra Pradesh has been interpreted as showing that modern humans may have reached the Indian subcontinent even earlier, by around 74,000 years ago, though both the dating and the identity of the toolmakers remain debated.
These first migrants were the ancestors of a population that geneticists now call Ancient Ancestral South Indians (AASI). The AASI are not directly sampled from ancient DNA -- no skeletal remains old enough and well-preserved enough have been recovered from tropical South Asia. Instead, their genetic profile has been reconstructed computationally by analyzing the genomes of populations that carry the highest proportion of this ancient ancestry.
The closest living proxies for AASI ancestry are the indigenous Andamanese peoples -- particularly the Onge and Great Andamanese of the Andaman Islands. These island populations, isolated by the Bay of Bengal for tens of thousands of years, preserved a genetic signature that is almost entirely descended from the first Out of Africa migrants into South Asia. Genetic studies by Chaubey et al. (2011) and Reich et al. (2009) confirmed that the Andamanese represent an early branch of the same population that contributed the deepest ancestral layer to all mainland Indians.
Key fact: AASI ancestry is present in every Indian population tested to date. It is the foundational genetic layer of the subcontinent -- the bedrock upon which all subsequent migrations were added. South Indian tribal populations (Adivasi groups) carry the highest proportions, sometimes exceeding 70%, while northwest Indian populations carry the lowest, typically 20-35%.
For roughly 50,000 years -- from their arrival until the Neolithic period -- the AASI and their descendants were the sole inhabitants of the Indian subcontinent. They were hunter-gatherers who adapted to India's extraordinarily diverse environments, from the tropical forests of the Western Ghats to the semi-arid plains of the Deccan. During this immense span of time, they diversified into regional populations, developing the deep genetic structure that still echoes in modern Indian genomes. Learn more about this deep ancestry in our guide to Dravidian ancestry and genetics.
Layer 2: Iranian-Related Farmers (~7,000-3,000 BCE)
The second major migration into the subcontinent brought farming, animal domestication, and -- eventually -- the seeds of one of the world's great civilizations. Beginning roughly 9,000 years ago, populations related to the early farmers of the Iranian plateau and the Zagros Mountains began moving eastward. This was not a single, sudden migration but a slow, multi-millennial process of expansion driven by the spread of agricultural practices.
These migrants carried a genetic signature that geneticists call Iranian-related farmer ancestry. It is important to note that this does not mean they came directly from modern-day Iran. Rather, they belonged to a broadly related population cluster that had developed agriculture independently in western and central Asia. Ancient DNA from sites like Ganj Dareh and Tepe Abdul Hosein in the Zagros region (Broushaki et al. 2016) provided the reference genomes that helped identify this ancestral component in South Asian populations.
As these farming populations moved into the Indian subcontinent, they encountered and mixed with the existing AASI hunter-gatherers. The result of this mixing, over thousands of years, was the population that would go on to build one of the ancient world's most remarkable urban civilizations.
The Indus Valley Civilization Connection
The landmark 2019 study by Shinde et al. sequenced ancient DNA from a female individual buried at Rakhigarhi, one of the largest Indus Valley Civilization (IVC) sites, dating to approximately 2500 BCE. The results were striking: this individual's genome was a mixture of AASI and Iranian-related farmer ancestry, with zero detectable steppe ancestry. This confirmed that the Harappan civilization -- with its sophisticated urban planning, standardized weights, and extensive trade networks -- was built by a population descended from the mixing of indigenous South Asians and Iranian-related farming migrants, without any contribution from the steppe pastoralists who would arrive later.
The Narasimhan et al. (2019) study, which analyzed ancient DNA from over 500 individuals across Central and South Asia, reached the same conclusion. The genetic profile of Indus Valley populations was consistent: a mixture of AASI and Iranian-related ancestry, representing a population that had been blending for thousands of years before the IVC's urban phase.
This AASI + Iranian farmer mixture is the basis of what geneticists call the Ancestral South Indian (ASI) component -- a composite ancestry that is found at varying levels in all modern Indians and is highest in south Indian populations. We explore this in detail in our article on ANI and ASI ancestry in India.
Layer 3: Steppe Pastoralists (~2,000-1,000 BCE)
The third and most debated migration layer arrived during the Bronze Age. Populations from the Pontic-Caspian steppe -- the vast grasslands stretching across modern southern Russia, Ukraine, and Kazakhstan -- had begun expanding westward into Europe around 3000 BCE, and sometime after 2000 BCE related groups moved southeastward into Central and South Asia.
These steppe pastoralists, associated archaeologically with the Sintashta and related cultures, were semi-nomadic herders who had domesticated horses, developed spoke-wheeled chariots, and practiced elaborate burial rituals. Genetically, they carried a mixture of Eastern European Hunter-Gatherer (EHG) and Caucasus Hunter-Gatherer (CHG) ancestry -- a profile now widely termed "steppe ancestry" in population genetics literature.
The evidence for this migration into South Asia comes from multiple converging lines of data, thoroughly documented in our article on Aryan migration and DNA evidence:
- Ancient DNA chronology: The Rakhigarhi IVC individual (~2500 BCE) had zero steppe ancestry. Ancient DNA from the Swat Valley in Pakistan (Narasimhan et al. 2019) shows steppe ancestry appearing only after ~1200 BCE. This brackets the arrival to the second millennium BCE.
- Y-chromosome evidence: The haplogroup R1a-Z93, strongly associated with steppe pastoralists, is widespread in modern South Asia (particularly among Indo-European-speaking populations) but absent in all pre-2000 BCE South Asian samples tested to date.
- Geographic gradient: Modern Indians show a clear northwest-to-southeast gradient of steppe ancestry -- highest in Punjab, Kashmir, and Sindh, and lowest in southern India and among tribal populations.
- Linguistic correlation: Steppe ancestry correlates with Indo-European language distribution. Populations speaking Indo-Aryan languages consistently show higher steppe ancestry than Dravidian-speaking populations in the same geographic region.
The steppe migration brought with it the Indo-European language family that would eventually give rise to Hindi, Bengali, Marathi, Punjabi, Gujarati, and dozens of other languages spoken across northern and central India today. It also brought cultural practices with parallels in the Vedic texts, including horse sacrifice, fire rituals, and a pastoral lifestyle centered on cattle. For a deeper dive into this ancestry component, read our guide on steppe ancestry in India.
Important nuance: The steppe migration was not a sudden invasion or conquest. Genetic evidence indicates a gradual, centuries-long process of migration, cultural exchange, and intermarriage. The incoming steppe populations mixed extensively with the existing IVC-descended communities, and modern Indians -- even those with the highest steppe ancestry -- still carry substantially more AASI and Iranian-related farmer ancestry than steppe ancestry.
Layer 4: Later Historical Migrations
While the three major ancestral layers -- AASI, Iranian farmer, and steppe -- account for the vast majority of Indian genetic variation, the subcontinent's position at the crossroads of Asia meant that smaller but significant migration events continued throughout recorded history.
Greeks and Indo-Greeks (~300-100 BCE)
Alexander's campaigns and the subsequent Indo-Greek kingdoms (Gandhara, Bactria) left a modest genetic footprint in the northwest. Some Pashtun and Kalash populations show traces of Western Eurasian admixture that may partly reflect this era, though disentangling it from earlier steppe ancestry is challenging.
Scythians (Saka) and Kushana (~200 BCE-300 CE)
Central Asian nomadic groups, including the Saka (Scythians) and Kushana, established empires across northern India. The Jat, Gujjar, and some Rajput communities have traditionally claimed descent from these groups, and genetic studies have found slightly elevated Central Asian ancestry in some of these populations, though the signal is subtle.
Huna (Hephthalites) (~5th-6th century CE)
The Huna invasions brought another wave of Central Asian genetic influence into northwestern India. Their impact was likely concentrated in Punjab, Kashmir, and Rajasthan.
Mughal and Central Asian (~11th-18th century CE)
Turkic, Afghan, and Persianate migrations during the medieval period added small amounts of Central and West Asian ancestry to specific communities across northern India. Muslim populations in India sometimes show slightly elevated West Asian or Central Asian admixture compared to neighboring Hindu populations, though the overall genetic impact at the population level was modest relative to the ancient layers.
East Asian and Tibeto-Burman Migrations
Northeast India received substantial gene flow from East and Southeast Asian populations at various times. Tibeto-Burman-speaking groups (Naga, Mizo, Manipuri) and Tai-speaking groups (Ahom, Khamti) carry significant East Asian ancestry, creating a genetic landscape in the northeast that is distinct from the rest of the subcontinent.
Regional DNA Diversity: How the Layers Combine Across India
The most remarkable aspect of India's genetic map is not the existence of these ancestral layers but the dramatically different proportions in which they combine across regions, communities, and even neighboring villages. The following table summarizes approximate ancestry proportions across major Indian regions, based on published population genetics studies (Moorjani et al. 2013, Narasimhan et al. 2019, Chaubey et al. 2011):
| Region | AASI % | Iranian Farmer % | Steppe % | East Asian % |
|---|---|---|---|---|
| Northwest India (Punjabi, Kashmiri, Sindhi) | 20-30% | 35-45% | 20-30% | <2% |
| Gangetic Plains (UP, Bihar, MP) | 35-45% | 30-40% | 10-20% | <5% |
| South India (Tamil, Telugu, Kannada, Malayalam) | 45-55% | 30-40% | 5-15% | <2% |
| Northeast India (Assamese, Naga, Mizo) | 15-30% | 15-25% | 5-15% | 30-60% |
| Tribal Groups (Adivasi, Munda, Gond) | 55-75% | 15-30% | 2-10% | <5% |
| Andamanese (Onge, Great Andamanese) | 85-95%+ | <5% | <1% | <5% |
Note: These are approximate ranges based on published studies. Individual variation within each region can be substantial. The percentages reflect deep ancestral components, not modern ethnic identity.
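Because each cell in the table is a range rather than a single value, the four columns are not meant to be added at their extremes. As a rough illustration, the sketch below checks that every row can still add up to 100% for some choice of values inside its ranges. The way "<x%" and "85-95%+" are turned into numeric bounds is an assumption made for this example, not something the underlying studies specify.

```python
# Percentage ranges (low, high) copied from the table above, in the order
# AASI, Iranian farmer, steppe, East Asian. Assumptions for illustration:
# "<x%" is read as 0 to x, and "85-95%+" is read as 85 to 95.
REGION_RANGES = {
    "Northwest India": [(20, 30), (35, 45), (20, 30), (0, 2)],
    "Gangetic Plains": [(35, 45), (30, 40), (10, 20), (0, 5)],
    "South India": [(45, 55), (30, 40), (5, 15), (0, 2)],
    "Northeast India": [(15, 30), (15, 25), (5, 15), (30, 60)],
    "Tribal Groups": [(55, 75), (15, 30), (2, 10), (0, 5)],
    "Andamanese": [(85, 95), (0, 5), (0, 1), (0, 5)],
}


def total_bounds(ranges):
    """Return the smallest and largest totals the component ranges allow."""
    return sum(low for low, _ in ranges), sum(high for _, high in ranges)


def can_sum_to_100(ranges):
    """True if some choice of values inside the ranges adds up to 100%."""
    low, high = total_bounds(ranges)
    return low <= 100 <= high


for region, ranges in REGION_RANGES.items():
    low, high = total_bounds(ranges)
    print(f"{region}: totals span {low}-{high}%, feasible: {can_sum_to_100(ranges)}")
```

Every row passes this check, which is all it can show: the ranges are mutually compatible, not that any particular combination of values is correct for a given person or community.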
Northwest India: The Highest Steppe Ancestry
The Punjabi, Kashmiri, and Sindhi populations of northwest India consistently show the highest proportions of steppe ancestry on the subcontinent -- typically 20-30%, and sometimes higher in specific communities. This is consistent with the northwest being the gateway through which steppe migrants entered South Asia. These populations also carry substantial Iranian-related farmer ancestry, reflecting the region's position at the western edge of the IVC's geographic range. The AASI component, while present in all individuals, is proportionally lower than in other regions.
The Gangetic Plains: A Balanced Mix
Populations of the Gangetic Plain -- encompassing Uttar Pradesh, Bihar, Madhya Pradesh, and surrounding areas -- show a more balanced mixture of all three major components. Steppe ancestry is moderate (10-20%), Iranian farmer ancestry is substantial (30-40%), and AASI is significant (35-45%). This region was historically the zone where Indo-Aryan-speaking populations (carrying higher steppe ancestry) gradually mixed with existing populations over centuries, producing the intermediate genetic profiles seen today. The Bengali population in the eastern end of this belt also carries a small but detectable East Asian component (3-8%), reflecting proximity to Southeast Asian-influenced populations.
South India: Higher AASI, Lower Steppe
South Indian populations -- including Tamil, Telugu, Kannada, and Malayalam speakers -- show a pattern that mirrors the ANI-ASI cline: higher AASI ancestry (45-55%) and lower steppe ancestry (5-15%) compared to the north. Iranian-related farmer ancestry remains substantial (30-40%), consistent with the IVC's southern extent and the farming traditions that preceded it. The genetic differences between north and south India are real and measurable, but they represent a gradient rather than a sharp boundary -- a crucial point that undermines simplistic notions of a "north-south divide."
Northeast India: The East Asian Dimension
Northeast India stands apart from the rest of the subcontinent in its genetic profile. Tibeto-Burman and Tai-speaking populations in states like Nagaland, Mizoram, Manipur, and Arunachal Pradesh carry 30-60% East Asian ancestry, reflecting migration waves from Southeast and East Asia that are largely separate from the three-component model that describes the rest of India. The Ahom of Assam, who established a kingdom in the 13th century, are a well-documented example of Tai-speaking migrants who mixed with existing populations. Even Assamese Brahmin populations show detectable East Asian admixture alongside their South Asian ancestral components.
Tribal Populations: The Deepest Ancestry
India's Adivasi (tribal) populations -- including the Munda, Gond, Bhil, Santhal, Irula, Paniya, and many others -- carry the highest proportions of AASI ancestry found on the mainland, often in the 60-75% range. These groups have maintained higher levels of genetic continuity with the first South Asian populations, partly due to geographic isolation and partly due to endogamous marriage practices. Austroasiatic-speaking tribes (Munda, Santhal, Ho) additionally carry a detectable East Asian component (5-15%), consistent with linguistic evidence linking Austroasiatic languages to Southeast Asia.
The Andaman Islands: A Window into the Deep Past
The Andamanese peoples -- particularly the Onge and Jarawa -- represent the most extreme case of AASI ancestry preservation anywhere in the world. Their genomes are 85-95%+ AASI, with minimal or no Iranian farmer, steppe, or East Asian admixture. This extraordinary genetic isolation, maintained across tens of thousands of years by the ocean barrier of the Bay of Bengal, makes the Andamanese invaluable for reconstructing the ancestral genetic profile of the earliest South Asians. They are not "frozen in time" -- their own populations have continued to evolve and adapt -- but they provide the closest available reference for the AASI ancestral component.
The ANI-ASI Framework: How Geneticists Describe Indian Ancestry
In 2009, David Reich and colleagues at Harvard introduced a framework that transformed how scientists think about Indian population genetics. Their landmark paper in Nature proposed that modern Indians could be modeled as mixtures of two ancestral populations:
- Ancestral North Indian (ANI): Genetically related to Western Eurasians -- particularly Central Asians, Europeans, and Middle Easterners. We now know that ANI is itself a composite of Iranian-related farmer ancestry and steppe pastoralist ancestry.
- Ancestral South Indian (ASI): A population with no close relatives outside South Asia. We now know that ASI is itself a composite of AASI (the deepest indigenous layer) and Iranian-related farmer ancestry.
The key insight was that virtually all Indian populations fall on a single ANI-ASI cline -- a gradient from higher ANI in the northwest to higher ASI in the south and among tribal groups. This cline is remarkably smooth, suggesting that ANI and ASI populations mixed extensively across the subcontinent before endogamy (see below) froze the proportions within individual communities. For a full explanation, see our detailed article on ANI and ASI ancestry in India.
Important: ANI and ASI are not "pure" ancestral populations. They are themselves mixtures. ANI = Iranian farmer + steppe. ASI = Iranian farmer + AASI. The Iranian-related farmer component is shared by both, which means that even the most "ANI-shifted" and "ASI-shifted" Indian populations share a substantial common ancestry through the Iranian farmer layer.
The three-component model (AASI + Iranian farmer + steppe) that replaced the simpler ANI-ASI framework provides a more accurate picture, but the ANI-ASI terminology remains widely used in both scientific literature and consumer genetics because of its intuitive simplicity.
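The relationship between the two descriptions can be made concrete with a little arithmetic. The sketch below converts a two-way ANI/ASI mix into steppe, Iranian farmer, and AASI fractions, using the identities ANI = Iranian farmer + steppe and ASI = Iranian farmer + AASI. The function name and all input values are hypothetical and chosen only for illustration; in particular, the internal make-up of ANI and ASI (40% steppe and 70% AASI here) is an assumption, not a published estimate.

```python
def three_components(ani_fraction, steppe_share_of_ani, aasi_share_of_asi):
    """Convert a two-way ANI/ASI mix into steppe, Iranian farmer and AASI fractions."""
    for value in (ani_fraction, steppe_share_of_ani, aasi_share_of_asi):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    asi_fraction = 1.0 - ani_fraction
    steppe = ani_fraction * steppe_share_of_ani
    aasi = asi_fraction * aasi_share_of_asi
    # Iranian farmer ancestry arrives through both ANI and ASI.
    iranian_farmer = (ani_fraction * (1.0 - steppe_share_of_ani)
                      + asi_fraction * (1.0 - aasi_share_of_asi))
    return {"steppe": steppe, "iranian_farmer": iranian_farmer, "aasi": aasi}


# Hypothetical inputs: one ANI-shifted and one ASI-shifted population.
ani_shifted = three_components(0.6, 0.4, 0.7)
asi_shifted = three_components(0.3, 0.4, 0.7)

for label, parts in (("ANI-shifted", ani_shifted), ("ASI-shifted", asi_shifted)):
    summary = ", ".join(f"{name} {share:.0%}" for name, share in parts.items())
    print(f"{label}: {summary}")
```

With these made-up inputs, both populations end up with a large Iranian farmer share even though their steppe and AASI shares differ, which is the shared-ancestry point made above.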
How Caste Endogamy Preserved Genetic Diversity
One of the most striking findings from Indian population genetics is the role of endogamy -- the practice of marrying within one's own community -- in shaping the genetic landscape of the subcontinent. A 2013 study by Moorjani et al. demonstrated that extensive mixing between ANI and ASI populations occurred across India between roughly 4,200 and 1,900 years ago, and then abruptly stopped in most communities.
This cessation of inter-group mixing, likely enforced by the formalization of the caste system and other social structures, had profound genetic consequences. Each endogamous group -- numbering in the thousands across India -- became a genetically isolated population, preserving whatever ANI-ASI proportion it had at the time mixing stopped. This is why, within any given region, different caste groups can show measurably different ancestry proportions despite having lived in geographic proximity for millennia.
The practical effects of this endogamy are striking. Genomic studies have found that many Indian jati (sub-caste) groups show levels of genetic drift comparable to populations that went through severe bottlenecks -- the founder effect that arises when a small population expands in isolation. Nakatsuka et al. (2017) estimated that some Indian groups have effective founding populations as small as a few hundred individuals, despite numbering in the millions today.
This pattern of endogamy has medical implications as well. Isolated populations with small founding sizes accumulate recessive disease-causing variants at higher frequencies. Understanding the genetic legacy of caste endogamy is therefore relevant not just to ancestry research but to public health and genetic counseling in India.
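To give a feel for why a small founding population leaves such a strong mark, the sketch below uses the textbook expectation for loss of heterozygosity under drift in an idealized, constant-size, randomly mating population: the fraction retained after t generations is (1 - 1/(2N))^t. This is a deliberate simplification and not the method used in the studies cited above. The effective sizes and the 70-generation span are hypothetical values picked for illustration, and real communities grew rather than staying at their founding size.

```python
def heterozygosity_retained(effective_size, generations):
    """Expected fraction of heterozygosity left after drift in an idealized
    constant-size population: (1 - 1/(2N))^t."""
    if effective_size <= 0 or generations < 0:
        raise ValueError("effective_size must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


GENERATIONS = 70  # hypothetical span of strict endogamy

for effective_size in (300, 10_000):  # hypothetical small vs. large group
    retained = heterozygosity_retained(effective_size, GENERATIONS)
    print(f"N = {effective_size}: about {retained:.1%} of heterozygosity retained")
```

Under these assumptions the small group loses roughly a tenth of its heterozygosity while the large group loses well under one percent, which is the kind of drift signal that distinguishes long-isolated communities from one another.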
What Your DNA Can Reveal About Your Personal Migration Story
Every Indian genome is a palimpsest -- a document written over and over, with each layer of ancestry still legible beneath the ones that followed. Modern ancestry DNA testing can read these layers with increasing precision:
- Ancestral proportions: Your genome can be decomposed into approximate percentages of AASI, Iranian-related farmer, steppe, and (where applicable) East Asian ancestry. These proportions reflect the cumulative effect of thousands of years of migration and mixing in your specific lineage.
- Haplogroups: Your mitochondrial DNA (maternal line) and Y-chromosome (paternal line, if male) haplogroups can trace specific migration routes. For instance, Y-haplogroup R1a-Z93 traces to the Bronze Age steppe migration, while mitochondrial haplogroup M is the most common maternal lineage in South Asia, likely tracing to early AASI-descended populations.
- Regional affinity: Your overall genetic profile can be compared to reference populations across India and beyond, revealing which modern populations you are most closely related to genetically.
- Endogamy signature: The pattern of identical-by-descent (IBD) segments in your genome can reveal the degree of endogamy in your recent ancestral history -- how genetically isolated your community has been over the past 50-100 generations.
It is important to approach these results with nuance. Ancestry percentages are statistical estimates based on reference populations and computational models. They describe deep ancestral history (thousands of years) rather than recent ethnic identity. Two siblings will get slightly different estimates due to the randomness of genetic recombination. And the reference populations used by different testing companies may produce different numbers for the same person.
That said, the broad patterns are robust and scientifically validated. If you are from northwest India, you will almost certainly show higher steppe and Iranian farmer ancestry than someone from south India. If you have northeast Indian heritage, East Asian ancestry will be a significant component. If you belong to a tribal community, your AASI proportion will likely be among the highest on the subcontinent.
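The dependence on reference populations noted above can be shown with a toy calculation. The sketch below estimates a two-source mixing proportion by least squares from allele frequencies at a handful of markers, then repeats the estimate with a slightly different reference panel. All frequencies, names, and the 30/70 mix are invented for illustration. Real ancestry pipelines use far more markers and more elaborate statistical models, so this should be read as a simplified stand-in rather than a description of any company's method.

```python
def estimate_mixture(target, ref_a, ref_b):
    """Least-squares estimate of the fraction of ref_a in a two-source mix,
    clipped to the range 0-1."""
    numerator = sum((t - b) * (a - b) for t, a, b in zip(target, ref_a, ref_b))
    denominator = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if denominator == 0:
        raise ValueError("reference populations are identical at these markers")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical allele frequencies at five markers (illustrative numbers only).
source_a = [0.80, 0.10, 0.60, 0.30, 0.90]
source_b = [0.20, 0.50, 0.40, 0.70, 0.10]

# A made-up individual built as exactly 30% source A and 70% source B.
target = [0.3 * a + 0.7 * b for a, b in zip(source_a, source_b)]

# A second, slightly different reference panel standing in for source A.
alternative_a = [0.70, 0.20, 0.55, 0.35, 0.80]

same_panel = estimate_mixture(target, source_a, source_b)
other_panel = estimate_mixture(target, alternative_a, source_b)
print(f"estimate with original panel:    {same_panel:.3f}")
print(f"estimate with alternative panel: {other_panel:.3f}")
```

The first estimate recovers the 30% that was built in, while the second drifts upward simply because the reference changed. The person's genome is identical in both cases, which is why numbers from different testing companies are best compared as broad patterns rather than exact figures.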
Frequently Asked Questions
What are the main ancestral components in Indian DNA?
Modern Indian DNA is composed of three major ancestral components: (1) Ancient Ancestral South Indian (AASI) -- the oldest layer, descended from the first humans who reached South Asia roughly 65,000 years ago; (2) Iranian-related farmer ancestry -- carried by Neolithic migrants who mixed with AASI populations between approximately 7,000 and 3,000 BCE to form the population that built the Indus Valley Civilization; and (3) Steppe pastoralist ancestry -- brought by Bronze Age migrants from the Pontic-Caspian steppe between approximately 2,000 and 1,000 BCE, associated with the spread of Indo-European languages. Additionally, northeast Indian populations carry significant East Asian ancestry, and various historical migrations have added smaller layers in specific regions.
What is the ANI-ASI framework in Indian genetics?
The ANI-ASI framework was introduced by David Reich and colleagues in 2009 to describe two ancestral populations that contributed to modern Indian DNA. ANI (Ancestral North Indian) is a composite of Iranian-related farmer and steppe pastoralist ancestry, while ASI (Ancestral South Indian) is a composite of Iranian-related farmer and AASI ancestry. Most Indians fall on a single ANI-ASI cline -- higher ANI in the northwest, higher ASI in the south and among tribal groups. The framework has since been refined into the three-component model (AASI + Iranian farmer + steppe), but ANI-ASI terminology remains widely used.
Why is India so genetically diverse?
India's extraordinary genetic diversity results from several converging factors: multiple waves of ancient migration (AASI, Iranian farmers, steppe pastoralists, East Asian populations) layered over tens of thousands of years; immense geographic diversity (Himalayas, deserts, coastlines, forests) that created natural barriers; and the practice of endogamy reinforced by the caste system, which preserved distinct genetic signatures within thousands of population groups over the past 1,500-2,000 years. Studies estimate that genetic diversity between some Indian population groups is comparable to the diversity between Europeans and East Asians.
Can a DNA test reveal my specific migration ancestry?
Yes. Modern DNA testing can estimate your proportions of AASI, Iranian-related farmer, steppe, and East Asian ancestry. It can also identify your maternal and paternal haplogroups, which trace specific migration routes. However, ancestry percentages are statistical estimates rather than exact measurements, and they describe deep ancestral history spanning thousands of years rather than recent ethnic identity. Results should be interpreted as part of a broader understanding of population history rather than as definitive labels.
Conclusion
The genetic map of India tells a story of extraordinary depth and complexity. From the first humans who walked out of Africa 65,000 years ago, to the Neolithic farmers who introduced agriculture and helped build the Indus Valley Civilization, to the Bronze Age pastoralists who brought Indo-European languages from the steppe, to the countless historical migrations that followed -- each wave added a new thread to the genetic tapestry of the subcontinent.
What makes this story remarkable is that these layers did not replace one another. They mixed. Every Indian alive today -- whether Kashmiri or Tamil, Brahmin or Adivasi, Punjabi or Naga -- carries genetic traces of multiple ancient migrations. The proportions differ, sometimes dramatically, but the shared ancestry runs deep. The AASI foundation that was laid 65,000 years ago persists in every Indian genome. The Iranian farmer ancestry that drove the rise of agriculture is present from the northwest to the deep south. Even steppe ancestry, the most variable component, is detectable across the entire subcontinent.
Understanding this genetic map is not about categorizing people or ranking populations. It is about recognizing the profound shared history that connects all Indians -- and all humans -- through an unbroken chain of migration, mixing, and adaptation stretching back to our common origins in Africa. The DNA you carry is a record of that journey, written in roughly three billion letters, carrying the signature of every ancestral population that contributed to making you who you are.