How Does a DNA Ancestry Test Actually Work? Step by Step
You have probably seen DNA ancestry tests advertised online and wondered: what actually happens between spitting into a tube and receiving a colourful ancestry pie chart? The process involves cutting-edge molecular biology, high-throughput microarray technology, and sophisticated computational algorithms - but it can all be understood in plain language.
In this guide we walk through every stage of a DNA ancestry test, from the moment you place your order to the moment your results appear on screen. Whether you are considering a test from Helixline, 23andMe, AncestryDNA, or any other provider, the underlying science is remarkably similar. Understanding it will help you appreciate what your results mean - and what they do not.
Quick Overview: A DNA ancestry test reads hundreds of thousands of specific positions (called SNPs) in your genome using a microarray chip, then compares your SNP pattern to reference populations from around the world to estimate where your ancestors lived. The entire journey from saliva to results typically takes 4-6 weeks.
Step 1: Ordering Your DNA Kit Online
The journey begins when you order a DNA testing kit from a provider's website. When you place an order with Helixline, for example, a kit is dispatched to your address within India. The kit box typically contains:
- A saliva collection tube: A specially designed plastic tube with a funnel lid and a stabilising buffer solution sealed inside the cap. The buffer preserves your DNA at room temperature during transit.
- Detailed instructions: A step-by-step leaflet explaining how to provide your sample correctly, including tips like not eating, drinking, smoking, or chewing gum for at least 30 minutes before collecting saliva.
- A prepaid return envelope or shipping label: So you can send your sample back to the laboratory at no extra cost.
- A unique barcode or kit ID: This links your physical sample to your online account. You register this code on the company's website or app before mailing the sample.
Registration is a critical step. Without it, the lab has no way to connect a tube of saliva to your digital profile. Most companies require you to create an account, agree to terms of service, and consent to how your data will be used before your sample can be processed.
Step 2: Saliva Collection at Home
Collecting your DNA sample is surprisingly simple, and no blood draw is needed. Most major ancestry tests, including Helixline, use passive drool saliva collection, although some kits use cheek swabs instead.
Why Saliva?
Saliva contains thousands of epithelial cells shed from the lining of your mouth, and each of those cells contains a complete copy of your DNA - all 3.2 billion base pairs of it. A typical 2 mL saliva sample yields tens of micrograms of genomic DNA, which is far more than the roughly 200 nanograms (0.2 micrograms) needed for genotyping. This generous surplus means the lab can repeat the analysis if the first attempt produces inconclusive results.
How to Collect Properly
- Do not eat, drink, smoke, chew gum, or brush your teeth for at least 30 minutes before collection. Food particles and certain chemicals can inhibit downstream enzymatic reactions.
- Gently rub the inside of your cheeks with your tongue for 30 seconds to loosen epithelial cells.
- Allow saliva to pool in your mouth, then let it flow through the funnel into the collection tube until it reaches the fill line (usually 2 mL, excluding bubbles).
- Close the cap firmly. On most tubes, closing the cap automatically releases a stabilising buffer (typically containing ethanol and detergent) into the saliva. This buffer lyses cells, denatures proteins, and stabilises the released DNA for weeks at room temperature.
- Gently invert the tube 5-10 times to mix the buffer thoroughly with your saliva.
Common Concern: "What if I cannot produce enough saliva?" This is more common than you might think, especially for older adults or people on certain medications. Staying well hydrated in the hours before collection helps. Some providers offer cheek swab alternatives for people who have difficulty producing saliva.
Step 3: Shipping Your Sample to the Lab
Once your sample is sealed and mixed with the stabilising buffer, you place the tube in the provided return packaging and mail it to the laboratory. In India, Helixline uses tracked domestic courier services to ensure safe and timely delivery.
The stabilising buffer is crucial during this phase. Without it, bacterial contamination would begin degrading your DNA within hours at room temperature. The buffer keeps DNA stable for up to 12 months at room temperature, although samples are typically received by the lab within 1-2 weeks. DNA in stabilised saliva has been shown to remain viable even after extended exposure to temperatures of 37 degrees Celsius, making it suitable for Indian postal conditions.
Step 4: DNA Extraction from Cheek Cells
When your sample arrives at the laboratory, it is logged into the system using its barcode, and the extraction process begins. DNA extraction is the process of separating your genomic DNA from the rest of the biological material in the saliva sample - proteins, lipids, carbohydrates, and cell debris.
The Extraction Process
- Cell Lysis: Although the stabilising buffer has already partially broken open cells, the lab applies additional lysis agents (such as Proteinase K enzyme and SDS detergent) to ensure complete disruption of cell membranes and nuclear envelopes.
- Protein Removal: Proteins are denatured and precipitated using high salt concentrations or organic solvents. This step removes histones and other DNA-binding proteins.
- DNA Binding: The mixture is passed through a silica membrane or magnetic bead column. DNA binds to the silica under high-salt, low-pH conditions while contaminants flow through.
- Washing: Ethanol-based wash buffers are passed through the column to remove residual salts, proteins, and other impurities while the DNA remains bound.
- Elution: Pure DNA is released from the silica using a low-salt elution buffer (typically Tris-EDTA). The result is a clear solution containing your purified genomic DNA.
The lab then measures the quantity of DNA using a fluorometer (such as the Qubit system) and the quality using spectrophotometry (measuring the A260/A280 ratio, where a value near 1.8 indicates high-purity DNA). A typical successful extraction yields 10-50 micrograms of DNA from a 2 mL saliva sample. Only about 200 nanograms are needed for genotyping, leaving ample material for potential re-runs.
Step 5: SNP Genotyping on a Microarray Chip
This is the core technological step that makes affordable consumer DNA testing possible. Rather than sequencing your entire genome (which still costs tens of thousands of rupees), ancestry tests use SNP genotyping microarrays to read specific, pre-selected positions across your genome.
What Are SNPs?
SNP stands for Single Nucleotide Polymorphism (pronounced "snip"). A SNP is a position in the genome where different people carry different DNA letters (A, T, G, or C). For example, at a particular position on chromosome 15, you might carry the letter A while someone else carries the letter G.
The human genome contains approximately 4 to 5 million common SNPs - positions where at least 1% of the global population carries a different variant. These natural variations are what make each person genetically unique and are the raw material that ancestry tests analyse.
SNPs are not mutations in the disease sense. They are normal variations that have accumulated in human populations over tens of thousands of years. Because different SNP variants arose in different geographic regions at different times, your particular combination of SNPs can reveal which populations your ancestors belonged to.
The Microarray (DNA Chip) Technology
The most widely used platform for consumer DNA testing is the Illumina Global Screening Array (GSA), though some companies use custom versions or competing platforms. Here is how it works:
- The Chip: A microarray chip is a glass slide roughly the size of a postage stamp, containing millions of microscopic beads. Each bead is coated with a short, synthetic DNA probe (about 50 nucleotides long) designed to bind to the region flanking a specific SNP.
- DNA Preparation: Your purified DNA is amplified (copied) using whole-genome amplification, then fragmented into pieces of approximately 300-500 base pairs.
- Hybridisation: The fragmented DNA is washed over the chip under carefully controlled temperature and salt conditions. Each DNA fragment finds and binds to its complementary probe on the chip - a process called hybridisation that takes 16-24 hours.
- Single-Base Extension: After hybridisation, a special enzyme (DNA polymerase) adds a single fluorescently labelled nucleotide to the end of each probe. The colour of the fluorescent label depends on which DNA letter (A, T, G, or C) is present at that SNP position in your genome.
- Scanning: A high-resolution laser scanner reads the fluorescent signals across the entire chip. Each SNP produces a characteristic colour pattern that reveals your genotype at that position - whether you are homozygous (two copies of the same variant) or heterozygous (one copy of each variant).
Scale of the Technology: A single Illumina GSA chip can read over 700,000 SNP positions simultaneously. The entire process from hybridisation to scanning takes approximately 24-48 hours. Modern chips can process 24 or 96 samples in parallel on a single run, which is a key factor in keeping costs low for consumers.
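To make the scanning step concrete, here is a deliberately simplified sketch of how two fluorescence intensities at one SNP could be turned into a genotype call. Production software fits clusters across many samples. The fixed angle thresholds, the `min_signal` cutoff, and the function name below are all assumptions chosen for illustration.

```python
import math


def call_genotype(x_intensity, y_intensity, min_signal=0.2):
    """Call AA, AB, BB, or NC (no-call) from two channel intensities.

    x_intensity is the signal for allele A, y_intensity for allele B.
    """
    if x_intensity + y_intensity < min_signal:
        return "NC"  # total signal too weak to trust
    # Normalised angle: 0 means pure A signal, 1 means pure B signal.
    theta = (2 / math.pi) * math.atan2(y_intensity, x_intensity)
    if theta < 0.25:
        return "AA"
    if theta > 0.75:
        return "BB"
    return "AB"


for x, y in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.05, 0.05)]:
    print((x, y), call_genotype(x, y))
```

Strong signal in only one channel indicates a homozygous genotype, roughly equal signal in both indicates a heterozygous one, and very weak signal produces a no-call that later counts against the sample's call rate.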
Step 6: Quality Control Checks
Before your genotyping data advances to the analysis stage, it must pass a series of rigorous quality control (QC) checks. These are critical for ensuring the accuracy and reliability of your results.
Sample-Level QC
- Call Rate: The percentage of SNPs that produced a clear, unambiguous reading. A typical threshold is 98% or higher - meaning at least 98% of the 700,000+ targeted SNPs must have been successfully genotyped. Samples below this threshold are flagged for re-processing or re-collection.
- Sex Check: The genotyping data is used to confirm the biological sex of the sample (based on X chromosome heterozygosity) and compared against the sex reported during registration. A mismatch may indicate a sample swap.
- Contamination Check: If the data shows an unusually high rate of heterozygous calls across the genome, it may indicate that two people's DNA was mixed in the tube, and the sample is rejected.
- Duplicate Check: The data is compared against other samples in the same processing batch to ensure no two samples are genetically identical (which would indicate a labelling or handling error).
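Two of the sample-level checks above, call rate and excess heterozygosity, reduce to simple counting. The sketch below assumes genotypes are already encoded as "AA", "AB", "BB", or "NC" (no-call). The 98% call-rate threshold is from the text, while the heterozygosity ceiling of 0.40 is a placeholder assumption rather than a real laboratory cutoff.

```python
def sample_qc(genotypes, min_call_rate=0.98, max_het_rate=0.40):
    """Return call rate, heterozygosity rate, and a pass/fail flag for one sample."""
    called = [g for g in genotypes if g != "NC"]  # drop no-calls
    call_rate = len(called) / len(genotypes)
    het_rate = sum(g == "AB" for g in called) / len(called) if called else 0.0
    return {
        "call_rate": call_rate,
        "het_rate": het_rate,
        "passed": call_rate >= min_call_rate and het_rate <= max_het_rate,
    }


# Toy samples with 100 SNPs each; a real sample has 700,000+.
good = sample_qc(["AA"] * 60 + ["AB"] * 30 + ["BB"] * 9 + ["NC"] * 1)
poor = sample_qc(["AA"] * 95 + ["NC"] * 5)
print(good["passed"], poor["passed"])
```

The first toy sample clears the 98% call-rate threshold, while the second falls to 95% and would be flagged for re-processing.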
SNP-Level QC
- Hardy-Weinberg Equilibrium (HWE): SNPs that deviate significantly from expected genotype frequencies may indicate genotyping errors and are flagged or removed.
- Minor Allele Frequency (MAF): Very rare variants (MAF below 1%) are sometimes excluded from ancestry analysis because they provide less statistical power for population assignment.
- Missingness per SNP: If a particular SNP fails to genotype in a high percentage of samples across the batch, the problem may lie with the probe rather than the samples, and that SNP is excluded.
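The three SNP-level filters can be sketched together for a single SNP, given genotype counts across a batch. This is a minimal illustration: the function name, the 5% missingness limit, and the chi-square cutoff of 10.83 (the 1-degree-of-freedom critical value for p = 0.001) are assumptions, and real pipelines often use an exact HWE test instead.

```python
def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_maf=0.01, max_missing=0.05, max_hwe_chi2=10.83):
    """Apply MAF, missingness, and Hardy-Weinberg filters to one SNP.

    Assumes at least one sample was successfully genotyped.
    """
    n = n_aa + n_ab + n_bb
    missing_rate = n_missing / (n + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    maf = min(p, q)
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    keep = maf >= min_maf and missing_rate <= max_missing and chi2 <= max_hwe_chi2
    return {"maf": maf, "missing_rate": missing_rate, "hwe_chi2": chi2, "keep": keep}


balanced = snp_qc(360, 480, 160, 0)    # matches HWE expectations
no_hets = snp_qc(500, 0, 500, 0)       # no heterozygotes at all
bad_probe = snp_qc(360, 480, 160, 200) # many failed calls
print(balanced["keep"], no_hets["keep"], bad_probe["keep"])
```

A SNP with no heterozygotes despite a 50% allele frequency deviates sharply from expectation and is dropped, as is a SNP whose probe fails in a large share of the batch.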
Step 7: Bioinformatics Analysis - Comparing to Reference Populations
Once your data passes QC, the real analytical work begins. The goal is to determine which global populations your DNA most closely resembles. This is fundamentally a statistical comparison problem.
What Are Reference Populations?
Reference populations are groups of individuals whose ancestry has been carefully verified through genealogical records, geographic origin data, and genetic analysis. These individuals serve as the "standard" against which your DNA is compared.
A reference panel might include, for example:
- 1000 Genomes Project: 2,504 individuals from 26 global populations
- Human Genome Diversity Project (HGDP): 929 individuals from 51 populations
- Company-proprietary panels: Helixline maintains its own curated reference panel of 75+ South Asian populations, including community-specific reference groups from every major region of India
The quality and diversity of the reference panel directly determines how detailed and accurate your ancestry breakdown can be. This is why choosing a provider with strong South Asian reference data is critical for Indian users.
The Analytical Algorithms
Several computational methods are used in combination to estimate your ancestry:
- Principal Component Analysis (PCA): PCA reduces the complexity of hundreds of thousands of SNPs into a small number of principal components (typically 2-10) that capture the major axes of genetic variation. When plotted, individuals from the same population cluster together, and your position on the PCA plot reveals which populations you are genetically closest to.
- ADMIXTURE / Model-Based Clustering: ADMIXTURE (a faster, maximum-likelihood successor to the Bayesian STRUCTURE program) models each individual as a mixture of K hypothetical ancestral populations. It uses maximum likelihood estimation to determine what proportion of your genome derives from each ancestral group. The output is the familiar ancestry percentage breakdown (e.g., 45% South Indian, 30% North Indian, 15% Central Asian, 10% Southeast Asian).
- Phasing and Local Ancestry: More advanced analyses determine which specific segments of each chromosome came from which ancestral population. This is done by first phasing your genome (determining which SNPs are on which of your two chromosome copies) and then assigning each genomic segment to a reference population using hidden Markov models.
- Identity-by-Descent (IBD) Analysis: By identifying long DNA segments shared identically with reference individuals, the algorithm can pinpoint more recent ancestry connections (within the last 5-10 generations).
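As an illustration of the first method in the list, the sketch below computes the first principal component of a tiny genotype matrix using power iteration in plain Python. The six individuals and four SNPs are invented toy data, and a production pipeline would use an optimised linear algebra library on hundreds of thousands of SNPs.

```python
def first_principal_component(genotypes, iterations=100):
    """Return each individual's position on PC1 (up to scale and sign).

    genotypes: list of rows, one per individual, with 0/1/2 allele counts per SNP.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre every SNP column on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual similarity (Gram) matrix.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    # Assumes the first individual does not sit exactly at the column means.
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    return vec


# Toy data: three individuals from each of two hypothetical populations.
toy = [
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],  # population 1
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],  # population 2
]
pc1 = first_principal_component(toy)
print([round(v, 3) for v in pc1])
```

Individuals from the same toy population land on the same side of PC1, which is the clustering behaviour a PCA plot makes visible.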
Step 8: Ancestry Percentage Calculation
The ancestry percentages in your report represent the statistical best estimate of how your genome can be explained as a mixture of the reference populations. Here is what you need to know about how these numbers are generated and what they mean.
How the Percentages Are Computed
The ADMIXTURE algorithm works by iteratively adjusting the proportion of each ancestral population assigned to your genome until the model best explains your observed SNP data. Technically, it maximises a likelihood function over the ancestry proportions.
For each SNP position, the algorithm calculates the probability of observing your genotype given different possible ancestry proportions. It then finds the set of proportions that makes your overall genotype data most probable across all 700,000+ SNPs simultaneously.
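For readers who want the mathematics, the log-likelihood in the form published for the ADMIXTURE program can be written as follows. The symbols are introduced here for illustration, and whether any given provider uses exactly this model is an assumption.

$$
\ell(Q, F) = \sum_{i} \sum_{j} \left[ g_{ij} \ln\!\left( \sum_{k=1}^{K} q_{ik} f_{kj} \right) + (2 - g_{ij}) \ln\!\left( 1 - \sum_{k=1}^{K} q_{ik} f_{kj} \right) \right]
$$

Here $g_{ij} \in \{0, 1, 2\}$ is the number of copies of a chosen allele that individual $i$ carries at SNP $j$, $q_{ik}$ is the fraction of individual $i$'s ancestry from ancestral population $k$ (with $\sum_{k} q_{ik} = 1$), and $f_{kj}$ is the frequency of that allele in population $k$. The ancestry percentages in a report correspond to the fitted $q_{ik}$ values.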
What the Percentages Do and Do Not Mean
- They are estimates, not exact measurements. Your true ancestry percentages have confidence intervals. A result of "35% ANI (Ancestral North Indian) ancestry" might really mean "30-40% with 90% confidence."
- They depend on the reference panel. If the reference panel does not include a population that is part of your actual ancestry, the algorithm will assign that ancestry to the closest available population, which may not be exactly right.
- They reflect averages across your genome. Due to the random nature of genetic inheritance (recombination), siblings can receive different percentages despite having the same parents.
- They are most accurate at the continental level (South Asian vs. European vs. East Asian) and become less precise at sub-regional levels (e.g., distinguishing Tamil from Telugu ancestry).
For Indian Users: Mainstream international tests often lump all South Asian ancestry into a single "South Asian" category. Helixline's India-specific reference panels with 75+ sub-populations allow for much finer resolution, distinguishing between regional and community-level ancestries across the subcontinent.
Step 9: Haplogroup Assignment
In addition to autosomal ancestry percentages, DNA tests assign you to haplogroups - ancient lineage markers that trace specific migration paths taken by your distant ancestors.
Y-DNA Haplogroups (Paternal Line)
If you are male, the test analyses SNPs on your Y chromosome to determine your Y-DNA haplogroup. The Y chromosome is passed from father to son with minimal change, so your Y haplogroup traces your direct paternal line (your father's father's father, and so on) back thousands of years.
Common Y-DNA haplogroups in India include:
- R1a (R-M420): Found at high frequencies in North India, associated with Indo-European-speaking populations and steppe pastoralist migrations
- H (H-M69): One of the most common haplogroups in India, particularly in tribal and Dravidian-speaking populations, with deep South Asian roots
- L (L-M20): Found across South Asia, with concentrations in western and southern India, possibly linked to Indus Valley Civilization era expansions
- J2 (J-M172): Present in parts of western and southern India, linked to Neolithic farming expansions from the Near East
- O (O-M175): Found in northeastern India and among Tibeto-Burman-speaking populations, reflecting East Asian connections
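Haplogroup assignment can be pictured as walking a tree of defining markers and picking the deepest branch for which every marker along the path is present in its derived form. The sketch below uses a tiny hand-written subset of the Y-DNA tree. The markers for H, L, J2, O, and R1a follow the list above, the intermediate branches (R-M207, R1-M173, J-M304) are added from the standard nomenclature, and the data structure and function names are assumptions for illustration.

```python
from typing import Optional

# branch -> (parent branch, defining marker). A real tree has thousands of branches.
TREE = {
    "R": (None, "M207"),
    "R1": ("R", "M173"),
    "R1a": ("R1", "M420"),
    "J": (None, "M304"),
    "J2": ("J", "M172"),
    "H": (None, "M69"),
    "L": (None, "M20"),
    "O": (None, "M175"),
}


def lineage(branch):
    """Return the branch followed by all of its ancestors up to the root."""
    path = []
    while branch is not None:
        path.append(branch)
        branch = TREE[branch][0]
    return path


def assign_haplogroup(derived_markers) -> Optional[str]:
    """Pick the deepest branch whose whole lineage is supported by derived markers."""
    best, best_depth = None, 0
    for branch in TREE:
        path = lineage(branch)
        supported = all(TREE[b][1] in derived_markers for b in path)
        if supported and len(path) > best_depth:
            best, best_depth = branch, len(path)
    return best


print(assign_haplogroup({"M207", "M173", "M420"}))
print(assign_haplogroup({"M207", "M173"}))
```

Requiring support along the whole path is a conservative design choice in this sketch: a single derived marker with no supporting parent markers is treated as unassigned rather than trusted on its own.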
mtDNA Haplogroups (Maternal Line)
Both males and females inherit mitochondrial DNA (mtDNA) from their mother. Your mtDNA haplogroup traces your direct maternal line (your mother's mother's mother, and so on). Common mtDNA haplogroups in India include M, R, U, and their many sub-branches. The M macro-haplogroup and its subclades (M2, M3, M4, M5, M6, etc.) are particularly prevalent across the Indian subcontinent and trace back to the earliest modern human settlement of South Asia over 50,000 years ago.
Step 10: Report Generation and Delivery
The final step is translating all of the computational outputs into a clear, visually engaging report that you can understand and explore.
What Your Report Includes
- Ancestry Composition: A pie chart or bar graph showing your estimated ancestry percentages across different reference populations, typically broken down by region and sub-region
- Ancestry Timeline: An estimated timeline showing when your ancestors from different populations may have mixed, based on the lengths of ancestral DNA segments
- Haplogroup Reports: Your Y-DNA (if male) and mtDNA haplogroup assignments, with information about the historical migrations associated with your lineages
- Chromosome Painting: A visual map of each of your 22 pairs of autosomes showing which segments are assigned to which ancestral population
- Raw Data Download: The ability to download your full SNP genotyping data as a standard text file for use with third-party analysis tools
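If you download your raw data, a few lines of code are enough to load it. The sketch below assumes a tab-separated layout with `#` comment lines and four columns (SNP ID, chromosome, position, genotype), which is a common convention for consumer raw-data files. The exact layout of any particular provider's file, the `--` no-call symbol, and the SNP IDs shown are assumptions.

```python
import io


def parse_raw_data(handle):
    """Read SNP ID -> (chromosome, position, genotype) from a raw-data text file."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        snp_id, chromosome, position, genotype = line.split("\t")
        genotypes[snp_id] = (chromosome, int(position), genotype)
    return genotypes


# Hypothetical three-SNP file held in memory instead of on disk.
sample_file = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t15\t2000\tGG\n"
    "rs0000003\tX\t3000\t--\n"
)
data = parse_raw_data(sample_file)
no_calls = [snp for snp, (_, _, gt) in data.items() if gt == "--"]
print(len(data), no_calls)
```

To read a real download, pass an opened file (for example `open("my_raw_data.txt")`, a hypothetical filename) in place of the in-memory sample.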
Reports are delivered digitally through your online account. Helixline sends an email notification when your results are ready, and you can log in to explore your ancestry interactively on the web platform or app.
Complete DNA Testing Timeline
Here is a detailed breakdown of the entire process and approximate timing for each stage:
| Step | What Happens | Timeline |
|---|---|---|
| 1. Order Kit | Kit is dispatched from warehouse to your address | 3-5 days |
| 2. Saliva Collection | You collect 2 mL of saliva into the tube at home | 5-10 minutes |
| 3. Return Shipping | Sample travels via courier to the laboratory | 3-7 days |
| 4. DNA Extraction | Purified genomic DNA is isolated from saliva cells | 1-2 days |
| 5. SNP Genotyping | 700,000+ SNPs read on Illumina GSA microarray chip | 1-3 days |
| 6. Quality Control | Sample-level and SNP-level QC checks performed | 1-2 days |
| 7. Bioinformatics | PCA, ADMIXTURE, phasing, and local ancestry analysis | 3-5 days |
| 8. Ancestry Calculation | Percentages computed against reference populations | 1-2 days |
| 9. Haplogroup Assignment | Y-DNA and mtDNA haplogroups determined | 1-2 days |
| 10. Report Generation | Results compiled into visual report and delivered online | 1-2 days |
Total estimated time from ordering to results: 4-6 weeks. The longest single phase is typically the laboratory queue - samples are batched together and processed in groups of 24 or 96 on the microarray chip, so your sample may wait until a full batch is assembled before genotyping begins.
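As a quick check on the arithmetic, the per-step ranges in the table can be summed directly. The sketch below copies the day ranges from the table (saliva collection is omitted because it takes minutes), and the variable names are illustrative.

```python
# (minimum days, maximum days) for each step in the table above.
STEP_DAYS = {
    "Order Kit": (3, 5),
    "Return Shipping": (3, 7),
    "DNA Extraction": (1, 2),
    "SNP Genotyping": (1, 3),
    "Quality Control": (1, 2),
    "Bioinformatics": (3, 5),
    "Ancestry Calculation": (1, 2),
    "Haplogroup Assignment": (1, 2),
    "Report Generation": (1, 2),
}

min_days = sum(low for low, _ in STEP_DAYS.values())
max_days = sum(high for _, high in STEP_DAYS.values())
print(f"Active processing: {min_days}-{max_days} days")
```

The active steps add up to roughly two to four weeks, so the remainder of the 4-6 week total is the batching queue described above.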
Ready to Discover Your Ancestry?
Order your Helixline DNA kit today and experience the entire process first-hand. India's most detailed ancestry analysis with 75+ South Asian reference populations.
Understanding the Science: Key Concepts Explained
What Exactly Is a SNP?
To truly understand DNA testing, you need to understand SNPs at a deeper level. Your genome is a string of 3.2 billion nucleotide bases (A, T, G, C). If you lined up the genomes of two unrelated people, they would be 99.9% identical. The 0.1% difference amounts to roughly 3-4 million positions where the base letter differs between individuals. These are SNPs.
Most SNPs have two possible variants (called alleles). For example, at a given SNP, the two alleles might be A and G. You inherit one allele from each parent, so your genotype at that SNP could be AA, AG, or GG. The frequency of each allele varies between populations, and it is these frequency differences that allow ancestry estimation.
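To see why allele frequency differences carry ancestry information, consider how likely each genotype is in two populations. The sketch below assumes random mating (Hardy-Weinberg proportions) and uses invented allele frequencies of 70% and 20% for allele A. Neither number comes from real data.

```python
def genotype_probabilities(freq_a):
    """Expected genotype probabilities at a two-allele SNP with alleles A and G."""
    freq_g = 1 - freq_a
    return {
        "AA": freq_a ** 2,          # A from both parents
        "AG": 2 * freq_a * freq_g,  # A from one parent, G from the other
        "GG": freq_g ** 2,          # G from both parents
    }


pop_x = genotype_probabilities(0.7)  # hypothetical population X
pop_y = genotype_probabilities(0.2)  # hypothetical population Y
for genotype in ("AA", "AG", "GG"):
    print(genotype, round(pop_x[genotype], 2), round(pop_y[genotype], 2))
```

With these made-up frequencies, an AA genotype is about twelve times more probable in population X than in population Y. One SNP proves little on its own, but small shifts like this accumulate across hundreds of thousands of SNPs.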
How Reference Populations Work in Practice
Imagine a simplified example with just three SNPs and three reference populations:
| SNP | Population A (North Indian) Freq of allele T | Population B (South Indian) Freq of allele T | Population C (Central Asian) Freq of allele T |
|---|---|---|---|
| SNP1 | 70% | 30% | 80% |
| SNP2 | 40% | 85% | 20% |
| SNP3 | 55% | 60% | 90% |
Suppose, taking just one allele per SNP for simplicity, that you carry T at SNP1, T at SNP2, and C at SNP3. The algorithm calculates which mixture of populations A, B, and C makes observing those alleles most likely. With only three SNPs the answer is very uncertain. In reality, this calculation is performed across 700,000+ SNPs simultaneously, which is what makes the estimates far more stable.
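The toy table can be worked through in code. The sketch below uses the allele frequencies from the table, treats the observation as one allele per SNP (T, T, C), and searches mixture weights in 5% steps for the combination that makes the observation most likely. The brute-force grid search is a stand-in chosen for readability, not the optimisation method real software uses.

```python
from itertools import product

# Frequency of allele T at SNP1, SNP2, SNP3 in each reference population.
FREQ_T = {
    "A": [0.70, 0.40, 0.55],
    "B": [0.30, 0.85, 0.60],
    "C": [0.80, 0.20, 0.90],
}
OBSERVED = ["T", "T", "C"]


def likelihood(weights):
    """Probability of the observed alleles given mixture weights that sum to 1."""
    total = 1.0
    for j, allele in enumerate(OBSERVED):
        # Allele frequency in a population blended according to the weights.
        p_t = sum(w * FREQ_T[pop][j] for pop, w in weights.items())
        total *= p_t if allele == "T" else 1 - p_t
    return total


def best_mixture(steps=20):
    """Try every mixture on a grid of 1/steps increments and keep the most likely."""
    best, best_value = None, -1.0
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = {"A": i / steps, "B": j / steps, "C": (steps - i - j) / steps}
        value = likelihood(weights)
        if value > best_value:
            best, best_value = weights, value
    return best, best_value


weights, value = best_mixture()
print(weights, round(value, 4))
```

On this toy data, the best-fitting mixture is mostly population A with a substantial share of B and none of C, but its likelihood is only slightly higher than that of population A alone. That flatness is why a handful of SNPs cannot support a confident estimate.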
Why Different Companies Give Different Results
You may have heard that taking tests from two different companies can yield different ancestry percentages. This is not because the genotyping is inaccurate - the SNP readings themselves are consistent across platforms (concordance rates exceed 99.5%). The differences arise from:
- Different reference panels: Each company uses its own curated set of reference populations, with different sizes and geographic coverage
- Different algorithms: The statistical methods and parameters used for ancestry estimation vary between providers
- Different population labels: One company might label a component "South Asian" while another calls it "Dravidian" or "South Indian" - the underlying genetics may be the same
- Different K values: The number of ancestral populations (K) assumed by the model affects how ancestry is partitioned. A model with K=5 will produce different percentages than one with K=20
What Happens to Your Data After Testing
Data privacy is a legitimate concern for anyone considering a DNA test. Here is what typically happens to your data at each stage:
- Physical Sample: Your saliva sample is destroyed after DNA extraction and genotyping are complete. It is not stored indefinitely.
- Genotyping Data: Your digital SNP data is stored in encrypted form on secure servers. This data is needed for your account to function and for you to access your results.
- Aggregate Research: Some companies use anonymised, aggregated data for research purposes (improving reference panels, discovering new genetic associations). At Helixline, participation in research is always optional and requires explicit consent.
- Data Deletion: You can request complete deletion of your account and associated data at any time. Helixline complies with Indian data protection regulations and honours all deletion requests.
Frequently Asked Questions
How long does a DNA ancestry test take from start to finish?
The entire process from ordering your DNA kit to receiving your ancestry report typically takes 4 to 6 weeks. Shipping the kit to you takes 3-5 days, saliva collection takes 5-10 minutes, return shipping takes 3-7 days, and laboratory processing (queueing, DNA extraction, genotyping, quality control, and bioinformatics analysis) takes approximately 3-4 weeks. You will receive an email notification when your results are ready to view online.
What happens to my saliva sample after DNA is extracted?
After DNA is successfully extracted from your saliva sample, the remaining biological material is destroyed within a defined period according to the company's data retention policy. At Helixline, your physical saliva sample is destroyed after DNA extraction and genotyping are complete. Your digital genetic data is stored securely and encrypted, and you retain full control over your data with the ability to request deletion at any time. The extracted DNA itself is not stored indefinitely.
How many SNPs are tested in a DNA ancestry test?
Modern DNA ancestry tests typically analyse between 600,000 and 750,000 SNPs using microarray genotyping chips like the Illumina Global Screening Array (GSA). Helixline tests over 700,000 SNPs specifically selected for their relevance to ancestry determination, haplogroup assignment, and population differentiation. While the human genome contains roughly 4 to 5 million common SNPs, the selected markers provide comprehensive coverage because many SNPs are inherited together in blocks called haplotypes - testing one SNP in a haplotype block effectively captures information about all the other SNPs in that block.
How accurate is the DNA testing process?
The genotyping process itself is extremely accurate, with a concordance rate exceeding 99.9% for modern microarray platforms. This means that for every 1,000 SNPs read, fewer than 1 will be incorrectly called. However, it is important to distinguish between genotyping accuracy and ancestry interpretation accuracy. Genotyping accuracy (correctly reading your DNA) is near-perfect. Ancestry interpretation depends on the quality and diversity of reference populations used for comparison - continental-level estimates (South Asian vs. European) are highly reliable at 95%+, while sub-regional or community-level estimates depend heavily on the reference panel. This is why Helixline invests heavily in building the most comprehensive South Asian reference dataset available.
Conclusion
A DNA ancestry test is a remarkable convergence of molecular biology, microarray technology, and computational statistics. From the moment you spit into a tube to the moment your ancestry pie chart loads on screen, your sample has been through DNA extraction, microarray hybridisation, laser scanning, quality control filtering, and sophisticated statistical modelling against global reference populations.
Understanding these steps empowers you to interpret your results more intelligently. When you see a result like "42% South Indian ancestry," you now know that this number was derived from comparing your 700,000+ SNPs against a reference panel using algorithms like ADMIXTURE, and that it represents a statistical estimate with inherent confidence intervals rather than an absolute measurement.
For Indian users, the single most important factor in getting meaningful results is the quality of the South Asian reference panel. Helixline's panel of 75+ curated South Asian populations provides the finest resolution available for Indian ancestry, distinguishing between regional and community-level ancestries that international providers simply cannot match.
Ready to see your own ancestry breakdown? Order your Helixline DNA kit and experience the complete journey from saliva to ancestry insights.