Nat Genet. 2024 Jan;56(1):143-151.
doi: 10.1038/s41588-023-01582-w. Epub 2023 Dec 20.

Accurate detection of identity-by-descent segments in human ancient DNA

Harald Ringbauer et al. Nat Genet. 2024 Jan.

Abstract

Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the ancIBD algorithm.
a, Sketch of the ancIBD HMM. The HMM has five states: one background state of no allele sharing and four states modelling the four possible IBD-sharing states between two phased diploid genomes. We model phase switch errors within a true IBD segment as a transition between the four IBD states. b, Visualization of the full pipeline to call IBD. First, aDNA data are imputed and phased using GLIMPSE and a panel of modern reference haplotypes. We note that users can customize these upstream steps, for example, by using other tools to obtain genotype likelihoods or by using different reference panels. Our core software (ancIBD) is then applied to the imputed data to screen for IBD. It produces two tables, one listing all inferred IBD segments and one listing IBD summary statistics for each pair of individuals.
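The five-state structure described in panel a can be illustrated with a minimal sketch. It is not the ancIBD implementation: the function name, the three probabilities (p_enter, p_exit, p_switch) and the uniform split of probability among the IBD states are assumptions made only for illustration. State 0 is the background (non-IBD) state, states 1 to 4 are the four IBD states, and a phase switch error is represented as a jump between two different IBD states.

```python
def build_transition_matrix(p_enter, p_exit, p_switch):
    """Return a 5x5 row-stochastic transition matrix as nested lists.

    State 0 is the non-IBD background state; states 1-4 are IBD states.
    All three probabilities are hypothetical per-step values, not
    parameters taken from ancIBD.
    """
    if not (0.0 <= p_enter <= 1.0 and 0.0 <= p_exit + p_switch <= 1.0):
        raise ValueError("probabilities must define valid rows")
    n_ibd = 4
    size = n_ibd + 1
    T = [[0.0] * size for _ in range(size)]

    # Background state: stay, or enter one of the four IBD states
    # (equal split is an assumption of this sketch).
    T[0][0] = 1.0 - p_enter
    for j in range(1, size):
        T[0][j] = p_enter / n_ibd

    for i in range(1, size):
        # Leave IBD and return to the background state.
        T[i][0] = p_exit
        # Phase switch error: jump to one of the other three IBD states.
        for j in range(1, size):
            if j != i:
                T[i][j] = p_switch / (n_ibd - 1)
        # Remain in the same IBD state.
        T[i][i] = 1.0 - p_exit - p_switch
    return T


T = build_transition_matrix(p_enter=0.001, p_exit=0.01, p_switch=0.06)
for row in T:
    print([round(x, 4) for x in row])
```

Keeping the switch probability separate from the exit probability lets a phase error move the path between IBD states without ending the segment, which is the behaviour the caption describes.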
Fig. 2. Performance of ancIBD on simulated IBD segments.
a, Power and segment length errors. We copied IBD segments of lengths 4, 8, 12, 16 and 20 cM into synthetic diploid samples. We simulated shotgun-like and 1240k-like data (Supplementary Note 2) and visualize false positives, power and length bias for 2×, 1×, 0.5× and 0.25× coverage (rows). For each parameter set and IBD length, we simulated 500 replicates of pairs of chromosome 3, each pair with a single, randomly placed, copied-in IBD segment. The power (or recall) of detecting IBD segments of each simulated length is indicated in the text next to the corresponding grey vertical bar. Results for other coverages are shown in Supplementary Fig. 4. b, False positive rate. We downsampled high-quality empirical aDNA data without IBD segments (Supplementary Table 6) to establish false positive rates of IBD segments for various coverages and IBD lengths (Supplementary Note 7). The y axis shows the mean number of false positive IBD segments per pair of chromosome 3 in each length bin (bin width 0.25 cM). To contextualize these false positive rates, we also depict expected IBD sharing assuming various constant population sizes (dotted lines, calculated as described in ref. ). If the false positive rate is of a similar order of magnitude as, or larger than, the sharing expected for a population of that effective population size (Ne), individual IBD calls of that length for that coverage and demographic scenario are likely to be false positives.
Fig. 3. Inferring biological relatives in the aDNA record using long IBD inferred with ancIBD.
a, Inferred IBD among pairs of 4,248 ancient Eurasian individuals. The plot visualizes both the count (y axis) and the summed length (x axis) of all IBD >12 cM long. For comparison, we colour-code pairs on the basis of relatedness estimates from pairwise mismatch rates (PMR) that can detect up to third-degree relatives (Supplementary Note 9). We also annotate new relatives found by ancIBD, indicated by at least three very long IBD segments (>20 cM) typical of up to sixth-degree relatives. b, Simulated IBD among pairs of relatives. For each relative class, we simulated 100 replicates using the software ped-sim, as described in Supplementary Note 8. As in a, we depict the summed length and the count of all IBD at least 12 cM long. c, Inferred IBD among four ancient English Neolithic individuals, who lived about 5,700 years ago and were entombed at Hazleton North long cairn. A full pedigree was previously reconstructed using first- and second-degree relatives inferred using pairwise SNP matching rates. We depict all IBD at least 12 cM long. The four individuals were genotyped using 1240k aDNA capture (I12438, 3.7× average coverage on target; I12440, 2.1×; I13896, 1.1×; I12439, 6.7×).
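The two per-pair quantities plotted in panel a (count and summed length of IBD >12 cM) and the annotation rule (at least three segments >20 cM) can be sketched as follows. The function name, the dictionary keys and the example segment lengths are hypothetical; only the 12 cM and 20 cM thresholds and the minimum of three long segments come from the caption.

```python
def summarize_pair(segment_lengths_cm, min_len=12.0, long_len=20.0, min_long=3):
    """Summarize the IBD segments shared by one pair of individuals.

    segment_lengths_cm: lengths (in cM) of all inferred IBD segments.
    Returns the count and summed length of segments longer than min_len,
    plus a flag for pairs with at least min_long segments above long_len.
    """
    kept = [length for length in segment_lengths_cm if length > min_len]
    n_long = sum(1 for length in segment_lengths_cm if length > long_len)
    return {
        "count": len(kept),
        "sum_cm": sum(kept),
        "n_long": n_long,
        "candidate_relative": n_long >= min_long,
    }


# Made-up segment lengths for a single pair, for illustration only.
print(summarize_pair([8.0, 13.5, 22.0, 25.0, 30.5]))
```

The flag only marks a pair for closer inspection; assigning a degree of relatedness would additionally require comparison with simulated relatives, as in panel b.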
Fig. 4. Inferred IBD segments between various Eneolithic and Bronze Age West Eurasian Groups.
We visualize IBD segments 12–16 cM long (for IBD sharing in other length classes see Extended Data Fig. 3). We applied ancIBD to identify IBD segments between all pairs of 304 West Eurasian ancient individuals (all previously published data; Supplementary Table 3) organized into 24 archaeological groups. The number in parentheses indicates the sample size for each archaeological group. For each pair of groups, we plot the fraction of all possible pairs of individuals that share at least one IBD segment 12–16 cM long, which we obtained by dividing the number of pairs that share such IBD segments by the number of all possible pairs: between two different groups of n1 and n2 individuals, there are n1 × n2 pairs, while within a group of size n (on the diagonal in the figure) there are n(n − 1)/2 pairs. LN, Late Neolithic; BAC, Battle Axe Culture; C, Chalcolithic; TRB, Trichterbecherkultur (Funnelbeaker culture); GAC, Globular Amphora Culture.
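The normalization described in this caption amounts to a short calculation. The sketch below uses hypothetical function names and made-up counts; the two pair-count formulas (n1 × n2 between groups, n(n − 1)/2 within a group) are the ones stated in the caption.

```python
def n_possible_pairs(n1, n2=None):
    """Number of possible pairs of individuals.

    With one argument: pairs within a single group of size n1.
    With two arguments: pairs between two different groups.
    """
    if n2 is None:
        return n1 * (n1 - 1) // 2
    return n1 * n2


def sharing_fraction(n_sharing_pairs, n1, n2=None):
    """Fraction of possible pairs that share at least one IBD segment."""
    total = n_possible_pairs(n1, n2)
    if total == 0:
        raise ValueError("no possible pairs for these group sizes")
    return n_sharing_pairs / total


# Illustrative numbers only: 9 sharing pairs within a group of 10,
# and 30 sharing pairs between groups of 10 and 20 individuals.
print(sharing_fraction(9, 10))       # within-group (diagonal) cell
print(sharing_fraction(30, 10, 20))  # between-group (off-diagonal) cell
```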
Fig. 5. A geographically distant pair of ancient biological relatives detected with ancIBD.
a, When screening ancient Eurasian individuals for IBD segments (Fig. 3), we detected a pair of biological relatives whose remains were buried 1,410 km apart, one in central Mongolia and one in Southern Russia. The two individuals were previously published in two different publications. Both individuals are archaeologically associated with the Afanasievo culture and genetically cluster with other Afanasievo individuals. b, Posterior of non-IBD state on chromosome 12, which has the longest inferred IBD segment (39.1 cM long, indicated as a dark blue bar). We also plot opposing homozygotes (upper grey dots), whose absence is a necessary signal of IBD. Only SNPs where both markers have an imputed genotype probability >0.99 are plotted. c, Plot of all inferred IBD segments longer than 12 cM. d, Histogram of inferred IBD segment lengths, as well as theoretical expectations for various types of relatives (calculated using formulas described in ref. ). Panels b–d were all created using default plotting functions bundled into the ancIBD software package.
Extended Data Fig. 1. Pipeline to simulate IBD segment data.
We visualize our steps to simulate IBD segment data (see detailed description in Supplementary Note 2). Starting from TSI (Tuscany) high-quality reference haplotypes in the 1000 Genomes panel (A), we created haplotype mosaics (B), a step that removes any long IBD segments. We then copied over IBD segments of the target length (C). We grouped two mosaic haplotypes to obtain diploid individuals, but to simplify visualization we do not depict the second haplotype per individual here. (D) To create data typical of imputed low-coverage aDNA, we matched each genotype to a random matching genotype in a panel of aDNA diploid genotypes called from high-coverage aDNA (either 1240k or WGS aDNA data). We then downsampled the high-coverage aDNA panel to the target coverage, imputed genotype probabilities and copied those back to each match.
Extended Data Fig. 2. Precision and recall of ancIBD and IBIS at various length bins and coverages.
We applied both methods with their default settings to genotype data imputed after downsampling to various coverages. For each coverage, we report the average precision and recall of each length bin across 50 independent replicates. The error bars represent ± SE of the estimated precision and recall. Each row represents a length bin and each column represents one input data type (either WGS data or 1240k data). Note that the y axis ranges differ between rows.
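The quantities reported in this figure can be sketched in a few lines. The caption does not state how true and false positives are counted per length bin, nor which estimator is used for the SE, so the sketch below assumes the standard count-based definitions of precision and recall and the standard error of the mean across replicates. The function names and the example numbers are hypothetical.

```python
import math
import statistics


def precision_recall(n_true_pos, n_false_pos, n_false_neg):
    """Standard precision and recall from counts in one length bin."""
    n_called = n_true_pos + n_false_pos
    n_real = n_true_pos + n_false_neg
    if n_called == 0 or n_real == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return n_true_pos / n_called, n_true_pos / n_real


def mean_and_se(values):
    """Mean and standard error of the mean across replicates.

    Uses the sample standard deviation (n - 1 in the denominator),
    which is an assumption of this sketch.
    """
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Made-up counts for one replicate and made-up precision values
# for three replicates (the figure uses 50 replicates).
print(precision_recall(8, 2, 4))
print(mean_and_se([0.8, 0.9, 1.0]))
```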
Extended Data Fig. 3. IBD sharing matrix of various Eneolithic & Bronze Age West Eurasian Groups for four IBD length scales.
As in Fig. 4, but for shared IBD 8–12 cM, 12–16 cM, 16–20 cM and >20 cM long. We used ancIBD to infer IBD segments between all pairs of groups and visualize, for each pair of populations and each of the four IBD length bins, the fraction of pairs of individuals that share at least one IBD segment.
Extended Data Fig. 4. Downsampling of Hazleton pedigree samples.
We downsampled all individuals from a previously published English Neolithic pedigree with coverage of at least 1× to both 1× and 0.75×. For each coverage, we downsampled 10 times, each with a different random seed, to create 10 replicates. Therefore, not all dots are independent pairs of relatives; they may be the same pair downsampled with different random seeds. The relationship annotations are obtained from Supp. Table 5 of ref. . All relatives more distant than third degree are depicted as hollow dots.
Extended Data Fig. 5. Runtime Benchmarks of ancIBD.
To benchmark runtimes, we applied ancIBD to empirical ancient DNA data in .hdf5 format imputed at 1240k sites. We used the imputed hdf5 file from the Eurasian application (Fig. 3), choosing samples and pairs at random. Left: For each sample pair, all autosomes are screened for IBD. In one experiment, all pairs of samples were run independently, leading to a linear dependency on the number of pairs, as expected. In a second experiment, all samples were loaded into memory and then each sample pair was screened for IBD. The apparent sub-linear behaviour arises because the time to load n samples grows more slowly than the runtime of screening n(n − 1)/2 sample pairs. Right: We depict the runtimes normalized per sample pair when screening all pairs of sample batches of various sizes for IBD. We visualize the loading time (the time it takes to load the hdf5 genotype data into memory), the preprocessing time (including preparing the transition and emission matrices), and the runtime of screening for IBD, which includes the forward-backward algorithm and postprocessing. Because the loading time scales linearly with batch size while the number of sample pairs scales quadratically, the impact of loading decreases for larger batches, and we observe substantially reduced runtimes per pair.
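The scaling argument in this caption can be made concrete with a toy cost model. This is a sketch, not a benchmark: the function name and the two timing constants are hypothetical, and the model assumes only that loading costs a fixed time per sample and screening costs a fixed time per pair.

```python
def runtime_per_pair(n_samples, load_per_sample, screen_per_pair):
    """Average total time per sample pair when a batch is loaded once.

    Loading scales linearly with batch size, while the number of
    pairs n(n - 1)/2 scales quadratically. Both timing constants are
    made-up values in arbitrary time units.
    """
    n_pairs = n_samples * (n_samples - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two samples")
    total = n_samples * load_per_sample + n_pairs * screen_per_pair
    return total / n_pairs


for n in (2, 10, 100, 1000):
    print(n, round(runtime_per_pair(n, load_per_sample=1.0, screen_per_pair=0.5), 4))
```

Under this model the loading cost per pair is 2 × load_per_sample / (n − 1), so it shrinks as the batch grows and the per-pair time approaches the screening time alone.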

Cited by

  • Ancient Rapanui genomes reveal resilience and pre-European contact with the Americas.
    Moreno-Mayar JV, Sousa da Mota B, Higham T, Klemm S, Gorman Edmunds M, Stenderup J, Iraeta-Orbegozo M, Laborde V, Heyer E, Torres Hochstetter F, Friess M, Allentoft ME, Schroeder H, Delaneau O, Malaspinas AS. Nature. 2024 Sep;633(8029):389-397. doi: 10.1038/s41586-024-07881-4. Epub 2024 Sep 11. PMID: 39261618. Free PMC article.
  • Shared chromosomal segments connect ancient human societies.
    Bergström A. Nat Genet. 2024 Jan;56(1):10-11. doi: 10.1038/s41588-023-01606-5. PMID: 38123641. No abstract available.
  • The rise and transformation of Bronze Age pastoralists in the Caucasus.
    Ghalichi A, Reinhold S, Rohrlach AB, Kalmykov AA, Childebayeva A, Yu H, Aron F, Semerau L, Bastert-Lamprichs K, Belinskiy AB, Berezina NY, Berezin YB, Broomandkhoshbacht N, Buzhilova AP, Erlikh VR, Fehren-Schmitz L, Gambashidze I, Kantorovich AR, Kolesnichenko KB, Lordkipanidze D, Magomedov RG, Malek-Custodis K, Mariaschk D, Maslov VE, Mkrtchyan L, Nagler A, Fazeli Nashli H, Ochir M, Piotrovskiy YY, Saribekyan M, Sheremetev AG, Stöllner T, Thomalsky J, Vardanyan B, Posth C, Krause J, Warinner C, Hansen S, Haak W. Nature. 2024 Nov;635(8040):917-925. doi: 10.1038/s41586-024-08113-5. Epub 2024 Oct 30. PMID: 39478221. Free PMC article.
  • Long shared haplotypes identify the Southern Urals as a primary source for the 10th century Hungarians.
    Gyuris B, Vyazov L, Türk A, Flegontov P, Szeifert B, Langó P, Mende BG, Csáky V, Chizhevskiy AA, Gazimzyanov IR, Khokhlov AA, Kolonskikh AG, Matveeva NP, Ruslanova RR, Rykun MP, Sitdikov A, Volkova EV, Botalov SG, Bugrov DG, Grudochko IV, Komar O, Krasnoperov AA, Poshekhonova OE, Chikunova I, Sungatov F, Stashenkov DA, Zubov S, Zelenkov AS, Ringbauer H, Cheronet O, Pinhasi R, Akbari A, Rohland N, Mallick S, Reich D, Szécsényi-Nagy A. bioRxiv [Preprint]. 2024 Jul 23:2024.07.21.599526. doi: 10.1101/2024.07.21.599526. PMID: 39091721. Free PMC article. Preprint.
  • Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics.
    Childebayeva A, Zavala EI. iScience. 2023 Oct 4;26(11):108066. doi: 10.1016/j.isci.2023.108066. eCollection 2023 Nov 17. PMID: 37927550. Free PMC article. Review.

References

    1. Palamara, P. F. & Pe’er, I. Inference of historical migration rates via haplotype sharing. Bioinformatics 29, i180–i188 (2013).
    2. Ralph, P. & Coop, G. The geography of recent genetic ancestry across Europe. PLoS Biol. 11, e1001555 (2013).
    3. Ringbauer, H., Coop, G. & Barton, N. H. Inferring recent demography from isolation by distance of long shared sequence blocks. Genetics 205, 1335–1351 (2017).
    4. Gusev, A. et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 19, 318–326 (2009).
    5. Browning, B. L. & Browning, S. R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 88, 173–182 (2011).