Abstract
The genomes of living lungfishes can inform on the molecular-developmental basis of the Devonian sarcopterygian fish–tetrapod transition. We de novo sequenced the genomes of the African (Protopterus annectens) and South American lungfishes (Lepidosiren paradoxa). The Lepidosiren genome (about 91 Gb, roughly 30 times the human genome) is the largest animal genome sequenced so far and more than twice the size of the Australian (Neoceratodus forsteri)1 and African2 lungfishes owing to enlarged intergenic regions and introns with high repeat content (about 90%). All lungfish genomes continue to expand as some transposable elements (TEs) are still active today. In particular, Lepidosiren’s genome grew extremely fast during the past 100 million years (Myr), adding the equivalent of one human genome every 10 Myr. This massive genome expansion seems to be related to a reduction of PIWI-interacting RNAs and C2H2 zinc-finger and Krüppel-associated box (KRAB)-domain protein genes that suppress TE expansions. Although TE abundance facilitates chromosomal rearrangements, lungfish chromosomes still conservatively reflect the ur-tetrapod karyotype. Neoceratodus’ limb-like fins still resemble those of their extinct relatives and remained phenotypically static for about 100 Myr. We show that the secondary loss of limb-like appendages in the Lepidosiren–Protopterus ancestor was probably due to loss of sonic hedgehog limb-specific enhancers.
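The size comparisons in the abstract can be checked with a few lines of arithmetic. The sketch below assumes a human genome of roughly 3.1 Gb, a figure not stated in the text above, and treats the quoted growth rate as constant over the whole interval. The function names are illustrative only.

```python
# Assumption: approximate human genome size in Gb (not given in the abstract).
HUMAN_GENOME_GB = 3.1
# From the abstract: the Lepidosiren genome is about 91 Gb.
LEPIDOSIREN_GB = 91.0


def fold_of_human(genome_gb, human_gb=HUMAN_GENOME_GB):
    """Return genome size expressed as a multiple of the human genome."""
    return genome_gb / human_gb


def added_sequence_gb(interval_myr, human_equivalents_per_10_myr=1.0,
                      human_gb=HUMAN_GENOME_GB):
    """Sequence gained over an interval at a constant rate (a simplification)."""
    return interval_myr / 10.0 * human_equivalents_per_10_myr * human_gb


fold = fold_of_human(LEPIDOSIREN_GB)      # close to the "roughly 30 times" quoted
gained = added_sequence_gb(100.0)         # about ten human genomes over 100 Myr
```

Under these assumptions, about a third of the present Lepidosiren genome would have been added during the past 100 Myr.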
Data availability
Genome assemblies and sequencing data are available from NCBI Bioprojects PRJNA808321, PRJNA808322, PRJNA813994, PRJNA813995 and PRJNA981572 and at BioSamples SAMN26083907 and SAMN26533844. Gene and repeat annotations are available at Figshare (https://figshare.com/articles/dataset/Lungfish_genome_annotation/24147732)102.
Code availability
References
Meyer, A. et al. Giant lungfish genome elucidates the conquest of land by vertebrates. Nature 590, 284–289 (2021).
Wang, K. et al. African lungfish genome sheds light on the vertebrate water-to-land transition. Cell 184, 1362–1376.e1318 (2021).
Irisarri, I. et al. Phylotranscriptomic consolidation of the jawed vertebrate timetree. Nat. Ecol. Evol. 1, 1370–1378 (2017).
Krefft, J. L. G. Description of a gigantic amphibian allied to the genus Lepidosiren from the Wide-Bay district, Queensland. Proc. Zool. Soc. Lond. 1870, 221–224 (1870).
Meyer, A. & Dolven, S. I. Molecules, fossils, and the origin of tetrapods. J. Mol. Evol. 35, 102–113 (1992).
Kemp, A. The biology of the Australian lungfish, Neoceratodus forsteri (Krefft 1870). J. Morphol. 190, 181–198 (1986).
Nowoshilow, S. et al. The axolotl genome and the evolution of key tissue formation regulators. Nature 554, 50–55 (2018).
Shao, C. et al. The enormous repetitive Antarctic krill genome reveals environmental adaptations and population insights. Cell 186, 1279–1294.e1219 (2023).
Oliveira, C. et al. Chromosome formulae of neotropical freshwater fishes. Rev. Brasil. Genet. 11, 577–624 (1988).
Suzuki, A. & Yamanaka, K. Chromosomes of an African Lungfish, Protopterus annectens. Proc. Jpn Acad. B Phys. Biol. Sci. 64, 119–121 (1988).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Irisarri, I. & Meyer, A. The identification of the closest living relative(s) of tetrapods: phylogenomic lessons for resolving short ancient internodes. Syst. Biol. 65, 1057–1075 (2016).
Brownstein, C. D., Harrington, R. C. & Near, T. J. The biogeography of extant lungfishes traces the breakup of Gondwana. J. Biogeogr. 50, 1191–1198 (2023).
Simakov, O. et al. Deeply conserved synteny resolves early events in vertebrate evolution. Nat. Ecol. Evol. 4, 820–830 (2020).
Simakov, O. et al. Deeply conserved synteny and the evolution of metazoan chromosomes. Sci. Adv. 8, eabi5884 (2022).
Muffato, M. et al. Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom. Nat. Ecol. Evol. 7, 355–366 (2023).
Bourque, G. et al. Ten things you should know about transposable elements. Genome Biol. 19, 199 (2018).
Meyer, A. & Schartl, M. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol. 11, 699–704 (1999).
Thomson, K. S. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J. Exp. Zool. 180, 363–371 (1972).
Gregory, T. R. The bigger the C-value, the larger the cell: genome size and red blood cell size in vertebrates. Blood Cells Mol. Dis. 27, 830–843 (2001).
Nystedt, B. et al. The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584 (2013).
Falcon, F., Tanaka, E. M. & Rodriguez-Terrones, D. Transposon waves at the water-to-land transition. Curr. Opin. Genet. Dev. 81, 102059 (2023).
Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089–1103 (2007).
Yi, M. et al. Rapid evolution of piRNA pathway in the teleost fish: implication for an adaptation to transposon diversity. Genome Biol. Evol. 6, 1393–1407 (2014).
Wang, J. et al. Transposable element and host silencing activity in gigantic genomes. Front. Cell Dev. Biol. 11, 1124374 (2023).
Song, J. et al. Variation in piRNA and transposable element content in strains of Drosophila melanogaster. Genome Biol. Evol. 6, 2786–2798 (2014).
Aravin, A. A. et al. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Mol. Cell 31, 785–799 (2008).
Wang, W. et al. The initial uridine of primary piRNAs does not create the tenth adenine that is the hallmark of secondary piRNAs. Mol. Cell 56, 708–716 (2014).
Pasquesi, G. I. M. et al. Vertebrate lineages exhibit diverse patterns of transposable element regulation and expression across tissues. Genome Biol. Evol. 12, 506–521 (2020).
Kofler, R. piRNA clusters need a minimum size to control transposable element invasions. Genome Biol. Evol. 12, 736–749 (2020).
Liu, X. et al. Transposable element expansion and low-level piRNA silencing in grasshoppers may cause genome gigantism. BMC Biol. 20, 243 (2022).
Yang, P., Wang, Y. & Macfarlan, T. S. The role of KRAB-ZFPs in transposable element repression and mammalian evolution. Trends Genet. 33, 871–881 (2017).
Imbeault, M., Helleboid, P.-Y. & Trono, D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543, 550–554 (2017).
Kaessmann, H., Vinckenbosch, N. & Long, M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat. Rev. Genet. 10, 19–31 (2009).
Carelli, F. N. et al. The life history of retrocopies illuminates the evolution of new mammalian genes. Genome Res. 26, 301–314 (2016).
Chen, M. et al. Evolutionary patterns of RNA-based duplication in non-mammalian chordates. PLoS ONE 6, e21466 (2011).
Okabe, M. & Graham, A. The origin of the parathyroid gland. Proc. Natl Acad. Sci. USA 101, 17716–17719 (2004).
Li, C. et al. Genome sequences reveal global dispersal routes and suggest convergent genetic adaptations in seahorse evolution. Nat. Commun. 12, 1094 (2021).
Kerr, T. The scales of modern lungfish. Proc. Zool. Soc. Lond. 125, 335–345 (1955).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Di-Poï, N., Montoya-Burgos, J. I. & Duboule, D. Atypical relaxation of structural constraints in Hox gene clusters of the green anole lizard. Genome Res. 19, 602–610 (2009).
Feiner, N. Accumulation of transposable elements in Hox gene clusters during adaptive radiation of Anolis lizards. Proc. Biol. Sci. 283, 20161555 (2016).
Woltering, J. M., Noordermeer, D., Leleu, M. & Duboule, D. Conservation and divergence of regulatory strategies at Hox loci and the origin of tetrapod digits. PLoS Biol. 12, e1001773 (2014).
Berlivet, S. et al. Clustering of tissue-specific sub-TADs accompanies the regulation of HoxA genes in developing limbs. PLoS Genet. 9, e1004018 (2013).
Kemp, A., Cavin, L. & Guinot, G. Evolutionary history of lungfishes with a new phylogeny of post-Devonian genera. Palaeogeogr. Palaeoclimatol. Palaeoecol. 471, 209–219 (2017).
Díaz-González, F. et al. Biallelic cGMP-dependent type II protein kinase gene (PRKG2) variants cause a novel acromesomelic dysplasia. J. Med. Genet. 59, 28–38 (2022).
Lewandowski, J. P. et al. Spatiotemporal regulation of GLI target genes in the mammalian limb bud. Dev. Biol. 406, 92–103 (2015).
Breslow, D. K. et al. A CRISPR-based screen for Hedgehog signaling provides insights into ciliary function and ciliopathies. Nat. Genet. 50, 460–471 (2018).
Yang, L. et al. Enlarged fins of Tibetan catfish provide new evidence of adaptation to high plateau. Sci. China Life Sci. 66, 1554–1568 (2023).
Letelier, J. et al. The Shh/Gli3 gene regulatory network precedes the origin of paired fins and reveals the deep homology between distal fins and digits. Proc. Natl Acad. Sci. USA 118, e2100575118 (2021).
Woltering, J. M. et al. Sarcopterygian fin ontogeny elucidates the origin of hands with digits. Sci. Adv. 6, eabc3510 (2020).
Kvon, E. Z. et al. Comprehensive in vivo interrogation reveals phenotypic impact of human enhancer variants. Cell 180, 1262–1271.e1215 (2020).
Roscito, J. G. et al. Convergent and lineage-specific genomic differences in limb regulatory elements in limbless reptile lineages. Cell Rep. 38, 110280 (2022).
Ovchinnikov, V. et al. Caecilian genomes reveal the molecular basis of adaptation and convergent evolution of limblessness in snakes and caecilians. Mol. Biol. Evol. 40, msad102 (2023).
Lopez-Rios, J. The many lives of SHH in limb development and evolution. Semin. Cell Dev. Biol. 49, 116–124 (2016).
Farrell, E. R. & Münsterberg, A. E. csal1 is controlled by a combination of FGF and Wnt signals in developing limb buds. Dev. Biol. 225, 447–458 (2000).
Carneiro, J. et al. Evidence of cryptic speciation in South American lungfish. J. Zool. Syst. Evol. Res. 59, 760–771 (2021).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021); https://pubmed.ncbi.nlm.nih.gov/33436076/.
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020); https://pubmed.ncbi.nlm.nih.gov/32300014/.
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999); https://pubmed.ncbi.nlm.nih.gov/9862982/.
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Chalopin, D., Naville, M., Plard, F., Galiana, D. & Volff, J.-N. Comparative analysis of transposable elements highlights mobilome diversity and evolution in vertebrates. Genome Biol. Evol. 7, 567–580 (2015).
Conte, M. A. et al. Chromosome-scale assemblies reveal the structural evolution of African cichlid genomes. Gigascience 8, giz030 (2019).
Brawand, D. et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature 513, 375–381 (2014).
Kong, Y. et al. Transposable element expression in tumors is associated with immune infiltration and increased antigenicity. Nat. Commun. 10, 5228 (2019).
Yang, W. R., Ardeljan, D., Pacyna, C. N., Payer, L. M. & Burns, K. H. SQuIRE reveals locus-specific regulation of interspersed repeat expression. Nucleic Acids Res. 47, e27 (2019).
Peona, V. et al. The avian W chromosome is a refugium for endogenous retroviruses with likely effects on female-biased mutational load and genetic incompatibilities. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376, 20200186 (2021).
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2008).
Steinbiss, S., Willhoeft, U., Gremme, G. & Kurtz, S. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res. 37, 7002–7013 (2009).
Llorens, C. et al. The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 39, D70–D74 (2011).
Groza, C., Chen, X., Wheeler, T. J., Bourque, G. & Goubert, C. GraffiTE: a unified framework to analyze transposable element insertion polymorphisms using genome-graphs. Preprint at bioRxiv https://doi.org/10.1101/2023.09.11.557209 (2023).
She, R., Chu, J. S., Wang, K., Pei, J. & Chen, N. GenBlastA: enabling BLAST to identify homologous gene sequences. Genome Res. 19, 143–149 (2009).
Pearson, W. R. Finding protein and nucleotide similarities with FASTA. Curr. Protoc. Bioinform. 53, 3.9.1–3.9.25 (2016).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
Sellitto, A. et al. Molecular and functional characterization of the somatic PIWIL1/piRNA pathway in colorectal cancer cells. Cells 8, 1390 (2019).
Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
Rosenkranz, D. & Zischler, H. proTRAC-a software for probabilistic piRNA cluster detection, visualization and analysis. BMC Bioinform. 13, 5 (2012).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).
Delsuc, F., Brinkmann, H., Chourrout, D. & Philippe, H. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439, 965–968 (2006).
Revell, L. J. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).
Thomson, K. S. & Muraszko, K. Estimation of cell size and DNA content in fossil fishes and amphibians. J. Exp. Zool. 205, 315–320 (1978).
Huang, Z. et al. Three amphioxus reference genomes reveal gene and chromosome evolution of chordates. Proc. Natl Acad. Sci. USA 120, e2201504120 (2023).
Kautt, A. F. et al. Contrasting signatures of genomic divergence during sympatric speciation. Nature 588, 106–111 (2020).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Deng, W., Nickle, D. C., Learn, G. H., Maust, B. & Mullins, J. I. ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user’s datasets. Bioinformatics 23, 2334–2336 (2007).
Montavon, T. et al. A regulatory archipelago controls Hox genes transcription in digits. Cell 147, 1132–1145 (2011).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Taylor, W. & Van Dyke, G. Revised procedures for staining and clearing small fishes and other vertebrates for bone and cartilage study. Cybium 9, 107–119 (1985).
Kvon, E. Z. et al. Progressive loss of function in a limb enhancer during snake evolution. Cell 167, 633–642.e611 (2016).
Osterwalder, M. et al. in Craniofacial Development Vol. 2403 (ed. Dworkin, S.) 147−186 (Humana, 2022).
Du, K. Lungfish genome annotation. figshare https://doi.org/10.6084/m9.figshare.24147732.v1 (2024).
Acknowledgements
This work was supported by the German Research Foundation (DFG) through a grant to A.M., T. Burmester and M.S. (Me1725/24-1, Bu956/23-1, Scha408/16-1). O.S. was supported by the European Research Council’s Horizon 2020: European Union Research and Innovation Programme, grant no. 945026. Next-generation sequencing data production and data analysis were carried out at the DRESDEN-concept Genome Center, supported by the DFG Research Infrastructure Programme (project 407482635) and part of the Next Generation Sequencing Competence Network (project 423957469).
Author information
Authors and Affiliations
Contributions
A.M. and M.S. conceived the study and coordinated the work and, together with T. Burmester, secured the funding. Additional funding was provided by E. Myers. A.M. and M.S. wrote the manuscript with contributions from all other authors. S.W., M.P. and T. Brown performed high molecular weight DNA extraction, sequencing and genome assembly into contigs and Hi-C scaffolding. E. Myers supervised Hi-C and genomic sequencing, genome assembly and analysed data. P.F. undertook transcriptome analysis and annotation. K.D. performed the genome annotation and retrogene analysis. J.M.W. analysed and annotated hox clusters and performed gene loss analysis. I.S., L.O., E. Monteiro, D.B.A. and J.F.S. performed and analysed the lungfish treatment experiments. Z.C., S.J. and E.Z.K. analysed the L. paradoxa enhancer in mice. I.I. generated phylogenetic analyses and molecular clock and ancestral character state reconstructions. M.A. prepared the piRNAs for sequencing. S.K. performed positive selection analysis and analysed the piRNA landscapes. J.L., D.C. and A.S. performed transposon and repeat analyses. O.S. and M.L. performed synteny analyses.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Phylogenomics of lungfish.
a, Loci selection for phylogenomics. Graphs show different properties (root-to-tip variance, level of saturation, average patristic distance, compositional heterogeneity, proportion of variable sites, average bootstrap support, Robinson-Foulds similarity) for the 8,339 loci as inferred by genesortR. The graph of gene-wise log-likelihood differences shows support of each locus for two relevant alternative hypotheses (see Supplementary Information 2). b, Bayesian phylogram showing the evolutionary relationships and relative rates of the three lungfish genomes within the context of vertebrate phylogeny. The phylogeny was reconstructed as the consensus of 100 Markov chains (MCMC) from 100 independent gene jackknife replicates analyzed by PhyloBayes-MPI under the CAT mixture model (indicated with numbers on the internal edges, 1 = 100 replicates). The scale bar is the expected amino acid replacements per site. c, Bayesian time-calibrated phylogeny inferred from the set of 8,323 orthologs. Posterior probability distributions of estimated ages of common ancestors are plotted on tree nodes. X axis is in million years and major geological periods are indicated (O. Ordovician, S. Silurian, De. Devonian, Ca. Carboniferous, P. Permian, Tr. Triassic, Ju. Jurassic, Cr. Cretaceous, P. Paleogene, N. Neogene).
Extended Data Fig. 2 High retention of ancestral linkage groups in lungfish genomes.
a-d, Species-to-species dotplots showing high degree of retained collinearity in the African and South American lungfish genomes, despite their genome size. b-d, Oxford dotplots representing orthologous genes shared on the previously reported ancestral linkage groups (ALGs)15. Chromosome numbering corresponds to the homologous lungfish linkage groups which have independently fused in individual lineages. Neoceratodus with its 27 chromosomes represented the most ancestral (unfused) state. e, Retention rates of lungfish chromosomes. Often only one alpha copy is present in lungfishes, e.g. descendants of several chromosomal elements have two alpha chromosomes in gar and Australian lungfish but only one clear alpha chromosome remains in South American and African lungfish (with the alpha copies having lost genes). Retention rates were computed as the percentage of the retained (present) ohnologs of gene families that comprise a given ancestral linkage group. Total number of gene families per chromosome was counted and their position was not taken into account. Only chromosomes with at least 5% ancestral linkage group retention were counted. Lower plots show retention on individual chromosomes (represented by dots) grouped by their ancestral linkage group in different lungfishes and gar.
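The retention rate described in panel e can be sketched as follows. This is a minimal illustration, assuming each chromosome is represented simply by the set of gene-family identifiers found on it; the function names and the toy identifiers are hypothetical, and the authors' actual pipeline may differ.

```python
def retention_rate(present_families, alg_families):
    """Percentage of an ancestral linkage group's gene families found on a chromosome.

    Gene position is ignored; only presence of a family is counted.
    """
    alg = set(alg_families)
    if not alg:
        raise ValueError("ancestral linkage group has no gene families")
    return 100.0 * len(alg & set(present_families)) / len(alg)


def retained_chromosomes(chrom_to_families, alg_families, min_percent=5.0):
    """Keep only chromosomes with at least `min_percent` retention for this group."""
    rates = {
        chrom: retention_rate(families, alg_families)
        for chrom, families in chrom_to_families.items()
    }
    return {chrom: rate for chrom, rate in rates.items() if rate >= min_percent}


# Hypothetical toy input: 40 gene families in one ancestral linkage group.
toy_alg = ["fam%d" % i for i in range(40)]
toy_chromosomes = {
    "chrA": toy_alg[:30],      # 30 of 40 families present
    "chrB": toy_alg[30:31],    # 1 of 40 families present, below the 5% cut-off
    "chrC": ["unrelated"],     # no families from this group
}
toy_result = retained_chromosomes(toy_chromosomes, toy_alg)
```

In the toy input only `chrA` passes the 5% threshold stated in the legend.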
Extended Data Fig. 3 Genomic composition of repetitive elements.
a, Overall composition of repetitive elements from unmasked assemblies (two rounds of transposable element annotation) for the three lungfish (Lpa=Lepidosiren paradoxa, Pan=Protopterus annectens, Nfo=Neoceratodus forsteri), axolotl (Ame=Ambystoma mexicanum), and coelacanth (Lch=Latimeria chalumnae). The total TE coverage for each species is shown under each pie chart. RC, rolling-circle transposon; SINE, short interspersed element; LINE, long interspersed element; LTR, long terminal repeat; DNA, cut-and-paste DNA transposons. Total repeat coverage of other species analyzed in this study: Xenopus ~25%; Platyfish ~23%; Burtoni and Midas cichlids ~30%; and Pufferfish ~8%. b, Different repeat superfamilies expanded in lungfish genomes. Heatmap shows the repeat superfamily content of coelacanth (Lch=Latimeria chalumnae), axolotl (Ame=Ambystoma mexicanum) and three lungfish (Lpa=Lepidosiren paradoxa, Pan=Protopterus annectens, Nfo=Neoceratodus forsteri). The color is scaled to the genomic content across repeat superfamilies.
Extended Data Fig. 4 Expression of transposable element families.
a, b, Expression estimated for each transposable element family from poly(A)-enriched RNA-seq data. In all tissues, SINEs are more highly expressed than any other subclass in the African lungfish, while both LINEs and SINEs are slightly more expressed than any other subclass in the South American lungfish. n = 2029 (African lungfish) and 1897 (South American lungfish) transposable element families. Wilcoxon Signed Ranks Test (one-sided) was applied with * indicating p-value < 0.05, ** p-value < 0.005, *** p-value < 0.0005 and **** p-value < 0.00005. The box bounds the interquartile range divided by the median value, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. c, d, Higher expression of young transposable element families. When transposable element families are divided into young or old copies based on Kimura 2-parameter distance to consensus values (0–10% is young, >10% is old), young TEs are expressed at significantly higher levels than old ones, suggesting that several types of TEs remain active and contribute to the ongoing expansion of the lungfish genomes. Out of the 13 SINE families of Protopterus annectens, only copies from the SINE/t-RNA-V-RTE are considered as young. e, f, Correlation between expression of transposable element families and copy number. Expression was estimated for each transposable element family using poly(A)-enriched RNA-seq data. For all tissues and transposable element classes, a positive correlation is observed between expression level and copy number. When a transposable element family is highly expressed, this family tends to have more copies. All analyzed correlations are significantly positive (p-values < 0.001). A linear model estimated trend line and calculated 95% confidence interval around the trend (gray fill) are plotted (two-sided). Lpa, Lepidosiren paradoxa; Pan, Protopterus annectens.
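The young/old split used in panels c and d can be illustrated with a short sketch. It assumes each family is summarised by a single Kimura divergence value (in percent) and a single expression value; whether the authors classify per copy or per family is not specified here, and the toy numbers are hypothetical rather than taken from the paper.

```python
from statistics import median


def age_class(kimura_percent, threshold=10.0):
    """Return 'young' for 0-10% divergence from the consensus, 'old' above it."""
    if kimura_percent < 0:
        raise ValueError("divergence cannot be negative")
    return "young" if kimura_percent <= threshold else "old"


def median_expression_by_age(families):
    """Group (kimura_percent, expression) pairs by age class and take medians."""
    groups = {"young": [], "old": []}
    for kimura_percent, expression in families:
        groups[age_class(kimura_percent)].append(expression)
    return {age: median(values) for age, values in groups.items() if values}


# Hypothetical toy values, not taken from the paper.
toy_families = [(2.5, 40.0), (9.0, 30.0), (10.0, 20.0), (12.0, 5.0), (25.0, 1.0)]
toy_medians = median_expression_by_age(toy_families)
```

A statistical comparison of the two groups, such as the one-sided test named in the legend, would follow this grouping step and is not shown.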
Extended Data Fig. 5 Age estimation and comparison of full-length TEs across lungfish genomes.
a, Landscape of subclasses of transposable elements. Kimura substitution level (%) for each copy against its consensus sequence used as proxy for expansion history of the transposable elements. Older copies accumulated more nucleotide substitutions and show higher distance to the consensus sequences. The phylogeny depicts the estimation of divergence times among the five studied species. RC, rolling-circle transposon; SINE, short interspersed element; LINE, long interspersed element; LTR, long terminal repeat. b, Copy numbers of full-length TEs within orders. c, Copy numbers of full-length TEs within superfamilies, color scaled to copy number. d, Percentage of transcribed TEs. e, Example of synteny to show one full-length copy from LINE/CR1 exclusively present in our Protopterus genome and absent in the other individual’s genome. f, Comparison of expression between full-length and fragmented TEs. n = 122,832,031 (South American lungfish), 66,736,976 (African lungfish) and 58,296,831 transposable elements. Wilcoxon Signed Ranks Test (one-sided) was applied with **** indicating p-value < 0.00005. The box bounds the interquartile range divided by the median value, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box and the middle dots indicate mean values. Lpa=Lepidosiren paradoxa, South American lungfish; Pan=Protopterus annectens, African lungfish; Nfo=Neoceratodus forsteri, Australian lungfish.
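The Kimura substitution level in panel a rests on the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between a copy and its consensus. The sketch below implements this standard formula for two pre-aligned sequences; it is an illustration only, and the annotation software used by the authors may apply additional corrections not reproduced here. The function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a percentage comparable to the substitution levels plotted in such landscapes.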
Extended Data Fig. 6 Size distribution and correlation between piRNA content and genome size.
a, Size distribution of clean reads of unoxidized small RNA libraries of the same individuals as used for the piRNA analysis, with the position of the peaks for miRNA and piRNA marked with dotted lines. In contrast to the oxidized samples, African and South American lungfish have a clear peak at the expected size range of miRNAs (~24 nts), but unlike the other species no second distinct peak at the expected size range of piRNAs. b, Spearman rank correlation between genome size (log scale) and piRNA content (%RNA of clean tag) from the oxidized testis small RNAs (silhouettes as in a).
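A Spearman rank correlation of the kind reported in panel b can be sketched in plain Python as the Pearson correlation of average ranks. The data points below are hypothetical toy values, not measurements from the paper, and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: genome size in Gb and piRNA share of clean reads in %.
toy_genome_gb = [0.4, 1.0, 3.0, 30.0, 90.0]
toy_pirna_percent = [60.0, 50.0, 40.0, 10.0, 5.0]
rho_log = spearman([math.log10(g) for g in toy_genome_gb], toy_pirna_percent)
rho_raw = spearman(toy_genome_gb, toy_pirna_percent)
```

Because the statistic depends only on ranks, the log scale affects how the plot looks but not the coefficient itself, which is why `rho_log` and `rho_raw` agree.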
Extended Data Fig. 7 Signature nucleotides of piRNAs, piRNA cluster structure and KZFP genes.
a, Proportion of nucleotides of the small RNA reads at the first position (left) and the tenth position (right) of the three lungfish, amphibian and fish samples. b, Graphical proTRAC output of a representative piRNA cluster for the pufferfish (left panel) and the South American lungfish (right panel). The top part visualizes the number of genomic hits produced by the query piRNA sequence. Dark green indicates that there is only one sequence hit in the genome; dark red indicates more than 1000 hits. Below is the sequence read coverage plot (blue: reads on the plus strand, red: reads on the minus strand). The RepeatMasker bar shows TEs annotated by RepeatMasker in this region. Lungfish clusters tend to have lower diversity and a higher read count. c, C2H2 zinc-finger and KRAB domain protein (KZFP) gene counts and genomic organization in sarcopterygians. Left, number of KZFP genes in indicated genomes. Right, gene length of KZFP genes in indicated species. n = 1168 KZFPs. Wilcoxon Signed Ranks Test (one-sided) was applied with **** indicating p-value < 0.00005. The box bounds the interquartile range divided by the median value, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. Lpa=Lepidosiren paradoxa; Pan=Protopterus annectens; Nfo=Neoceratodus forsteri; Lch=Latimeria chalumnae; Hsa=Homo sapiens; Gga=Gallus gallus.
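The nucleotide proportions in panel a can be computed with a simple tally over reads. The sketch below is a minimal illustration with hypothetical toy reads; it assumes reads may be written in the DNA alphabet, so T is counted as U, and the function name is not from the paper.

```python
from collections import Counter


def position_composition(reads, position):
    """Fraction of A, C, G and U at a 1-based position across small RNA reads."""
    counts = Counter()
    for read in reads:
        if len(read) < position:
            continue  # read too short to have this position
        base = read[position - 1].upper().replace("T", "U")
        if base in "ACGU":
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no read covers the requested position")
    return {base: counts[base] / total for base in "ACGU"}


# Hypothetical toy reads, not taken from the sequencing data.
toy_reads = ["UGCAUGCAUAGG", "UAAAAAAAAACC", "TCCCCCCCCGAA", "GAC"]
first_position = position_composition(toy_reads, 1)
tenth_position = position_composition(toy_reads, 10)
```

In real piRNA libraries, an excess of U at position 1 and of A at position 10 is the signature such plots are meant to reveal.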
Extended Data Fig. 8 Positively selected genes and gene losses.
a, Positively selected genes in all three lungfishes related to lungfish biology. b, Numerous gene losses in Lepidosiren paradoxa and Protopterus annectens indicate a cellular milieu that is permissive of transposon spreading due to a reduction in the DNA damage response and apoptosis. Due to low piRNA levels (through an as yet unidentified mechanism), high activity of transposable elements is present in the germline, resulting in frequent insertions and high levels of genotoxic stress due to double-stranded DNA breaks. These breaks tend to result in G1 arrest and apoptosis as part of the DNA damage response, which provides a mechanism for somatic selection against compromised cells. These gene losses are expected to reduce the levels of such selection and create a permissive environment for DNA transposition, and help explain the rapid expansion of the lungfishes’ genomes. c, The synteny block spanning RASGEF1B to ANTXR2 is widely preserved across vertebrates. The region containing RASGEF1B to PRDM8 has been deleted in Lepidosiren paradoxa and Protopterus annectens. The ciliary CFAP299 gene is still present in both species as an intronless retrogene. Loss of BMP3 can be linked to the reduced squamation of the derived Lepidosirenidae, while loss of PRKG2 and RASGEF1B can be linked to their derived fins. In the ray-finned fish Astatotilapia burtoni, BMP3 is strongly expressed in the developing scales at 12 dpf. d, TTC23 is a component of the primary cilia and involved in the cellular perception of the shh signal transduction pathway. TTC23 is located in a highly conserved gene block which is also preserved in Lepidosiren paradoxa and Protopterus annectens, however without an identifiable TTC23 gene present. This “ghost locus” was further analyzed using Lagan Vista. Paired Lagan using the translated anchoring option and the Coelacanth sequence as baseline identifies the TTC23 exons in human, spotted gar and Neoceratodus forsteri, but not in Lepidosiren paradoxa and Protopterus annectens.
Extended Data Fig. 9 Expanded hox clusters preserve regulatory landscape architecture.
a, The lungfish Hox clusters have expanded dramatically: the Lepidosiren paradoxa clusters are approximately 20-fold larger than those of mouse, although this is lower than the proportional difference in genome size. Consistent with this observation, all four clusters preserve a conserved core subcluster (indicated in red) that has expanded relatively little and is low in repeat content. These regions are hoxa4-a11, hoxb2-b9, hoxc4-c11 and hoxd8-d11, indicating topological constraints on the expansion of these regions. In addition, hoxa3 and hoxd3 (purple) show expansion of their intronic region, which is similar to the expansion of the hoxa3 intron in the expanded axolotl Hoxa cluster7. An interesting difference is that the hoxa11-hoxa13 intergenic region shows a tendency for expansion in lungfishes but not in axolotl, potentially related to additional constraints induced by the fin to limb transition. Furthermore, signatures of repeat insertion in the anterior Hoxc and posterior Hoxb clusters mirror those observed in Anolis lizards41. b, HiC analysis for Midas cichlid, human and Protopterus annectens Hoxa and Hoxd clusters. Despite the approximately 70-fold size difference between these species, there is a remarkable conservation of the flanking regulatory landscapes, whereby both clusters are present at the intersection of a 3’ and 5’ TAD. Known fin and limb enhancers (blue ovals) are conserved in an expected fashion (open ovals for Lepidosirenidae mm406 and e10 indicate secondary loss), altogether suggesting that long range regulatory landscapes remain preserved under conditions of genome expansion. Synteny regions shown encompass the following sizes: Hoxa: Pan 3.2 Mb, Hsa 3.1 Mb, Aci 0.31 Mb; Hoxd: Pan 28 Mb, Hsa 2.8 Mb, Aci 0.41 Mb. Species name abbreviations are the same as in the other figures.
Extended Data Fig. 10 Functional analysis of lungfish ZRS and SAG treatment of Lepidosiren paradoxa regenerating fins.
a, Mouse transgenesis and LacZ staining for the Neoceratodus forsteri and Lepidosiren paradoxa ZRS sequences. Genotyping indicates whether insertion was either in a single or double copy at the targeted locus, or randomly integrated in the genome. Neoceratodus forsteri ZRS gives ZPA staining in 16/16 embryos, whereas the Lepidosiren paradoxa ZRS does not give staining in 15/15 embryos. b, Regeneration of pectoral fins in presence of the shh agonist SAG does not result in radial growth in Lepidosiren paradoxa (n = 3 for SAG treated animals, n = 3 for DMSO-treated animals; representative images of one animal per treatment are shown).
Supplementary information
Supplementary Information
Supplementary Information sections 1–6, including Tables 9–15, Fig. 1, appendix and references.
Supplementary Table 1
Statistics for assemblies and genome annotations of lungfishes.
Supplementary Table 2
Naming of lungfish chromosomes according to ancestral lungfish units.
Supplementary Table 3
piRNA sequencing statistics.
Supplementary Table 4
proTRAC result of TE mapping piRNAs.
Supplementary Table 5
a, Presence of piRNA machinery genes in genomes of lungfish and other vertebrates. b, Expression of piRNA machinery genes in lungfish.
Supplementary Table 6
Retrocopy genes in the genomes of the South American and African lungfish and the coelacanth.
Supplementary Table 7
a, Positively selected genes site class 3. b, Positively selected genes site class 4.
Supplementary Table 8
Assemblies used for comparative analyses for positively selected genes, piRNA landscape and repeat content.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Schartl, M., Woltering, J.M., Irisarri, I. et al. The genomes of all lungfish inform on genome expansion and tetrapod evolution. Nature 634, 96–103 (2024). https://doi.org/10.1038/s41586-024-07830-1
DOI: https://doi.org/10.1038/s41586-024-07830-1