Nucleic Acids Res. 2020 Jan 8;48(D1):D132-D141.
doi: 10.1093/nar/gkz885.

MirGeneDB 2.0: the metazoan microRNA complement

Bastian Fromm et al. Nucleic Acids Res.

Erratum in

  • MirGeneDB 2.0: the metazoan microRNA complement.
    Fromm B, Domanska D, Høye E, Ovchinnikov V, Kang W, Aparicio-Puerta E, Johansen M, Flatmark K, Mathelier A, Hovig E, Hackenberg M, Friedländer MR, Peterson KJ. Nucleic Acids Res. 2020 Jan 8;48(D1):D1172. doi: 10.1093/nar/gkz1016. PMID: 31642479. Free PMC article. No abstract available.

Abstract

Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are special because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this distinction is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database - MirGeneDB - to 45 organisms, encompassing a wide phylogenetic swath of animal evolution. By consistently annotating and naming 10,899 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked >2000 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct ancestral miRNA repertoires. MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs as well as biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.

Figures

Figure 1.
The evolution of the 1275 microRNA families across the 45 metazoan species currently annotated in MirGeneDB. Conserved gains are shown in red, species-specific gains in pink, and losses in blue. These gains and losses are mapped onto a generally accepted topology of these species, rooted between the deuterostomes and protostomes, with branch lengths corresponding to gains and losses, respectively. Note, though, that this topology is largely recovered when analyzing just the gains and losses of the miRNA families themselves, as shown by the bootstrap values indicated at the nodes (Supplementary Methods); the only known nodes not recovered are nodes within the placental mammals and Ecdysozoa, primarily due to losses in rodents and nematodes, respectively.
Figure 2.
The annotation of microRNA sequences and the implementation of transcriptional and processing information for each miRNA gene in MirGeneDB. (A) The structure and read stacks for Hsa-Mir-203. The precursor sequence is shown in bold; mature reads are shown in red and star reads in blue, with the reads per million for each major transcript detected shown to the far right. A ‘CNNC’ processing motif (68) is shown in yellow. Also shown are the 5′ and 3′ miRNA offset reads (magenta), which clearly conform to the indicated Drosha cut (staggered line, left) given the reads processed from this locus. The Dicer cut (staggered line, right) results in two primary mature forms (dark vs light red), which we term ‘variants’ (v), that are offset from one another by 1 nucleotide (gray). The 5′ end of variant 1 starts with the ‘G’, whereas the 5′ end of variant 2 is shifted 1 nucleotide 3′ and starts with the ‘U’. Each of these two major Dicer products is accompanied by the appropriate star sequence, with variant 1 shown in dark blue and variant 2 in light blue. The mature form of variant 2 (but not variant 1) is heavily mono-uridylated at its 3′ end (green circle) and is thus a ‘Group 2’ miRNA (59,66). (B) The quantification of Hsa-Mir-203 reads across various human-specific data sets. As expected (e.g. (72)), expression in skin is about 2 orders of magnitude higher than in the other organs sampled (e.g., brain, liver, stomach, lung, uterus, pancreas, testes, colorectum, small intestine and kidney), and the detection of the mature form is nearly 3 orders of magnitude higher relative to the star. Consistent with Mir-203 being a bona fide miRNA, expression is nearly abrogated in DROSHA and DICER knock-outs, and greatly diminished in the EXPORTIN-5 knock-out (59).
Figure 3.
Metazoan miRNA complements are homogeneous between closely related species. Top: miRBase community-report based complements show high heterogeneity in the numbers of families (red) and genes (blue) for closely related species. For instance, in miRBase, human and macaque differ by 1300 genes (Hsa: 1917, Mml: 617) and 1081 families (Hsa: 1543, Mml: 462). Bottom: MirGeneDB's curated complements are homogeneous for both gene and family numbers (see Supplementary Figure S3 for conserved families and genes in comparison to novel families and genes). For instance, in MirGeneDB, human and macaque differ by 55 genes (Hsa: 556, Mml: 501) and only one conserved family (Hsa: 206, Mml: 205). Asterisks mark species that are found in MirGeneDB, but not in miRBase.
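The human-macaque differences quoted in the Figure 3 caption can be checked with a few lines of arithmetic. The following is only a minimal sketch: the counts are copied from the caption, while the names `COUNTS`, `complement_differences` and `DIFFS` are hypothetical and are not part of MirGeneDB or miRBase.

```python
# Counts quoted in the Figure 3 caption, stored as (human, macaque).
# The dictionary layout and all names here are illustrative assumptions.
COUNTS = {
    ("miRBase", "genes"): (1917, 617),
    ("miRBase", "families"): (1543, 462),
    ("MirGeneDB", "genes"): (556, 501),
    ("MirGeneDB", "conserved families"): (206, 205),
}


def complement_differences(counts):
    """Return the absolute human-macaque difference for each entry."""
    return {key: abs(hsa - mml) for key, (hsa, mml) in counts.items()}


DIFFS = complement_differences(COUNTS)

if __name__ == "__main__":
    for (database, level), diff in DIFFS.items():
        print(f"{database} {level}: Hsa and Mml differ by {diff}")
```

The computed differences match the figures stated in the caption for both databases.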
Figure 4.
Improved web interface of MirGeneDB. For each species in MirGeneDB, an overview browse page lists all genes. For each gene, the following information is provided and sortable: hyperlinked names (both MirGeneDB ID and miRBase ID, linking to MirGeneDB and miRBase, respectively), family and seed assignments, and arm preference (A); genomic coordinates (B); inferred phylogenetic origin of both the gene locus and family (C); information on the presence or absence of 3′ NTUs and sequence motifs (D); and a normalized heatmap for available datasets (E).
Figure 5.
Nomenclature comparison between MirGeneDB and miRBase for representative chordate Let-7s. Shown is the accepted topology (81) for the three major subgroups of chordates and, for each taxon, an (unscaled) representation of the genomic organization of its Let-7 genes/sequences. MirGeneDB names are shown below each of the loci symbols, and the miRBase sequence names are above. The primitive condition is to possess a single Let-7 gene linked to the two Mir-10 genes (light gray box), as is still found in many bilaterian taxa. In the amphioxus Branchiostoma floridae, this single Let-7 duplicated, and this new paralogue is now positioned at the 3′ end of the cluster. In the Olfactores there is a separate gene duplication event generating another paralogue that is not linked to the original Let-7 cluster in any known urochordate, such as Ciona intestinalis, or any vertebrate, including human (H. sapiens) and the platypus (O. anatinus). Further distinguishing this paralogue is that in all Olfactores these Let-7 genes (shown in the dark gray boxes) are Group 2 miRNAs, each with an untemplated mono-uridylated 3′ end (green circles) (see (66)). False negatives (i.e. loci that are present and transcribed, and are found in MirGeneDB but not in miRBase) are shown in blue. A single false positive (i.e. a sequence present in miRBase, cin-let-7e, but without a corresponding locus in the genome) is shown in red. Note that let-7e also names two sequences derived from two non-orthologous genes in human and platypus: a canonical Group 1 Let-7 (Let-7-P1b) in human, but a Group 2 miRNA (Let-7-P2a4) in platypus. This locus is also present in diapsids (birds and ‘reptiles’), as well as in the teleost fish Danio rerio, but is lost in therian (i.e. placental and marsupial) mammals (see also (82)). Although the monophyly of these Group 2 Let-7s in Olfactores appears robust, how the ancestral cluster of the three Let-7-P2s in vertebrates is related to the five linked P2 genes in C. intestinalis remains unknown. Hence, MirGeneDB identifies these genes with this phylogenetic opacity in mind.

References

    1. Matera A.G., Terns R.M., Terns M.P. Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat. Rev. Mol. Cell Biol. 2007; 8:209–220.
    2. Hamilton A.J., Baulcombe D.C. A species of small antisense RNA in posttranscriptional gene silencing in plants. Science. 1999; 286:950–952.
    3. Lau N.C., Seto A.G., Kim J., Kuramochi-Miyagawa S., Nakano T., Bartel D.P., Kingston R.E. Characterization of the piRNA complex from rat testes. Science. 2006; 313:363–367.
    4. Lee R.C., Feinbaum R.L., Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993; 75:843–854.
    5. Lagos-Quintana M., Rauhut R., Lendeckel W., Tuschl T. Identification of novel genes coding for small expressed RNAs. Science. 2001; 294:853–858.
