Statistical features of human exons and their flanking regions
- PMID: 9536098
- DOI: 10.1093/hmg/7.5.919
Abstract
To facilitate gene finding and the investigation of human molecular genetics on a genome scale, we present a comprehensive survey of various statistical features of human exons. We first show that human exons with flanking genomic DNA sequences can be classified into 12 mutually exclusive categories. This classification could serve as a standard for future studies so that results can be compared directly. A database for eight categories (related to human genes in which coding regions are split by introns) was built from GenBank release 87.0 and analyzed by a number of methods to characterize statistical features of these sequences that may serve as controls or regulatory signals for gene expression. The statistical information compiled includes profiles of signals for transcription, splicing and translation, various compositional statistics and size distributions. Further analyses reveal novel correlations and constraints among different splicing features across an internal exon that are consistent with the Exon Definition model. This information is fundamental for a quantitative view of human gene organization, and should be invaluable to individual scientists designing human molecular genetics experiments.