Nucleic Acids Res. 2001 Mar 1;29(5):1216-21.
doi: 10.1093/nar/29.5.1216.

Prediction of operons in microbial genomes


M D Ermolaeva et al., Nucleic Acids Res. 2001.

Abstract

Operon structure is an important organizational feature of bacterial genomes. Many sets of genes occur in the same order in multiple genomes; these conserved gene groupings represent candidate operons. This study describes a computational method to estimate the likelihood that such conserved gene sets form operons. The method was used to analyze 34 bacterial and archaeal genomes and yielded more than 7600 pairs of genes that are highly likely (P ≥ 0.98) to belong to the same operon. The sensitivity of our method is 30-50% for the Escherichia coli genome. The predicted gene pairs are available from our World Wide Web site http://www.tigr.org/tigr-scripts/operons/operons.cgi.
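
The P ≥ 0.98 cutoff and the sensitivity figure can be illustrated with a minimal Python sketch. It assumes that each candidate gene pair already has an estimated probability of belonging to the same operon and that a reference set of known same-operon pairs is available; sensitivity is taken here as the fraction of known pairs that are recovered. The gene names, probabilities, and the helper names `predict_pairs` and `sensitivity` are hypothetical, not the authors' code.

```python
def predict_pairs(pair_probs, cutoff=0.98):
    """Return gene pairs whose estimated probability of sharing an operon
    is at or above the cutoff (0.98 in the abstract)."""
    return [pair for pair, p in pair_probs.items() if p >= cutoff]


def sensitivity(predicted_pairs, known_operon_pairs):
    """Fraction of known same-operon pairs that appear among the predictions.

    Pairs are compared as unordered sets, so ("g1", "g2") matches ("g2", "g1").
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    known = {frozenset(p) for p in known_operon_pairs}
    if not known:
        raise ValueError("need at least one known operon pair")
    return len(predicted & known) / len(known)


# Hypothetical scores and reference set, for illustration only.
pair_probs = {("g1", "g2"): 0.99, ("g2", "g3"): 0.95, ("g4", "g5"): 0.985}
known = [("g1", "g2"), ("g2", "g3"), ("g6", "g7")]

hits = predict_pairs(pair_probs)
print(hits)                      # [('g1', 'g2'), ('g4', 'g5')]
print(sensitivity(hits, known))  # 0.3333333333333333
```

In this toy example, ("g2", "g3") falls just below the cutoff and ("g6", "g7") has no score at all, so both count as misses.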


Figures

Figure 1
(A) A conserved S gene pair. (B) A gene pair whose genes are more similar to each other than to the gene pair in the other genome is not considered conserved.
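
The rule in panel (B) can be written as a small predicate. This is a minimal sketch of one reading of it, assuming a symmetric pairwise similarity score (for example, a BLAST bit score) and a homology threshold `min_score`. The scores and threshold used in the paper are not given here. A pair is treated as conserved only if both cross-genome matches pass the threshold and neither genome's within-pair similarity exceeds the weaker cross-genome match. The likely purpose is to keep pairs of similar genes, such as tandem duplicates, from being matched ambiguously. The names `make_sim` and `is_conserved` are hypothetical.

```python
def make_sim(scores):
    """Build a symmetric similarity lookup from {frozenset({x, y}): score}.
    Gene pairs that are not listed get a score of 0."""
    def sim(x, y):
        return scores.get(frozenset((x, y)), 0.0)
    return sim


def is_conserved(sim, pair_a, pair_b, min_score):
    """pair_a = (a1, a2) in genome A, pair_b = (b1, b2) in genome B,
    with a1 matched to b1 and a2 matched to b2 (hypothetical interface)."""
    a1, a2 = pair_a
    b1, b2 = pair_b
    cross = min(sim(a1, b1), sim(a2, b2))  # weaker of the two cross-genome matches
    if cross < min_score:
        return False  # at least one gene lacks a homolog in the other pair
    within = max(sim(a1, a2), sim(b1, b2))  # similarity inside either pair
    # Panel (B): reject if the genes of a pair resemble each other more than
    # they resemble the other genome's pair.
    return within <= cross


scores = {
    frozenset(("a1", "b1")): 80.0,
    frozenset(("a2", "b2")): 70.0,
    frozenset(("a1", "a2")): 10.0,
}
print(is_conserved(make_sim(scores), ("a1", "a2"), ("b1", "b2"), 50.0))  # True

# Same pair, but a1 and a2 are now near-identical paralogs.
scores[frozenset(("a1", "a2"))] = 90.0
print(is_conserved(make_sim(scores), ("a1", "a2"), ("b1", "b2"), 50.0))  # False
```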
Figure 2
Conserved D gene pairs.
Figure 3
Dependence of k on P(conserved) (gray line) and its piecewise-constant approximation (black line). The approximation was obtained by dividing the range of P(conserved) into intervals of length 0.01 and averaging k within each interval.
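
The smoothing described in this caption amounts to a binned average. The sketch below assumes the (P(conserved), k) values are available as two parallel lists with P(conserved) in [0, 1]. It uses 100 equal-width bins to give the 0.01 intervals and averages k within each bin. The function names are hypothetical. The caption does not say how empty intervals are handled, so this sketch leaves them as None.

```python
def binned_average(p_values, k_values, n_bins=100):
    """Average k within each of n_bins equal-width intervals of P on [0, 1].
    n_bins=100 gives intervals of length 0.01. Empty bins are None."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, k in zip(p_values, k_values):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P must lie in [0, 1], got {p}")
        i = min(int(p * n_bins), n_bins - 1)  # P == 1.0 goes in the last bin
        sums[i] += k
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def approx_k(p, bin_means, n_bins=100):
    """Piecewise-constant approximation: the mean k of the bin containing p."""
    return bin_means[min(int(p * n_bins), n_bins - 1)]


# Hypothetical (P, k) observations.
means = binned_average([0.981, 0.985, 0.992, 0.305], [1.0, 3.0, 5.0, 2.0])
print(approx_k(0.987, means))  # 2.0, the mean of k over [0.98, 0.99)
```

Values that lie exactly on an interval boundary can land in the neighboring bin because of floating-point rounding in `p * n_bins`. For a smoothing step like this, that is usually harmless.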
Figure 4
Dependence of specificity on the P cutoff.
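
A curve like this can be traced by sweeping the cutoff over labeled gene pairs. The sketch below assumes the conventional definition specificity = TN / (TN + FP), where negatives are pairs known not to share an operon. This caption does not state the paper's exact definition or reference set. The function name and data are hypothetical.

```python
def specificity_curve(scored_pairs, cutoffs):
    """scored_pairs: (P, same_operon) tuples with labels from a reference set.
    Returns {cutoff: specificity}; a pair is predicted positive if P >= cutoff."""
    negatives = [p for p, same_operon in scored_pairs if not same_operon]
    if not negatives:
        raise ValueError("need at least one known non-operon pair")
    curve = {}
    for c in cutoffs:
        false_pos = sum(1 for p in negatives if p >= c)
        true_neg = len(negatives) - false_pos
        curve[c] = true_neg / len(negatives)
    return curve


# Hypothetical labeled pairs: (P, True if the pair lies in one known operon).
scored = [(0.99, True), (0.97, False), (0.985, False), (0.5, False), (0.9, True)]
for cutoff, spec in specificity_curve(scored, [0.95, 0.98, 0.99]).items():
    print(cutoff, round(spec, 3))
# 0.95 0.333
# 0.98 0.667
# 0.99 1.0
```

Raising the cutoff can only remove false positives, so under this definition specificity never decreases as the cutoff increases.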
Figure 5
Normalized distribution of N for conserved SO gene pairs in E. coli. N is the number of genomes with gene pairs homologous to the given E. coli gene pair. Solid line, all conserved pairs with P ≥ 0.98; dashed line, false positives.
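
The plotted quantity can be computed in two steps. First, count the genomes that contain a homologous conserved pair for each E. coli pair. Then normalize the histogram of those counts so that it sums to 1. The sketch below assumes the per-genome conserved pairs have already been mapped onto E. coli gene identifiers. The genome names, gene identifiers, and function names are hypothetical.

```python
from collections import Counter


def supporting_genomes(pair, conserved_by_genome):
    """N for one E. coli pair: how many other genomes contain a homologous pair."""
    key = frozenset(pair)  # treat pairs as unordered
    return sum(1 for pairs in conserved_by_genome.values() if key in pairs)


def normalized_distribution(n_values):
    """Fraction of pairs with each value of N; the fractions sum to 1."""
    counts = Counter(n_values)
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}


# Hypothetical conserved pairs, already expressed as E. coli gene identifiers.
conserved_by_genome = {
    "genomeA": {frozenset(("x", "y")), frozenset(("y", "z"))},
    "genomeB": {frozenset(("x", "y")), frozenset(("v", "w"))},
    "genomeC": {frozenset(("x", "y")), frozenset(("u", "v"))},
}
ecoli_pairs = [("x", "y"), ("y", "z"), ("v", "w"), ("u", "v")]
n_values = [supporting_genomes(p, conserved_by_genome) for p in ecoli_pairs]
print(n_values)                           # [3, 1, 1, 1]
print(normalized_distribution(n_values))  # {1: 0.75, 3: 0.25}
```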
Figure 6
Dependence of the number of predicted SO pairs in E. coli (with P ≥ 0.98) on the number of genomes to which E. coli was compared.
Figure 7
Number of predicted gene pairs (with P ≥ 0.98) in different bacterial and archaeal genomes. The x-axis shows the number of genes in each genome; the y-axis shows the number of predicted gene pairs in that genome divided by its number of genes.
