Comparative Study
BMC Genomics. 2006 Feb 21;7:31.
doi: 10.1186/1471-2164-7-31.

Further understanding human disease genes by comparing with housekeeping genes and other genes


Zhidong Tu et al. BMC Genomics.

Abstract

Background: Several studies have compared various features of heritable disease genes with those of so-called non-disease genes, but they have yielded conflicting results. A potential problem in those studies is that the non-disease genes included a large number of essential genes: genes which are indispensable for humans to survive and reproduce. Since functional disruption of an essential gene has fatal consequences, it is more reasonable to regard essential genes as extremely severe "disease" genes. Here we perform a comparative study of the features of human essential, disease, and other genes.

Results: In the absence of a well-defined set of human essential genes, we use a set of 1,789 ubiquitously expressed human genes (UEHGs), also known as housekeeping genes, as an approximation. We demonstrate that UEHGs are very likely to contain a large proportion of essential genes. We show that UEHGs, disease genes and other genes differ in their evolutionary conservation rates, coding DNA lengths, gene functions and other features. Our findings systematically confirm that disease genes have an intermediate essentiality: lower than that of housekeeping genes but higher than that of other human genes.

Conclusion: The human genome may contain thousands of essential genes whose features differ significantly from those of disease and other genes. We propose treating them as a distinct group when comparing disease genes with non-disease genes. This new way of classification and comparison gives a clearer understanding of disease genes.


Figures

Figure 1
Distribution of Ka, Ks and Ka/Ks. (a) Cumulative density of Ka, Ks and the Ka/Ks ratio derived from human-mouse orthologous pairs. Ka is the number of non-synonymous substitutions per non-synonymous site; Ks is the number of synonymous substitutions per synonymous site. The three groups of human genes are shown in different colors, and the number of genes in each group is listed to the right of the line symbols. (b) Box plots of the same data. For each category, the central box spans the middle 50% of the data, between the 25th and 75th percentiles, and the red horizontal line inside the box marks the median. Extreme values lying outside the main body of the data are shown as solid blue dots.
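The summary statistics behind Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Ka and Ks have already been estimated for each human-mouse ortholog pair (the estimation method is not described here), it skips pairs with Ks = 0 because the ratio is undefined there, and it computes the quartiles and median that a box plot draws. The function names and input values are hypothetical.

```python
import statistics
from typing import Dict, List


def ka_ks_ratios(ka: List[float], ks: List[float]) -> List[float]:
    """Return Ka/Ks for each ortholog pair, skipping pairs with Ks == 0 (ratio undefined)."""
    return [a / s for a, s in zip(ka, ks) if s > 0]


def box_summary(values: List[float]) -> Dict[str, float]:
    """Return the 25th percentile, median and 75th percentile drawn by a box plot."""
    # "inclusive" is one of several quantile conventions; plotting tools may differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}


# Hypothetical values for three ortholog pairs; the third has Ks == 0 and is dropped.
ratios = ka_ks_ratios([0.02, 0.05, 0.10], [0.5, 0.5, 0.0])
print(ratios)
print(box_summary([1, 2, 3, 4, 5]))
```

Outlier points (the blue dots) would be whatever falls outside the whiskers; the caption does not state the whisker rule, so none is assumed here.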
Figure 2
Codon conservation of the three gene groups. The conservation scores of amino acids in the three gene groups are compared. (a) Distribution of conservation scores at disease-causing mutation sites (solid line). The dotted line shows the conservation scores of all sites in the coding region (i.e., the distribution expected when sites are chosen at random). (b) Distribution of conservation scores at polymorphic sites vs. the random distribution, as in (a). (c) Distribution of conservation scores for UEHGs (black line), disease genes (broken blue line) and disease-causing mutation sites (broken red line). (d) Distribution of the fraction of highly conserved regions (Cons. Score > 0.9). Each human gene group is shown in a different color.
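Panel (d) summarizes each gene by the fraction of its sites scoring above 0.9, and panels (a) and (b) compare a subset of sites with all coding sites. The sketch below shows both calculations under the assumption that per-site conservation scores in [0, 1] are already available; how those scores are computed is not stated in the caption, and the gene IDs and helper names are hypothetical.

```python
from typing import Dict, List

# Threshold from the figure caption: a site is "highly conserved" if Cons. Score > 0.9.
HIGH_CONSERVATION = 0.9


def highly_conserved_fraction(site_scores: List[float], threshold: float = HIGH_CONSERVATION) -> float:
    """Fraction of a gene's coding sites whose conservation score exceeds the threshold."""
    if not site_scores:
        raise ValueError("gene has no scored sites")
    return sum(1 for s in site_scores if s > threshold) / len(site_scores)


def group_fractions(genes: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-gene highly conserved fractions for one gene group, keyed by (hypothetical) gene ID."""
    return {gene: highly_conserved_fraction(scores) for gene, scores in genes.items()}


def ecdf_at(values: List[float], x: float) -> float:
    """Empirical CDF at x: fraction of scores <= x, e.g. mutation sites vs. all coding sites."""
    if not values:
        raise ValueError("no scores given")
    return sum(1 for v in values if v <= x) / len(values)


# Hypothetical per-site scores for two genes in one group.
print(group_fractions({"geneA": [0.95, 0.5, 0.91, 0.2], "geneB": [0.99, 0.97]}))
```

Evaluating `ecdf_at` over a grid of x values for mutation sites and for all sites gives the solid and dotted curves of panels (a) and (b).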
Figure 3
Comparison of gene essentiality between human and C. elegans. Human genes are divided into three groups as described in the main text, and 20,488 C. elegans genes are mapped to each group based on homology. The essentiality of each C. elegans gene is obtained from RNA interference (RNAi) experiments, as described in the main text. Different phenotypes are shown in different colors, and the number of homologs in each group is listed. The fraction of human genes with C. elegans homologs is shown under the group name.
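The tallies in Figure 3 amount to a cross-tabulation of RNAi phenotype by human gene group, plus a per-group homolog coverage. The sketch below assumes each C. elegans gene has already been assigned to exactly one human gene group (the caption does not say how genes with homologs in several groups are handled), and the phenotype labels, gene IDs and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def phenotype_table(worm_to_group: Dict[str, str], phenotype: Dict[str, str]) -> Dict[str, Counter]:
    """Count RNAi phenotypes of C. elegans homologs within each human gene group.

    Assumes every C. elegans gene is assigned to exactly one group; genes without an
    RNAi result are counted under the hypothetical label "no data".
    """
    table: Dict[str, Counter] = {}
    for worm_gene, group in worm_to_group.items():
        table.setdefault(group, Counter())[phenotype.get(worm_gene, "no data")] += 1
    return table


def homolog_coverage(group_genes: Set[str], genes_with_homolog: Set[str]) -> float:
    """Fraction of a human gene group that has at least one C. elegans homolog."""
    if not group_genes:
        raise ValueError("empty gene group")
    return len(group_genes & genes_with_homolog) / len(group_genes)


# Hypothetical mapping and phenotypes.
print(phenotype_table({"w1": "UEHG", "w2": "UEHG", "w3": "other"}, {"w1": "lethal", "w2": "wild type"}))
```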
Figure 4
Distribution of protein physical interaction degrees. UEHGs, disease genes, and other genes are shown in three different colors in the histogram. As the interaction degree increases, the fraction of UEHGs also increases. For summary statistics, see the main text. The number of genes with at least one interaction in HPRD is listed for each gene group.
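The trend described for Figure 4 (a growing share of UEHGs at higher degree) can be computed by binning genes by interaction degree and taking each group's share within every bin. This is a sketch under stated assumptions: the degree-bin edges are hypothetical because the caption does not give them, degrees are assumed to be precomputed from HPRD, and genes with no interactions are dropped to match the caption's "at least one interaction".

```python
import bisect
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def degree_bin(degree: int, edges: List[int]) -> int:
    """Index i of the bin [edges[i], edges[i+1]) holding degree; the last bin is open-ended."""
    return bisect.bisect_right(edges, degree) - 1


def group_fractions_by_degree(
    genes: Dict[str, Tuple[str, int]], edges: List[int]
) -> Dict[int, Dict[str, float]]:
    """For each degree bin, the fraction of genes belonging to each group.

    genes maps a gene ID to (group label, number of physical interactions).
    Genes with degree below edges[0] (e.g. no interactions) are ignored.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for group, degree in genes.values():
        if degree < edges[0]:
            continue
        counts[degree_bin(degree, edges)][group] += 1
    result: Dict[int, Dict[str, float]] = {}
    for b in sorted(counts):
        total = sum(counts[b].values())
        result[b] = {g: n / total for g, n in counts[b].items()}
    return result


# Hypothetical genes and bin edges [1, 5), [5, 10), [10, ...).
genes = {"g1": ("UEHG", 12), "g2": ("other", 2), "g3": ("disease", 3), "g4": ("UEHG", 3), "g5": ("other", 0)}
print(group_fractions_by_degree(genes, [1, 5, 10]))
```

Empty bins are simply absent from the result rather than reported as zeros.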
Figure 5
Function annotation of genes in the three groups. GO categories are given by the row labels, and the columns are the three classes of genes. A color scale (shown on the right) displays the significance of over-representation (negative logarithm of the P-value, upper half of the scale) or under-representation (logarithm of the P-value, lower half of the scale) for each gene group and functional category. P-values are calculated from the hypergeometric distribution.
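The color values in Figure 5 can be illustrated with one-sided hypergeometric tail probabilities. In this sketch, N genes in total carry K annotations to a GO category, a gene group of size n contains k annotated genes, over-representation uses P(X >= k) and under-representation uses P(X <= k). The base-10 logarithm and the rule for choosing the direction (comparing k with its expected value nK/N) are assumptions; the caption specifies neither.

```python
import math
import sys


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when n genes are drawn from N, of which K carry the GO annotation."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def p_over(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X >= k): evidence for over-representation."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


def p_under(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X <= k): evidence for under-representation."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(max(0, n - (N - K)), k + 1))


def _clamp(p: float) -> float:
    # Keep p in (0, 1] so rounding never yields log(0) or a positive log.
    return min(1.0, max(p, sys.float_info.min))


def signed_score(k: int, N: int, K: int, n: int) -> float:
    """-log10(P) if over-represented, +log10(P) if under-represented (base 10 assumed)."""
    if k >= n * K / N:
        return -math.log10(_clamp(p_over(k, N, K, n)))
    return math.log10(_clamp(p_under(k, N, K, n)))


# Hypothetical counts: 10 genes, 5 annotated, a group of 5 containing all 5 annotated genes.
print(signed_score(5, 10, 5, 5))
```

For genome-scale counts, a library routine such as SciPy's hypergeometric survival function would be the usual choice; the pure-Python version above keeps the example dependency-free.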
Figure 6
Correlation of disease onset age with Ka/Ks. Disease genes are divided into five groups based on disease onset age. A weighted linear regression is applied to disease genes (groups 2 to 5) and is shown as the dotted line; the regression coefficient for onset age is +0.0086 (P = 0.02). UEHGs and other genes are plotted on either side of the disease genes for visual comparison. The mean is denoted by the solid circle and the standard deviation by the short horizontal bar. The large variation within each group hints at other confounding factors that also affect Ka/Ks.
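A minimal sketch of a weighted linear regression like the one in Figure 6, assuming x is the onset-age group index (2 to 5), y is each group's mean Ka/Ks, and the weights are the numbers of genes per group. The caption does not say how the weights were chosen, so that weighting is an assumption, as are the function name and the example values. The P-value reported in the caption would additionally require a t-distribution and is not computed here.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    sw = sum(w)
    # Weighted means of x and y.
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the means.
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x has no spread; slope is undefined")
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical group indices, mean Ka/Ks values and gene counts (weights).
intercept, slope = weighted_linear_fit([2, 3, 4, 5], [0.10, 0.11, 0.11, 0.13], [40, 60, 30, 20])
print(intercept, slope)
```

Weighting by group size lets well-populated onset groups pull the line more strongly than sparse ones; inverse-variance weights would be another common choice.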


