Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2014 Apr 29;111(17):6131-8.
doi: 10.1073/pnas.1318948111. Epub 2014 Apr 21.

Defining functional DNA elements in the human genome

Affiliations
Review

Defining functional DNA elements in the human genome

Manolis Kellis et al. Proc Natl Acad Sci U S A. .

Abstract

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Fig. 1.
The complementary nature of evolutionary, biochemical, and genetic evidence. The outer circle represents the human genome. Blue discs represent DNA sequences acted upon biochemically and partitioned by their levels of signal [combined 10th percentiles of different ENCODE data types for high, combined 50th percentiles for medium, and all significant signals for low (see Reconciling Genetic, Evolutionary, and Biochemical Estimates and Fig. 2)]. The red circle represents, at the same scale, DNA with signatures of evolutionary constraint (GERP++ elements derived from 34mammal alignments). Overlaps among the sequences having biochemical and evolutionarily evidence were computed in this work (Fig. 3 and SI Methods). The small purple circle represents protein-coding nucleotides (Gencode). The green shaded domain conceptually represents DNA that produces a phenotype upon alteration, although we lack well-developed summary estimates for the amount of genetic evidence and its relationship with the other types. This summary of our understanding in early 2014 will likely evolve substantially with more data and more refined experimental and analytical methods.
Fig. 2.
Fig. 2.
Summary of the coverage of the human genome by ENCODE data. The fraction of the human genome covered by ENCODE-detected elements in at least one cell line or tissue for each assay is shown as a bar graph. All percentages are calculated against the whole genome, including the portion that is not uniquely mappable with short reads and thus is invisible to the analysis presented here (see Fig. S1). A more detailed summary can be found in Fig. S2. For transcripts, coverage was calculated from RNA-seq–derived contigs (104) using the count of read fragments per kilobase of exon per million reads (FPKM) and separated into abundance classes by FPKM values. Note that FPKMs are not directly comparable among different subcellular fractions, as they reflect relative abundances within a fraction rather than average absolute transcript copy numbers per cell. Depending on the total amount of RNA in a cell, one transcript copy per cell corresponds to between 0.5 and 5 FPKM in PolyA+ whole-cell samples according to current estimates (with the upper end of that range corresponding to small cells with little RNA and vice versa). “All RNA” refers to all RNA-seq experiments, including all subcellular fractions (Fig. S2). DNAse hypersensitivity and transcription-factor (TFBS) and histone-mark ChIP-seq coverage was calculated similarly but divided according to signal strength. “Motifs+footprints” refers to the union of occupied sequence recognition motifs for transcription factors as determined by ChIP-seq and as measured by digital genomic footprinting, with the fuscia portion of the bar representing the genomic space covered by bound motifs in ChIP-seq. Signal strength for ChIP-seq data for histone marks was determined based on the P value of each enriched region (the –log10 of the P value is shown), using peak-calling procedures tailored to the broadness of occupancy of each modification (SI Methods).
Fig. 3.
Fig. 3.
Relationship between ENCODE signals and conservation. Signal strength of ENCODE functional annotations were defined as follows: log10 of signal intensity for DNase and TFBS, log10 of RPKM for RNA, and log10 of −log10 P value for histone modifications. Annotated regions were binned by 0.1 units of signal strength. (A) The number of nucleotides in each signal bin was plotted. (B) The fraction of the genome in each signal bin covered by conserved elements (by genomic evolutionary rate profiling) (115) was plotted.
Fig. 4.
Fig. 4.
Epigenetic and evolutionary signals in cis-regulatory modules (CRMs) of the HBB complex. (Upper) Many CRMs (red rectangles) (106) have been mapped within the cluster of genes encoding β-like globins expressed in embryonic (HBE1), fetal (HBG1 and HBG2), and adult (HBB and HBD) erythroid cells. All are marked by DNase hypersensitive sites and footprints (Gene Expression Omnibus accession nos. GSE55579, GSM1339559, and GSM1339560), and many are bound by GATA1 in peripheral blood derived erythroblasts (PBDEs). (Lower, Left) A DNA segment located between the HBG1 and HBD genes is one of the DNA segments bound by BCL11A (109, 110) and several other proteins (ENCODE uniformly processed data) to negatively regulate HBG1 and HBG2. It is sensitive to DNase I but is not conserved across mammals. (Center) An enhancer located 3′ of the HBG1 gene (red line) (108) is bound by several proteins in PBDEs and K562 cells (from the ENCODE uniformly processed data) and is sensitive to DNase I, but shows almost no signal for mammalian constraint. (Right) The enhancer at hypersensitive site (HS)2 of the locus control region (LCR) (red line) (107) is bound by the designated proteins at the motifs indicated by black rectangles. High-resolution DNase footprinting data (116) show cleavage concentrated between the bound motifs, which are strongly constrained during mammalian evolution, as shown on the mammalian phastCons track (48).

Comment in

  • Getting "function" right.
    Brunet TD, Doolittle WF. Brunet TD, et al. Proc Natl Acad Sci U S A. 2014 Aug 19;111(33):E3365. doi: 10.1073/pnas.1409762111. Epub 2014 Aug 8. Proc Natl Acad Sci U S A. 2014. PMID: 25107292 Free PMC article. No abstract available.
  • Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease.
    Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, Ward LD, Birney E, Crawford GE, Dekker J, Dunham I, Elnitski LL, Farnham PJ, Feingold EA, Gerstein M, Giddings MC, Gilbert DM, Gingeras TR, Green ED, Guigo R, Hubbard T, Kent J, Lieb JD, Myers RM, Pazin MJ, Ren B, Stamatoyannopoulos J, Weng Z, White KP, Hardison RC. Kellis M, et al. Proc Natl Acad Sci U S A. 2014 Aug 19;111(33):E3366. doi: 10.1073/pnas.1410434111. Proc Natl Acad Sci U S A. 2014. PMID: 25275169 Free PMC article. No abstract available.

Similar articles

  • An integrated encyclopedia of DNA elements in the human genome.
    ENCODE Project Consortium. ENCODE Project Consortium. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247. Nature. 2012. PMID: 22955616 Free PMC article.
  • Characterization of noncoding regulatory DNA in the human genome.
    Elkon R, Agami R. Elkon R, et al. Nat Biotechnol. 2017 Aug 8;35(8):732-746. doi: 10.1038/nbt.3863. Nat Biotechnol. 2017. PMID: 28787426 Review.
  • Evidence of abundant purifying selection in humans for recently acquired regulatory functions.
    Ward LD, Kellis M. Ward LD, et al. Science. 2012 Sep 28;337(6102):1675-8. doi: 10.1126/science.1225057. Epub 2012 Sep 5. Science. 2012. PMID: 22956687 Free PMC article.
  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.
    ENCODE Project Consortium; Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foiss… See abstract for full author list ➔ ENCODE Project Consortium, et al. Nature. 2007 Jun 14;447(7146):799-816. doi: 10.1038/nature05874. Nature. 2007. PMID: 17571346 Free PMC article.
  • Bioinformatics for the 'bench biologist': how to find regulatory regions in genomic DNA.
    Nardone J, Lee DU, Ansel KM, Rao A. Nardone J, et al. Nat Immunol. 2004 Aug;5(8):768-74. doi: 10.1038/ni0804-768. Nat Immunol. 2004. PMID: 15282556 Review.

Cited by

References

    1. Lander ES, et al. International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. - PubMed
    1. Waterston RH, et al. Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420(6915):520–562. - PubMed
    1. Lindblad-Toh K, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478(7370):476–482. - PMC - PubMed
    1. Ponting CP, Hardison RC. What fraction of the human genome is functional? Genome Res. 2011;21(11):1769–1776. - PMC - PubMed
    1. Jones FC, et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 2012;484(7392):55–61. - PMC - PubMed

Associated data