Nucleic Acids Res. 2008 Jan;36(Database issue):D991-8.
doi: 10.1093/nar/gkm934. Epub 2007 Nov 5.

GreenPhylDB: a database for plant comparative genomics

M G Conte et al. Nucleic Acids Res. 2008 Jan.

Abstract

GreenPhylDB (http://greenphyl.cirad.fr) is a comprehensive platform designed to facilitate comparative functional genomics in Oryza sativa and Arabidopsis thaliana genomes. The main functions of GreenPhylDB are to assign O. sativa and A. thaliana sequences to gene families using a semi-automatic clustering procedure and to create 'orthologous' groups using a phylogenomic approach. To date, GreenPhylDB comprises the most complete list of plant gene families, which have been manually curated (6421 families). GreenPhylDB also contains all of the phylogenomic relationships computed for 4375 families. A total of 492 TAIR, 1903 InterPro and 981 KEGG families and subfamilies were manually curated using the clusters created with the TribeMCL software. GreenPhylDB integrates information from several other databases including UniProt, KEGG, InterPro, TAIR and TIGR. Several entry points can be used to display phylogenomic relationships for A. thaliana or O. sativa sequences, using TAIR, TIGR gene ID, family name, InterPro, gene alias, UniProt or protein/nucleic sequence. Finally, a powerful phylogenomics tool, GreenPhyl Ortholog Search Tool (GOST), was incorporated into GreenPhylDB to predict orthologous relationships between O. sativa/A. thaliana protein(s) and sequences from other plant species.

Figures

Figure 1.
Partial view of the family entry page for the GRAS transcription factor family (fid = 20939). This family, with 35 A. thaliana and 62 rice loci, was validated against the DRTF and DATF databases, and links to both databases are provided. The group structure shows that the GRAS family is consistent at the 1.2 level (cluster number 113). At higher stringency levels, the GRAS cluster is subdivided into five clusters (153, 434, 1169 and 1663) annotated as the LAS/SCR/SHR, PAT1, SCL6 and GAI subfamilies, respectively. Hovering over any group displays the name of the cluster if available (in this case the GAI subfamily). The number of loci in each cluster is shown in brackets beside it.
Figure 2.
Sequence entry page for At1g14920.1 (GAI). The rice gene Os03g49990.1 (SLR1) is predicted as the ortholog of A. thaliana GAI, while At2g01570.1 (RGA) and At1g66350.1 (RGL1) are predicted as A. thaliana GAI ultraparalogs. The classification of GAI within clusters at several inflation values is shown under ‘sequence classification’, followed by the ortholog similarity predictions from BBMH and Inparanoid, which in this case fully agree with the phylogenomic prediction. The GreenPhyl phylogenomic prediction is divided into three sections: the ortholog prediction for the sequence, with the corresponding orthology (o), sub-tree neighbor (n) and superortholog (so) scores (in %) and the genetic distance (D); the ultraparalog prediction for the query, in this case RGA and RGL1, with the associated ultraparalogy score (p); and finally any tandem or segmentally duplicated genes of the query (using TIGR segmental duplication and OrygenesDB tandem duplication data). The phylogenomic tree is accessible through the ‘View phylogenomic tree’ link.
Figure 3.
Partial view of the GRAS family phylogenetic tree. Two Arabidopsis thaliana genes (At1g14920.1 and At2g01570.1) are predicted as orthologs of the query [Q] (Os03g49990.1) with bootstrap support above 50%. Note that all DELLA proteins are members of the same clade.
Figure 4.
Phylogenomic analysis of a non-Arabidopsis/rice protein sequence. The sequence of the wheat RHT1 gene (Q9ST59) is pasted into the text field of the phylogenomic search tool (A). In step 1, the sequence is tentatively assigned to GreenPhylDB clusters by BLASTP. In this case, RHT1 belongs to the GRAS family and the DELLA subfamily (B), and the GRAS cluster has been phylogenetically analyzed. The species name ‘wheat’ is then chosen and, after submission, the integration of the RHT1 gene into the pre-computed tree is initiated (step 2). GOST produces an output list of the rice and A. thaliana orthologs (C) and the phylogenetic tree (D) with bootstrap scores (%).
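
The Figure 2 caption contrasts the phylogenomic prediction with similarity-based predictions such as BBMH. The sketch below illustrates the general idea of a mutual (reciprocal) best hit, assuming BBMH refers to a best BLAST mutual hit between two proteomes. It is not the GreenPhylDB implementation: the function names, sequence identifiers and scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (query, subject) -> similarity score, e.g. a BLASTP bit score
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Keep only pairs (a, b) where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: ath_1 and ath_2 stand for two recent paralogs in
# species A, osa_1 and osa_2 for two sequences in species B.
ath_vs_osa: Scores = {
    ("ath_1", "osa_1"): 850.0,
    ("ath_1", "osa_2"): 310.0,
    ("ath_2", "osa_1"): 790.0,
    ("ath_2", "osa_2"): 295.0,
}
osa_vs_ath: Scores = {
    ("osa_1", "ath_1"): 850.0,
    ("osa_1", "ath_2"): 790.0,
    ("osa_2", "ath_1"): 310.0,
    ("osa_2", "ath_2"): 295.0,
}

pairs = mutual_best_hits(ath_vs_osa, osa_vs_ath)
print(pairs)
```

In this toy input only one pair survives the mutual test, even though ath_2 also scores highly against osa_1. A one-to-one criterion of this kind can miss many-to-one relationships, which is one reason a tree-based prediction may be reported alongside it.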
