PLoS One. 2021 May 26;16(5):e0252181.
doi: 10.1371/journal.pone.0252181. eCollection 2021.

Genome-wide comparative analyses of GATA transcription factors among 19 Arabidopsis ecotype genomes: Intraspecific characteristics of GATA transcription factors


Mangi Kim et al. PLoS One.

Abstract

GATA transcription factors (TFs) are widespread eukaryotic regulators whose DNA-binding domain is a class IV zinc finger motif (CX2CX17-20CX2C) followed by a basic region. Owing to the low cost of genome sequencing, multiple strains of the same species have now been sequenced; for example, the Plant Genome Database (http://www.plantgenome.info/) contains 2,174 genomes from 713 plant species. We therefore performed genome-wide analyses of GATA TFs in 19 Arabidopsis thaliana genomes, using the pipeline of the GATA database (http://gata.genefamily.info/), to understand the intraspecific features of Arabidopsis GATA TFs. Each A. thaliana genome contains 29 to 30 GATA genes and 39 to 42 GATA TFs. We identified four cases in which the alternative splicing forms of GATA genes differ among the 19 A. thaliana genomes. In the alignment of GATA domain amino acid sequences, 22 of 2,195 amino acids (1.002%) vary across the 19 ecotype genomes. In addition, up to four different amino acid sequences were identified per GATA domain, suggesting that these position-specific amino acid variations may cause intraspecific functional variation. Of 15 functionally characterized GATA genes, only five show amino acid variation across A. thaliana ecotypes, implying that their biological roles may vary among natural isolates of A. thaliana. Principal component analysis (PCA) of 28 characteristics of GATA genes yields four groups, identical to those defined by the number of GATA genes. The topologies of bootstrapped phylogenetic trees of Arabidopsis chloroplast genomes and of common GATA genes are mostly incongruent. Moreover, no relationship was found between the geographical distribution of the ecotypes and their phylogenetic relationships. Our results indicate that intraspecific variation of GATA TFs in A. thaliana is conserved and evolutionarily neutral across the 19 ecotypes, consistent with GATA TFs being major regulators of essential processes such as seed germination and hypocotyl elongation.
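The class IV zinc finger motif quoted above can be illustrated with a plain regular-expression scan. This is a minimal sketch, not the GATA database pipeline used in the study: it assumes any residue may occupy each X position, checks only the cysteine spacing (not the basic region that follows), and the function name `find_gata_zinc_fingers` and the toy sequence are hypothetical.

```python
import re

# Class IV zinc finger core: C-X2-C-X(17-20)-C-X2-C, where X is any residue.
# Assumption: a plain regex scan; it ignores the basic region after the motif.
GATA_ZF = re.compile(r"C.{2}C(.{17,20})C.{2}C")


def find_gata_zinc_fingers(protein: str) -> list[tuple[int, int, int]]:
    """Return (start, end, loop_length) for each non-overlapping motif match.

    Positions are 0-based with an exclusive end; loop_length is the number of
    residues between the second and third cysteines (17 to 20).
    """
    hits = []
    for m in GATA_ZF.finditer(protein.upper()):
        hits.append((m.start(), m.end(), len(m.group(1))))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence with an 18-residue loop between the Cys pairs.
    toy = "MK" + "CAAC" + "G" * 18 + "CNNC" + "RKR"
    print(find_gata_zinc_fingers(toy))  # [(2, 28, 18)]
```

A profile-based search would be more sensitive to divergent domains; the regex only shows what the CX2CX17-20CX2C notation means.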


Conflict of interest statement

This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Gene structure of AtGATA2 and AtGATA4 in 19 A. thaliana genomes.
(A) Gene structure of AtGATA2 in the 19 A. thaliana genomes. (B) Gene structure of AtGATA4 in the 19 A. thaliana genomes. Yellow boxes indicate translated regions and black boxes indicate untranslated regions. Numbers around the boxes give the relative positions of translated regions, untranslated regions, and exons. The name of each A. thaliana genome is printed to the left of its gene diagram. Dotted and solid lines indicate conserved and differing structures of the GATA genes (exons, introns, and untranslated regions), respectively.
Fig 2. Gene structure and protein sequences of the alternative splicing forms of the AtGATA15 gene in A. thaliana Kn0.
(A) Gene structure of the two alternative splicing forms of the AtGATA15 gene in the A. thaliana Kn0 genome. Black and orange boxes indicate untranslated and coding regions of exons, respectively, and black lines indicate introns. Numbers around the exon boxes give relative base-pair positions counted from the transcript start position of the AtGATA15 gene. The chromosomal position of the AtGATA15 gene is shown at the top of the diagram. (B) Protein sequences of the alternative splicing forms of the AtGATA15 gene. Black dots with numbers mark amino acid positions. Amino acids in blue letters are specific to AtGATA15a.
Fig 3. Gene structure of alternative splicing forms of GATA genes in A. thaliana Col0.
Thick boxes indicate exons and lines indicate introns; black and orange boxes indicate untranslated and coding regions of exons, respectively. Numbers around the exon boxes give relative base-pair positions counted from the transcript start position of each gene. A yellow star marks an alternative splicing form of a GATA gene that lacks the GATA domain.
Fig 4. Domain structure of A. thaliana Col0 GATA TFs and amino acid variations of GATA TFs across 19 A. thaliana genomes.
(A) Phylogenetic analysis of A. thaliana Col0 GATA domains: a neighbor-joining tree of GATA domain amino acid sequences from A. thaliana Col0 GATA TFs. Bootstrap values calculated from 10,000 replicates are shown on the tree, except for values lower than 50. The scale bar corresponds to 0.10 estimated amino acid substitutions per site. (B) Protein domain organization of the corresponding GATA TFs. Black boxes with four different patterns indicate the four types of GATA domain: types IVb, IVc, IV4, and IVp denote CX2CX18CX2C, CX2CX20CX2C, CX4CX18CX2C, and partial forms, respectively. Yellow and orange boxes indicate TIFY and CCT domains, respectively. Subfamily names are shown on the right, and the box definitions are given at the top right. (C) GATA domain sequence types for each GATA TF in each A. thaliana genome. The x-axis of the matrix lists A. thaliana ecotypes and the y-axis lists GATA TFs. Four colors (white, yellow, orange, and green) indicate different amino acids in each Arabidopsis GATA TF, and blue indicates a heterogeneous amino acid at a specific position caused by a heterogeneous nucleotide. Dark gray indicates GATA TFs missing from a given ecotype.
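The cysteine spacings that define types IVb, IVc, and IV4 in the Fig 4 caption translate directly into patterns. The sketch below, built around a hypothetical function `classify_gata_domain`, assumes the domain sequence has already been extracted. It checks the three complete types in a fixed order (an assumption, since a sequence could in principle contain more than one spacing) and returns None otherwise, leaving partial (IVp) domains to be judged separately.

```python
import re
from typing import Optional

# Cysteine spacings of the complete GATA domain types named in the Fig 4 caption.
# X = any residue. The checking order below is an assumption of this sketch.
DOMAIN_TYPES = {
    "IVb": re.compile(r"C.{2}C.{18}C.{2}C"),  # CX2CX18CX2C
    "IVc": re.compile(r"C.{2}C.{20}C.{2}C"),  # CX2CX20CX2C
    "IV4": re.compile(r"C.{4}C.{18}C.{2}C"),  # CX4CX18CX2C
}


def classify_gata_domain(domain: str) -> Optional[str]:
    """Return the first type whose spacing occurs in `domain`, or None.

    None covers partial domains (IVp) as well as any other spacing; telling
    those apart needs information this sketch does not use.
    """
    seq = domain.upper()
    for name, pattern in DOMAIN_TYPES.items():
        if pattern.search(seq):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical toy domains built only to exercise each spacing.
    print(classify_gata_domain("CAAC" + "G" * 18 + "CNNC"))    # IVb
    print(classify_gata_domain("CAAC" + "G" * 20 + "CNNC"))    # IVc
    print(classify_gata_domain("CAAAAC" + "G" * 18 + "CNNC"))  # IV4
```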
Fig 5. Amino acid patterns of GATA domains from 19 A. thaliana genomes.
Purple gene names indicate GATA TFs found only in the Kn0 genome, gray gene names indicate GATA genes absent from some A. thaliana genomes, and blue gene names indicate GATA TFs found only in the A. thaliana Col0 genome. Colors on the aligned GATA domain residues indicate the number of different amino acids at that position. Black and purple boxes below the alignment mark the positions of beta-sheets and alpha-helices, respectively, and black and purple bordered boxes outline the corresponding beta-sheet and alpha-helix regions.
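The per-position coloring in Fig 5 and the abstract's count of variable amino acids both rest on counting the distinct residues in each alignment column. A minimal sketch, assuming an already-computed alignment of equal-length strings and ignoring gap characters (`-`); both are assumptions, the helper names are hypothetical, and how the study tallied its 2,195 positions is not described in this abstract.

```python
def column_diversity(alignment: list[str]) -> list[int]:
    """Count distinct residues in each alignment column, ignoring gaps ('-')."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [len({seq[i] for seq in alignment if seq[i] != "-"}) for i in range(width)]


def variable_sites(alignment: list[str]) -> tuple[int, int]:
    """Return (columns with more than one residue, total columns)."""
    diversity = column_diversity(alignment)
    return sum(1 for d in diversity if d > 1), len(diversity)


if __name__ == "__main__":
    # Hypothetical four-residue fragment of one domain in three ecotypes.
    toy = ["CKNC", "CKSC", "CRSC"]
    print(column_diversity(toy))  # [1, 2, 2, 1]
    print(variable_sites(toy))    # (2, 4)
    # The abstract's figure: 22 variable of 2,195 amino acids, as a percentage.
    print(round(100 * 22 / 2195, 3))  # 1.002
```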
Fig 6. Chromosomal distribution of A. thaliana GATA genes among 19 genomes.
Gradient purple bars represent the chromosomes of A. thaliana Col0, and the bar on the left indicates chromosome length. Red, green, sky-blue, and gray gene names indicate GATA subfamilies I, II, III, and IV, respectively. The array of small squares beside each chromosome shows whether each GATA gene is present in 18 A. thaliana genomes: yellow indicates presence and white indicates absence.
Fig 7. Principal component analysis of 28 characteristics of GATA genes identified from 19 Arabidopsis ecotypes.
Two-dimensional plot of the 19 Arabidopsis ecotypes derived from principal component analysis of 28 characteristics of their GATA genes. Gray, purple, blue, and red circles correspond to Types 1, 2, 3, and 4 listed in Table 4, respectively. Ecotype names in blue label the corresponding dots.
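Fig 7 reduces 28 GATA gene characteristics per ecotype to two dimensions. The sketch below shows one conventional way to do this with NumPy, via singular value decomposition of a column-standardized matrix. The standardization step and the function name `pca_2d` are assumptions, since the study's preprocessing is not described in this abstract, and the random matrix in the demo is placeholder input, not the study's data.

```python
import numpy as np


def pca_2d(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project samples (rows) onto the first two principal components.

    Columns are z-scored before the SVD (an assumption of this sketch).
    Returns the (n_samples, 2) score matrix and the explained-variance
    ratio of every component, largest first.
    """
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant characteristics become zero columns, not NaN
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are the principal axes; u * s gives the sample scores.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :2] * s[:2]
    explained = s**2 / np.sum(s**2)
    return scores, explained


if __name__ == "__main__":
    # Placeholder input: 19 "ecotypes" x 28 "characteristics" of random counts.
    rng = np.random.default_rng(0)
    demo = rng.integers(0, 50, size=(19, 28))
    scores, explained = pca_2d(demo)
    print(scores.shape)  # (19, 2)
    print(explained[:2])
```

Z-scoring keeps characteristics measured on large scales (for example, gene lengths) from dominating those measured as small counts; whether the study did this is not stated here.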
Fig 8. Phylogenetic relationships of GATA genes and chloroplast genomes of Arabidopsis ecotypes.
(A) Bootstrapped maximum-likelihood phylogenetic tree of the chloroplast genomes of 18 A. thaliana ecotypes and A. lyrata. (B) Bootstrapped maximum-likelihood phylogenetic tree of concatenated common GATA genes across 19 A. thaliana ecotypes and A. lyrata. Numbers on branches in both trees give support values from maximum-likelihood, neighbor-joining, and Bayesian inference trees, respectively. The scale bars of both trees indicate estimated DNA substitutions per site. Gray, purple, blue, and red circles correspond to Types 1, 2, 3, and 4 shown in Fig 7 and Table 4, respectively. Dotted straight and curved lines connect the same ecotype in the two trees.
