Xue-Cheng Zhang, Zheng Wang, Xinyan Zhang, Mi Ha Le, Jianguo Sun, Dong Xu, Jianlin Cheng, Gary Stacey.
Abstract
BACKGROUND: Protein domains are the structural, functional and evolutionary units of the protein. Protein domain architectures are the linear arrangements of domain(s) in individual proteins. Although the evolutionary history of protein domain architecture has been extensively studied in microorganisms, the evolutionary dynamics of domain architecture in the plant kingdom remains largely undefined. To address this question, we analyzed the lineage-based protein domain architecture content in 14 completed green plant genomes.
Year: 2012 PMID: 22252370 PMCID: PMC3310802 DOI: 10.1186/1471-2148-12-6
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
The profile of Pfam-predicted protein domain architectures in green plants
| Domain architectures | Cr<sup>a</sup> | Ol | Ot | Cv | Vc | Pp | Sm | Os | Zm | Sb | Vv | At | Pt | Gm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall predicted<sup>b</sup> | 0.49 | 0.64 | 0.59 | 0.59 | 0.45 | 0.42 | 0.67 | 0.60 | 0.47 | 0.64 | 0.66 | 0.75 | 0.66 | 0.66 |
| Unique percentage<sup>c</sup> | 0.09 | 0.05 | 0.06 | 0.09 | 0.12 | 0.07 | 0.09 | 0.15 | 0.15 | 0.09 | 0.12 | 0.06 | 0.12 | 0.10 |
| Single-domain | 0.36 | 0.45 | 0.41 | 0.42 | 0.32 | 0.30 | 0.46 | 0.38 | 0.35 | 0.45 | 0.45 | 0.51 | 0.48 | 0.47 |
| Double-domain | 0.09 | 0.13 | 0.12 | 0.11 | 0.09 | 0.08 | 0.12 | 0.11 | 0.08 | 0.11 | 0.12 | 0.14 | 0.12 | 0.12 |
| Triple-domain | 0.03 | 0.05 | 0.04 | 0.04 | 0.03 | 0.03 | 0.05 | 0.06 | 0.03 | 0.04 | 0.05 | 0.05 | 0.04 | 0.04 |
| >= 4-domain<sup>d</sup> | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.06 | 0.06 | 0.02 | 0.05 | 0.06 | 0.06 | 0.04 | 0.04 |
<sup>a</sup> Species abbreviations are: Cr, Chlamydomonas reinhardtii; Ol, Ostreococcus lucimarinus; Ot, O. tauri; Cv, Chlorella vulgaris; Vc, Volvox carteri; Pp, Physcomitrella patens; Sm, Selaginella moellendorffii; Os, Oryza sativa; Zm, Zea mays; Sb, Sorghum bicolor; Vv, Vitis vinifera; At, Arabidopsis thaliana; Pt, Populus trichocarpa; Gm, Glycine max.
<sup>b</sup> The percentage of proteins with at least one Pfam domain predicted at an E-value cutoff of 10<sup>-2</sup>.
<sup>c</sup> The percentage of domain architectures that are unique to each species.
<sup>d</sup> Architectures with 4 or more domains.
Note that the overall predicted architectures are categorized into single-domain, double-domain, triple-domain and >= 4-domain architectures. The proportion of overall predicted architectures in each genome should be the sum of the proportions of these four categories.
Figure 1. Plant genomes maintain homogeneous distributions of protein domain architectures. The categories of protein domain architectures, labeled above each histogram and below each probability plot, are overall predicted (left panel), species-unique (second panel from the left), and single-domain, double-domain, triple-domain and four-or-more-domain architectures (the four right panels). The numbers are mean ± standard error. The x-axis of both the upper and lower panels shows the proportion of Pfam-predicted domain architectures per genome in each category. The upper panels show frequency distributions of these proportions. The lower panels are probability plots (5% significance level) of the same proportions across plant species; the y-axis shows the probability distribution relative to the mean values. AD is the value of the Anderson-Darling normality test. Note that the proportions of architectures with four or more domains do not follow a normal distribution, as evidenced by the associated p-value in the probability plot.
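The "mean ± standard error" summary mentioned in the caption can be sketched for one category using the "Overall predicted" row of the table above. This is only an illustration: the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
import math
import statistics

# "Overall predicted" proportions for the 14 genomes, copied from the table above
# (order: Cr, Ol, Ot, Cv, Vc, Pp, Sm, Os, Zm, Sb, Vv, At, Pt, Gm).
overall_predicted = [0.49, 0.64, 0.59, 0.59, 0.45, 0.42, 0.67,
                     0.60, 0.47, 0.64, 0.66, 0.75, 0.66, 0.66]


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of proportions.

    Assumption: the standard error is the sample standard deviation
    divided by the square root of the number of genomes.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


mean, standard_error = mean_and_standard_error(overall_predicted)
print(f"overall predicted: {mean:.3f} ± {standard_error:.3f}")
```

The same function could be applied to any other row of the table to summarize that category across genomes.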
Figure 2. Evolutionary dynamics of domain architectures reflected by the presence and absence of architectures in plant lineages. Differentially colored boxes represent the presence of an architecture in individual lineages or lineage combinations. Domain architecture patterns are defined by lineages or lineage combinations. Pattern A represents algal architectures; B, bryophyte and lycophyte architectures, or early diverging architectures; ABCD, universal architectures; BCD, land plant architectures; CD, angiosperm architectures; C, monocot architectures; and D, dicot architectures. "Overall" denotes the raw architectures without exclusion of less commonly represented architectures in each lineage, and "prevalent" denotes architectures present in the majority of species in each lineage, i.e., at least three out of five algal species, both P. patens and S. moellendorffii, two out of three monocot species, and three out of four dicot species. Architectures containing the WD-40 domain are included as a representative example to illustrate the dynamic changes in plant lineages. Numbers before the slash are collective counts of architectures in individual categories. Numbers after the slash denote the percentages of architectures in individual categories.
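The pattern labels and the "prevalent" thresholds described in this caption can be sketched as a small classifier. The function and variable names are hypothetical, and treating "overall" as "present in at least one species of the lineage" is an assumption read from the caption, not a statement of the authors' pipeline.

```python
# Lineage groups, using the species abbreviations from the first table.
LINEAGES = {
    "A": {"Cr", "Ol", "Ot", "Cv", "Vc"},  # algae
    "B": {"Pp", "Sm"},                    # bryophyte and lycophyte (early diverging)
    "C": {"Os", "Zm", "Sb"},              # monocots
    "D": {"Vv", "At", "Pt", "Gm"},        # dicots
}

# Minimum species count per lineage for an architecture to be "prevalent":
# 3 of 5 algae, both early diverging species, 2 of 3 monocots, 3 of 4 dicots.
PREVALENT_MIN = {"A": 3, "B": 2, "C": 2, "D": 3}


def lineage_pattern(species_with_architecture, prevalent=False):
    """Return a pattern string such as "ABCD", "BCD", "CD" or "A".

    With prevalent=False (assumed meaning of "overall"), one species
    is enough for a lineage to count as containing the architecture.
    """
    present = set(species_with_architecture)
    pattern = ""
    for label, members in LINEAGES.items():
        needed = PREVALENT_MIN[label] if prevalent else 1
        if len(present & members) >= needed:
            pattern += label
    return pattern


# Found in two monocots and three dicots: an angiosperm (CD) architecture.
print(lineage_pattern({"Os", "Zm", "At", "Pt", "Gm"}, prevalent=True))
```

An architecture found in a single algal genome would be pattern A under the "overall" reading but would yield an empty pattern under the "prevalent" thresholds.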
Angiosperm plants tend to integrate pre-existing domains into multi-domain architectures
| Architecture type | Angiosperm (lineage) | Monocot (lineage) | Dicot (lineage) | Angiosperm<sup>a</sup> (overall) |
|---|---|---|---|---|
| Single-domain | 65 (46.7%) | 51 (37.2%) | 15 (30%) | 1124771 (69.2%)<sup>e</sup> |
| Newly emerged/existing domain<sup>b,c</sup> | 33/32 | 11/40 | 12/3 | |
| Multiple-domain | 74 (53.3%) | 86 (62.8%) | 35 (70%) | 55402 (30.8%)<sup>e</sup> |
| Newly emerged/existing domain<sup>d</sup> | 15/59 | 4/82 | 8/27 | |
| Chi-square test<sup>f</sup> | 0.015 ( | 16.49 ( | 5.4 ( | |
| Chi-square test<sup>g</sup> | 26.16 ( | 70.74 ( | 10.91 ( | |
| Chi-square test<sup>h</sup> | 23.75 ( | 48.04 ( | 72.09 ( | |
<sup>a</sup> Includes the 3 monocot species, Os, Zm, and Sb, and the 4 dicot species, Vv, At, Pt, and Gm.
<sup>b</sup> New domains are defined as domains that are not present in algal and early diverging lineages. Pre-existing domains are domains that are present in algal and early diverging lineages.
<sup>c</sup> The numbers of new and pre-existing domains in single-domain architectures in the three lineages examined.
<sup>d</sup> The numbers of new and pre-existing domains in multiple-domain architectures in the three lineages examined.
<sup>e</sup> The sum and percentage of single-domain and multiple-domain architectures collectively from all 7 angiosperm species.
<sup>f</sup> Chi-square values for the expected ratio of 50% newly emerged : 50% existing domains in single-domain architectures.
<sup>g</sup> Chi-square values for the expected ratio of 50% newly emerged : 50% existing domains in multiple-domain architectures.
<sup>h</sup> Chi-square values for the expected ratio of 69.2% single-domain architectures : 30.8% multiple-domain architectures.
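The chi-square values for a 50% : 50% expected ratio can be illustrated with a plain goodness-of-fit calculation on the newly emerged/existing counts from the table. This is a minimal sketch: the function name is hypothetical, and it assumes an uncorrected Pearson chi-square statistic, which the footnotes do not state explicitly.

```python
def chi_square_gof(observed, expected_ratios):
    """Pearson chi-square goodness-of-fit statistic.

    observed: counts per category, e.g. (newly emerged, existing).
    expected_ratios: expected proportion per category, summing to 1.
    """
    total = sum(observed)
    statistic = 0.0
    for count, ratio in zip(observed, expected_ratios):
        expected = total * ratio
        statistic += (count - expected) ** 2 / expected
    return statistic


# Newly emerged/existing domain counts in single-domain architectures,
# copied from the table above.
single_domain = {"angiosperm": (33, 32), "monocot": (11, 40), "dicot": (12, 3)}

for lineage, counts in single_domain.items():
    print(lineage, round(chi_square_gof(counts, (0.5, 0.5)), 3))
```

Under this assumption the three single-domain counts give 0.015, 16.49 and 5.4, which agree with the first chi-square row of the table. The same call with other expected ratios, such as `(0.692, 0.308)`, covers the single-domain versus multiple-domain comparison.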
Figure 3. Lineage-wise architecture expansion in plants. Pairwise comparisons of genomic dosages of architectures were made between lineages, and colored boxes represent significant expansion of architectures. Patterns with more than 25 architectures are shown in red and those with fewer than 25 in light orange. The numbers denote the counts and percentages of architectures of each pattern that have undergone significant expansion. Only patterns with an incidence higher than 1% are shown.