Yanbin Yin, Jinling Huang, Ying Xu.
Abstract
BACKGROUND: The cellulose synthase superfamily has been classified into nine cellulose synthase-like (Csl) families and one cellulose synthase (CesA) family. The Csl families have been proposed to be involved in the synthesis of the backbones of hemicelluloses of plant cell walls. With 17 plant and algal genomes fully sequenced, we sought to conduct a systematic, genome-wide investigation of this superfamily through in-depth phylogenetic analyses.
Year: 2009 PMID: 19646250 PMCID: PMC3091534 DOI: 10.1186/1471-2229-9-99
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Plant and algal genomes used in the present study
| Index | Abbr. | Clade | Species | Genome Published/Released | Csl Published? |
|---|---|---|---|---|---|
| 1 | Tp | Diatom | | [ | N |
| 2 | Pht | Diatom | | JGI | N |
| 3 | Pa | brown tide algae | | JGI | N |
| 4 | Cm | red algae | | [ | N |
| 5 | Mpc | green algae | | [ | N |
| 6 | Mpr | green algae | | [ | N |
| 7 | Ol | green algae | | [ | N |
| 8 | Ot | green algae | | [ | N |
| 9 | Cr | green algae | | [ | N |
| 10 | Vc | green algae | | JGI | N |
| 11 | Pp | moss | | [ | [ |
| 12 | Sm | spike moss | | JGI | N [ |
| 13 | Pt | dicot | | [ | [ |
| 14 | At | dicot | | [ | [ |
| 15 | Vv | dicot | | [ | N |
| 16 | Os | monocot | | [ | [ |
| 17 | Sb | monocot | | [ | N |
Figure 1. The maximum likelihood (ML) phylogenies of the Csl families. a) 217 plant proteins (211 proteins if alternative splicing variants from Arabidopsis and rice are excluded) that have the Pfam Cellulose_synt domain were used to construct this tree. b) 88 proteins (83 proteins if alternative splicing variants are excluded) that have the Pfam GT2 domain were used to construct this tree. Both the full-length protein sequences and the conserved Pfam domain regions were used in the phylogeny reconstruction; the corresponding bootstrap values are shown separated by '/'.
Sizes (number of genes in each family) of Csl families in 17 plant and algal genomes
| Abbr. a) | Genome size b) | Sum. | CesA | CslD | CslF | CslB | CslH | CslE | CslG | CslJ | CslA | CslC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tp | 11390 | - | - | - | - | - | - | - | - | - | - | - |
| Pht | 10025 | - | - | - | - | - | - | - | - | - | - | - |
| Aa | 11501 | - | - | - | - | - | - | - | - | - | - | - |
| Cm | 5014 | - | - | - | - | - | - | - | - | - | - | - |
| Mpc | 10475 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Mpr | 9815 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Ol | 7651 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Ot | 7725 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Cr | 14598 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Vc | 15544 | 1 | - | - | - | - | - | - | - | - | 1 | |
| Pp | 35938 | 26 | 8 | 8 | - | - | - | - | - | - | 3 | 7 |
| Sm | 34697 | 22 | 10 | 6 | - | - | - | - | - | - | 2 | 4 |
| Pt | 58036 | 50 | 18 | 11 | - | 2 | - | 3 | 4 | 2 | 5 | 5 |
| At | 31921 | 39 c) | 10 | 6 | - | 6 | - | 1 | 3 | - | 8 | 5 |
| Vv | 30434 | 58 | 11 | 5 | - | 7 | - | 9 | 15 | 3 | 4 | 4 |
| Os | 66710 | 44 | 10 | 5 | 8 | - | 2 | 3 | - | - | 10 | 6 |
| Sb | 35899 | 49 | 12 | 5 | 11 | - | 3 | 3 | - | 1 | 8 | 6 |
| Total | | 294 | 79 | 46 | 19 | 15 | 5 | 19 | 22 | 6 | 45 | 37 |
| All | | 294 | 211 | | | | | | | | 83 | |
a) See Table 1 for full species names.
b) Genome size is measured as the number of protein coding genes in each genome.
c) AtCslA1 was not identified in our hmmsearch (see Methods for details), and thus the number of Csl genes in Arabidopsis is shown to be 39 instead of 40.
Figure 2. Estimation of evolutionary rates of different Csl families. If the Ka/Ks ratio for a protein is less than one (fewer amino acid replacements than silent base substitutions), the protein is under negative selection; if Ka/Ks > 1, the protein is under positive selection. We used model = 1 as implemented in codeml of PAML to compute Ka/Ks, which allows each gene in the tree to evolve at its own rate; therefore, within each family (tree), a different Ka/Ks value is obtained for each gene (see Methods for details). The distributions of the Ka/Ks values of the genes in each Csl family are plotted side by side, showing the minimum, the 25th percentile, the median, the 75th percentile and the maximum. The box widths are proportional to the square roots of the numbers of genes in the groups. A notch is drawn on each side of the box towards the median. For panel a) we used Csl genes from all seven land plants; for panels b), c) and d) we plotted genes from subsets of the seven genomes only, namely the five seed plants, the three dicot plants and the two monocot plants, respectively.
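The interpretation rule and the box-plot summary described in this legend can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names are hypothetical, the label for a ratio of exactly one is an assumption (the legend does not cover that case), and the quartile convention used here (`method="inclusive"`) may differ slightly from the one used by the plotting software behind Figure 2.

```python
import math
import statistics


def selection_regime(ka_ks: float) -> str:
    """Label a Ka/Ks ratio using the rule stated in the legend."""
    if ka_ks < 1:
        return "negative"  # fewer replacements than silent substitutions
    if ka_ks > 1:
        return "positive"
    return "neutral"  # assumption: the legend does not address Ka/Ks == 1


def five_number_summary(values):
    """Return (min, 25th percentile, median, 75th percentile, max)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def relative_box_width(gene_count: int) -> float:
    """Box width proportional to the square root of the group size."""
    return math.sqrt(gene_count)


# Made-up Ka/Ks values for illustration only; not data from the study.
example_family = [0.5, 0.1, 0.4, 0.2, 0.3]
summary = five_number_summary(example_family)
```

A family of 16 genes would get a box twice as wide as a family of 4 genes under this scaling, since `relative_box_width(16)` is 4.0 and `relative_box_width(4)` is 2.0.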
Expression of members of the CslJ family
| CslJ gene ID | NCBI accession numbers of ESTs | UniGene | Tissue/Organ |
|---|---|---|---|
| fgenesh1_pg.C_LG_X000708 | DB888819.1, CV257302.1, DB906752.1 | | Mixture of leaf, bud, stem, root |
| estExt_fgenesh1_pg_v1.C_LG_X0702 | DB885869.1, DN497067.1, BU871140.1, AJ772607.1, DN487448.1, AJ770380.1, DB903904.1, AJ772118.1 | | Dormant bud, mixture of leaf, bud, stem, root |
| Sb03g047220 | CF430961.1, CF431079.1 | | Nitrogen-deficient seedlings |
| GSVIVP00020164001 | CF211163.1, EE094868.1, EE097022.1, EC990611.1, CV100631.1, CF983720.1, EE086006.1, EC925887.1, DT021105.1, EC927377.1, CF211254.1, DT010825.1, CF210160.1, CN006709.1, CF515516.1, CF515427.1, CF210083.1, EE093327.1, EE093253.1, CV179236.1 | Vvi.14469 | Fruit; flower; leaf; mixed; cell culture |
| GSVIVP00020168001 | FC063595.1, EC948646.1, DT004980.1 | Vvi.20726 | Flower, leaf and root |
| GSVIVP00020169001 | EE094198.1, CF983803.1, EE100185.1 | | Leaf and berry |
Protein sequences of the CslJ members (first column) were used as queries to search the NCBI EST database. With an E-value cutoff of <= 1e-2, EST matches that are >98% identical to the query protein and come from the same species as the query were collected and are listed in the second column. These EST matches were manually checked at the NCBI website for UniGene links (third column) and tissue/organ expression information (fourth column), where available.
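The three filtering criteria in this note (E-value, identity and species) can be expressed as a simple predicate. The sketch below is an assumption-laden illustration: the `EstHit` record, its field names and the example accessions are hypothetical, and the source does not describe how the search output was actually parsed.

```python
from dataclasses import dataclass


@dataclass
class EstHit:
    """Hypothetical record for one EST match; field names are assumptions."""
    accession: str
    species: str
    evalue: float
    percent_identity: float


def keep_hit(hit: EstHit, query_species: str,
             max_evalue: float = 1e-2, min_identity: float = 98.0) -> bool:
    """Apply the stated criteria: E-value <= 1e-2, identity > 98%, same species."""
    return (hit.evalue <= max_evalue
            and hit.percent_identity > min_identity
            and hit.species == query_species)


def filter_hits(hits, query_species):
    """Return the accessions of the hits that pass all three criteria."""
    return [h.accession for h in hits if keep_hit(h, query_species)]


# Placeholder hits for illustration only; not results from the study.
hits = [
    EstHit("EST_A", "Vitis vinifera", 1e-50, 99.5),     # passes
    EstHit("EST_B", "Vitis vinifera", 1e-50, 97.0),     # identity too low
    EstHit("EST_C", "Sorghum bicolor", 1e-60, 100.0),   # different species
    EstHit("EST_D", "Vitis vinifera", 0.5, 99.9),       # E-value too high
]
kept = filter_hits(hits, "Vitis vinifera")
```

Only `EST_A` survives the filter here; the UniGene and tissue lookups that follow were done manually according to the note, so they are not modelled.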
Two sample (pair-wise) nonparametric Wilcoxon test P values (Csl family in the column vs. in the row)
| All plants (a) | CslA | CslC | CesA | CslD | Seed plants (b) | CslA | CslC | CesA | CslD |
|---|---|---|---|---|---|---|---|---|---|
| CslF | 0.1584* | 2.76e-05 | 7.44e-07 | 0.0009 | CslF | 0.2707* | 5.66e-06 | 1.26e-07 | 0.0065 |
| CslB | 3.36e-05 | 1.10e-06 | 5.56e-08 | 2.05e-06 | CslB | 3.52e-05 | 4.93e-10 | 1.78e-08 | 3.93e-08 |
| CslH | 0.0307 | 0.0012 | 0.0006 | 0.0033 | CslH | 0.0349 | 7.06e-05 | 0.0002 | 0.0020 |
| CslE | 0.0017 | 8.94e-07 | 3.64e-08 | 6.22e-06 | CslE | 0.0028 | 7.27e-09 | 3.43e-09 | 3.46e-06 |
| CslG | 3.97e-05 | 1.84e-07 | 2.94e-09 | 3.34e-07 | CslG | 5.26e-05 | 1.07e-10 | 5.71e-10 | 1.21e-08 |
| CslJ | 0.2187* | 0.0261 | 0.0156 | 0.0463 | CslJ | 0.2554* | 0.0178 | 0.0010 | 0.0805* |
| Dicot plants (c) | CslA | CslC | CesA | CslD | Monocot plants (d) | CslA | CslC | CesA | CslD |
| CslB | 0.0010 | 4.99e-08 | 4.97e-08 | 3.23e-08 | CslF | 0.3520* | 0.0008 | 8.69e-06 | 0.0521* |
| CslE | 0.0185 | 2.04e-06 | 9.67e-07 | 0.0001 | CslH | 0.0399 | 0.0019 | 0.0013 | 0.0063 |
| CslG | 0.0017 | 3.77e-09 | 2.13e-09 | 3.18e-08 | CslE | 0.0386 | 0.0024 | 0.0007 | 0.0112 |
| CslJ | 0.0933* | 0.0003 | 0.0006 | 0.0123 | | | | | |
In hypothesis testing, a P-value is calculated to indicate whether the data are consistent with the null hypothesis. Here, the genes of the Csl families in the columns are tested against those in the rows in terms of their Ka/Ks values. The null hypothesis is that the column is equal to the row, and the alternative hypothesis is that the column is larger than the row. If the P-value is less than 0.05, the null hypothesis is rejected and the alternative hypothesis is supported with statistical significance. This table shows that, except in a few cases (those marked with asterisks), the columns are always significantly larger than the rows.
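A one-sided two-sample test of this kind can be sketched with the standard library alone. The version below uses the Mann-Whitney U form of the Wilcoxon rank-sum test with a normal approximation, midranks for ties and a continuity correction. It is an assumption that this matches the software the authors used, which the note does not name; exact small-sample P-values would differ somewhat. The function name and the example values are hypothetical.

```python
import math


def rank_sum_greater(x, y):
    """One-sided rank-sum test. H1: values in x tend to be larger than in y.

    Returns (U, p) where U counts the (x, y) pairs with x > y (ties count 0.5)
    and p is the upper-tail P-value from a normal approximation.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all values tied: no evidence either way
    z = (u - n1 * n2 / 2 - 0.5) / math.sqrt(variance)  # continuity correction
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return u, p


# Made-up Ka/Ks values for illustration only; not data from the study.
group_a = [0.8, 0.9, 1.0, 1.1, 1.2]
group_b = [0.1, 0.2, 0.3, 0.4, 0.5]
u_stat, p_value = rank_sum_greater(group_a, group_b)
```

With these two fully separated groups the P-value falls below 0.05, so the null hypothesis of equality would be rejected in favour of the first group being larger; swapping the arguments gives a P-value close to one.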
Figure 3. The subtree of the CslE family and gene structures of the family members. This ML phylogeny is taken from Figure 1a. The gene structures were plotted using the GSDS server [71]. The branch lengths are scaled, i.e. proportional to the estimated number of molecular changes. A scale bar is shown under the tree. The bootstrap values indicate the confidence level of each grouping. The intron-exon structure is shown on the right. The intron phase indicates the position of the intron relative to a codon. If the intron is located between two codons, the phase is 0. If it is located within a codon (i.e. it splits the codon across two exons) and falls after the first base of the codon, the phase is 1; otherwise the phase is 2.
Figure 4. The subtree of the CslG/J families and gene structures of the family members. See the legend of Figure 3.
Figure 5. The subtree of the CslB/H families and gene structures of the family members. See the legend of Figure 3.
Figure 6. A phylogeny of the Csl families based on the multiple sequence alignment of 219 full-length protein sequences. Of these, 217 proteins were used in Figure 1a. Two cyanobacterial sequences (YP_322086.1 from Anabaena variabilis ATCC 29413 and NP_487797.1 from Nostoc sp. PCC 7120) were used as the out-group to root the phylogeny.
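The intron phase definition in the Figure 3 legend amounts to taking the number of coding bases upstream of the intron modulo 3. The sketch below illustrates this under the assumption that the exon lengths passed in count coding bases only (no untranslated regions); the function names and the example lengths are hypothetical and do not describe how the GSDS server computes phases.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Phase 0: between codons; 1: after the first base; 2: after the second."""
    if coding_bases_before_intron < 0:
        raise ValueError("number of coding bases cannot be negative")
    return coding_bases_before_intron % 3


def intron_phases(exon_coding_lengths):
    """Phases of the introns that follow each exon except the last one."""
    phases = []
    coding_bases_so_far = 0
    for length in exon_coding_lengths[:-1]:  # no intron after the final exon
        coding_bases_so_far += length
        phases.append(intron_phase(coding_bases_so_far))
    return phases


# Made-up exon lengths for illustration only; not a gene from the study.
example_phases = intron_phases([90, 100, 50, 60])
```

For these lengths the cumulative coding totals before the three introns are 90, 190 and 240 bases, which gives phases 0, 1 and 0.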