Daojun Yuan, Lili Tu, Xianlong Zhang.
Abstract
BACKGROUND: Cotton fiber is the world's leading natural fiber used in the manufacture of textiles. Gossypium is also the model plant in the study of polyploidization, evolution, cell elongation, cell wall development, and cellulose biosynthesis. Because of its superior fiber properties, G. barbadense L. is an ideal candidate for providing new genetic variation to improve fiber quality. However, little is known about the fiber development mechanisms of G. barbadense, and only a few molecular resources are available in GenBank.
Year: 2011 PMID: 21829504 PMCID: PMC3145671 DOI: 10.1371/journal.pone.0022758
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
EST sequence and assembly statistics.
| Total number of sequence reads | 11,180 |
| High-quality sequences (Q>20 and at least 100 bp in length) | 10,667 |
| After removal of vector, poly-A, contaminating microbial sequences, and very short sequences (<100 bp) (GI:GR706801–GR716890) | 10,090 |
| Sequences submitted to NCBI in 2006 (GI:EE592400–EE593286, EH122780, EH122781) | 889 |
| Average EST size after trimming (bp) | 643 |
| Longest sequence after trimming (bp) | 1184 |
| Total number of assembled sequences | 10,979 |
| Number of contigs | 1492 |
| Average number of ESTs in contigs | 4.4 |
| Number of singletons | 4360 |
| Number of unique sequences | 5852 |
| Average length of unique sequences (bp) | 706 |
| Average length of contigs (bp) | 915 |
| Average length of singletons (bp) | 633 |
| Longest length of unique sequences (bp) | 3214 |
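The derived rows of this table can be cross-checked from the raw counts. The following is a minimal sketch, not the authors' pipeline: the function and key names are illustrative, and it assumes that unique sequences are contigs plus singletons and that the average number of ESTs per contig is the number of assembled ESTs not left as singletons divided by the number of contigs.

```python
def assembly_summary(new_ests, earlier_ests, contigs, singletons):
    """Recompute derived assembly statistics from raw counts.

    Assumptions (not stated explicitly in the table):
    - assembled sequences = newly cleaned ESTs + ESTs submitted earlier
    - unique sequences    = contigs + singletons
    - ESTs in contigs     = assembled sequences - singletons
    """
    assembled = new_ests + earlier_ests
    ests_in_contigs = assembled - singletons
    return {
        "assembled_sequences": assembled,
        "unique_sequences": contigs + singletons,
        "ests_in_contigs": ests_in_contigs,
        "avg_ests_per_contig": ests_in_contigs / contigs,
    }


# Counts taken from the table above.
summary = assembly_summary(
    new_ests=10_090, earlier_ests=889, contigs=1_492, singletons=4_360
)
for key, value in summary.items():
    print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions the recomputed values agree with the reported 10,979 assembled sequences, 5852 unique sequences, and an average of about 4.4 ESTs per contig.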
Figure 1. Distribution of 1492 contigs based on the number of clustered ESTs.
Twenty highly abundant genes in the 10,979 ESTs.
| Contig no. | Contig length | No. ESTs | Putative function |
| CO000020 | 3214 | 410 | Putative senescence-associated protein |
| CO000092 | 1882 | 91 | α-tubulin |
| CO000193 | 2009 | 91 | E6-3 protein kinase |
| CO000009 | 1202 | 88 | Fasciclin-like arabinogalactan protein |
| CO000013 | 1807 | 66 | α-tubulin |
| CO000006 | 669 | 65 | High-glycine tyrosine keratin-like protein |
| CO000128 | 643 | 63 | Fiber protein Fb28 |
| CO000056 | 1047 | 56 | Fasciclin-like arabinogalactan protein |
| CO000170 | 936 | 54 | Lipid-binding protein |
| CO000145 | 1937 | 52 | α-expansin |
| CO000026 | 1207 | 43 | Protodermal factor |
| CO000218 | 923 | 43 | Fblate-2 gene |
| CO000075 | 1819 | 42 | α-tubulin 6 |
| CO000083 | 823 | 36 | Glycine-rich RNA-binding protein |
| CO000298 | 1873 | 33 | 18S ribosomal RNA gene |
| CO000364 | 1257 | 32 | Fblate-2 |
| CO000017 | 1296 | 29 | Chitinase-like protein |
| CO000570 | 1066 | 29 | Fasciclin-like arabinogalactan protein 3 |
| CO000095 | 1300 | 28 | Dehydration-induced protein RD22-like protein |
| CO000185 | 1126 | 27 | Ubiquitin |
Figure 2. Functional classifications for the 5852 unigenes that were assigned with GO terms (second level GO terms).
The three GO categories, biological process (a), molecular function (b), and cellular component (c) are presented.
The most frequent InterPro families found in the G. barbadense EST library.
| InterPro no. | Description | Total of unigenes | Total of ESTs |
| IPR016040 | NAD(P)-binding | 39 | 54 |
| IPR013032 | EGF-like region, conserved site | 37 | 52 |
| IPR001007 | von Willebrand factor, type C | 34 | 42 |
| IPR001806 | Ras GTPase | 34 | 26 |
| IPR005225 | Small GTP-binding protein | 32 | 59 |
| IPR000719 | Protein kinase, core | 31 | 27 |
| IPR006058 | 2Fe-2S ferredoxin, iron sulfur-binding site | 31 | 38 |
| IPR011009 | Protein kinase-like | 29 | 32 |
| IPR012335 | Thioredoxin fold | 29 | 48 |
| IPR012336 | Thioredoxin-like fold | 27 | 45 |
| IPR000217 | Tubulin | 26 | 114 |
| IPR009072 | Histone-fold | 26 | 39 |
| IPR000608 | Ubiquitin-conjugating enzyme, E2 | 25 | 49 |
| IPR016135 | Ubiquitin-conjugating enzyme/RWD-like | 25 | 49 |
| IPR000020 | Anaphylatoxin/fibulin | 24 | 29 |
| IPR012677 | Nucleotide-binding, α-β plait | 24 | 42 |
| IPR017442 | Serine/threonine protein kinase-related | 24 | 26 |
| IPR013753 | Ras | 23 | 36 |
| IPR000626 | Ubiquitin | 22 | 30 |
| IPR007125 | Histone core | 22 | 34 |
| IPR000504 | RNA recognition motif, RNP-1 | 21 | 37 |
*The list includes the families with at least 21 unigenes.
The distribution of KEGG pathways.
| Category | Pathway | Total of unigenes | Percent of unigenes (%) | Percent of categories (%) |
| Metabolism | Carbohydrate metabolism | 369 | 6.3 | 28.3 |
| Metabolism | Energy metabolism | 183 | 3.1 | 14.0 |
| Metabolism | Lipid metabolism | 131 | 2.2 | 10.1 |
| Metabolism | Nucleotide metabolism | 49 | 0.8 | 3.8 |
| Metabolism | Amino acid metabolism | 234 | 4.0 | 18.0 |
| Metabolism | Metabolism of other amino acids | 67 | 1.1 | 5.1 |
| Metabolism | Glycan biosynthesis and metabolism | 16 | 0.3 | 1.2 |
| Metabolism | Metabolism of cofactors and vitamins | 60 | 1.0 | 4.6 |
| Metabolism | Metabolism of terpenoids and polyketides | 38 | 0.7 | 2.9 |
| Metabolism | Biosynthesis of other secondary metabolites | 69 | 1.2 | 5.3 |
| Metabolism | Xenobiotics biodegradation and metabolism | 87 | 1.5 | 6.7 |
| GIP | Transcription | 51 | 0.9 | 9.9 |
| GIP | Translation | 210 | 3.6 | 40.7 |
| GIP | Folding, sorting, and degradation | 229 | 3.9 | 44.4 |
| GIP | Replication and repair | 26 | 0.4 | 5.0 |
| EIP | Membrane transport | 3 | 0.1 | 2.8 |
| EIP | Signal transduction | 101 | 1.7 | 95.3 |
| EIP | Signaling molecules and interaction | 2 | 0.0 | 1.9 |
| CP | Transport and catabolism | 153 | 2.6 | 44.7 |
| CP | Cell motility | 19 | 0.3 | 5.6 |
| CP | Cell growth and death | 107 | 1.8 | 31.3 |
| CP | Cell communication | 63 | 1.1 | 18.4 |
| OS | Immune system | 66 | 1.1 | 21.0 |
| OS | Endocrine system | 80 | 1.4 | 25.5 |
| OS | Circulatory system | 24 | 0.4 | 7.6 |
| OS | Digestive system | 18 | 0.3 | 5.7 |
| OS | Excretory system | 21 | 0.4 | 6.7 |
| OS | Nervous system | 32 | 0.6 | 10.2 |
| OS | Sensory system | 18 | 0.3 | 5.7 |
| OS | Development | 12 | 0.2 | 3.8 |
| OS | Environmental adaptation | 43 | 0.7 | 13.7 |
GIP: Genetic Information Processing; EIP: Environmental Information Processing; CP: Cellular Processes; OS: Organismal Systems.
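The two percentage columns of the KEGG table appear to use different denominators. The sketch below illustrates this for the four rows from Transcription to Replication and repair, which are assumed here to make up the GIP group. It further assumes that "Percent of unigenes" is taken against all 5852 unigenes and that "Percent of categories" is taken against the sum of the group's own rows. The names are illustrative only.

```python
TOTAL_UNIGENES = 5852  # assumption: denominator for "Percent of unigenes"

# Unigene counts for the rows assumed to belong to the GIP group.
gip_counts = {
    "Transcription": 51,
    "Translation": 210,
    "Folding, sorting, and degradation": 229,
    "Replication and repair": 26,
}


def pathway_percentages(counts, total_unigenes=TOTAL_UNIGENES):
    """Return {pathway: (percent of all unigenes, percent within the group)}."""
    category_total = sum(counts.values())
    return {
        name: (100 * n / total_unigenes, 100 * n / category_total)
        for name, n in counts.items()
    }


gip_percentages = pathway_percentages(gip_counts)
for name, (pct_unigenes, pct_category) in gip_percentages.items():
    print(f"{name}: {pct_unigenes:.1f}% of unigenes, {pct_category:.1f}% of category")
```

With these denominators the recomputed values match the tabulated ones to one decimal place, which supports the assumed grouping but does not prove it.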
Figure 3. SimiTri profile of UniGenes.
The 5852 unigenes were searched against nucleotide databases of ESTs (a) using blastn, or against protein databases (b, c, d, e) using blastx (E-value ≤ 10⁻⁵). Points are color-coded by the highest BLAST score: red, >300; yellow, >200; green, >150; blue, >100; purple, <100.
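The color coding in the caption amounts to a simple threshold lookup. A minimal sketch follows; the function name is hypothetical, and because the caption does not say how a score of exactly 100 is colored, this sketch assumes it falls into the purple bin.

```python
def simitri_color(score: float) -> str:
    """Map the highest BLAST score of a unigene to the caption's color code."""
    if score > 300:
        return "red"
    if score > 200:
        return "yellow"
    if score > 150:
        return "green"
    if score > 100:
        return "blue"
    # Assumption: the caption only lists "<100", so a score of exactly 100
    # is treated as purple here.
    return "purple"


for example_score in (350, 250, 175, 120, 80):
    print(example_score, simitri_color(example_score))
```

Checking the thresholds from highest to lowest keeps each bin's upper bound implicit, so a score of 300 falls through to yellow rather than red.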
The categories of cell wall-related genes.
| Categories | Total of ESTs | Total of unigenes | Redundancy |
| 1.1 Sugar 1-kinases (S1K) | 1 | 1 | 1.0 |
| 1.2 Nucleotide-sugar pyrophosphorylases | 16 | 10 | 1.6 |
| 1.3 Nucleotide-sugar interconversion enzymes | 186 | 36 | 5.2 |
| 2.1 Cellulose and galactomannan | 31 | 11 | 2.8 |
| 2.2 Hemicellulose | 33 | 19 | 1.7 |
| 2.3 Callose | 5 | 4 | 1.3 |
| 2.4 Other glycosyl transferases | 64 | 49 | 1.3 |
| 3.1 Cell expansion | 128 | 31 | 4.1 |
| 3.2 Hemicellulose reassembly | 24 | 9 | 2.7 |
| 3.3 Glycoside hydrolases | 79 | 37 | 2.1 |
| 3.4 Lyases | 29 | 8 | 3.6 |
| 3.5 Esterases | 33 | 20 | 1.7 |
| 4.1 Hydroxyproline-rich glycoproteins (HRGP) | 8 | 6 | 1.3 |
| 4.2 Leucine-rich repeat extensins (LRX) | 30 | 21 | 1.4 |
| 4.3 Proline-rich proteins (PRP) | 3 | 2 | 1.5 |
| 4.4 Glycine-rich proteins (GRP) | 0 | 0 | 0.0 |
| 4.5 Arabinogalactan proteins (AGP) | 237 | 12 | 19.8 |
| 5.1 Glycoprotein fucosyltransferases (GFT) | 1 | 1 | 1.0 |
| 5.2 Glycosyl transferases 31A (GT31a) | 3 | 3 | 1.0 |
| 5.3 Glycosyl transferases 31B (GT31b) | 4 | 2 | 2.0 |
| Total | 915 | 282 | 3.2 |
The most abundant putative transcription factors (TFs).
| TF family | TF description | Total of ESTs | Total of unigenes | Redundancy | Percent (%) |
| bZIP | Basic leucine zipper (bZIP) motif | 71 | 71 | 1.0 | 9.6 |
| MYB related | N-terminal myb-domain | 370 | 65 | 5.7 | 8.8 |
| bHLH | basic/helix-loop-helix domain | 50 | 50 | 1.0 | 6.8 |
| C2H2 | Zinc finger, C2H2 type | 46 | 46 | 1.0 | 6.3 |
| MYB | Myb-like DNA-binding domain | 44 | 44 | 1.0 | 6.0 |
| C3H | Zinc finger, C-x8-C-x5-C-x3-H type | 43 | 43 | 1.0 | 5.8 |
| NAC | No apical meristem (NAM) protein | 124 | 32 | 3.9 | 4.3 |
| WRKY | WRKY DNA-binding domain | 31 | 31 | 1.0 | 4.2 |
| S1Fa-like | negative cis-element S1F binding site | 68 | 28 | 2.4 | 3.8 |
| G2-like | Golden 2-like (GLK) | 25 | 25 | 1.0 | 3.4 |
| ERF | single AP2/ERF domain | 22 | 22 | 1.0 | 3.0 |
| Trihelix | Trihelix DNA-binding domain | 50 | 22 | 2.3 | 3.0 |
| Dof | DNA binding with one zinc finger | 19 | 19 | 1.0 | 2.6 |
| HD-ZIP | HD domain with a leucine zipper motif | 17 | 17 | 1.0 | 2.3 |
| ARF | Auxin response factor | 16 | 16 | 1.0 | 2.2 |
| M-type | MADS-box transcription factors | 16 | 16 | 1.0 | 2.2 |
| FAR1 | Far-Red-impaired Response 1 | 15 | 15 | 1.0 | 2.0 |
| HB-other | Homeobox domain | 14 | 14 | 1.0 | 1.9 |
| GRAS | Named after the three initially identified members: GAI, RGA, and SCR | 12 | 12 | 1.0 | 1.6 |
| MIKC | MIKC-type MADS-box genes, which include three additional domains: intervening (I), keratin-like coiled-coil (K), and C-terminal (C) | 12 | 12 | 1.0 | 1.6 |
| NF-YC | Nuclear Factor Y subunits C proteins | 41 | 12 | 3.4 | 1.6 |
| ARR-B | Arabidopsis response regulators (ARRs) with a Myb-like DNA-binding domain (ARRM) | 10 | 10 | 1.0 | 1.4 |
| NF-X1 | NF-X1 type zinc finger | 40 | 10 | 4.0 | 1.4 |
Redundancy is (Total of ESTs)/(Total of Unigenes).
Percent is (Total of unigenes)/(Total of putative TFs, 736).
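The two footnote formulas can be applied directly to the table rows. The sketch below uses illustrative names and takes its counts from three rows of the table; the total of 736 putative TFs comes from the footnote.

```python
TOTAL_PUTATIVE_TFS = 736  # total number of putative TFs given in the footnote


def tf_family_stats(total_ests, total_unigenes, total_tfs=TOTAL_PUTATIVE_TFS):
    """Return (redundancy, percent) for one TF family.

    redundancy = ESTs / unigenes
    percent    = 100 * unigenes / total putative TFs
    """
    redundancy = total_ests / total_unigenes
    percent = 100 * total_unigenes / total_tfs
    return redundancy, percent


# (Total of ESTs, Total of unigenes) for three families from the table.
families = {
    "bZIP": (71, 71),
    "MYB related": (370, 65),
    "NF-YC": (41, 12),
}

for name, (ests, unigenes) in families.items():
    redundancy, percent = tf_family_stats(ests, unigenes)
    print(f"{name}: redundancy {redundancy:.1f}, {percent:.1f}% of putative TFs")
```

A redundancy of 1.0 means that every unigene in the family is represented by a single EST, while higher values, such as for MYB related, indicate families whose members were sampled repeatedly in the library.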
Features of SSRs.
| Total number of sequences examined | 5852 |
| Total number of identified SSRs | 497 |
| Number of SSR-containing sequences | 460 |
| Number of sequences containing more than one SSR | 31 |
| Total size of examined sequences (kb) | 4125.7 |
| Average distance (kb) | 8.3 |
| Number of dinucleotide repeats | 94 (18.9%) |
| Number of trinucleotide repeats | 187 (37.6%) |
| Number of tetranucleotide repeats | 37 (7.4%) |
| Number of pentanucleotide repeats | 98 (19.7%) |
| Number of hexanucleotide repeats | 81 (16.3%) |
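The summary figures in this table can be recomputed from the counts. The sketch below assumes that the average distance is the total size of the examined sequences divided by the number of identified SSRs, and that each percentage is a motif class's share of all 497 SSRs. The variable names are illustrative.

```python
TOTAL_SSRS = 497
TOTAL_SIZE_KB = 4125.7

# Number of SSRs per repeat-unit length, from the table above.
motif_counts = {
    "dinucleotide": 94,
    "trinucleotide": 187,
    "tetranucleotide": 37,
    "pentanucleotide": 98,
    "hexanucleotide": 81,
}

# Sanity check: the motif classes should account for every identified SSR.
classes_sum_to_total = sum(motif_counts.values()) == TOTAL_SSRS

# Assumption: average distance = examined sequence length / number of SSRs.
avg_distance_kb = TOTAL_SIZE_KB / TOTAL_SSRS

motif_share = {name: 100 * n / TOTAL_SSRS for name, n in motif_counts.items()}

print(f"classes sum to total: {classes_sum_to_total}")
print(f"average distance: {avg_distance_kb:.1f} kb")
for name, share in motif_share.items():
    print(f"{name}: {share:.1f}%")
```

Under these assumptions the result is roughly one SSR per 8.3 kb of unigene sequence, with trinucleotide repeats the most common class.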