Yan Zhou, Jiabin Tang, Michael G Walker, Xiuqing Zhang, Jun Wang, Songnian Hu, Huayong Xu, Yajun Deng, Jianhai Dong, Lin Ye, Li Lin, Jun Li, Xuegang Wang, Hao Xu, Yibin Pan, Wei Lin, Wei Tian, Jing Liu, Liping Wei, Siqi Liu, Huanming Yang, Jun Yu, Jian Wang.
Abstract
Expressed Sequence Tag (EST) analysis has pioneered genome-wide gene discovery and expression profiling. To establish a gene expression index for indica rice, we sequenced and analyzed 86,136 ESTs from nine rice cDNA libraries derived from the super hybrid cultivar LYP9 and its parental cultivars. We assembled these ESTs into 13,232 contigs, leaving 8,976 singletons. Overall, 7,497 of these sequences were similar to existing sequences in GenBank and 14,711 were novel. The sequences were classified by molecular function, biological process and pathway according to the Gene Ontology. We compared our ESTs with the 95,000 publicly available ESTs from japonica and found little sequence variation, despite the large difference between the two genome sequences. We then assembled the combined 173,000 rice ESTs for further analysis. Using the pooled ESTs, we compared gene expression in metabolic pathways between rice and Arabidopsis according to KEGG. After confirming by sequence coverage that the libraries are comparable, we further profiled gene expression patterns in different tissues, at different developmental stages, and in a conditional sterile mutant. We also identified some possible library-specific genes and a number of enzymes and transcription factors that contribute to rice development.
Year: 2003 PMID: 15626331 PMCID: PMC5172415 DOI: 10.1016/s1672-0229(03)01005-2
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Quality Assessment of the cDNA Libraries
| Library | rRNA | Mitochondrial mRNA | G3PD | Actin | Tubulin | MADS |
|---|---|---|---|---|---|---|
|  | 0.25% | 4.90% | 0.56% | 0.29% | 0.09% | 0.06% |
|  | 0.66% | 0.78% | 0.71% | 0.20% | 0.20% | 0.00% |
|  | 1.99% | 0.18% | 0.50% | 0.36% | 0.19% | 0.06% |
|  | 0.09% | 0.31% | 0.78% | 0.76% | 0.83% | 0.34% |
|  | 0.64% | 0.65% | 0.76% | 0.50% | 1.10% | 0.00% |
|  | 0.40% | 0.22% | 0.44% | 0.66% | 1.04% | 0.13% |
|  | 0.20% | 0.30% | 0.55% | 0.59% | 1.31% | 0.10% |
|  | 0.18% | 0.31% | 0.92% | 0.62% | 2.25% | 0.40% |
|  | 0.35% | 0.31% | 0.78% | 0.17% | 0.20% | 0.10% |
|  | 0.53% | 0.88% | 0.67% | 0.46% | 0.80% | 0.13% |
|  | 0.58% | 1.52% | 0.16% | 0.21% | 0.72% | 0.14% |
| 0.24 | 0.46 | 0.89 | 1.08 | |||
Fig. 1. Left: length distribution of all EST sequences that passed our quality check. The X-axis is sequence length; the Y-axis is the number of sequences in each 4-bp length bin. Note that our filter discarded all ESTs shorter than 100 bp after head/tail trimming and vector masking. Right: distribution of the average quality score of all EST sequences that passed our quality check. The X-axis is the average sequence quality score; the Y-axis is the number of sequences in each 0.5-wide score bin.
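The length filter and the two histograms in Fig. 1 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each read arrives already head/tail-trimmed and vector-masked, paired with its per-base quality scores, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

MIN_LEN = 100    # ESTs shorter than this (after trimming/masking) are discarded
LEN_BIN = 4      # length histogram step, in bp
QUAL_BIN = 0.5   # average-quality histogram step


def passes_filter(seq: str) -> bool:
    """Keep ESTs of at least MIN_LEN bp (seq is assumed to be trimmed and masked already)."""
    return len(seq) >= MIN_LEN


def histograms(reads):
    """reads: iterable of (sequence, per-base quality scores).
    Returns (length_hist, quality_hist), each mapping a bin's lower edge to a read count."""
    length_hist = Counter()
    quality_hist = Counter()
    for seq, quals in reads:
        if not passes_filter(seq):
            continue
        # Floor each value to the lower edge of its bin.
        length_hist[(len(seq) // LEN_BIN) * LEN_BIN] += 1
        avg_q = mean(quals)
        quality_hist[(avg_q // QUAL_BIN) * QUAL_BIN] += 1
    return dict(length_hist), dict(quality_hist)
```

Binning by flooring to the lower edge matches the "increase step" description; plotting the two dictionaries as bar charts reproduces the layout of the figure.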
Fig. 2. We ran pairwise sequence comparisons within each library using BLASTN and grouped sequences sharing more than 90% overall similarity. The X-axis is the group size (the number of sequences in a group); the Y-axis is the logarithm of the number of groups of each size. We applied this check to all nine libraries (Lib1 to Lib9) and used five libraries from CGAP (http://cgap.nci.nih.gov/) as controls. These control libraries are non-normalized and were constructed by Krizman protocol 1 (Lib281), LTI non-normalized (Lib6346), Soares non-normalized (Lib185) and Krizman protocol 2 (Lib675 and Lib774). Judged by this redundancy profile, our libraries compare favorably with the controls.
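The within-library redundancy check in Fig. 2 amounts to clustering ESTs by pairwise similarity and tallying group sizes. A minimal sketch, assuming the all-against-all BLASTN hits have already been parsed into (query, subject, percent similarity) tuples and that groups are formed by single linkage (any hit above 90% joins two groups); the helper names and the base-10 logarithm are assumptions.

```python
import math
from collections import Counter

SIM_THRESHOLD = 90.0  # percent overall similarity required to join two ESTs


def group_sizes(est_ids, hits):
    """est_ids: all EST ids in one library.
    hits: iterable of (query_id, subject_id, percent_similarity) from all-vs-all BLASTN.
    Returns the size of every single-linkage group, found with union-find."""
    parent = {e: e for e in est_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for q, s, sim in hits:
        if q != s and sim > SIM_THRESHOLD:  # ignore self-hits
            rq, rs = find(q), find(s)
            if rq != rs:
                parent[rq] = rs
    return list(Counter(find(e) for e in est_ids).values())


def log_group_counts(sizes):
    """Map each group size to log10(number of groups of that size), the Fig. 2 Y-axis."""
    return {size: math.log10(n) for size, n in sorted(Counter(sizes).items())}
```

A well-made non-normalized library should show the count of groups falling off steeply with group size; a few very large groups indicate a highly redundant library.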
Fig. 3. Contribution of our EST data to the rice gene set. The ESTs in this study were combined with public rice ESTs and then aligned with rice indica genomic scaffolds by BLASTN, with an E-value threshold of 1E-5. The Y-axis for the circles is the total matched genomic sequence length. The Y-axis for the stars is the number of contigs in the progressive assembly. The rectangles show the number of progressively generated contigs that hit Arabidopsis genes; they share the Y-axis with the stars, but the hit counts are multiplied by 10 to make the points easier to read. To avoid errors arising from genome duplication, which is common in Arabidopsis and very likely in rice, each contig or EST was allowed to align with the genomic sequence only once.
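The rule that each contig/EST aligns only once can be approximated by keeping a single best hit per query and then measuring how much genome the retained hits cover. This sketch assumes BLAST+ tabular output (`-outfmt 6`, standard 12 columns), picks the best hit by bit score, and merges overlapping hits on each scaffold before summing; none of these choices is stated in the caption, and the function names are hypothetical.

```python
from collections import defaultdict

E_CUTOFF = 1e-5


def best_hits(tabular_lines):
    """Keep one alignment per query: the highest bit score with E-value < E_CUTOFF.
    Columns (-outfmt 6): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    best = {}
    for line in tabular_lines:
        f = line.rstrip("\n").split("\t")
        q, s = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        evalue, bits = float(f[10]), float(f[11])
        if evalue >= E_CUTOFF:
            continue
        if q not in best or bits > best[q][3]:
            # Store genomic coordinates low-to-high so minus-strand hits work too.
            best[q] = (s, min(sstart, send), max(sstart, send), bits)
    return best


def matched_genomic_length(best):
    """Total genomic length covered by the retained hits, merging overlaps per scaffold."""
    by_scaffold = defaultdict(list)
    for s, start, end, _ in best.values():
        by_scaffold[s].append((start, end))
    total = 0
    for intervals in by_scaffold.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:           # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                              # disjoint: close the current interval
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total
```

Re-running this on progressively larger EST sets would trace a saturation curve like the circles in the figure.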
EST Assembly Evaluation
| 85.93% | 90.33% | |
| 91.56% | 89.56% | |
| 86.07% | 90.76% | |
| 77.73% | 79.62% | |
| 80.50% | 80.55% | |
| 79.88% | 81.47% | |
| 87.02% | 92.28% | |
| 86.42% | 88.74% | |
| 88.76% | 90.10% | |
| 87.89% | 90.89% | |
| 91.82% | 89.67% | |
| 88.61% | 91.17% | |
Fig. 4. To detect chimeric contigs, we aligned both the raw data (ESTs) and the contig consensus sequences to the rice genome. This figure shows the distribution of putative intron lengths obtained by aligning ESTs and contig consensus sequences with rice indica genomic scaffolds using BLASTN (E-value 1E-15). HSPs with an identity length greater than 70% of the contig/EST length were chosen, and the gaps between two HSPs were taken as putative introns. We found that 524 contigs have introns longer than 2 kb but shorter than 5 kb, and 237 contigs have introns longer than 5 kb.
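The intron-length calculation behind Fig. 4 can be sketched as below. Assumptions: HSPs for one contig/EST on one scaffold are given as (qstart, qend, sstart, send, identities) tuples on a single strand; the 70% criterion is applied to the summed identities of all HSPs; and each contig is binned by its longest intron. The function names are hypothetical.

```python
def putative_introns(hsps, query_len, min_cover=0.70):
    """hsps: list of (qstart, qend, sstart, send, identities) for one contig/EST on one
    scaffold, all on the same strand. Returns the genomic gap lengths between consecutive
    HSPs, or [] if the identical bases cover no more than min_cover of the query."""
    if sum(h[4] for h in hsps) <= min_cover * query_len:
        return []
    # Order HSPs along the query, then measure the genomic gap between neighbours.
    ordered = sorted(hsps, key=lambda h: h[0])
    introns = []
    for prev, nxt in zip(ordered, ordered[1:]):
        prev_lo, prev_hi = sorted((prev[2], prev[3]))
        nxt_lo, nxt_hi = sorted((nxt[2], nxt[3]))
        # Plus strand: next HSP lies downstream; minus strand: upstream. max() picks the real gap.
        gap = max(nxt_lo - prev_hi, prev_lo - nxt_hi) - 1
        if gap > 0:
            introns.append(gap)
    return introns


def classify(contig_introns):
    """contig_introns: dict contig_id -> list of intron lengths.
    Returns (contigs whose longest intron is between 2 and 5 kb, contigs with one over 5 kb)."""
    mid = sum(1 for v in contig_introns.values() if v and 2000 < max(v) < 5000)
    long_ = sum(1 for v in contig_introns.values() if v and max(v) > 5000)
    return mid, long_
```

Unusually long putative introns flag contigs whose two halves may come from distant genomic loci, which is why they serve as a chimera check.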
Fig. 5. Comparison of Gene Ontology (GO) categories across three datasets: predicted genes on the rice indica genome (53,398 genes in total) classified by GO indices for Swiss-Prot proteins; EST contigs (32,489 contigs from 86,136 ESTs) classified by GO indices for Swiss-Prot proteins; and EST contigs classified by GO indices for Arabidopsis proteins. The Y-axis lists GO categories in molecular function and biological process; the X-axis is the number of genes/contigs assigned to each category, shown on a log scale for readability.
Fig. 6. Differences in pathway coverage between Arabidopsis thaliana and rice ESTs. A total of 180,602 rice ESTs were used, covering different cultivars (LYP9, PA64s, 93-11), tissues (leaf, panicle) and developmental stages (trefoil, tillering, booting). A total of 99,426 Arabidopsis ESTs were used, covering different tissues (dry seeds, green siliques, inflorescence) and developmental stages (cycling cells, greenhouse plants, two- to six-week-old plants). We chose non-normalized libraries to ensure that the results are comparable. Each column represents a metabolic pathway defined in KEGG. Bar height is the percentage of enzymes in that pathway that found matches in Arabidopsis thaliana (light) and rice (black) ESTs. To find matches, we ran BLASTX of the rice and Arabidopsis ESTs against the full-length CDSs defined in KEGG, with an E-value threshold of 1E-10 and an overall identity threshold of 30%.
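The per-pathway percentages plotted in Fig. 6 reduce to a set intersection once the BLASTX hits are filtered. A minimal sketch, assuming the hits are already parsed into (EST id, KEGG gene id, E-value, percent identity) tuples and that pathway-to-enzyme membership has been loaded from KEGG; treating the 30% identity cutoff as inclusive and the function name are assumptions.

```python
def pathway_coverage(pathway_genes, hits, max_evalue=1e-10, min_identity=30.0):
    """pathway_genes: dict pathway_id -> set of KEGG gene (enzyme) ids in that pathway.
    hits: iterable of (est_id, kegg_gene_id, evalue, percent_identity) from BLASTX.
    Returns dict pathway_id -> percentage of the pathway's enzymes matched by >= 1 EST."""
    # Enzymes hit by at least one EST that passes both thresholds.
    matched = {gene for _, gene, ev, ident in hits
               if ev < max_evalue and ident >= min_identity}
    return {
        pw: 100.0 * len(genes & matched) / len(genes)
        for pw, genes in pathway_genes.items() if genes  # skip empty pathways
    }
```

Running the function once on rice hits and once on Arabidopsis hits, with the same pathway table, gives the paired dark and light bars for each pathway.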
Genes Most Differentially Expressed between 93-11 (Lib 5) and LYP9 (Lib 3) Varieties
| Contig Name | BLASTN Annotation | BLASTX Annotation | Change Folds | Chi-square Test |
|---|---|---|---|---|
| Contig13918 Contig5428 | (Q40677) Fructose-bisphosphate aldolase, chloroplast precur | 0.24 | 1.74E-10 | |
| Contig6621 Contig13700 | Unknown | 39.30 | 5.6E-10 | |
| Contig13245 Contig13907 | Unknown | (P51327) Cell division protein ftsH homolog (EC 3.4.24.-) | 0.11 | 4.96E-09 |
| Contig13893 Contig698 | (P18566) Ribulose bisphosphate carboxylase small chain A | 0.03 | 7.15E-09 | |
| Contig13906 | Unknown | 33.45 | 1.13E-08 | |
| Contig13704 | Unknown | 12.82 | 1.33E-08 | |
| Contig13764 | (P27322) Heat shock cognate 70 kDa protein 2 | 0.12 | 5.36E-08 | |
| Contig13913 | Unknown | 11.43 | 1.12E-07 | |
| Contig13767 | Unknown | 2.72 | 2.73E-07 | |
| Contig13736 | (Q43848) Transketolase, chloroplast precursor (EC 2.2.1.1) | 0.11 | 3.02E-07 | |
| Contig13914 | (Q03200) Light regulated protein precursor | 8.57 | 3.83E-07 | |
| Contig13680 | Unknown | Unknown | 0.04 | 1.87E-06 |
| Contig13695 | (P22953) Heat shock cognate 70 kDa protein 1 (Hsc70.1) | 0.08 | 2.58E-06 | |
| Contig13920 | (P93431) Ribulose bisphosphate carboxylase/oxygenase activa | 0.17 | 3.23E-06 | |
| Contig13613 | Unknown | Unknown | 20.90 | 7.49E-06 |
| Contig727 Contig9659 | (P06671) Chlorophyll A-B binding protein, chloroplast precu | 0.34 | 8.61E-06 | |
| Contig13708 | (P26302) Phosphoribulokinase, chloroplast precursor (EC 2.7) | 0.20 | 9.26E-06 | |
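The differential-expression tables report a fold change and a chi-square test for each contig, but the exact formulas are not given. The sketch below shows one common way to compute both from EST counts, assuming the fold change is the ratio of library-size-normalized counts and the test is a Pearson chi-square on a 2x2 table (contig ESTs vs. all other ESTs in each library) with 1 degree of freedom and no continuity correction. The function names are hypothetical.

```python
import math


def fold_change(count_a, total_a, count_b, total_b):
    """Relative abundance of a contig in library A divided by that in library B,
    each count normalized by the library's total EST number (count_b must be > 0)."""
    return (count_a / total_a) / (count_b / total_b)


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) on the 2x2 table
    [[count_a, total_a - count_a], [count_b, total_b - count_b]].
    Returns (statistic, p_value)."""
    observed = [[count_a, total_a - count_a], [count_b, total_b - count_b]]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row)
    # Sum of (observed - expected)^2 / expected over the four cells.
    stat = sum(
        (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2) for j in range(2)
    )
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, a contig with 30 of 100 ESTs in one library and 10 of 100 in another gives a statistic of 12.5 and a p-value of about 4.1E-4. A contig absent from library B would need a pseudocount before its fold change could be computed.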
LYP9 Genes Most Differentially Expressed between Tillering (Lib 3) and Trefoil (Lib 2) Stages
| MasterContig Name | BLASTN Annotation | BLASTX Annotation | Change Folds | Chi-square Test |
|---|---|---|---|---|
| Contig13767 | Unknown | 0.10 | 6.34E-15 | |
| Contig13718 Contig13638 | (P00228) Ferredoxin, chloroplast precursor | 9.24 | 3.57E-13 | |
| Contig13727 Contig12420 | (P36886) Photosystem I reaction center subunit X, chloropla | 8.56 | 1.00E-10 | |
| Contig27 Contig13694 | Unknown | Unknown | 15.73 | 2.84E-10 |
| Contig6621 Contig13700 | Unknown | 0.05 | 7.59E-09 | |
| Contig13904 | Unknown | (Q40070) Photosystem II 10 kDa polypeptide, chloroplast pre | 10.84 | 8.03E-09 |
| Contig13906 | Unknown | 0.03 | 3.20E-08 | |
| Contig13913 | Unknown | 0.06 | 8.52E-08 | |
| Contig13751 | (P40880) Carbonic anhydrase, chloroplast precursor (EC 4.2.) | 4.98 | 8.85E-08 | |
| Contig13911 | (P06671) Chlorophyll A-B binding protein, chloroplast pre | 11.90 | 9.67E-08 | |
| Contig13920 | (P93431) Ribulose bisphosphate carboxylase/oxygenase activa | 7.23 | 1.00E-07 | |
| Contig13704 | Unknown | 0.11 | 1.42E-07 | |
| Contig13901 | (P18567) Ribulose bisphosphate carboxylase small chain C | 22.95 | 3.23E-06 | |
| Contig13546 Contig7253 | (P54773) Photosystem II 22 kDa protein, chloroplast pre | 12.75 | 4.24E-06 | |
| Contig12144 Contig13905 | (P27523) Chlorophyll A-B binding protein of LHCII type III | 7.65 | 4.82E-06 | |
| Contig13926 | (P27519) Chlorophyll A-B binding protein, chloroplast pre | 2.59 | 5.92E-06 | |
| Contig13723 Contig150 | Unknown | Unknown | 0.09 | 7.37E-06 |
| Contig13715 Contig1541 | Unknown | (P27522) Chlorophyll A-B binding protein 8, chloroplast pre | 5.74 | 7.54E-06 |
| Contig13576 | Unknown | 12.11 | 8.20E-06 | |
| Contig13914 | (Q03200) Light regulated protein precursor | 0.19 | 9.04E-06 | |
| Contig4926 Contig13475 | (P42815) Ribonuclease 3 precursor (EC 3.1.27.1) | 7.33 | 9.11E-06 | |
| Contig13604 | Unknown | 7.33 | 9.11E-06 | |
| Contig727 Contig9659 | (P06671) Chlorophyll A-B binding protein, chloroplast pre | 2.98 | 9.64E-06 | |
Description of the Surveyed Rice cDNA Libraries and the Number of ESTs Sequenced in Each Library
| Library | Tissue | Cultivar | Stage | Condition | Phenotype | Sequences | Contigs (size > 1) | Chimeric | Singletons | Annotated | Novel |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lib 1 | leaf |  | trefoil |  |  | 7,074 | 801 | 3 | 3,568 | 848 | 3,521 |
| Lib 2 | whole plant | LYP9 | trefoil |  |  | 7,682 | 940 | 1 | 3,462 | 947 | 3,455 |
| Lib 3 | whole plant | LYP9 | tillering |  |  | 9,795 | 1,406 | 1 | 4,355 | 1,233 | 4,520 |
| Lib 4 | panicle | PA64s | heading/flowering | high temperature, long sunlight | sterile | 9,483 | 1,213 | 0 | 5,032 | 1,041 | 5,204 |
| Lib 5 | whole plant | 93-11 | tillering |  |  | 8,190 | 1,015 | 5 | 4,403 | 958 | 4,460 |
| Lib 6 | panicle | PA64s | heading/flowering | high temperature, short sunlight | fertile | 10,003 | 1,355 | 2 | 5,569 | 893 | 6,031 |
| Lib 7 | panicle | PA64s | heading/flowering | high temperature, short sunlight | fertile | 12,053 | 1,827 | 0 | 5,443 | 1,106 | 6,164 |
| Lib 8 | panicle | PA64s | heading/flowering | high temperature, long sunlight | sterile | 12,708 | 1,948 | 0 | 5,796 | 1,210 | 6,534 |
| Lib 9 | whole plant |  | booting |  |  | 9,148 | 1,386 | 0 | 4,393 | 946 | 4,833 |
| Total |  |  |  |  |  | 86,136 | 11,891 | 12 | 42,021 | 9,182 | 44,722 |
Fig. 7. An overview of the expression patterns of every gene in the nine libraries we sequenced. The middle bar shows the percentage of genes expressed in only one library, in two libraries, and so on up to all nine libraries. Not surprisingly, about 91.9% of the uniquely expressed genes are singletons. Uniquely expressed genes with more than one EST are shown in the upper pie chart. The upper bar chart shows the relative abundance of uniquely expressed genes among genes with the same contig size: the X-axis is the contig size (the number of ESTs in the contig), and the Y-axis is the number of uniquely expressed genes divided by the total number of genes with that contig size. As expected, singletons (contigs of size one) are all unique genes. The lower pie chart shows the contribution (number of contigs) of each library to the uniquely expressed genes, and the lower bar chart shows each library's relative contribution: the X-axis is the library, and the Y-axis is the number of unique genes divided by the total number of ESTs in that library.
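The quantities summarized in Fig. 7 can all be derived from the library membership of each contig. A minimal sketch, assuming a mapping from each contig (or singleton) to the source library of every member EST, plus the per-library EST totals listed in the library description table; a gene counts as uniquely expressed when all its ESTs come from one library. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def unique_gene_stats(contig_libs, library_totals):
    """contig_libs: dict contig_id -> list with the source library of each member EST
    (a singleton has a one-element list).
    library_totals: dict library -> total number of ESTs sequenced from that library.
    Returns (spread, unique_fraction_by_size, relative_contribution)."""
    spread = Counter()                 # number of libraries a gene appears in -> gene count
    size_total = Counter()             # contig size -> number of genes
    size_unique = Counter()            # contig size -> number of library-specific genes
    unique_per_lib = defaultdict(int)  # library -> number of library-specific genes
    for libs in contig_libs.values():
        n_libs = len(set(libs))
        spread[n_libs] += 1
        size_total[len(libs)] += 1
        if n_libs == 1:
            size_unique[len(libs)] += 1
            unique_per_lib[libs[0]] += 1
    unique_fraction_by_size = {s: size_unique[s] / size_total[s] for s in sorted(size_total)}
    relative_contribution = {lib: unique_per_lib[lib] / total
                             for lib, total in library_totals.items()}
    return dict(spread), unique_fraction_by_size, relative_contribution
```

Dividing each `spread` count by the number of genes gives the percentages in the middle bar; `unique_fraction_by_size` and `relative_contribution` correspond to the upper and lower bar charts.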
Fig. 8. An overview of how our EST sequence analysis methods fit together. After library and sequence quality checks, high-quality EST sequences from good libraries were passed through a sequencing-progress monitor to make sure enough sequences had been collected. A high-quality non-redundant dataset was then generated by clustering and contig checking. Complete ORF search, functional assignment and classification, and expression profile analysis were performed on these carefully checked EST contigs.
Genes Most Differentially Expressed in PA64s between Short Sunlight (Fertile, Lib 6 and 7) and Long Sunlight (Sterile, Lib 4 and 8)
| MasterContig Name | BLASTN Annotation | BLASTX Annotation | Change Folds | Chi-square Test |
|---|---|---|---|---|
| Contig8033 Contig13583 | Unknown | 10.52 | 5.93E-23 | |
| Contig13748 | Unknown | Unknown | 5.76 | 2.27E-09 |
| Contig13702 Contig972 | (Q05431) L-ascorbate peroxidase, cytosolic (EC 1.11.1.11) | 6.46 | 4.67E-09 | |
| Contig13724 Contig964 | (P30298) Sucrose synthase 1 (EC 2.4.1.13) | 0.24 | 5.11E-09 | |
| Contig13413 Contig13912 | (Q9SP07) 14-3-3-like protein | 4.47 | 1.13E-08 | |
| Contig13637 Contig13705 | (P42767) Aquaporin | 3.01 | 2.04E-08 | |
| Contig5735 Contig13762 | (Q08733) Plasma membrane intrinsic protein 1C | 2.61 | 5.32E-08 | |
| Contig13617 | Unknown | (Q9SYQ8) Receptor protein kinase CLAVATA1 precursor | 0.10 | 4.93E-07 |
| Contig11991 Contig13740 | (Q42699) 5-methyltetrahydropteroyl-triglutamate—homocystein | 0.32 | 1.04E-06 | |
| Contig13681 | (O22424) 40S ribosomal protein S4 | 5.63 | 3.33E-06 | |
| Contig8371 Contig12088 | (O64937) Elongation factor 1-alpha (EF-1-alpha) | 0.63 | 5.76E-06 | |
| Contig13741 Contig12901 | (P50156) Tonoplast intrinsic protein, gamma (Gamma TIP) | 3.16 | 6.89E-06 | |
| Contig13766 Contig12761 | (P33126) Heat shock protein 82 | 0.47 | 7.7E-06 | |
| Contig13917 | Unknown | 4.83 | 9.8E-06 | |