Kouji Satoh, Koji Doi, Toshifumi Nagata, Naoki Kishimoto, Kohji Suzuki, Yasuhiro Otomo, Jun Kawai, Mari Nakamura, Tomoko Hirozane-Kishikawa, Saeko Kanagawa, Takahiro Arakawa, Juri Takahashi-Iida, Mitsuyoshi Murata, Noriko Ninomiya, Daisuke Sasaki, Shiro Fukuda, Michihira Tagami, Harumi Yamagata, Kanako Kurita, Kozue Kamiya, Mayu Yamamoto, Ari Kikuta, Takahito Bito, Nahoko Fujitsuka, Kazue Ito, Hiroyuki Kanamori, Il-Ryong Choi, Yoshiaki Nagamura, Takashi Matsumoto, Kazuo Murakami, Ken-ichi Matsubara, Piero Carninci, Yoshihide Hayashizaki, Shoshi Kikuchi.
Abstract
Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants because its genome is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences are also indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci in the rice genome, defined by mapping 578K FL-cDNA clones, with the 56K loci predicted in the TIGR genome assembly. Based on annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60-mer oligo-array for analysis of gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes were considerably different from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcription activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of FL-cDNA coverage among gene families suggested that FL-cDNAs from genes encoding rice- or eukaryote-specific domains, and from genes involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcription activity and gene structure, and that a coverage bias among FL-cDNA clones exists owing to the incompatibility of certain eukaryotic genes with bacteria.
Year: 2007 PMID: 18043742 PMCID: PMC2084198 DOI: 10.1371/journal.pone.0001235
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
FL-cDNA clones mapped to five rice genome assemblies.
| origin sequencing All |
|
| ||||
| Map-based cloning | Whole-genome shotgun | |||||
| TIGR | IRGSP4 | IRGSP3 | Syngenta | 93-11 | ||
|
| 35,187 | 32,775 | 32,745 | 32,640 | 31,928 | 30,354 |
|
| 241,854 | 212,598 | 212,539 | 211,564 | 208,606 | 199,001 |
|
| 536,885 | 483,657 | 484,358 | 482,909 | 482,665 | 465,775 |
|
| Chr1 | 4,026 | 4,021 | 4,039 | 4,050 | 3,940 |
| Chr2 | 3,196 | 3,198 | 3,215 | 3,186 | 3,153 | |
| Chr3 | 3,569 | 3,567 | 3,566 | 3,597 | 3,607 | |
| Chr4 | 2,531 | 2,530 | 2,534 | 2,477 | 2,493 | |
| Chr5 | 2,313 | 2,305 | 2,310 | 2,338 | 2,329 | |
| Chr6 | 2,292 | 2,293 | 2,290 | 2,262 | 2,266 | |
| Chr7 | 2,183 | 2,185 | 2,193 | 2,165 | 2,021 | |
| Chr8 | 1,933 | 1,934 | 1,939 | 1,912 | 1,827 | |
| Chr9 | 1,605 | 1,605 | 1,574 | 1,545 | 1,515 | |
| Chr10 | 1,538 | 1,528 | 1,536 | 1,502 | 1,416 | |
| Chr11 | 1,685 | 1,683 | 1,675 | 1,486 | 1,333 | |
| Chr12 | 1,693 | 1,692 | 1,705 | 1,523 | 1,435 | |
| Chr0 (a) | 434 | 497 | ||||
| Total | 28,564 | 28,541 | 28,576 | 28,477 | 27,832 | |
|
| Both mapped | 32730 | 32623 | 31741 | 30162 | |
| Same Chr-Same Strand | 32646 | 32611 | 30422 | 28760 | ||
| Same Chr-Reverse Strand | 80 | 10 | 317 | 335 | ||
| Different Chr. | 4 | 2 | 1002 | 1067 | |
| Mapped only on TIGR | 45 | 152 | 1034 | 2613 | |
| Unmapped only on TIGR | 15 | 17 | 187 | 192 | |
| Both unmapped | 2397 | 2395 | 2225 | 2220 | ||
|
| 29925 | |||||
|
| 2186 | |||||
(a): Sequence-assembled contigs that were not localized to one of the 12 chromosomes.
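The lower block of the table above compares each clone's placement on the TIGR assembly with its placement on another assembly; in the first data column, for instance, 32,646 + 80 + 4 = 32,730 clones are mapped on both. The Python sketch below shows one way such categories could be assigned. The `Hit` record, the `concordance` function, and the toy placements are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Best placement of one clone on one assembly (hypothetical record)."""
    chrom: str
    strand: str  # "+" or "-"


def concordance(ref: Optional[Hit], other: Optional[Hit]) -> str:
    """Label how a clone's reference placement compares with another assembly."""
    if ref is None and other is None:
        return "Both unmapped"
    if other is None:
        return "Mapped only on reference"
    if ref is None:
        return "Unmapped only on reference"
    if ref.chrom != other.chrom:
        return "Different Chr."
    if ref.strand == other.strand:
        return "Same Chr-Same Strand"
    return "Same Chr-Reverse Strand"


# Toy placements, invented for illustration only.
pairs = [
    (Hit("Chr1", "+"), Hit("Chr1", "+")),
    (Hit("Chr4", "+"), Hit("Chr4", "-")),
    (Hit("Chr11", "-"), Hit("Chr12", "-")),
    (Hit("Chr3", "+"), None),
    (None, None),
]
summary = Counter(concordance(ref, other) for ref, other in pairs)
```

Tallying the labels with `Counter` yields one count per category, which corresponds to one column of the comparison block.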
Comparisons of FL-cDNA loci and TIGR4 loci
| Class | ||||
| AE | NAE | ANE | ||
|
| 23193 | 0 | 32697 | |
|
| 23117 | 5447 | 0 | |
|
|
| 29808 | 2967 | 0 |
|
| 201343 | 11255 | 0 | |
|
| 465816 | 17481 | 0 | |
|
| 511817 | 21850 | 0 | |
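The three classes in the table above follow from two yes/no properties of a locus: whether it is annotated in the TIGR assembly and whether at least one FL-cDNA clone maps to it. A minimal Python sketch of that decision rule follows; the function name and its boolean inputs are illustrative assumptions, since the record does not describe how the authors encoded these properties.

```python
def classify_locus(annotated: bool, has_fl_cdna: bool) -> str:
    """Return the locus class implied by annotation status and clone support.

    AE  = annotated and expressed (FL-cDNA present)
    ANE = annotated, no FL-cDNA
    NAE = not annotated, FL-cDNA present
    """
    if annotated and has_fl_cdna:
        return "AE"
    if annotated:
        return "ANE"
    if has_fl_cdna:
        return "NAE"
    # A locus with neither annotation nor clone support is not defined here.
    raise ValueError("locus has neither an annotation nor an FL-cDNA clone")


# Toy loci, invented for illustration: (annotated, has_fl_cdna)
toy_loci = [(True, True), (True, False), (False, True)]
classes = [classify_locus(a, e) for a, e in toy_loci]
```

The fourth combination raises an error because a region with neither an annotation nor a mapped clone is not a locus in either data set.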
Figure 1. Gene structure analysis in rice.
(a) The length distribution of FL-cDNA for FL-AE (black) and FL-NAE (white). (b) The distribution of open reading frame (ORF) proportions for FL-AE (black) and FL-NAE (white). (c) The distribution of FL-cDNA locus lengths for FL-AE (black) and FL-NAE (white). (d) The distribution of locus lengths for CDS-AE (black) and CDS-ANE (white) in TIGR4. (e) The distribution of the number of exons for FL-AE (black) and FL-NAE (white). (f) The distribution of exon (black) and intron (white) lengths for the respective locus types. (g) The distribution of the number of FL-cDNA clones mapped per single FL-AE (black) and FL-NAE (white) locus.
Structural characteristics of locus types
| FL-cDNA length (median) | ORF ratio (median) | Locus length (median) | Variation of locus length | Number of exons (average) | Exon length (median) | Intron length (median) | Ave. number of mapped FL-cDNA clones | |
|
| 1540 | 66% | 3354 | Rich | 5.3 | 154 | 153 | 22.3 |
|
| 1173 | 21% | 1727 | Poor (short) | 2.4 | 247 | 251 | 4.1 |
|
| - | - | 3173 | Rich | 5.8 | 147 | 162 | - |
|
| - | - | 1643 | middle | 3.9 | 199 | 186 | - |
: For the calculation of locus lengths, we used the maximum lengths of individual loci.
: Exons shorter than 10 bp were excluded from the analysis. Thus, the definition of an exon in FL-cDNA loci differs from that in TIGR OSA1.
: Introns shorter than 10 bp were excluded from the analysis. Thus, the definition of an intron in FL-cDNA loci differs from that in TIGR OSA1.
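The footnotes above state that exons and introns shorter than 10 bp were excluded before the medians were taken. A short Python sketch of that filtering step follows; the function name, the constant, and the example lengths are assumptions made for illustration.

```python
from statistics import median
from typing import Iterable

MIN_FEATURE_BP = 10  # features shorter than this are excluded, per the footnotes


def filtered_median(lengths: Iterable[int], min_bp: int = MIN_FEATURE_BP) -> float:
    """Median length of the exons or introns that are at least `min_bp` long."""
    kept = [n for n in lengths if n >= min_bp]
    if not kept:
        raise ValueError("no feature is at least min_bp long")
    return median(kept)


# Toy exon lengths in bp, invented for illustration; 4 and 9 are dropped.
toy_exons = [4, 154, 9, 120, 300]
toy_median = filtered_median(toy_exons)
```

Because very short features are removed first, a median computed this way is not directly comparable to one derived from an annotation set that keeps them, which is the caveat the footnotes raise about TIGR OSA1.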
Frequency of Arabidopsis homologous genes in each FL-locus
| FL-AE | FL-NAE | Total | ||||
| homology | Locus | FLcDNA | Locus | FLcDNA | Locus | FL-cDNA |
|
| 11898 | 17669 | 75 | 90 | 11973 | 17759 |
|
| 4763 | 6941 | 140 | 162 | 4903 | 7103 |
|
| 3663 | 5198 | 2404 | 2715 | 6067 | 7913 |
|
| 20324 | 29808 | 2619 | 2967 | 22943 | 32775 |
: HH, LH, NH: highly-, low- or non-homologous FL-cDNA with Arabidopsis CDS
Signal intensities in microarray analysis in relation to the locus types and the collection efficiencies of FL-cDNA.
| Collection efficiency (FL-cDNA clones/locus) | ||||||
| 1-2 | 3-5 | 6-9 | 10-15 | |||
|
|
|
| 16380 | 14056 | 12656 | 12404 |
|
| 3779 | 4323 | 5452 | 6716 | ||
|
| 852 | 1421 | 1984 | 2572 | ||
|
| 302 | 516 | 761 | 1010 | ||
|
| 2568 | 3758 | 4723 | 5985 | ||
|
|
| 8768 | 2680 | 1400 | 804 | |
|
| 3057 | 3780 | 6243 | 9122 | ||
|
| 596 | 1007 | 1648 | 2115 | ||
|
| 196 | 397 | 576 | 609 | ||
|
| 1826 | 2729 | 4397 | 7059 | ||
: Samples from four tissues of Nipponbare were used for the analysis; thus, the number of data points is the number of loci in each category multiplied by four (samples).
The number of significant signal-detected loci in each locus type
| Shoot | Root | Panicle | Callus | All samples | ||
|
| No.locus | 21885 | 21885 | 21885 | 21885 | 21885 |
| Sig. signal | 14438 | 15560 | 15894 | 15602 | 12021 | |
| No-sig. signal | 7447 | 6325 | 5991 | 6283 | 3234 | |
|
| No.locus | 3540 | 3540 | 3540 | 3540 | 3540 |
| Sig. signal | 1640 | 1528 | 1725 | 1584 | 1267 | |
| No-sig. signal | 1900 | 2012 | 1815 | 1956 | 1474 | |
|
| No.locus | 17665 | 17665 | 17665 | 17665 | 17665 |
| Sig. signal | 3862 | 3316 | 3453 | 3187 | 2589 | |
| No-sig. signal | 13803 | 14349 | 14212 | 14478 | 12983 |
: Sig.signal (No-sig. signal) indicates the number of loci with (without) significant signal.
: The number of loci with significant signals in each of the four samples, or number of loci with no significant signal in each of the four samples.
Coverage of genes encoding transcription factors in the rice FL-cDNA libraries
| Family | Total loci | CDS-AE | CDS-ANE | CDS-AE Proportion | Significance |
|
| 164 | 105 | 59 | 64.0 | Low |
|
| 144 | 100 | 44 | 69.4 | |
|
| 123 | 84 | 39 | 68.3 | |
|
| 121 | 87 | 34 | 71.9 | |
|
| 102 | 73 | 29 | 71.6 | |
|
| 97 | 66 | 31 | 68.0 | |
|
| 91 | 72 | 19 | 79.1 | |
|
| 85 | 70 | 15 | 82.4 | |
|
| 81 | 58 | 23 | 71.6 | |
|
| 66 | 57 | 9 | 86.4 | High |
|
| 64 | 37 | 27 | 57.8 | Low |
|
| 54 | 33 | 21 | 61.1 | Low |
|
| 52 | 35 | 17 | 67.3 | |
|
| 49 | 44 | 5 | 89.8 | High |
|
| 46 | 36 | 10 | 78.3 | |
|
| 1903 | 1432 | 471 | 75.2 |
: The classification was based on the Rice Transcription Factor Database version 2.1 (http://ricetfdb.bio.uni-potsdam.de/v2.1/)
: The ratio was calculated as the number of CDS-AE loci/total number of loci.
: The significance of the difference between the CDS-AE proportion for each gene family and that for all CDS encoding TFs was examined by the chi-squared test. High (Low) indicates that the cloning efficiency of a TF family was significantly higher (lower) than the collection efficiency for all TFs.
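The CDS-AE proportion and the chi-squared comparison described in the footnotes can be sketched in Python as follows. The record does not specify how the test was set up, so the 2x2 layout used here (one family against the remaining TF loci, Pearson statistic, no continuity correction) is an assumption, as are the function names. The counts come from the first row and the total row of the table above.

```python
import math
from typing import Tuple


def ae_proportion(cds_ae: int, total: int) -> float:
    """CDS-AE proportion in percent: CDS-AE loci / total loci."""
    return 100.0 * cds_ae / total


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom and no continuity
    correction. For 1 d.f. the upper tail equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))


# First family in the table: 105 CDS-AE and 59 CDS-ANE out of 164 loci.
# All TF loci: 1432 CDS-AE and 471 CDS-ANE out of 1903.
fam_ae, fam_ane = 105, 59
rest_ae, rest_ane = 1432 - fam_ae, 471 - fam_ane

proportion = ae_proportion(fam_ae, fam_ae + fam_ane)
stat, p_value = chi2_2x2(fam_ae, fam_ane, rest_ae, rest_ane)
```

Under these assumptions the family's proportion falls below the overall TF proportion and the test rejects equality at the 5% level, which agrees with the "Low" label in the table. A goodness-of-fit test against the overall proportion would be another plausible reading of the footnote.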
Coverage of genes associated with metabolic pathways in the rice FL-cDNA libraries
| Pathway id | Pathway name | All-CDS | CDS-AE | CDS-ANE | CDS-AE proportion | Significance |
|
| cytokinins -glucoside biosynthesis | 235 | 144 | 91 | 61.3 | Low |
|
| triacylglycerol degradation | 177 | 127 | 50 | 71.8 | Low |
|
| homogalacturonan degradation | 134 | 95 | 39 | 70.9 | Low |
|
| removal of superoxide radicals | 79 | 13 | 66 | 16.5 | Low |
|
| gluconeogenesis | 67 | 61 | 6 | 91.0 | High |
|
| aerobic respiration | 66 | 39 | 27 | 59.1 | Low |
|
| tRNA charging pathway | 64 | 61 | 3 | 95.3 | High |
|
| UDP-glucose conversion | 63 | 60 | 3 | 95.2 | High |
|
| galactose degradation I | 54 | 53 | 1 | 98.1 | High |
|
| colanic acid building blocks biosynthesis | 51 | 50 | 1 | 98.0 | High |
|
| suberin biosynthesis | 51 | 34 | 17 | 66.7 | Low |
|
| galactose degradation III | 51 | 51 | 0 | 100.0 | High |
|
| flavonoid biosynthesis | 50 | 24 | 26 | 48.0 | Low |
|
| chlorophyll biosynthesis | 48 | 46 | 2 | 95.8 | High |
|
| salvage pathways of purine and pyrimidine nucleotides | 44 | 42 | 2 | 95.5 | High |
|
| formaldehyde assimilation II (RuMP Cycle) | 42 | 40 | 2 | 95.2 | High |
|
| phenylpropanoid biosynthesis | 40 | 23 | 17 | 57.5 | Low |
|
| 2750 | 2125 | 625 | 77.3 | ||
: The classification was based on the RiceCyc database at the GRAMENE Web site (http://www.gramene.org/pathway/).
: PWY-2881, PWY-2901, and PWY-2902 share the same genes in the respective pathways.
: The ratio was calculated as the number of CDS-AE loci/total number of loci.
: The significance of the difference between the CDS-AE proportion for genes in the respective pathway and that for all genes associated with metabolic pathways was examined by the chi-squared test. High (Low) indicates that the cloning efficiency of genes for a metabolic pathway was higher (lower) than the collection efficiency for all genes associated with metabolic pathways.
Coverage of genes with Pfam domains in the rice FL-cDNA libraries
| Pfam ID | InterPro ID | Short name | All loci | CDS-AE | CDS-ANE | CDS-AE proportion | Significance | Ecoli |
|
| IPR000504 | RNP1_RNA_bd | 233 | 204 | 29 | 87.6 | High | |
|
| IPR001680 | WD40 | 212 | 198 | 14 | 93.4 | High | |
|
| IPR002048 | EF_hand_Ca_bd | 164 | 137 | 27 | 83.5 | High | |
|
| IPR001650 | Helicase_C | 131 | 117 | 14 | 89.3 | High | 1 |
|
| IPR001087 | Lipase_GDSL | 113 | 92 | 21 | 81.4 | High | 1 |
|
| IPR001440 | TPR_1 | 108 | 99 | 9 | 91.7 | High | 1 |
|
| IPR001623 | DnaJ_N | 108 | 92 | 16 | 85.2 | High | 1 |
|
| IPR003959 | AAA_ATPase_core | 106 | 94 | 12 | 88.7 | High | 1 |
|
| IPR002198 | SDR | 97 | 80 | 17 | 82.5 | High | 1 |
|
| IPR000073 | AB_hydrolase_1 | 91 | 81 | 10 | 89.0 | High | 1 |
|
| IPR000719 | Prot_kinase | 1289 | 892 | 397 | 69.2 | Low | 1 |
|
| IPR001611 | LRR | 890 | 533 | 357 | 59.9 | Low | 1 |
|
| IPR001810 | F-box | 716 | 413 | 303 | 57.7 | Low | |
|
| IPR002182 | NB-ARC | 427 | 212 | 215 | 49.6 | Low | |
|
| IPR010811 | DUF1409 | 253 | 6 | 247 | 2.4 | Low | |
|
| IPR009546 | DUF1165 | 231 | 4 | 227 | 1.7 | Low | |
|
| IPR001878 | Znf_CCHC | 226 | 46 | 180 | 20.4 | Low | |
|
| IPR005213 | HGWP | 215 | 3 | 212 | 1.4 | Low | |
|
| IPR008906 | HATC | 215 | 32 | 183 | 14.9 | Low | |
|
| IPR002110 | ANK | 176 | 113 | 63 | 64.2 | Low | 1 |
|
| 24247 | 17459 | 6788 | 72.0 | ||||
: The domain information was taken from TIGR OSA1 (http://www.tigr.org/tdb/e2k1/osa1/index.shtml).
: The ratio was calculated as the number of CDS-AE loci/total number of loci.
: The significance of the difference between the CDS-AE proportion for genes encoding the respective domain and that for genes encoding any of the domains examined was assessed by the chi-squared test. High (Low) indicates that the cloning efficiency of genes encoding the respective domain was higher (lower) than the collection efficiency of genes encoding any of the domains examined.
: Pfam domain data for E. coli K12 published at http://www.sanger.ac.uk/Software/Pfam/. A value of 1 indicates that genes encoding the corresponding domain are found in the E. coli K12 genome.
List of FL-cDNAs from which internal sequences were excluded
| DDBJ accession | Clone length | Chr | Locus length (bp) | TIGR4 locus | Deleted domain | Short name | E.coli |
| AK058767 | 492 | 1 | 9411 | LOC_Os01g05760 | |||
| AK058349 | 527 | 3 | 1193 | LOC_Os03g08500 | IPR001471 | AP2-EBPRF | |
| AK063600 | 537 | 3 | 1190 | LOC_Os03g29250 | IPR004331 | SPX_N | |
| AK063698 | 520 | 4 | 2725 | LOC_Os04g01740 | IPR001404 | Hsp90 | 1 |
| AK063751 | 530 | 4 | 2712 | LOC_Os04g01740 | IPR001404 | Hsp90 | 1 |
| AK060229 | 524 | 4 | 2423 | LOC_Os04g21320 | IPR006702 | DUF588 | |
| AK059443 | 478 | 4 | 1500 | LOC_Os04g42020 | IPR000315 | B-box | |
| AK105205 | 592 | 4 | 1256 | LOC_Os04g58760 | IPR006702 | DUF588 | |
| AK062513 | 499 | 5 | 1345 | LOC_Os05g33220 | IPR008390 | AWPM-19 | |
| AK062487 | 526 | 6 | 1478 | LOC_Os06g10350 | IPR001005 | Myb_DNA_bd | |
| AK241732 | 237 | 6 | 4343 | LOC_Os06g10600 | IPR002913 | START_lipid_bd | |
| AK105179 | 543 | 7 | 1107 | LOC_Os07g42610 | IPR001841 | Zinc finger, RING-type | |
| AK058489 | 587 | 8 | 1288 | LOC_Os08g37370 | IPR001993 | Mitoch_carrier | |
| AK063134 | 432 | 8 | 5613 | LOC_Os08g45030 | IPR003439 | ABC_transp_like | 1 |
| AK059470 | 562 | 9 | 1928 | LOC_Os09g24690 | IPR008195 | Ribosomal_L34E | |
| AK063094 | 444 | 10 | 1075 | LOC_Os10g39450 | IPR001841 | Zinc finger, RING-type | |
| AK070901 | 457 | 12 | 1971 | LOC_Os12g29400 | IPR004182 | GRAM |
: Chromosome number to which FL-cDNA were mapped in TIGR4.
: Domains encoded in the sequences excluded from FL-cDNA sequences. The information was obtained from TIGR OSA1.
: Pfam domain data for E. coli K12 published at http://www.sanger.ac.uk/Software/Pfam/. A value of 1 indicates that genes encoding the corresponding domain are found in the E. coli K12 genome.
Figure 2. Strategy for mapping of FL-cDNA clones and FL-ESTs, and for definition of FL-cDNA loci.