
Collection and comparative analysis of 1888 full-length cDNAs from wild rice Oryza rufipogon Griff. W1943.

Tingting Lu, Shuliang Yu, Danlin Fan, Jie Mu, Yingying Shangguan, Zixuan Wang, Yuzo Minobe, Zhixin Lin, Bin Han.

Abstract

Large cDNA and EST resources have been developed for the cultivated rice species Oryza sativa; however, only a few cDNA resources are available for wild rice species. In this study, we isolated and completely sequenced 1888 putative full-length cDNA (FLcDNA) clones from wild rice Oryza rufipogon Griff. W1943 for comparative analysis between wild and cultivated rice species. Two cDNA libraries were constructed from 3-week-old leaf samples grown under either normal or cold-treated conditions. Homology searches revealed that >96.8% of the wild rice cDNAs matched the genome sequence of cultivated rice O. sativa ssp. japonica cv. Nipponbare; however, <22% of them fully matched the cv. Nipponbare genome sequence. The comparative analysis showed that O. rufipogon W1943 is more similar to O. sativa ssp. japonica than to ssp. indica cultivars. In addition, 17 novel rice cDNAs were identified, and 41 putative tissue-specific genes were defined by searching the rice massively parallel signature-sequencing database. These FLcDNA clones provide a resource for further functional verification and could be broadly used in rice biological studies.

Year:  2008        PMID: 18687674      PMCID: PMC2575888          DOI: 10.1093/dnares/dsn018

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

The wild rice species Oryza rufipogon Griff. (AA genome) is the closest ancestral relative of Asian cultivated rice (O. sativa L.).[1,2] It carries various valuable traits, such as tolerance to cold, drought and salinity, as well as many quantitative trait loci for agronomically important traits.[3,4] Cultivated rice, which feeds more than half of the world's population, is by contrast often threatened by environmental stresses including drought, salinity and cold. The O. sativa ssp. japonica cv. Nipponbare genome has been completely sequenced using a map-based strategy,[5] and a draft genome sequence of O. sativa ssp. indica cv. 93-11 was generated by whole-genome shotgun sequencing.[6] The Rice Full-Length cDNA Consortium collected over 28 000 full-length complementary DNA (FLcDNA) clones from cv. Nipponbare,[7] and >47 000 cultivated rice FLcDNA sequences are now publicly available (ftp://ftp.ncbi.nih.gov/), including a collection of 10 096 FLcDNAs from O. sativa ssp. indica cv. Guangluai 4.[8] Comparative genome analysis using single nucleotide polymorphism data from 21 rice genomes has been used to decipher the similarity and diversity among rice varieties,[9] and cultivated rice cDNA sequences have also been compared using microarrays.[10] In contrast, few wild rice mRNA and FLcDNA sequences are available in public databases, with the exception of 5211 leaf ESTs from O. minuta (BBCC genome).[11] Oryza rufipogon has been classified into perennial and annual ecotypes;[12] W1943 is a perennial O. rufipogon. In the present study, we generated, for the first time, a total of 1888 FLcDNAs from O. rufipogon W1943; most (>96.8%) were highly homologous with cultivated rice genome sequences, and W1943 showed greater similarity to ssp. japonica than to ssp. indica.
Additionally, about 1% (17) of the W1943 FLcDNAs were identified as novel rice genes not previously reported. We also identified 41 putative tissue-specific genes by searching the rice massively parallel signature-sequencing (MPSS) database.[13]

Materials and Methods

Plant materials and cDNA library construction

Two full-length-enriched cDNA libraries were constructed from wild rice O. rufipogon Griff. W1943. Seeds were germinated and seedlings were grown in a greenhouse under a 13/11 h day/night cycle at 25/30°C. Three weeks after germination, some seedlings were exposed to 5°C, and leaves were harvested separately after 0, 1, 12, 24, 48, 72 and 120 h of cold treatment. The two cDNA libraries were thus constructed from leaves of 3-week-old plants grown under normal and cold conditions, respectively. All samples were immediately frozen in liquid nitrogen and stored at −80°C. The FLcDNA libraries were constructed according to the cap-tagging[8] and cap-trapper[14] methods. The 5′ cap-tagging method uses 5′ cap capture through combined treatment with calf intestinal phosphatase (CIP) and tobacco acid pyrophosphatase (TAP), so that only full-length cDNAs are targeted for library construction. The cap-trapper method chemically introduces a biotin group into the diol residue of the mRNA cap structure, followed by RNase I treatment to select full-length cDNAs. Total RNA was isolated using TRIzol reagent, and mRNA was purified with the Oligotex mRNA kit (Qiagen). Double-stranded cDNA was digested with EcoRI (1 U) and XhoI (10 U) for 1 h at 37°C; the 0.6–2 kb cDNA fractions were collected, pooled and ligated into the EcoRI and XhoI sites of the pBluescript SK+ vector (Stratagene) at 16°C overnight. The ligation products were then transformed into competent E. coli DH10B cells (Invitrogen) by electroporation. Library quality was assessed by assaying ligations and by 5′-end sequencing: the former determined the library titer, and the latter was used to estimate the proportion of full-length cDNAs and of empty vectors.

DNA sequencing and assembling

DNA sequencing was carried out on ABI3730 sequencers. Clones were sequenced from both ends by the dideoxy chain-termination method using the BigDye Terminator Cycle Sequencing v2.0 Ready Reaction kit (Applied Biosystems). The Phred base-calling software was used to analyze trace files and generate raw sequences.[15] Bases with Phred quality values <20 were treated as ambiguous and replaced with the universal placeholder ‘N’. Vector sequences were filtered automatically. All 5′-tagged sequences were then selected with a Perl script and clustered with the TGICL program.[16] The resulting singletons and one representative clone from each contig were completely sequenced using a bidirectional sequencing strategy, and all processed sequences were assembled with Phrap. The sequences have been deposited in the EMBL database under accession numbers CT841557–CT841684; CT841686–CT841707; CT841710–CT841954; CT841956–CT842008; CU405560–CU405627; CU405629–CU405654; CU405656–CU405706; CU405708–CU405710; CU405712–CU405714; CU405716–CU405717; CU405719–CU405720; CU405722–CU405729; CU405731–CU405880; CU405882–CU405928; CU405930–CU406064; CU406066–CU406249; CU406251–CU406335; CU406337–CU406954 and CU861673–CU861883. These W1943 sequences are also available from our website (http://202.127.18.228/ricd/dym/ftp.php).
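The quality-masking rule described here can be illustrated with a short Python sketch. It assumes that per-base quality values are already available as integers (for example, parsed from a Phred quality file) and is not part of the Phred software itself. A Phred quality Q encodes an estimated error probability P through Q = −10 log10 P, so the cut-off Q = 20 corresponds to an estimated 1% chance that a base call is wrong.

```python
from typing import Sequence

# Threshold from the text: bases with Phred quality < 20 are treated as ambiguous.
MIN_QUALITY = 20


def phred_error_probability(q: int) -> float:
    """Phred quality Q encodes an error probability P via Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def mask_low_quality(seq: str, quals: Sequence[int], min_q: int = MIN_QUALITY) -> str:
    """Replace every base whose quality is below `min_q` with the placeholder 'N'."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality arrays must have the same length")
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


if __name__ == "__main__":
    read = "ACGTACGT"
    quals = [35, 32, 18, 40, 12, 20, 30, 19]  # hypothetical per-base qualities
    print(mask_low_quality(read, quals))       # ACNTNCGN
    print(phred_error_probability(20))         # 0.01
```

Masking rather than trimming keeps read coordinates unchanged, which is convenient when the masked reads are later clustered and assembled.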

Comparative analysis of FLcDNA sequences

Similarity searches were performed with the BLAST program (version 2.2.14)[17] against the following sequence data: NCBI GenBank nt DB (2007-12), nr DB (2007-12) and est-other DB (2007-07); the rice japonica genomic sequence (http://rgp.dna.affrc.go.jp/IRGSP/); The Institute for Genomic Research (TIGR) rice cDNA data (release 4.0); TIGR_Oryza_Repeats_v3.1; the Knowledge-based Oryza Molecular Biological Encyclopedia japonica cDNA collection (http://cdna01.dna.affrc.go.jp/cDNA, 2006-10-11); and the National Center for Gene Research (NCGR, http://www.ncgr.ac.cn/ricd) Rice Indica cDNA Database (RICD). All of these data sets were downloaded locally and searched with our 1888 clones as queries, retaining hits with E-values below 1E−10. The InterPro database[18] was searched to compare the profiles of proteins encoded by the W1943 FLcDNAs, and functional classification of the cDNAs was based on PFAM profiles.[19] The similarity-based tool sim4[20] was used to align W1943 FLcDNA sequences with rice genomic sequences and to identify and discard redundant sequences. Open reading frames (ORFs) were determined with the getorf program of the EMBOSS package.[21] The rice MPSS database[13] was used for quantitative expression analysis of the W1943 cDNAs; expression levels in different rice tissues, or in the same tissue at different developmental stages, were calculated by summing all tags matching the sense strand. Synonymous divergence (Ks) was calculated using ClustalX 1.8[22] and PAL2NAL (version V11).[23] Rfam[24] (http://www.sanger.ac.uk/Software/Rfam/) and miRBase[25] (http://microrna.sanger.ac.uk/) data were downloaded for the analysis of non-protein-coding transcripts, and mFOLD was used to predict pre-miRNA secondary structures (http://mfold.bioinfo.rpi.edu/).[26]
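To illustrate the E-value cut-off, the sketch below parses BLAST hits in the standard 12-column tabular format (legacy `blastall -m 8`, equivalent to BLAST+ `-outfmt 6`) and keeps the best-scoring hit per query. The 1E−10 threshold comes from the text; choosing the best hit by bit score is an assumption made for illustration, not necessarily the procedure used in this study.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple

EVALUE_CUTOFF = 1e-10  # E-value threshold stated in the text


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        yield Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))


def best_hits(hits: Iterable[Hit], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Drop hits with E-value >= cutoff, then keep the top bit score per query."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= cutoff:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    table = [  # hypothetical hits
        "cdnaA\tchr01\t99.5\t800\t4\t0\t1\t800\t1000\t1799\t0.0\t1450",
        "cdnaA\tchr05\t85.0\t300\t45\t1\t1\t300\t50\t349\t1e-40\t200",
        "cdnaB\tchr02\t90.0\t60\t6\t0\t1\t60\t10\t69\t1e-05\t50",
    ]
    for query, hit in best_hits(parse_tabular(table)).items():
        print(query, hit.subject, hit.evalue)
```

In this example cdnaB is discarded because its only hit (E = 1e−05) does not pass the 1E−10 threshold.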

Results and Discussion

Overall description of W1943 FLcDNA sequences

Two full-length-enriched cDNA libraries of O. rufipogon W1943 were constructed following the cap-tagging method.[8] Each library contained 1 × 10⁶ independent clones, with average cDNA sizes of 0.5–1.5 kb. We randomly selected 8352 clones (6432 from the normal leaf library and 1920 from the cold-stressed leaf library) for 5′-end sequencing. After removal of vector sequences and low-quality reads, 4876 tagged potential FLcDNA clones with at least 100 contiguous nucleotides of Phred score >20 remained. These 4876 clones were clustered with the TGICL program,[16] yielding 2350 cDNAs (454 representative clones of unique contigs and 1896 singletons) for complete sequencing and assembly. Overlapping 5′ and 3′ reads obtained by the bidirectional sequencing strategy were assembled into consensus sequences. In total, we obtained 1888 non-redundant W1943 cDNA sequences. Of these, 1360 matched proteins in the NCBI GenBank non-redundant protein database (nrDB) (E < 1e−10; >70% identity), and 997 of the 1360 fully covered the first N-terminal amino acid of the matching protein. Because 997/1360 = 73.3%, we estimated that >70% of the 1832 cDNA sequences were FLcDNAs. The efficiency of the CIP and TAP treatments played a key role in constructing the FLcDNA libraries. On the other hand, some of the remaining ~30% putatively truncated cDNAs might be genuine FLcDNAs transcribed from alternative start sites; many alternative transcription start sites are known in mammals.[27,28]
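The read-acceptance criterion in this paragraph (at least 100 contiguous nucleotides with Phred score >20) amounts to finding the longest run of high-quality bases in each trimmed 5′ read. The following is a minimal sketch, assuming per-base qualities are available as integers; the function names are hypothetical, and the strict `> 20` comparison follows the wording of this paragraph.

```python
from typing import Sequence, Tuple

MIN_Q = 20     # bases must have Phred quality > 20 (as worded in the text)
MIN_RUN = 100  # minimum number of contiguous high-quality bases


def longest_high_quality_run(quals: Sequence[int], min_q: int = MIN_Q) -> Tuple[int, int]:
    """Return (start, end) of the longest run with quality > min_q; end is exclusive."""
    best = (0, 0)
    run_start = None
    for i, q in enumerate(list(quals) + [min_q]):  # sentinel closes a trailing run
        if q > min_q:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


def passes_filter(quals: Sequence[int], min_run: int = MIN_RUN) -> bool:
    """True if the read contains at least `min_run` contiguous high-quality bases."""
    start, end = longest_high_quality_run(quals)
    return end - start >= min_run


if __name__ == "__main__":
    read_quals = [10] * 5 + [30] * 120 + [15] * 3 + [25] * 50  # hypothetical read
    print(longest_high_quality_run(read_quals), passes_filter(read_quals))
```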

Mapping of the 1888 W1943 FLcDNAs onto cultivated rice O. sativa genomic sequences

The 1888 FLcDNAs from O. rufipogon W1943 were mapped to the O. sativa ssp. japonica cv. Nipponbare genomic pseudomolecules (version 4.0)[5] using BLASTn (E < 1e−10) and compared with GenBank nrDB using BLASTx (E < 1e−10). Of the 1888 FLcDNA sequences, 1831 (97.0%) aligned to the japonica genomic sequences at >80% sequence identity over their entire length (Fig. 1); the remaining 57 cDNAs are discussed below. Among the 1831 aligned cDNAs, 395 (21.6%) fully matched the cv. Nipponbare genomic sequence with 100% nucleotide identity. By contrast, 487 cDNAs fully matched their corresponding nrDB proteins with 100% identity, i.e. 35.8% of the 1360 cDNAs with nrDB protein hits had full identity at the amino acid level. Thus, although full identity between wild and cultivated rice was relatively low at the nucleotide level (21.6%), sequences were more conserved at the amino acid level (35.8%). Such conservation may help protect key proteins from losing their conserved, vital functions.
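The two thresholds used in this paragraph, alignment at >80% identity over the entire cDNA length versus a full match at 100% identity, can be expressed as a small classifier over per-cDNA alignment summaries. This sketch assumes that the cDNA length, the number of cDNA bases covered by the alignment and the number of identical bases have already been extracted (for example, from sim4 or BLAST output); the record layout and category labels are hypothetical. It also reproduces the reported percentages from the counts; note that 35.8% corresponds to 487 of the 1360 cDNAs with nrDB protein hits.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical per-cDNA summary of its best genomic alignment."""
    cdna_length: int      # length of the W1943 cDNA
    covered_bases: int    # cDNA bases included in the alignment
    identical_bases: int  # aligned bases identical to the genome


def classify(aln: Alignment) -> str:
    """Apply the thresholds from the text to one cDNA-to-genome alignment."""
    if aln.covered_bases == 0 or aln.covered_bases < aln.cdna_length:
        return "not aligned over entire length"
    if aln.identical_bases == aln.covered_bases:
        return "full match"        # 100% identity over the entire length
    if aln.identical_bases / aln.covered_bases > 0.80:
        return "aligned (>80%)"    # >80% identity over the entire length
    return "below 80% identity"


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(classify(Alignment(1000, 1000, 950)))
    print(percent(1831, 1888), percent(395, 1831), percent(487, 1360))  # 97.0 21.6 35.8
```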
Figure 1

Mapping of the 1888 FLcDNAs onto Oryza sativa genomic sequences.

We also mapped the 1888 W1943 FLcDNAs to the O. sativa ssp. indica cv. 93-11 whole-genome shotgun sequences using BLASTn (E < 1e−10). A total of 1837 (97.2%) W1943 cDNAs aligned to the cv. 93-11 genome sequences at >80% sequence identity over their entire length (Fig. 1), and 126 (6.9%) of these matched the cv. 93-11 genome sequences identically. Wild rice W1943 therefore shows very high sequence similarity to both cultivated ssp. japonica (97.0%) and ssp. indica (97.2%); however, because a much larger proportion of W1943 cDNAs matched the japonica genome with 100% identity (21.6% versus 6.9%), W1943 is more similar to japonica than to indica at the nucleotide level. Monna et al.[29] also suggested that W1943 is closer to japonica than to indica, and japonica cultivars have been reported to be closely related to perennial O. rufipogon strains, whereas indica cultivars are closely related to annual strains.[30] Since W1943 is a perennial accession, our results support this conclusion at the transcript level. For the 395 W1943 FLcDNAs that matched the genomic sequence with 100% identity, we checked splicing patterns against all rice ESTs and mRNAs in public databases. Fifteen W1943 cDNAs had alternative splicing patterns that differed from those of cultivated rice ESTs or mRNAs (Table 1), and these patterns might be specific to W1943. Furthermore, the first introns of two genes (CT841942 and CU406810) had non-canonical splice sites, GC-AG and GT-TG, respectively. We inferred that cultivated rice has accumulated mutations, including in intron regions, and that some transcripts were consequently lost during its long evolutionary history. These sequences showed four typical alternative splicing patterns (Fig. 2).
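Classifying intron boundaries (canonical GT-AG versus non-canonical sites such as the GC-AG and GT-TG found in CT841942 and CU406810) is a simple string test once intron sequences have been extracted from a cDNA-to-genome alignment. A minimal sketch, assuming introns are given 5′→3′ on the transcribed strand; the class table and example sequences are illustrative and not taken from the W1943 data.

```python
from collections import Counter
from typing import Dict, Iterable

# Terminal dinucleotides (donor, acceptor) of common intron classes.
INTRON_CLASSES = {
    ("GT", "AG"): "canonical GT-AG",
    ("GC", "AG"): "non-canonical GC-AG",
    ("AT", "AC"): "non-canonical AT-AC",
}


def splice_class(intron: str) -> str:
    """Label an intron (5'->3', transcribed strand) by its first and last two bases."""
    intron = intron.upper()
    donor, acceptor = intron[:2], intron[-2:]
    return INTRON_CLASSES.get((donor, acceptor), f"other {donor}-{acceptor}")


def tally(introns: Iterable[str]) -> Dict[str, int]:
    """Count intron classes, e.g. across all introns of one cDNA."""
    return dict(Counter(splice_class(seq) for seq in introns))


if __name__ == "__main__":
    # Hypothetical intron sequences.
    print(tally(["GTAAGTTTTCAG", "GCAAGTCTGCAG", "GTATGCATTG"]))
```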
Table 1

List of 15 Oryza rufipogon W1943 genes with specific alternative splicing patterns

Accession number | Length (bp) | Chromosome | Number of exons | Protein
CT841942 | 978 | 07 | 6 (1st intron: GC-AG) |
CU406810 | 958 | 06 | 6 (1st intron: GT-TG) | Dual-specificity phosphatase protein
CT841893 | 1011 | 01 | 6 | Drought-induced protein
CT841874 | 1369 | 01 | 4 | Vesicle transport protein
CU405853 | 1377 | 05 | 1 | Dehydration-responsive protein
CU405923 | 639 | 07 | 1 | IAA amidohydrolase
CU406279 | 648 | 05 | 1 |
CU406025 | 839 | 02 | 1 |
CT841561 | 740 | 06 | 2 |
CU406579 | 468 | 09 | 2 |
CU406935 | 1345 | 01 | 2 |
CU406600 | 1107 | 01 | 2 |
CU405570 | 952 | 01 | 2 |
CU406091 | 893 | 01 | 3 |
CU406134 | 665 | 10 | 3 |
Figure 2

A total of 17 W1943 cDNAs had alternative splicing patterns that differ from those of ESTs or mRNAs in public databases, revealing four typical splicing patterns in wild rice.

Notably, 10 of the 1831 W1943 cDNAs had no hits to previously reported rice ESTs or mRNAs in the GenBank database (Table 2), and another seven matched rice ESTs or mRNAs only in a sense–antisense pattern (Table 3). These 17 cDNA sequences therefore add novel rice transcripts to the public databases. They are either wild-rice-specific genes or genes shared with cultivated rice. In the latter case, these genes may be expressed at much lower levels in cultivated than in wild rice, which would make them difficult to clone from cultivated rice despite the ∼47 000 ssp. japonica and ssp. indica cDNAs currently available in public databases (ftp://ftp.ncbi.nih.gov/). We used the rice MPSS database (http://mpss.udel.edu/rice/) to examine the expression levels of these 17 putative novel W1943 cDNAs under different conditions.[13] For 15 of the 17 cDNAs, no sense-strand tags were detected in any tissue. Gene ‘CU861721’ was detected at only 18 transcripts per million (tpm) in young leaves, and gene ‘CU406355’ at >100 tpm in young roots and germinating seedlings.
Table 2

List of 10 novel cDNA transcripts of Oryza rufipogon W1943

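Accession number | Protein | Length (bp) | Chromosome | Identity (%)
CU405785 |  | 727 | 05 | 99
CU406138 |  | 568 | 02 | 99
CU406022 |  | 543 | 12 | 99
CU405757 |  | 477 | 04 | 100
CU406921 |  | 414 | 02 | 100
CU406535 |  | 389 | 02 | 100
CU406832 |  | 530 | 10 | 92
CU406871 |  | 458 | 01 | 84
CU861804 |  | 383 | 06 | 99
CU861721 |  | 554 | 01 | 100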
Table 3

List of seven sense–antisense cDNA transcripts of Oryza rufipogon W1943

Accession number | Length (bp) | Protein | Location (chr) | Identity (%) | Antisense gene | Location (chr) | Protein
CU405785 | 727 |  | 05 | 99 | CA764081 | 01 | DNA-directed RNA polymerase 3
CU861795 | 475 |  | 09 | 79 | CT858901 | unsure | Unknown
CU406355 | 837 |  | 12 | 97 | AK107125 | 12 | AP2 domain, putative
CU406396 | 520 |  | 02 | 99 | AK103485 | 02 | Hypothetical
CT841800 | 941 |  | 11 | 99 | AK121962 | 11 | Patatin, putative
CU861688 | 693 |  | 08 | 99 | AK109182 | 08 | Hypothetical
CT841937 | 1552 |  | 08 | 98 | AK106713 | 08 | Unknown
In addition, the 57 W1943 cDNAs that could not be aligned to the ssp. japonica cv. Nipponbare genomic sequence were analyzed further by comparison with other public databases: 14 matched the ssp. indica cv. 93-11 genomic sequences, 6 matched rice ESTs in the NCBI est-other database, 4 were similar to ESTs from Sorghum bicolor, Triticum aestivum, Manihot esculenta and Spartina alterniflora, 15 were homologous to the fungi Gibberella moniliformis, Gibberella zeae and Magnaporthe grisea, and the remaining 18 had no hits. Table 4 lists the 24 W1943 cDNAs that remain after excluding the 15 possible contaminant clones and the 18 clones with no hits. Some of the W1943 cDNAs that did not match the cv. Nipponbare genomic sequence might be located in gaps of that genome sequence, or might represent W1943-specific genes.
Table 4

List of 24 W1943 cDNAs with no hits to the Oryza sativa ssp. japonica genome sequence

No. | Accession number | japonica chromosome | 93–11 location | EST or mRNA hit | Protein
1 | CT842002 |  | Contig005912 | AK241925.1 |
2 | CT842007 |  | Contig008507 | CT856206 |
3 | CU405940 |  | Contig001402 | AK103326 | Unknown protein
4 | CU406172 |  | Contig014596 | AK242967.1 |
5 | CT842006 |  | Contig000383 | AK111647 | GTP-binding protein
6 | CU861753 |  | Contig000750 | AK099287 | Ring-box protein
7 | CU406308 |  | Contig000444 | AK070131 | Unknown protein
8 | CT841996 |  | Contig002576 | CT834800 | Unknown protein
9 | CU406568 |  | Contig003848 | AK064050 | Bowman Birk trypsin inhibitor
10 | CU406582 |  | Contig000444 | AK107776 | Unknown protein
11 | CU406596 |  | Contig001277 | AK242711.1 | Hypothetical protein
12 | CT842008 |  | Contig008507 | CT856206 | Unknown protein
13 | CU406895 |  | Contig003011 | CT859459 | Hypothetical protein
14 | CU861744 |  | Contig000750 | AK099287 | Ring-box protein
15 | CU405657 |  |  | CT856885 |
16 | CT841712 |  |  | CA766528 |
17 | CU405768 |  |  | CT836656 | 60S ribosomal protein L7A
18 | CU405675 |  |  | CA756235 | 60S ribosomal protein L17
19 | CU406202 |  |  | NM_001063334 | Unknown
20 | CU406924 |  |  | AC145809 |
21 | CU405898 |  |  | CN130755.1 (Sorghum bicolor) | Ribulose-bisphosphate carboxylase
22 | CU406778 |  |  | BE429292.1 (Triticum turgidum) | Hydrophobin
23 | CU861677 |  |  | FF534517.1 (Manihot esculenta) | Hypothetical protein
24 | CT841912 |  |  | EH277383.1 (Spartina alterniflora) | Unknown protein

Comparative analysis with cultivated rice cDNA sequences in public databases

The 1888 W1943 cDNAs were compared with cultivated rice cDNA sequences. Large-scale ssp. japonica cv. Nipponbare cDNA sequences have been released to public databases,[7] and, more recently, a batch of ssp. indica cv. Guangluai 4 cDNA sequences was also released (ftp://ftp.ncbi.nih.gov/; http://www.ncgr.ac.cn/RICD).[8] We compared the cDNAs of these two major cultivated rice varieties with the 1888 W1943 cDNA sequences. For convenience, we refer to the cv. Nipponbare cDNA sequences as KOME (Knowledge-based Oryza Molecular Biological Encyclopedia) and the cv. Guangluai 4 cDNA sequences as NCGR (National Center for Gene Research, CAS). At present, there are 35 187 ssp. japonica FLcDNA sequences in KOME and 10 096 ssp. indica FLcDNA sequences in NCGR. We first determined the chromosomal distributions of the three rice cDNA sets along the cv. Nipponbare chromosomal pseudomolecules (Fig. 3). Although the W1943 set is relatively small, its distribution followed trends similar to those of the KOME and NCGR cDNAs, with no obvious bias. The 1888 W1943 cDNAs can therefore give clues to the W1943 genome as a whole.
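Comparing chromosomal distributions of cDNA sets of very different sizes (1888 versus 35 187 and 10 096 sequences) requires normalizing the counts. One way to do this, sketched below under stated assumptions, is to bin each set's mapped positions along the pseudomolecules and convert counts into fractions of that set. The 1-Mb bin width, the (chromosome, position) input format and the maximum per-bin difference used as a bias measure are all hypothetical choices, not necessarily those behind Fig. 3.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BIN_SIZE = 1_000_000  # hypothetical bin width (1 Mb)
Bin = Tuple[str, int]


def binned_fractions(hits: Iterable[Tuple[str, int]], bin_size: int = BIN_SIZE) -> Dict[Bin, float]:
    """Convert (chromosome, position) hits into per-bin fractions of the whole set."""
    counts = Counter((chrom, pos // bin_size) for chrom, pos in hits)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def max_bin_difference(a: Dict[Bin, float], b: Dict[Bin, float]) -> float:
    """Largest absolute per-bin difference between two normalized distributions."""
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


if __name__ == "__main__":
    small_set = [("chr01", 100), ("chr01", 1_500_000), ("chr02", 10)]  # hypothetical
    large_set = small_set * 10  # ten times larger, same shape
    print(max_bin_difference(binned_fractions(small_set), binned_fractions(large_set)))
```

Because each set is normalized by its own total, a small set and a large set with the same shape give near-zero per-bin differences.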
Figure 3

Chromosomal distributions of the three rice cDNA sets (W1943, KOME, NCGR) along the ssp. japonica cv. Nipponbare chromosomal pseudomolecule sequences. Although the W1943 set is relatively small, it shows trends similar to those of KOME and NCGR, with no obvious bias (KOME, Oryza sativa ssp. japonica Nipponbare cDNAs; NCGR, Oryza sativa ssp. indica Guangluai 4 cDNAs).

The Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to identify simple sequence repeats (SSRs) in the cDNA sequences. We considered all SSR motifs 1–6 nucleotides long, with minimum repeat numbers of 10 for mononucleotides, 6 for dinucleotides and 5 for all other motifs (tri-, tetra-, penta- and hexanucleotides). We determined the five most frequent SSR motifs in the complete cDNA sequences, 5′-UTRs, ORFs and 3′-UTRs (Fig. 4). Within each region, the most frequent SSR motif was the same in all three rice cDNA sets. First, the motif CCG/CGG was the most frequent in 5′-UTR and ORF regions, whereas the motif A/T was the most frequent in 3′-UTR regions. Second, motif types were unevenly distributed along the FLcDNA sequences: CCG/CGG and A/T accounted for >50% of SSRs in ORF and 3′-UTR regions, respectively, whereas in 5′-UTR regions the most frequent motif accounted for ≤28%. In addition, the three most frequent SSR motif types in ORF regions were all trinucleotides, unlike those in UTR regions. This difference matters for coding sequences: because the gain or loss of a trinucleotide repeat unit adds or removes a whole codon, variation in trinucleotide SSRs does not shift the reading frame. Furthermore, the five most frequent SSR motifs in ORFs were all trinucleotides, with the single exception of the fourth most frequent motif in NCGR, which was A/T (7.19%).
During evolution, the relatively higher frequency of mononucleotide SSR motifs in NCGR ORFs was likely one factor that contributed to the divergence of ssp. indica and ssp. japonica, which could partly explain why W1943 is closer to japonica than to indica.
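The SSR search criteria described above (motifs of 1–6 nt; at least 10, 6 and 5 repeats for mono-, di- and longer motifs) can be approximated with regular expressions. This is a simplified Python sketch, not MISA itself: MISA is a Perl script and also reports compound SSRs, which are ignored here. Motifs whose shortest repeating unit is smaller than the motif length (such as ATAT) are skipped, so that each repeat is reported under its basic unit.

```python
import re
from typing import List, NamedTuple

# Minimum number of repeat units per motif length, as listed in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    start: int    # 0-based start (inclusive)
    end: int      # 0-based end (exclusive)
    motif: str
    repeats: int


def minimal_period(unit: str) -> int:
    """Length of the shortest sub-unit that tiles `unit`, e.g. 'ATAT' -> 2."""
    for p in range(1, len(unit) + 1):
        if len(unit) % p == 0 and unit[:p] * (len(unit) // p) == unit:
            return p
    return len(unit)


def find_ssrs(seq: str) -> List[SSR]:
    """Report perfect SSRs of 1-6 nt motifs that meet the minimum repeat numbers."""
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if minimal_period(motif) != k:
                continue  # e.g. 'AA' or 'ATAT' belong to a shorter motif class
            hits.append(SSR(m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("TT" + "CCG" * 5 + "TT" + "A" * 10 + "G"))
```

Tallying the `motif` field of the returned SSRs, separately for 5′-UTR, ORF and 3′-UTR coordinates, would give motif frequency tables of the kind summarized in Fig. 4.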
Figure 4

The five most frequent SSR motifs in the complete cDNA sequences, 5′-UTR sequences, ORF sequences and 3′-UTR sequences.

We compared transcripts between W1943 and the two cultivated rice subspecies (Fig. 5). A total of 823 W1943 cDNAs had homologs in both KOME and NCGR (≥95% identity and non-redundant hits to both). We extracted the ORF of each cDNA sequence using the getorf program.[21] At the amino acid level, 194 ORF groups were identical among all three (Fig. 5A), 143 groups were identical only between W1943 and KOME, 87 only between W1943 and NCGR, and 64 only between KOME and NCGR. Consequently, (194 + 143)/823 = 40.9% of the transcripts were conserved between wild rice W1943 and cultivated ssp. japonica cv. Nipponbare, (194 + 87)/823 = 34.1% between W1943 and ssp. indica cv. Guangluai 4, and (194 + 64)/823 = 31.3% between cvs. Nipponbare and Guangluai 4.
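The pairwise conservation percentages follow directly from the set counts in Fig. 5A: an ORF group counts as conserved between two lines if it is identical in all three or specifically identical in that pair. A small sketch reproducing this arithmetic; the dictionary labels are ad hoc.

```python
from typing import Dict, Tuple

TOTAL_GROUPS = 823        # W1943 cDNAs with homologs in both KOME and NCGR
IDENTICAL_IN_ALL = 194    # ORF groups identical in all three at the amino acid level
PAIR_ONLY: Dict[Tuple[str, str], int] = {  # identical only within this pair (Fig. 5A)
    ("W1943", "KOME"): 143,
    ("W1943", "NCGR"): 87,
    ("KOME", "NCGR"): 64,
}


def conserved_percent(pair: Tuple[str, str]) -> float:
    """Share of groups whose ORFs are identical for both members of `pair`."""
    return round(100 * (IDENTICAL_IN_ALL + PAIR_ONLY[pair]) / TOTAL_GROUPS, 1)


if __name__ == "__main__":
    for pair in PAIR_ONLY:
        print(pair, conserved_percent(pair))  # 40.9, 34.1, 31.3
```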
Figure 5

Comparative analysis with Oryza sativa cDNA sequences in public databases. (A) Relationships among the ORFs of the 823 W1943, KOME and NCGR co-cDNA groups at the amino acid level. (B) Synonymous divergence (Ks) relationships among the 194 cDNA groups with identical ORFs.

The nucleotide sequences of the 194 identical ORF groups were extracted to calculate synonymous substitution rates. Of these, 106 (54.6%) were also completely identical at the nucleotide level, so the remaining 88 groups were used to calculate synonymous divergence (Ks) (Fig. 5B). Of the 88 groups, 42 had no synonymous substitutions between W1943 and KOME, 9 had none between W1943 and NCGR, 15 had none between KOME and NCGR, and the remaining 22 had synonymous substitutions among all three. Thus, at the nucleotide level, 76.2% of the 194 identical ORF groups (106 + 42) showed no changes between W1943 and cv. Nipponbare, compared with 59.2% (106 + 9) between W1943 and cv. Guangluai 4. Monna et al.[29] reported polymorphism rates in predicted intergenic regions of rice of 0.302 (W1943/Nipponbare), 0.653 (W1943/Guangluai 4) and 0.630 (Nipponbare/Guangluai 4), a pattern similar to our results for coding regions. These findings further support the hypothesis that O. rufipogon W1943 is closer to ssp. japonica than to ssp. indica.
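Because the 194 ORF groups analyzed here are identical at the amino acid level, every nucleotide difference between two members of a group lies in a codon that still encodes the same amino acid, i.e. it is synonymous. "No synonymous substitution between X and Y" therefore reduces to X and Y being identical at the nucleotide level. The sketch below applies this observation to assign a group to Fig. 5B-style categories using the standard genetic code; it is an illustration only, not the ClustalX/PAL2NAL Ks calculation used in the study, and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate complete codons of a coding sequence (trailing bases are ignored)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def ks_category(w1943: str, kome: str, ncgr: str) -> str:
    """Assign an amino-acid-identical ORF group to a Fig. 5B-style category."""
    w1943, kome, ncgr = w1943.upper(), kome.upper(), ncgr.upper()
    if not translate(w1943) == translate(kome) == translate(ncgr):
        raise ValueError("the three ORFs must encode identical proteins")
    if w1943 == kome == ncgr:
        return "identical at nucleotide level"
    if w1943 == kome:
        return "no synonymous change W1943/KOME"
    if w1943 == ncgr:
        return "no synonymous change W1943/NCGR"
    if kome == ncgr:
        return "no synonymous change KOME/NCGR"
    return "synonymous changes among all three"


if __name__ == "__main__":
    # CTG, CTA and TTG all encode leucine, so these ORFs differ only synonymously.
    print(ks_category("ATGCTG", "ATGCTG", "ATGTTG"))
    print(ks_category("ATGCTG", "ATGCTA", "ATGTTG"))
```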

miRNAs identification

A BLASTx search against NCBI nrDB found no hits for 432 of the 1888 W1943 cDNAs. Of these 432 sequences, 71 contained predicted ORFs longer than 100 amino acids; the remaining 361 were therefore considered putative non-protein-coding transcripts. Searches against the Rfam database and miRBase showed that four cDNAs matched four miRNA families: osa-MIR159a, osa-MIR156j, osa-MIR818e and osa-miR446 (Table 5). mFOLD predicted pre-miRNA secondary structures for all four sequences, and they were identified as miRNAs on the basis of these folding results.
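The coding-potential filter above (transcripts without BLASTx hits and without a predicted ORF longer than 100 amino acids were treated as putative non-coding) can be sketched as a six-frame search for the longest ATG-initiated ORF. This is a simplified stand-in, not EMBOSS getorf (whose default mode reports regions between stop codons); here an ORF is assumed to run from ATG to the first in-frame stop codon, and an unterminated ORF at the 3′ end is also counted.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MAX_NONCODING_ORF_AA = 100  # ORFs longer than this are treated as coding (from the text)


def longest_orf_aa(seq: str) -> int:
    """Longest ATG-initiated ORF (in amino acids, stop excluded) over six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            protein = "".join(
                CODON_TABLE.get(strand[i:i + 3], "X")  # 'X' for codons containing N
                for i in range(frame, len(strand) - 2, 3)
            )
            start = None
            for pos, aa in enumerate(protein + "*"):  # extra '*' closes an open ORF
                if aa == "M" and start is None:
                    start = pos
                elif aa == "*" and start is not None:
                    best = max(best, pos - start)
                    start = None
    return best


def is_putative_noncoding(seq: str) -> bool:
    """True if no ORF exceeds the 100-amino-acid threshold."""
    return longest_orf_aa(seq) <= MAX_NONCODING_ORF_AA


if __name__ == "__main__":
    coding = "ATG" + "GCT" * 120 + "TAA"  # hypothetical 121-aa ORF
    print(longest_orf_aa(coding), is_putative_noncoding(coding))
```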
Table 5

List of 4 miRNAs

Accession number | Gene length (bp) | Pre-miRNA length (bp) | Hit miRNA | miRNA sequence | Chromosome
CU406292 | 1416 | 262 (220–490) | osa-MIR159a | uuuggauugaagggagcucug | 01
CU405943 | 1511 | 101 (160–280) | osa-MIR156j | ugacagaagagagugagcac | 06
CU861819 | 561 | 80 (390–470) | osa-miR818e | aaucccuuauauuuugggacgg | 04
CU861752 | 727 | 150 (325–475) | osa-miR446 | aucaauaugaaugugggaaau | 10

Expression analysis by searching against the rice MPSS database

We used the rice MPSS database (http://mpss.udel.edu/rice/) to examine the expression levels of W1943 cDNAs under different conditions.[13] A gene was defined as tissue-specific if it met three criteria: (i) its expression level was >100 tpm in at least one tissue; (ii) if it was expressed in several tissues, its highest expression level accounted for >75% of its total expression across all tissues; and (iii) the ratio between its highest and second-highest expression levels was >10. Using these criteria, we identified 41 putative tissue-specific genes (Table 6): 16 W1943 cDNAs were expressed at remarkably high levels in leaves, 11 specifically in roots, 1 in germinating seed, 3 in callus, 7 in germinating seedlings, 1 in meristematic tissue and 2 in mature pollen. Searching the PFAM protein database, we found that gene ‘CU406902’ was predicted to encode ‘Lir1, light regulated protein Lir1’. Lir1 mRNA accumulates in the light, reaching maximum and minimum steady-state levels at the end of the light and dark periods, respectively.[31] Another gene, ‘CT841733’, was predicted to encode ‘RuBisCO_small’ (the ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit). Whereas the RuBisCO large subunit is encoded by a single gene, the small subunit is encoded by several different genes that are expressed in a tissue-specific manner. These genes are transcriptionally regulated by the light receptor phytochrome, so that RuBisCO is more abundant during the day, when it is required.[32]
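The three tissue-specificity criteria can be written as a single predicate over a gene's MPSS expression profile (tpm per tissue). In this minimal sketch, criterion (ii) is read as the top tissue accounting for more than 75% of the summed expression, and criterion (iii) as the highest level exceeding ten times the second highest; both are interpretations of the wording, and the example profiles are hypothetical.

```python
from typing import Dict, Optional


def specific_tissue(profile: Dict[str, float],
                    min_tpm: float = 100.0,
                    min_share: float = 0.75,
                    min_ratio: float = 10.0) -> Optional[str]:
    """Return the tissue a gene is specific to under criteria (i)-(iii), else None."""
    if not profile:
        return None
    ranked = sorted(profile.values(), reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    if top <= min_tpm:                               # (i) >100 tpm in some tissue
        return None
    if top / sum(ranked) <= min_share:               # (ii) top tissue holds >75%
        return None
    if second > 0 and top / second <= min_ratio:     # (iii) top/second ratio >10
        return None
    return max(profile, key=profile.get)


if __name__ == "__main__":
    print(specific_tissue({"leaf": 1200.0, "root": 30.0, "pollen": 0.0}))  # leaf
    print(specific_tissue({"leaf": 500.0, "root": 80.0}))                  # None
```

Under this reading, a profile that passes (iii) with only a few tissues usually passes (ii) as well; (ii) becomes decisive when expression is spread thinly over many tissues.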
Table 6

List of Oryza rufipogon W1943 tissue-specific genes (unit: tpm)

Clone Acc.LeafRootNGSNCANGDNMENPOPFAM Acc.DescriptionE-value
CU40690244 199010101900PF07207Lir14.8e–85
CU40597936 7850894025690
CT84173325 11241120024100PF00101RuBisCO_small2.5e–45
CU40597515 4211278006502230
CT841994914001001800
CU4065213504600000PF01070FMN_dh2.8e–31
CU4059963069027028150PF00430ATP-synt_B3.4e–28
CU4056702653011521423PF00085Thioredoxin7.8e–43
CU40600623370000010
CU406668212631701600
CT8416501997000000PF00112Peptidase_C16e–109
CT84173119420001200PF02507PSI_PsaF0
CT841902148602403100
CU405952125371105250
CU40619912350160000
CU406624101206058035PF05899DUF8612.1e–37
CU40643101890001817
CU405706145615 907018380300PF01439Metallothio_22.7e–32
CU4063300358400131
CT841629217272115736802586PF01124MAPEG3.1e–63
CU4065131823000000PF01439Metallothio_21.6e–34
CU4065760231011000
CU40628129449000140
CT8419661552000000PF00188SCP5.7e–55
CU405942018505000PF00967Barwin3e–84
CU4065205120900000
CU406670018900000PF00280Potato_inhibit1.4e–20
CU406238410987333100PF04398DUF5384.9e–41
CT84187516001623315
CT841950119135763079107190
CT841815107135763087107190
CU40694059681931139340PF02065Melibiase3.5e–13
CU406598565060675716 96500PF00234Tryp_alpha_amyl1.6e–31
CU40653370143046621190PF00234Tryp_alpha_amyl5.5e–33
CU406609000014300
CU406264000023700
CU405759000077900
CU40603814140024700
CU405951025001313470PF01439Metallothio_26.5e–22
CU4066981300000289PF00481PP2C2.4e–14
CU4063511034366648423228

NGS, 3 days—Germinating seed; NCA, 35 days—Callus; NGD, 10 days—Germinating seedlings grown in dark; NME, 60 days—Crown vegetative meristematic tissue; NPO, mature pollen.

Using similar criteria, we found seven W1943 cDNAs with distinct expression levels in leaves exposed to cold, drought or salinity stress (Table 7): four genes were up-regulated by cold stress, two by drought and one by salinity. Notably, gene ‘CU405946’ matched the PFAM family ‘Dehydrin’; dehydrins are produced by plants experiencing water stress.[33]
Table 7

List of seven cDNAs preferentially expressed under cold-stress, drought-stress and salinity in leaf (unit: tpm)

Clone Acc.Normal leafNCLNDLNSLPFAM Acc.DescriptionE-value
CU4063109628723255NullNullNull
CT8417819630893257NullNullNull
CT84155810224042365NullNullNull
CU40655411568083NullNullNull
CT8415763030343568PF00234Tryp_alpha_amyl4.6e–33
CU4064850014770NullNullNull
CU40594601130591PF00257Dehydrin2.2e–54

NCL, 14 days—Young leaves stressed in 4°C cold for 24 h; NDL, 14 days—Young leaves stressed in drought for 5 days; NSL, 14 days—Young leaves stressed in 250 mM NaCl for 24 h.


Conclusions

In this study, we collected and completely sequenced 1888 putative FLcDNAs from wild rice O. rufipogon Griff. W1943, and identified 17 novel rice cDNAs and 41 putative tissue-specific genes. Comparative analysis between the wild rice and two cultivated rice subspecies indicated that O. rufipogon W1943 is more similar to O. sativa ssp. japonica than to ssp. indica cultivars. W1943 is reported to be distributed mainly in Dongxiang (26°14'N, 116°36'E), Jiangxi Province, China,[34] which is currently the northernmost known distribution of O. rufipogon.[35] Both O. sativa ssp. japonica and ssp. indica are cultivated in this area. The geographical distribution of W1943 may therefore also provide clues for further analyses of wild and cultivated rice.

Funding

This research was supported by grants from the Ministry of Science and Technology of China (the China Rice Functional Genomics Programs, 2005CB120805 and 2006AA10A102), the Chinese Academy of Sciences (038019315 and KSCX2-YW-N-024) and the Shanghai Municipal Commission of Science and Technology.
References (32 in total)

1.  Polymorphism and phylogenetic relationships among species in the genus Oryza as determined by analysis of nuclear RFLPs.

Authors:  Z Y Wang; G Second; S D Tanksley
Journal:  Theor Appl Genet       Date:  1992-03       Impact factor: 5.699

2.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

3.  A cDNA-based comparison of dehydration-induced proteins (dehydrins) in barley and corn.

Authors:  T J Close; A A Kortt; P M Chandler
Journal:  Plant Mol Biol       Date:  1989-07       Impact factor: 4.076

4.  Genome-wide searching of single-nucleotide polymorphisms among eight distantly and closely related rice cultivars (Oryza sativa L.) and a wild accession (Oryza rufipogon Griff.).

Authors:  Lisa Monna; Rieko Ohta; Haruka Masuda; Akiko Koike; Yuzo Minobe
Journal:  DNA Res       Date:  2006-03-29       Impact factor: 4.458

5.  A draft sequence of the rice genome (Oryza sativa L. ssp. japonica).

Authors:  Stephen A Goff; Darrell Ricke; Tien-Hung Lan; Gernot Presting; Ronglin Wang; Molly Dunn; Jane Glazebrook; Allen Sessions; Paul Oeller; Hemant Varma; David Hadley; Don Hutchison; Chris Martin; Fumiaki Katagiri; B Markus Lange; Todd Moughamer; Yu Xia; Paul Budworth; Jingping Zhong; Trini Miguel; Uta Paszkowski; Shiping Zhang; Michelle Colbert; Wei-lin Sun; Lili Chen; Bret Cooper; Sylvia Park; Todd Charles Wood; Long Mao; Peter Quail; Rod Wing; Ralph Dean; Yeisoo Yu; Andrey Zharkikh; Richard Shen; Sudhir Sahasrabudhe; Alun Thomas; Rob Cannings; Alexander Gutin; Dmitry Pruss; Julia Reid; Sean Tavtigian; Jeff Mitchell; Glenn Eldredge; Terri Scholl; Rose Mary Miller; Satish Bhatnagar; Nils Adey; Todd Rubano; Nadeem Tusneem; Rosann Robinson; Jane Feldhaus; Teresita Macalma; Arnold Oliphant; Steven Briggs
Journal:  Science       Date:  2002-04-05       Impact factor: 47.728

6.  High-efficiency full-length cDNA cloning by biotinylated CAP trapper.

Authors:  P Carninci; C Kvam; A Kitamura; T Ohsumi; Y Okazaki; M Itoh; M Kamiya; K Shibata; N Sasaki; M Izawa; M Muramatsu; Y Hayashizaki; C Schneider
Journal:  Genomics       Date:  1996-11-01       Impact factor: 5.736

7.  Distinct class of putative "non-conserved" promoters in humans: comparative studies of alternative promoters of human and mouse genes.

Authors:  Katsuki Tsuritani; Takuma Irie; Riu Yamashita; Yuta Sakakibara; Hiroyuki Wakaguri; Akinori Kanai; Junko Mizushima-Sugano; Sumio Sugano; Kenta Nakai; Yutaka Suzuki
Journal:  Genome Res       Date:  2007-06-13       Impact factor: 9.043

8.  A collection of 10,096 indica rice full-length cDNAs reveals highly expressed sequence divergence between Oryza sativa indica and japonica subspecies.

Authors:  Xiaohui Liu; Tingting Lu; Shuliang Yu; Ying Li; Yuchen Huang; Tao Huang; Lei Zhang; Jingjie Zhu; Qiang Zhao; Danlin Fan; Jie Mu; Yingying Shangguan; Qi Feng; Jianping Guan; Kai Ying; Yu Zhang; Zhixin Lin; Zongxiu Sun; Qian Qian; Yuping Lu; Bin Han
Journal:  Plant Mol Biol       Date:  2007-05-24       Impact factor: 4.076

9.  Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA.

Authors:  Mayumi Nakano; Kan Nobuta; Kalyan Vemaraju; Shivakundan Singh Tej; Jeremy W Skogen; Blake C Meyers
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

10.  Rfam: annotating non-coding RNAs in complete genomes.

Authors:  Sam Griffiths-Jones; Simon Moxon; Mhairi Marshall; Ajay Khanna; Sean R Eddy; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

