Jinquan Chao¹, Yueyi Chen¹, Shaohua Wu¹, Wei-Min Tian¹
¹ Ministry of Agriculture Key Laboratory of Biology and Genetic Resources of Rubber Tree/State Key Laboratory Breeding Base of Cultivation and Physiology for Tropical Crops, Rubber Research Institute, Chinese Academy of Tropical Agricultural Sciences, Danzhou, Hainan 571737, PR China.
Seven-year-old virgin trees of the rubber tree clones CATAS8-79 and PR107 were grown at the Experimental Station of the Rubber Research Institute, Chinese Academy of Tropical Agricultural Sciences, in Danzhou city, Hainan province. Virgin trees of the same trunk circumference were selected for this study. For RNA-Seq, latex collected at the first tapping from five individual trees was pooled for each clone, and the samples were immediately stored at −80 °C until RNA extraction. For real-time PCR and determination of physiological parameters, latex was collected individually from another batch of five trees per clone at the first, second, third and fourth tapping. All selected virgin trees were tapped at 6:00 am in August 2013 using the S/2 d/2 tapping system (a half-spiral cut, tapped every two days).
RNA isolation and sequencing
Total latex RNA was extracted as described [1], and RNA quality was evaluated with a NanoDrop spectrophotometer (Thermo Scientific Inc., USA). Double-stranded cDNA was synthesized using the SuperScript® Double-Stranded cDNA Synthesis Kit (Invitrogen Inc., USA), purified with the QIAquick PCR extraction kit, and a single adenine (A) was added to the fragment ends. Sequencing adaptors were then ligated to the cDNA fragments. Fragments of the required size were purified by 2% agarose gel electrophoresis and enriched by PCR amplification. The libraries were sequenced on an Illumina HiSeq™ 2000 at the Beijing Genomics Institute (Shenzhen, China). The raw image data were converted into sequence data by base calling. Clean reads were obtained by removing adaptor sequences, low-quality sequences, empty tags, low-complexity sequences and tags present in only one copy. In total, 26,266,670 and 26,266,670 clean reads were generated for the CATAS8-79 and PR107 pools, respectively.
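The read-cleaning step above names its filters but not their thresholds. The Python sketch below shows one way such filters could be chained, assuming each read arrives with per-base Phred scores. The adaptor sequence, the Phred cut-off of 20, the 50% low-quality fraction, the two-distinct-bases complexity rule and the function names are all hypothetical choices for illustration, not the parameters used by the sequencing provider.

```python
from collections import Counter

# Hypothetical cut-offs chosen for illustration; the text does not give the
# thresholds actually used by the sequencing provider.
MIN_QUAL = 20             # Phred score below which a base counts as low quality
MAX_LOW_QUAL_FRAC = 0.5   # drop reads in which more than half the bases are low quality
MIN_DISTINCT_BASES = 2    # reads built from a single repeated base count as low complexity


def trim_adaptor(seq: str, adaptor: str, min_overlap: int = 5) -> str:
    """Cut the read at the first full adaptor match, or strip a 3' suffix
    that matches the start of the adaptor by at least min_overlap bases."""
    pos = seq.find(adaptor)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adaptor), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[: len(seq) - k]
    return seq


def clean_reads(reads, adaptor):
    """reads: iterable of (sequence, per-base Phred scores).
    Returns {tag: copy_number} for tags that pass every filter."""
    kept = []
    for seq, quals in reads:
        seq = trim_adaptor(seq, adaptor)
        if not seq:                                  # empty tag: only adaptor was sequenced
            continue
        low = sum(q < MIN_QUAL for q in quals[: len(seq)]) / len(seq)
        if low > MAX_LOW_QUAL_FRAC:                  # low-quality read
            continue
        if len(set(seq)) < MIN_DISTINCT_BASES:       # low complexity, e.g. poly-G
            continue
        kept.append(seq)
    counts = Counter(kept)
    # Tags seen only once are discarded as likely sequencing errors.
    return {tag: n for tag, n in counts.items() if n > 1}


if __name__ == "__main__":
    ADAPTOR = "AGATCGGAAG"  # hypothetical adaptor sequence
    demo = [
        ("ACGTACGTAGATCGGAAGTT", [30] * 20),  # adaptor trimmed -> ACGTACGT
        ("ACGTACGTAGATCGGAAGTT", [30] * 20),  # second copy of the same tag
        ("GGGGGGGG", [30] * 8),               # low complexity
        ("ACGTTTGCA", [10] * 9),              # low quality
        ("AGATCGGAAG", [30] * 10),            # empty tag
        ("TTGACCAGT", [30] * 9),              # single-copy tag
    ]
    print(clean_reads(demo, ADAPTOR))  # {'ACGTACGT': 2}
```

Keeping only the per-tag counts mirrors the fact that digital gene expression works on tag copy numbers rather than on individual reads.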
Transcriptome de novo assembly, annotation and classification
Transcriptome de novo assembly was carried out with SOAPdenovo, which uses a de Bruijn graph, as previously described [2]. With a k-mer length of 29, SOAPdenovo combined overlapping reads into contigs. Adjacent contigs were then joined into scaffolds using read mate pairs; within each scaffold, the unknown sequence between connected contigs was represented by 'N' characters, with gap sizes taken from the insert-size information. Finally, paired-end information was used to fill the scaffold gaps, yielding extended sequences with fewer Ns, which were defined as unigenes for further analysis. Contig and unigene statistics are listed in Table 1.
Table 1
Statistics of DGE sequencing for CATAS8-79 and PR107 libraries.
Sample         Value    100–500 nt  500–1000 nt  1000–1500 nt  1500–2000 nt  >2000 nt  N50 (bp)  Mean (bp)  Total no.  Total length (bp)
VT879 contig   Number   296,736     10,351       1649          531           261       133       142        305,004    43,311,050
VT879 contig   Percent  95.87%      3.34%        0.53%         0.17%         0.08%
VT107 contig   Number   308,262     9932         1396          360           129       124       137        315,643    43,387,443
VT107 contig   Percent  96.31%      3.10%        0.44%         0.11%         0.04%
VT879 unigene  Number   41,457      8474         2224          824           592       509       421        53,571     22,572,807
VT879 unigene  Percent  77.39%      15.82%       4.15%         1.54%         1.11%
VT107 unigene  Number   46,999      8243         1744          539           281       427       375        57,806     21,689,990
VT107 unigene  Percent  81.30%      14.26%       3.02%         0.93%         0.49%
All unigene    Number   35,195      11,019       3323          1277          1015      640       526        51,829     27,237,155
All unigene    Percent  67.91%      21.26%       6.41%         2.46%         1.96%
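The N50 and mean-length columns in Table 1 follow standard assembly statistics. A minimal Python sketch, assuming the usual N50 definition (the length L such that sequences of length L or longer contain at least half of all assembled bases); the function names are hypothetical. The mean-length helper reproduces the Mean (bp) column as total length divided by number of sequences.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least
    half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


def mean_length(total_bp, count):
    """Mean sequence length, rounded to the nearest base."""
    return round(total_bp / count)


# Mean (bp) in Table 1 equals Total length / Total no.:
print(mean_length(22_572_807, 53_571))  # VT879 unigenes -> 421
print(mean_length(27_237_155, 51_829))  # all unigenes   -> 526
print(n50([100, 200, 300, 400]))        # toy example    -> 300
```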
All unigenes were searched with BLAST (E-value < 1E−5) against the NCBI Nr (http://www.ncbi.nlm.nih.gov/), Swiss-Prot (http://www.expasy.ch/sprot/), KEGG (http://www.genome.jp/kegg/) and COG (http://www.ncbi.nlm.nih.gov/cog/) databases. The best alignment was used to annotate each unigene; when a unigene had hits in more than one database, the annotation was taken in the priority order Nr, Swiss-Prot, KEGG, COG. To classify the unigenes, Blast2GO was used to obtain GO annotations in the molecular function, biological process and cellular component categories. All unigenes were also aligned to the COG database to predict their possible functions and to the KEGG pathway database to assign them to pathways.
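The annotation priority rule can be expressed as a small function. This Python sketch assumes that "best" means the lowest E-value within a database and that BLAST hits have already been parsed into (subject, E-value) pairs; the function name and subject identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5
PRIORITY = ("Nr", "Swiss-Prot", "KEGG", "COG")


def pick_annotation(hits):
    """hits maps a database name to a list of (subject_id, e_value) pairs.
    Returns (database, subject_id) for the best hit in the highest-priority
    database that has any hit passing the cut-off, or None."""
    for db in PRIORITY:
        passing = [h for h in hits.get(db, []) if h[1] < E_VALUE_CUTOFF]
        if passing:
            subject, _ = min(passing, key=lambda h: h[1])  # lowest E-value = best
            return db, subject
    return None


example = {
    "Nr": [("nr_hit_A", 1e-3), ("nr_hit_B", 1e-20)],
    "Swiss-Prot": [("sp_hit_C", 1e-30)],
}
print(pick_annotation(example))  # ('Nr', 'nr_hit_B')
```

Note that the Swiss-Prot hit is never used here even though its E-value is lower: under a strict priority order, any passing Nr hit wins.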
Digital gene expression analysis
A rigorous statistical algorithm was used to identify differentially expressed genes between the two DGE libraries (CATAS8-79 versus PR107). Raw clean tag counts in each library were normalized to tags per million (TPM) to obtain normalized gene expression levels. Genes were considered differentially expressed when FDR ≤ 0.001 and |log2 ratio| ≥ 1 between the libraries. "Up-regulated" indicates higher transcript levels in PR107, whereas "down-regulated" indicates higher transcript levels in CATAS8-79. Using these thresholds, 6726 unigenes were identified as differentially expressed between CATAS8-79 and PR107.
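The normalization and filtering criteria can be sketched in Python as follows. The text does not name the per-gene significance test, so p-values are taken as given input. Computing the FDR with the Benjamini-Hochberg procedure, adding a small pseudocount to avoid division by zero, the function names, and the gene names and counts are all assumptions of this sketch.

```python
import math


def tpm(counts):
    """Tags per million: raw clean-tag count / total clean tags * 10**6."""
    total = sum(counts.values())
    return {g: c / total * 1e6 for g, c in counts.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(counts_catas, counts_pr107, pvalues,
              fdr_max=0.001, min_abs_log2=1.0, pseudo=0.01):
    """pvalues: gene -> p-value from a per-gene test assumed to exist.
    log2 ratio = PR107 / CATAS8-79, so 'up' means higher in PR107."""
    a, b = tpm(counts_catas), tpm(counts_pr107)
    genes = sorted(pvalues)
    fdr = dict(zip(genes, benjamini_hochberg([pvalues[g] for g in genes])))
    calls = {}
    for g in genes:
        ratio = math.log2((b.get(g, 0.0) + pseudo) / (a.get(g, 0.0) + pseudo))
        if fdr[g] <= fdr_max and abs(ratio) >= min_abs_log2:
            calls[g] = "up" if ratio > 0 else "down"
    return calls


catas = {"g1": 100, "g2": 400, "g3": 500}   # hypothetical clean-tag counts
pr107 = {"g1": 400, "g2": 400, "g3": 200}
p = {"g1": 1e-6, "g2": 1e-6, "g3": 1e-6}
print(call_degs(catas, pr107, p))  # {'g1': 'up', 'g3': 'down'}
```

Requiring both criteria means a gene with a tiny FDR but less than a two-fold change (g2 above) is not reported, matching the combined thresholds in the text.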
Conflict of interest
The authors declare that they have no competing interests.
Specifications
Organism/cell line/tissue
Hevea brasiliensis (rubber tree) clones CATAS8-79 and PR107; latex from seven-year-old virgin trees
Sex
Not applicable
Sequencer or array type
Illumina HiSeq™ 2000
Data format
Clean reads; assembled and annotated unigenes; digital gene expression profiles
Experimental factors
Clone CATAS8-79 versus clone PR107; latex collected at the first tapping of virgin trees
Experimental features
De novo transcriptome assembly (RNA-Seq) and digital gene expression profiling of latex from two rubber tree clones
Consent
Not applicable
Sample source location
Danzhou, Hainan, China