Keng-See Chow, Ahmad-Kamal Ghazali, Chee-Choong Hoh, Zainorlina Mohd-Zainuddin.
Abstract
BACKGROUND: One concern in assembling de novo transcriptomes is determining the amount of read sequence required to ensure comprehensive coverage of the genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for estimating the read amount needed for deep transcriptome coverage.
Year: 2014 PMID: 24484543 PMCID: PMC3926681 DOI: 10.1186/1756-0500-7-69
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Publications containing applications of second-generation sequencing in rubber tree transcriptome analysis
| No. | Study | Year | Sequencing platform | Tissue; clone (sequencing depth) |
|---|---|---|---|---|
| 1 | Xia et al. […] | 2011 | PE RNA-Seq (Illumina) | Latex and leaf combined; clone RY7-33-97 (approx. 12 million reads or 1 Gb) |
| 2 | Pootakham et al. […] | 2011 | 454 pyrosequencing (Roche) | Information not available |
| 3 | Triwitayakorn et al. […] | 2011 | 454 pyrosequencing (Roche) | Shoot apical meristem; clone RRIM 600 (approx. 2 million reads or 676.5 Mb) |
| 4 | Chow et al. […] | 2012 | RNA-Seq (Illumina) | Latex; clone RRIM 600 (approx. 10 million reads or 350 Mb) |
| 5 | Li et al. […] | 2012 | PE RNA-Seq (Illumina) | Bark; clone RY7-33-97 (approx. 30 million reads or 3 Gb) |
| 6 | Duan et al. […] | 2013 | 454 pyrosequencing (Roche) | Leaf, bark, latex, root and embryogenic tissues; clone PB 260 (approx. 0.5 million reads or 200 Mb per tissue) |
| 7 | Rahman et al. […] | 2013 | PE RNA-Seq (Illumina) | Leaf; clone RRIM 600 (4.89 Gb) |
| | | | 454 pyrosequencing (Roche) | Leaf; clone RRIM 600 (1,085 Mb) |
Sequencing depth, tissue types and the tree clones in each project are indicated.
Quality processing of reads from three tissue libraries
| Read category | No. of reads (% of raw) | No. of bases (% of raw) |
|---|---|---|
| Raw reads | 50,384,572 (100%) | 5,038,457,200 (100%) |
| Clean reads | 49,393,389 (98.03%) | 4,709,104,798 (93.46%) |
| Paired reads | 48,650,932 (96.56%) | 4,647,858,661 (92.25%) |
| Orphan reads (single end) | 742,457 (1.47%) | 61,246,137 (1.21%) |
| Raw reads | 49,578,322 (100%) | 4,957,832,200 (100%) |
| Clean reads | 47,662,360 (96.14%) | 4,512,413,782 (91.02%) |
| Paired reads | 46,062,766 (92.90%) | 4,373,106,379 (88.21%) |
| Orphan reads (single end) | 1,599,594 (3.23%) | 139,307,403 (2.81%) |
| Raw reads | 169,887,626 (100%) | 16,988,762,600 (100%) |
| Clean reads | 166,258,828 (97.86%) | 15,983,753,737 (94.08%) |
| Paired reads | 163,316,702 (96.13%) | 15,726,859,825 (92.57%) |
| Orphan reads (single end) | 2,942,126 (1.73%) | 256,893,912 (1.51%) |
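The derived columns in the read-processing table follow simple arithmetic: each percentage is relative to the raw count of its library, and orphan reads are the clean reads left without a mate (clean minus paired). A minimal Python sketch that reproduces them, assuming that interpretation; `read_processing_summary` is a hypothetical helper name, not part of any published pipeline:

```python
def read_processing_summary(raw: int, clean: int, paired: int) -> dict:
    """Derive the percentage and orphan columns for one library.

    Assumes percentages are relative to the raw count and that
    orphan (single-end) reads = clean reads - paired reads.
    """
    def pct(n: int) -> float:
        return round(100 * n / raw, 2)

    orphan = clean - paired
    return {
        "clean_pct": pct(clean),
        "paired_pct": pct(paired),
        "orphan": orphan,
        "orphan_pct": pct(orphan),
    }


# Read counts for the first library listed in the table
print(read_processing_summary(50_384_572, 49_393_389, 48_650_932))
# {'clean_pct': 98.03, 'paired_pct': 96.56, 'orphan': 742457, 'orphan_pct': 1.47}
```

The same function applies to the base counts, since those columns use the raw base count as the denominator.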
Figure 1. Size distributions of clean paired reads from the latex, leaf and bark libraries.
Statistics of incremental bark assemblies across k-mers
| k-mer | N50 (bp) | Transcripts | N50 (bp) | Transcripts | N50 (bp) | Transcripts | N50 (bp) | Transcripts | N50 (bp) | Transcripts | N50 (bp) | Transcripts | N50 (bp) | Transcripts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 1,389 | 68,942 | 1,734 | 102,352 | 1,741 | 131,979 | 1,695 | 170,838 | 1,648 | 193,885 | 1,599 | 224,292 | 1,542 | 254,026 |
| 53 | 1,375 | 64,265 | 1,763 | 94,288 | 1,797 | 120,998 | 1,778 | 157,127 | 1,741 | 178,204 | 1,701 | 206,355 | 1,654 | 234,717 |
| 55 | 1,353 | 59,325 | 1,783 | 86,117 | 1,832 | 110,086 | 1,834 | 142,292 | 1,813 | 161,486 | 1,798 | 186,542 | 1,758 | 212,283 |
| 57 | 1,343 | 54,670 | 1,798 | 78,651 | 1,852 | 100,131 | 1,872 | 129,469 | 1,864 | 146,550 | 1,861 | 169,917 | 1,828 | 192,775 |
| 59 | 1,315 | 50,440 | 1,799 | 72,051 | 1,876 | 91,351 | 1,905 | 117,384 | 1,901 | 133,175 | 1,903 | 154,642 | 1,880 | 176,018 |
| 61 | 1,288 | 46,261 | ***n/a*** | 65,903 | 1,891 | 83,027 | 1,926 | 106,569 | 1,933 | 120,901 | 1,940 | 139,498 | 1,928 | 159,832 |
| 63 | 1,255 | 42,626 | 1,791 | 60,666 | 1,889 | 75,893 | 1,959 | 96,676 | 1,959 | 109,522 | 1,969 | 126,478 | 1,956 | 143,857 |
| 65 | 1,225 | 39,223 | 1,782 | 55,645 | ***n/a*** | 69,159 | 1,966 | 87,780 | 1,980 | 98,782 | 1,989 | 115,028 | 1,988 | 130,546 |
| 67 | 1,199 | 35,773 | 1,753 | 51,333 | 1,892 | 63,469 | 1,970 | 80,153 | 1,997 | 90,055 | 2,010 | 104,399 | 2,025 | 118,385 |
| 69 | 1,147 | 32,606 | 1,731 | 47,383 | 1,895 | 58,112 | ***n/a*** | 72,612 | 2,007 | 81,560 | 2,024 | 94,500 | 2,043 | 106,974 |
| 71 | 1,111 | 29,621 | 1,691 | 43,910 | 1,878 | 53,357 | 1,974 | 66,249 | ***n/a*** | 74,220 | 2,036 | 85,874 | 2,061 | 96,653 |
| 73 | 1,068 | 26,674 | 1,652 | 40,633 | 1,846 | 49,412 | 1,967 | 60,561 | 2,010 | 67,593 | 2,040 | 77,667 | ***n/a*** | 87,612 |
| 75 | 1,034 | 23,699 | 1,606 | 37,433 | 1,808 | 45,312 | 1,946 | 55,384 | 1,997 | 61,680 | ***n/a*** | 70,331 | 2,066 | 79,038 |
| 77 | 982 | 20,746 | 1,550 | 34,531 | 1,770 | 41,621 | 1,923 | 50,636 | 1,967 | 56,233 | 2,022 | 63,767 | 2,057 | 71,504 |
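Each datasize assembly occupies a pair of columns (N50 length in bp, number of transcripts), ordered by increasing datasize from left to right. The peak N50 length for each datasize assembly (except 1 Gb) is highlighted in bold italics.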
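The k-mer comparison in this table rests on the transcript N50 length. A minimal Python sketch of the conventional definition: sort transcript lengths in descending order and report the length at which the running total first reaches half of all assembled bases. Assemblers may differ slightly in how they report the statistic, so treat this as illustrative rather than as the exact computation used for the table.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that transcripts of
    length >= L contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("n50() needs at least one transcript length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8   (8 + 8 = 16, half of 32)
print(n50([100, 200, 300, 400]))      # 300 (400 + 300 = 700 >= 500)
```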
Mapping of 255 ORF sequences to transcripts from the optimized 16 Gb bark assembly
| Category | Number | Percentage |
|---|---|---|
| Total queries (…) | 255 | |
| Queries with hits to bark transcripts | 250 | 100% |
| Queries with bark transcript hits where ORF coverage ≥ 70% | 200 | 80% |
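The percentages in this table appear to use the 250 queries with hits as the denominator (200 / 250 = 80%). A minimal sketch of the coverage filter, assuming that "ORF coverage" means the fraction of each ORF's length spanned by its best-matching bark transcript; the `OrfHit` record, its fields and `coverage_summary` are hypothetical, not taken from the authors' pipeline:

```python
from dataclasses import dataclass


@dataclass
class OrfHit:
    """Best hit of one ORF query to the bark transcripts (hypothetical record)."""
    orf_id: str
    orf_length: int      # length of the query ORF, bp
    aligned_length: int  # ORF bases covered by the best-matching transcript, bp


def coverage_summary(hits: list[OrfHit], min_coverage: float = 0.70) -> tuple[int, int, float]:
    """Return (queries with hits, queries meeting min_coverage, percentage of the former)."""
    with_hits = len(hits)
    # Assumed definition: coverage = aligned ORF bases / ORF length
    covered = sum(1 for h in hits if h.aligned_length / h.orf_length >= min_coverage)
    return with_hits, covered, 100 * covered / with_hits


# Toy input mirroring the table's counts: 200 well-covered ORFs and 50 partial ones
hits = [OrfHit(f"orf{i}", 1000, 800) for i in range(200)]
hits += [OrfHit(f"orf{i}", 1000, 500) for i in range(200, 250)]
print(coverage_summary(hits))  # (250, 200, 80.0)
```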
Figure 2. Number of transcripts from the optimized 16 Gb bark assembly with hits to rubber genome scaffolds. Hits are classified into transcript-to-scaffold coverage categories describing the extent of alignment, from 10% to 100%. A total of 84,471 bark transcripts (all categories) were mapped to genome scaffolds, and the proportion of these 84,471 hits falling in each category is shown in brackets.
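One way to produce the category counts and bracketed proportions described for Figure 2 is to bin each transcript's aligned fraction into 10% steps. The sketch below assumes floor-based bins labelled by their lower bound (a transcript 95% covered falls in the 90% category) and drops alignments covering less than 10%; the exact bin edges used by the authors are not stated, so both choices are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def coverage_category(aligned: int, transcript_length: int) -> Optional[int]:
    """Lower bound (10, 20, ..., 100) of the 10% coverage bin, or None below 10%."""
    # Assumed binning: integer floor to the nearest 10%, capped at 100
    category = min(100, (10 * aligned // transcript_length) * 10)
    return category if category >= 10 else None


def category_proportions(alignments: list[tuple[int, int]]) -> dict[int, tuple[int, float]]:
    """Map each category to (count, percentage of all categorised transcripts)."""
    counts = Counter(
        c for c in (coverage_category(a, n) for a, n in alignments) if c is not None
    )
    total = sum(counts.values())
    return {c: (n, round(100 * n / total, 1)) for c, n in sorted(counts.items())}


# (aligned bp, transcript length bp) for five hypothetical transcripts
print(category_proportions([(95, 100), (100, 100), (5, 100), (250, 1000), (990, 1000)]))
# {20: (1, 25.0), 90: (2, 50.0), 100: (1, 25.0)}
```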
Figure 3. Methodology of the transcript mapping saturation test.
Figure 4. Colour matrix representing the transcript mapping saturation test results. An asterisk marks the assembly with the optimized k-mer and transcript N50 length for each datasize (transcripts from the 1 Gb assembly were not included because no k-mer optimized the transcript N50 length at this datasize). See Additional file 2: Table S2 for full details of BlastN matches by subsets of bark transcripts to the optimized 16 Gb bark transcriptome.
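Figures 3 and 4 describe the saturation test at the level of BlastN match results. As an illustration of how such results can be turned into a read-amount estimate, the sketch below takes a match percentage for each datasize and reports the datasize beyond which extra reads add less than a chosen gain. The plateau criterion, the `tolerance` value and the function name are hypothetical, and the input numbers are made up; they are not results from this study.

```python
from typing import Optional


def saturation_point(match_pct: dict[float, float], tolerance: float = 1.0) -> Optional[float]:
    """Return the smallest datasize (Gb) whose next increment raises the match
    percentage by less than `tolerance` percentage points, or None if the
    curve is still climbing at the largest datasize tested."""
    sizes = sorted(match_pct)
    for smaller, larger in zip(sizes, sizes[1:]):
        # Hypothetical plateau rule: gain below `tolerance` means saturation
        if match_pct[larger] - match_pct[smaller] < tolerance:
            return smaller
    return None


# Illustrative values only, not data from this study
toy = {1: 60.0, 3: 75.0, 5: 82.0, 8: 82.5}
print(saturation_point(toy))  # 5
```

A larger `tolerance` declares saturation earlier; the choice is a judgement about how much additional coverage justifies further sequencing.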
Statistics of 1, 3 and 5 Gb assemblies of latex and leaf reads
| k-mer | Latex 1 Gb: N50 (bp) | Transcripts | Latex 3 Gb: N50 (bp) | Transcripts | Latex 5 Gb: N50 (bp) | Transcripts | Leaf 1 Gb: N50 (bp) | Transcripts | Leaf 3 Gb: N50 (bp) | Transcripts | Leaf 5 Gb: N50 (bp) | Transcripts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 706 | 64,691 | 1,016 | 89,882 | 1,053 | 108,246 | 544 | 90,842 | 831 | 141,047 | 845 | 170,274 |
| 53 | 716 | 61,113 | 1,058 | 85,854 | 1,135 | 104,493 | 535 | 84,656 | 888 | 129,418 | 920 | 158,195 |
| 55 | ***n/a*** | 56,456 | 1,093 | 79,303 | 1,187 | 95,718 | 523 | 77,815 | 915 | 117,125 | 999 | 141,823 |
| 57 | 717 | 52,339 | 1,102 | 73,610 | 1,231 | 88,114 | 517 | 70,653 | 933 | 105,513 | 1,049 | 127,329 |
| 59 | 707 | 48,151 | ***n/a*** | 67,662 | 1,266 | 80,620 | 509 | 64,286 | ***n/a*** | 95,662 | 1,079 | 113,686 |
| 61 | 683 | 43,857 | 1,100 | 62,627 | ***n/a*** | 73,298 | 500 | 58,367 | 933 | 86,244 | ***n/a*** | 100,912 |
| 63 | 668 | 40,003 | 1,090 | 57,318 | 1,270 | 67,227 | 490 | 52,086 | 912 | 78,666 | 1,084 | 90,894 |
| 65 | 657 | 36,366 | 1,067 | 53,173 | 1,266 | 61,105 | 478 | 46,805 | 885 | 72,173 | 1,059 | 82,529 |
| 67 | 642 | 32,702 | 1,041 | 49,168 | 1,241 | 55,834 | 466 | 41,495 | 855 | 66,289 | 1,026 | 75,410 |
| 69 | 629 | 29,326 | 1,001 | 46,360 | 1,201 | 52,592 | 451 | 36,757 | 810 | 61,116 | 988 | 68,544 |
| 71 | 610 | 26,079 | 956 | 43,095 | 1,151 | 49,302 | 439 | 32,167 | 761 | 56,152 | 944 | 63,072 |
| 73 | 591 | 23,055 | 907 | 40,051 | 1,103 | 45,844 | 432 | 27,547 | 713 | 51,241 | 879 | 58,710 |
| 75 | 578 | 20,055 | 850 | 36,849 | 1,052 | 42,945 | 427 | 23,246 | 669 | 46,727 | 823 | 53,571 |
| 77 | 567 | 17,326 | 815 | 33,398 | 990 | 39,996 | 427 | 19,079 | 631 | 41,838 | 763 | 49,712 |
The optimized N50 length for each tissue assembly is highlighted in bold italics.
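Both k-mer tables pick, for each datasize, the k-mer that maximises transcript N50. The sketch below additionally assumes that a maximum at the edge of the tested range does not count as an optimum, which is consistent with the note that the 1 Gb bark N50 was not optimized by any k-mer (its N50 falls steadily from k = 51). `optimal_kmer` is a hypothetical name.

```python
from typing import Optional


def optimal_kmer(n50_by_k: dict[int, int]) -> Optional[int]:
    """Return the k-mer with the highest transcript N50, or None when the
    maximum lies at the smallest or largest k tested (no interior optimum)."""
    ks = sorted(n50_by_k)
    best = max(ks, key=lambda k: n50_by_k[k])  # ties resolve to the smaller k
    # Assumed rule: an edge maximum is not treated as an optimized k-mer
    return None if best in (ks[0], ks[-1]) else best


# N50 values (bp) of the 1 Gb bark assembly, k = 51 to 77, from the bark table
bark_1gb = dict(zip(range(51, 79, 2), [1389, 1375, 1353, 1343, 1315, 1288, 1255,
                                        1225, 1199, 1147, 1111, 1068, 1034, 982]))
print(optimal_kmer(bark_1gb))                         # None: peak at k = 51
print(optimal_kmer({51: 1000, 53: 1200, 55: 1100}))   # 53
```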