Upendra Kumar Devisetty, Michael F Covington, An V Tat, Saradadevi Lekkala, Julin N Maloof.
Abstract
The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved by the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in the development of a marker system for this population and the subsequent development of a high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate estimation of mRNA abundance and detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes, R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety), using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse, and field) and under two different treatments (simulated sun and simulated shade), generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes, with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, structural modifications in 23,754 gene models, and changes in 3655 annotated proteins. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/.
Keywords: Brassica rapa; RNA-Seq; SNPs; genome annotation; transcriptome
Year: 2014 PMID: 25122667 PMCID: PMC4232532 DOI: 10.1534/g3.114.012526
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
List of tissue samples collected from B. rapa genotypes R500 and IMB211 across growth chamber, greenhouse, and field conditions
| Tissue | Location | Treatment | Genotype | No. of Replicates |
|---|---|---|---|---|
| GC pool | | | | |
| Apical Meristem | GC | Shade | IMB211 | 2 |
| Apical Meristem | GC | Shade | R500 | 2 |
| Apical Meristem | GC | Sun | R500 | 1 |
| Leaf | GC | Shade | IMB211 | 3 |
| Leaf | GC | Sun | IMB211 | 3 |
| Leaf | GC | Shade | R500 | 3 |
| Leaf | GC | Sun | R500 | 3 |
| Floral Meristem | GC | Shade | R500 | 2 |
| Internode | GC | Shade | IMB211 | 2 |
| Internode | GC | Sun | IMB211 | 3 |
| Internode | GC | Shade | R500 | 3 |
| Internode | GC | Sun | R500 | 3 |
| Seedling | GC | Shade | IMB211 | 3 |
| Seedling | GC | Sun | IMB211 | 3 |
| Seedling | GC | Shade | R500 | 3 |
| Seedling | GC | Sun | R500 | 3 |
| Silique | GC | Shade | IMB211 | 3 |
| Silique | GC | Sun | IMB211 | 3 |
| Silique | GC | Shade | R500 | 3 |
| Silique | GC | Sun | R500 | 3 |
| Root | GC | Sun | IMB211 | 3 |
| Root | GC | Shade | IMB211 | 3 |
| Root | GC | Sun | R500 | 3 |
| Root | GC | Shade | R500 | 3 |
| GH pool | | | | |
| Leaf | F | | IMB211 | 3 |
| Leaf | F | | R500 | 3 |
| Leaf | GH | DP | IMB211 | 3 |
| Leaf | GH | NDP | IMB211 | 3 |
| Leaf | GH | DP | R500 | 3 |
| Leaf | GH | NDP | R500 | 3 |
| Internode | GH | DP | IMB211 | 3 |
| Internode | GH | NDP | IMB211 | 3 |
| Internode | GH | DP | R500 | 3 |
| Internode | GH | NDP | R500 | 3 |
| Petiole | GH | DP | IMB211 | 3 |
| Petiole | GH | NDP | IMB211 | 3 |
| Petiole | GH | DP | R500 | 3 |
| Petiole | GH | NDP | R500 | 3 |
| Silique | F | | IMB211 | 3 |
| Silique | F | | R500 | 3 |
| Silique | GH | DP | IMB211 | 3 |
| Silique | GH | NDP | IMB211 | 3 |
| Silique | GH | DP | R500 | 3 |
| Silique | GH | NDP | R500 | 3 |
GC, growth chamber; GH, greenhouse; F, field; DP, dense planting; NDP, nondense planting.
Summary of RNA-Seq data obtained from B. rapa deep transcriptome using Illumina GAIIx sequencing
| Pool Name/Run | Pool Number | No. of Tissues | Total No. of Reads | Fastq File Size (in GB) | Total No. of Reads After Quality Control | Average Read Length (bp) |
|---|---|---|---|---|---|---|
| s_6_1 | 1 | 66 | 218,642,024 | 58 | 162,403,886 | 100 |
| s_7_1 | 2 | 60 | 193,176,106 | 50 | 148,346,164 | 100 |
| GH_s_1 | 1 | 60 | 184,259,308 | 48 | 162,260,599 | 89 |
| GH_s_2 | 1 | 60 | 181,519,358 | 48 | 160,143,988 | 89 |
| GH_s_3 | 1 | 60 | 180,575,544 | 48 | 159,927,822 | 100 |
| GC_s_1 | 2 | 66 | 218,352,856 | 56 | 183,452,891 | 100 |
| GC_s_2 | 2 | 66 | 213,474,480 | 56 | 172,832,712 | 100 |
| GC_s_3 | 2 | 66 | 96,917,806 | 26 | 75,331,287 | 100 |
| GC_s_4 | 2 | 66 | 212,749,370 | 56 | 180,334,866 | 100 |
| GC_s_5 | 2 | 66 | 223,341,414 | 60 | 144,237,478 | 100 |
| GC_s_6 | 2 | 66 | 201,966,154 | 54 | 135,778,620 | 100 |
| GC_s_7 | 2 | 66 | 201,305,348 | 54 | 142,418,270 | 100 |
| GH_s_4 | 1 | 60 | 166,999,240 | 44 | 117,413,176 | 100 |
| GH_s_5 | 1 | 60 | 213,801,168 | 58 | 136,508,912 | 100 |
| GH_s_6 | 1 | 60 | 216,036,902 | 58 | 142,306,342 | 100 |
| GH_s_7 | 1 | 60 | 203,973,230 | 54 | 136,831,614 | 100 |
| GCGH_s_1 | R | 8 | 226,938,148 | 60 | 189,740,545 | 100 |
Pool 1 includes all 66 different tissues collected from growth chamber. Pool 2 includes all 60 different tissues collected from greenhouse and field. Pool R includes all eight tissues that failed in pool 1 and pool 2.
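The effect of quality control on each run can be read off the table as the share of reads retained. The following is a minimal Python sketch of that arithmetic for three of the runs; the read counts are copied from the table, while the names `RUNS` and `retention` are illustrative assumptions and not part of the authors' pipeline.

```python
# Raw and post-QC read counts copied from the RNA-Seq summary table.
# RUNS and retention() are hypothetical names used only for this sketch.
RUNS = {
    "s_6_1": (218_642_024, 162_403_886),
    "GC_s_3": (96_917_806, 75_331_287),
    "GCGH_s_1": (226_938_148, 189_740_545),
}


def retention(raw: int, kept: int) -> float:
    """Return the fraction of reads that survive quality control."""
    if raw <= 0:
        raise ValueError("raw read count must be positive")
    return kept / raw


for name, (raw, kept) in RUNS.items():
    print(f"{name}: {retention(raw, kept):.1%} of reads retained")
```

Extending `RUNS` with the remaining rows gives the same per-run view for the whole experiment.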
Summary of total number of SNPs detected and annotated to different regions of the genome between Chiifu and two genotypes of B. rapa
| Comparison | Total No. of Annotated SNPs | SNP Rate | Total No. of Exonic SNPs | Total No. of Intronic SNPs | Total No. of Intergenic SNPs | Total No. of Nonsynonymous Coding SNPs |
|---|---|---|---|---|---|---|
| R500 vs. IMB211 | 330,995 | 0.50 | 202,295 | 48,210 | 80,833 | 66,327 |
| R500 vs. Chiifu | 639,788 | 0.83 | 358,391 | 124,429 | 157,726 | 119,222 |
| IMB211 vs. Chiifu | 595,619 | 0.81 | 338,749 | 110,437 | 147,212 | 111,719 |
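The SNP rates in this table can be related to the abstract's "one SNP in every 200 bases". The sketch below assumes the table uses the same unit as the Figure 1 caption (SNPs per 100 bp of gene) and simply inverts it; the function name `bases_per_snp` is a hypothetical helper, not something from the paper.

```python
def bases_per_snp(rate_per_100bp: float) -> float:
    """Convert a SNP rate (SNPs per 100 bp, assumed unit) to bases per SNP."""
    if rate_per_100bp <= 0:
        raise ValueError("SNP rate must be positive")
    return 100.0 / rate_per_100bp


# SNP rates copied from the table, keyed by each row's total SNP count.
for total_snps, rate in [("330,995", 0.50), ("639,788", 0.83), ("595,619", 0.81)]:
    print(f"{total_snps} SNPs: about one SNP every {bases_per_snp(rate):.0f} bases")
```

Under that assumption, the 0.50 row corresponds to the one-SNP-per-200-bases figure quoted in the abstract.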
Figure 1. SNP annotation of B. rapa using snpEff. (A) SNP rate (total number of SNPs/100 bp of gene) across 10 chromosomes. Blue line indicates rolling mean across 25 genes. (B) The distribution of SNPs at different codon positions. (C) Ka/Ks box plot of all chromosomes. Asterisks indicate significance at P ≤ 0.05 (permutation testing).
Figure 2. Pipeline illustrating the overall transcriptome assembly and annotation of Brassica rapa genotype R500.
Summary statistics for individual and merged Velvet-Oases assemblies
| k-mers | No. of Transcripts | Total Bases (bp) | Average Transcript Length (bp) | Maximum Transcript Length (bp) | Minimum Transcript Length (bp) | N50 | N90 | No. of Transcripts in N50 |
|---|---|---|---|---|---|---|---|---|
| 31 | 227,834 | 233,436,468 | 1024 | 23,632 | 100 | 1959 | 489 | 36,982 |
| 35 | 210,673 | 231,120,029 | 1097 | 25,926 | 100 | 1987 | 551 | 36,312 |
| 41 | 182,288 | 222,326,942 | 1219 | 22,553 | 100 | 2022 | 640 | 34,524 |
| 45 | 169,361 | 213,517,894 | 1260 | 22,480 | 100 | 1989 | 665 | 33,885 |
| 51 | 151,970 | 196,156,152 | 1290 | 23,619 | 100 | 1898 | 672 | 32,425 |
| 55 | 140,093 | 191,427,390 | 1366 | 23,673 | 100 | 1976 | 751 | 30,947 |
| 61 | 123,880 | 167,489,435 | 1352 | 16,491 | 100 | 1908 | 734 | 28,093 |
| Merged_27 | 577,900 | 895,807,087 | 1550 | 26,098 | 100 | 2214 | 833 | 128,986 |
| Merged_55 | 601,915 | 935,108,204 | 1553 | 26,060 | 100 | 2218 | 855 | 134,411 |
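For readers unfamiliar with the N50 and N90 columns, here is a minimal Python sketch of the usual definition: sort transcripts from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all assembled bases. The exact convention used by the authors' tools is not stated in this record, so treat this as an assumption; the function name `nx` and the toy lengths are illustrative only. The second return value is assumed to correspond to the "No. of Transcripts in N50" column.

```python
from typing import Iterable, Tuple


def nx(lengths: Iterable[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, count) for a set of transcript lengths.

    Nx is the length of the shortest transcript in the smallest set of
    longest transcripts whose combined length reaches `fraction` of the
    total bases; count is the number of transcripts in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no transcript lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return ordered[-1], len(ordered)  # fallback; the loop normally returns first


# Toy data, not taken from the paper: six transcripts totalling 2500 bp.
toy = [100, 200, 300, 400, 500, 1000]
print("N50:", nx(toy, 0.5))
print("N90:", nx(toy, 0.9))
```

Because N90 requires covering more of the assembly, it is never larger than N50, which matches the pattern in every row of the table.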
Downstream processing of Velvet-Oases and Trinity transcripts after initial assembly
| | Velvet-Oases | Trinity |
|---|---|---|
| a) Initial assembly transcripts | 43,816 | 39,084 |
| b) No. of novel transcripts remained after removing blast hits to | 14,540 | 5464 |
| c) No. of novel transcripts from (b) that have | 11,182 | 3789 |
| d) Number of novel transcripts from (c) remaining after chimera removal | 9448 | 2377 |
| e) Number of novel transcripts from (b) that do not have a | 3358 | 1675 |
| f) Number of novel transcripts from (e) remaining after blasting to NCBI nr database | 1218 | |
| Final number of novel transcripts (d) and (f) combined | 10,706 | 4052 |
The number of transcripts assembled with Cufflinks and the percentage they represent in the assembly after Cuffcompare analysis
| Class Code | No. of Transcripts | % |
|---|---|---|
| = | 38,126 | 49.75 |
| c | 14 | 0.018 |
| e | 3149 | 4.11 |
| i | 527 | 0.69 |
| j | 23,008 | 30.02 |
| o | 2596 | 3.39 |
| p | 1515 | 1.98 |
| s | 229 | 0.30 |
| u | 6708 | 8.75 |
| x | 768 | 1.00 |
| Total | 76,640 | 100.00 |
Class codes described by Cuffcompare: =, exactly equal to the reference annotation; c, contained in the reference annotation; e, possible pre-mRNA molecules; i, an exon falling into an intron of the reference; j, new isoforms; o, unknown generic overlap with reference; p, possible polymerase run-on fragment; u, unknown intergenic transcript.
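The percentage column can be reproduced from the counts alone. The sketch below is a small Python check, with the counts copied from the table and class codes written in lowercase as in the footnote; `CLASS_CODE_COUNTS` and `percentages` are hypothetical names introduced for illustration.

```python
# Transcript counts per Cuffcompare class code, copied from the table.
CLASS_CODE_COUNTS = {
    "=": 38_126, "c": 14, "e": 3_149, "i": 527, "j": 23_008,
    "o": 2_596, "p": 1_515, "s": 229, "u": 6_708, "x": 768,
}


def percentages(counts: dict) -> dict:
    """Return each class code's share of all transcripts, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no transcripts to summarize")
    return {code: 100.0 * n / total for code, n in counts.items()}


shares = percentages(CLASS_CODE_COUNTS)
print("total transcripts:", sum(CLASS_CODE_COUNTS.values()))
for code, pct in shares.items():
    print(f"{code}: {pct:.2f}%")
```

The counts sum to the table's total of 76,640, and the two largest classes ("=" and "j") together account for roughly 80% of assembled transcripts.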
Comparison of assembly statistics from de novo (Velvet-Oases and Trinity) and reference (TopHat-Cufflinks) assemblers
| | Velvet-Oases | Trinity | TopHat-Cufflinks |
|---|---|---|---|
| Total no. of reads | 182,386,000 | 182,386,000 | 157,164,008 |
| No. of initial transcripts | 601,915 | 158,863 | 75,237 |
| No. of transcripts after removing isoforms | 43,816 | 39,084 | 53,632 |
| Average size of transcript | 1554 | 1112 | 1310 |
| Maximum transcript length | 26,060 | 22,887 | 16,681 |
| Minimum transcript length | 100 | 201 | 94 |
| N50 | 2218 | 1863 | 1677 |
| No. of transcripts in N50 | 134,411 | 28,901 | 18,762 |
| % of transcripts annotated | 67 | 84 | 87 |
| No. of novel transcripts detected | 14,540 | 5464 | 6700 |
Figure 3. Venn diagrams showing unique and shared novel transcripts detected between (A) the Velvet-Oases, Trinity, and TopHat-Cufflinks assemblers and (B) the de novo (Velvet-Oases and Trinity) and reference-based (TopHat-Cufflinks) pipelines.
Comparison of de novo–based and reference-based assembly methods on the final output (novel transcripts detected)
| | De Novo-Based | Reference-Based |
|---|---|---|
| No. of novel transcripts detected | 14,758 | 6700 |
| Maximum size of the transcript (bp) | 7827 | 6897 |
| N50 (bp) | 640 | 1084 |
Comparison of original and updated B. rapa annotations using PASA
| | First Update | Second Update | Third Update | Total |
|---|---|---|---|---|
| No. of gene models updated | 23,132 | 597 | 25 | 23,754 |
| No. of alternative splice isoforms | 15,733 | 205 | 833 | 16,771 |
| No. of annotated proteins changed | 3505 | 132 | 18 | 3655 |
RT-PCR validations of assembled transcripts from different assembly methods
| Assembly Type | Total No. of Genes Tested | No. of Genes Validated | Percentage of Validation |
|---|---|---|---|
| Velvet-Oases | 20 | 13 | 65 |
| Trinity | 20 | 13 | 65 |
| TopHat-Cufflinks “u” transcripts | 20 | 14 | 70 |
| TopHat-Cufflinks “o” transcripts | 10 | 6 | 60 |
Summary of in silico validation of novel transcripts
| R500 and Chiifu Transcript Structure | ||||
|---|---|---|---|---|
| R500 = Chiifu | R500 Different from Chiifu | Uncertain | ||
| Matches both | 38 | |||
| Matches R500 | 3 | |||
| Matches Chiifu | 1 | |||
| Matches neither | 11 | |||
| Uncertain | 1 | 2 | 4 | |
Transcripts were classified as uncertain when alternative splicing or low coverage precluded definitive assignment.
Summary of in silico validation of PASA updated transcript annotations
| R500 and Chiifu Transcript Structure | |||
|---|---|---|---|
| R500 = Chiifu | R500 Different from Chiifu | ||
| Updated annotation correct for both | 50 | ||
| Annotation matches genotype | 2 | ||
| Both annotations wrong | 2 | 1 | |
| Uncertain | 4 | 1 | |
Transcripts were classified as uncertain when alternative splicing or low coverage precluded definitive assignment.
Figure 4. Histogram of level 2 GO term assignment of B. rapa re-annotated gene models. Results are summarized for three main GO categories: biological process (P), molecular function (F), and cellular component (C).