Omar Darwish, Rachel Shahan, Zhongchi Liu, Janet P Slovin, Nadim W Alkharouf.
Abstract
BACKGROUND: Fragaria vesca, commonly called woodland strawberry, is a low-growing, small-fruited diploid strawberry species. It is native to temperate regions of Eurasia and North America, and although it produces edible fruits, it is most useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223,2011]. The first-generation annotation (version 1.1) was developed using GeneMark-ES+ [Nuc Acids Res 33:6494,2005], a self-training gene prediction tool that relies primarily on combining ab initio predictions with the mapping of high-confidence ESTs, in addition to mapping gene deserts from transposable elements. Based on transcriptomes from over 25 different tissues, we have revised the F. vesca genome annotation, providing several improvements over version 1.1.
Year: 2015 PMID: 25623424 PMCID: PMC4318131 DOI: 10.1186/s12864-015-1221-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Diagram illustrating two alternative assembly pipelines. (A) De novo assembly and alignment pipeline using Trinity and PASA. (B) Reference-guided assembly pipeline using TopHat and Cufflinks.
Statistical summary of the de novo and reference based assembly results
| Sample* |  |  |  |  |
|---|---|---|---|---|
| Cortex1 | 28,688,674 | 54,198 | 33,909 | 29,533 |
| Cortex2 | 34,602,978 | 51,586 | 29,001 | 25,339 |
| Cortex3 | 29,332,631 | 49,470 | 26,746 | 23,988 |
| Cortex4 | 30,473,993 | 49,434 | 26,909 | 23,799 |
| Cortex5 | 27,773,042 | 49,387 | 26,249 | 23,456 |
| Embryo3 | 13,580,328 | 46,827 | 21,362 | 19,547 |
| Embryo4 | 22,817,240 | 52,203 | 28,728 | 25,371 |
| Embryo5 | 20,596,516 | 50,390 | 26,655 | 24,038 |
| Ghost3 | 20,131,210 | 53,322 | 31,702 | 28,050 |
| Ghost4 | 24,729,472 | 54,355 | 33,985 | 29,612 |
| Ghost5 | 21,893,808 | 52,645 | 30,382 | 27,037 |
| Ovule1 | 28,983,033 | 53,794 | 34,796 | 30,060 |
| Seed2 | 31,044,314 | 54,353 | 31,669 | 28,044 |
| Pith1 | 26,828,710 | 53,968 | 33,015 | 28,971 |
| Pith2 | 29,684,268 | 51,985 | 28,916 | 25,497 |
| Pith3 | 32,039,484 | 50,854 | 28,047 | 24,757 |
| Pith4 | 31,461,556 | 50,342 | 26,944 | 23,872 |
| Pith5 | 35,919,060 | 51,016 | 27,727 | 24,411 |
| Wall1 | 25,996,860 | 53,102 | 31,968 | 28,214 |
| Wall2 | 29,527,238 | 53,552 | 32,761 | 28,780 |
| Wall3 | 19,130,041 | 51,998 | 30,787 | 27,384 |
| Wall4 | 27,737,593 | 52,606 | 31,071 | 27,423 |
| Wall5 | 34,216,551 | 52,882 | 32,980 | 28,694 |
| Leaf1 | 30,740,916 | 54,993 | 36,502 | 31,610 |
| Seedling1 | 27,518,958 | 53,477 | 31,589 | 28,065 |
| Total | 685,448,474 | 1,302,739 | 754,400 | 665,552 |
*Sample name indicates tissue type and the number indicates stage (see Kang et al. [2]). Each sample reflects averaged data from two biological replicates. Sample descriptions are available at http://bioinformatics.towson.edu/strawberry/newpage/Tissue_Description.aspx.
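The column headers of this assembly table were lost in this copy, but the Total row can still be checked: it should equal the sum of each numeric column over the 25 samples. A minimal Python sketch that does this, naming the numeric columns only by position (n1 to n4, an assumption since the original labels are missing):

```python
# Per-sample values copied from the assembly summary table.
# Column headers were lost, so the four numeric columns are n1..n4 by position.
ROWS = {
    "Cortex1": (28_688_674, 54_198, 33_909, 29_533),
    "Cortex2": (34_602_978, 51_586, 29_001, 25_339),
    "Cortex3": (29_332_631, 49_470, 26_746, 23_988),
    "Cortex4": (30_473_993, 49_434, 26_909, 23_799),
    "Cortex5": (27_773_042, 49_387, 26_249, 23_456),
    "Embryo3": (13_580_328, 46_827, 21_362, 19_547),
    "Embryo4": (22_817_240, 52_203, 28_728, 25_371),
    "Embryo5": (20_596_516, 50_390, 26_655, 24_038),
    "Ghost3": (20_131_210, 53_322, 31_702, 28_050),
    "Ghost4": (24_729_472, 54_355, 33_985, 29_612),
    "Ghost5": (21_893_808, 52_645, 30_382, 27_037),
    "Ovule1": (28_983_033, 53_794, 34_796, 30_060),
    "Seed2": (31_044_314, 54_353, 31_669, 28_044),
    "Pith1": (26_828_710, 53_968, 33_015, 28_971),
    "Pith2": (29_684_268, 51_985, 28_916, 25_497),
    "Pith3": (32_039_484, 50_854, 28_047, 24_757),
    "Pith4": (31_461_556, 50_342, 26_944, 23_872),
    "Pith5": (35_919_060, 51_016, 27_727, 24_411),
    "Wall1": (25_996_860, 53_102, 31_968, 28_214),
    "Wall2": (29_527_238, 53_552, 32_761, 28_780),
    "Wall3": (19_130_041, 51_998, 30_787, 27_384),
    "Wall4": (27_737_593, 52_606, 31_071, 27_423),
    "Wall5": (34_216_551, 52_882, 32_980, 28_694),
    "Leaf1": (30_740_916, 54_993, 36_502, 31_610),
    "Seedling1": (27_518_958, 53_477, 31_589, 28_065),
}

# The "Total" row as printed in the table.
REPORTED_TOTAL = (685_448_474, 1_302_739, 754_400, 665_552)


def column_sums(rows):
    """Sum each numeric column (n1..n4) across all samples."""
    return tuple(sum(values) for values in zip(*rows.values()))


print(column_sums(ROWS) == REPORTED_TOTAL)  # True
```

All four sums match the printed Total row, so the Total is a plain column sum over the samples as listed.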
Figure 2. Summary of the overall bioinformatics pipeline for genome re-annotation. Evidence data, including de novo assembled transcripts from all 50 samples, reference-guided assembled transcripts from all 50 samples, gene models generated by ab initio gene prediction tools (Augustus, SNAP and GeneMark), the first-generation F. vesca gene predictions and plant reference proteins, were passed to the MAKER pipeline to generate the TowU_Fve annotation.
Statistical comparisons between first generation annotation and TowU_Fve annotation
|  | Annotation |  |  |  |  |
|---|---|---|---|---|---|
|  | Version 1.1 Annotation | 1,618 | 7,637 | 1,779,011 | 232.95 |
|  | TowU_Fve Annotation | 1,729 | 8,172 | 1,887,397 | 230.96 |
|  | Version 1.1 Annotation | 3,440 | 17,714 | 4,064,386 | 229.44 |
|  | TowU_Fve Annotation | 3,367 | 18,654 | 4,232,525 | 226.9 |
|  | Version 1.1 Annotation | 4,051 | 21,502 | 4,850,631 | 225.59 |
|  | TowU_Fve Annotation | 4,140 | 22,637 | 5,076,584 | 224.26 |
|  | Version 1.1 Annotation | 4,920 | 24,334 | 5,804,654 | 238.54 |
|  | TowU_Fve Annotation | 5,054 | 25,892 | 6,259,234 | 241.74 |
|  | Version 1.1 Annotation | 3,837 | 19,431 | 4,369,641 | 224.88 |
|  | TowU_Fve Annotation | 3,956 | 20,445 | 4,568,144 | 223.44 |
|  | Version 1.1 Annotation | 4,655 | 23,920 | 5,587,403 | 233.59 |
|  | TowU_Fve Annotation | 4,882 | 25,609 | 5,944,185 | 232.11 |
|  | Version 1.1 Annotation | 6,453 | 32,868 | 7,651,854 | 232.81 |
|  | TowU_Fve Annotation | 6,547 | 34,777 | 8,067,688 | 231.98 |
|  | Version 1.1 Annotation | 3,857 | 19,864 | 4,642,955 | 233.74 |
|  | TowU_Fve Annotation | 3,821 | 20,223 | 4,780,057 | 236.37 |
|  | Version 1.1 Annotation |  |  |  |  |
|  | TowU_Fve Annotation |  |  |  |  |
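The last column of the comparison table appears to be the third numeric column divided by the second (for example, 1,779,011 / 7,637 ≈ 232.95), i.e. an average length per counted feature; the exact headers are not recoverable in this copy. The Python sketch below checks that relationship for every surviving row and lists the groups whose first numeric column is lower in TowU_Fve than in version 1.1. Group numbers 1 to 8 and the column names n1 to n4 are hypothetical positional labels, since the original labels were lost.

```python
# (group, annotation, n1, n2, n3, n4) in source order.
# Group numbers are positional stand-ins for the lost row labels.
ROWS = [
    (1, "Version 1.1", 1_618, 7_637, 1_779_011, 232.95),
    (1, "TowU_Fve", 1_729, 8_172, 1_887_397, 230.96),
    (2, "Version 1.1", 3_440, 17_714, 4_064_386, 229.44),
    (2, "TowU_Fve", 3_367, 18_654, 4_232_525, 226.9),
    (3, "Version 1.1", 4_051, 21_502, 4_850_631, 225.59),
    (3, "TowU_Fve", 4_140, 22_637, 5_076_584, 224.26),
    (4, "Version 1.1", 4_920, 24_334, 5_804_654, 238.54),
    (4, "TowU_Fve", 5_054, 25_892, 6_259_234, 241.74),
    (5, "Version 1.1", 3_837, 19_431, 4_369_641, 224.88),
    (5, "TowU_Fve", 3_956, 20_445, 4_568_144, 223.44),
    (6, "Version 1.1", 4_655, 23_920, 5_587_403, 233.59),
    (6, "TowU_Fve", 4_882, 25_609, 5_944_185, 232.11),
    (7, "Version 1.1", 6_453, 32_868, 7_651_854, 232.81),
    (7, "TowU_Fve", 6_547, 34_777, 8_067_688, 231.98),
    (8, "Version 1.1", 3_857, 19_864, 4_642_955, 233.74),
    (8, "TowU_Fve", 3_821, 20_223, 4_780_057, 236.37),
]


def ratio_mismatches(rows, tol=0.01):
    """Rows where n4 differs from n3 / n2 by more than `tol` (n4 is printed to 2 decimals)."""
    return [r for r in rows if abs(r[4] / r[3] - r[5]) > tol]


def groups_where_n1_decreased(rows):
    """Groups whose TowU_Fve n1 value is lower than the version 1.1 value."""
    old = {g: n1 for g, ann, n1, *_ in rows if ann == "Version 1.1"}
    new = {g: n1 for g, ann, n1, *_ in rows if ann == "TowU_Fve"}
    return sorted(g for g in old if new[g] < old[g])


print(ratio_mismatches(ROWS))           # []
print(groups_where_n1_decreased(ROWS))  # [2, 8]
```

Every surviving row satisfies n4 ≈ n3 / n2, and only two of the eight groups show a lower n1 count after re-annotation, while n2 rises in all eight.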
Statistical summary of the newly predicted gene models by TowU_Fve annotation
|  |  |  |  |  |
|---|---|---|---|---|
|  | 111 | 268 | 49,025 | 182.93 |
|  | 78 | 247 | 39,746 | 160.92 |
|  | 262 | 670 | 113,528 | 169.45 |
|  | 345 | 794 | 144,502 | 181.99 |
|  | 247 | 596 | 105,811 | 177.54 |
|  | 301 | 703 | 123,173 | 175.21 |
|  | 412 | 930 | 158,501 | 170.43 |
|  | 530 | 1,798 | 442,946 | 246.36 |
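To summarize the newly predicted gene models across the eight surviving rows, the per-row averages in the last column cannot simply be averaged, because each row contributes a different number of features. A hedged Python sketch that instead pools the rows by summing the numeric columns and dividing the sums (n3 / n2); the columns are named by position (n1 to n4) because the headers are missing, and these pooled figures are computed here rather than taken from the paper, whose own totals did not survive in this copy.

```python
# (n1, n2, n3, n4) per surviving row; row labels and headers were lost.
ROWS = [
    (111, 268, 49_025, 182.93),
    (78, 247, 39_746, 160.92),
    (262, 670, 113_528, 169.45),
    (345, 794, 144_502, 181.99),
    (247, 596, 105_811, 177.54),
    (301, 703, 123_173, 175.21),
    (412, 930, 158_501, 170.43),
    (530, 1_798, 442_946, 246.36),
]


def pooled_summary(rows):
    """Column sums of n1..n3 plus the pooled ratio sum(n3) / sum(n2)."""
    n1 = sum(r[0] for r in rows)
    n2 = sum(r[1] for r in rows)
    n3 = sum(r[2] for r in rows)
    return n1, n2, n3, n3 / n2


n1, n2, n3, pooled = pooled_summary(ROWS)
print(n1, n2, n3, round(pooled, 2))  # 2286 6006 1177232 196.01

# An unweighted mean of the per-row n4 values treats every row equally,
# so the large last row is under-weighted (about 183.1 instead of 196.01).
naive = sum(r[3] for r in ROWS) / len(ROWS)
print(f"{naive:.1f}")  # 183.1
```

A few printed n4 values (e.g. 160.92, 169.45, 246.36) differ from n3 / n2 by about 0.005, which is consistent with rounding from more precise intermediate values.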
Figure 3. Comparisons of the version 1.1 annotation with the TowU_Fve annotation. (A) The version 1.1 annotation (peach) shows two exons connected by an intron; however, leaf RNA-Seq reads align to the intronic region. The TowU_Fve annotation (red) merges the two existing exons by including the intronic region. (B) The version 1.1 annotation (peach) is missing the last exon, which is revealed by alignments of cortex RNA-Seq reads. The TowU_Fve annotation (red) shows the newly predicted gene structure with the added distal exon. (C) The first-generation annotation (peach) shows no gene between 5613k and 5616k, whereas aligned reads from leaf tissue reveal an expressed gene at that site. The TowU_Fve annotation (red) shows a newly predicted gene (an alpha/beta-hydrolase domain-containing protein) between gene35181 and gene12565.
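Panel A describes a case where unspliced RNA-Seq reads cover an annotated intron and the two flanking exons are merged. The toy Python sketch below illustrates that idea with simple interval arithmetic. It is not MAKER's algorithm: the `merge_retained_introns` helper, the 1-based inclusive coordinate convention and the 90% coverage threshold are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# Hypothetical convention: 1-based, inclusive (start, end) coordinates on one strand.
Interval = Tuple[int, int]


def covered_fraction(region: Interval, reads: List[Interval]) -> float:
    """Fraction of bases in `region` overlapped by at least one unspliced read."""
    start, end = region
    length = end - start + 1
    if length <= 0:  # abutting exons: there is no intron to test
        return 1.0
    covered = set()
    for r_start, r_end in reads:
        # range() is empty when the read does not overlap the region
        covered.update(range(max(start, r_start), min(end, r_end) + 1))
    return len(covered) / length


def merge_retained_introns(exons: List[Interval], reads: List[Interval],
                           min_fraction: float = 0.9) -> List[Interval]:
    """Merge neighbouring exons when reads cover >= min_fraction of the intron between them."""
    if not exons:
        return []
    ordered = sorted(exons)
    merged = [ordered[0]]
    for nxt_start, nxt_end in ordered[1:]:
        prev_start, prev_end = merged[-1]
        intron = (prev_end + 1, nxt_start - 1)
        if covered_fraction(intron, reads) >= min_fraction:
            merged[-1] = (prev_start, nxt_end)  # intron retained: one longer exon
        else:
            merged.append((nxt_start, nxt_end))
    return merged


# Toy example in the spirit of panel A (coordinates are invented).
exons = [(100, 200), (301, 400)]
print(merge_retained_introns(exons, [(150, 260), (240, 350)]))  # [(100, 400)]
print(merge_retained_introns(exons, [(150, 220)]))              # [(100, 200), (301, 400)]
```

A real annotation pipeline would also weigh read depth, spliced reads spanning the junction and the other evidence tracks that feed MAKER; this sketch uses base coverage alone.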
Figure 4. cDNA sequences support the re-annotation of gene11268. (A) The F. vesca gene11268 model predicted by the first-generation annotation at GDR. Colored boxes denote exons and gray lines denote introns. (B) Re-annotation of F. vesca gene11268 revealed additional exons. RNA-Seq reads from stage 7_8 anther are shown as red rectangles; the gray peaks below them represent the abundance of additional reads beyond those shown. (C) The TowU_Fve predicted structure of F. vesca gene11268 after splicing is supported by the sequences of cDNA clones from YW5AF7 anther mRNA. The sequences of two such cDNA clones were identical and reproduced the TowU_Fve predicted gene structure, as shown.
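Panel C compares the spliced TowU_Fve model with cDNA clone sequences. At its simplest, that comparison means concatenating the exon sequences in genomic order and checking the result against the cDNA. A minimal Python sketch under stated assumptions (plus-strand genes only, 1-based inclusive exon coordinates, exact matching); the helper names and the toy sequences are hypothetical and are not F. vesca data.

```python
def splice(genome: str, exons) -> str:
    """Join exon sequences in genomic order (1-based inclusive coordinates, plus strand only)."""
    return "".join(genome[start - 1:end] for start, end in sorted(exons))


def matches_cdna(genome: str, exons, cdna: str) -> bool:
    """True when the spliced model is identical to the cDNA sequence (case-insensitive)."""
    return splice(genome, exons).upper() == cdna.upper()


# Toy sequences for illustration only.
genome = "AAAACCCCGGGGTTTT"
print(splice(genome, [(1, 4), (9, 12)]))                              # AAAAGGGG
print(matches_cdna(genome, [(1, 4), (9, 12)], "aaaagggg"))            # True
print(matches_cdna(genome, [(1, 4), (9, 12), (13, 16)], "aaaagggg"))  # False
```

Minus-strand genes would need the spliced sequence reverse-complemented, and real clone comparisons normally rely on alignment rather than exact string equality so that sequencing errors and polymorphisms are tolerated.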