Simon Schliesky, Udo Gowik, Andreas P M Weber, Andrea Bräutigam.
Abstract
Transcriptomic sequence resources represent invaluable assets for research, in particular for non-model species without a sequenced genome. To date, the next-generation sequencing technologies 454/Roche and Illumina have been used to generate transcriptome sequence databases by mRNA-Seq for more than fifty different plant species. While some of the databases were successfully used for downstream applications such as proteomics, the assembly parameters indicate that the assemblies do not yet accurately reflect the actual plant transcriptomes. Two different assembly strategies have been used: overlap-consensus-based assemblers for long reads and Eulerian path/de Bruijn graph assemblers for short reads. In this review, we discuss the challenges and solutions to the transcriptome assembly problem. A list of quality control parameters and the necessary scripts to produce them are provided.
Keywords: NGS; RNA-seq; assembly; next generation sequencing; plant; transcriptome
Year: 2012 PMID: 23056003 PMCID: PMC3457010 DOI: 10.3389/fpls.2012.00220
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Plant transcriptome sequencing projects to date (complete table available as Table S1 in Supplementary Material).
| Reference | Plant | Type of reads |
|---|---|---|
| Weber et al. ( | | 454 |
| Novaes et al. ( | | 454 |
| Barakat et al. ( | | 454 |
| Alagna et al. ( | | 454 |
| Dassanayake et al. ( | | 454 |
| Wang et al. ( | | 454 |
| Swarbreck et al. ( | | 454 |
| Guo et al. ( | | 454 |
| Riggins et al. ( | | 454 |
| King et al. ( | | 454 |
| Hiremath et al. ( | | 454 |
| Troncoso-Ponce et al. ( | | 454 |
| Bräutigam et al. ( | | 454 |
| Cantu et al. ( | | 454 |
| Dai et al. ( | | 454 |
| Sun et al. ( | | 454 |
| Der et al. ( | | 454 |
| Franssen et al. ( | | 454 |
| Ibarra-Laclette et al. ( | | 454 |
| Su et al. ( | | 454 |
| Pont et al. ( | | 454 |
| Bleeker et al. ( | | 454 |
| Blavet et al. ( | Eight | 454 |
| Villar et al. ( | | 454 |
| Kaur et al. ( | | 454 |
| Kalavacharla et al. ( | | 454 |
| Lu et al. ( | | 454 |
| Meyer et al. ( | | 454 |
| Edwards et al. ( | | 454 |
| Desgagne-Penix et al. ( | | 454 |
| Angeloni et al. ( | | 454 and Illumina |
| Garg et al. ( | | 454 and Illumina |
| Krishnan et al. ( | | Illumina |
| Mutasa-Göttgens et al. ( | | Illumina |
| Gruenheit et al. ( | | Illumina and Illumina paired end |
| Mizrachi et al. ( | | Illumina paired |
| Barrero et al. ( | | Illumina paired |
| Xia et al. ( | | Illumina paired |
| Chibalina and Filatov ( | | Illumina paired |
| Hao et al. ( | | Illumina paired |
| Tang et al. ( | | Illumina paired |
| Wong et al. ( | | Illumina paired |
| Shi et al. ( | | Illumina paired |
| Hyun et al. ( | | Illumina paired |
| Hao et al. ( | | Illumina paired |
| Huang et al. ( | | Illumina paired |
| Gahlan et al. ( | | Illumina paired |
| Zhang et al. ( | | Illumina paired |
| McKain et al. ( | Different Agavoideae | Illumina paired |
Figure 1. Schematic de Bruijn graph of a single transcript; 1 alternative transcription start site.
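To make the graph structure in Figure 1 concrete, the following minimal sketch builds a de Bruijn graph from two toy sequences and reports the nodes at which the path branches. It is an illustration only and not the method of any particular assembler: the sequences are invented, the function names `de_bruijn_graph` and `branch_nodes` are hypothetical, and the k-mer size of 5 is chosen purely for readability.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph in which nodes are (k-1)-mers and
    every k-mer found in the input contributes one directed edge."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Edge from the k-mer's prefix to its suffix
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Return nodes with more than one successor, i.e. the points
    where transcript variants diverge in the graph."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Two invented variants of one transcript that differ in a single base
variants = ["ATGGCGTACCTGAAG", "ATGGCGTTCCTGAAG"]
graph = de_bruijn_graph(variants, k=5)
print(branch_nodes(graph))  # → ['GCGT']
```

In this toy case the two variants share one path up to node `GCGT`, split into two parallel paths, and rejoin at node `CCTG`, which is the kind of bubble an assembler has to resolve.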
Quality assessment parameters drawn from transcripts of publicly available genome databases.
| Species | Genome size (Mbases) | Number of transcripts including isoforms | N50 | GC% |
|---|---|---|---|---|
| | 120 | 41671 | 1912 | 42.27 |
| | 485 | 41019 | 1482 | 46.28 |
| | 481 | 45033 | 1845 | 42.29 |
| | 950 | 35802 | 1461 | 41.61 |
| | 420 | 66338 | 2295 | 51.30 |
| | 515 | 40599 | 1811 | 52.75 |
| | 2066 | 136770 | 1612 | 51.14 |
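The N50 and GC% columns can be computed from any set of transcript or contig sequences. The sketch below is one plausible way to do so and is not the script set provided with the review; the function names are hypothetical, N50 is taken in its common sense (the length L such that sequences of length L or longer contain at least half of all bases), and GC% is assumed to be calculated over all bases, including any ambiguous ones.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_percent(sequences):
    """Return the percentage of G and C bases over all sequences.
    Assumption: ambiguous bases (e.g. N) count toward the total."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C")
             for seq in sequences)
    return 100.0 * gc / total


# Toy contigs (invented data)
contigs = ["GGCC", "AATT", "GCAT"]
print(n50([len(c) for c in contigs]))  # → 4
print(gc_percent(contigs))             # → 50.0
```

Comparing these two values for a de novo assembly against the reference-derived values in the table gives a quick first indication of whether contigs are markedly shorter than full-length transcripts.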
Figure 2. Workflow scheme for a transcriptome assembly and quality assessment: (I) preprocessing of the raw reads, (II) assembly of the processed reads, (III) mappings for annotation and for subsequent quality assessment, (IV) collection of quality information from the assembly and mappings, (V) final polishing to create an easy-to-use, and thus easy-to-share, file from the assembly.
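The caption does not state what the preprocessing in stage (I) consists of. As one plausible illustration only, the sketch below trims low-quality bases from the 3' end of a read and discards reads that become too short. It assumes FASTQ-style quality strings in Phred+33 encoding; the function names and the thresholds (`min_phred=20`, `min_length=30`) are hypothetical and would need to be adapted to the sequencing platform.

```python
def trim_3prime(seq, qual, min_phred=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below
    min_phred. Assumes Phred+33 encoded quality characters."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(reads, min_phred=20, min_length=30):
    """Trim each (sequence, quality) pair and keep only reads that
    are still at least min_length bases long after trimming."""
    kept = []
    for seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_phred)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_qual))
    return kept


# Toy read: 'I' encodes Phred 40, '#' encodes Phred 2
print(trim_3prime("ACGTAC", "IIII##"))  # → ('ACGT', 'IIII')
```

Real pipelines typically also remove adapter sequences and may filter on average quality; those steps are omitted here to keep the sketch short.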