Yuichiro Hara, Kaori Tatsumi, Michio Yoshida, Eriko Kajikawa, Hiroshi Kiyonari, Shigehiro Kuraku.
Abstract
BACKGROUND: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity.
Year: 2015 PMID: 26581708 PMCID: PMC4652379 DOI: 10.1186/s12864-015-2007-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Animals used in this study. a Embryo of the Madagascar ground gecko at four days before the estimated date of oviposition (−4 days post oviposition, −4 dpo). b 9 dpo embryo. c 30 dpo embryo. Scale bars, 2 mm. d Molecular phylogenetic relationship between the gecko and other amniotes. Asterisks indicate the sauropsid species for which whole genome sequences have been published. Divergence times were based on the TimeTree project [22]
Table 1 Properties of RNA-seq libraries
| Library | RNA source | Duration of RNA fragmentation (min) | × AMPure volume (targeted fraction to retain) | PCR cycles | Sequencing |
|---|---|---|---|---|---|
| A | −4 dpo whole embryo^a | 8 | × 1.6 (>100 bp) | 6 | HiSeq 1/4 lanes, 171 cycles paired-end |
| B | −4 dpo whole embryo^a | 2 | × 0.7 (>300 bp) | 6 | HiSeq 1/4 lanes, 171 cycles paired-end |
| C | −4 dpo whole embryo^a | 2 | × 0.7 (>300 bp) | 6 | MiSeq 1/4 lanes, 250 cycles paired-end |
| D | 9 dpo whole embryo | 8 | × 1.6 (>100 bp) | 2 | HiSeq 1/4 lanes, 171 cycles paired-end |
| E | 9 dpo whole embryo | 2 | × 0.7 (>300 bp) | 4 | HiSeq 1/4 lanes, 171 cycles paired-end |
| F | 30 dpo head | 4 | × 1.0 (>150 bp) | 6 | HiSeq 2/3 lanes, 151 cycles paired-end |
| G | 30 dpo liver | 4 | × 1.0 (>150 bp) | 6 | HiSeq 2/3 lanes, 151 cycles paired-end |
| H | 30 dpo tail | 4 | × 1.0 (>150 bp) | 6 | HiSeq 2/3 lanes, 151 cycles paired-end |

^a Embryo at 4 days before the estimated day of oviposition
Fig. 2 Size distribution of prepared and sequenced fragments. Fragment size distributions are shown for Library A and Library B (see Table 1). The red lines represent the size distributions reported by the Agilent 2100 Bioanalyzer. The light blue areas represent the inferred size distributions of the sequenced fragments. Insert sizes were extracted from the results of paired read mapping onto Assembly 1 (for Library A) and Assembly 2 (for Library B) (see Table 2 for details of these assemblies). The fragment size is the sum of the sizes of the insert and the TruSeq adapters
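The relation stated in the caption (fragment size = insert size + adapter size) can be illustrated with a minimal Python sketch. The combined adapter length used here (`ADAPTER_TOTAL_BP = 120`), the bin width, and the toy insert sizes are all assumptions for illustration; none of them is given in the source.

```python
from collections import Counter

# Hypothetical combined length of both adapters in bp (not stated in the source).
ADAPTER_TOTAL_BP = 120

def fragment_size_histogram(insert_sizes, adapter_total_bp=ADAPTER_TOTAL_BP, bin_width=50):
    """Convert insert sizes to fragment sizes and count them in fixed-width bins."""
    hist = Counter()
    for insert in insert_sizes:
        fragment = insert + adapter_total_bp          # fragment = insert + adapters
        hist[(fragment // bin_width) * bin_width] += 1  # key is the lower edge of the bin
    return dict(sorted(hist.items()))

# Toy insert sizes, as they might be inferred from paired read mapping.
inserts = [180, 210, 230, 305, 310]
hist = fragment_size_histogram(inserts)
print(hist)
```

A histogram of this kind is what would be overlaid on the Bioanalyzer trace to compare prepared and sequenced fragment sizes.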
Table 2 Transcriptome assembly statistics
| Assembly No. | Individual^a or integrated^b assembly | Assembly approach | Number of fragments (×10^6)^c | Raw assembly: number of contigs | Raw assembly: number of subcomponents | Filtered by mapping count (≥5): number of contigs | Filtered by mapping count (≥5): number of subcomponents | N50 length (bp) |
|---|---|---|---|---|---|---|---|---|
| 1 | A | Trinity | 22.719 | 222178 | 168924 | 106323 | 62636 | 3091 |
| 2 | B | Trinity | 21.224 | 228165 | 159338 | 94371 | 45267 | 3634 |
| 3 | C | Trinity | 3.569 | 104985 | 83417 | 37504 | 22331 | 3093 |
| 4 | D | Trinity | 23.712 | 417291 | 291424 | 204328 | 104294 | 3693 |
| 5 | E | Trinity | 16.037 | 383737 | 246347 | 149926 | 56669 | 4149 |
| 6 | F | Trinity | 75.929 | 798982 | 562528 | 358433 | 182611 | 3956 |
| 7 | G | Trinity | 82.453 | 787608 | 541906 | 375297 | 191055 | 3860 |
| 8 | H | Trinity | 81.033 | 525154 | 348570 | 250433 | 115476 | 4090 |
| 9 | Integrated | All-in-one by Trinity | 326.676 | 1214573 | 852257 | 653132 | 387456 | 2680 |
| 10 | Integrated | All-in-one by SOAPdenovo-trans, multiple k-mer lengths | 326.676 | 1087900 | 745363 | 748019 | 422329 | 4854 |
| 11 | Integrated | Assembly following Trinity’s normalization | 39.593^d | 1465425 | 721986 | 972512 | 330937 | 3755 |
| 12 | Integrated | Assembly after khmer | 33.251^d | 1464412 | 741241 | 945799 | 314023 | 2898 |
| 13 | Integrated | Assembly and clustering | 326.676 | 1562282 | 939252 | 996336 | 457323 | 3897 |

^a Corresponding library symbols (see Table 1) are included for individual assemblies
^b Integration of all the individual assemblies
^c Number of fragments for which both reads of the pair passed quality control
^d Note that this is the number of fragments after in silico normalization
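Two quantities in this table, the N50 length and the filter by mapping count (≥5), can be sketched in Python as follows. This is a minimal illustration using the standard definition of N50 and made-up contigs; the names `n50` and `filter_by_mapping_count` are hypothetical, and the source does not state how the authors computed these values or to which contig set the N50 column refers.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def filter_by_mapping_count(contigs, min_count=5):
    """Keep contigs whose mapped-fragment count is at least min_count.

    `contigs` maps a contig name to a (length, mapping_count) tuple.
    """
    return {name: (length, count)
            for name, (length, count) in contigs.items()
            if count >= min_count}

# Toy assembly: name -> (length in bp, number of mapped fragments).
toy = {"c1": (5000, 40), "c2": (3000, 12), "c3": (2000, 3),
       "c4": (1000, 7), "c5": (500, 1)}

kept = filter_by_mapping_count(toy)
raw_n50 = n50([length for length, _ in toy.values()])
kept_n50 = n50([length for length, _ in kept.values()])
print(sorted(kept), raw_n50, kept_n50)
```

Removing poorly supported contigs changes both the contig count and the N50, which is why the raw and filtered columns are reported separately.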
Fig. 3 Core Vertebrate Genes (CVG). a Flowchart showing the selection procedure of the CVG from the chordate ortholog groups of eggNOG v4.0 (ChorNOGs). The 26 core species were specified by eggNOG. Components of the CVG are shown in Additional file 5. b Taxonomic ranges of CEG (on a light blue background) and CVG (on a magenta background). The CEG set comprises the six species marked with asterisks, and the CVG set for CEGMA comprises the eight species in magenta. Tunicate orthologs were used as an outgroup in order to distinguish one-to-one orthologs conserved in vertebrates from those with additional paralogs duplicated in the vertebrate lineage. Those with no additional vertebrate paralog were included in the CVG. c Completeness scores of the transcriptome assemblies assessed by CEGMA with reference to the 248 CEGs and 233 CVGs. The scores indicate the proportions of genes recognized as ‘complete’ by CEGMA in individual assemblies, out of 248 CEGs and 233 CVGs. See Additional file 8 for the results of an equivalent assessment with BUSCO
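The completeness score in panel c is a simple proportion: the number of reference genes recognized as ‘complete’ divided by the size of the reference set (248 for CEG, 233 for CVG). A minimal Python sketch of that arithmetic follows; the gene identifiers are invented, and the function does not run or parse CEGMA itself.

```python
def completeness_score(complete_genes, reference_genes):
    """Proportion of reference genes that were recognized as complete."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    # Only count genes that actually belong to the reference set.
    return len(set(complete_genes) & reference) / len(reference)

# Toy reference set of four genes (real sets: 248 CEGs or 233 CVGs).
reference = ["g1", "g2", "g3", "g4"]
complete = ["g1", "g2", "g4", "not_in_reference"]
score = completeness_score(complete, reference)
print(f"{score:.2%}")
```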
Fig. 4 Demonstrated assembly approaches. a All-in-one assembly using either Trinity or SOAPdenovo-trans resulted in Assembly 9 and Assembly 10, respectively. Assemblies employing multiple k-mer lengths based on SOAPdenovo-trans were merged by the same procedure as that used to merge individual assemblies in the assembly and clustering approach below. b Assembly following in silico normalization of short reads with the normalization function implemented in Trinity and with khmer resulted in Assembly 11 and Assembly 12, respectively. c Clustering following assembly was performed with both cd-hit-est and gicl (Assembly 13). See Table 2 for statistics of the generated assemblies and Methods for the details of these three individual procedures
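The in silico normalization in panel b reduces redundant reads before assembly. The sketch below illustrates the general idea of median-based digital normalization, the approach associated with khmer: a read is kept only if the median abundance of its k-mers, among the reads kept so far, is below a cutoff. It is a simplified toy, assuming exact counting in a dictionary and arbitrary values for `k` and `cutoff`; it is not khmer's or Trinity's actual implementation, and the parameters used in the study are not given here.

```python
from collections import defaultdict
from statistics import median

def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def normalize_reads(reads, k=5, cutoff=2):
    """Keep a read only if the median count of its k-mers is below `cutoff`.

    Counts are updated only from reads that are kept, so highly redundant
    reads stop being added once their k-mers are well represented.
    """
    counts = defaultdict(int)  # exact counts; real tools use probabilistic counters
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k, nothing to evaluate
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            for km in read_kmers:
                counts[km] += 1
    return kept

# Five copies of an abundant read followed by one rare read.
reads = ["ACGTTGCAAG"] * 5 + ["TTTTTGGGGG"]
kept = normalize_reads(reads)
print(len(reads), "->", len(kept))
```

In this toy input the abundant read is capped while the rare read is retained, which mirrors the drop in fragment numbers reported for the normalized inputs in Table 2.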