Abstract
Genomic data have become commonplace in most branches of the biological sciences and have fundamentally altered the way research is conducted. However, the predominance of short-read sequence data from second-generation sequencing technologies has commonly resulted in fragmented and incomplete genomic data. In this opinion article, I highlight how long, unbiased reads from single molecule, real-time (SMRT) sequencing now allow for a return to more contiguous and comprehensive views of genomes.
Keywords: Consensus accuracy; DNA sequencing; De novo assembly; GC bias; Reference genome; Sequence read length
Year: 2014 PMID: 26484068 PMCID: PMC4535613 DOI: 10.1016/j.gdata.2014.02.003
Source DB: PubMed Journal: Genom Data ISSN: 2213-5960
Fig. 1. Yeast (Saccharomyces cerevisiae) de novo assembly (green) using SMRT sequencing and HGAP, and comparison to the reference genome (strain S288C, blue). Data available at http://pacbiodevnet.com/.
Arabidopsis thaliana Ler-0 strain de novo assembly using SMRT sequencing data and HGAP, compared with a short-read assembly (data available at http://pacbiodevnet.com/ and http://1001genomes.org/data/MPI/MPISchneeberger2011/releases/current/, respectively).
| Metric | PacBio assembly | Short-read assembly (2011) | Improvement |
|---|---|---|---|
| Assembly size (bp) | 124,572,784 | 110,357,164 | 12% |
| # contigs | 540 | 4662 | 8.6 × |
| Contig N50 (bp) | 6,190,353 | 66,600 | 90 × |
| Max contig length (bp) | 12,982,390 | 462,490 | 30 × |
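
For readers unfamiliar with the contig N50 metric in the table, the following is a minimal sketch of how it is commonly computed, assuming the usual definition: the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the total assembly size. The function name `n50` and the toy contig lengths are hypothetical and used only for illustration; the assembly figures are copied from the table, and the ratios are recomputed from them rather than taken from the source.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy contig lengths (hypothetical, not from the article).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Summary statistics copied from the table above.
pacbio = {"size": 124_572_784, "contigs": 540,
          "n50": 6_190_353, "max": 12_982_390}
short_read = {"size": 110_357_164, "contigs": 4662,
              "n50": 66_600, "max": 462_490}

# Recompute the comparisons behind the "Improvement" column.
size_gain_pct = 100 * (pacbio["size"] / short_read["size"] - 1)
contig_fold = short_read["contigs"] / pacbio["contigs"]  # fewer contigs is better
n50_fold = pacbio["n50"] / short_read["n50"]
max_fold = pacbio["max"] / short_read["max"]

print(f"toy N50: {toy_n50}")
print(f"assembly size gain: {size_gain_pct:.1f}%")
print(f"contig count reduction: {contig_fold:.1f}x")
print(f"contig N50 increase: {n50_fold:.1f}x")
print(f"max contig length increase: {max_fold:.1f}x")
```

The recomputed ratios land close to the values in the Improvement column, which appear to be rounded. Computing N50 for the real assemblies would require the full list of contig lengths, which is not part of this record.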