Joaquin Tarraga, Asunción Gallego, Vicente Arnau, Ignacio Medina, Joaquin Dopazo.
Abstract
BACKGROUND: The use of nanopore technologies is expected to spread in the future because they are portable and can sequence long fragments of DNA molecules without prior amplification. The first nanopore sequencer available, the MinION™ from Oxford Nanopore Technologies, is a USB-connected, portable device that allows real-time DNA analysis. In addition, other new instruments are expected to be released soon, which promise to outperform the current short-read technologies in terms of throughput. Despite the flood of data expected from this technology, the data analysis solutions currently available are only designed to manage small projects and are not scalable.
Year: 2016 PMID: 26921234 PMCID: PMC4769497 DOI: 10.1186/s12859-016-0966-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Files generated (for each run) by the stats command in HPG Pore, where seq can be a template, a complement or a 2D read
| Output file name | File description |
|---|---|
| summary.txt | Text file containing the number of reads and nucleotides; the mean, min. and max. read length; nucleotide distribution; %GC; and mean quality |
| | Image of the read length histogram |
| | Image of the nucleotides (A, C, T, G, N) per position in the read |
| | Image of the GC histogram |
| | Image of the number of nucleotides (yield) over time |
| | Image of the read quality histogram |
| | Image of the mean quality per position in read |
| | Image of the number of reads processed per channel |
| | Image of the number of nucleotides (yield) processed per channel |
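The quantities listed for summary.txt can be illustrated with a minimal Python sketch. This is not HPG Pore's implementation: the function name `summarize_reads`, the input layout (a list of sequence and Phred-score pairs) and the choices to compute %GC over all nucleotides (including N) and mean quality as a plain average of per-base Phred scores are all assumptions made for illustration.

```python
from statistics import mean


def summarize_reads(reads):
    """Compute summary statistics similar to those described for summary.txt.

    `reads` is a list of (sequence, phred_scores) tuples. This layout is a
    hypothetical in-memory representation, not the FAST5 format itself.
    """
    if not reads:
        raise ValueError("at least one read is required")

    lengths = [len(seq) for seq, _ in reads]

    # Count each nucleotide; anything outside A, C, G, T is counted as N.
    counts = {base: 0 for base in "ACGTN"}
    for seq, _ in reads:
        for base in seq.upper():
            counts[base if base in "ACGT" else "N"] += 1

    total = sum(lengths)
    # Assumption: %GC is computed over all nucleotides, N included.
    gc_percent = 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0

    # Assumption: mean quality is the plain average of per-base Phred scores.
    all_quals = [q for _, quals in reads for q in quals]

    return {
        "num_reads": len(reads),
        "num_nucleotides": total,
        "mean_length": mean(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "nucleotide_counts": counts,
        "gc_percent": gc_percent,
        "mean_quality": mean(all_quals) if all_quals else None,
    }


if __name__ == "__main__":
    example = [("ACGT", [10, 10, 10, 10]), ("GGCCNN", [20] * 6)]
    for key, value in summarize_reads(example).items():
        print(f"{key}: {value}")
```

The histogram images in the table would be derived from the same per-read values (lengths, %GC, qualities) rather than from these aggregates.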
Fig. 1 Electronic signal measured for each nanopore translocation event over time for a given MinION™ template read
Comparison of HPG Pore to the other tools available
| Feature | HPG Pore | poRe | Poretools |
|---|---|---|---|
| Extract FASTQ | Y | Y | Y |
| Extract FASTA | Y | Y | Y |
| Organise fast5 into run folders | – | Y | – |
| Create tar files of runs | – | – | Y |
| Organise the results into run folders | Y | – | – |
| Plot yield | Y | Y | Y |
| Plot squiggle | Y | Y | Y |
| Extract run stats | Y | Y | Y |
| Read length histogram | Y | Y | Y |
| Read length (max., avg., min.) | Y | Y | Y |
| Mean read quality | Y | – | – |
| Nucleotide content: count and % | Y | – | Y¹ |
| %GC | Y | – | – |
| Plot frequency of %GC | Y | – | – |
| Plot per base sequence content | Y | – | – |
| Read quality histogram | Y | – | – |
| Reads per channel histogram | Y | Y | Y² |
| Nucleotides per channel histogram | Y | Y | – |

¹ Poretools does not display the nucleotide content percentage, only counts
² Poretools returns the occupancy of pores, not the reads per channel
Fig. 2 Runtimes of the three programs, poRe, Poretools, and HPG Pore, as a function of the number of sequences in the FAST5 file
Fig. 3 Runtimes (upper panel) and increase in speed (lower panel) as the number of nodes increases in the Hadoop system in four different scenarios: FAST5 files containing 32,000 (blue line), 100,000 (red line), 300,000 (green line) and 1 million (dark blue line) sequences. The dotted line in the lower panel represents the ideal speed-up according to the number of nodes used. Speed-ups have been calculated using 3 nodes as the starting point, given that the 1 million reads could not be processed with only one node
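A speed-up measured against a 3-node starting point can be computed as in the following Python sketch. The function name `speedup_vs_baseline` is hypothetical, the runtimes in the usage example are made-up placeholders rather than values from the figure, and the ideal speed-up is assumed to scale linearly as nodes divided by baseline nodes.

```python
def speedup_vs_baseline(runtimes, baseline_nodes=3):
    """Return observed and ideal speed-up for each node count.

    `runtimes` maps a node count to a runtime in seconds. The observed
    speed-up is the baseline runtime divided by the runtime at that node
    count. The ideal speed-up is assumed to be linear in the node count.
    """
    if baseline_nodes not in runtimes:
        raise KeyError(f"no runtime recorded for {baseline_nodes} nodes")

    base = runtimes[baseline_nodes]
    return {
        nodes: {
            "observed": base / seconds,
            "ideal": nodes / baseline_nodes,
        }
        for nodes, seconds in sorted(runtimes.items())
    }


if __name__ == "__main__":
    # Placeholder runtimes for illustration only; not taken from the paper.
    placeholder_runtimes = {3: 600.0, 6: 330.0, 12: 200.0}
    for nodes, s in speedup_vs_baseline(placeholder_runtimes).items():
        print(f"{nodes} nodes: observed {s['observed']:.2f}x, ideal {s['ideal']:.2f}x")
```

Using the smallest configuration that completes every workload as the baseline keeps the curves for all dataset sizes comparable, at the cost of not showing the gain relative to a single node.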