Matei David, L J Dursi, Delia Yao, Paul C Boutros, Jared T Simpson.
Abstract
MOTIVATION: The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls.
Year: 2016 PMID: 27614348 PMCID: PMC5408768 DOI: 10.1093/bioinformatics/btw569
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Table 1. Dataset summary
| Dataset | Reads | Avg Events | Hairpin N | Hairpin MN | Avg Length M2 | Avg Length M0 | Avg Length M1 | Avg Length N0 | Avg Length N1 | Identity M2 | Identity M | Identity N | Insertions M0 | Insertions M1 | Insertions N0 | Insertions N1 | Deletions M0 | Deletions M1 | Deletions N0 | Deletions N1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 10000 | 19160 | .994 | .904 | 9803 | 9701 | 9135 | 9173 | 8741 | .861 | .699 | .681 | .071 | .055 | .058 | .044 | .07 | .097 | .09 | .121 |
| | 10000 | 13448 | .991 | .929 | 6861 | 6837 | 6419 | 6458 | 6152 | .874 | .706 | .688 | .071 | .053 | .058 | .042 | .069 | .095 | .089 | .118 |
| Human | 10000 | 11145 | .892 | .863 | 5977 | 5816 | 5538 | 5594 | 5780 | .857 | .688 | .673 | .051 | .041 | .042 | .031 | .09 | .11 | .107 | .133 |
| Human PCR | 10000 | 8604 | .961 | .937 | 4377 | 4265 | 4083 | 4074 | 3993 | .858 | .69 | .676 | .057 | .045 | .047 | .038 | .085 | .104 | .103 | .122 |
Reads: number of ONT reads for which the Metrichor 2D read was mapped to the reference. Avg Events: average number of events per read. Hairpin N: fraction of reads for which Nanocall detects a hairpin. Hairpin MN: fraction of reads for which the Metrichor and Nanocall hairpins are within 100 events of each other. Avg Length: average basecalled read length for each read type: Metrichor 2D (M2), Metrichor template/complement (M0/M1), and Nanocall template/complement (N0/N1). Identity: alignment identity for Metrichor 2D, Metrichor 1D and Nanocall 1D reads. Insertions/Deletions: fraction of insertions/deletions for each 1D read type. The Nanocall runs in this table use the default options (double-strand scaling, 2ss).
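The Hairpin MN column can be read as a simple agreement rate. The sketch below is one plausible way to compute it, assuming each read carries a hairpin event index from each basecaller (`None` when no hairpin was found) and that "within 100 events" is inclusive. The function name and input layout are illustrative assumptions, not taken from Nanocall.

```python
def hairpin_agreement(pairs, max_distance=100):
    """Fraction of reads whose two hairpin positions agree.

    pairs: list of (metrichor_pos, nanocall_pos) event indices;
           either entry may be None if no hairpin was detected.
    max_distance: largest event distance still counted as agreement
                  (assumed inclusive).
    """
    if not pairs:
        return 0.0
    agree = sum(
        1
        for m, n in pairs
        if m is not None and n is not None and abs(m - n) <= max_distance
    )
    return agree / len(pairs)


# Toy input: two of the four reads have hairpins within 100 events.
example = [(1000, 1050), (2000, 2500), (3000, None), (400, 500)]
print(hairpin_agreement(example))
```

Reads with a missing hairpin count against agreement here; a different denominator (for example, only reads where both tools found a hairpin) would give a higher value.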
Table 2. Performance of Nanocall
| Dataset | Opts | Hpin | MCorr | NCorr | MIdn | NIdn | Speed* |
|---|---|---|---|---|---|---|---|
| | | .904 | .962 | | .699 | .682 | 1106 |
| | | | | .941 | | .681 | 763 |
| | | | | .941 | | .682 | 1655 |
| | | | | .938 | | | 1217 |
| | | | | .929 | | .673 | |
| | | .929 | .978 | | .706 | | 909 |
| | | | | .957 | | .688 | 614 |
| | | | | .957 | | .689 | 1378 |
| | | | | .955 | | .69 | 974 |
| | | | | .936 | | .68 | |
| Human | | .863 | .763 | | .688 | | 747 |
| | | | | .703 | | .673 | 515 |
| | | | | .647 | | .672 | 839 |
| | | | | .642 | | .675 | 1209 |
| | | | | .441 | | .652 | |
| Human PCR | | .937 | .868 | | .69 | | 569 |
| | | | | .823 | | .676 | 397 |
| | | | | .785 | | .675 | 928 |
| | | | | .785 | | .675 | 647 |
| | | | | .543 | | .655 | |
Options: fast: no training; 1ss/2ss: single/double strand scaling; nott: no transition parameter training. Hpin: fraction of reads where Metrichor and Nanocall hairpins are within 100 events of each other. M/N Corr: fraction of Metrichor 1D/Nanocall reads where the mapping of both strands overlaps the mapping of the Metrichor 2D read. M/N Idn: Metrichor 1D/Nanocall identity, for reads where all 5 mappings (1× Metrichor 2D, 2× Metrichor 1D, 2× Nanocall) are overlapping. Speed: Kbp per core-hour (*: measured separately for 1000 reads only, on a desktop computer with a 4-core Intel(R) Core(TM) i5-3570 CPU and 12 GB of RAM). In bold face: best value in column, among options for that dataset.
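To make the Speed unit concrete, the following sketch converts a run's output into Kbp per core-hour. It assumes core-hours are wall-clock hours multiplied by the number of cores in use; the source does not spell out how core-hours were measured, and the function and argument names are hypothetical.

```python
def kbp_per_core_hour(total_bases, wall_seconds, cores):
    """Throughput in kilobases basecalled per core-hour.

    Assumes core-hours = wall-clock hours * cores used.
    """
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    core_hours = wall_seconds / 3600 * cores
    return total_bases / 1000 / core_hours


# Toy numbers: 4 Mbp basecalled in one hour on 4 cores.
print(kbp_per_core_hour(4_000_000, 3600, 4))  # 1000.0
```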
Fig. 1. Fraction aligned versus identity using a base-10 logarithmic density plot, for all datasets (rows) and all read types (columns). Only showing reads where all 1D basecalls are mapped. (Color version of this figure is available at Bioinformatics online.)
Fig. 2. Comparison of Metrichor and Nanocall (2ss) mapped reads using a base-10 logarithmic density plot, for template reads from the human PCR dataset. We show Metrichor versus Nanocall: identity (row 1), read length (row 2) and fraction aligned (row 3). We compare Nanocall template reads with Metrichor template reads (column 1) and Metrichor 2D reads (column 2). (Color version of this figure is available at Bioinformatics online.)
Fig. 3. Nanocall (2ss) versus Metrichor scaling parameters for mapped reads using a base-10 logarithmic density plot, for the human PCR dataset. We show Metrichor versus Nanocall scale (row 1) and shift (row 2), for template reads (column 1) and complement reads (column 2). Only showing reads where both Metrichor and Nanocall detected a hairpin and picked the same complement model. (Color version of this figure is available at Bioinformatics online.)
Fig. 4. Nanocall versus Metrichor hairpin position using a base-10 logarithmic density plot, for reads where Nanocall detected a hairpin, for each dataset. (Color version of this figure is available at Bioinformatics online.)
Table 3. Influence of default transition parameters pstay and pskip
| pstay | pskip | MCorr | NCorr | MIdn | NIdn |
|---|---|---|---|---|---|
| .10 | .24 | .852 | | .689 | .679 |
| .12 | .28 | | .818 | | .679 |
| .12 | .24 | | .818 | | .679 |
| .11 | .22 | | .818 | | .679 |
| .12 | .22 | | .817 | | .679 |
| .11 | .24 | | .817 | | .679 |
| .12 | .30 | | .815 | | .679 |
| .12 | .26 | | .815 | | .679 |
| .11 | .26 | | .815 | | |
| .09 | .26 | | .815 | | .679 |
| .09 | .24 | | .815 | | .679 |
| .12 | .32 | | .814 | | .679 |
| .10 | .30 | | .814 | | .679 |
| .10 | .22 | | .814 | | .679 |
| .09 | .22 | | .814 | | .679 |
| .11 | .28 | | .813 | | .679 |
| .11 | .32 | | .812 | | .679 |
| .10 | .32 | | .812 | | .679 |
| .09 | .28 | | .812 | | .679 |
| .10 | .28 | | .811 | | .679 |
| .10 | .26 | | .811 | | .68 |
| .11 | .30 | | .81 | | .679 |
| .09 | .32 | | .805 | | .679 |
| .09 | .30 | | .804 | | .679 |
All runs use 1000 human PCR reads, with double-strand scaling and no transition parameter training (2ss-nott). Other columns: see Table 2.
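For readers unfamiliar with stay and skip probabilities in a k-mer HMM, the sketch below builds a deliberately simplified transition distribution out of one k-mer state. It assumes that a stay keeps the same k-mer, a step shifts in one new base, a skip shifts in two, and that the step and skip mass is spread uniformly over the possible successors. This is an illustrative assumption, not Nanocall's actual transition model, and the function name is hypothetical.

```python
from itertools import product

BASES = "ACGT"


def transition_probs(kmer, p_stay, p_skip):
    """Simplified stay/step/skip transition distribution for one k-mer.

    Returns a dict mapping successor k-mer -> probability.
    Assumption: p_step = 1 - p_stay - p_skip, split uniformly over
    4 one-base shifts; p_skip is split uniformly over 16 two-base shifts.
    """
    if len(kmer) < 2:
        raise ValueError("kmer must have at least 2 bases")
    if p_stay < 0 or p_skip < 0 or p_stay + p_skip >= 1:
        raise ValueError("need p_stay, p_skip >= 0 and p_stay + p_skip < 1")
    p_step = 1.0 - p_stay - p_skip
    probs = {kmer: p_stay}
    # Step: drop the first base, append one new base.
    for b in BASES:
        nxt = kmer[1:] + b
        probs[nxt] = probs.get(nxt, 0.0) + p_step / 4
    # Skip: drop the first two bases, append two new bases.
    for b1, b2 in product(BASES, repeat=2):
        nxt = kmer[2:] + b1 + b2
        probs[nxt] = probs.get(nxt, 0.0) + p_skip / 16
    return probs


# Values taken from the first row of the table above.
probs = transition_probs("ACGTA", p_stay=0.10, p_skip=0.24)
print(len(probs), round(sum(probs.values()), 6))
```

Accumulating with `probs.get` matters for repetitive k-mers such as homopolymers, where a stay, a step and a skip can all land on the same successor.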
Table 4. Influence of parameters that control training
| Param | Val | MCorr | NCorr | MIdn | NIdn | Speed |
|---|---|---|---|---|---|---|
| | 250 | .852 | | .689 | | 313 |
| | 200 | | .811 | | .676 | 397 |
| | 1.0 | | .811 | | .676 | 396 |
| | 10 | | .811 | | .676 | 396 |
| | 20 | | .81 | | .676 | 394 |
| | 15 | | .81 | | .676 | 395 |
| | 5 | | .81 | | .675 | 450 |
| | 1.5 | | .809 | | .675 | 445 |
| | 2.0 | | .805 | | .675 | 485 |
| | .5 | | .801 | | .677 | 333 |
| | 4 | | .8 | | .674 | 513 |
| | 150 | | .795 | | .675 | 530 |
| | 3 | | .787 | | .673 | 639 |
| | 100 | | .767 | | .671 | 752 |
| | 2 | | .764 | | .671 | 866 |
| | 50 | | .722 | | .668 | |
Runs on 1000 human PCR reads, using double-strand scaling mode with transition parameter training (2ss). numex: use x events per strand; maxry: maximum of y training rounds per strand; minpz: minimum fit improvement of z in log space. Other columns: see Table 2.
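The two stopping parameters described above (a cap on training rounds and a minimum fit improvement in log space) combine into a loop like the one sketched below. The callback `run_round`, the function name and the exact acceptance rule are assumptions made for illustration; the source does not describe how Nanocall implements its training loop.

```python
def train_until_converged(initial_fit, run_round, max_rounds, min_improvement):
    """Run training rounds until the fit stops improving enough.

    initial_fit: log-space fit before any training.
    run_round: hypothetical callback; performs one training round and
               returns the new log-space fit.
    max_rounds: upper bound on rounds (the role of maxr).
    min_improvement: smallest log-space gain that justifies another
                     round (the role of minp).
    Returns (best_fit, rounds_run).
    """
    fit = initial_fit
    rounds = 0
    while rounds < max_rounds:
        new_fit = run_round()
        rounds += 1
        gain = new_fit - fit
        if gain > 0:
            fit = new_fit  # keep any improvement, however small
        if gain < min_improvement:
            break  # gain too small to justify another round
    return fit, rounds


# Toy fits: the third round gains only 0.5, so training stops there.
fits = iter([-100.0, -90.0, -89.5, -80.0])
print(train_until_converged(-120.0, lambda: next(fits), 10, 1.0))
```

In a loop of this shape, a lower round cap or a higher improvement threshold ends training sooner, trading fit quality for run time.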