Michiaki Hamada, Yukiteru Ono, Kiyoshi Asai, Martin C Frith.
Abstract
Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads.

Availability and Implementation: The source code is freely available at http://last.cbrc.jp/.

Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
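As a rough illustration of what "scores that fit the frequencies" means, the sketch below turns a table of aligned base-pair counts into integer log-odds substitution scores, s(x, y) = round(ln[p(x, y) / (p(x) q(y))] / scale). It is a simplified one-pass calculation and not LAST-TRAIN's own procedure, which also fits gap scores and is not reproduced here. The function name `log_odds_scores`, the `scale` value and the demo counts are all hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_scores(pair_counts, scale):
    """Convert counts of aligned (reference, read) base pairs into integer
    log-odds scores. Hypothetical helper: a one-pass sketch, not LAST-TRAIN.

    pair_counts[(x, y)] is how often reference base x was aligned to read
    base y; every count must be positive. `scale` (nats per score unit) is
    an assumed value chosen only for illustration.
    """
    total = sum(pair_counts.values())
    # Marginal base frequencies on the reference side (p) and read side (q).
    p = {x: sum(pair_counts[(x, y)] for y in BASES) / total for x in BASES}
    q = {y: sum(pair_counts[(x, y)] for x in BASES) / total for y in BASES}
    scores = {}
    for x in BASES:
        for y in BASES:
            joint = pair_counts[(x, y)] / total
            # Log-odds of "aligned because related" versus "aligned by chance".
            scores[(x, y)] = round(math.log(joint / (p[x] * q[y])) / scale)
    return scores


# Hypothetical counts: 900 matches and 10 of each mismatch per reference base.
demo_counts = {(x, y): 900 if x == y else 10 for x in BASES for y in BASES}
demo_scores = log_odds_scores(demo_counts, scale=0.2)
print(demo_scores[("A", "A")], demo_scores[("A", "C")])  # 7 -16
```

Frequent pairings (matches) receive positive scores and rare substitutions receive negative ones; a dataset with a different error profile yields a different matrix, which is the motivation for fitting scores per dataset.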
Year: 2017 PMID: 28039163 PMCID: PMC5351549 DOI: 10.1093/bioinformatics/btw742
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Results of haplotype phasing with Nanopore long reads (NA12878)
| Haplotype | Polymorphism | GraphMap Count | GraphMap Freq | BLASR Count | BLASR Freq | LAST, Manual (…) Count | LAST, Manual (…) Freq | LAST, Manual (…) Count | LAST, Manual (…) Freq | LAST + LAST-TRAIN, Training Count | LAST + LAST-TRAIN, Training Freq | LAST + LAST-TRAIN, Training+LAMA Count | LAST + LAST-TRAIN, Training+LAMA Freq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TT:CT | CYP2D6*4 | 207 | 11.9% | 227 | 18.4% | 340 | 20.1% | 182 | 21.2% | 327 | 27.3% | 343 | 27.4% |
| TT:CC | (reference bias) | 225 | 13.0% | 329 | 26.6% | 326 | 19.3% | 164 | 19.1% | 160 | 13.4% | 134 | 10.7% |
| T−:CT | | 70 | 4.0% | 31 | 2.5% | 65 | 3.8% | 36 | 4.2% | 75 | 6.3% | 78 | 6.2% |
| T−:CC | CYP2D6*3 | 226 | 13.0% | 281 | 22.8% | 232 | 13.7% | 199 | 23.1% | 199 | 16.6% | 217 | 17.3% |
| Other | | 1006 | 58.1% | 367 | 29.7% | 726 | 43.1% | 279 | 32.4% | 436 | 36.4% | 480 | 38.3% |
| Total | | 1734 | 100.0% | 1235 | 100.0% | 1689 | 100.0% | 860 | 100.0% | 1197 | 100.0% | 1252 | 100.0% |
In the first column, TX:CY denotes the phased haplotype in which the 1st position (rs35742686) is ‘X’ (‘T’ in the reference genome) and the 2nd position (rs3892097) is ‘Y’ (‘C’ in the reference genome); ‘−’ denotes a deletion. See also Supplementary Table S14. A high frequency of TT:CC (the haplotype identical to the reference genome) indicates reference bias (Laver et al.). The values for ‘BLASR’ were computed from the mapping results of Ammar et al., who used BLASR to map Nanopore reads to the reference genome. The column ‘Training+LAMA’ shows the results of probabilistic alignment (Hamada et al.) using forward scores computed with the parameters trained by LAST-TRAIN. See Supplementary Materials S7 for the detailed command-line options used for each tool.
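Each Freq value is the haplotype's count divided by its column total. The sketch below reproduces that arithmetic from the table's counts for three of the columns, and shows how a read could be given a TX:CY label from its alleles at the two positions. The helper names `haplotype_label` and `frequencies` and the per-read allele calls are hypothetical, and ASCII '-' stands in for the table's '−' deletion symbol.

```python
from collections import Counter

# Counts copied from the table (GraphMap, BLASR, and LAST + LAST-TRAIN
# Training+LAMA); the remaining columns are omitted for brevity.
TABLE_COUNTS = {
    "GraphMap": {"TT:CT": 207, "TT:CC": 225, "T-:CT": 70, "T-:CC": 226, "Other": 1006},
    "BLASR": {"TT:CT": 227, "TT:CC": 329, "T-:CT": 31, "T-:CC": 281, "Other": 367},
    "Training+LAMA": {"TT:CT": 343, "TT:CC": 134, "T-:CT": 78, "T-:CC": 217, "Other": 480},
}


def haplotype_label(allele1, allele2):
    """Hypothetical helper: build the TX:CY label from a read's base at
    rs35742686 (allele1; reference 'T', '-' = deletion) and at rs3892097
    (allele2; reference 'C'). Any other allele combination is 'Other'."""
    if allele1 in ("T", "-") and allele2 in ("C", "T"):
        return f"T{allele1}:C{allele2}"
    return "Other"


def frequencies(counts):
    """Percentage of reads in each haplotype class, as in the Freq columns."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


for tool, counts in TABLE_COUNTS.items():
    freq = frequencies(counts)
    # TT:CC is the reference-identical haplotype; a high share signals reference bias.
    print(f"{tool}: n={sum(counts.values())}, TT:CC={freq['TT:CC']:.1f}%")

# Tallying hypothetical per-read allele calls into haplotype classes.
reads = [("T", "T"), ("-", "C"), ("T", "C"), ("T", "T"), ("A", "C")]
print(Counter(haplotype_label(a, b) for a, b in reads))
```

Comparing the TT:CC share across columns this way is how the table exposes reference bias: it is 26.6% for BLASR but 10.7% for Training+LAMA.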
| | A | C | G | T |
|---|---|---|---|---|
| A | 7 | −11 | −8 | −16 |
| C | −7 | 5 | −8 | −7 |
| G | −5 | −8 | 5 | −8 |
| T | −19 | −12 | −13 | 7 |
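The matrix above gives one score per pair of aligned bases, so a gap-free alignment scores the sum of the entries for its aligned pairs. The sketch below assumes that rows are reference bases and columns are read bases (the axes are not labelled here). Because the matrix is not symmetric (A/C is −11 but C/A is −7), that orientation matters. The function name `ungapped_score` is hypothetical, and gaps are not handled.

```python
# Substitution scores from the matrix above.
# Assumption: outer key = reference base (row), inner key = read base (column).
MATRIX = {
    "A": {"A": 7, "C": -11, "G": -8, "T": -16},
    "C": {"A": -7, "C": 5, "G": -8, "T": -7},
    "G": {"A": -5, "C": -8, "G": 5, "T": -8},
    "T": {"A": -19, "C": -12, "G": -13, "T": 7},
}


def ungapped_score(ref, read):
    """Sum the substitution scores of two equal-length, gap-free sequences."""
    if len(ref) != len(read):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(MATRIX[r][q] for r, q in zip(ref.upper(), read.upper()))


print(ungapped_score("ACGT", "ACGT"))  # 24: perfect match
print(ungapped_score("ACGT", "ACAT"))  # 14: one G (reference) vs A (read) mismatch
```

Under this orientation, swapping the two arguments can change the score, which is the practical consequence of an asymmetric trained matrix.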