Weixing Feng¹, Sen Zhao¹, Dingkai Xue¹, Fengfei Song¹, Ziwei Li¹, Duojiao Chen¹, Bo He², Yangyang Hao³, Yadong Wang⁴, Yunlong Liu⁵,⁶.
Abstract
BACKGROUND: Ion Torrent and Ion Proton are semiconductor-based sequencing technologies that offer rapid sequencing and low upfront and operating costs, because they avoid modified nucleotides and optical measurements. Despite these advantages, Ion semiconductor sequencing technologies suffer from markedly reduced accuracy at genomic loci containing homopolymer repeats of the same nucleotide. This limitation significantly reduces their effectiveness in biological applications that aim to accurately identify genetic variants.
Keywords: Alignment; Bayesian; Homopolymer; Ion Torrent/Proton
Year: 2016 PMID: 27556417 PMCID: PMC5001236 DOI: 10.1186/s12864-016-2894-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Profile of retrieved homopolymers according to (a) nucleotide type and (b) position in the sequencing reads
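The profile in Fig. 1 presupposes a step that extracts homopolymer runs from reads and tallies them by nucleotide type and position. A minimal sketch in Python, under stated assumptions: a homopolymer is taken to be a run of at least two identical bases, its position is the 1-based start of the run, and zone Z1 is taken to cover positions 1–75, with 75-base zones thereafter (matching the Pos ranges in the error table). The helper names `homopolymers`, `zone`, and `profile` are hypothetical.

```python
from collections import Counter
from itertools import groupby

ZONE_WIDTH = 75  # assumption: Z1 = 1-75, Z2 = 76-150, ... as in the table's Pos column
MIN_LEN = 2      # assumption: shortest run counted as a homopolymer


def homopolymers(read, min_len=MIN_LEN):
    """Yield (nucleotide, length, start) for each run of identical bases.

    `start` is 1-based so it lines up with the table's position ranges.
    """
    pos = 1
    for base, run in groupby(read):
        length = sum(1 for _ in run)
        if length >= min_len:
            yield base, length, pos
        pos += length


def zone(start):
    """Map a 1-based start position to a zone index (1 for Z1, 2 for Z2, ...)."""
    return (start - 1) // ZONE_WIDTH + 1


def profile(reads):
    """Count homopolymers per (nucleotide, zone) class across all reads."""
    counts = Counter()
    for read in reads:
        for base, _length, start in homopolymers(read):
            counts[(base, zone(start))] += 1
    return counts
```

For example, `list(homopolymers("AAACGTTTT"))` yields the A run of length 3 starting at position 1 and the T run of length 4 starting at position 6; the single C and G are skipped.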
Fig. 2 Prior probabilities of the detected voltages when the nucleotide type is A and the position in the sequencing reads belongs to Z1
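One simple way to summarise the per-length voltage distributions of a single class (such as nucleotide A in zone Z1) is to estimate, from reads whose true homopolymer lengths are known, an empirical prior for each length together with the mean and standard deviation of its detected voltages. This is a sketch under our own assumptions, not necessarily the paper's model: it treats each length's voltages as Gaussian, and `fit_class_models` and `MIN_SIGMA` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_SIGMA = 1e-3  # assumption: floor that keeps sigma positive for tiny or constant groups


def fit_class_models(samples):
    """Estimate {length: (prior, mu, sigma)} for ONE class, e.g. nucleotide A in Z1.

    samples: iterable of (voltage, true_length) pairs with known lengths.
    prior is the empirical frequency of each length; (mu, sigma) are the mean
    and standard deviation of that length's voltages (an assumed Gaussian summary).
    """
    by_length = defaultdict(list)
    for voltage, length in samples:
        by_length[length].append(voltage)

    total = sum(len(vs) for vs in by_length.values())
    models = {}
    for length, voltages in sorted(by_length.items()):
        sigma = stdev(voltages) if len(voltages) > 1 else 0.0
        models[length] = (len(voltages) / total, mean(voltages), max(sigma, MIN_SIGMA))
    return models
```

Fitting one such table per (nucleotide, zone) class mirrors the conditioning shown in the figure, where the distributions are drawn for nucleotide A in Z1 only.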
Fig. 3 Other factors in the identification of homopolymer length: (a) nucleotide type, when the homopolymer length is 4 and the position in the sequencing reads belongs to Z1, and (b) position in the sequencing reads, when the homopolymer length is 4 and the nucleotide type is A
Fig. 4 Identification results for homopolymer lengths when the nucleotide type is A and the position in the sequencing reads belongs to Z1, presented as (a) frequency of identification errors and (b) distribution of identification results
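A Bayesian length call picks the length L that maximises P(L) · p(v | L) for an observed voltage v, and an error frequency like the one in Fig. 4(a) can be measured by comparing those calls with true lengths. This is a hedged sketch assuming Gaussian per-length likelihoods and per-class models shaped as `{length: (prior, mu, sigma)}`; `map_length` and `error_rate` are hypothetical helpers, and the paper's actual likelihood and priors may differ.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def map_length(voltage, models):
    """Return the length L maximising log P(L) + log p(voltage | L).

    models: {length: (prior, mu, sigma)} for one (nucleotide, zone) class.
    The Gaussian likelihood is an assumption of this sketch.
    """
    def log_posterior(length):
        prior, mu, sigma = models[length]
        return math.log(prior) + gaussian_logpdf(voltage, mu, sigma)

    return max(models, key=log_posterior)


def error_rate(samples, models):
    """Percentage of (voltage, true_length) pairs whose MAP call is wrong."""
    samples = list(samples)
    wrong = sum(1 for v, true_len in samples if map_length(v, models) != true_len)
    return 100.0 * wrong / len(samples)
```

For example, with `models = {1: (0.5, 1.0, 0.1), 2: (0.3, 2.0, 0.15), 3: (0.2, 3.0, 0.2)}`, a voltage of 1.9 is called as length 2. Working in log space avoids numerical underflow when priors and likelihoods are small.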
Identification errors of homopolymer length with different methods
| No | Nt | Pos | Count | KNN errors (%) | Torrent Suite errors (%) | Bayesian errors (%) | Reference errors (%) | Proposed approach: Weight | Proposed approach: errors (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | 1–75 | 144230 | 7.002 | 1.119 | 2.296 | 0.298 | 0.28 | 0.250 |
| 2 | A | 76–150 | 112776 | 12.121 | 1.651 | 4.722 | 0.489 | 0.34 | 0.453 |
| 3 | A | 151–225 | 97568 | 18.733 | 2.926 | 8.150 | 0.423 | 0.14 | 0.421 |
| 4 | A | 226–300 | 48033 | 22.292 | 4.655 | 10.259 | 0.535 | 0.24 | 0.510 |
| 5 | C | 1–75 | 88732 | 6.534 | 1.843 | 2.779 | 0.034 | 0.14 | 0.027 |
| 6 | C | 76–150 | 77650 | 10.382 | 2.489 | 4.595 | 0.556 | 0.36 | 0.121 |
| 7 | C | 151–225 | 63658 | 18.581 | 3.187 | 6.383 | 0.545 | 0.28 | 0.542 |
| 8 | C | 226–300 | 35736 | 17.910 | 4.600 | 6.159 | 0.926 | 0.30 | 0.923 |
| 9 | G | 1–75 | 97493 | 4.141 | 1.422 | 1.826 | 0.609 | 0.30 | 0.376 |
| 10 | G | 76–150 | 78192 | 14.874 | 1.623 | 3.864 | 0.322 | 0.32 | 0.152 |
| 11 | G | 151–225 | 64680 | 16.868 | 2.273 | 5.683 | 1.062 | 0.14 | 1.062 |
| 12 | G | 226–300 | 34116 | 18.754 | 2.492 | 7.985 | 0.147 | 0.12 | 0.147 |
| 13 | T | 1–75 | 156550 | 5.186 | 1.106 | 2.504 | 0.076 | 0.14 | 0.054 |
| 14 | T | 76–150 | 152034 | 11.446 | 1.571 | 5.780 | 0.342 | 0.30 | 0.297 |
| 15 | T | 151–225 | 111090 | 14.720 | 2.331 | 7.290 | 0.419 | 0.32 | 0.362 |
| 16 | T | 226–300 | 68448 | 13.912 | 3.315 | 8.240 | 0.723 | 0.28 | 0.599 |
“Nt” is the nucleotide type and “Pos” is the position range in the sequencing reads. “Count” is the number of homopolymers in each class. “KNN” denotes the k-nearest-neighbors method. “Reference” denotes the designed model using only reference information (Weight = 0).
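The KNN baseline in the table classifies a homopolymer by the lengths of its nearest labelled examples. The sketch below is illustrative only: the table does not state the features, distance, or k used, so a single voltage feature, absolute difference as distance, and k = 5 are assumptions, and `knn_length` is a hypothetical name.

```python
from collections import Counter


def knn_length(voltage, training, k=5):
    """Predict a homopolymer length by majority vote of the k nearest examples.

    training: list of (voltage, true_length) pairs for one class.
    Distance is the absolute voltage difference and k defaults to 5; both are
    assumptions. Counter.most_common lists equal counts in first-seen order and
    neighbours are sorted nearest first, so a tied vote goes to whichever tied
    length appears first among the sorted neighbours.
    """
    nearest = sorted(training, key=lambda sample: abs(sample[0] - voltage))[:k]
    votes = Counter(length for _, length in nearest)
    return votes.most_common(1)[0][0]
```

Unlike the Bayesian call, this baseline uses no explicit prior over lengths, so class imbalance enters only through how many training examples fall near the query voltage.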
Fig. 5 Comparison of identification results among different identification methods: (a) all methods and (b) the method using only reference information and the proposed method
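To compare the methods across all 16 classes at once, each row's error rate can be pooled, weighted by its Count. This sketch copies the values from the error table and assumes each row's percentage was computed over that row's Count homopolymers; `pooled_error` is a hypothetical helper, and the pooled figures it prints are our own arithmetic, not numbers reported in the paper.

```python
# Columns: (Count, KNN, Torrent Suite, Bayesian, Reference, Proposed approach), errors in %.
# Copied from rows 1-16 of the table; the Weight column is omitted.
ROWS = [
    (144230, 7.002, 1.119, 2.296, 0.298, 0.250),
    (112776, 12.121, 1.651, 4.722, 0.489, 0.453),
    (97568, 18.733, 2.926, 8.150, 0.423, 0.421),
    (48033, 22.292, 4.655, 10.259, 0.535, 0.510),
    (88732, 6.534, 1.843, 2.779, 0.034, 0.027),
    (77650, 10.382, 2.489, 4.595, 0.556, 0.121),
    (63658, 18.581, 3.187, 6.383, 0.545, 0.542),
    (35736, 17.910, 4.600, 6.159, 0.926, 0.923),
    (97493, 4.141, 1.422, 1.826, 0.609, 0.376),
    (78192, 14.874, 1.623, 3.864, 0.322, 0.152),
    (64680, 16.868, 2.273, 5.683, 1.062, 1.062),
    (34116, 18.754, 2.492, 7.985, 0.147, 0.147),
    (156550, 5.186, 1.106, 2.504, 0.076, 0.054),
    (152034, 11.446, 1.571, 5.780, 0.342, 0.297),
    (111090, 14.720, 2.331, 7.290, 0.419, 0.362),
    (68448, 13.912, 3.315, 8.240, 0.723, 0.599),
]

METHODS = {"KNN": 1, "Torrent Suite": 2, "Bayesian": 3, "Reference": 4, "Proposed approach": 5}


def pooled_error(rows, col):
    """Count-weighted mean error (%) for the method in column `col`.

    Assumes each row's percentage was computed over that row's Count.
    """
    total = sum(row[0] for row in rows)
    return sum(row[0] * row[col] for row in rows) / total


if __name__ == "__main__":
    for name, col in METHODS.items():
        print(f"{name:<18} {pooled_error(ROWS, col):.3f} %")
```

Because the proposed approach's error is at or below the reference-only error in every row, its pooled error cannot exceed the reference-only pooled error, whatever the weighting.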