Jiawen Bian, Chenglin Liu, Hongyan Wang, Jing Xing, Priyanka Kachroo, Xiaobo Zhou.
Abstract
BACKGROUND: The rapid development of next-generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Single nucleotide variants (SNVs) inferred from NGS data are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. Posterior-probability-based methods are efficient at detecting SNVs in high-coverage regions or in sequencing data with high depth. For data with low sequencing depth, however, the efficiency of such algorithms remains poor and needs to be improved.
Year: 2013 PMID: 23855743 PMCID: PMC3718670 DOI: 10.1186/1471-2105-14-225
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Observation of HMM: illustration of the alignment at a given position of the sequence under study. The observation sets are taken as the observation o in the HMM. For the bases on the covering reads, blue denotes a base that matches the reference allele, yellow denotes a base that differs from the reference allele, and purple denotes a base that is undecided on the genome under study.
Figure 2. States of HMM: illustration of the three states {aa}, {ab} and {bb} in the HMM. The colors have the same meaning as in Figure 1.
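The three-state structure in Figures 1 and 2 can be illustrated with a small decoding sketch. This is not the SNVHMM implementation: the emission model (a binomial over the number of reference-matching bases at each position), the parameter values in `MU`, `PI` and `TRANS`, and the function names are all assumptions made for illustration. The sketch also ignores the undecided (purple) bases of Figure 1.

```python
import math

STATES = ("aa", "ab", "bb")

# Hypothetical parameters, chosen for illustration only
# (they are not the trained SNVHMM values).
MU = {"aa": 0.99, "ab": 0.50, "bb": 0.01}  # P(base matches reference | state)
PI = {"aa": 0.98, "ab": 0.01, "bb": 0.01}  # initial state distribution
TRANS = {s: {t: (0.98 if s == t else 0.01) for t in STATES} for s in STATES}


def log_binom(k, n, p):
    """Log-probability of k reference-matching bases out of n under Binomial(n, p)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def viterbi(obs):
    """Most likely state path for obs, a list of (ref_count, depth) per position."""
    if not obs:
        return []
    score = {s: math.log(PI[s]) + log_binom(*obs[0], MU[s]) for s in STATES}
    back = []  # back[i][t] = best predecessor of state t at position i + 1
    for k, n in obs[1:]:
        new_score, ptr = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: score[s] + math.log(TRANS[s][t]))
            ptr[t] = best
            new_score[t] = (score[best] + math.log(TRANS[best][t])
                            + log_binom(k, n, MU[t]))
        back.append(ptr)
        score = new_score
    path = [max(STATES, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Four positions: all reference, half reference, no reference, all reference.
obs = [(10, 10), (5, 10), (0, 12), (9, 9)]
path = viterbi(obs)
print(path)  # ['aa', 'ab', 'bb', 'aa']
```

With these assumed parameters, a position whose reads all match the reference decodes to {aa}, a roughly even split decodes to {ab}, and a position with no reference-matching reads decodes to {bb}.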
Comparison of the statistical performance of SNVHMM and SNVMix2 under different mapping quality (MQ) and base quality (BQ) thresholds for 10X data
| Method | MQ | BQ | TP | FP | TN | FN | Sensitivity (%) | Specificity (%) | Accuracy (%) | F-score |
| SNVHMM | 50 | 20 | 247 | 59 | 133 | 58 | 80.98 | 69.27 | 76.46 (MVC = 4) | 0.8085 |
| SNVHMM | 40 | 20 | 254 | 66 | 126 | 51 | 83.28 | 65.63 | 76.46 (MVC = 4) | 0.8128 |
| SNVHMM | 30 | 20 | 256 | 70 | 112 | 49 | 83.93 | 65.54 | 76.05 (MVC = 4) | 0.8114 |
| SNVHMM | 30 | 10 | 236 | 63 | 129 | 69 | 77.38 | 67.19 | 73.44 (MVC = 5) | 0.7815 |
| SNVHMM | 20 | 10 | 273 | 111 | 81 | 32 | 89.51 | 42.19 | 71.23 (MVC = 3) | 0.7925 |
| SNVHMM | 10 | 5 | 273 | 110 | 82 | 32 | 89.51 | 42.71 | 71.43 (MVC = 2) | 0.7936 |
| SNVMix2_TI | 50 | 20 | 303 | 160 | 32 | 2 | 99.34 | 16.67 | 67.40 | 0.7891 |
| SNVMix2_TI | 40 | 20 | 305 | 174 | 18 | 0 | 100 | 9.38 | 64.99 | 0.7781 |
| SNVMix2_TI | 30 | 20 | 305 | 174 | 18 | 0 | 100 | 9.38 | 64.99 | 0.7781 |
| SNVMix2_TI | 30 | 10 | 305 | 173 | 19 | 0 | 100 | 9.89 | 65.19 | 0.7791 |
| SNVMix2_TI | 20 | 10 | 305 | 191 | 1 | 0 | 100 | 0.52 | 61.56 | 0.7615 |
| SNVMix2_TI | 10 | 5 | 305 | 192 | 0 | 0 | 100 | 0 | 61.37 | 0.7606 |
| SNVMix2_TO | 50 | 20 | 245 | 75 | 117 | 60 | 80.32 | 60.94 | 72.84 | 0.7840 |
| SNVMix2_TO | 40 | 20 | 261 | 88 | 104 | 44 | 85.57 | 54.17 | 73.44 | 0.7982 |
| SNVMix2_TO | 30 | 20 | 266 | 90 | 102 | 39 | 87.21 | 53.13 | 74.04 | 0.8048 |
| SNVMix2_TO | 30 | 10 | 274 | 92 | 100 | 31 | 89.83 | 52.08 | 75.25 | 0.8167 |
| SNVMix2_TO | 20 | 10 | 283 | 125 | 67 | 22 | 92.78 | 34.90 | 70.42 | 0.7938 |
| SNVMix2_TO | 10 | 5 | 290 | 134 | 58 | 15 | 95.08 | 30.21 | 70.02 | 0.7956 |
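The derived columns of this table can be checked from the four count columns. The sketch below reads the four integer columns after MQ and BQ as TP, FP, TN and FN, an interpretation that reproduces the reported values for the first SNVHMM row (MQ 50, BQ 20). The function name `confusion_metrics` is a hypothetical helper, not part of SNVHMM or SNVMix2.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy (in %) plus F-score from counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity_pct": round(100 * sensitivity, 2),
        "specificity_pct": round(100 * specificity, 2),
        "accuracy_pct": round(100 * accuracy, 2),
        "f_score": round(f_score, 4),
    }


# First SNVHMM row of the 10X table: MQ 50, BQ 20.
row = confusion_metrics(tp=247, fp=59, tn=133, fn=58)
print(row)
# {'sensitivity_pct': 80.98, 'specificity_pct': 69.27,
#  'accuracy_pct': 76.46, 'f_score': 0.8085}
```

Under this reading, the totals are 305 positives (TP + FN) and 192 negatives (FP + TN), which is why the SNVMix2_TI rows with FN = 0 report 100% sensitivity alongside very low specificity.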
Comparison of the statistical performance of SNVHMM and SNVMix2 under different mapping quality (MQ) and base quality (BQ) thresholds for 40X data
| Method | MQ | BQ | TP | FP | TN | FN | Sensitivity (%) | Specificity (%) | Accuracy (%) | F-score |
| SNVHMM | 50 | 20 | 281 | 77 | 115 | 24 | 92.13 | 59.89 | 79.68 (MVC = 7) | 0.8477 |
| SNVHMM | 40 | 20 | 283 | 83 | 109 | 22 | 92.78 | 56.77 | 78.87 (MVC = 6) | 0.8435 |
| SNVHMM | 30 | 20 | 273 | 77 | 115 | 32 | 89.51 | 59.89 | 78.07 (MVC = 9) | 0.8336 |
| SNVHMM | 30 | 10 | 289 | 79 | 113 | 16 | 94.75 | 58.85 | 80.88 (MVC = 9) | 0.8588 |
| SNVHMM | 20 | 10 | 279 | 86 | 106 | 26 | 91.47 | 55.21 | 77.46 (MVC = 8) | 0.8328 |
| SNVHMM | 10 | 5 | 281 | 87 | 105 | 24 | 92.13 | 54.69 | 77.67 (MVC = 7) | 0.8351 |
| SNVMix2_TI | 50 | 20 | 291 | 109 | 83 | 14 | 95.40 | 43.23 | 75.25 | 0.8255 |
| SNVMix2_TI | 40 | 20 | 294 | 113 | 79 | 11 | 96.39 | 41.14 | 75.05 | 0.8258 |
| SNVMix2_TI | 30 | 20 | 294 | 115 | 77 | 11 | 96.39 | 40.10 | 74.65 | 0.8235 |
| SNVMix2_TI | 30 | 10 | 295 | 113 | 79 | 10 | 96.72 | 41.15 | 75.25 | 0.8275 |
| SNVMix2_TI | 20 | 10 | 294 | 117 | 75 | 11 | 96.39 | 39.06 | 74.25 | 0.8212 |
| SNVMix2_TI | 10 | 5 | 295 | 118 | 74 | 10 | 96.72 | 38.54 | 74.25 | 0.8217 |
| SNVMix2_TO | 50 | 20 | 283 | 86 | 106 | 22 | 92.79 | 55.21 | 78.27 | 0.8398 |
| SNVMix2_TO | 40 | 20 | 287 | 93 | 99 | 18 | 94.10 | 51.56 | 77.67 | 0.8380 |
| SNVMix2_TO | 30 | 20 | 287 | 96 | 96 | 18 | 94.10 | 50.00 | 77.06 | 0.8343 |
| SNVMix2_TO | 30 | 10 | 284 | 105 | 87 | 21 | 93.11 | 45.31 | 74.65 | 0.8184 |
| SNVMix2_TO | 20 | 10 | 291 | 105 | 87 | 14 | 95.40 | 45.31 | 76.06 | 0.8302 |
| SNVMix2_TO | 10 | 5 | 291 | 104 | 88 | 14 | 95.41 | 45.83 | 76.26 | 0.8314 |
Figure 3. Plot of the (a) accuracy and (b) F-score as the minimum valid value of coverage changes, for 10X lobular breast cancer data.
Figure 4. Plot of the (a) accuracy and (b) F-score as the minimum valid value of coverage changes, for 40X lobular breast cancer data.
Figure 5. ROC curve of SNVHMM and SNVMix2_TO for lobular breast cancer data.
Number of point mutations found by SNVHMM in 5 MDS RNA-Seq data sets and 2 whole-exome data sets
| Sample | RS_1 | RS_2 | RS_3 | RS_4 | RS_5 | WE_1 | WE_2 |
| Number¹ | 10645 (91.6%) | 33354 (93.3%) | 13881 (91.5%) | 4777 (94.2%) | 6951 (92.6%) | 58803 (94.8%) | 61344 (93.7%) |
¹ The proportion of mutations predicted by both SNVHMM and SNVMix2 is reported in parentheses.
Comparison of the parameters of SNVHMM before and after training on lobular breast cancer (LBC) data and MDS RNA-Seq data, with thresholds of mapping quality 50 and base quality 20
| initial | (0.904233 0.499051 0.090499) | (0.904233 0.499051 0.090499) | (0.904233 0.499051 0.090499) | |
| trained | (0.001199 0.984717 0.014086) | (0.000001 0.999831 0.000170) | (0.988258 0.011743 0.000001) | |
| initial | (0.999023 0.508543 0.000123) | (0.999023 0.508543 0.000123) | (0.999023 0.508543 0.000123) | |
| trained | (0.904833 0.466663 0.151255) | (0.897743 0.509801 0.165214) | (0.904233 0.544121 0.090499) | |
| initial | ||||
| trained | ||||