Erdal Cosgun, Min Oh.
Abstract
BACKGROUND: Next-generation sequencing (NGS) enables massively parallel processing at a lower cost than other sequencing technologies. In the subsequent analysis of NGS data, one of the major concerns is the reliability of variant calls. Although researchers can use the raw quality scores from variant calling, they are forced to begin further analysis without any pre-evaluation of those scores.
Year: 2020 PMID: 32219145 PMCID: PMC7061114 DOI: 10.1155/2020/8531502
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Analysis pipeline of secondary analysis of NGS data.
VCF fields extracted by VariantAnnotation package.
| Category | Field | Type | Description |
|---|---|---|---|
| INFO | AC | Integer | Allele count in genotypes, for each ALT allele, in the same order as listed |
| INFO | DB | Flag | dbSNP membership |
| INFO | DP | Integer | Approximate read depth: some reads may have been filtered |
| INFO | FS | Float | Phred-scaled |
| INFO | MQ | Float | RMS mapping quality |
| INFO | SOR | Float | Symmetric odds ratio of 2 × 2 contingency table to detect strand bias |
| GENO | AD | Integer | Allelic depths for the ref and alt alleles in the order listed |
| GENO | GQ | Integer | Genotype quality |
| FIXED | QUAL | Float | A quality score associated with the inference of the given alleles |
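To make the field list concrete, here is a minimal sketch of pulling the same fields out of one VCF data line. The source extracts them with the R/Bioconductor VariantAnnotation package; this plain-Python stand-in is only an illustration, and the helper names (`parse_info`, `extract_fields`) and the example record are hypothetical, not taken from the study.

```python
# Minimal sketch: read the FIXED, INFO, and GENO fields from one VCF data line.
# The record below is made up for illustration; it is not from the study data.

def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flags such as DB map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def extract_fields(line: str) -> dict:
    """Return the tabulated fields for the first sample of a VCF data line."""
    cols = line.rstrip("\n").split("\t")
    # Standard VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE...
    qual, info_col, fmt, sample = cols[5], cols[7], cols[8], cols[9]
    info = parse_info(info_col)
    geno = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "QUAL": float(qual),
        "AC": [int(x) for x in info["AC"].split(",")],
        "DB": bool(info.get("DB", False)),
        "DP": int(info["DP"]),
        "FS": float(info["FS"]),
        "MQ": float(info["MQ"]),
        "SOR": float(info["SOR"]),
        "AD": [int(x) for x in geno["AD"].split(",")],
        "GQ": int(geno["GQ"]),
    }


example_line = "\t".join([
    "chr1", "12345", "rs0000", "A", "G", "512.77", "PASS",
    "AC=1;AF=0.5;AN=2;DB;DP=40;FS=1.2;MQ=60.0;SOR=0.7",
    "GT:AD:DP:GQ:PL", "0/1:22,18:40:99:541,0,680",
])
record = extract_fields(example_line)
print(record)
```

The sketch assumes every listed field is present on the line; real VCFs can omit annotations, so production code would need to handle missing keys.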
Figure 2: Average correlation coefficient matrix of the 24 simulated data sets.
Figure 3: Average correlation coefficient matrix of the 24 real data sets.
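An average correlation matrix of this kind can be sketched as follows: compute one correlation matrix per data set, then average the matrices element by element. This is an illustration under assumptions. The captions do not say which correlation coefficient was used, so Pearson's r is assumed here, and the two toy data sets and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations between named columns of one data set."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def average_matrices(matrices):
    """Element-wise mean of correlation matrices that share the same fields."""
    names = list(matrices[0])
    k = len(matrices)
    return {a: {b: sum(m[a][b] for m in matrices) / k for b in names} for a in names}


# Two toy data sets (made-up values) standing in for the 24 VCF-derived ones.
dataset_a = {"QUAL": [100.0, 200.0, 300.0, 400.0], "DP": [10, 20, 30, 40], "GQ": [99, 80, 90, 60]}
dataset_b = {"QUAL": [100.0, 200.0, 300.0, 400.0], "DP": [40, 30, 20, 10], "GQ": [70, 99, 85, 95]}

average = average_matrices([correlation_matrix(d) for d in (dataset_a, dataset_b)])
for name, row in average.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

In the toy data, QUAL and DP are perfectly correlated in one set and perfectly anti-correlated in the other, so their averaged coefficient is zero. That illustrates how averaging can hide disagreement between data sets.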
Prediction result for 24 simulated VCFs.
| Statistic | Machine learning algorithm | R² | RMSE | RSE | MAE | RAE |
|---|---|---|---|---|---|---|
| Average | MLR | 0.944 | 181.726 | 0.056 | 114.624 | 0.196 |
| Average | RFR | | | | | |
| Average | NNR | 0.898 | 212.891 | 0.102 | 110.602 | 0.189 |
| Standard deviation | MLR | 0.003 | 4.346 | 0.003 | 3.180 | 0.007 |
| Standard deviation | RFR | 0.004 | 8.387 | 0.004 | 1.254 | 0.003 |
| Standard deviation | NNR | 0.191 | 124.380 | 0.191 | 100.858 | 0.172 |

MLR: multivariate linear regression; RFR: random forest regression; NNR: neural network regression; R²: coefficient of determination; RMSE: root-mean-square error; RSE: relative squared error; MAE: mean absolute error; RAE: relative absolute error.
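The error metrics named in the legend can be computed as in the sketch below. It assumes the standard textbook definitions, since the source does not spell out its formulas: RSE is the sum of squared errors divided by the squared deviations of the true values from their mean, and RAE is the analogous ratio with absolute values. The function name and the toy values are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """RMSE, RSE, MAE, and RAE under their standard definitions (assumed)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Baseline: errors of a model that always predicts the mean of y_true.
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)
    rse = sum(sq_err) / sq_dev
    return {
        "RMSE": sqrt(sum(sq_err) / n),
        "RSE": rse,
        "MAE": sum(abs_err) / n,
        "RAE": sum(abs_err) / abs_dev,
        "R2": 1.0 - rse,  # coefficient of determination, included for convenience
    }


# Toy QUAL values (made up): every prediction is off by exactly 10.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RSE and RAE are relative to a mean-only baseline, so values near 0 indicate a good fit and values near or above 1 indicate a model no better than predicting the mean. RMSE and MAE are in the units of the predicted quantity.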
Prediction result for 24 real VCFs.
| Statistic | Machine learning algorithm | R² | RMSE | RSE | MAE | RAE |
|---|---|---|---|---|---|---|
| Average | MLR | 0.965 | 250.403 | 0.036 | 88.170 | 0.302 |
| Average | RFR | | | | | |
| Average | NNR | 0.593 | 804.970 | 0.409 | 180.281 | 0.599 |
| Standard deviation | MLR | 0.015 | 44.463 | 0.015 | 12.665 | 0.081 |
| Standard deviation | RFR | 0.005 | 39.842 | 0.005 | 7.604 | 0.013 |
| Standard deviation | NNR | 0.345 | 448.786 | 0.344 | 91.056 | 0.286 |