Yixuan Wang1,2, Xuanping Zhang1,2, Xiao Xiao3, Fei-Ran Zhang4, Xinxing Yan1,2, Xuan Feng1,2, Zhongmeng Zhao1,2, Yanfang Guan1,2,5, Jiayin Wang6,7.
Abstract
BACKGROUND: Genomic micro-satellites are genomic regions that consist of short, repetitive DNA motifs. Estimating the length distribution and state of a micro-satellite region is an important computational step in cancer sequencing data pipelines, and is suggested to facilitate downstream analysis and clinical decision support. Although several state-of-the-art approaches have been proposed to identify micro-satellite instability (MSI) events, they are limited in dealing with regions longer than one read length. Moreover, to the best of our knowledge, all of these approaches implicitly assume that the tumor purity of the sequenced samples is sufficiently high. This assumption is inconsistent with reality: it leads the inferred length distribution to dilute the data signal and introduces false-positive errors.
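The purity argument can be illustrated with a toy mixture. The sketch below is not the paper's method: it assumes, purely for illustration, that the length distribution observed in a mixed sample is a purity-weighted blend of a tumor distribution and a normal distribution. The function name `mix_length_distribution` and both example histograms are hypothetical.

```python
def mix_length_distribution(normal, tumor, purity):
    """Blend two length histograms (dict: repeat length -> probability).

    Assumption (not from the source): reads are drawn from tumor cells with
    probability `purity` and from normal cells otherwise.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    lengths = sorted(set(normal) | set(tumor))
    return {
        n: purity * tumor.get(n, 0.0) + (1.0 - purity) * normal.get(n, 0.0)
        for n in lengths
    }


# Hypothetical histograms: the normal allele has length 10, while most
# tumor cells carry an expanded allele of length 14.
normal = {10: 1.0}
tumor = {10: 0.2, 14: 0.8}

for purity in (0.9, 0.5, 0.1):
    mixed = mix_length_distribution(normal, tumor, purity)
    # The share of the expanded allele shrinks as purity drops.
    print(purity, round(mixed[14], 2))
```

Under this toy model the expanded allele accounts for 0.72, 0.4, and 0.08 of the observed distribution at purities 0.9, 0.5, and 0.1, which is the dilution effect the abstract describes.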
Keywords: Cancer genomics; Computational pipeline; Genomic micro-satellite; Length distribution estimation; Sequencing data analysis; Tumor purity
Year: 2020 PMID: 32164528 PMCID: PMC7069170 DOI: 10.1186/s12859-020-3349-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Micro-satellite length distributions in the mixed sample
Fig. 2 The patterns of sequencing reads from a micro-satellite region sampled from a mixed sample. a Short MS region. b Long MS region
Fig. 3 The changes in coverage when a micro-satellite event occurs, and the definitions of the different read pairs. C-pairs: paired reads that map to WIN-bk; T-pairs: paired reads in which both reads map to micro-satellite areas; O-pairs: paired reads in which one read perfectly matches WIN-bk and the other maps to a micro-satellite area; SO-pairs: paired reads in which one read maps to a micro-satellite area and the other spans a breakpoint; S-pairs: paired reads in which one read perfectly matches WIN-bk and the other spans a breakpoint. S-reads: the reads that span the breakpoints in SO-pairs and S-pairs
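The pair definitions in the Fig. 3 caption amount to a lookup on the unordered combination of the two reads' categories. The following is a minimal sketch under stated assumptions: each read is taken to be already labelled as mapping to WIN-bk, mapping inside a micro-satellite area, or spanning a breakpoint; C-pairs are assumed to mean that both reads map to WIN-bk; and the names `ReadKind` and `classify_pair` are hypothetical rather than part of ELMSI.

```python
from enum import Enum


class ReadKind(Enum):
    BACKGROUND = "maps to WIN-bk"
    MICROSATELLITE = "maps inside a micro-satellite area"
    SPANNING = "spans a breakpoint"


# Unordered combination of read kinds -> pair type from the Fig. 3 caption.
# frozenset collapses (X, X) to {X}, so same-kind pairs use one-element keys.
_PAIR_TYPES = {
    frozenset([ReadKind.BACKGROUND]): "C-pair",  # assumed: both reads in WIN-bk
    frozenset([ReadKind.MICROSATELLITE]): "T-pair",
    frozenset([ReadKind.BACKGROUND, ReadKind.MICROSATELLITE]): "O-pair",
    frozenset([ReadKind.MICROSATELLITE, ReadKind.SPANNING]): "SO-pair",
    frozenset([ReadKind.BACKGROUND, ReadKind.SPANNING]): "S-pair",
}


def classify_pair(first, second):
    """Return the pair type, or 'unclassified' for combinations the caption
    does not define (for example, two breakpoint-spanning reads)."""
    return _PAIR_TYPES.get(frozenset([first, second]), "unclassified")


print(classify_pair(ReadKind.SPANNING, ReadKind.MICROSATELLITE))  # SO-pair
print(classify_pair(ReadKind.BACKGROUND, ReadKind.BACKGROUND))    # C-pair
```

Keying on a `frozenset` makes the classification independent of read order, which matches the caption's "one read ... and one read ..." wording.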
Comparison results of ELMSI and MSIsensor
| Tumor proportion | MSIsensor precision | MSIsensor recall | ELMSI precision | ELMSI recall |
|---|---|---|---|---|
| 0.9 | 1 | 0.3333 | 1 | 1 |
| 0.7 | 1 | 0.1333 | 1 | 0.6667 |
| 0.5 | 0 | 0 | 1 | 0.5667 |
| 0.3 | 0 | 0 | 1 | 0.6 |
| 0.1 | 0 | 0 | 1 | 0.4667 |
Performance of ELMSI in classifying longer micro-satellites
| No. | Breakpoint | Unit | | | Breakpoint (purity 0.9) | | MSI (purity 0.9) | Breakpoint (purity 0.7) | | MSI (purity 0.7) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 34489 | TCATT | 86 | 125 | 34491 | 146.84 | 1 | 34489 | 163.44 | 1 |
| 2 | 122387 | GGCC | 425 | 525 | 122389 | 685.63 | 1 | 122387 | 724.23 | 1 |
| 3 | 189108 | GCTAC | 46 | 120 | 189158 | 105.65 | 1 | 189108 | 133.03 | 1 |
| 4 | 190653 | CATC | 43 | 136 | 190655 | 130.52 | 1 | 190653 | 170.21 | 1 |
| 5 | 194236 | AAC | 89 | 166 | 194238 | 145.93 | 1 | 194236 | 151.43 | 1 |
| 6 | 251655 | GCT | 71 | 111 | 251654 | 91.17 | 1 | 251655 | 71.71 | 1 |
| 7 | 311313 | ACCA | 56 | 236 | 311315 | 321.08 | 1 | 311313 | 331.60 | 1 |
| 8 | 356789 | GCT | 76 | 256 | 356790 | 51.47 | 1 | 356789 | 161.35 | 1 |
| 9 | 398971 | TTCG | 45 | 225 | 398973 | 213.12 | 1 | 398971 | 251.42 | 1 |
| 10 | 412340 | G | 100 | 280 | 412505 | 88.05 | 1 | 412340 | 70.50 | 1 |
| 11 | 432344 | TGA | 78 | 258 | 432343 | 177.30 | 1 | 432344 | 220.54 | 1 |
| 12 | 473174 | AAGG | 221 | 354 | 473176 | 462.75 | 1 | 473174 | 403.21 | 1 |
| 13 | 501994 | CGCCG | 78 | 161 | 501996 | 128.26 | 1 | 501994 | 329.52 | 1 |
| 14 | 505733 | ACAGGG | 40 | 111 | 505791 | 222.26 | 1 | 505733 | 248.28 | 1 |
| 15 | 526358 | GTCC | 58 | 144 | 526360 | 167.58 | 1 | 526358 | 152.30 | 1 |
| 16 | 612344 | TGC | 90 | 270 | 612342 | 355.32 | 1 | 612344 | 343.93 | 1 |
| 17 | 622735 | GGTTC | 77 | 142 | 622737 | 114.59 | 1 | 622735 | 197.75 | 1 |
| 18 | 677621 | TCA | 70 | 200 | 677623 | 163.89 | 1 | 677621 | 202.36 | 1 |
| 19 | 712345 | GACT | 89 | 269 | 712337 | 0 | 712345 | 230.62 | 1 | |
| 20 | 731506 | GA | 146 | 203 | 731506 | 104.14 | 1 | 731506 | 88.48 | 1 |
| 21 | 776166 | TAA | 213 | 324 | 776167 | 359.1743 | 1 | 776166 | 564.01 | 1 |
| 22 | 842735 | CTC | 134 | 211 | 842734 | 213.29 | 1 | 842735 | 236.04 | 1 |
| 23 | 866450 | TG | 185 | 220 | 866450 | 371.53 | 1 | 866450 | 526.3551 | 1 |
| 24 | 891334 | TCAGC | 105 | 285 | 891336 | 234.20 | 1 | 891334 | 338.38 | 1 |
| 25 | 908385 | AGAAT | 167 | 229 | 908386 | 194.85 | 1 | 908385 | 294.27 | 1 |
| 26 | 910124 | C | 205 | 301 | 910204 | 98.17 | 1 | 910124 | 32.50 | 1 |
| 27 | 929056 | CCG | 120 | 210 | 929058 | 199.72 | 1 | 929056 | 202.66 | 1 |
| 28 | 944729 | GGACT | 90 | 190 | 944731 | 214.33 | 1 | 944729 | 225.51 | 1 |
| 29 | 964608 | AGGGGG | 56 | 156 | 964610 | 305.59 | 1 | 964608 | 296.61 | 1 |
| 30 | 973099 | GGGCAC | 355 | 460 | 973101 | 849.18 | 1 | 973099 | 0 | |
| Accuracy | | | | | | | 0.967 | | | 0.967 |

| No. | Unit | Breakpoint (purity 0.5) | | MSI (purity 0.5) | Breakpoint (purity 0.3) | | MSI (purity 0.3) | Breakpoint (purity 0.1) | | MSI (purity 0.1) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TCATT | 34491 | 155.09 | 1 | 34489 | 165.59 | 1 | 34489 | 272.51 | 1 |
| 2 | GGCC | 122389 | 354.08 | 1 | 122387 | 710.49 | 1 | 122387 | 619.04 | 1 |
| 3 | GCTAC | 189108 | 177.33 | 1 | 189108 | 127.27 | 1 | 189108 | 94.68 | 1 |
| 4 | CATC | 190655 | 137.45 | 1 | 190653 | 84.00 | 1 | 190653 | 19.35 | 1 |
| 5 | AAC | 194238 | 172.94 | 1 | 194236 | 125.12 | 1 | 194236 | 82.57 | 1 |
| 6 | GCT | 251654 | 86.32 | 1 | 251655 | 30.28 | 1 | 251655 | 0 | |
| 7 | ACCA | 311355 | 240.46 | 1 | 311313 | 361.34 | 1 | 311313 | 509.08 | 1 |
| 8 | GCT | 356790 | 303.44 | 1 | 356789 | 189.28 | 1 | 356789 | 584.24 | 1 |
| 9 | TTCG | 398973 | 275.14 | 1 | 398971 | 282.36 | 1 | 398971 | 295.25 | 1 |
| 10 | G | 412483 | 66.43 | 1 | 412340 | 105.09 | 1 | 412340 | 0 | |
| 11 | TGA | 432343 | 292.88 | 1 | 432344 | 270.84 | 1 | 432344 | 345.93 | 1 |
| 12 | AAGG | 473176 | 434.98 | 1 | 473174 | 640.55 | 1 | 473174 | 779.62 | 1 |
| 13 | CGCCG | 501996 | 278.35 | 1 | 501994 | 365.00 | 1 | 501994 | 800.13 | 1 |
| 14 | ACAGGG | 505844 | 246.03 | 1 | 505733 | 178.62 | 1 | 505733 | 339.87 | 1 |
| 15 | GTCC | 526361 | 120.87 | 1 | 526358 | 132.08 | 1 | 526358 | 112.71 | 1 |
| 16 | TGC | 612419 | 467.03 | 1 | 612344 | 570.02 | 1 | 612344 | 0 | |
| 17 | GGTTC | 622737 | 215.57 | 1 | 622735 | 241.85 | 1 | 622735 | 254.75 | 1 |
| 18 | TCA | 677623 | 226.32 | 1 | 677621 | 241.83 | 1 | 677621 | 220.82 | 1 |
| 19 | GACT | 712347 | 393.39 | 1 | 712345 | 327.39 | 1 | 712345 | 0 | |
| 20 | GA | 731506 | 77.51 | 1 | 731506 | 51.49 | 1 | 731506 | 0 | |
| 21 | TAA | 776167 | 405.81 | 1 | 776166 | 170.12 | 1 | 776166 | 340.20 | 1 |
| 22 | CTC | 842734 | 221.52 | 1 | 842735 | 321.37 | 1 | 842735 | 204.91 | 1 |
| 23 | TG | 866450 | 485.10 | 1 | 866450 | 870.52 | 1 | 866450 | 236.77 | 1 |
| 24 | TCAGC | 891336 | 316.81 | 1 | 891334 | 112.10 | 1 | 891334 | 677.12 | 1 |
| 25 | AGAAT | 908386 | 206.39 | 1 | 908385 | 457.04 | 1 | 908385 | 152.63 | 1 |
| 26 | C | 910220 | 0 | 910124 | 0 | 910124 | 0 | |||
| 27 | CCG | 929058 | 203.18 | 1 | 929056 | 98.56 | 1 | 929056 | 269.48 | 1 |
| 28 | GGACT | 944731 | 257.57 | 1 | 944729 | 250.61 | 1 | 944729 | 279.33 | 1 |
| 29 | AGGGGG | 964610 | 416.61 | 1 | 964608 | 380.03 | 1 | 964608 | 691.83 | 1 |
| 30 | GGGCAC | 973101 | 1182.00 | 1 | 973099 | 0 | 973099 | 748.71 | 1 | |
| Accuracy | | | | 0.967 | | | 0.933 | | | 0.8 |
Key indicators of ELMSI for different numbers of micro-satellites
| Number of MSIs | Coverage | Accuracy | Recall | Precision | Gain | MCC |
|---|---|---|---|---|---|---|
| 20 | 30 × | 0.5385 | 0.7000 | 0.7000 | 0.4000 | -0.300 |
| 20 | 60 × | 0.4815 | 0.6500 | 0.6500 | 0.3000 | -0.350 |
| 20 | 100 × | 0.4419 | 0.6333 | 0.5938 | 0.2000 | -0.386 |
| 20 | 120 × | 0.5750 | 0.7667 | 0.6970 | 0.4333 | -0.265 |
| 30 | 30 × | 0.5250 | 0.7000 | 0.6774 | 0.3667 | -0.311 |
| 30 | 60 × | 0.5250 | 0.7000 | 0.6774 | 0.3667 | -0.311 |
| 30 | 100 × | 0.6875 | 0.8250 | 0.8049 | 0.6250 | -0.184 |
| 30 | 120 × | 0.6400 | 0.8000 | 0.7619 | 0.5500 | -0.218 |
| 40 | 30 × | 0.6809 | 0.8000 | 0.8205 | 0.6250 | -0.189 |
| 40 | 60 × | 0.5882 | 0.7500 | 0.7317 | 0.4750 | -0.259 |
| 40 | 100 × | 0.5606 | 0.7400 | 0.6981 | 0.4200 | -0.280 |
| 40 | 120 × | 0.4930 | 0.7000 | 0.6250 | 0.2800 | -0.335 |
| 50 | 30 × | 0.6949 | 0.8200 | 0.8200 | 0.6400 | -0.180 |
| 50 | 60 × | 0.7000 | 0.8400 | 0.8077 | 0.6400 | -0.175 |
| 50 | 100 × | 0.6000 | 0.7500 | 0.7500 | 0.5000 | -0.250 |
| 50 | 120 × | 0.5375 | 0.7167 | 0.6825 | 0.3833 | -0.299 |
| 60 | 30 × | 0.6164 | 0.7500 | 0.7759 | 0.5333 | -0.236 |
| 60 | 60 × | 0.6081 | 0.7500 | 0.7627 | 0.5167 | -0.243 |
| 60 | 100 × | 0.6279 | 0.7714 | 0.7714 | 0.5429 | -0.228 |
| 60 | 120 × | 0.5862 | 0.7286 | 0.7500 | 0.4857 | -0.260 |
| 70 | 30 × | 0.5333 | 0.6857 | 0.7059 | 0.4000 | -0.304 |
| 70 | 60 × | 0.4787 | 0.6429 | 0.6522 | 0.3000 | -0.352 |
| 70 | 100 × | 0.5660 | 0.7500 | 0.6977 | 0.4250 | -0.274 |
| 70 | 120 × | 0.5463 | 0.7375 | 0.6782 | 0.3875 | -0.290 |
| 80 | 30 × | 0.6122 | 0.7500 | 0.7692 | 0.5250 | -0.240 |
| 80 | 60 × | 0.5980 | 0.7625 | 0.7349 | 0.4875 | -0.250 |
| 80 | 100 × | 0.5664 | 0.7111 | 0.7356 | 0.4556 | -0.276 |
| 80 | 120 × | 0.5575 | 0.7000 | 0.7326 | 0.4444 | -0.283 |
| 90 | 30 × | 0.4957 | 0.6444 | 0.6824 | 0.3444 | -0.336 |
| 90 | 60 × | 0.5085 | 0.6667 | 0.6818 | 0.3556 | -0.325 |
| 90 | 100 × | 0.4138 | 0.6000 | 0.5714 | 0.1500 | -0.414 |
| 90 | 120 × | 0.6667 | 0.8000 | 0.8000 | 0.6000 | -0.200 |
| 100 | 30 × | 0.6116 | 0.7400 | 0.7789 | 0.5300 | -0.239 |
| 100 | 60 × | 0.5191 | 0.6800 | 0.6869 | 0.3700 | -0.316 |
| 100 | 100 × | 0.6290 | 0.7800 | 0.7647 | 0.5400 | -0.227 |
| 100 | 120 × | 0.5952 | 0.7500 | 0.7426 | 0.4900 | -0.253 |
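This excerpt does not define the indicators in the table above, so the following is only a sketch of one reading that is numerically consistent with it. It assumes the standard confusion-matrix formulas with no true negatives counted (`tn = 0`) and takes Gain to be `(TP - FP) / (TP + FN)`. The function name `msi_metrics` and the counts `tp=14, fp=6, fn=6` are hypothetical, chosen because they reproduce the first row (20 MSIs, 30 × coverage).

```python
import math


def msi_metrics(tp, fp, fn, tn=0):
    """Confusion-matrix indicators; assumes tp + fn > 0 and tp + fp > 0.

    tn defaults to 0 and the Gain formula is inferred from the table values;
    neither is stated in the source.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    gain = (tp - fp) / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": round(accuracy, 4),
        "recall": round(recall, 4),
        "precision": round(precision, 4),
        "gain": round(gain, 4),
        "mcc": round(mcc, 4),
    }


# Hypothetical counts that match the 20-MSI, 30 x row of the table.
print(msi_metrics(tp=14, fp=6, fn=6))
# {'accuracy': 0.5385, 'recall': 0.7, 'precision': 0.7, 'gain': 0.4, 'mcc': -0.3}
```

With no true negatives the MCC numerator reduces to `-FP * FN`, which would explain why every MCC value in the table is negative even when recall and precision are high.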
Comparison of ELMSI performance at different coverages
| Number of MSIs | Coverage | Accuracy | Recall | Precision | Gain | MCC |
|---|---|---|---|---|---|---|
| 20 | 10 × | 0.3667 | 0.5500 | 0.5238 | 0.05 | -0.4629 |
| 20 | 20 × | 0.3846 | 0.5000 | 0.5714 | 0.20 | -0.4330 |
| 20 | 30 × | 0.4167 | 0.5263 | 0.6429 | 0.26 | -0.3974 |
| 20 | 40 × | 0.4815 | 0.6500 | 0.6500 | 0.30 | -0.3500 |
| 20 | 50 × | 0.6957 | 0.8000 | 0.8421 | 0.65 | -0.1777 |
| 20 | 60 × | 0.6818 | 0.7895 | 0.8333 | 0.63 | -0.1873 |
| 20 | 70 × | 0.6400 | 0.7619 | 0.8000 | 0.57 | -0.2182 |
| 20 | 80 × | 0.6071 | 0.7391 | 0.7727 | 0.52 | -0.2435 |
| 20 | 90 × | 0.5483 | 0.7083 | 0.7083 | 0.42 | -0.3077 |
| 20 | 100 × | 0.5294 | 0.6923 | 0.6923 | 0.38 | -0.3077 |
| 40 | 10 × | 0.3928 | 0.6111 | 0.5238 | 0.06 | -0.4303 |
| 40 | 20 × | 0.4615 | 0.6000 | 0.6667 | 0.30 | -0.3651 |
| 40 | 30 × | 0.5106 | 0.6857 | 0.6667 | 0.34 | -0.3237 |
| 40 | 40 × | 0.5208 | 0.6756 | 0.6744 | 0.38 | -0.3148 |
| 40 | 50 × | 0.5192 | 0.6750 | 0.6923 | 0.38 | -0.3162 |
| 40 | 60 × | 0.5762 | 0.8333 | 0.7555 | 0.48 | -0.2670 |
| 40 | 70 × | 0.6809 | 0.8000 | 0.8205 | 0.63 | -0.1895 |
| 40 | 80 × | 0.6600 | 0.8250 | 0.7674 | 0.58 | -0.2017 |
| 40 | 90 × | 0.6538 | 0.7906 | 0.7555 | 0.53 | -0.2262 |
| 40 | 100 × | 0.6800 | 0.8500 | 0.7727 | 0.60 | -0.1846 |
| 60 | 10 × | 0.4218 | 0.5869 | 0.6000 | 0.20 | -0.4065 |
| 60 | 20 × | 0.4247 | 0.5167 | 0.7045 | 0.30 | -0.3779 |
| 60 | 30 × | 0.4861 | 0.6250 | 0.6862 | 0.34 | -0.3430 |
| 60 | 40 × | 0.5441 | 0.6981 | 0.7115 | 0.42 | -0.2951 |
| 60 | 50 × | 0.5232 | 0.6818 | 0.6923 | 0.38 | -0.3129 |
| 60 | 60 × | 0.5000 | 0.6571 | 0.6765 | 0.34 | -0.3331 |
| 60 | 70 × | 0.5455 | 0.7000 | 0.7119 | 0.42 | -0.2940 |
| 60 | 80 × | 0.5000 | 0.6470 | 0.6875 | 0.35 | -0.3321 |
| 60 | 90 × | 0.6216 | 0.7667 | 0.7667 | 0.53 | -0.2333 |
| 60 | 100 × | 0.5444 | 0.7000 | 0.7101 | 0.41 | -0.2949 |
Key indicators of ELMSI corresponding to different read lengths
| Number of MSIs | Read length | Coverage | Accuracy | Recall | Precision | Gain | MCC |
|---|---|---|---|---|---|---|---|
| 20 | 100 | 30 × | 0.31 | 0.45 | 0.5 | 0 | -0.25 |
| 20 | 100 | 60 × | 0.33 | 0.5 | 0.5 | 0 | -0.5 |
| 20 | 100 | 100 × | 0.31 | 0.45 | 0.5 | 0 | -0.52 |
| 20 | 100 | 120 × | 0.36 | 0.55 | 0.52 | 0.05 | -0.46 |
| 20 | 150 | 30 × | 0.31 | 0.45 | 0.5 | 0 | -0.25 |
| 20 | 150 | 60 × | 0.37 | 0.5 | 0.59 | 0.15 | -0.45 |
| 20 | 150 | 100 × | 0.46 | 0.6 | 0.6 | 0.3 | -0.36 |
| 20 | 150 | 120 × | 0.42 | 0.55 | 0.65 | 0.25 | -0.40 |
| 20 | 200 | 30 × | 0.5 | 0.65 | 0.68 | 0.35 | -0.33 |
| 20 | 200 | 60 × | 0.49 | 0.57 | 0.65 | 0.1 | -0.55 |
| 20 | 200 | 100 × | 0.48 | 0.7 | 0.61 | 0.25 | -0.34 |
| 20 | 200 | 120 × | 0.52 | 0.7 | 0.67 | 0.35 | -0.32 |
| 20 | 250 | 30 × | 0.6 | 0.75 | 0.75 | 0.5 | -0.25 |
| 20 | 250 | 60 × | 0.54 | 0.7 | 0.7 | 0.4 | -0.3 |
| 20 | 250 | 100 × | 0.55 | 0.75 | 0.69 | 0.2 | -0.38 |
| 20 | 250 | 120 × | 0.58 | 0.75 | 0.71 | 0.25 | -0.34 |
| 50 | 100 | 30 × | 0.43 | 0.6 | 0.6 | 0.2 | -0.4 |
| 50 | 100 | 60 × | 0.37 | 0.55 | 0.54 | 0.08 | -0.46 |
| 50 | 100 | 100 × | 0.45 | 0.63 | 0.61 | 0.23 | -0.38 |
| 50 | 100 | 120 × | 0.36 | 0.55 | 0.52 | 0.05 | -0.46 |
| 50 | 150 | 30 × | 0.51 | 0.68 | 0.63 | 0.35 | -0.33 |
| 50 | 150 | 60 × | 0.45 | 0.63 | 0.61 | 0.23 | -0.38 |
| 50 | 150 | 100 × | 0.48 | 0.58 | 0.62 | 0.05 | -0.45 |
| 50 | 150 | 120 × | 0.47 | 0.73 | 0.63 | 0.45 | -0.28 |
| 50 | 200 | 30 × | 0.51 | 0.7 | 0.65 | 0.45 | -0.32 |
| 50 | 200 | 60 × | 0.61 | 0.78 | 0.64 | 0.5 | -0.24 |
| 50 | 200 | 100 × | 0.52 | 0.63 | 0.67 | 0.15 | -0.40 |
| 50 | 200 | 120 × | 0.56 | 0.75 | 0.68 | 0.4 | -0.28 |
| 50 | 250 | 30 × | 0.53 | 0.7 | 0.68 | 0.38 | -0.31 |
| 50 | 250 | 60 × | 0.57 | 0.78 | 0.7 | 0.23 | -0.36 |
| 50 | 250 | 100 × | 0.61 | 0.7 | 0.67 | 0.35 | -0.32 |
| 50 | 250 | 120 × | 0.59 | 0.75 | 0.7 | 0.25 | -0.33 |