Jinhwa Kong1,2, Jaemoon Shin1,2, Jungim Won2, Keonbae Lee3, Unjoo Lee4, Jeehee Yoon1.
Abstract
Copy number variations (CNVs) are structural variants associated with human diseases. Recent studies have identified disease-related genes by extracting rare de novo and transmitted CNVs from exome sequencing data. The need for more efficient and accurate CNV detection methods has therefore increased, but detection remains challenging because of coverage biases and the sparse, small-sized, and noncontinuous nature of exome sequencing data. In this study, we developed a new CNV detection method, ExCNVSS, based on read coverage depth evaluation and scale-space filtering, to resolve these problems. We also developed ExCNVSS_noRatio, a version of ExCNVSS for cases in which only test data are available, so that no matched control is needed. To evaluate the performance of our method, we tested it on 11 different simulated data sets and on data from 10 real HapMap samples. The results demonstrated that ExCNVSS outperformed three other state-of-the-art methods and that our method corrected for coverage biases and detected CNVs of all sizes even without matched control data.
Year: 2017 PMID: 28698882 PMCID: PMC5494116 DOI: 10.1155/2017/9631282
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The flowchart of our method. It includes two procedures: data preprocessing and CNV estimation. The data preprocessing procedure includes a four-step normalization protocol. The CNV estimation procedure includes Gaussian convolution, fingerprint mapping, baseline adjustment, interval search, and CNV detection.
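
To illustrate the Gaussian convolution step named in the flowchart, the following is a minimal sketch of generic scale-space filtering on a one-dimensional depth signal. It is not the authors' implementation: the function names, the toy signal, the chosen scales, and the use of second-derivative zero crossings as a stand-in for fingerprint features are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, truncate=3.0):
    """Discrete Gaussian weights truncated at about 3 sigma, normalized to sum to 1."""
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, replicating the edge values."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def second_derivative_zero_crossings(signal):
    """Indices i such that the discrete second derivative changes sign
    between positions i and i + 1 of the signal."""
    d2 = [signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
          for i in range(1, len(signal) - 1)]
    return [i + 1 for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]


# Hypothetical normalized depth signal: a 10-position gain on a flat baseline.
toy_signal = [1.0] * 20 + [1.5] * 10 + [1.0] * 20

# The same signal viewed at three scales (assumed values of sigma).
scale_space = {sigma: smooth(toy_signal, sigma) for sigma in (1.0, 2.0, 4.0)}
crossings = {sigma: second_derivative_zero_crossings(s)
             for sigma, s in scale_space.items()}
```

Larger values of `sigma` flatten the edges of the toy gain, which is why a scale-space approach inspects several scales rather than a single one. How ExCNVSS maps such features to intervals and adjusts the baseline is not described in this record.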
Figure 2. The ROC curves of the five methods. FNRs and FPRs were calculated on 11 simulated data sets at different threshold levels, and ROC curves were generated on the basis of averaged values. The circled symbol on each curve represents the performance of each method using default parameters.
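
The averaging described in the Figure 2 caption can be sketched as follows. This is a hedged illustration using the standard definitions FNR = FN / (TP + FN) and FPR = FP / (FP + TN). The per-region scores, the truth labels, the threshold values, and the function names are hypothetical, and the record does not state how the authors scored regions or matched them to the simulated truth.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FNR, FPR) when a region is called if its score >= threshold."""
    tp = fp = fn = tn = 0
    for score, is_cnv in zip(scores, labels):
        called = score >= threshold
        if called and is_cnv:
            tp += 1
        elif called:
            fp += 1
        elif is_cnv:
            fn += 1
        else:
            tn += 1
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


def averaged_roc(datasets, thresholds):
    """Average FNR and FPR over data sets at each threshold and
    return the ROC points as (mean FPR, 1 - mean FNR)."""
    curve = []
    for t in thresholds:
        pairs = [rates_at_threshold(scores, labels, t) for scores, labels in datasets]
        mean_fnr = sum(p[0] for p in pairs) / len(pairs)
        mean_fpr = sum(p[1] for p in pairs) / len(pairs)
        curve.append((mean_fpr, 1.0 - mean_fnr))
    return curve


# Two hypothetical data sets: (scores, truth labels with 1 = true CNV).
toy_datasets = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    ([0.7, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
]
roc_points = averaged_roc(toy_datasets, thresholds=[0.0, 0.5, 1.0])
# roc_points == [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

Averaging the rates per threshold, rather than pooling all regions, gives each data set equal weight regardless of how many regions it contains. Whether the authors averaged this way is an assumption.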
CNV detection performances across variant sizes using simulated data sets. Each method was run with its default parameters.
| Size of variants | 100–159 bp | 160–299 bp | 300–8260 bp |
|---|---|---|---|
| Number of simulated instances (gain/loss) | 438 (212/226) | 430 (219/211) | 93 (52/41) |
| Size of gain instances, min/max (bp) | 120/151 | 184/296 | 305/8260 |
| Size of loss instances, min/max (bp) | 120/151 | 178/299 | 301/1561 |
| ExCNVSS | | | |
| Number of correctly detected instances (gain/loss) | 365 (164/201) | 383 (192/191) | 82 (45/37) |
| FNR/FPR (%) | 16.7/2.7 | 10.9/2.2 | 12.1/6.3 |
| Detected gain region size, min/max (bp) | 120/151 | 184/296 | 305/8260 |
| Detected loss region size, min/max (bp) | 120/151 | 178/299 | 301/1561 |
| ExCNVSS_noRatio | | | |
| Number of correctly detected instances (gain/loss) | 294 (105/189) | 332 (141/191) | 56 (18/38) |
| FNR/FPR (%) | 32.8/6.4 | 22.8/4.3 | 42.0/9.5 |
| Detected gain region size, min/max (bp) | 120/151 | 184/296 | 305/8260 |
| Detected loss region size, min/max (bp) | 120/151 | 178/299 | 301/1561 |
| Excavator | | | |
| Number of correctly detected instances (gain/loss) | 221 (100/121) | 202 (94/108) | 43 (26/17) |
| FNR/FPR (%) | 50.3/0.1 | 53.4/0.1 | 52.8/0.8 |
| Detected gain region size, min/max (bp) | 120/151 | 207/296 | 305/871 |
| Detected loss region size, min/max (bp) | 120/151 | 178/271 | 359/603 |
| Contra | | | |
| Number of correctly detected instances (gain/loss) | 247 (147/100) | 371 (182/189) | 44 (35/9) |
| FNR/FPR (%) | 42.7/0.2 | 13.1/0.0 | 51.7/0.0 |
| Detected gain region size, min/max (bp) | 120/151 | 184/296 | 305/8260 |
| Detected loss region size, min/max (bp) | 120/151 | 196/299 | 303/1561 |
| ExomeCNV | | | |
| Number of correctly detected instances (gain/loss) | 24 (24/0) | 69 (69/0) | 18 (17/1) |
| FNR/FPR (%) | 94.3/0.0 | 84.3/0.0 | 78.8/0.0 |
| Detected gain region size, min/max (bp) | 120/151 | 191/296 | 305/8260 |
| Detected loss region size, min/max (bp) | −/− | −/− | 1561/1561 |
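
As a worked check on how the FNR entries relate to the instance counts, the sketch below applies the simple definition FNR = (simulated − correctly detected) / simulated to the ExCNVSS counts in the two smaller size bins. The function and variable names are illustrative. The definition is an assumption about how the rates were derived: it reproduces the reported 16.7% and 10.9%, but not every entry in the table (for the largest size bin it gives about 11.8% rather than the reported 12.1%), so the authors' exact counting rule presumably differs in some detail.

```python
def fnr_percent(correct, total):
    """False negative rate in percent under the assumed simple definition."""
    return 100.0 * (total - correct) / total


# Counts copied from the table: simulated instances as (all, gain, loss).
simulated = {"small": (438, 212, 226), "medium": (430, 219, 211)}
# ExCNVSS correctly detected instances as (all, gain, loss).
excnvss = {"small": (365, 164, 201), "medium": (383, 192, 191)}

overall_fnr = {
    size: round(fnr_percent(excnvss[size][0], simulated[size][0]), 1)
    for size in simulated
}
# overall_fnr == {"small": 16.7, "medium": 10.9}

# The same arithmetic split by variant type shows where the misses fall.
gain_fnr = {size: fnr_percent(excnvss[size][1], simulated[size][1]) for size in simulated}
loss_fnr = {size: fnr_percent(excnvss[size][2], simulated[size][2]) for size in simulated}
```

Under this definition, ExCNVSS misses proportionally more gains than losses in the smallest size bin (48 of 212 gains versus 25 of 226 losses).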
CNV detection performances using 10 real data sets. An artificial exome data set generated by Wessim was used as the control data set.
| Sample ID | ExCNVSS: correctly detected instances (gain/loss) | ExCNVSS: FNR/FPR (%) | ExCNVSS_noRatio: correctly detected instances (gain/loss) | ExCNVSS_noRatio: FNR/FPR (%) | Excavator: correctly detected instances (gain/loss) | Excavator: FNR/FPR (%) | Contra: correctly detected instances (gain/loss) | Contra: FNR/FPR (%) | ExomeCNV: correctly detected instances (gain/loss) | ExomeCNV: FNR/FPR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| NA12843 | 99 (51/48) | 26.67/12.39 | 63 (27/36) | 53.33/14.99 | 61 (54/7) | 54.81/4.43 | 3 (0/3) | 97.78/0.05 | 100 (54/46) | 25.93/43.35 |
| NA12842 | 62 (50/12) | 40.95/13.09 | 39 (25/14) | 62.86/17.03 | 56 (56/0) | 46.67/6.04 | 0 (0/2) | 98.10/0.43 | 49 (49/0) | 53.33/66.74 |
| NA12748 | 59 (50/9) | 41.58/13.21 | 36 (24/12) | 64.36/14.47 | 63 (54/9) | 37.62/5.46 | 1 (0/1) | 99.01/0.04 | 56 (56/0) | 44.55/55.38 |
| NA12718 | 87 (49/38) | 25.64/12.27 | 76 (31/45) | 35.04/14.08 | 63 (54/9) | 46.15/6.10 | 7 (0/7) | 94.02/0.26 | 89 (54/35) | 23.93/88.65 |
| NA12275 | 57 (49/8) | 41.84/11.53 | 43 (33/10) | 56.12/13.74 | 56 (56/0) | 42.86/6.09 | 1 (0/1) | 98.98/0.27 | 56 (54/2) | 42.86/91.48 |
| NA12273 | 68 (54/14) | 38.92/10.45 | 40 (27/13) | 64.60/15.12 | 56 (56/0) | 50.44/6.10 | 4 (0/4) | 96.46/0.11 | 60 (49/11) | 46.90/98.61 |
| NA12272 | 58 (48/10) | 41.41/11.31 | 42 (31/11) | 57.58/13.86 | 64 (57/7) | 35.35/6.08 | 2 (0/2) | 97.98/0.23 | 65 (54/11) | 34.34/97.54 |
| NA11843 | 55 (48/7) | 45.54/11.92 | 37 (26/11) | 63.37/15.86 | 62 (54/8) | 38.61/5.49 | 1 (0/1) | 99.01/0.05 | 52 (52/0) | 48.51/47.60 |
| NA10847 | 58 (55/3) | 38.95/11.15 | 37 (36/1) | 61.05/16.24 | 63 (63/0) | 33.68/6.06 | 3 (1/2) | 96.84/0.5 | 52 (52/0) | 45.26/92.17 |
| NA06984 | 64 (51/13) | 42.86/12.31 | 46 (32/14) | 58.93/14.36 | 56 (56/0) | 50.00/5.76 | 4 (0/4) | 96.43/0.03 | 55 (53/2) | 50.89/65.75 |
CNV detection performances using 10 real data sets. A different germline sample (NA19152) was used as a control data set.
| Sample ID | ExCNVSS: correctly detected instances (gain/loss) | ExCNVSS: FNR/FPR (%) | ExCNVSS_noRatio: correctly detected instances (gain/loss) | ExCNVSS_noRatio: FNR/FPR (%) | Excavator: correctly detected instances (gain/loss) | Excavator: FNR/FPR (%) | Contra: correctly detected instances (gain/loss) | Contra: FNR/FPR (%) | ExomeCNV: correctly detected instances (gain/loss) | ExomeCNV: FNR/FPR (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| NA12843 | 31 (3/28) | 77.04/12.04 | 63 (27/36) | 53.33/14.99 | 47 (0/47) | 65.19/6.20 | 11 (0/11) | 91.85/0.36 | 64 (17/47) | 52.59/80.09 |
| NA12842 | 12 (3/9) | 88.57/11.43 | 39 (25/14) | 62.86/17.03 | 1 (0/1) | 99.05/4.93 | 3 (0/3) | 97.14/0.32 | 28 (17/11) | 73.33/56.75 |
| NA12748 | 18 (4/14) | 82.18/11.25 | 36 (24/12) | 64.36/14.47 | 1 (0/1) | 99.01/6.00 | 1 (0/1) | 99.01/0.26 | 32 (15/17) | 68.32/70.67 |
| NA12718 | 50 (6/44) | 57.26/8.37 | 76 (31/45) | 35.04/14.08 | 47 (0/47) | 59.83/2.42 | 17 (0/17) | 85.47/0.34 | 13 (0/13) | 88.89/3.50 |
| NA12275 | 22 (9/13) | 77.55/12.13 | 43 (33/10) | 56.12/13.74 | 5 (0/5) | 94.90/2.70 | 2 (1/1) | 97.96/0.31 | 9 (0/9) | 90.82/1.85 |
| NA12273 | 32 (0/32) | 71.68/11.68 | 40 (27/13) | 64.60/15.12 | 11 (0/11) | 90.27/2.95 | 17 (0/17) | 84.96/0.38 | 2 (0/2) | 98.23/2.60 |
| NA12272 | 35 (6/29) | 64.65/12.58 | 42 (31/11) | 57.58/13.86 | 16 (5/11) | 83.84/2.36 | 1 (0/1) | 98.99/0.18 | 18 (5/13) | 81.82/1.43 |
| NA11843 | 19 (10/9) | 81.19/9.74 | 37 (26/11) | 63.37/15.86 | 12 (0/12) | 88.12/6.62 | 1 (0/1) | 99.01/0.20 | 34 (19/15) | 66.34/78.30 |
| NA10847 | 13 (10/3) | 86.32/9.58 | 37 (36/1) | 61.05/16.24 | 7 (7/0) | 92.63/2.56 | 7 (4/3) | 92.63/0.35 | 9 (7/2) | 90.53/2.03 |
| NA06984 | 25 (2/23) | 77.68/11.75 | 46 (32/14) | 58.93/14.36 | 15 (0/15) | 86.61/5.86 | 13 (0/13) | 88.39/0.22 | 44 (15/29) | 60.71/72.78 |