Debby W Tsuang, Steven P Millard, Benjamin Ely, Peter Chi, Kenneth Wang, Wendy H Raskind, Sulgi Kim, Zoran Brkanac, Chang-En Yu.
Abstract
BACKGROUND: The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
Year: 2010 PMID: 21209939 PMCID: PMC3012691 DOI: 10.1371/journal.pone.0014456
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Illustration of how CNVs are merged based on the “any” overlap criterion (A) and the “40% either” overlap criterion (B).
(A) CNVs that have any overlap are merged. The start position of the resulting CNV is defined to be the minimum base pair position of the overlapping CNVs, and the end position is defined to be the maximum base pair position of the overlapping CNVs. (B) CNVs are merged only if the length of overlap is at least 40% of the size of at least one of the CNVs. The start position of the resulting CNV is defined to be the minimum base pair position of the overlapping CNVs, and the end position is defined to be the maximum base pair position of the overlapping CNVs.
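The two merge rules in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `CNV` class, the function names, the inclusive base pair coordinates, and the left-to-right sweep that compares each CNV with the most recently merged region are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # first base pair (assumed inclusive)
    end: int    # last base pair (assumed inclusive)


def overlap_length(a: CNV, b: CNV) -> int:
    """Number of base pairs shared by two CNVs (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def should_merge(a: CNV, b: CNV, criterion: str = "any",
                 fraction: float = 0.4) -> bool:
    """Apply the "any" or the "40% either" overlap criterion."""
    shared = overlap_length(a, b)
    if shared == 0:
        return False
    if criterion == "any":
        return True
    # "40% either": the overlap must cover at least `fraction` of at
    # least one CNV, which is the same as covering it for the smaller one.
    smaller = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared >= fraction * smaller


def merge_cnvs(cnvs: List[CNV], criterion: str = "any",
               fraction: float = 0.4) -> List[CNV]:
    """Sweep sorted CNVs, merging each into the previous region if allowed.

    The merged region runs from the minimum start to the maximum end,
    as described in the caption. The sweep order is an assumption.
    """
    merged: List[CNV] = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        if merged and should_merge(merged[-1], cnv, criterion, fraction):
            last = merged[-1]
            merged[-1] = CNV(last.chrom,
                             min(last.start, cnv.start),
                             max(last.end, cnv.end))
        else:
            merged.append(cnv)
    return merged
```

For two CNVs on the same chromosome that share only 10 bp, where the smaller CNV is 100 bp long, `merge_cnvs` returns a single region under `"any"` and leaves both CNVs separate under `"40% either"`.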
Table 1. Number of CNVs in Normal and SCZ databases based on the literature, by overlap algorithm.
| Overlap Algorithm | Merged CNVs in Normals | | | | CNVs in SCZ from Lit, Not Previously Discovered in Normals | | | | Merged Version of CNVs in SCZ from Lit, Not Previously Discovered in Normals | | | |
| | Loss | Gain | Both | Total | Loss | Gain | Both | Total | Loss | Gain | Both | Total |
| Any | 4,007 | 1,273 | 1,455 | 6,735 | 135 | 344 | | 479 | 119 | 293 | 6 | 418 |
| 40% either | 4,293 | 1,426 | 1,506 | 7,225 | 167 | 414 | | 581 | 148 | 357 | 6 | 511 |
“Any” overlap means the CNVs share at least one base pair. “40% either” overlap means that the length of the overlap has to be at least 40% of the size of at least one of the CNVs.
Normals database contains 29,292 CNVs (9,538 gains; 18,983 losses; 771 both) before any kind of internal merging based on overlap is performed.
SCZ database contains 3,581 CNVs before omitting any CNVs that overlap with the CNVs in Normals database.
Table 2. Number of CNVs detected in 96 subjects by each algorithm.
| | All CNVs | | | CNVs <100 kb | | | CNVs ≥100 kb | | |
| Algorithm | Loss | Gain | Total | Loss | Gain | Total | Loss | Gain | Total |
| PennCNV | 2,531 | 1,234 | 3,765 | 2,280 | 966 | 3,246 | 251 | 268 | 519 |
| HMMSeg | 664 | 302 | 966 | 584 | 27 | 611 | 80 | 275 | 355 |
| cnvPartition with 3 Probes | 590 | 103 | 693 | 432 | 28 | 460 | 158 | 75 | 233 |
| cnvPartition with 5 Probes | 427 | 87 | 514 | 289 | 12 | 301 | 138 | 75 | 213 |
| cnvPartition with 10 Probes | 175 | 75 | 250 | 93 | 4 | 97 | 82 | 71 | 153 |
| QuantiSNP | 159 | 81 | 240 | 117 | 21 | 138 | 42 | 60 | 102 |
| HMMSeg & cnvPartition with 3 Probes | 262 | 37 | 299 | 215 | 2 | 217 | 47 | 35 | 82 |
| HMMSeg & cnvPartition with 5 Probes | 172 | 37 | 209 | 129 | 2 | 131 | 43 | 35 | 78 |
| HMMSeg & cnvPartition with 10 Probes | 71 | 34 | 105 | 35 | 1 | 36 | 36 | 33 | 69 |
| HMMSeg & cnvPartition with 3 Probes & PennCNV & QuantiSNP | 87 | 15 | 102 | 56 | 2 | 58 | 31 | 13 | 44 |
Default settings, then CNVs <10 bp omitted.
HMMSeg using Cooper et al. [13] implementation.
Default settings, except that the minimum number of probes required to identify a CNV was varied.
Default settings, then CNVs with Log Bayes Factor <30 omitted.
Only CNVs identified by both HMMSeg and cnvPartition that overlap are included.
Only CNVs identified by HMMSeg, cnvPartition, PennCNV, and QuantiSNP that overlap are included.
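The consensus rows (for example "HMMSeg & cnvPartition") keep only calls that overlap across algorithms. A minimal sketch of such a filter follows. The `Call` tuple, the function names, the per-subject matching, and the choice to report the first algorithm's coordinates for a consensus call are assumptions made for illustration; the source does not state how the consensus region itself is defined.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    subject: str  # hypothetical subject identifier
    chrom: str
    start: int
    end: int


def overlaps(a: Call, b: Call) -> bool:
    """True if two calls in the same subject share at least one base pair."""
    return (a.subject == b.subject and a.chrom == b.chrom
            and a.start <= b.end and b.start <= a.end)


def consensus(primary: List[Call], *others: List[Call]) -> List[Call]:
    """Keep calls from `primary` that overlap a call in every other set."""
    return [
        call for call in primary
        if all(any(overlaps(call, other) for other in calls)
               for calls in others)
    ]
```

Passing one extra call set gives a two-algorithm consensus; passing three gives the four-algorithm case. Because every added set can only remove calls, the counts shrink as more algorithms are required to agree, which matches the pattern in the table.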
Table 3. Size of CNVs (kb) detected in 96 subjects by each algorithm.
| | All CNVs | | | | CNVs <100 kb | | | | CNVs ≥100 kb | | | |
| Algorithm | Mean | SD | Min | Max | Mean | SD | Min | Max | Mean | SD | Min | Max |
| PennCNV | 46 | 105 | 0.003 | 1,623 | 17 | 25 | 0.003 | 100 | 226 | 195 | 100 | 1,623 |
| HMMSeg | 126 | 215 | 1 | 1,751 | 16 | 25 | 1 | 100 | 316 | 260 | 100 | 1,751 |
| cnvPartition with 3 Probes | 345 | 998 | 0.1 | 10,283 | 31 | 33 | 0.1 | 99 | 966 | 1,544 | 102 | 10,283 |
| cnvPartition with 5 Probes | 443 | 1,138 | 1 | 10,283 | 40 | 35 | 1 | 99 | 1,013 | 1,605 | 102 | 10,283 |
| cnvPartition with 10 Probes | 787 | 1,550 | 8 | 10,283 | 30 | 17 | 8 | 94 | 1,266 | 1,827 | 103 | 10,283 |
| QuantiSNP | 410 | 849 | 1 | 4,733 | 39 | 36 | 1 | 99 | 911 | 1,123 | 100 | 4,733 |
| HMMSeg & cnvPartition with 3 Probes | 247 | 801 | 1 | 10,283 | 28 | 32 | 1 | 99 | 827 | 1,375 | 103 | 10,283 |
| HMMSeg & cnvPartition with 5 Probes | 344 | 942 | 2 | 10,283 | 38 | 36 | 2 | 99 | 857 | 1,403 | 103 | 10,283 |
| HMMSeg & cnvPartition with 10 Probes | 607 | 1,270 | 8 | 10,283 | 32 | 10 | 8 | 52 | 907 | 1,484 | 121 | 10,283 |
| HMMSeg & cnvPartition with 3 Probes & PennCNV & QuantiSNP | 345 | 1,129 | 2 | 10,283 | 45 | 43 | 2 | 99 | 740 | 1,646 | 110 | 10,283 |
See Table 2 for explanation of algorithms.
Table 4. Number of CNVs per person detected in 96 subjects by each algorithm.
| | All CNVs | | | | CNVs <100 kb | | | | CNVs ≥100 kb | | | |
| Algorithm | Mean | SD | Min | Max | Mean | SD | Min | Max | Mean | SD | Min | Max |
| PennCNV | 39.2 | 11.2 | 14 | 78 | 33.8 | 9.0 | 14 | 60 | 5.4 | 4.8 | 0 | 30 |
| HMMSeg | 10.1 | 3.4 | 3 | 26 | 6.4 | 1.9 | 3 | 12 | 3.7 | 2.7 | 0 | 17 |
| cnvPartition with 3 Probes | 7.2 | 3.6 | 1 | 23 | 4.8 | 2.3 | 1 | 12 | 2.4 | 2.7 | 0 | 20 |
| cnvPartition with 5 Probes | 5.4 | 3.1 | 1 | 23 | 3.1 | 1.7 | 0 | 7 | 2.2 | 2.6 | 0 | 20 |
| cnvPartition with 10 Probes | 2.6 | 2.6 | 0 | 21 | 1.0 | 0.9 | 0 | 3 | 1.6 | 2.3 | 0 | 19 |
| QuantiSNP | 2.5 | 2.7 | 0 | 20 | 1.4 | 1.2 | 0 | 5 | 1.1 | 2.5 | 0 | 19 |
| HMMSeg & cnvPartition with 3 Probes | 3.1 | 1.7 | 0 | 8 | 2.3 | 1.4 | 0 | 7 | 0.9 | 1.1 | 0 | 5 |
| HMMSeg & cnvPartition with 5 Probes | 2.2 | 1.5 | 0 | 6 | 1.4 | 1.1 | 0 | 5 | 0.8 | 1.1 | 0 | 5 |
| HMMSeg & cnvPartition with 10 Probes | 1.1 | 1.2 | 0 | 5 | 0.4 | 0.5 | 0 | 2 | 0.7 | 1.0 | 0 | 5 |
| HMMSeg & cnvPartition with 3 Probes & PennCNV & QuantiSNP | 1.1 | 1.1 | 0 | 5 | 0.6 | 0.8 | 0 | 3 | 0.5 | 0.8 | 0 | 4 |
See Table 2 for explanation of algorithms.
4 control and 5 schizophrenia subjects with no identified CNVs.
6 control and 6 schizophrenia subjects with no identified CNVs.
1 control and 2 schizophrenia subjects with no identified CNVs.
5 control and 5 schizophrenia subjects with no identified CNVs.
15 control and 19 schizophrenia subjects with no identified CNVs.
19 control and 17 schizophrenia subjects with no identified CNVs.
Figure 2. Process flow chart based on overlap between HMMSeg and cnvPartition (3-probe minimum).
Numbers are based on the “any” overlap criterion. All data on sex chromosomes have been omitted. Gains are compared only to gains or both, and losses are compared only to losses or both.
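The type-matching rule in this caption (gains against gains or "both", losses against losses or "both") can be sketched as a novelty filter against a reference database. This is an illustrative sketch only: `TypedCNV`, `comparable`, and `is_novel` are hypothetical names, the "any" overlap criterion is used, and treating a query CNV of type "both" as comparable to every reference type is an assumption that the source does not confirm.

```python
from typing import Iterable, NamedTuple


class TypedCNV(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "gain", "loss", or "both"


def comparable(query_kind: str, ref_kind: str) -> bool:
    """Gains match gains or both; losses match losses or both."""
    if query_kind == "both" or ref_kind == "both":
        # Reference "both" follows the caption; query "both" is an assumption.
        return True
    return query_kind == ref_kind


def is_novel(cnv: TypedCNV, database: Iterable[TypedCNV]) -> bool:
    """True if no type-compatible database CNV shares a base pair with `cnv`."""
    for ref in database:
        if (comparable(cnv.kind, ref.kind)
                and cnv.chrom == ref.chrom
                and cnv.start <= ref.end
                and ref.start <= cnv.end):
            return False
    return True
```

Under this rule a loss that sits on top of a previously reported gain still counts as not previously discovered, because the two types are never compared.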
Table 5. “Newly discovered” CNVs detected by each algorithm based on 48 normal and 48 schizophrenia subjects, using the “any” overlap criterion.
| Algorithm | CNVs Not Previously Discovered in Normals | | | | CNVs Found Only in SCZ, Not Previously Discovered in Normals | | | | CNVs Found Only in SCZ, Not Previously Discovered in Normals or SCZ | | | |
| | Loss | Gain | Both | Total | Loss | Gain | Both | Total | Loss | Gain | Both | Total |
| PennCNV | 72 | 84 | 33 | 189 | 24 | 31 | 2 | 57 | 24 | 29 | 2 | 55 |
| HMMSeg | 4 | 6 | 10 | 2 | 2 | 2 | 2 | |||||
| cnvPartition with 3 Probes | 12 | 2 | 14 | 3 | 3 | 3 | 3 | |||||
| cnvPartition with 5 Probes | 10 | 1 | 11 | 4 | 4 | 4 | 4 | |||||
| cnvPartition with 10 Probes | 2 | 2 | 1 | 1 | 1 | 1 | ||||||
| QuantiSNP | 1 | 1 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 3 Probes | 1 | 1 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 5 Probes | 1 | 1 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 10 Probes | 0 | 0 | 0 | |||||||||
| HMMSeg & cnvPartition with 3 Probes & PennCNV & QuantiSNP | 1 | 0 | 0 | |||||||||
See Table 2 for explanation of algorithms.
Table 6. “Newly discovered” CNVs detected by each algorithm based on 48 normal and 48 schizophrenia subjects, using the “40% either” overlap criterion.
| Algorithm | CNVs Not Previously Discovered in Normals | | | | CNVs Found Only in SCZ, Not Previously Discovered in Normals | | | | CNVs Found Only in SCZ, Not Previously Discovered in Normals or SCZ | | | |
| | Loss | Gain | Both | Total | Loss | Gain | Both | Total | Loss | Gain | Both | Total |
| PennCNV | 83 | 98 | 31 | 212 | 30 | 34 | 2 | 66 | 30 | 28 | 2 | 60 |
| HMMSeg | 6 | 10 | 16 | 1 | 4 | 5 | 1 | 3 | 4 | |||
| cnvPartition with 3 Probes | 14 | 4 | 18 | 3 | 3 | 3 | 3 | |||||
| cnvPartition with 5 Probes | 10 | 3 | 13 | 4 | 4 | 4 | 4 | |||||
| cnvPartition with 10 Probes | 2 | 1 | 3 | 1 | 1 | 1 | 1 | |||||
| QuantiSNP | 1 | 1 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 3 Probes | 1 | 2 | 3 | 1 | 1 | 0 | ||||||
| HMMSeg & cnvPartition with 5 Probes | 2 | 2 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 10 Probes | 1 | 1 | 0 | 0 | ||||||||
| HMMSeg & cnvPartition with 3 Probes & PennCNV & QuantiSNP | 1 | 1 | 0 | 0 | ||||||||
See Table 2 for explanation of algorithms.