Marcel Elie Nutsua, Annegret Fischer, Almut Nebel, Sylvia Hofmann, Stefan Schreiber, Michael Krawczak, Michael Nothnagel.
Abstract
The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pairwise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34–60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) tended to predict fewer CNVs than segmentation-based algorithms, albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.
Year: 2015 PMID: 26197066 PMCID: PMC4510559 DOI: 10.1371/journal.pone.0133465
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
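The abstract reports pairwise prediction concordance between tools but does not define the measure. As a minimal sketch, assuming concordance is taken as the fraction of one tool's calls that overlap a same-type call from another tool on the same sample, it could be computed as follows. The `CNV` record, the any-overlap criterion, and the function names are illustrative assumptions, not the study's definitions.

```python
from typing import NamedTuple

class CNV(NamedTuple):
    chrom: str
    start: int  # first base of the call (inclusive)
    end: int    # last base of the call (inclusive)
    kind: str   # "del" or "dup"

def overlaps(a: CNV, b: CNV) -> bool:
    """True if both calls share chromosome, type, and at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)

def concordance(calls_a: list[CNV], calls_b: list[CNV]) -> float:
    """Fraction of tool A's calls overlapped by at least one call of tool B."""
    if not calls_a:
        return 0.0
    hits = sum(any(overlaps(a, b) for b in calls_b) for a in calls_a)
    return hits / len(calls_a)

# Toy calls for one sample from two hypothetical tools.
tool_a = [CNV("1", 100, 200, "del"), CNV("1", 500, 600, "dup"), CNV("2", 50, 80, "del")]
tool_b = [CNV("1", 150, 250, "del"), CNV("2", 10, 40, "del")]

print(concordance(tool_a, tool_b))  # 1 of 3 calls of tool A is matched
print(concordance(tool_b, tool_a))  # 1 of 2 calls of tool B is matched
```

The measure is deliberately asymmetric: a tool that predicts few CNVs can be highly concordant with a tool that predicts many, but not the other way around.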
Fig 1. CNV prediction and family-based validation.
A: Number of CNVs predicted per sample. B: Proportion of CNVs per sample validated by parental information. Light grey: HMM-based algorithms; dark grey: segmentation algorithms.
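Panel B rests on validating a child's CNVs against parental information. The exact matching criterion is not given here, so the following is only a sketch, assuming a child's call counts as validated when a call of the same type in either parent overlaps it by at least one base. The tuple layout and function names are hypothetical.

```python
def same_region(a, b):
    """Calls are (chrom, start, end, kind) tuples; require same
    chromosome, same type, and at least one shared base."""
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]

def validated_fraction(child, father, mother):
    """Proportion of the child's calls matched by a call in either parent."""
    if not child:
        return 0.0
    parental = father + mother
    n_valid = sum(any(same_region(c, p) for p in parental) for c in child)
    return n_valid / len(child)

# Toy trio: two of the child's four calls have a parental counterpart.
child = [("1", 1000, 5000, "del"), ("3", 200, 900, "dup"),
         ("7", 10, 60, "del"), ("X", 5, 50, "del")]
father = [("1", 1200, 4800, "del")]
mother = [("3", 850, 1500, "dup"), ("7", 10, 60, "dup")]  # chr7 call has the wrong type

print(validated_fraction(child, father, mother))  # 0.5
```

Under this criterion a genuine de novo CNV in the child would remain unvalidated, and a false-negative call in a parent has the same effect, so the proportion is a conservative proxy for accuracy.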
Sample-specific features of predicted CNVs.
| Software | Total number | Median length [kb] | Median cumulated length [Mb] | Median number of markers in CNV | Median inter-marker distance [kb] | DDR |
|---|---|---|---|---|---|---|
| | 99.5 (87.0–115.2) | 8.9 (8.1–10.3) | 4.6 (3.7–5.7) | 10.2 (8.0–14.0) | 0.22 (0.19–0.26) | 4.3 (3.2–5.1) |
| | 179.5 (139.5–210.0) | 7.1 (6.6–8.5) | 6.2 (4.4–8.3) | 6.0 (5.0–8.0) | 0.20 (0.17–0.26) | 2.8 (2.3–3.4) |
| | 75.0 (63.2–84.5) | 21.7 (17.3–25.8) | 5.1 (4.0–6.5) | 25.0 (23.0–29.2) | 0.18 (0.14–0.21) | 5.5 (4.8–6.6) |
| | 160.5 (137.8–184.5) | 9.0 (8.3–10.0) | 8.2 (5.7–23.2) | 6.0 (5.9–7.0) | 0.23 (0.20–0.26) | 3.1 (2.6–3.6) |
| | 211.0 (177.8–236.5) | 8.0 (7.1–9.6) | 121.0 (18.9–281.4) | 7.0 (6.8–10.0) | 0.28 (0.23–0.33) | 4.4 (3.6–5.4) |
| | 158.5 (137.8–185.2) | 7.0 (6.3–7.6) | 6.2 (4.7–7.9) | 7.0 (5.0–8.0) | 0.28 (0.23–0.31) | 3.6 (3.1–4.6) |
| | | | | | | |
| | 98.0 (87.0–111.5) | 9.7 (8.9–11.2) | 5.2 (4.1–6.6) | 10.8 (8.0–14.0) | 0.21 (0.19–0.24) | 4.3 (3.3–5.0) |
| | 182.0 (150.8–210.0) | 7.4 (6.6–8.1) | 7.0 (5.0–8.7) | 7.0 (6.0–8.0) | 0.26 (0.22–0.30) | 3.6 (3.1–4.4) |
Given are the median and, in parentheses, the inter-quartile range per sample. DDR: Ratio of deletions to duplications.
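The table entries are medians with inter-quartile ranges over samples, plus the deletion-to-duplication ratio (DDR). A minimal sketch of how such summaries could be derived from per-sample values is shown below. The quartile convention (`method="inclusive"`) and the helper names are assumptions, since the source does not state which convention was used.

```python
from statistics import median, quantiles

def summarise(values):
    """Return (median, Q1, Q3) for a list of per-sample values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def ddr(kinds):
    """Deletion-to-duplication ratio for one sample.
    `kinds` is a list of "del"/"dup" labels; None if there are no duplications."""
    n_del = kinds.count("del")
    n_dup = kinds.count("dup")
    return n_del / n_dup if n_dup else None

# Toy data: number of predicted CNVs in five samples.
counts = [80, 90, 100, 110, 120]
med, q1, q3 = summarise(counts)
print(f"{med} ({q1}-{q3})")

# Toy data: call types in a single sample.
print(ddr(["del", "del", "del", "dup"]))  # 3.0
```

Computing the DDR per sample first and then summarising across samples matches the table's per-sample framing; pooling all calls before taking the ratio would generally give a different number.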
Fig 2. Median sample-specific CNV length.
Kernel-smoothed histogram of the median CNV length per sample. Outliers were excluded. Solid line: HMM-based algorithms; dotted lines: segmentation algorithms.
Sample-specific CNV validation rate.
| Software | Validated CNVs [%] | Validated deletions [%] | Validated duplications [%] | DDR, confined to validated CNVs | Validated cumulative sequence [%] |
|---|---|---|---|---|---|
| | 56.3 (50.8–61.3) | 55.7 (50.9–61.1) | 60.0 (48.0–67.2) | 0.9 (0.8–1.2) | 55.7 (42.2–66.9) |
| | 46.0 (39.1–53.2) | 46.3 (35.9–52.5) | 54.8 (41.0–60.9) | 1.3 (1.0–1.5) | 45.1 (31.7–58.2) |
| | 60.9 (53.9–69.5) | 64.4 (57.4–74.1) | 52.2 (44.3–59.0) | 1.2 (0.9–1.4) | 56.0 (43.4–65.1) |
| | 50.5 (46.3–56.6) | 52.2 (47.2–58.2) | 46.4 (37.9–55.7) | 1.0 (0.8–1.2) | 53.8 (32.3–77.5) |
| | 34.8 (20.4–41.6) | 34.6 (20.8–43.1) | 32.3 (22.3–44.0) | 0.8 (0.6–1.0) | 5.2 (1.4–16.4) |
| | 41.1 (33.6–45.9) | 39.2 (31.1–45.4) | 47.3 (36.6–54.9) | 0.9 (0.7–1.1) | 36.0 (21.2–53.0) |
| | | | | | |
| | 55.9 (49.5–61.3) | 57.5 (50.6–60.6) | 51.9 (46.1–60.0) | 1.1 (0.9–1.3) | 55.3 (43.4–65.0) |
| | 41.4 (35.1–47.6) | 40.5 (33.9–46.1) | 45.0 (36.0–53.5) | 0.9 (0.7–1.1) | 34.9 (19.9–47.2) |
Given are the median and, in parentheses, the inter-quartile range per sample. DDR: Ratio of deletions to duplications.
Sample-specific features of validated CNVs.
| Software | Total number | Median length [kb] | Median cumulated length [Mb] | Median number of markers in CNV | Median inter-marker distance [kb] | DDR |
|---|---|---|---|---|---|---|
| | 55.0 (48.8–60.0) | 10.3 (9.1–12.0) | 2.3 (1.9–3.1) | 16.0 (14.0–18.0) | 0.19 (0.15–0.21) | 3.7 (3.1–4.6) |
| | 79.0 (62.0–94.2) | 8.7 (7.3–9.5) | 2.7 (1.8–3.5) | 10.0 (9.0–12.0) | 0.15 (0.11–0.16) | 3.6 (2.8–4.1) |
| | 42.5 (37.8–51.0) | 21.2 (16.1–26.0) | 2.4 (2.0–3.3) | 26.5 (23.9–31.1) | 0.15 (0.12–0.19) | 6.2 (5.3–8.9) |
| | 83.5 (69.5–95.0) | 9.7 (8.9–11.0) | 3.2 (2.2–11.0) | 9.0 (7.5–9.6) | 0.19 (0.13–0.21) | 3.0 (2.5–3.7) |
| | 66.0 (37.0–87.0) | 8.9 (7.7–10.3) | 3.4 (1.4–5.1) | 12.0 (10.0–15.2) | 0.14 (0.12–0.19) | 3.6 (2.9–4.2) |
| | 62.0 (48.5–74.0) | 8.2 (7.2–9.2) | 2.3 (1.4–3.6) | 11.0 (9.0–14.5) | 0.16 (0.12–0.20) | 3.0 (2.2–4.1) |
| | | | | | | |
| | 56.0 (49.5–61.0) | 10.7 (9.8–12.2) | 2.4 (2.0–3.2) | 16.0 (14.4–18.0) | 0.16 (0.13–0.20) | 3.9 (3.3–5.2) |
| | 70.5 (57.0–79.2) | 8.6 (7.6–9.3) | 2.5 (1.6–3.5) | 11.0 (9.8–13.0) | 0.14 (0.12–0.18) | 3.2 (2.9–3.8) |
Given are the median and, in parentheses, the inter-quartile range per sample. DDR: Ratio of deletions to duplications.
Fig 3. Median sample-specific number of validated CNVs.
Light grey: duplications; dark grey: deletions.
Family-based CNV validation by chance alone?
| Software | CEU + YRI combined [%] | CEU only [%] | YRI only [%] |
|---|---|---|---|
| | 20.3 (20.1–20.7) | 29.2 (29.0–30.6) | 24.1 (23.9–24.5) |
| | 16.5 (15.4–16.8) | 24.2 (23.9–24.6) | 19.9 (18.8–19.9) |
| | 18.6 (18.3–18.9) | 23.5 (23.1–24.0) | 16.3 (16.0–17.2) |
| | 16.8 (16.4–17.1) | 18.5 (18.1–18.9) | 15.7 (14.9–16.2) |
| | 13.6 (13.0–13.9) | 16.8 (16.4–18.0) | 16.6 (15.9–17.0) |
| | 13.7 (13.1–13.9) | 22.5 (22.3–23.2) | 14.3 (14.2–14.9) |
| | | | |
| | 18.1 (17.7–18.4) | 22.9 (22.7–24.0) | 17.6 (16.9–18.2) |
| | 14.8 (14.2–15.5) | 21.8 (21.4–22.9) | 16.5 (15.8–16.9) |
Given are the median and, in parentheses, the inter-quartile range.
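How the chance validation rates were obtained is not described in this record. One plausible way to estimate such a rate, assumed here purely for illustration, is to re-run the family-based check after pairing each child with the parents of a randomly chosen unrelated family, so that any remaining "validations" reflect common or recurrent CNV regions rather than inheritance. All names below are hypothetical.

```python
import random

def overlap(a, b):
    """Calls are (chrom, start, end, kind); same chromosome and type, any shared base."""
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]

def chance_validation_rate(trios, n_perm=100, seed=0):
    """Mean fraction of child calls 'validated' by unrelated parents.

    trios: dict mapping family id -> (child_calls, pooled_parent_calls).
    Requires at least two families so that an unrelated pair exists.
    """
    rng = random.Random(seed)
    families = list(trios)
    rates = []
    for _ in range(n_perm):
        hits = total = 0
        for fam in families:
            # Draw the parents from a different, randomly chosen family.
            other = rng.choice([f for f in families if f != fam])
            child, parents = trios[fam][0], trios[other][1]
            hits += sum(any(overlap(c, p) for p in parents) for c in child)
            total += len(child)
        rates.append(hits / total if total else 0.0)
    return sum(rates) / len(rates)

# Toy data: two families sharing one common deletion region on chromosome 1.
trios = {
    "A": ([("1", 100, 200, "del")], [("1", 100, 200, "del")]),
    "B": ([("1", 150, 250, "del"), ("2", 1, 10, "dup")], [("1", 120, 180, "del")]),
}
print(chance_validation_rate(trios))  # 2 of 3 child calls match unrelated parents
```

Restricting the random draw to families from the same population would correspond to the separate CEU-only and YRI-only columns, while drawing from all families would correspond to the combined column; this mapping is likewise an assumption.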