Yeeok Kang1,2, Seong-Hyeuk Nam1, Kyung Sun Park1, Yoonjung Kim3, Jong-Won Kim4, Eunjung Lee5, Jung Min Ko6, Kyung-A Lee7, Inho Park8.
Abstract
BACKGROUND: Targeted next-generation sequencing (NGS) is increasingly being adopted in clinical laboratories for genomic diagnostic tests.
Keywords: Copy-number variation; Exon-level; Germ-line; Targeted sequencing; Visualization
Year: 2018 PMID: 30326846 PMCID: PMC6192323 DOI: 10.1186/s12859-018-2409-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of the dataset used for retrospective and clinical analyses
| Gene panel name | Capture method | Number of target genes | Probes (or amplicons) | Probe coverage size | Average number of probes per exon | Clinical use | Number of samples |
|---|---|---|---|---|---|---|---|
| IMD_HYB | Hybridization | 259 | 19210 | 982,657 bps | 5.7 | Newborn screening | 30ᵃ (cell line) |
| IMD_PCR | PCR | 259 | 9072 | 1,216,913 bps | 2.7 | Newborn screening | 14 (cell line) |
| IMD_V1 | PCR | 97 | 2054 | 338,961 bps | 1.8 | Newborn screening | 178 (clinical) |

IMD: inherited metabolism disorder; HYB: hybridization-based capture approach; PCR: polymerase chain reaction-based capture approach; bps: base pairs
ᵃ 27 unique cell lines. A total of 30 samples were sequenced because two cell lines were each generated three times
Summary of cell lines and clinical cohorts
| Panel | IMD_HYB | IMD_HYB | IMD_PCR | IMD_PCR | IMD_V1 |
|---|---|---|---|---|---|
| Batches | 3 | 3 | 2 | 2 | Unknown |
| Samples | 30ᵃ (cell line) | 36 (clinical) | 14 (cell line) | 20 (clinical) | 178 (clinical) |
| Average depth of coverage | 174X | 345X | 301X | 349X | 87X |
| Samples passing QC | 24 | 35 | 14 | 19 | 172 |
| Failure rate | 20% | 2.8% | 0% | 5% | 3.4% |
| Median number of raw duplications | 52.5 | 8 | 35.5 | 29 | 22.5 |
| Median number of raw deletions | 22.5 | 3 | 37 | 23 | 9 |
| Median number of raw CNVs | 82 | 13 | 85.5 | 67 | 34.5 |
| Median number of 5-scoreᵇ duplications | 4.5 | 1 | 12 | 5 | 6 |
| Median number of 5-score deletions | 2 | 0 | 5.5 | 2 | 1 |
| Median number of 5-score CNVs | 6.5 | 1 | 24.5 | 7 | 7.5 |

QC: quality control; CNV: copy number variation; IMD: inherited metabolism disorder; HYB: hybridization-based capture approach; PCR: polymerase chain reaction-based capture approach
ᵃ 27 unique cell lines. A total of 30 samples were sequenced because two cell lines were each generated three times
ᵇ High-confidence CNVs received the highest score of 5
Description of the measures used in the DeviCNV scoring system
| Abbreviation | Description | Calculation method | Default parameter setting |
|---|---|---|---|
| ProbeCntInRegion | How many signals support the CNV candidate? | Counting read depth ratio signals for a CNV candidate | 1 point for ≥2 |
| AverageOfReadDepthRatios | How strong is the signal supporting the CNV candidate? | Calculating an average log2-transformed median predicted probe-level read depth ratio values for a CNV candidate | If deletion, 1 point for < log2(0.6); |
| STDOfReadDepthRatios | How stable are the signals supporting the CNV candidate? | Calculating a standard deviation for the log2-transformed median predicted probe-level read depth ratio values for a CNV candidate | 1 point for < 0.4 |
| AverageOfCIs | How small are the confidence intervals for the signals supporting the CNV candidate? | Calculating average log2-transformed 95% confidence interval lengths for predicted probe-level read-depth ratios for a CNV candidate | 1 point for < 0.4 |
| AverageOfR2vals | How reliable is the model that generated the signals that support the CNV candidate? | Calculating average mean R-squared values per probe for a CNV candidate, with the average R-squared value per probe referring to an average of the R-squared values of N models for one probe | 1 point for ≥0.85 |
CNV copy number variant, CI confidence interval
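
The five measures in the table can be read as one point each, for a maximum score of 5. The following is a minimal sketch of that reading, not DeviCNV's actual code. The function and argument names are hypothetical, the use of a population standard deviation is an assumption, and the duplication cutoff is not listed in the table, so the caller must pass it explicitly.

```python
import math
from statistics import mean, pstdev

# Deletion cutoff as listed in the table. The duplication cutoff is not
# given there, so it is a required argument for duplications (assumption).
DELETION_CUTOFF = math.log2(0.6)


def score_cnv_candidate(log2_ratios, log2_ci_lengths, r2_values, cnv_type,
                        duplication_cutoff=None):
    """Return (score, per-measure results) for one CNV candidate.

    log2_ratios     -- log2 median predicted read-depth ratio per probe
    log2_ci_lengths -- log2-scale 95% confidence interval length per probe
    r2_values       -- average R-squared of the N models, per probe
    cnv_type        -- "DEL" or "DUP"
    """
    if not (log2_ratios and log2_ci_lengths and r2_values):
        raise ValueError("each measure needs at least one probe-level value")

    average_ratio = mean(log2_ratios)
    if cnv_type == "DEL":
        strong_signal = average_ratio < DELETION_CUTOFF
    elif cnv_type == "DUP":
        if duplication_cutoff is None:
            raise ValueError("duplication cutoff is not stated; pass one explicitly")
        strong_signal = average_ratio > duplication_cutoff
    else:
        raise ValueError("cnv_type must be 'DEL' or 'DUP'")

    points = {
        "ProbeCntInRegion": len(log2_ratios) >= 2,
        "AverageOfReadDepthRatios": strong_signal,
        # Population standard deviation is an assumption; it is 0 for one probe.
        "STDOfReadDepthRatios": pstdev(log2_ratios) < 0.4,
        "AverageOfCIs": mean(log2_ci_lengths) < 0.4,
        "AverageOfR2vals": mean(r2_values) >= 0.85,
    }
    return sum(points.values()), points
```

Under these assumptions, a deletion supported by three consistent probes scores 5, while the same signal on a single probe loses the ProbeCntInRegion point and scores 4.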
Comparison of the performances of DeviCNV and previous tools using cell lines with known CNVs
| Panel | Cell line | Median read depth | Gene | NM | Known CNV | CNV size (kb) | DeviCNV Find?ᵃ | DeviCNV #CNVᵇ | VisCap Find? | VisCap #CNV | XHMM Find? | XHMM #CNV | CODEX Find? | CODEX #CNV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMD_HYB | GM14603 | 81.99 | | NM_000152 | EX18 DEL | 0.16 | O | 24 | X | 7 | X | 0 | X | 56 |
| IMD_HYB | GM14734 | 249.4 | | NM_000500 | 30 KB DEL, Entire gene DEL | 3.35 | O | 2 | O | 37 | O | 1 | X | 2 |
| IMD_HYB | GM24007 | 142.84 | | NM_000531 | Entire gene DEL | 68.97 | O | 7 | O | 14 | O | 3 | O | 46 |
| IMD_HYB | NA01741 | 164.4 | | NM_000155 | Entire gene DEL | 4.01 | O | 6 | O | 40 | O | 1 | O | 37 |
| IMD_HYB | NA06804 | 261.98 | | NM_000194 | EX2–3 DUP | 2.01 | O | 34 | O | 43 | X | 2 | O | 62 |
| IMD_HYB | NA06805 | 80.13 | | NM_000153 | EX11–17 DEL | 17.73 | O | 44 | O | 8 | O | 1 | O | 86 |
| IMD_HYB | NA12217 | 269.08 | | NM_000500 | 30 KB DEL | 1.14 | X | 1 | X | 7 | X | 3 | X | 11 |
| IMD_HYB | NA22208 | 199.64 | | NM_000282 | EX13–20 DEL | 146.38 | O | 3 | O | 17 | O | 2 | O | 15 |
| IMD_PCR | NA01741 | Pool 1: 408.0, Pool 2: 556.0, Pool 3: 271.0 | | NM_000155 | Entire gene DEL | 4.01 | O | 10 | O | 9 | X | 0 | O | 1 |
| IMD_PCR | NA12217 | Pool 1: 192.0, Pool 2: 117.0, Pool 3: 99.0 | | NM_000500 | 30 KB DEL | 1.14 | X | 37 | X | 22 | X | 8 | X | 71 |
| IMD_PCR | GM14603 | Pool 1: 215.0, Pool 2: 141.0, Pool 3: 90.0 | | NM_000152 | EX18 DEL | 0.16 | O | 25 | X | 32 | O | 6 | O | 40 |
| IMD_PCR | NA14734 | Pool 1: 359.0, Pool 2: 275.0, Pool 3: 335.0 | | NM_000500 | 30 KB DEL, Entire gene DEL | 3.35 | O | 9 | O | 12 | O | 4 | X | 12 |
| IMD_PCR | NA22208 | Pool 1: 235.0, Pool 2: 99.0, Pool 3: 158.0 | | NM_000282 | EX13–20 DEL | 146.38 | O | 27 | X | 13 | O | 4 | O | 12 |
| IMD_PCR | GM24007 | Pool 1: 37.0, Pool 2: 20.0, Pool 3: 16.0 | | NM_000531 | Entire gene DEL | 68.97 | X | 1 | X | 23 | X | 0 | X | 0 |

CNV: copy number variation; IMD: inherited metabolism disorder; HYB: hybridization-based capture approach; PCR: polymerase chain reaction-based capture approach; EX: exon; DEL: deletion; DUP: duplication
ᵃ Indicates whether a known CNV was found using each tool. “O” means all CNVs were found, and “X” means they were not found at all
ᵇ Indicates the number of CNV candidates found in the corresponding sample. For DeviCNV, the number of CNV candidates that received the highest score of 5 is indicated
Comparison of the performances of DeviCNV and previous tools using 16 CNVs confirmed by qPCR
| Sample | Median read depth | Gene | NM | qPCR-confirmed CNV | CNV size (kb) | DeviCNV | VisCap | XHMM | CODEX |
|---|---|---|---|---|---|---|---|---|---|
| GM17433 | 82.13 | | NM_001876 | EX10 DUP | 0.20 | Oᵃ | X | X | X |
| GM17433 | 82.13 | | NM_000733 | EX4 DEL | 0.01 | O | X | X | O |
| GM17433 | 82.13 | | NM_001482 | EX9 DUP | 1.10 | O | X | X | X |
| GM24007 | 142.84 | | NM_002838 | EX16–17 | 0.83 | O | X | X | O |
| GM24007 | 142.84 | | NM_018368 | EX12 DUP | 0.10 | O | X | X | X |
| GM24007 | 142.84 | | NM_019844 | EX4 DUP | 0.14 | O | X | X | O |
| GM24007 | 142.84 | | NM_000277 | EX5 DEL | 0.07 | O | O | X | X |
| GM24007 | 142.84 | | NM_000475 | EX1 DEL | 1.18 | O | O | X | O |
| NA00852 | 204.09 | | NM_000517 | EX2–3 DEL | 0.59 | O | X | O | X |
| NA01741 | 164.4 | | NM_003235 | EX20 DUP | 0.22 | O | X | X | X |
| NA01741 | 164.4 | | NM_003235 | EX21 DUP | 0.15 | O | X | X | X |
| NA02227 | 278.98 | | NM_000500 | EX10 DUP | 0.80 | O | X | X | X |
| NA02659 | 608.46 | | NM_000517 | EX3 DEL | 0.24 | O | X | O | X |
| NA12217 | 269.08 | | NM_001005741 | EX12–11 DUP | 0.86 | O | X | X | X |
| NA22496 | 137.24 | | NM_000181 | EX11 DUP | 0.14 | O | X | X | X |
| NA22496 | 137.24 | | NM_000151 | EX2 DUP | 0.11 | O | X | X | O |

CNV: copy number variation; EX: exon; DEL: deletion; DUP: duplication
ᵃ Indicates whether a known CNV was found using each tool. “O” means all CNVs were found, and “X” means they were not found at all
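
As a reading aid, the O/X calls in the table above can be tallied per tool. The sketch below copies the 16 rows in table order, with one character per tool in the column order DeviCNV, VisCap, XHMM, CODEX. The helper name `detection_counts` is hypothetical and is not part of any of the tools compared.

```python
TOOLS = ("DeviCNV", "VisCap", "XHMM", "CODEX")

# One string per qPCR-confirmed CNV, in table order; each character is the
# call ("O" found, "X" not found) for the tool at the same position in TOOLS.
CALLS = [
    "OXXX",  # GM17433  NM_001876     EX10 DUP
    "OXXO",  # GM17433  NM_000733     EX4 DEL
    "OXXX",  # GM17433  NM_001482     EX9 DUP
    "OXXO",  # GM24007  NM_002838     EX16-17
    "OXXX",  # GM24007  NM_018368     EX12 DUP
    "OXXO",  # GM24007  NM_019844     EX4 DUP
    "OOXX",  # GM24007  NM_000277     EX5 DEL
    "OOXO",  # GM24007  NM_000475     EX1 DEL
    "OXOX",  # NA00852  NM_000517     EX2-3 DEL
    "OXXX",  # NA01741  NM_003235     EX20 DUP
    "OXXX",  # NA01741  NM_003235     EX21 DUP
    "OXXX",  # NA02227  NM_000500     EX10 DUP
    "OXOX",  # NA02659  NM_000517     EX3 DEL
    "OXXX",  # NA12217  NM_001005741  EX12-11 DUP
    "OXXX",  # NA22496  NM_000181     EX11 DUP
    "OXXO",  # NA22496  NM_000151     EX2 DUP
]


def detection_counts(calls):
    """Count how many confirmed CNVs each tool found."""
    counts = {tool: 0 for tool in TOOLS}
    for row in calls:
        for tool, flag in zip(TOOLS, row):
            if flag == "O":
                counts[tool] += 1
    return counts


if __name__ == "__main__":
    for tool, found in detection_counts(CALLS).items():
        print(f"{tool}: {found}/{len(CALLS)}")
```

With the rows as transcribed, the tally is 16 of 16 for DeviCNV, 2 of 16 for VisCap, 2 of 16 for XHMM, and 5 of 16 for CODEX.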
Candidate pathogenic CNVs detected by clinical sample analysis using DeviCNV
| Panel | Sample | Median read depth | Raw CNVsᵇ | Score 5ᵃ | Score 4 | Score 3 | Score 2 | Score 1 | Score 0 | Gene | NM | Selected pathogenic CNVᶜ | CNV size (kb) | Confirmed by qPCR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMD_HYB | Case_01 | 273.3 | 49 | 2 | 22 | 20 | 5 | 0 | 0 | | NM_000018 | EX2 DEL (Score 4) | 0.08 | Failed |
| IMD_HYB | Case_02 | 341.4 | 12 | 7 | 3 | 2 | 0 | 0 | 0 | | NM_000048 | EX15 DEL (Score 5) | 0.08 | Confirmed |
| IMD_HYB | Case_03 | 276.8 | 25 | 5 | 18 | 2 | 0 | 0 | 0 | | NM_021957 | EX6–11 DEL (Score 5) | 5.15 | Partially confirmed (EX6–7, 10–11) |
| IMD_PCR | Case_04 | Pool 1: 174.0, Pool 2: 203.0, Pool 3: 185.0 | 82 | 26 | 46 | 9 | 1 | 0 | 0 | | NM_004453 | EX1–7 DEL (Score 5) | 23.51 | Confirmed |
| IMD_PCR | Case_05 | Pool 1: 228.0, Pool 2: 330.0, Pool 3: 185.0 | 145 | 63 | 74 | 8 | 0 | 0 | 0 | | NM_004453 | EX7–8 DEL (Score 5) | 2.20 | Confirmed |
| IMD_V1 | Case_06 | Pool 1: 69.0, Pool 2: 56.0 | 106 | 37 | 40 | 26 | 3 | 0 | 0 | | NM_000531 | EX2 DEL (Score 5) | 0.14 | Confirmed |
| IMD_V1 | Case_07 | Pool 1: 52.0, Pool 2: 51.0 | 65 | 23 | 23 | 14 | 5 | 0 | 0 | | NM_000531 | Entire gene DEL (Score 5) | 68.38 | Confirmed |

CNV: copy number variation; IMD: inherited metabolism disorder; HYB: hybridization-based capture approach; PCR: polymerase chain reaction-based capture approach; EX: exon; DEL: deletion; DUP: duplication; qPCR: quantitative polymerase chain reaction
ᵃ Indicates the number of CNV candidates for each score (columns Score 5 to Score 0)
ᵇ Indicates the number of all CNV candidates before scoring
ᶜ Indicates the pathogenic CNVs selected in the clinical sample by one expert (columns Gene to Confirmed by qPCR). The number in parentheses indicates the score of the selected CNV
Fig. 1 Gene-centric view plots for four selected clinical cases. Panels a–d show gene-centric view plots for four of the pathogenic CNVs detected in the clinical samples listed in Table 6: (a) a single-exon deletion within ASL, (b) a multi-exon deletion within GYS2 using the inherited metabolic disorder panel and hybridization capture approach, (c) a multi-exon deletion within ETFDH using the inherited metabolic disorder panel and polymerase chain reaction-based capture approach, and (d) an entire-gene deletion within OTC using the previous version of the inherited metabolic disorder panel and polymerase chain reaction-based capture approach
Fig. 2 DeviCNV workflow. Analysis-ready BAM files are used as DeviCNV input. After read-depth normalization for chromosome X, DeviCNV filters low-quality samples from the input dataset. It then builds N (1,000 by default) linear regression models per probe (or amplicon) to predict a read-depth ratio and its confidence interval per probe for each sample. By combining the probe-level read-depth ratio signals, DeviCNV calls raw CNV candidates and evaluates them using a new scoring system. Finally, DeviCNV provides a CNV candidate list and visualization plots for each sample and gene
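
The per-probe modelling step can be illustrated with a small sketch. The caption states only that N linear regressions are built per probe to predict a read-depth ratio and a confidence interval. Everything else below is an assumption made for illustration and should not be read as DeviCNV's implementation: each model is fitted on a random subset of reference samples, the predictor is a per-sample median read depth, the ratio is summarized as the median across models, and the interval is taken from the 2.5th and 97.5th percentiles. All function and parameter names are hypothetical.

```python
import random
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values are all identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def probe_read_depth_ratio(ref_sample_depths, ref_probe_depths,
                           sample_depth, probe_depth,
                           n_models=1000, subset_size=None, seed=0):
    """Return (median ratio, (lower, upper)) for one probe in one sample.

    ref_sample_depths -- median read depth of each reference sample
    ref_probe_depths  -- read depth of this probe in each reference sample
    sample_depth      -- median read depth of the sample under test
    probe_depth       -- observed read depth of this probe in that sample
    """
    rng = random.Random(seed)
    n_refs = len(ref_sample_depths)
    k = subset_size or max(2, n_refs // 2)  # assumed subset size
    ratios = []
    for _ in range(n_models):
        chosen = rng.sample(range(n_refs), k)
        xs = [ref_sample_depths[i] for i in chosen]
        ys = [ref_probe_depths[i] for i in chosen]
        slope, intercept = fit_line(xs, ys)
        predicted = slope * sample_depth + intercept
        if predicted > 0:
            # Ratio as defined in the plots: observed / predicted read depth.
            ratios.append(probe_depth / predicted)
    if not ratios:
        raise ValueError("no model produced a positive prediction")
    ratios.sort()
    lower = ratios[int(0.025 * (len(ratios) - 1))]
    upper = ratios[int(0.975 * (len(ratios) - 1))]
    return median(ratios), (lower, upper)
```

Fitting many models on different subsets is one simple way to obtain both a point estimate and an interval without distributional assumptions; the actual resampling scheme used by DeviCNV is not described in this caption.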
Fig. 3 Example of DeviCNV plots. Predicted read-depth ratios (observed read depth / predicted read depth) of the probes on a panel are plotted on a log2 scale for each sample: (a) the whole-genome view plot depicts all probes on a panel, and (b) the gene-centric view plot depicts the probes within a gene. Each point represents the read-depth ratio of one probe, and its shape indicates the pool or a faulty or low-quality classification assigned when building the linear regression models. The color of each point shows the p-value for duplications and deletions (the thresholds are set at 1.3 and 0.7; thin black dotted lines). The whiskers represent the 95% confidence interval of the read-depth ratio. The example shows a multi-exon deletion within CYP21A2 found in a cell line using the inherited metabolic disorder panel and the hybridization capture approach
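
The ratio and thresholds described in this caption can be made concrete with a short sketch. It assumes that 1.3 and 0.7 apply to the untransformed ratio and that the comparisons are strict; neither detail is stated in the caption, and the function name is hypothetical.

```python
import math

DUPLICATION_THRESHOLD = 1.3  # ratio above this suggests a duplication
DELETION_THRESHOLD = 0.7     # ratio below this suggests a deletion


def classify_probe(observed_depth, predicted_depth):
    """Return (log2 ratio, label) for one probe.

    The ratio is observed read depth / predicted read depth, and the
    log2 value is what would be drawn on the plot's y-axis.
    """
    if observed_depth <= 0 or predicted_depth <= 0:
        raise ValueError("depths must be positive to take a log2 ratio")
    ratio = observed_depth / predicted_depth
    if ratio > DUPLICATION_THRESHOLD:
        label = "duplication"
    elif ratio < DELETION_THRESHOLD:
        label = "deletion"
    else:
        label = "neutral"
    return math.log2(ratio), label
```

For example, a heterozygous deletion that halves the observed depth gives a ratio of 0.5, which is plotted at -1 on the log2 scale and falls below the 0.7 line.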