Whitney Whitford, Klaus Lehnert, Russell G Snell, Jessie C Jacobsen.
Abstract
The popularisation and decreased cost of genome resequencing have resulted in its increased use in molecular diagnostics. While there are a number of established, high-quality bioinformatic tools for identifying small genetic variants, including single nucleotide variants and indels, there is currently no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high-throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance (defined as the relative proportion of reads of each allele at each position) has been underutilised in the existing applications. Here we present Read Balance Validator (RBV), a bioinformatic tool that uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing, available as a Python package from https://github.com/whitneywhitford/RBV.
Year: 2019 PMID: 31729446 PMCID: PMC6858463 DOI: 10.1038/s41598-019-53181-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of relative reads for diploid, haploid, and triploid regions in whole genome sequence. (A) Expected distribution of all positions in a diploid genome. (B) Expected distribution of all positions in a hemizygous genome. (C) Expected distribution of all positions in a triploid genome.
Figure 2. RBV data analysis curves. (A) Read balance of the most common allele from heterozygous positions in a diploid genome. (B) Read balance of the most common allele from heterozygous positions in a triploid genome. (C) CDF curves utilised in a 2-sample KS test, comparing the distribution of read balance among randomly generated heterozygous SNVs throughout the reference diploid genome, a 100 kb diploid region, and a 100 kb triplicated region.
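The comparison described in the Figure 2 caption can be illustrated with a minimal sketch. It is not RBV's own code: the function names, the simulated read depth of 30, the site counts, and the binomial read-sampling model are all assumptions made for illustration. It computes the read balance of the most common allele at simulated heterozygous positions and then the two-sample KS statistic (the largest vertical gap between the two empirical CDFs). A p-value is not computed here; in practice one could obtain it from a library routine such as `scipy.stats.ks_2samp`.

```python
import random
from bisect import bisect_right


def major_allele_balance(ref_reads, alt_reads):
    """Read balance of the most common allele at one heterozygous position."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("position has no reads")
    return max(ref_reads, alt_reads) / total


def simulate_balances(n_sites, depth, allele_fraction, rng):
    """Simulate major-allele read balance at n_sites heterozygous positions.

    allele_fraction is the expected share of reads carrying one allele:
    0.5 for a diploid region, 1/3 for a triplicated region (assumed model).
    """
    balances = []
    for _ in range(n_sites):
        alt = sum(rng.random() < allele_fraction for _ in range(depth))
        balances.append(major_allele_balance(depth - alt, alt))
    return balances


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


rng = random.Random(0)  # fixed seed so the sketch is reproducible
reference = simulate_balances(2000, 30, 0.5, rng)           # genome-wide diploid sites
diploid_region = simulate_balances(200, 30, 0.5, rng)       # candidate region, 2 copies
triplicated_region = simulate_balances(200, 30, 1 / 3, rng) # candidate region, 3 copies

d_diploid = ks_statistic(reference, diploid_region)
d_triplicated = ks_statistic(reference, triplicated_region)
print(f"KS D, diploid region vs reference:     {d_diploid:.3f}")
print(f"KS D, triplicated region vs reference: {d_triplicated:.3f}")
```

Under these assumptions the triplicated region should yield a markedly larger KS statistic than the diploid region, because its major-allele balance centres near 2/3 rather than 1/2.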
RBV performance analysis for deletions for 25 Phase 3 1000 Genomes Project individuals with CNV calls and high coverage whole genome sequence.
| Deletion size | Total | TP | FN | TN | FP | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| >10 kb | 3783 | 1459 | 2324 | 3703 | 80 | 0.3856727 | 0.978853 |
| >20 kb | 1914 | 1254 | 660 | 1846 | 68 | 0.6551724 | 0.964472 |
| >30 kb | 1326 | 1089 | 237 | 1271 | 55 | 0.821267 | 0.958522 |
| >50 kb | 738 | 643 | 95 | 702 | 36 | 0.8712737 | 0.95122 |
| >100 kb | 397 | 374 | 23 | 375 | 22 | 0.9420655 | 0.944584 |
| >150 kb | 169 | 162 | 7 | 161 | 8 | 0.9585799 | 0.952663 |
| >200 kb | 93 | 88 | 5 | 89 | 4 | 0.9462366 | 0.956989 |
| >300 kb | 55 | 54 | 1 | 52 | 3 | 0.9818182 | 0.945455 |
| >400 kb | 31 | 31 | 0 | 30 | 1 | 1 | 0.967742 |
| >500 kb | 20 | 20 | 0 | 20 | 0 | 1 | 1 |
| >1 Mb | 10 | 10 | 0 | 10 | 0 | 1 | 1 |
| All | 23851 | 1459 | 22392 | 23771 | 80 | 0.0611714 | 0.996646 |
SNV: single nucleotide variant, TP: true positive, FN: false negative, TN: true negative, FP: false positive.
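The sensitivity and specificity columns follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). As a worked check, the short sketch below recomputes the >100 kb deletion row from its counts; the function names are illustrative and not part of RBV.

```python
def sensitivity(tp, fn):
    """Proportion of true CNVs that were correctly called: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of diploid control regions correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts taken from the ">100 kb" row of the deletion table.
tp, fn, tn, fp = 374, 23, 375, 22

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.7f}")  # table reports 0.9420655
print(f"specificity = {spec:.6f}")  # table reports 0.944584
```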
RBV performance analysis for duplications for 25 Phase 3 1000 Genomes Project individuals with CNV calls and high coverage whole genome sequence.
| Number of heterozygous SNVs | Total | TP | FN | TN | FP | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| 1–2 | 703 | 126 | 577 | 652 | 51 | 0.179232 | 0.927453 |
| 3–9 | 714 | 434 | 280 | 627 | 87 | 0.607843 | 0.878151 |
| 10–19 | 452 | 341 | 111 | 393 | 59 | 0.754425 | 0.869469 |
| 20–49 | 784 | 640 | 144 | 665 | 119 | 0.816327 | 0.848214 |
| 50–99 | 643 | 581 | 62 | 527 | 116 | 0.903577 | 0.819595 |
| 100–199 | 695 | 639 | 56 | 551 | 144 | 0.919424 | 0.792805 |
| 200–499 | 489 | 460 | 29 | 346 | 143 | 0.940695 | 0.707566 |
| 500+ | 82 | 69 | 13 | 45 | 37 | 0.841463 | 0.548780 |
| All | 7940 | 3290 | 4650 | 3806 | 4134 | 0.414358 | 0.479345 |
SNV: single nucleotide variant, TP: true positive, FN: false negative, TN: true negative, FP: false positive.
Figure 3. Ability of RBV to prioritise authentic CNVs. Comparison between the results from 31,791 CNVs from 25 Phase 3 1000 Genomes Project individuals[39] and randomly generated diploid regions with the same number of callable positions as each deletion, or the same number of heterozygous positions as each duplication. (A) Performance of RBV for 23,851 deletions. (B) Performance of RBV for 7,940 duplications.