Marc-André Legault, Simon Girard, Louis-Philippe Lemieux Perreault, Guy A Rouleau, Marie-Pierre Dubé.
Abstract
BACKGROUND: The advent of high-throughput sequencing methods has raised many technical challenges. One of them is the discovery of copy-number variations (CNVs) from whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we compare different CNV calling methods to assess their ability to identify CNVs consistently, using the calls from 9 quartets of identical twin pairs. The use of monozygotic twins provides a means of estimating the error rate of each algorithm: a CNV is considered inconsistently called when it violates the rules of Mendelian inheritance or the assumption that twins have identical genomes. The similarity between the calls from the different tools and the advantage of combining call sets were also considered.
Year: 2015 PMID: 25812131 PMCID: PMC4374778 DOI: 10.1371/journal.pone.0122287
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the selected CNV calling algorithms.
| Algorithm | Category | Summary characteristics | Reference |
|---|---|---|---|
| CNVer | Hybrid (PEM + RD) | Builds a donor graph integrating PEM and RD information. Requires specific mapping with Bowtie. | [ |
| BreakDancer | PEM | Detects CNVs, inversions and translocations. Provides Phred-like score. Software for small indels (10–100 bp) also available. | [ |
| CNVnator | RD | Uses a mean-shift technique to partition genomic bins in segments of different copy number. | [ |
| ERDS | Hybrid (PEM + RD + SC*) | Uses an HMM to combine SNV heterozygosity and RD information. Supports calls with PEM and SC signatures. Requires a Variant Call Format file and high coverage (> 20X is recommended). | [ |
*Soft-clipping signatures are analogous to split-read signatures.
PEM stands for Paired-End Mapping, RD for Read Depth and SC for Soft Clipping.
Characterization of the CNV calls for the different tools.
| Tool | Mean number of CNVs | Mean size (kb) | Mean distance between adjacent CNVs (kb) | Genome coverage | Inherited CNV rate | Familial classification F1 score |
|---|---|---|---|---|---|---|
| CNVer | 1751 (72) | 38 (2) | 1259 (77) | 0.019 | 0.23 (0.02) | 0.91 |
| BreakDancer | 4903 (299) | 4 (5) | 568 (43) | 0.007 | 0.43 (0.04) | 0.84 |
| CNVnator | 1231 (39) | 190 (6) | 2537 (93) | 0.102 | 0.71 (0.01) | 1 |
| ERDS | 2292 (106) | 15 (0.8) | 1234 (63) | 0.013 | 0.74 (0.02) | 1 |
The mean across all samples is presented for the number, size, distance between CNVs and genome coverage. The standard deviation is given in parentheses. For the genome coverage, all the standard deviations were smaller than 1%. For the inherited CNV rate, the mean and standard deviation are computed across families.
Fig 1. Venn diagrams representing the agreement between different CNV calling tools.
Panel A shows the mean number of CNVs shared between tools prior to any filtering based on familial relationship. Panel B shows the mean number of CNVs shared between the tools after filtering for Mendelian inheritance (i.e. CNVs that are in both twins and at least one parent). Panel C shows the ratio of CNVs lost when filtering for Mendelian inheritance. DGV is the Database of Genomic Variants.
Fig 2. Rate of inherited CNVs when considering sets resulting from the intersection or union of pairs of tools.
The rate of inherited CNVs is defined as the number of regions that were detected in both twins and at least one parent, divided by the total number of distinct regions in the twins. The union operation takes all the CNVs in either of the tools from a given pair and the intersection represents only the CNVs that were found by both tools. The mean number of CNVs in each set is provided on top of the corresponding bars.
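The definition above can be illustrated with a minimal Python sketch. It assumes that CNV regions have already been matched across samples and tools, so that each region is represented by a hashable label; the function names, sample keys (`twin1`, `twin2`, `father`, `mother`) and example data are hypothetical and are not taken from the study.

```python
def combine_calls(calls_a, calls_b, how):
    """Combine two tools' per-sample call sets by union or intersection.

    calls_a and calls_b map a sample name to a set of region labels.
    Region labels are assumed to be comparable across tools and samples.
    """
    if how not in ("union", "intersection"):
        raise ValueError("how must be 'union' or 'intersection'")
    combined = {}
    for sample in calls_a:
        if how == "union":
            combined[sample] = calls_a[sample] | calls_b[sample]
        else:
            combined[sample] = calls_a[sample] & calls_b[sample]
    return combined


def inherited_cnv_rate(calls):
    """Regions in both twins and at least one parent, divided by the
    number of distinct regions found in either twin."""
    twin1, twin2 = calls["twin1"], calls["twin2"]
    parents = calls["father"] | calls["mother"]
    distinct_in_twins = twin1 | twin2
    if not distinct_in_twins:
        return 0.0
    inherited = twin1 & twin2 & parents
    return len(inherited) / len(distinct_in_twins)


# Hypothetical call sets for one family and two tools.
tool_a = {
    "twin1": {"r1", "r2", "r3"},
    "twin2": {"r1", "r2"},
    "father": {"r1"},
    "mother": {"r4"},
}
tool_b = {
    "twin1": {"r1", "r5"},
    "twin2": {"r1", "r5"},
    "father": {"r1", "r5"},
    "mother": set(),
}

rate_a = inherited_cnv_rate(tool_a)
rate_union = inherited_cnv_rate(combine_calls(tool_a, tool_b, "union"))
rate_intersection = inherited_cnv_rate(combine_calls(tool_a, tool_b, "intersection"))
```

In this toy family, the intersection keeps only the region that both tools agree on, which raises the rate while shrinking the call set; the union keeps more regions at a lower rate.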
Fig 3. Rate of inherited CNVs as a function of the reciprocal overlap threshold used to declare copy-number variable regions identical.
The slope, representing the variation in the inherited CNV rate when the reciprocal overlap threshold varies, is given in parentheses.
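As an illustration of the reciprocal overlap criterion, the following Python sketch uses the common definition in which the overlap must cover at least a given fraction of both regions. The tuple layout `(chrom, start, end)`, the half-open coordinates, the function names and the default threshold of 0.5 are assumptions made for this example and may differ from what the authors used.

```python
def reciprocal_overlap(a, b):
    """Return the reciprocal overlap of two (chrom, start, end) regions.

    The value is the smaller of the two overlap fractions, i.e. the
    overlap length divided by the length of the longer region.
    Coordinates are assumed to be half-open: [start, end).
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return min(overlap / (end_a - start_a), overlap / (end_b - start_b))


def same_region(a, b, threshold=0.5):
    """Declare two regions identical if their reciprocal overlap
    reaches the threshold."""
    return reciprocal_overlap(a, b) >= threshold


# Two hypothetical calls sharing 50 of their 100 bases.
call_1 = ("chr1", 100, 200)
call_2 = ("chr1", 150, 250)
overlap_fraction = reciprocal_overlap(call_1, call_2)
```

With these two calls, `same_region(call_1, call_2, threshold=0.5)` accepts the pair while a stricter threshold such as 0.7 rejects it, which is the kind of sensitivity the figure examines.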