German Demidov, Tamara Simakova, Julia Vnuchkova, Anton Bragin.
Abstract
BACKGROUND: Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for detecting short genetic variations. However, identifying larger variations, such as structural variants and copy number variations (CNVs), remains a challenge for targeted MPS. Several approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type and size, or a certain expected number of amplicons affected by CNVs. In this paper, we describe a novel algorithm for high-resolution germline CNV detection in PCR-enriched targeted sequencing data and present the accompanying tool.
Keywords: Germline CNV; MPS; Machine learning; Multiplex PCR; Targeted amplification
Year: 2016 PMID: 27770783 PMCID: PMC5075217 DOI: 10.1186/s12859-016-1272-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pipeline scheme
Fig. 2 (Unsupervised algorithm idea). Linear model of amplicon coverage. Axes: logarithms of the coverages of two different amplicons in one run (48 samples). Dots: wild-type amplicons; triangles: heterozygous deletions. Dashed lines: OLS regression for wild-type amplicons and its 0.99 prediction interval. Dotted line: the regression line for heterozygous deletions.
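The idea in Fig. 2 can be illustrated with a minimal sketch: across the samples of one run, the log coverage of one amplicon is regressed on the log coverage of another by OLS, and a sample whose point falls outside the 0.99 prediction interval is flagged as a candidate CNV. This is not the authors' implementation. The function names `ols` and `flag_outliers`, the leave-one-out fitting (each sample is tested against a line fitted on the other samples) and the use of a normal quantile in place of the Student t quantile are all assumptions made to keep the example short and dependency-free.

```python
import math
from statistics import NormalDist


def ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns intercept, slope, residual standard deviation,
    mean of x and the centred sum of squares of x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    return a, b, s, mx, sxx


def flag_outliers(cov_a, cov_b, level=0.99):
    """Hypothetical helper: flag samples whose log coverage of amplicon B lies
    outside the prediction interval of an OLS line fitted on the other samples."""
    x = [math.log(c) for c in cov_a]
    y = [math.log(c) for c in cov_b]
    # Normal quantile used as an approximation to the Student t quantile.
    q = NormalDist().inv_cdf(0.5 + level / 2)
    flags = []
    for i in range(len(x)):
        # Leave sample i out so that a deleted sample does not distort its own fit.
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        a, b, s, mx, sxx = ols(xs, ys)
        m = len(xs)
        half_width = q * s * math.sqrt(1 + 1 / m + (x[i] - mx) ** 2 / sxx)
        flags.append(abs(y[i] - (a + b * x[i])) > half_width)
    return flags


# Toy data (invented for illustration): amplicon B has about twice the
# coverage of amplicon A, with a little multiplicative noise.
cov_a = [100, 120, 90, 110, 130, 95, 105, 115, 125, 85]
noise = [1.02, 0.98, 1.01, 0.99, 1.02, 0.98, 1.01, 0.99, 1.00, 1.00]
cov_b = [2 * c * f for c, f in zip(cov_a, noise)]
cov_b[3] *= 0.5  # simulated heterozygous deletion of amplicon B in sample 3
print(flag_outliers(cov_a, cov_b))
```

A heterozygous deletion halves the expected coverage, which shifts the point down by about log(2) ≈ 0.69 on the log scale, so it lands well below the band spanned by the wild-type samples, as the triangles do in Fig. 2.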
Fig. 3 (Supervised algorithm idea). Top: duplication of exons 7–11 in the CFTR gene (amplicons within [12;19]); bottom: deletion of exons 2–9 in CFTR (amplicons within [2;17]). y-axis: distance from 0 in standard deviations; x-axis: amplicons of the CFTR gene. Horizontal lines: ±3 standard deviations; vertical lines show the exon structure. Top line (squares): distance to M; middle line (circles): distance to M; bottom line (triangles): distance to M. Distances within each region are combined into one NED per model.
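The supervised idea in Fig. 3 (per-amplicon distances to several models, measured in standard deviations and combined per region) can be sketched as follows. This is a hypothetical illustration, not the published method: it assumes three models with expected copy ratios 0.5 (heterozygous deletion), 1.0 (wild type) and 1.5 (duplication), assumes a per-amplicon reference mean and standard deviation of log coverage are already available, and combines distances by root mean square because the caption does not define NED. All names (`MODELS`, `distances`, `combined_distance`, `classify_region`) are invented for this sketch.

```python
import math

# Hypothetical copy-ratio models relative to a diploid reference.
MODELS = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}


def distances(log_cov, ref_mean, ref_sd, ratio):
    """Per-amplicon distance, in standard deviations, between the observed
    log coverage and the value expected under a given copy ratio."""
    return [(obs - (mu + math.log(ratio))) / sd
            for obs, mu, sd in zip(log_cov, ref_mean, ref_sd)]


def combined_distance(ds):
    """Combine the distances of one region into a single score
    (root mean square; an assumption, not the paper's NED)."""
    return math.sqrt(sum(d * d for d in ds) / len(ds))


def classify_region(log_cov, ref_mean, ref_sd):
    """Return the model with the smallest combined distance, plus all scores."""
    scores = {name: combined_distance(distances(log_cov, ref_mean, ref_sd, r))
              for name, r in MODELS.items()}
    return min(scores, key=scores.get), scores


# Toy region of four amplicons (invented numbers) carrying a heterozygous deletion.
ref_mean = [5.0, 5.2, 4.8, 5.1]
ref_sd = [0.1, 0.1, 0.1, 0.1]
observed = [m + math.log(0.5) + e
            for m, e in zip(ref_mean, [0.05, -0.05, 0.02, 0.0])]
label, scores = classify_region(observed, ref_mean, ref_sd)
print(label, scores)
```

Under the wild-type model every amplicon in this toy region lies far outside the ±3 SD band, while under the deletion model all distances stay close to 0, which mirrors how the three lines in Fig. 3 separate inside an affected region.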
Fig. 4 Bar plots of the lengths of CNVs detected by the two algorithms. x-axis: CNV length in amplicons (after QC); y-axis: CNV frequency. Top: unsupervised algorithm (number of true negatives: 805); bottom: supervised algorithm (number of true negatives: 790).
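Fig. 4 measures CNV length in amplicons. One simple way to obtain such lengths from per-amplicon calls is to merge runs of consecutive identical non-normal calls and count the run lengths. The sketch below assumes the calls are already restricted to QC-passed amplicons and sorted in genomic order; the labels "del", "dup" and "normal" and the function name `cnv_lengths` are hypothetical, and the sketch ignores gene or chromosome boundaries.

```python
from collections import Counter
from itertools import groupby


def cnv_lengths(calls):
    """Lengths (in amplicons) of runs of consecutive identical non-normal calls."""
    return [sum(1 for _ in run) for label, run in groupby(calls)
            if label != "normal"]


# Per-amplicon calls for one sample (invented example).
calls = ["normal", "del", "del", "normal", "dup", "dup", "dup", "normal", "del"]
lengths = cnv_lengths(calls)
frequencies = Counter(lengths)  # length -> number of CNVs, i.e. the bar heights
print(lengths, frequencies)
```

Pooling `lengths` over all samples of a run before building the `Counter` gives the kind of length distribution shown in the bar plots.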