Veronika Gordeeva, Elena Sharova, Georgij Arapidi.
Abstract
Copy number variations (CNVs) are the predominant class of structural genomic variations involved in evolutionary adaptation, genomic disorders, and disease progression. CNVs have been harder to detect than single-nucleotide variants because of their diverse sizes. However, the field has seen significant progress in the past 20-30 years, driven by the rapid development of molecular diagnostic methods that give a more detailed view of genome structure and complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.
Keywords: chromosome microarray analysis; copy number variation; karyotyping; long-read and short-read sequencing
Year: 2022 PMID: 35216262 PMCID: PMC8879278 DOI: 10.3390/ijms23042143
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Cytogenetic techniques: (a) karyotyping, (b) FISH, and (c) comparative genomic hybridization. Created with BioRender.com (accessed on 9 February 2022).
Figure 2. Chromosome microarray analysis: (a) array-based comparative genomic hybridization, and (b) DNA arrays for genotyping. Created with BioRender.com (accessed on 9 February 2022).
Modern platforms for chromosomal microarray analysis.
| Array Platform | Specification * | Resolution ** | Description |
|---|---|---|---|
| Agilent SurePrint G3 Human CGH | 1 × 1 M | 2.1 kb | enhanced coverage on known genes, promoters, miRNAs, PAR, and telomeric regions |
| | 2 × 400 K | 5.3 kb | |
| | 4 × 180 K | 13 kb | |
| | 8 × 60 K | 41 kb | |
| Agilent Human Genome CGH | 2 × 105 K | 35 kb | |
| | 4 × 44 K | 43 kb | |
| Agilent SurePrint G3 Human Genome CGH + SNP | 2 × 400 K | 7.2 kb | |
| | 4 × 180 K | 25.3 kb | |
| Agilent SurePrint G3 Unrestricted CGH ISCA v2 | 4 × 180 K | 25 kb | enhanced coverage on |
| | 8 × 60 K | 60 kb | |
| | 4 × 44 K | 75 kb | |
| Agilent SurePrint G3 ISCA v2 CGH + SNP | 4 × 180 K | 25.3 kb | |
| Agilent SurePrint G3 Human High-Resolution Discovery | 1 × 1 M | 2.6 kb | association studies |
| Agilent SurePrint G3 Human CNV | 2 × 400 K | 1 kb | |
| Agilent Human CNV Association | 2 × 105 K | 232 b | |
| Agilent SurePrint G3 CGH Postnatal Research | 4 × 180 K | 2.4 kb | regions identified by Baylor College of Medicine experts |
| | 8 × 60 K | 3.7 kb | |
| Agilent GenetiSure Postnatal Research CGH + SNP | 2 × 400 K | 9.8 kb | disease-associated regions (The Clinical Genome/ISCA database) |
| Agilent GenetiSure Pre-Screen | 4 × 180 K | 31 kb | CNV identification from embryo biopsies and single-cell samples; increased density on chromosomes 13, 18, 20, 21, 22, and X |
| | 8 × 60 K | 50 kb | |
| Agilent GenetiSure Cyto CGH | 4 × 180 K | 3.5 kb | disease-associated regions linked to developmental delay, intellectual disability, neuropsychiatric disorders, congenital anomalies, or dysmorphic features |
| | 8 × 60 K | 7.1 kb | |
| Agilent GenetiSure Cyto CGH + SNP | 4 × 180 K | 7.3 kb | |
| Agilent GenetiSure Cancer Research CGH + SNP | 2 × 400 K | 9.8 kb | cancer regions of the genome |
| Illumina HumanCytoSNP | 12 × 300 K | 6.2 kb | enhanced coverage of ~250 disease regions, including subtelomeric regions, pericentromeric regions, and sex chromosomes |
| Illumina Infinium CytoSNP-850 K | 8 × 850 K | 1.8 kb | comprehensive coverage of cytogenetically relevant genes for congenital disorders and cancer research |
| Illumina Infinium Core | 24 × 300 K | 5.8 kb | genome-wide tag SNPs found across diverse world populations |
| Illumina Infinium Exome | 24 × 300 K | 0.21 kb | comprehensive coverage of putative functional exonic variants (including markers representing a range of common conditions, such as type 2 diabetes, cancer, and metabolic and psychiatric disorders) |
| Illumina Infinium CoreExome | 24 × 600 K | 1.82 kb | all of the markers from the Infinium Core-24 BeadChip and the Infinium Exome-24 BeadChip |
| Illumina Infinium Global Diversity Array | 8 × 2 M | 0.63 kb | common and low frequency variants in global populations, curated clinical research variants |
| Illumina Infinium Global Screening Array | 24 × 700 K | 2.3 kb | multiethnic genome-wide content, curated clinical research variants |
| Illumina Infinium Omni2.5 | 8 × 2.4 M | 0.65 kb | common and rare SNP content from the 1000 Genomes Project (MAF > 2.5%) |
| Illumina Infinium Omni2.5Exome | 8 × 2.7 M | 0.56 kb | combined Infinium Omni2.5 and Infinium Exome-24 markers |
| Illumina Infinium Omni5 | 4 × 4.3 M | 0.36 kb | comprehensive coverage of the genome including common, intermediate, and rare SNPs |
| Illumina Infinium Omni5 Exome | 4 × 4.6 M | 0.33 kb | comprehensive genome-wide backbone combined with putative functional exonic variants |
| Illumina Infinium OmniExpress | 24 × 700 K | 2.23 kb | high coverage of common variants for |
| Illumina Infinium OmniExpressExome | 8 × 1 M | 1.36 kb | tag SNPs and functional exonic content |
| Illumina Infinium OncoArray | 24 × 500 K | 5.4 kb | genetic variants associated with five common cancers |
| Illumina Infinium PsychArray | 24 × 700 K | 1.74 kb | genetic variants associated with common psychiatric disorders |
| Affymetrix Genome-Wide Human SNP Array 6.0 | 1 × 1.8 M | 0.68 kb | comprehensive coverage of the genome |
| Affymetrix CytoScan XON Suite | 24 × 6.85 M | 0.5 kb | enhanced coverage in 7000 clinically relevant genes, exon-level copy number changes |
| Affymetrix CytoScan HD | 24 × 2.7 M | 1.3 kb | enhanced coverage of cytogenetically relevant regions |
* Samples × No. probes. ** Overall median probe spacing.
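The resolution column can be made concrete with a small calculation. The sketch below assumes that "overall median probe spacing" means the median distance between adjacent probes, measured within each chromosome and pooled genome-wide; the vendors' exact definition is not given here, and the function name and probe coordinates are hypothetical.

```python
from statistics import median


def median_probe_spacing(positions_by_chrom):
    """Return the median distance (bp) between adjacent probes.

    positions_by_chrom maps a chromosome name to probe start coordinates.
    Gaps are measured within each chromosome only, then pooled.
    """
    gaps = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two probes on one chromosome")
    return median(gaps)


# Hypothetical probe coordinates, for illustration only.
probes = {
    "chr1": [1_000, 3_000, 5_500, 7_500],
    "chr2": [10_000, 12_000, 20_000],
}
spacing_bp = median_probe_spacing(probes)
print(f"median spacing: {spacing_bp / 1000:.1f} kb")
```

Because the median is used rather than the mean, a few large gaps (for example across centromeres) barely move the reported value, so regional probe density can differ considerably from the headline figure.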
The most widely used algorithms for CNV detection that use microarray data.
| Tool | Description | aCGH | SNP array (Affymetrix) | SNP array (Illumina) | Reference |
|---|---|---|---|---|---|
| ADM-2 | search for intervals in which a Z-score based on the average weighted log ratio exceeds a user-specified threshold | ✓ | | | technical documentation (Agilent) |
| Birdsuite | integration of common CNP genotypes and CNVs discovered using HMM | | ✓ | | [ |
| ChAS | HMM on the log2 ratios processed through a Bayes wavelet shrinkage estimator | | ✓ | | technical documentation (Affymetrix) |
| cnvPartition | recursive partitioning approach based on preliminary copy number estimates | | | ✓ | technical documentation (Illumina) |
| DNAcopy | circular binary segmentation | ✓ | | | [ |
| GenoCN | estimation of HMM parameters from the data; germline and somatic modes | | ✓ | ✓ | [ |
| iPattern | normalization of the total intensities across individuals, Gaussian mixture model fitting | | ✓ | ✓ | [ |
| Nexus | rank segmentation of probe log ratios | ✓ | ✓ | ✓ | [ |
| PennCNV | HMM that also accounts for the population frequency of the B allele | | ✓ | ✓ | [ |
| QuantiSNP | objective Bayes HMM; fixed rate of heterozygosity for each SNP | | ✓ | ✓ | [ |
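The interval-search idea described for ADM-2 can be illustrated with a toy example. This is a minimal sketch, not Agilent's implementation: it omits the probe-quality weighting, and the noise estimate, the threshold value, the exhaustive O(n²) search, and all names are assumptions made for illustration.

```python
from itertools import accumulate
from math import sqrt
from statistics import median


def noise_sigma(log_ratios):
    """Robust noise estimate from probe-to-probe differences (MAD-based)."""
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    centre = median(diffs)
    mad = median(abs(d - centre) for d in diffs)
    # 1.4826 converts a MAD to a standard deviation for Gaussian noise;
    # sqrt(2) accounts for differencing two independent probes.
    return 1.4826 * mad / sqrt(2)


def best_interval(log_ratios, threshold=6.0):
    """Return (start, end, z) for the interval with the largest |Z|.

    Z(i, j) = sum(log_ratios[i:j]) / (sigma * sqrt(j - i)); end is exclusive.
    Returns None when no interval reaches the threshold.
    """
    sigma = noise_sigma(log_ratios)
    if sigma == 0:
        raise ValueError("noise estimate is zero; cannot compute Z-scores")
    prefix = [0.0, *accumulate(log_ratios)]
    best = None
    n = len(log_ratios)
    for i in range(n):
        for j in range(i + 1, n + 1):
            z = (prefix[j] - prefix[i]) / (sigma * sqrt(j - i))
            if abs(z) >= threshold and (best is None or abs(z) > abs(best[2])):
                best = (i, j, z)
    return best


# Hypothetical log2 ratios with a gain spanning probes 5-9.
log_ratios = [0.02, -0.03, 0.01, -0.01, 0.03,
              0.55, 0.60, 0.52, 0.58, 0.57,
              -0.02, 0.01, -0.04, 0.02, 0.00]
print(best_interval(log_ratios))
```

Dividing the summed log ratio by the square root of the interval length keeps long, weak aberrations and short, strong ones on a comparable scale, which is why a single threshold can be applied to intervals of any size.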
Figure 3. Approaches to CNV detection using sequencing data. Created with BioRender.com (accessed on 9 February 2022).
The most widely used algorithms for whole-exome and targeted sequencing data.
| Tool | Description | Data | Mode | Reference | ||
|---|---|---|---|---|---|---|
| WES | Targeted | Germline | Somatic | |||
| cn.MOPS | mixture Poisson model and Bayes approach | ✓ | ✓ | ✓ | ✓ | [ |
| CNVkit | in- and off-target regions, rolling median bias correction, CBS | ✓ | ✓ | ✓ | [ | |
| CODEX | log-linear decomposition-based normalization, Poisson likelihood-based segmentation | ✓ | ✓ | ✓ | ✓ | [ |
| CoNIFER | singular value decomposition-based normalization, ± 1.5 SVD-ZRPKM threshold | ✓ | ✓ | [ | ||
| CoNVaDING | ratio scores and Z-scores of the sample of interest compared to the selected control | ✓ | ✓ | [ | ||
| DECoN | ExomeDepth modification (the distance between exons is taken into account) | ✓ | ✓ | [ | ||
| ExomeDepth | beta-binomial distribution, optimized reference set, HMM | ✓ | ✓ | ✓ | [ | |
| XHMM | principal component analysis normalization, HMM | ✓ | ✓ | [ | ||
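Several of the tools above compare per-target read depth in a test sample with a set of controls. The sketch below illustrates the ratio-score and Z-score idea mentioned for CoNVaDING under simplifying assumptions: library-size normalization by total count, a fixed control panel, and hypothetical thresholds. It is not the published implementation, and all names and counts are invented for illustration.

```python
from statistics import mean, stdev


def normalize(counts):
    """Scale per-target read counts so they sum to 1 within a sample."""
    total = sum(counts)
    return [c / total for c in counts]


def score_targets(sample_counts, control_counts):
    """Return a (ratio, z) pair per target for a sample against controls."""
    sample = normalize(sample_counts)
    controls = [normalize(c) for c in control_counts]
    scores = []
    for t, value in enumerate(sample):
        ref = [c[t] for c in controls]
        mu, sd = mean(ref), stdev(ref)
        scores.append((value / mu, (value - mu) / sd))
    return scores


def call_targets(scores, loss_ratio=0.7, gain_ratio=1.3, z_cut=3.0):
    """Call a target only when the ratio and the Z-score agree."""
    calls = []
    for t, (ratio, z) in enumerate(scores):
        if ratio <= loss_ratio and z <= -z_cut:
            calls.append((t, "loss"))
        elif ratio >= gain_ratio and z >= z_cut:
            calls.append((t, "gain"))
    return calls


# Hypothetical read counts for six targets; target 1 is halved in the sample.
controls = [
    [100, 200, 150, 50, 120, 180],
    [110, 190, 160, 45, 125, 170],
    [95, 210, 140, 55, 115, 185],
    [105, 195, 155, 52, 118, 175],
]
sample = [101, 99, 151, 50, 121, 178]
print(call_targets(score_targets(sample, controls)))
```

Total-count normalization is crude: a large deletion slightly inflates every other target, so some unaffected targets can reach a high Z-score on their own. Requiring both the ratio and the Z-score to pass is one way to suppress such calls.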
Combinations of approaches to the analysis of whole-genome sequencing and the most frequently implemented algorithms.
| Approach | Tool | Description | Reference |
|---|---|---|---|
| RP | BreakDancer | search for regions that include more anomalous read pairs than expected | [ |
| SR | Pindel | pattern growth approach for breakpoint identification | [ |
| RD | CNVnator | mean-shift technique, multiple-bandwidth partitioning, and GC correction | [ |
| AS | Cortex | bubble-calling in the colored de Bruijn graph | [ |
| RP + RD | GenomeSTRiP | connected components algorithm for read pair clustering, Gaussian mixture model for read depth genotyping | [ |
| RP + SR | DELLY | graph-based paired-end clustering, breakpoint refinement using split-read alignment | [ |
| RP + AS | Hydra | assembly of discordant mate pairs, followed by alignment to the reference genome with MEGABLAST | [ |
| RP + SR + AS | Manta | breakend graph construction; variant hypotheses refined independently for each edge and scored with a diploid model | [ |
| RP + SR + RD | Lumpy | probabilistic representation of an SV breakpoint | [ |
| Ensemble | MetaSV | merging calls from tools (BreakDancer, CNVnator, BreakSeq, Pindel), breakpoint refinement by aligning the assembled CNV regions | [ |
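For the read-depth (RD) approach, GC correction is the step that is easiest to show in isolation. The sketch below rescales each bin by the ratio of the global median depth to the median depth of bins with similar GC content, then flags bins by log2 ratio. It is a simplified illustration, not CNVnator's code: the mean-shift partitioning is omitted, and the stratum width, thresholds, names, and data are assumptions.

```python
from collections import defaultdict
from math import log2
from statistics import median


def gc_correct(depths, gc_fractions, gc_step=0.05):
    """Rescale each bin by global median depth / median depth of its GC stratum."""
    strata = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        strata[round(gc / gc_step)].append(depth)
    global_median = median(depths)
    stratum_median = {key: median(values) for key, values in strata.items()}
    return [
        depth * global_median / stratum_median[round(gc / gc_step)]
        for depth, gc in zip(depths, gc_fractions)
    ]


def log2_ratios(corrected):
    """Log2 ratio of each corrected bin to the median corrected depth."""
    baseline = median(corrected)
    return [log2(depth / baseline) for depth in corrected]


def call_bins(ratios, loss_cut=-0.6, gain_cut=0.4):
    """Flag bins whose log2 ratio passes hypothetical loss/gain cut-offs."""
    calls = []
    for index, ratio in enumerate(ratios):
        if ratio <= loss_cut:
            calls.append((index, "loss"))
        elif ratio >= gain_cut:
            calls.append((index, "gain"))
    return calls


# Hypothetical bins: depth varies with GC content, and bin 6 has half the
# depth of other bins in its GC stratum.
depths = [24, 25, 23, 30, 31, 29, 15, 30, 20, 21, 19]
gc = [0.35, 0.35, 0.35, 0.45, 0.45, 0.45, 0.45, 0.45, 0.60, 0.60, 0.60]
ratios = log2_ratios(gc_correct(depths, gc))
print(call_bins(ratios))
```

Without the correction, the low-depth bins in the GC-rich stratum would sit close to the heterozygous-loss bin in raw depth; after rescaling, only the bin that is low relative to its own GC stratum stands out.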