DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects.

Ondrej Pös¹, Jan Radvanszky², Gergely Buglyó³, Zuzana Pös⁴, Diana Rusnakova¹, Bálint Nagy⁵, Tomas Szemes⁶.

Abstract

Copy number variants (CNVs) have been the subject of extensive research in recent years. They are common features of the human genome that play an important role in evolution, contribute to population diversity and to the development of certain diseases, and influence host-microbiome interactions. CNVs have found application in the molecular diagnosis of many diseases and in non-invasive prenatal care, but their full potential is only emerging. CNVs are expected to have a tremendous impact on screening, diagnosis, prognosis, and monitoring of several disorders, including cancer and cardiovascular disease. Here, we comprehensively review basic definitions of the term CNV, outline mechanisms and factors involved in CNV formation, and discuss their evolutionary and pathological aspects. We suggest a need for better defined distinguishing criteria and boundaries between known types of CNVs.

Keywords: CNV formation; Copy number variants; Evolution; Genetic diseases; Human genome; Structural variation

Year: 2021 PMID: 34649833 PMCID: PMC8640565 DOI: 10.1016/j.bj.2021.02.003

Source DB: PubMed Journal: Biomed J ISSN: 2319-4170 Impact factor: 4.910

Copy number variation (CNV) is a general term used to describe a molecular phenomenon in which sequences of the genome are repeated, and the number of repeats varies between individuals of the same species. Biological roles of resulting copy number variants (CNVs) range from seemingly no effect on common variability of physiological traits [1], through morphological variation [2,3], altered metabolic states [4], susceptibility to infectious diseases [5,6], and host–microbiome interactions [7–9], to a substantial contribution to common and rare genetic disorders/syndromes [10]. As such, they have a high potential to contribute to human population diversity [11] and also to micro- and macro-evolutionary processes [12]. In addition to their biological roles, their presence in our genomes may have several technical implications in biomedicine, either as biomarkers for certain pathological processes such as cancer, as biomarkers of environmental exposures such as radiation [13], or even as potential confounding factors when evaluating results of certain genetic diagnostic tests [14]. While it is not yet well described how many genes are absent from human reference genomes, approximately 100 genes were found to be homozygously deleted from the genomes of human individuals without causing apparent phenotypic consequences, likely due to the presence of redundant paralogs, the genes being limited to causing age-related phenotypes or being relevant only under certain environmental or physiological conditions [15]. These findings suggest that the field of pangenomics may open its doors to human pan-genomes [16] in addition to more commonly mentioned bacterial, archaeal, and plant pan-genomes [17]. Much effort has also been made to study the genomes of livestock and domestic animals in the context of CNV-associated, economically important traits [18,19]. 
These variants overlap genomic regions associated with traits such as feed conversion ratio [20], meat quality [21], milk production [22,23], and animal health [24], and thus may result in significant economic losses due to reduced production and quality of commodities or decreased commercial value of affected animals. CNVs in the form of large insertions and deletions were reported among the first genetic “mutations” ever [25], well before the description of DNA structure and the birth of molecular biology. When searching PubMed for the term “copy number variation”, the returned list contains 4759 results in a date-range from 1983 to 2020, while limiting the search to “humans” produces 3047 results dated between 1991 and 2020. As visible in publication timelines [Fig. 1], a sharp increase in publication numbers began around the year 2005, partly due to the term “copy number variation” getting widely adopted, but also to the research interest that CNVs started to gain at the time, resulting in rapidly accumulating knowledge in the field. Whether plateauing publication numbers following the year 2014 anticipate a permanent trend cannot be predicted at the moment; however, it does seem to suggest that CNVs are being readily prepared for routine biomedical applications. One may argue that clinical applications of CNV detection, particularly in the field of oncogenetics and severe congenital anomalies, were among the first in the history of genetic testing and are still in general use. However, considering the technical innovations now allowing population-scale genome-wide CNV screening, together with the still widening spectrum of known biological roles of CNVs, we have yet to take full advantage of CNV detection in routine clinical care [26]. 
In favor of this point-of-view, official systematic guidelines on the interpretation of CNVs and their classification by clinical impact were issued by the American College of Medical Genetics and Genomics for the first time only in 2019 [27]. Because of their important biological roles, relevant technical impact, and several associated uncertainties, we believe that the field of CNV research and application urgently calls not only for further investigation, but also for regular review of available knowledge and for constant revision of relevant definitions and classifications.

Fig. 1

Publication records. PubMed search for the term “copy number variation” as of July 24, 2020. Limiting results to the year 2004 reduced the number of entries considerably. Excluded entries are highlighted in grey; (A) no filter applied; (B) applying a built-in species-specific filter for “humans”.

In this review, we outline basic definitions of the term CNV and discuss general perception of copy number variants, along with mechanisms and factors involved in their formation, their evolutionary aspects, and their influence on phenotype and disease.

Definitions, perception, and classification of CNVs

The first association of a CNV with a phenotype reported in a non-human species was the case of a reduced-eye mutant Drosophila melanogaster having the bar eyes phenotype [25] due to a single duplication of the Bar gene [28]. After the “general” human karyotype was established [29], reports of microscopically visible chromosomal aberrations in the human genome emerged as early as the 1960s, when cytogeneticists recognized the genetic background for many disorders, including Cri du chat syndrome associated with a partial deletion of the short arm of chromosome 5 [30]. For decades, due to a limited resolution of microscopy-based cytogenetic methods, reports were mostly limited to large pathogenic abnormalities involving several genes and having prominent adverse effects on physiological processes. Although large structural changes including insertions and deletions were among the first genetic anomalies recognized, the concept of CNV as we perceive it today originates from less than two decades ago [Fig. 2], when large-scale differences between human genomes, previously considered rare mutational events, were shown to be common among normal human individuals, forming a considerable part of our intraspecies physiological variability [31,32]. It is estimated that around 4.8–9.5% of the genome is affected by CNVs [15], a larger fraction than by single nucleotide variants [11].

Fig. 2

Timeline of CNV research. It includes research milestones (orange) in the context of evolving methods for the assessment of CNVs (green) and the minimal length of variants to be considered as CNVs at a given time (blue). Main trends of CNV research are visible, e.g., the relatively wide time-frame following the first descriptions of large CNVs around the beginning of the 20th century. This nearly century-long phase was mainly about the development of methods, which finally allowed genome-wide, high-resolution CNV detection around the beginning of the 21st century, leading to the recognition of common features of CNVs and a subsequent shift in their definition, allowing shorter and shorter variants to be considered CNVs.

Although it is generally accepted that CNVs are a subtype of structural genome variants, their definition is still somewhat vague. In addition to basic criteria about the repeating nature and numerical variability among individuals, a “considerable” length is usually also required. Earlier studies defined a CNV as a DNA segment larger than 1000 bp [33], followed by ∼100 bp [1], but currently, the size of CNVs is defined from 50 bp [34] to several Mb [35]. Despite CNVs being insertions or deletions, it is also debated in the literature whether a distinction should be made between indels and CNVs based on their size [36]. We did not find suggested criteria for the extent of homology between sequences to be considered CNVs, most likely because commonly used methods other than sequencing are not suitable to determine such a sequence homology. Mechanisms of formation are sometimes considered for distinction criteria, but there is no unambiguous relationship between the length of an insertion or deletion and its mechanism of formation, as models may be shared across different size-ranges [37]. Classification of identified variants is generally a routine procedure with no time or possibility to determine the exact mechanisms behind their formation. 
A discussion seems justified about whether there is any real reason to restrict the term CNV to variants having a “considerable length” instead of applying it to all variants meeting core criteria with no regard to molecular mechanisms or to the number and length of repeated elements. For the reasons above, we favor the definition of “copy number variants” used for unbalanced structural rearrangements of the genome, which lead to variability, i.e., relative difference in copy numbers of particular DNA sequences among individuals of the same or distinct populations of a species. In line with this, when considering the definition in the light of the fact that there is no such thing as a “standard genome” [38], the term “copy number variant” is best applied in a general sense to describe variants contributing to “copy number variation” or “copy number variability”, while duplications, insertions, and deletions may be considered as their particular molecular phenotypes. Terms such as “insertion”, “deletion”, “gain” or “loss” are most suitable to describe relative differences: i) against an artificial reference genome when the variant is de novo, or even if it is present in the population and was inherited from an ancestor with no way to determine the original copy number; ii) against a parental genome, when a de novo variant occurs in an offspring; or iii) against a germline genome of the individual, when a de novo somatic variant occurs; in each case with an aim to characterize the type of CNV more precisely. Complying with these arguments, the list of variants belonging to CNVs can be extended (and is already extended by some authors) with several other variant types. On one end of the spectrum, we find changes involving entire sets of chromosomes and aneuploidies (i.e., numerical abnormalities defined as a loss or gain of an entire chromosome), as well as structural abnormalities represented as non-balanced chromosome rearrangements [39]. 
The opposite end of the spectrum may range from interspersed and tandem repeats, including microsatellites, minisatellites, and macrosatellites, to few nucleotide insertions and deletions. Perhaps the most common and ancient benign CNV is one that is usually not listed as a CNV at all: the XY sex-determination system in humans and many other species, maintained through a strong balancing selection and determining extensive phenotypic differences between individuals of the same population. Interspersed repeated sequences are generally composed of low-copy repeats, also called segmental duplications, which have greater than 95% sequence identity, and high-copy repeats, which include endogenous retroviruses, retrotransposons, and other transposable or mobile elements. Retrotransposons encompass long terminal repeat (LTR) and non-LTR retrotransposons, the latter comprising the well-known short interspersed nuclear elements (SINEs, such as the Alu element) and long interspersed nuclear elements (LINEs, such as L1). Low and high copy number repeats may act as mediators of the formation of other, larger CNVs, apart from being CNVs themselves. Although the current classification of CNVs is satisfactory, distinguishing criteria and boundaries between different types of variants are not clearly defined, leading to some variants meeting criteria of more than one category at the same time. For core parts of this review, we focus on the conventional concept of CNVs, but where relevant, we discuss variants more difficult to classify.
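
The shifting length criterion discussed in this section can be illustrated with a minimal sketch. It assumes that each historical definition can be reduced to a single inclusive minimum length in bp (1000, 100, and 50, as quoted above); the labels, function names, and the inclusive comparison are illustrative assumptions, not part of any published definition.

```python
# Minimum lengths (bp) quoted in the text for calling a variant a CNV.
# The labels are illustrative; treating each limit as inclusive is an assumption.
MIN_CNV_LENGTH_BP = {
    "early (1000 bp)": 1000,
    "intermediate (~100 bp)": 100,
    "current (50 bp)": 50,
}


def is_cnv_by_length(length_bp: int, threshold_bp: int) -> bool:
    """Return True if an unbalanced variant reaches the given minimum length."""
    return length_bp >= threshold_bp


def classify(length_bp: int) -> dict:
    """Evaluate one variant length against every historical threshold."""
    return {
        label: is_cnv_by_length(length_bp, threshold)
        for label, threshold in MIN_CNV_LENGTH_BP.items()
    }


# A hypothetical 300 bp deletion: an indel under the earliest criterion,
# but a CNV under the two later ones.
print(classify(300))
```

The same variant therefore changes category depending only on which threshold is in force, which is the ambiguity that the size-independent definition favored above would avoid.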

Mechanisms of CNV formation

One of the main questions in understanding the biology of CNVs is connected to their origin and mechanisms of formation. To date, several different replication and repair mechanisms have been shown to be involved in the development of CNV. Apart from genomic factors, chemical and physical mutagens also drive their formation.

Genomic factors and mechanisms of CNV formation

CNVs emerge from different mutational mechanisms, including DNA recombination-, replication- and repair-associated processes. Mechanisms of change in gene copy number have been extensively studied through the analysis of CNV breakpoint junction sequences. Repeated sequences, including low-copy repeats (e.g., segmental duplications) and high-copy repeats (e.g., SINEs, LINEs, and endogenous retroviruses), are enriched in the vicinity of breakpoints, thus represent an important factor for CNV instability [40]. Such sequence motifs play a key role in triggering non-allelic homologous recombination (NAHR), one of the general mechanisms involved in the formation of recurrent CNVs [41]. Recurrent rearrangements share a common size, show clustering of breakpoints, and recur in multiple individuals. On the other hand, non-recurrent rearrangements with scattered breakpoints differ in size, but may share the smallest region of overlap among different patients [Fig. 3] [42] and might be formed by several different mechanisms: i) non-replicative mechanisms, including non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ); or ii) replicative mechanisms such as replication slippage, fork stalling and template switching or microhomology-mediated break-induced replication [41].
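
The distinction between recurrent and non-recurrent rearrangements can be sketched computationally. The example below assumes that each CNV is given as a closed (start, end) interval in bp on the same chromosome; the coordinates, the function names, and the clustering tolerance are hypothetical and serve only to illustrate the smallest region of overlap and the idea of clustered breakpoints.

```python
from typing import List, Optional, Tuple

# A CNV as (start, end) in bp on one chromosome, with start <= end.
Interval = Tuple[int, int]


def smallest_region_of_overlap(cnvs: List[Interval]) -> Optional[Interval]:
    """Return the segment shared by all intervals, or None if they share none."""
    if not cnvs:
        return None
    start = max(s for s, _ in cnvs)
    end = min(e for _, e in cnvs)
    return (start, end) if start <= end else None


def breakpoints_cluster(cnvs: List[Interval], tolerance_bp: int) -> bool:
    """True if all start and all end breakpoints lie within tolerance_bp of each other."""
    if not cnvs:
        return False
    starts = [s for s, _ in cnvs]
    ends = [e for _, e in cnvs]
    return (max(starts) - min(starts) <= tolerance_bp
            and max(ends) - min(ends) <= tolerance_bp)


# Hypothetical deletions from three patients (invented coordinates).
patients = [(100_000, 900_000), (300_000, 1_200_000), (250_000, 700_000)]
print(smallest_region_of_overlap(patients))               # shared segment
print(breakpoints_cluster(patients, tolerance_bp=10_000))  # scattered breakpoints
```

In this toy input the breakpoints are scattered, as expected for non-recurrent events, yet all three deletions still share a common segment. Real analyses would also need to account for breakpoint uncertainty of the detection method, which this sketch ignores.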

Fig. 3

Recurrent vs. non-recurrent rearrangements. Recurrent CNVs have the same size and common breakpoints enriched in low-copy repeats (LCRs). Non-recurrent CNVs with different sizes may share the smallest region of overlap (SRO). CNVs may occur as inherited or de novo events. Inherited CNVs are not recurrent events but always share the same breakpoints, resulting in a similar phenotypic effect. De novo CNVs are independently arising events that may have recurrent or distinct (non-recurrent) breakpoints. Even CNVs with non-recurrent breakpoints may show overlapping effects, as they disrupt the same SRO.

Short repetitive sequence motifs (e.g., inverted repeats) may adopt non-B DNA structures (e.g., cruciforms) [43] that may result in replication errors inducing CNV formation [40]. Such non-B-DNA forming sequences are also enriched in promoter regions, thus Conrad et al. suggested that the same properties that enable regulation of transcription may also be mutagenic for the formation of CNVs [44]. Consequently, CNVs may influence the evolution of gene regulation. In addition to accurate replication of DNA during cell division, its equal distribution into daughter cells is essential in genome maintenance, specifically in ensuring the balance of genetic content in eukaryotic cells [45]. However, errors such as nondisjunction, merotelic kinetochore-microtubule attachment, monopolar and multipolar spindle, and telomerase dysfunction may occur during this high-precision process. Nondisjunction is caused by weakened or completely inactive mitotic checkpoints. If a chromosome fails to separate correctly, it results in one daughter cell with a missing copy of the chromosome and the other one having an extra copy. If mitotic checkpoints are completely inactive, it causes the two daughter cells to have a disjoint set of chromosomes. Unbalanced chromosome segregation occurs when one kinetochore is attached to the microtubules emanating from both poles of the spindle [46]. 
In the case of dysfunction or absence of telomerase, the chromosome ends become uncapped. This condition triggers activation of the NHEJ DNA repair pathway in the cell, leading to a telomere fusion between the two chromosomes and the formation of a dicentric chromosome. Structural aneuploidy may occur by several mechanisms, including incorrect NAHR (the same mechanism that causes CNVs), misalignment of homologous chromosomes and/or unequal crossing over between non-sister chromatids during meiosis [39]. Such structural abnormalities may subsequently affect the development of chromosomal aneuploidies, suggesting a link between them. When considering tandem repeats, micro-, mini-, and macrosatellites all display notable somatic and intergenerational dynamics leading to copy number changes. Microsatellites are typically altered during DNA repair and replication in mitosis mainly due to slippage by strand misalignment, primarily accounting for small-scale instability [47]; however, unequal crossing over was also suggested to cause variable copy numbers of microsatellite motifs [48]. In the case of disease-associated repeats, which may gain up to a few thousand repeat units in affected individuals, the mechanisms of expansion are different and may depend on the repeat sequence, length and location within a genome as well as the organism or cell type. They likely involve transient DNA secondary structures formed by the repetitive tracts as DNA unwinds for replication, transcription, or repair, during which these secondary structures may get excised or incorporated into the DNA, resulting in a reduction or expansion of the repetitive region [47]. These mechanisms are generally involved in expanding copy numbers over certain thresholds at single loci and do not lead to general microsatellite instability. 
Mechanisms may further be complemented by an impairment of the mismatch repair system leading to general hypermutability of microsatellite loci throughout the genome, called microsatellite instability, most commonly due to a failure to correct common errors during replication of repetitive DNA sequences [49]. While a link was reported between microsatellite instability and telomere shortening suggesting some association between the DNA mismatch repair system and telomere maintenance mechanisms, at least in colorectal cancer [50], telomeres per se may represent atypical mechanisms of CNV generation. Telomeres contain TTAGGG hexanucleotide repeats protecting the genetic content of chromosomes against chromosome shortening during cell division. In an active compensatory mechanism known as telomere length maintenance, telomerase increases copy numbers by adding TTAGGG repeat units to the ends of telomeres [51]. Minisatellites, on the other hand, generally mutate in the germline by complex conversion-like transfer of repeats between alleles. Inter-allelic unequal crossovers also occur between loci, although at low frequency [52].
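
Copy number changes in tandem repeats, such as microsatellite motifs or the telomeric TTAGGG unit, can be illustrated by counting consecutive repeat units in a sequence. The following is a minimal sketch, assuming perfect, uninterrupted repeats and plain DNA strings; the function name and the example alleles are hypothetical, and real repeat genotyping has to tolerate interruptions and sequencing errors.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # Match one or more consecutive copies of the motif (case-insensitive input).
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = (
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    )
    return max(runs, default=0)


# Two invented alleles that differ only in the number of TTAGGG units.
allele_a = "ACG" + "TTAGGG" * 3 + "AC"
allele_b = "ACG" + "TTAGGG" * 5 + "AC"

copies_a = longest_tandem_run(allele_a, "TTAGGG")
copies_b = longest_tandem_run(allele_b, "TTAGGG")
print(copies_a, copies_b, copies_b - copies_a)
```

The difference in unit counts between the two alleles is the kind of copy number change that slippage, unequal crossing over, or telomerase activity would produce at such loci.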

Environmental factors contributing to the formation of CNVs

The contribution of environmental factors to the origin of CNVs is poorly understood. It was shown that perturbing DNA replication by chemical mutagens (e.g., aphidicolin or hydroxyurea) results in replication stress that induces the formation of CNVs across the human genome. Replication fork delay and subsequent error-prone repair seem to be important mechanisms in the process [53]. Hastings et al. proposed that under stress conditions, the repair of broken replication forks switches from high-fidelity homologous recombinational repair to non-homologous repair promoting CNVs [41]. Physical mutagens such as low-dose ionizing radiation also effectively induce de novo CNV mutations via a replication-dependent mechanism, since radiation-induced DNA strand breaks may collapse the replication fork [13]. It was shown that radiation induces duplications and deletions equally, while chemical mutagens induce copy number losses at a higher frequency compared to gains. On another note, radiation impacts random loci of the genome, while chemical mutagenesis targets specific genomic regions [54]. Costa et al. showed a 1.5-fold increase in the germline CNV mutation burden in offspring of parents accidentally exposed to a low dose of ionizing radiation. The mutation rate of de novo CNVs was proven to be a biomarker of parental exposure useful in retrospective studies of human populations exposed to low doses of ionizing radiation [55]. It was also demonstrated that irradiation with laser-driven electrons may induce CNVs in human leukocytes in vitro. These CNVs are usually duplications or amplifications and tend to inversely correlate with chromosome size and gene density. CNVs may persist in the cell population as stable chromosomal changes for several days after exposure [56].

Evolutionary aspects of CNVs

As excessively static genomes are not able to conform to an ever-changing environment, genome plasticity driven by insertion sequence elements, transposons and integrons, together with DNA rearrangements, may determine whether an organism can survive changing conditions and compete for the resources it needs to reproduce, at least in prokaryotes. However, the main players of prokaryote genome plasticity, including CNV-associated mechanisms mentioned in the previous section, are also active in humans. Although changes to the genome, and thus to the phenotype, may threaten the ability to survive, gaining new phenotypes may also enhance chances of surviving even in previously disadvantageous environments [57]. Both are possible scenarios in the context of the extent of change and environmental pressure, which define whether individuals fit the actual environment. CNVs, like other changes in DNA, create phenotypes in which descendants slightly or largely differ from their predecessors. Small differences in phenotypes are generally considered normal variation, especially when relatively common in the population with small changes in physiology, i.e., benign traits or even common diseases and intolerances. Larger deviations, on the other hand, may be perceived as reproductive losses, malformations, or genetic/genomic disorders, which may result from large CNVs that are both rare and deleterious, as they tend to contain dosage sensitive genes [41]. It was shown that the number of events is steadily decreasing with increased CNV length, so there are many more small (common) CNVs than large (deleterious) ones [58].

CNVs in the light of evolutionary mechanisms

It was reported in several organisms that genes having tissue-specific expression tend to be more variable in copy numbers than widely expressed genes, which might have housekeeping functions [59–61]. The most commonly cited examples of evolutionarily important CNVs in humans fit well into this concept, including the CNV-related evolution of trichromatic vision [62], hemoglobins and myoglobins [63], and olfactory genes [64]. The gene family of amylase, a starch-digesting enzyme, may serve as a multipurpose example in understanding CNV-mediated evolutionary processes, from a mechanistic view of the generation of complex genomic features and novel functions to the explanation of certain adaptation models. The study of Axelsson et al. showed a 7.4-fold average increase of AMY2B copy numbers in dogs relative to wolves. The change correlates with an increase in both expression level and enzyme activity, indicating that duplications of the alpha-amylase locus conferred a selective advantage to early ancestors of modern dogs. An increase in amylase activity allowed them to thrive on a diet rich in starch compared to the carnivorous diet of wolves and constituted a crucial step in early dog domestication [65]. Mammals, in general, produce amylase in their pancreas, while primates, rodents, and lagomorphs also show salivary expression of the enzyme. The GRCh38/hg38 assembly of the human reference genome contains two pancreatic (AMY2A, AMY2B) and three salivary amylase genes (AMY1A, AMY1B, AMY1C), arrayed close to each other on chromosome 1, with AMY1B in the opposite orientation [66]. These genes were reported to be present in highly variable copy numbers in humans, ranging from 2 to 18 copies with an average of 6 copies per person [67]. 
Average copy numbers of AMY1 were shown to be higher in populations with high-starch diets compared to those with traditionally low-starch diets, suggesting that when starch became a prominent component of the human diet, it led to the positive selection of the AMY1 gene [66]. Evolution of this human multigene family likely involved several steps in primate, and later in human evolutionary lineages, with mechanisms involving unequal, homologous, and inter- and intrachromosomal crossovers [68]. Among typical examples of balancing selection maintaining CNVs in the population, the hemoglobin genes and their association to two severe human diseases, thalassemia and malaria are usually mentioned [69]. While homozygous deletions of one of the α-globin genes cause α-thalassemia on one arm of the balance, heterozygous deletions protect their carriers against malaria on the opposite arm, leading to a correlating frequency of α-thalassemia and malaria prevalence in human populations of malaria-endemic countries [70]. Such balancing selection may shape the distribution of certain CNVs for generations, and they may be further maintained by other selective agents in later generations. For such cases, CNVs involving the immunoregulatory and inflammatory cytokine CCL3L1 may be used as models in which higher copy numbers were found to correlate with a lower risk of HIV infection and AIDS progression [71], but CCL3L1 copy numbers may also be reduced due to their association with other severe diseases such as systemic lupus erythematosus [72], strongly suggesting that certain CNVs may have been subject to highly dynamic and heterogeneous forms of evolutionary pressure [69].

Duality of evolution and disease

The mechanisms by which CNVs contribute to evolution but also cause disease are highly similar and share a common basis. The most commonly mentioned mechanisms are gene duplications or multiplications, which are considered to be essential sources of evolutionary innovation, as redundant gene copies may acquire new functions. If the multiplied gene is not dosage sensitive, so fitness is not reduced, one of the copies may keep its original function while the other one may escape selective pressure and silently undergo continuous changes by mutation. Although the original copy (or even a functional single-copy gene) may also undergo slight changes over generations, duplicated genes have a tendency to accumulate mutations faster. Some other CNV-related processes acting on disease and evolution [41] include: i) direct influence on the expression of gene products leading to changing levels of a protein, e.g., by changing the copy number of the coding sequence itself, as the gene and its regulatory regions may both be encompassed by the CNV; ii) direct influence on expression levels, or on localization or timing of expression through alteration – including creation or disruption – of regulatory regions when the coding sequence itself is not encompassed by the CNV; or iii) acquisition of new functions by forming new or modified products – or altered localization or timing of expression of pre-existing products – through recombination of functional domains of different genes. Comparative analyses of human and chimpanzee genomes helped us understand the evolutionary significance of CNVs. Approximately one-third of the CNVs observed in the human genome, including some human disease-causing duplications, are not duplicated in chimpanzees [12]. It seems that the evolution of the mammalian genome during primate speciation has led to a genome architecture predisposing some regions to rearrangements, which also resulted in genomic disorders [73]. 
Genes that have likely evolved under purifying or positive selection for copy number changes have been identified, particularly those with inflammatory response functions such as APOL1, APOL4, CARD18, IL1F7, and IL1F8, which are deleted in chimpanzees. Moreover, a copy number loss of the oncogene TBC1D3 involved in cell proliferation was observed in the chimpanzee compared with the human reference genome [74]. In conclusion, although there is evidence that altered copies of specific genes offer a selective advantage, many variations in copy numbers are disadvantageous as they are involved in the formation and progression of several pathological conditions [41].

Pathological aspects of CNVs

As we mentioned above, many variations of the human genome, including variations in gene copy number, play important roles in human health and disease. Since CNVs often span across a large number of genes and regulatory regions, many of which are biomedically actionable and/or included in the OMIM database, pathogenic CNVs have been found to cause genomic disorders with Mendelian inheritance [75] or to be associated with complex, multifactorial diseases [76] including cancer [77], cardiovascular [78,79], autoimmune [80], neurodevelopmental and neurodegenerative disorders [81,82]. When considering aneuploidies and short tandem repeats (STRs) as CNVs, this list may be extended by aneuploidy-associated syndromes such as Down, Edwards, Patau, Klinefelter, or Turner [83], by repeat expansion disorders, which are typically severe neurodegenerative or neuromuscular conditions [84], and by many types of cancer that involve microsatellite instability [85] or a change in telomere length [86].

Pathomechanisms

CNVs are prevalent in both coding and non-coding regions of the genome. They can directly affect a coding sequence and cause disruption of gene function or alter gene dosage leading to the development of a disorder [87]. However, CNVs that encompass only non-coding regions may also have a functional impact on the human genome by a number of possible mechanisms. It was shown that CNVs may influence tissue-level gene expression through harboring small non-coding RNAs (sncRNAs) [88]. Studies have found that CNV-sncRNAs include miRNA [89], snoRNA [90], piRNA [91], and tRNA [92], and CNVs harboring long non-coding RNAs have also been reported [93]. CNVs may alter chromatin interaction domains, also known as topologically associated domains, which may disrupt spatial organization of the genome and result in pathogenic phenotypes [94]. Expansions of microsatellites – sometimes considered as CNVs – represent additional pathogenic mechanisms such as the formation of extremely stable aggregates of mRNA or protein [95]. Proteins may exert a cumulative direct toxic effect both after canonical translation and non-canonical non-ATG initiated translation [96]. At the RNA level, well-known indirect pathogenic effects act through an expanded RNA-repeat-mediated depletion or activation of regulators of various cellular functions, such as alternative splicing [97]. Telomeres and the highly dynamic nature of their repeat numbers also represent important problems, including their continuous shortening over a lifetime contributing to aging and their escape from shortening in cancerous cell lines [86]. Another conventional, although indirect cause of genomic rearrangement is the unmasking of recessive mutations or functional polymorphisms when a deletion occurs [98]. In contrast, duplication of a normal gene copy on one chromosome may mask a disease-causing mutation on the other chromosome, resulting in a healthy phenotype [99]. 
Copy number of the SMN2 gene may indirectly modify severity in spinal muscular atrophy (SMA), a disease caused by the homozygous deletion of the SMN1 gene. The gene SMN2 differs from SMN1 in a substitution causing exclusion of exon 7 during splicing, resulting in a truncated and unstable protein for most (∼85%) SMN2 transcripts. Although the absence of SMN2 alone does not cause SMA, its copy number may alter the SMA phenotype: increased SMN2 copy numbers were present in patients with milder (SMA type III) phenotype compared to patients with more severe SMA type I [100]. However, SMN2 copy number is not a definitive modifier of SMA severity, and other modifying factors need to be taken into account for disease characterization [101].
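The dosage reasoning behind this modifier effect can be illustrated with simple arithmetic. The sketch below is a toy model, not a clinical predictor: it takes the approximately 85% exon 7 exclusion rate from the text and additionally assumes (hypothetically) that every gene copy is transcribed equally and that contributions are strictly additive. The function name and constant are introduced here for illustration only.

```python
# Share of SMN2 transcripts that retain exon 7 (the text gives ~85% exclusion).
FULL_LENGTH_FRACTION_SMN2 = 0.15


def relative_full_length_smn(smn1_copies: int, smn2_copies: int) -> float:
    """Full-length SMN transcript output, in units of one SMN1 copy.

    Assumptions (hypothetical, for illustration): equal transcription per
    gene copy and purely additive contributions of all copies.
    """
    if smn1_copies < 0 or smn2_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    return smn1_copies * 1.0 + smn2_copies * FULL_LENGTH_FRACTION_SMN2


# Homozygous SMN1 deletion with increasing SMN2 copy number.
for n in (1, 2, 3, 4):
    print(n, round(relative_full_length_smn(0, n), 2))
```

Under these assumptions, each additional SMN2 copy adds only a small increment of full-length transcript, which is consistent with a graded rather than all-or-nothing effect on severity. As the text notes, copy number alone does not determine the phenotype, so the model deliberately ignores other modifying factors.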

Genomic disorders with Mendelian inheritance

In 1998, the term “genomic disorder” was introduced to describe disorders caused by structural changes of the human genome. Such DNA rearrangements may lead to the loss or gain of dosage-sensitive gene(s) or disrupt a gene [10]. Whereas the abnormal phenotype in many monogenic diseases is caused by a point mutation, many genomic disorders are recombination-based conditions [102]. Moreover, not only large chromosomal aberrations but also smaller CNVs, affecting a single exon or an even smaller region, are associated with human diseases. Therefore, depending on the size of the genomic segment involved, its position and genomic context, as well as the number of genes within the rearranged segment together with other risk factors, CNV-associated disorders may be classified as Mendelian diseases, contiguous gene syndromes, chromosomal disorders, or other sporadic or complex traits [73]. Among these, Mendelian genomic disorders may segregate as autosomal recessive traits, such as juvenile nephronophthisis type 1 [103]; as autosomal dominant traits, such as Charcot–Marie–Tooth disease type 1A or hereditary neuropathy with liability to pressure palsies (caused by a duplication and a deletion, respectively, which were among the first identified disease-associated submicroscopic CNVs and are the reciprocal products of the same unequal crossover event between non-allelic homologous sequences) [75]; as X-linked traits, such as hemophilia A [104]; or even as Y-linked traits, such as azoospermia [105]. The mode of inheritance of repeat expansion disorders depends on the chromosomal location of the gene involved. Certain non-Mendelian manifestations are also typical of many of these disorders. These include anticipation in myotonic dystrophy type 1 [106] and the phenotype-modifying potential of certain repeat alleles in combination with other pathogenic variants, as in the suggested reciprocal interaction between myotonic dystrophy type 2 premutations and congenital myotonia caused by mutations of the CLCN1 gene [107].

Multifactorial diseases

CNVs may also play an important role in complex diseases with multifactorial etiology, which are caused by a combination of several genetic factors (each having a low impact on the phenotype alone) and environmental factors. Continuous progress in recent decades has increased our understanding of the pathophysiology of many complex diseases. However, unanswered questions about risk factors and a largely unknown genetic background still prevent us from fully understanding the pathomechanisms of such diseases. It is therefore assumed that the involvement of CNVs in the pathogenesis of complex diseases could explain at least a fraction of their well-known “missing heritability” [108]. Inflammatory bowel disease (IBD) is a typical example of a multifactorial disease. More than 200 IBD-associated loci are known, yet the pathogenesis is still unclear [109]. Some authors assume that studying CNVs may shed more light on IBD [76,110,111]. Recently, the results of Frenkel et al. pointed not only to the important role of CNVs, but also to significant pathways in the pathogenesis of IBD [112]. Even though CNVs are heavily implicated, such large genetic variants are still understudied in IBD and other complex diseases [113]. This is probably due to the limitations of the CNV detection methods available when the major genome-wide association studies were carried out. Infectious diseases may also be considered complex multifactorial diseases, as both genetic and environmental variability affect the susceptibility of individuals to infections. Host CNVs have been shown to play an important role in clinical phenotypes related to infectious diseases. Examples include variants of α-globin [6] or the CCL3L1 gene [5], as explained in chapter 4 (Evolutionary aspects of CNVs), among others [114].
CNV detection may also find application in the evaluation of microbiome balance through the analysis of CNVs in metagenomes from different body sites. The human microbiome interacts with the host and plays an important role in many host biological processes [7]. Host genomic variations influence the composition of the microbiome, which in turn affects the health of the individual. While numerous studies have focused on associations between the gut microbiome and specific alleles of the host genome, gene copy number also varies. It was shown, for instance, that duplication of the human AMY1 gene is associated with an increased abundance of oral Porphyromonas in saliva, which is linked to periodontitis. The gut microbiota of these individuals had an increased abundance of resistant starch-degrading microbes, produced higher levels of short-chain fatty acids, and drove increased adiposity when transferred to germ-free mice [8]. This case demonstrates that even seemingly harmless variants in the host genome may affect the health of individuals. Current knowledge suggests the importance of analyzing CNVs not only in human cells but also in the microbiome. Taxonomic characterization of the human microbiota is often limited to the species level, but each microbial species represents a large collection of strains that may contain considerably different sets or copy numbers of genes, resulting in potentially distinct functional features. This intra-species variation is caused by deletion and duplication events, which were shown to be prevalent in the human gut environment, with some species exhibiting CNVs in >20% of their genes. Variability is especially relevant in disease-associated genes involved in important functions such as transport and signaling. Greenblum et al. showed that obesity was associated with higher copy numbers of thioredoxin 1 in Clostridium sp. and of an MFS transporter gene in the Roseburia inulinivorans genome cluster, whereas an increased copy number of HlyD in Bacteroides uniformis was associated with IBD. According to the authors, the analysis of species composition alone is not sufficient to capture the true functional potential of the microbiome, as it may miss important functional differences [9]. Hence, the analysis of intra-species variation in microbial communities is crucial.
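The statement that a species may exhibit CNVs in >20% of its genes can be made concrete with a small calculation. The sketch below is illustrative only: the gene identifiers, the per-sample copy-number estimates, and the variability criterion (a spread of at least one copy across samples) are assumptions made for this example and do not reproduce the procedure of the cited study [9].

```python
def variable_gene_fraction(copy_numbers, min_range=1.0):
    """Fraction of genes whose estimated copy number varies across samples.

    copy_numbers maps a gene id to a list of per-sample copy-number
    estimates for one microbial species. A gene is counted as variable
    when max - min >= min_range (a hypothetical criterion).
    """
    if not copy_numbers:
        raise ValueError("at least one gene is required")
    variable = sum(
        1 for values in copy_numbers.values()
        if max(values) - min(values) >= min_range
    )
    return variable / len(copy_numbers)


# Hypothetical estimates for five genes of one species in four samples.
species_a = {
    "gene_1": [1.0, 1.0, 1.1, 0.9],
    "gene_2": [1.0, 2.1, 1.0, 3.0],  # duplicated in some samples
    "gene_3": [1.0, 1.0, 1.0, 1.0],
    "gene_4": [0.0, 1.0, 1.0, 1.0],  # deleted in one sample
    "gene_5": [2.0, 2.0, 2.1, 1.9],
}

print(f"{variable_gene_fraction(species_a):.0%} of genes vary in copy number")
```

Two samples with identical species-level abundance profiles could still differ at gene_2 and gene_4, which is the kind of functional difference a species-level analysis would not capture.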

Cancer

Although cancer may be monogenic or, more typically, multifactorial, oncogenetics tends to be considered a special field and discussed separately. In cancer genetics, CNVs are divided into two classes based on their size: i) large-scale variants, also known as chromosome-arm level variants, encompassing more than 25% [115] or one third [116] of the chromosome arm; and ii) focal variants, defined as small regions of the genome, usually not more than 3 Mb in size, containing up to a few genes [[117], [118], [119]]. Since 25% of an average human chromosome arm contains more than 15 Mb of DNA, variants ranging from 3 to about 15 Mb do not meet the criteria of either category. On the other hand, in the case of 21p or Yp, 25% of the total arm length comprises less than 3 Mb, so a variant at such a location might fall into both classes at the same time. To our knowledge, there is no strict consensus in the literature, so we suggest classifying variants of >3 Mb as large-scale, while focal variants should retain their current definition. Both types of CNVs are important in the context of disease, but their relatively small size and low gene content make focal CNVs more suitable for the identification of candidate driver genes. Analysis of CNVs is an important aspect of the molecular diagnosis of cancer. It was shown that recurring deletions are typically overrepresented in tumor suppressor genes and underrepresented in oncogenes [120]. Aberrations in gene copy numbers may reveal therapeutic targets or markers of drug resistance in several types of cancer [121]. High-resolution copy-number profiles of 3131 cancer specimens revealed the CNV landscape of the vast majority of cancer types. In an average tumor sample, 17% of the genome is amplified and 16% is deleted, compared to averages of 0.35% and less than 0.1%, respectively, in normal samples. Specific gene families and pathways have been shown to be overrepresented among focal somatic copy-number alterations. The most enriched ones are gene families important in cancer pathogenesis, such as kinases, cell cycle regulators, and MYC family members [77], which may represent potential therapeutic targets.
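The gap and the overlap between the two size-based classes, and the effect of the suggested >3 Mb rule, can be checked with a short sketch. It is a minimal illustration under stated assumptions: the class and function names, the coordinate convention, and the arm lengths in the examples are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass

FOCAL_MAX_BP = 3_000_000  # focal variants: usually not more than 3 Mb


@dataclass(frozen=True)
class Segment:
    start: int        # start position in bp (hypothetical 0-based convention)
    end: int          # end position in bp, exclusive
    arm_length: int   # length in bp of the chromosome arm carrying the segment

    @property
    def length(self) -> int:
        return self.end - self.start


def classify_by_arm_fraction(seg: Segment, arm_fraction: float = 0.25) -> set:
    """Existing criteria: focal if <= 3 Mb, large-scale if > 25% of the arm.

    Returns every label that applies, so the result may be empty (gap)
    or contain both labels (overlap).
    """
    labels = set()
    if seg.length <= FOCAL_MAX_BP:
        labels.add("focal")
    if seg.length > arm_fraction * seg.arm_length:
        labels.add("large-scale")
    return labels


def classify_by_size(seg: Segment) -> str:
    """Rule suggested in the text: large-scale if > 3 Mb, focal otherwise."""
    return "large-scale" if seg.length > FOCAL_MAX_BP else "focal"


# A 10 Mb variant on a hypothetical 60 Mb arm: neither class applies.
mid_sized = Segment(0, 10_000_000, 60_000_000)
# A 2.8 Mb variant on a hypothetical 10 Mb short arm: both classes apply.
short_arm = Segment(0, 2_800_000, 10_000_000)

for seg in (mid_sized, short_arm):
    print(sorted(classify_by_arm_fraction(seg)), "->", classify_by_size(seg))
```

With the size-only rule, every variant receives exactly one label, which is the property the suggested definition is meant to provide.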

Conclusions

The days when the Mendelian dogma of all our genes being present in two copies could be applied universally have long since passed. Normal and pathogenic CNVs have been described in eukaryotic cells as well as in the microbiome, and their phenotypic associations and causes, involving replication, recombination, and repair along with environmental mutagens, are becoming better known as research methods evolve. And yet, general awareness of CNVs as common polymorphisms remains limited, and studies aiming to shed light on their contribution to the missing heritability problem seen in genome-wide association studies are relatively scarce, with SNPs still stealing most of the spotlight [58,122]. To avoid “missing the forest for the trees” [123], an integrated approach is advised, taking different forms of polymorphism and gene expression data into account rather than maintaining a sole focus on SNP analysis [124]. Having assessed recent developments in the field, we are of the view that copy number research deserves more attention as a vital and very interesting aspect of the paradigm shift currently underway in molecular and clinical genetics.

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

119 in total

Review 1. Copy number variation in the human genome and its implications for cardiovascular disease.

Authors: Rebecca L Pollex; Robert A Hegele
Journal: Circulation Date: 2007-06-19 Impact factor: 29.690

Review 2. Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability.

Authors: Jacques S Beckmann; Xavier Estivill; Stylianos E Antonarakis
Journal: Nat Rev Genet Date: 2007-08 Impact factor: 53.242

Review 3. Structural variation in the human genome and its role in disease.

Authors: Paweł Stankiewicz; James R Lupski
Journal: Annu Rev Med Date: 2010 Impact factor: 13.739

4. Candidate driver genes in focal chromosomal aberrations of stage II colon cancer.

Authors: Rebecca P M Brosens; Josien C Haan; Beatriz Carvalho; François Rustenburg; Heike Grabsch; Philip Quirke; Alexander F Engel; Miguel A Cuesta; Nicola Maughan; Marcel Flens; Gerrit A Meijer; Bauke Ylstra
Journal: J Pathol Date: 2010-08 Impact factor: 7.996

5. Relationship between microsatellite instability and telomere shortening in colorectal cancer.

Authors: S Takagi; Y Kinouchi; N Hiwatashi; F Nagashima; M Chida; S Takahashi; K Negoro; T Shimosegawa; T Toyota
Journal: Dis Colon Rectum Date: 2000-10 Impact factor: 4.585

6. Evolution of the human alpha-amylase multigene family through unequal, homologous, and inter- and intrachromosomal crossovers.

Authors: P C Groot; W H Mager; N V Henriquez; J C Pronk; F Arwert; R J Planta; A W Eriksson; R R Frants
Journal: Genomics Date: 1990-09 Impact factor: 5.736

7. A chromosome 8 gene-cluster polymorphism with low human beta-defensin 2 gene copy number predisposes to Crohn disease of the colon.

Authors: Klaus Fellermann; Daniel E Stange; Elke Schaeffeler; Hartmut Schmalzl; Jan Wehkamp; Charles L Bevins; Walter Reinisch; Alexander Teml; Matthias Schwab; Peter Lichter; Bernhard Radlwimmer; Eduard F Stange
Journal: Am J Hum Genet Date: 2006-07-12 Impact factor: 11.025

Review 8. Alpha-thalassaemia.

Authors: Cornelis L Harteveld; Douglas R Higgs
Journal: Orphanet J Rare Dis Date: 2010-05-28 Impact factor: 4.123

Review 9. Genome architecture and its roles in human copy number variation.

Authors: Lu Chen; Weichen Zhou; Ling Zhang; Feng Zhang
Journal: Genomics Inform Date: 2014-12-31

10. Analysis of copy number variations induced by ultrashort electron beam radiation in human leukocytes in vitro.

Authors: Tigran Harutyunyan; Galina Hovhannisyan; Anzhela Sargsyan; Bagrat Grigoryan; Ahmed H Al-Rikabi; Anja Weise; Thomas Liehr; Rouben Aroutiounian
Journal: Mol Cytogenet Date: 2019-05-16 Impact factor: 2.009
