Copy number variants and common disorders: filling the gaps and exploring complexity in genome-wide association studies.

Abstract

Genome-wide association scans (GWASs) using single nucleotide polymorphisms (SNPs) have been completed successfully for several common disorders and have detected over 30 new associations. Considering the large sample sizes and genome-wide SNP coverage of the scans, one might have expected many of the common variants underpinning the genetic component of various disorders to have been identified by now. However, these studies have not evaluated the contribution of other forms of genetic variation, such as structural variation, mainly in the form of copy number variants (CNVs). Known CNVs account for over 15% of the assembled human genome sequence. Since CNVs are not easily tagged by SNPs, might have a wide range of copy number variability, and often fall in genomic regions not well covered by whole-genome arrays or not genotyped by the HapMap project, current GWASs have largely missed the contribution of CNVs to complex disorders. In fact, some CNVs have already been reported to show association with several complex disorders using candidate gene/region approaches, underpinning the importance of regions not investigated in current GWASs. This reveals the need for new generation arrays (some already in the market) and the use of tailored approaches to explore the full dimension of genome variability beyond the single nucleotide scale.

Substances:
Nucleotides

Year: 2007 PMID: 17953491 PMCID: PMC2039766 DOI: 10.1371/journal.pgen.0030190

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917

Introduction

A large number of studies describing GWASs have been published recently. Several old and new associations have been detected by genotyping large collections of samples with hundreds of thousands of markers. Proof of concept of GWASs has been demonstrated, and new biological pathways are now on the priority list of several investigators trying to understand asthma, Crohn disease, and diabetes, among other disorders. However, for most diseases, the identified genomic regions explain only a small fraction of the familial aggregation. Although these studies have focused on SNPs as the common resource to explore genetic variability, other types of markers exist, which likely exert important phenotypic effects on gene expression and function. In this review, we explore the contribution of CNVs to common human disorders and evaluate the caveats of SNP-based GWASs in covering regions of the genome that have a high degree of plasticity and that could play an important role in disease susceptibility.

What Have We Missed in Current Genome-Wide Association Studies?

SNPs are the markers that have been selected to uncover the genetic determinants of complex traits and common disorders. This choice was mainly based on their abundance (over 12 million SNPs), and their use was boosted by the technological development of tools for high-throughput analysis of these variants. The Human Genome Project, followed by the HapMap Project [1] (http://www.hapmap.org/), has provided the landmark for the development of high-density SNP arrays to explore the nucleotide variability of the human genome, using powerful analytical methods based on statistical genetics, population genetics, and epidemiology. Current association studies for common disorders and complex traits aim to detect linkage disequilibrium (LD) between SNPs that genetically mark a given region (tagSNPs) and the functional variants (acting at either the RNA or protein level) responsible for the phenotypes. Due to their abundance and variability, SNPs have been considered powerful markers to identify loci underlying phenotypic variation in genetic association studies. To provide common and robust tools for disease-associated gene discovery, the HapMap Consortium has genotyped nearly 4 million SNPs from individuals of the main human populations. A subset of these SNPs, covering the genome at the physical and genetic levels, is included in the commercially available arrays. The outcome of the first round of studies, involving thousands of patients and controls and several hundred thousand SNPs, has recently been published. GWASs have been completed for more than a dozen common disorders (Table 1) and several new associations have been detected. The Wellcome Trust Case Control Consortium (WTCCC) has reported the largest genome-wide association study performed so far, for seven diseases involving 14,000 patients and 3,000 controls [2]. 
Together with the WTCCC study, other publications (Table 1) have reported new, and confirmed previously known, statistically compelling associations for common disorders, including type 1 [2,3] and type 2 diabetes [2-7], obesity [2], coronary heart disease [2,8-10], breast [11-13] and prostate [14,15] cancer, rheumatoid arthritis [2], Crohn disease [2,16-18], celiac disease [19], asthma [20], age-related macular degeneration [21], restless leg syndrome [22], and multiple sclerosis [23].
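
The tagging principle described above can be made concrete with a minimal sketch, given here only as an illustration. It computes the squared correlation r² between a tagSNP and a functional variant from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from any of the cited studies.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (tagSNP)
          together with allele B (functional variant)
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical frequencies, for illustration only.
perfect_proxy = ld_r_squared(p_ab=0.30, p_a=0.30, p_b=0.30)  # A always travels with B
weak_proxy = ld_r_squared(p_ab=0.12, p_a=0.30, p_b=0.30)     # A and B nearly independent

print(f"perfect proxy r^2 = {perfect_proxy:.3f}")
print(f"weak proxy r^2    = {weak_proxy:.3f}")
```

When r² is close to 1 the tagSNP is a good proxy for the functional variant; when it is close to 0, an association at the functional variant is largely invisible to a scan that genotypes only the tagSNP.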

Table 1

Associations Identified in GWASs for Common Disorders Using Genotyping Arrays

The above-mentioned analyses represent an obvious step forward in the study of the genetic contribution to complex diseases and have undoubtedly proved the utility of the GWAS approach using SNPs to identify new genetic associations without previous hypotheses about their biology. Each of these reports has described links with known or new biological pathways, and has also established novel mechanistic connections among pathways and among disorders. The set of loci reported so far should potentially facilitate progress in the understanding of the physiology of each of these disorders. These studies, however, raise several questions in relation to the genetic basis of complex diseases and the strategies used so far towards the identification of a complete set of susceptibility loci. First, it is obvious that the genetic picture obtained for each of these disorders, even for those targeted by independent cohorts, such as in the case of type 2 diabetes, is still far from complete. The identified associations, with some exceptions (the major histocompatibility complex, MHC, locus), have a modest effect with odds ratios lower than 1.5. Thus, the nine confirmed loci for type 2 diabetes [2-7] might explain about 3% of the genetic variance, and 14 loci identified for Crohn disease [2,16-18] cover less than 10% of the variance. If we take into account the outcome achieved in these studies using such a large number of samples and SNPs, it is expected that new associations for common disorders using SNP markers will likely have similar or even lower effects, and association values will likely not go far above current figures. Furthermore, we are uncertain about how well the additional small effects will be able to account for the strong heritability that many complex disorders exhibit. 
Second, it is also obvious from the studies reported so far (with the exception of age-related macular degeneration [21] and some other disorders), that the identified variants are not the functional ones. Thus, the role of most genetic changes in the molecular basis of disorders has not yet been discovered. Sequencing of a large number of patients with the aforementioned disorders, along with deep coverage of the regions surrounding the detected associations, must be performed, and is already under way in some cases. This will help to detect variants with functional consequences and larger effects than those so far uncovered, even if they are rare in the population and account only for a subset of patients. Third, it would be interesting to see if epistasis exists between functional variants once they have been detected. It is remarkable that the data obtained so far mainly show absence of epistasis between variants for the same disorder or groups of disorders. Specific screens should be performed to assess the additive nature of the genetic component of the identified associations. Fourth, although the HapMap project has provided an excellent tool for genetic association studies, it is clear that the set of markers analyzed in GWASs does not cover the entire genome variability. Despite the large number of SNPs that have been selected to explore genetic association using LD measures [24], and the coverage of nearly 100% of the genome using between 0.5 and 1 million tagSNPs [25], some regions are likely to be missed. Certainly, there are regions not well covered in HapMap due to the lack of sequence information, and, in large part, to the presence of CNVs and segmental duplications [26,27]. This has caused commercial panels to be deficient in SNPs covering these regions. 
Thus, future studies trying to reveal a more complete set of genetic determinants will necessarily require a larger number of SNPs (many with low minor allele frequencies and covering “unsettled” regions) and even larger cohorts. It has been estimated that to identify the complete set of loci involved in the genetic susceptibility to common disorders, sample sizes in the range of 2,000 to 60,000 and denser genetic maps of over 1 million SNPs will be needed. Despite the claims for “denser and larger,” the relatively large sample size of the studies performed so far and the wide genome coverage achieved suggest that, for some of the most deeply investigated disorders, the common genetic variants that underpin their genetic component have already been identified. It seems clear that some of these questions will be solved by simply analyzing larger sample sets with denser SNP arrays, and by resequencing loci showing associations in a large number of samples. However, it is obvious that we need to explore the genome for other sources of variability that could explain the strong genetic component of several of the common disorders. Among sources to be explored are noncoding RNAs, structural variants, and epigenetic changes.
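
To give a sense of scale for the modest effects discussed in this section (odds ratios lower than 1.5), the following sketch computes an allelic odds ratio and an approximate 95% confidence interval from a 2×2 table of allele counts. The counts are invented for illustration and do not come from any of the cited scans; the interval uses the standard log odds ratio approximation.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and approximate 95% CI from allele counts in cases and controls."""
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical example: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
or_value, ci_low, ci_high = allelic_odds_ratio(
    case_risk=1400, case_other=2600, control_risk=1800, control_other=4200
)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

In this invented example a risk-allele frequency of 35% in cases against 30% in controls gives an odds ratio of about 1.26, which illustrates why thousands of samples are needed to detect effects of this size with confidence.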

Many Versions Account for the Human Genome Sequence

When the human genome sequence was publicized six years ago, it was openly claimed that genetic differences between individuals account for less than 0.1% of the DNA sequence [28,29], a total of about 3 million nucleotides. Certainly, the statement referred to, and was inferred from, the types of markers that had been, until then, widely used to explore diversity, construct genetic maps, and identify the genes responsible for more than 2,000 human monogenic disorders (http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM). These markers included “old and new” types of polymorphisms, comprising restriction fragment length polymorphisms [30], variable number of tandem repeats or minisatellites [31], short tandem repeats or microsatellites [32,33], insertion/deletion polymorphisms [34], and the over 12 million SNPs that have been deposited in the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/). In the last three years, a new form of genetic variation has been extensively reported. Genome structural variation has been known at the cytogenetic and molecular levels for a long time [35-37], but its importance at a genome-wide scale was not discovered until recently [38,39], with the use of array-based comparative genomic hybridization and other types of genome-scanning technologies. This variability entails large segments of DNA, typically over one kilobase (kb) and up to several megabases (Mb), and it comprises insertions, deletions, translocations, and inversions of genomic material (Figure 1). So far, the most commonly identified types of variants are gains and losses of DNA, which are called CNVs [40]. Inversions are also likely to be important changes, with direct potential positional effects and suppression of meiotic recombination, but, with some exceptions [41,42], most efforts toward characterization of variants have so far been focused on other types of changes. Structural variants are not exclusive to humans; they have also been identified in other organisms [43,44].

Figure 1

Types of Genomic Structural Changes Affecting Segments of DNA, Leading to Deletions, Duplications, Inversions, and CNV Changes (Biallelic, Multiallelic, and Complex)

The only segment that is constant is “A.” Segment “B” varies in orientation in the inversion. Segments “C” and “D” show different types of variation.

Fifteen comprehensive studies have explored structural variation in the human genome [38-41,45-52] (Table 2). These studies have used several approaches, mainly bacterial artificial chromosome (BAC) arrays, oligonucleotide arrays, SNP arrays, genotyping data, and computational alignment of genome sequences. There is wide variation in the coverage provided by the different methods and in the level of polymorphism detected in the different studies (Figure 2). Many reasons account for these differences, including type of platform, genomic coverage, source of DNA samples (cell lines or fresh samples), control samples used by the different projects, algorithms employed, and statistical thresholds. Comparison of experimental platforms, algorithms, and published surveys has recently been reviewed [53]. It is clear that the analysis of structural variants is still in its infancy, as compared to SNPs, but we have to admit that CNV analyses have additional complexity, due to their heterogeneity and the poor coverage that they exhibit in the assembled individual genomes [47].

Table 2

Summary of Genome Scans to Study Structural Variations and CNVs of the Human Genome

Figure 2

Approaches Used for the Identification of CNVs and Other Types of Structural Changes in the Human Genome

Myriad methods and technologies have been employed to identify structural variants in the human genome. They are based on completely different experimental procedures and provide very different levels of resolution. The majority of findings (>80%) are attributable to a restricted number of high-throughput experiments with a limited resolution.

The compilation of all reported variable regions is provided at several Web sites, including the UCSC (http://genome.ucsc.edu/) and Ensembl (http://www.ensembl.org/) genome browsers, and the most updated summary can be found at the Database of Genomic Variants (http://projects.tcag.ca/variation/), which lists 8,083 CNVs that correspond to 3,933 loci in the human genome assembly (7 September 2007). After the initial discovery that CNVs are common in the population, it was envisioned that CNVs might be traced using SNPs as proxies for different alleles of the structural changes. Although this is the case for some simple biallelic CNVs [40], the most common and polymorphic ones have a complex inheritance pattern, and the SNPs located within them do not always show Mendelian inheritance or are not in Hardy–Weinberg equilibrium. As a result of this, and also because of their identity with related sequences due to segmental duplications, many SNPs located at CNVs do not fulfill quality-control criteria and have been discarded in the design or in the analysis of genotyping experiments. Non-Mendelian behavior has also posed difficulties in the use of SNPs for tagging the inheritance of such variants. However, this abnormal behavior of markers has been used to successfully identify polymorphic deletions and inversions [42,50-52]. 
Since CNVs are not tagged easily by SNPs, many fall in regions that are not well defined in the available human genome sequence, and SNP content in commercial platforms is skewed towards “genotypable” SNPs present in the HapMap, it is likely that most GWASs (Table 3) have missed the potential contribution of CNVs to complex disorders. As mentioned above, our understanding of the organization of CNVs and their heritability is still very rudimentary. CNVs are likely to affect recombination, and the relationship with other markers might be relevant for common CNVs.
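
The departure from Hardy–Weinberg equilibrium mentioned above can be illustrated with a small sketch. It assumes a deliberately simplified, hypothetical model in which a SNP lies inside a common deletion: hemizygous carriers (one SNP allele plus the deletion) are miscalled as homozygotes, and homozygous deletion carriers give no call. The function names, the model, and the frequencies are assumptions made for illustration; they are not the quality-control procedure of any specific platform.

```python
import math


def hwe_chi_square(n_aa: float, n_ab: float, n_bb: float) -> float:
    """Chi-square statistic (1 degree of freedom) for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency estimated from the calls
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def called_genotypes(n: int, freq_a: float, freq_b: float, freq_del: float):
    """Expected genotype calls when a deletion allele is hidden from the assay.

    Hypothetical model: A/del is called AA, B/del is called BB,
    and del/del individuals produce no call at all.
    """
    if not math.isclose(freq_a + freq_b + freq_del, 1.0):
        raise ValueError("allele frequencies must sum to 1")
    n_aa = n * (freq_a ** 2 + 2 * freq_a * freq_del)
    n_ab = n * (2 * freq_a * freq_b)
    n_bb = n * (freq_b ** 2 + 2 * freq_b * freq_del)
    return n_aa, n_ab, n_bb


# Invented frequencies: the same SNP without and with a 20% deletion allele.
chi_without_deletion = hwe_chi_square(*called_genotypes(1000, 0.5, 0.5, 0.0))
chi_with_deletion = hwe_chi_square(*called_genotypes(1000, 0.4, 0.4, 0.2))

print(f"chi-square without deletion: {chi_without_deletion:.2f}")
print(f"chi-square with deletion:    {chi_with_deletion:.2f}")
```

Under this toy model the hidden deletion produces an apparent excess of homozygotes, so the statistic far exceeds the conventional 3.84 threshold (P = 0.05, 1 degree of freedom) and the SNP would typically be filtered out. The same signal is what makes such markers useful for finding polymorphic deletions.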

Table 3

Summary of Common Disorders for Which Associations to CNVs Have Been Reported

The current knowledge of CNVs is far from complete, because technological limitations of the approaches used so far to ascertain them have introduced an important bias towards medium-to-large-size CNVs. While technology has done very well for CNVs of sizes above 50 kb, smaller CNVs have hardly been detected. As further studies are performed covering regions below the 50-kb range, it is expected that a large number of additional CNVs, likely on the order of tens of thousands, will be detected (Figure 3). Considering the current human genome assembly, structural variants cover about 15% of the sequence (over 500 Mb). This figure is, however, imprecise, due to the lack of consensus in boundaries of CNV regions, the low level of resolution of clone arrays, and the near absence of replication of the reported data. On the basis of the expected size distribution of CNVs, they could likely affect up to one gigabase of sequence (∼1/3 of the genome). What is clear so far is that there is not a single human genome sequence and that several configurations, with alternative sequences at CNV regions, are present in the human population. Technologies that are able to screen the genome below the 50-kb resolution will be essential. This should involve arrays specifically designed to interrogate at the 1–50-kb scale and sequencing specific regions with methods that allow the selection of DNA without previous knowledge of the sequence. In addition, efforts towards sequencing the genomes of different individuals to uncover their variability at the structural level are under way [54].

Figure 3

Expected and Observed Size Distribution of CNV Changes Identified to Date

Blue bars represent the frequencies of the currently identified CNVs in the size ranges depicted in the x-axis. A plausible scenario of variation in CNV size frequency is depicted as red vertical bars. An under-detection of variable fragments of small size (<50 kb) can be observed, which is likely due to technological limitations in the high-throughput assays used so far to identify CNVs, largely based on array CGH (Figure 2). Observed and expected CNVs that are >50 kb coincide, due to the powerful array methods, which cover the medium-to-large-size CNVs well. Dark blue bars represent the small-sized CNVs, which are more of a challenge to detect.

Rare and Common CNVs Are Involved in Complex Disorders

CNVs have already been shown to be associated with several complex/common disorders. Interestingly, most of these findings have been obtained by specific analysis of candidate genes or regions. Rare CNVs have been detected in some families of patients affected by Parkinson disease, Alzheimer disease, and chronic pancreatitis. Multiple cases of patients with Parkinson disease due to genomic duplication or triplication of the alpha-synuclein gene (SNCA) have been reported to cause hereditary early-onset parkinsonism with dementia, demonstrating a direct relationship between SNCA gene dosage and disease progression [55-57]. Similarly, several cases of duplication of the amyloid precursor protein (APP), with a role in familial Alzheimer disease and in Down syndrome brain neurodegeneration, have been described in families with early-onset Alzheimer dementia with cerebral amyloid angiopathy [58-60]. Finally, some members of families affected by hereditary pancreatitis have duplications or triplications of the cationic trypsinogen gene (PRSS) [61]. It is clear that in these three common disorders, the CNVs associated with the respective diseases represent rare events, and are not the major mechanism for disease susceptibility. Thus, rare genomic rearrangement events could affect common disorders in a manner similar to what has been reported for monogenic diseases, such as Neurofibromatosis type 1, for which large deletions are detected in about 10% of patients [62]. However, since rare CNVs are abundant in the genome, they could represent an important source of variability with which to explore the relationship between candidate genes and disease, and therefore to define new pathophysiology pathways. Common CNVs have also been detected in people affected by certain other disorders. For example, variability in the susceptibility to HIV-1 infection has been related to copy number of the CCL3L1 gene [63]. 
Individuals with low copy numbers of the chemokine gene, relative to their ethnic background, are associated with markedly enhanced HIV-1/AIDS susceptibility. More recently, differences in copy number of the CCL3L1 chemokine have also been reported as a susceptibility factor for rheumatoid arthritis [64]. This region was not targeted in HapMap phases I and II and is not well covered by the Affymetrix and Illumina arrays; consequently, any attempt to perform association studies for HIV-1 susceptibility will likely fail in detecting a putative link with CCL3L1 copy variability (Figure 4). This region shows a large variability, not only in CCL3L1 copy number, but also in the genomic structure of individuals from different populations, as detected in the HapMap samples that have been genotyped [40]. In particular, the region is highly variable in the African population, with a large number of copy number gains in Africans and Asians, compared to Europeans.

Figure 4

Genomic Organization of the Chemokine Cluster on Human Chromosome 17, Containing the CCL3L1 Gene (Red Arrows), Which Shows Variability in Copy Number and Association to HIV-1 Infectivity and AIDS Susceptibility

This region contains several segmental duplications and has been reported to vary in copy number in several studies. The Affymetrix 500K and Illumina HumanHap 550 arrays do not cover this region well, and completely lack SNPs in the CCL3L1/L3 gene (red dotted lines). A large number of gains and losses have been reported in the HapMap samples. Numbers in parentheses indicate the number of events involving genomic changes. CEU, European; HCB, Chinese; JPT, Japanese; YRI, African.

Similarly, a copy number polymorphism including FCGR3 leads to a predisposition to glomerulonephritis in rats and humans, and to several types of autoimmune disorders, such as systemic lupus erythematosus (SLE), microscopic polyangiitis, and Wegener granulomatosis [65,66]. This region contains a complex 82-kb segmental duplication in the assembled genome sequence and CNVs have been detected in several studies in samples from the general population [40,46,48]. The coverage of the region is only partial in commercial arrays and the region of the CNVs and segmental duplication has a very low LD, with no blocks detected in HapMap populations. Recently, another CNV region has been shown to be associated with SLE. Variable copy number of the complement component C4 (C4A and C4B) leads to different susceptibilities to SLE [67,68]. C4 gene copy number varies from two to six for total C4, zero to five for C4A, and zero to four for C4B. Compared with healthy subjects, patients with SLE clearly have lower copy numbers of C4 and C4A, and SLE susceptibility is significantly increased among subjects with only two copies of total C4 but decreased in those with more than five copies of C4 [68]. Interestingly, variability in copy number for the C4 genes and the genetic association to markers in this MHC region on Chromosome 6p21.32 have been known for several years [67,69,70], but their complex organization and their relationship with SLE have not yet been examined in detail. 
The C4A gene is fully contained in a 33-kb segmental duplication that shows 99.6% identity between copies in the assembled sequence of the human genome. The region has also been reported to be polymorphic in two studies exploring CNV regions [40,41]. This 80-kb region is not covered by the Affymetrix and Illumina arrays, and only three SNPs have been genotyped in HapMap, precluding positive association findings to these genes in whole-genome association studies (Figure 5).

Figure 5

Schematic Representation of Two Genomic Regions That Involve CNVs Associated with SLE [65,66]

(A) The region of Chromosome 1 containing the FCGR3 gene cluster is highly variable and contains segmental duplications with a high sequence identity. Several CNVs have been reported that span this region. The genomic organization of the cluster is highly complex and not well solved in the current assembly of the genome sequence. The Affymetrix 500K and Illumina HumanHap 550 arrays do not cover this region well (red dotted lines). (B) The region of Chromosome 6p21, containing the C4A and C4B genes, is embedded in a region of complex genomic organization [67,69,70]. The region has been shown to contain segmental duplications and CNVs. The Affymetrix 500K and Illumina HumanHap 550 genotyping platforms do not cover this region, either (red dotted lines).

Another report has detected variability in copy number of the beta defensin 2 gene (DEFB4) on Chromosome 8p23.1 in Crohn disease [71]. DEFB4 dosage is lower in colonic Crohn disease compared with controls, showing that a lower DEFB4 gene copy number predisposes to colonic Crohn disease through diminished beta-defensin expression. Again, for this locus, there is a cluster of segmental duplications, and most CNV studies have detected this region as being variable. This region, which spans about 1 Mb and contains a gap in the assembled genome sequence, has only four SNPs in the Affymetrix array and one in the Illumina array (not shown). Although the region was not detected in the GWAS for Crohn disease [2,16-18], it is obvious that this region was not satisfactorily covered by these arrays. Only the targeted analysis of the region using quantitative methods was able to uncover the link with Crohn disease [71]. Finally, several other studies exploring CNVs in common disorders are being performed and some findings in bipolar disorder [72] and breast cancer [73] have already been reported. 
Therefore, we expect that there will be a plethora of reports describing new associations between CNVs and common disorders and complex traits in the coming months to years. A common feature of the regions for complex/common disorders identified so far is the presence of both CNVs and segmental duplications. A clear association between duplicons and CNVs in the human genome has been reported [40]. This association is stronger for CNVs that are multiallelic or have a complex pattern. Interestingly, all CNV loci that have been found associated with common disorders are both complex and multiallelic. Thus, the development of assays for common/complex CNV loci could provide good tools for the analysis of common disorders. The mechanisms by which CNVs could contribute to disease are numerous [74]. Due to their location and nature, a significant fraction of CNVs are likely to have functional consequences, either by gene dosage alteration, disruption of genes, positional effects, uncovering deleterious alleles, or modulating the action of other sequences. We still have limited evidence of the role of CNVs in gene expression. Stranger and colleagues [75] have examined RNA levels in lymphoblastoid cell lines from 210 unrelated HapMap individuals and have used CNV data from these samples generated by the Structural Variation Consortium [40] to conclude that 18% of the variation in expression levels of ∼15,000 genes is attributable to copy number differences. This study represents the first attempt to evaluate the genome-wide impact of SNPs and CNVs on gene expression. A potential explanation for the relatively low contribution of CNVs to variability of gene expression as compared to SNPs in the study of Stranger and colleagues [75] is the limited resolution of the arrays used and the wide definition of CNV regions considered in the analysis.

Combination of SNP and CNV Genotyping in Common Disorders

Although a large number of SNPs for regions containing CNVs are listed in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), most of them lack genotyping frequencies, have not been confirmed by other investigators, or fail during the design of multiplex genotyping assays. Many of these SNPs are located in segmental duplications and they correspond to paralogous sequence variants or SNPs that are copy specific [76]. As a consequence, most of these regions have systematically been excluded from the current high-throughput SNP typing assays. Many investigators in the field of the genetics of common disorders have realized the need to cover other types of variants in their genome scans. Commercial genotyping companies (mainly Affymetrix and Illumina) are redesigning their platforms to allocate probes for CNV regions and they now claim a genome-wide coverage of known and new CNVs. While this reflects the recent attention that CNVs have attracted in the genotyping field, the reliability of the coverage and the capacity of these arrays to discriminate between a wide range of copies of a given CNV has yet to be proven. This discrimination capacity is one of the main challenges to extracting the complexity of genomic structural variability and will be crucial for association studies. On the other hand, companies dedicated to array–comparative genomic hybridization (CGH) production (Nimblegen and Agilent) are developing denser arrays that could explore the complete genome, also offering flexibility in the incorporation of probes for targeted studies. A review about the different platforms available for CNV analysis has recently been published [77]. There are many reasons for and against the use of one over another. While genotyping platforms provide two products for the price of one, CGH arrays provide better signal accuracy, because they compare real samples in the same experiment. 
The choice depends on the specific status of the project, especially if a GWAS has already been performed with first-generation genotyping arrays, which have poor coverage in CNV regions. In these cases, CGH arrays should provide coverage of CNVs missed by the genotyping platforms. Indeed, several efforts are under way to screen, using CGH platforms, the WTCCC samples already genotyped with Affymetrix arrays (Figure 6).
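
The difficulty of discriminating between a wide range of copy numbers, raised above, can be illustrated with an idealized sketch. It assumes a noise-free array in which signal intensity is exactly proportional to copy number and the reference sample carries two copies, so the expected log2 ratio is log2(test copies / reference copies). Real arrays show noise and signal compression that this sketch ignores, and the function name is hypothetical.

```python
import math


def expected_log2_ratio(test_copies: int, ref_copies: int = 2) -> float:
    """Idealized array CGH log2 ratio for a test sample against a reference."""
    if test_copies <= 0 or ref_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(test_copies / ref_copies)


# Expected ratios for one to six copies against a two-copy reference.
ratios = {cn: expected_log2_ratio(cn) for cn in range(1, 7)}

# Separation between adjacent copy-number classes.
steps = {cn: ratios[cn + 1] - ratios[cn] for cn in range(1, 6)}

for cn in range(1, 7):
    print(f"{cn} copies: log2 ratio = {ratios[cn]:+.3f}")
for cn in range(1, 6):
    print(f"{cn} -> {cn + 1} copies: separation = {steps[cn]:.3f}")
```

Even in this idealized setting the separation between adjacent classes shrinks as copy number grows, so distinguishing five from six copies requires a much lower noise level than distinguishing one from two. This is one reason why multiallelic loci such as CCL3L1 or C4 call for dedicated quantitative assays.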

Figure 6

CNV Characterization Strategies

(A) Scales of resolution at the nucleotide level and maximum number of loci interrogated by the different methods (only the most widely used approaches are shown).

(B) Diagram of different approaches in CNV analysis, either at the genome-wide scale or at individual/multiplex loci. Arrows indicate the deeper analysis that is needed after initial detection by one methodology or another.

DASH, dynamic allele-specific hybridization [80]; PRT, paralogue ratio test [81]; MAQ, multiple amplicon quantification [82]; qPCR, quantitative PCR.

It is important to note that all the associations between CNVs and complex disorders reported so far have been unveiled through candidate gene or candidate region approaches. Indeed, only thorough investigations by groups working on the disorders, or with a specific interest in a particular variable region, have been able to dissect the fine spectrum of variability and link it to the phenotypes (Table 3). Although genotyping scans may be able to detect CNV regions, current approaches do not discriminate the spectrum of variability associated with these loci, and are therefore unable to relate specific copy numbers to phenotype. Several methods allow quantification of CNVs, including multiplex ligation-dependent probe amplification (MLPA), multiplex amplification and probe hybridization (MAPH), quantitative multiplex PCR of short fluorescent fragments (QMPSF), dynamic allele-specific hybridization, semiquantitative fluorescence in situ hybridization (SQ-FISH), paralogue ratio test, and multiple amplicon quantification, among others (Figure 6). Precise definition of breakpoints can be achieved by pulsed field gel electrophoresis (PFGE), regular Southern blotting, and sequencing. Ultrasequencing technologies (based on synthesis, GS FLX [Roche-454] and 1G Genome Analyzer [Solexa-Illumina]; or on ligation, SOLiD [Applied Biosystems]) should also provide this level of resolution, but specific experimental trials have to be developed to achieve successful resequencing and assembly of regions with the high level of plasticity and sequence identity of CNVs.

Tailored approaches to detect the variability in copy number of common CNV loci, and the use of genetic approaches that explore the differences between phenotypes at a whole-genome scale, should be pursued. A diagram of genome-wide and locus-specific approaches to detect and analyze CNVs is proposed in Figure 6. Improvements in the field of CNVs are clearly needed both for genome-wide coverage and for the precise quantification of specific CNVs. Progress in the identification of CNVs associated with complex disorders will likely take place at a rapid pace in the next few months to years. Currently available tools will only be able to disclose variants that, because of their genomic (large rearrangements) and genetic (de novo cases) characteristics, are easily discovered [78,79]. Thus, the systematic exploration of multiallelic CNVs, with precise characterization of copy numbers, should become essential when exploring the role of CNVs in many traits and diseases. Finally, since many CNVs contain genes with an important role in adaptation to the environment and response to external effects [40], it is tempting to speculate that CNV alleles could have a major role in disease predisposition and response to drugs.
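As a rough illustration of why precise copy number calls matter for association testing, the sketch below applies a Cochran-Armitage trend test to case and control counts tabulated by integer copy number. This is one possible analysis, not the method used in any of the studies cited here. Using the copy number itself as the class score assumes a linear dosage effect, and the counts in the usage example are invented purely for illustration.

```python
import math


def cochran_armitage_trend(scores, cases, controls):
    """Return (z, two-sided p) for a linear trend in case proportion across ordered classes."""
    if not (len(scores) == len(cases) == len(controls)):
        raise ValueError("scores, cases and controls must have the same length")
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    if n == 0:
        raise ValueError("no observations")

    # Trend statistic: score-weighted difference between observed cases and controls.
    t_stat = sum(t * (r * n_controls - s * n_cases)
                 for t, r, s in zip(scores, cases, controls))

    # Variance of the statistic under the null hypothesis of no association.
    sum_t2 = sum(t * t * m for t, m in zip(scores, totals))
    sum_t = sum(t * m for t, m in zip(scores, totals))
    variance = (n_cases * n_controls / n) * (n * sum_t2 - sum_t ** 2)
    if variance <= 0:
        raise ValueError("need cases, controls and at least two occupied classes")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


if __name__ == "__main__":
    # Invented counts: individuals with 1, 2 or 3 copies of a hypothetical CNV.
    copy_numbers = [1, 2, 3]
    cases = [30, 50, 20]
    controls = [10, 50, 40]
    z, p = cochran_armitage_trend(copy_numbers, cases, controls)
    print(f"z = {z:.2f}, p = {p:.2g}")
```

With these invented counts the statistic is negative (z is about -4.04), meaning that the proportion of cases falls as copy number rises. A platform that only reports "CNV present or absent" would merge the copy number classes and could not test such a dosage trend.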

Conclusions

Recent progress in the identification of loci showing association with complex disorders has not only provided a proof of concept for GWASs, but has also led to the identification of several new biological associations. The need for larger sample sets and better coverage of genome variability at the nucleotide level, including resequencing, is likely to be met after this first round of GWASs. However, the complete spectrum of genomic variability will not be elucidated by this approach. Several CNVs have been shown to be implicated in common disorders, as rare and common genomic changes, providing biological support to several pathophysiological pathways. New types of arrays, covering CNVs and segmental duplications, will facilitate the identification of regions that contain CNVs, but will likely still fail to detect associations at loci with a wide range of variability in copy number. A comprehensive tailored analysis of common and rare CNVs will not only complement GWASs using SNPs and sequencing, but will also provide a new, more powerful tool for examining the genetic components of common disorders and complex traits in humans and other organisms.

82 in total

1. Recent segmental duplications in the human genome.

Authors: Jeffrey A Bailey; Zhiping Gu; Royden A Clark; Knut Reinert; Rhea V Samonte; Stuart Schwartz; Mark D Adams; Eugene W Myers; Peter W Li; Evan E Eichler
Journal: Science Date: 2002-08-09 Impact factor: 47.728

2. Efficiency and power in genetic association studies.

Authors: Paul I W de Bakker; Roman Yelensky; Itsik Pe'er; Stacey B Gabriel; Mark J Daly; David Altshuler
Journal: Nat Genet Date: 2005-10-23 Impact factor: 38.330

3. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene.

Authors: Richard H Duerr; Kent D Taylor; Steven R Brant; John D Rioux; Mark S Silverberg; Mark J Daly; A Hillary Steinhart; Clara Abraham; Miguel Regueiro; Anne Griffiths; Themistocles Dassopoulos; Alain Bitton; Huiying Yang; Stephan Targan; Lisa Wu Datta; Emily O Kistner; L Philip Schumm; Annette T Lee; Peter K Gregersen; M Michael Barmada; Jerome I Rotter; Dan L Nicolae; Judy H Cho
Journal: Science Date: 2006-10-26 Impact factor: 47.728

4. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24.

Authors: Julius Gudmundsson; Patrick Sulem; Andrei Manolescu; Laufey T Amundadottir; Daniel Gudbjartsson; Agnar Helgason; Thorunn Rafnar; Jon T Bergthorsson; Bjarni A Agnarsson; Adam Baker; Asgeir Sigurdsson; Kristrun R Benediktsdottir; Margret Jakobsdottir; Jianfeng Xu; Thorarinn Blondal; Jelena Kostic; Jielin Sun; Shyamali Ghosh; Simon N Stacey; Magali Mouy; Jona Saemundsdottir; Valgerdur M Backman; Kristleifur Kristjansson; Alejandro Tres; Alan W Partin; Marjo T Albers-Akkers; Javier Godino-Ivan Marcos; Patrick C Walsh; Dorine W Swinkels; Sebastian Navarrete; Sarah D Isaacs; Katja K Aben; Theresa Graif; John Cashy; Manuel Ruiz-Echarri; Kathleen E Wiley; Brian K Suarez; J Alfred Witjes; Mike Frigge; Carole Ober; Eirikur Jonsson; Gudmundur V Einarsson; Jose I Mayordomo; Lambertus A Kiemeney; William B Isaacs; William J Catalona; Rosa B Barkardottir; Jeffrey R Gulcher; Unnur Thorsteinsdottir; Augustine Kong; Kari Stefansson
Journal: Nat Genet Date: 2007-04-01 Impact factor: 38.330

5. A chromosome 8 gene-cluster polymorphism with low human beta-defensin 2 gene copy number predisposes to Crohn disease of the colon.

Authors: Klaus Fellermann; Daniel E Stange; Elke Schaeffeler; Hartmut Schmalzl; Jan Wehkamp; Charles L Bevins; Walter Reinisch; Alexander Teml; Matthias Schwab; Peter Lichter; Bernhard Radlwimmer; Eduard F Stange
Journal: Am J Hum Genet Date: 2006-07-12 Impact factor: 11.025

6. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels.

Authors: Richa Saxena; Benjamin F Voight; Valeriya Lyssenko; Noël P Burtt; Paul I W de Bakker; Hong Chen; Jeffrey J Roix; Sekar Kathiresan; Joel N Hirschhorn; Mark J Daly; Thomas E Hughes; Leif Groop; David Altshuler; Peter Almgren; Jose C Florez; Joanne Meyer; Kristin Ardlie; Kristina Bengtsson Boström; Bo Isomaa; Guillaume Lettre; Ulf Lindblad; Helen N Lyon; Olle Melander; Christopher Newton-Cheh; Peter Nilsson; Marju Orho-Melander; Lennart Råstam; Elizabeth K Speliotes; Marja-Riitta Taskinen; Tiinamaija Tuomi; Candace Guiducci; Anna Berglund; Joyce Carlson; Lauren Gianniny; Rachel Hackett; Liselotte Hall; Johan Holmkvist; Esa Laurila; Marketa Sjögren; Maria Sterner; Aarti Surti; Margareta Svensson; Malin Svensson; Ryan Tewhey; Brendan Blumenstiel; Melissa Parkin; Matthew Defelice; Rachel Barry; Wendy Brodeur; Jody Camarata; Nancy Chia; Mary Fava; John Gibbons; Bob Handsaker; Claire Healy; Kieu Nguyen; Casey Gates; Carrie Sougnez; Diane Gage; Marcia Nizzari; Stacey B Gabriel; Gung-Wei Chirn; Qicheng Ma; Hemang Parikh; Delwood Richardson; Darrell Ricke; Shaun Purcell
Journal: Science Date: 2007-04-26 Impact factor: 47.728

7. Strong association of de novo copy number mutations with autism.

Authors: Jonathan Sebat; B Lakshmi; Dheeraj Malhotra; Jennifer Troge; Christa Lese-Martin; Tom Walsh; Boris Yamrom; Seungtai Yoon; Alex Krasnitz; Jude Kendall; Anthony Leotta; Deepa Pai; Ray Zhang; Yoon-Ha Lee; James Hicks; Sarah J Spence; Annette T Lee; Kaija Puura; Terho Lehtimäki; David Ledbetter; Peter K Gregersen; Joel Bregman; James S Sutcliffe; Vaidehi Jobanputra; Wendy Chung; Dorothy Warburton; Mary-Claire King; David Skuse; Daniel H Geschwind; T Conrad Gilliam; Kenny Ye; Michael Wigler
Journal: Science Date: 2007-03-15 Impact factor: 47.728

8. Genome-wide association study identifies novel breast cancer susceptibility loci.

Authors: Douglas F Easton; Karen A Pooley; Alison M Dunning; Paul D P Pharoah; Deborah Thompson; Dennis G Ballinger; Jeffery P Struewing; Jonathan Morrison; Helen Field; Robert Luben; Nicholas Wareham; Shahana Ahmed; Catherine S Healey; Richard Bowman; Kerstin B Meyer; Christopher A Haiman; Laurence K Kolonel; Brian E Henderson; Loic Le Marchand; Paul Brennan; Suleeporn Sangrajrang; Valerie Gaborieau; Fabrice Odefrey; Chen-Yang Shen; Pei-Ei Wu; Hui-Chun Wang; Diana Eccles; D Gareth Evans; Julian Peto; Olivia Fletcher; Nichola Johnson; Sheila Seal; Michael R Stratton; Nazneen Rahman; Georgia Chenevix-Trench; Stig E Bojesen; Børge G Nordestgaard; Christen K Axelsson; Montserrat Garcia-Closas; Louise Brinton; Stephen Chanock; Jolanta Lissowska; Beata Peplonska; Heli Nevanlinna; Rainer Fagerholm; Hannaleena Eerola; Daehee Kang; Keun-Young Yoo; Dong-Young Noh; Sei-Hyun Ahn; David J Hunter; Susan E Hankinson; David G Cox; Per Hall; Sara Wedren; Jianjun Liu; Yen-Ling Low; Natalia Bogdanova; Peter Schürmann; Thilo Dörk; Rob A E M Tollenaar; Catharina E Jacobi; Peter Devilee; Jan G M Klijn; Alice J Sigurdson; Michele M Doody; Bruce H Alexander; Jinghui Zhang; Angela Cox; Ian W Brock; Gordon MacPherson; Malcolm W R Reed; Fergus J Couch; Ellen L Goode; Janet E Olson; Hanne Meijers-Heijboer; Ans van den Ouweland; André Uitterlinden; Fernando Rivadeneira; Roger L Milne; Gloria Ribas; Anna Gonzalez-Neira; Javier Benitez; John L Hopper; Margaret McCredie; Melissa Southey; Graham G Giles; Chris Schroen; Christina Justenhoven; Hiltrud Brauch; Ute Hamann; Yon-Dschun Ko; Amanda B Spurdle; Jonathan Beesley; Xiaoqing Chen; Arto Mannermaa; Veli-Matti Kosma; Vesa Kataja; Jaana Hartikainen; Nicholas E Day; David R Cox; Bruce A J Ponder
Journal: Nature Date: 2007-06-28 Impact factor: 49.962

9. FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity.

Authors: Manuela Fanciulli; Penny J Norsworthy; Enrico Petretto; Rong Dong; Lorraine Harper; Lavanya Kamesh; Joanne M Heward; Stephen C L Gough; Adam de Smith; Alexandra I F Blakemore; Philippe Froguel; Catherine J Owen; Simon H S Pearce; Luis Teixeira; Loic Guillevin; Deborah S Cunninghame Graham; Charles D Pusey; H Terence Cook; Timothy J Vyse; Timothy J Aitman
Journal: Nat Genet Date: 2007-05-21 Impact factor: 38.330

10. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans.

Authors: Timothy J Aitman; Rong Dong; Timothy J Vyse; Penny J Norsworthy; Michelle D Johnson; Jennifer Smith; Jonathan Mangion; Cheri Roberton-Lowe; Amy J Marshall; Enrico Petretto; Matthew D Hodges; Gurjeet Bhangal; Sheetal G Patel; Kelly Sheehan-Rooney; Mark Duda; Paul R Cook; David J Evans; Jan Domin; Jonathan Flint; Joseph J Boyle; Charles D Pusey; H Terence Cook
Journal: Nature Date: 2006-02-16 Impact factor: 49.962
