Complex and multi-allelic copy number variation in human disease.

Abstract

Hundreds of copy number variants are complex and multi-allelic, in that they have many structural alleles and have rearranged multiple times in the ancestors who contributed chromosomes to current humans. Not only are the relationships of these multi-allelic CNVs (mCNVs) to phenotypes generally unknown, but many mCNVs have not yet been described at the basic levels (alleles, allele frequencies, structural features) that support genetic investigation. To date, most reported disease associations to these variants have been ascertained through candidate gene studies. However, only a few associations have reached the level of acceptance defined by durable replications in many cohorts. This likely stems from longstanding challenges in making precise molecular measurements of the alleles individuals have at these loci. However, approaches for mCNV analysis are improving quickly, and some of the unique characteristics of mCNVs may assist future association studies. Their various structural alleles are likely to have different magnitudes of effect, creating a natural allelic series of growing phenotypic impact and giving investigators a set of natural predictions and testable hypotheses about the extent to which each allele of an mCNV predisposes to a phenotype. Also, mCNVs' low-to-modest correlation to individual single-nucleotide polymorphisms (SNPs) may make it easier to distinguish between mCNVs and nearby SNPs as the drivers of an association signal, and perhaps, make it possible to preliminarily screen candidate loci, or the entire genome, for the many mCNV-disease relationships that remain to be discovered.

Keywords: CNV genotyping; association; ddPCR; mCNV; multi-allelic copy number variation; optical mapping

Year: 2015 PMID: 26163405 PMCID: PMC4576757 DOI: 10.1093/bfgp/elv028

Source DB: PubMed Journal: Brief Funct Genomics ISSN: 2041-2649 Impact factor: 4.241

Introduction

Human genomes have thousands of deletion and duplication polymorphisms larger than 1 kb. These so-called copy number variations (CNVs) cause many segments (collectively spanning as much as 0.78% of base pairs [1]) to differ in copy number between any two individuals’ genomes and can impact phenotypes by causing gene dosage and structure to vary among individuals. Rare and de novo CNVs have well-known roles in disease; many associate to disease phenotypes with strong odds ratios (2–30) [2-4], though typically with partial penetrance and variable expressivity. However, most of the CNV in any individual’s genome arises from a reservoir of polymorphisms that are common, ancient and stably inherited [5]. A majority of these inherited CNVs are simple, bi-allelic CNVs originating from a single ancestral deletion or duplication. Analyses suggest that the majority of these are benign, with a subset appearing to have modest effects on phenotypes, similar to the effects of other common variants [6]. An intriguing and understudied subset of common CNVs consists of loci that have many structural alleles and have rearranged multiple (perhaps many) times in human ancestors. A recent genome-wide survey based on whole genome sequencing (WGS) data from Phase 1 of the 1000 Genomes Project [7] found 1356 of these CNVs, out of a total of 8659 CNVs found in the genome [8]. These multi-allelic CNVs (mCNVs) vary widely in copy number, in patterns that imply the existence of three, four, five or more segregating alleles. Of the 1356 mCNVs found, 121 appeared to have four or more alleles, and 45 appeared to have five or more [8]. When mCNVs have been visualized by fiber fluorescence in situ hybridization (FISH), they have often been found to involve tandem or inverted duplications of a genomic segment [9-12]. Some of these duplications have been estimated (from sequencing data) to have up to 50 copies, though the great majority appear to be present in copy numbers of 0–12 [6, 8, 13]. 
Though mCNVs are a minority of all structural variants, they account for 88% of human variation in gene dosage [8]. Furthermore, mCNVs are disproportionately likely to encompass genes, and the great majority of gene-encompassing mCNVs affect the RNA expression levels of the genes they contain [8]. Whereas the analysis of simple forms of CNV is today mature—measurements using molecular analysis (for rare CNVs) or statistical imputation (for common CNVs) are now routine in genetic studies [5, 14–17]—complex and multi-allelic forms of CNV represent a frontier in genome analysis. Not only are the relationships of mCNVs to phenotypes generally unknown, but also most mCNVs still need to be described at the basic levels—alleles, allele frequencies, molecular features—that support genetic study. Fundamental challenges lie in ascertaining the structural forms of each locus, defining the alleles that are present and developing molecular and computational strategies to accurately analyze them with the scale and precision required to conclusively infer their relationships with phenotypes.

Candidate gene studies of mCNV associations

To date, most reported disease-to-mCNV associations have been ascertained through candidate gene studies. As a result, a handful of genes have received most of the research attention, likely because of their already-known or hypothesized roles in diseases of interest. These genes include FCGR3B (binds the Fc region of gamma immunoglobulins), CCL3L1 [ligand of the co-receptor for the human immunodeficiency virus (HIV)], beta-defensins (cluster of microbicidal and cytotoxic peptides), HBA1/2 (α-chain of hemoglobin) and C4 (part of the complement pathway) [18-31]. The cohort sizes in these studies have ranged from 50 to 2807, with a trend toward the initial studies having fewer samples and the attempted replication studies having more (Table 1).

Table 1.

Notable mCNV disease associations and their replication studies

To date, the study of mCNV-to-disease associations has resembled the study of single-nucleotide polymorphism (SNP) associations in the pre-genome-wide association study (GWAS) era. Before about 2005, SNP studies focused on candidate genes and variants that were typed in small cohorts. Such studies had a sobering track record: thousands of associations were reported, yet only a handful replicated in other candidate-gene studies or in later well-powered genome-wide association studies [32-34]. In retrospect, science was not good at guessing which genes contribute to genetically complex phenotypes; it took unbiased genome-wide surveys to identify such genes. In the end, publication biases (particularly the increased likelihood of publication and visibility for positive results, relative to negative results), combined with modest statistical thresholds and large numbers of hypotheses being tested across the field, made it likely that many studies would find nominal levels of association, even in the absence of real underlying genetic relationships. These sobering lessons are worth considering when thinking about the trajectory of disease-mCNV analysis. As with SNP candidate-gene studies a decade ago, only a handful of mCNV-to-disease associations have reached the level of acceptance defined by replications in independent cohorts by independent groups of investigators [20, 23, 27, 28, 35]. A compelling example of the challenges of replicating can be found in the Wellcome Trust Case Control Consortium (WTCCC) study, which used an array-based CNV genotyping technology to perform a GWAS of thousands of CNVs in eight common diseases.
Despite good copy number measurements (as discussed below) and a larger sample size than earlier studies (approximately 2000 cases), the WTCCC study did not replicate three previously published associations (FCGR3B on rheumatoid arthritis, CCL3L1 on rheumatoid arthritis and β-defensins on Crohn’s disease). Non-replication has also vexed other associations, another example being CCL3L1’s impact on HIV-related phenotypes [25].

Case study in replication: CCL3L1 and HIV

One of the best-known mCNV associations was reported in 2005 in Science [25]. CCL3L1, a gene encoding a ligand to the co-receptor for HIV, was found to range from 0 to 14 copies in diploid genomes. Having a below-average CCL3L1 copy number was found to associate with increased HIV susceptibility and faster progression from HIV+ status to acquired immune deficiency syndrome (AIDS) [25]. After publication, many follow-up studies sought to replicate and expand on the results (Table 2). New phenotypes were tested in the same cohorts and new cohorts were tested for the same associations, each having somewhat limited success, resulting in a complicated pattern of replication and non-replication that hinders final interpretation [11, 36–49]. Separate studies have attempted to track down the causes of the diverging results, concluding that certain analytical practices (such as genotyping cases and controls separately and rounding rough copy number measurements to the nearest integer) are likely to have generated false-positive associations [48-53]. Our own analysis suggests an additional pattern: studies finding positive associations have been published visibly and cited many times, while studies finding negative associations (when published at all) have been less visible. Debate about CCL3L1’s impact on HIV is likely to continue, and examples such as CCL3L1 highlight the need for experimental methods and designs that ensure durable association results.

Table 2.

Results of studies assessing whether CCL3L1 copy number affects HIV-related phenotypes

Toward durable association results for mCNVs

Association analysis with precise molecular data

Replication studies reporting null results have often cited the imprecision or inaccuracy of the molecular methods used in the original positive-result study as a reason for non-replication [49, 54–56]. For mCNVs, molecular methods have tended to yield a rough estimate (rather than a precise measurement) of a gene’s copy number, likely because counting copies is much more challenging than determining the presence or absence of an allele. The difference between a copy number call of 4 and 5 is only 20%, a difference that corresponds to a fraction of a polymerase chain reaction (PCR) cycle, making it difficult for real-time quantitative PCR (qPCR; the most frequently used method for analyzing mCNVs in studies to date) to detect these differences with the accuracy required for successful analysis [35, 50, 51, 54, 57, 58]. With the exception of unusual circumstances, such as somatic mosaicism, the number of copies of a genomic segment within an individual’s genome is always an integer. When measurements of copy number are sufficiently imprecise as to form a continuously varying distribution (i.e. a bell-shaped distribution, rather than a distribution with discrete peaks at integers), these measurements will usually hide technical confounds caused by experimental batch effects, DNA-isolation batch effects and other unknown factors [50, 51, 53, 54, 57] (Figure 1). The pitfalls of continuously varying measurements have long been recognized in SNP analysis [52], and poorly clustering SNP assays are systematically discarded during SNP QC. But the lack of better mCNV data has often meant that a similar level of fastidiousness would mean doing no mCNV study at all.
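To make the "fraction of a PCR cycle" point concrete, the short sketch below computes the expected shift in qPCR threshold cycle between two copy numbers. It assumes idealized amplification in which the template doubles every cycle (efficiency of 1.0); the function name and the efficiency parameter are illustrative and are not taken from any of the cited studies.

```python
import math

def ct_shift(copies_low, copies_high, efficiency=1.0):
    """Magnitude of the threshold-cycle difference between two copy numbers.

    Assumes each cycle multiplies the template by (1 + efficiency), so the
    shift is the log of the copy-number ratio in that base.
    """
    return math.log(copies_high / copies_low) / math.log(1.0 + efficiency)

# Going from 1 to 2 copies is a full cycle; higher steps shrink quickly.
for low, high in [(1, 2), (2, 3), (4, 5), (9, 10)]:
    print(f"{low} -> {high} copies: {ct_shift(low, high):.2f} cycles")
```

Under these assumptions the step from 4 to 5 copies corresponds to about a third of a cycle, and each further step is smaller still, which is why real-time PCR struggles to separate adjacent high copy numbers.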

Figure 1

Imprecise copy numbers can hide artifacts. When experimental measurements of a gene's copy number in each genome are a rough estimate (A) rather than a more precise, multi-modally distributed measurement (B), confounding technical influences are challenging to recognize. In these simulated data, Groups 1 and 2 (e.g. cases and controls) appear in the first analysis to exhibit different distributions of copy numbers (P = 4.9 × 10⁻¹³); the second, more precise analysis shows that the apparent difference between the groups is entirely technical in nature. A confound causing a 10% shift in the copy numbers of the cases is detectable with the precise copy numbers, but may be mistaken for a real effect with the imprecise calls. Note that this confounding occurs even though the measurements by the two methods are broadly correlated with each other (r² = 0.90). (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)

More precise molecular methods are becoming available, though they have not yet been widely adopted. Notably, the paralog ratio test (PRT), which uses paralogous, copy-number-invariant sequences elsewhere in the genome as embedded controls to carefully calibrate copy number measurements [59], appears to produce highly accurate copy number measurements. PRT has been used effectively in mCNV association studies, often giving copy number measurements with sufficient resolution to detect and remove batch effects [11, 28, 50, 51, 54–56, 58, 60, 61]; in fact, PRT was used to produce one of the most well-supported mCNV-disease results to date, the association of psoriasis with β-defensin gene copy number [28, 60]. We are surprised that PRT has not been more widely adopted, though its application is limited to loci that have copy-number-invariant paralogous sequences elsewhere in the genome.
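The kind of artifact described in Figure 1 can be reproduced with a small simulation. The sketch below is not the simulation used for the figure; it assumes four equally likely true copy numbers, Gaussian measurement noise, and a purely technical 10% multiplicative shift applied to cases. The diagnostic (the mean signed distance to the nearest integer) is one simple illustrative choice among many.

```python
import random
import statistics

def simulate(noise_sd, shift=1.10, n=2000, seed=1):
    """Simulate controls and cases that share identical true copy numbers.

    Assumption: cases carry a purely technical 10% multiplicative shift.
    """
    rng = random.Random(seed)
    true_cn = [rng.choice([2, 3, 4, 5]) for _ in range(n)]
    controls = [cn + rng.gauss(0, noise_sd) for cn in true_cn]
    cases = [cn * shift + rng.gauss(0, noise_sd) for cn in true_cn]
    return controls, cases

def mean_offset(values):
    """Mean signed distance from each measurement to its nearest integer."""
    return statistics.mean(v - round(v) for v in values)

for label, sd in [("imprecise", 0.6), ("precise", 0.05)]:
    controls, cases = simulate(sd)
    gap = statistics.mean(cases) - statistics.mean(controls)
    print(f"{label}: group gap={gap:.2f}, "
          f"offset controls={mean_offset(controls):+.3f}, "
          f"offset cases={mean_offset(cases):+.3f}")
```

In both settings the cases appear to have higher copy number than the controls. Only with the precise measurements does the offset from integers expose the shift as technical: the controls sit on integers while the cases sit systematically off them.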
Another emerging technique, droplet digital PCR (ddPCR) [62, 63], also appears to offer a profound improvement over real-time PCR for mCNV analysis and may be applicable to more genomic loci than PRT. In ddPCR, a PCR reaction with primers and fluorescent probes for each sequence of interest (e.g. a CNV and a two-copy-control locus) is partitioned into thousands of nanoliter-sized droplets at a sufficient dilution that most droplets contain just 0 or 1 copy of the locus of interest. After thermocycling, the number of fluorescent droplets is counted, supporting a calculation of the copy number of the target sequence in the DNA sample. The technique has been used to measure the precise integer copy number of copy-number-variable segments within the 17q21.31 inversion region and several other loci [8, 63–66]. Measurements of mCNVs by ddPCR appear to be strongly supported by analysis of the same samples using WGS: measurements from the two techniques exhibit not just a rough correlation (a standard that does not report on artifactual influences), but more importantly, a precise agreement on the integer copy number present in each genome [8, 66]. As large disease studies based on WGS are just beginning, it is simultaneously becoming possible to accurately detect and measure mCNVs genome-wide using new sequencing analysis methods [8, 67]. Though read depth of coverage has long been known to correlate roughly with the copy number of genomic segments [13], recent analytical innovations allow precise calibration of this signal to meet the exacting standards of mCNV genotyping [8, 67]. 
These methods can be used to measure any particular locus relatively quickly (after the sequencing has been done) and allow for genotyping refinement, in that certain parameters of the analysis can be adjusted and optimized until the copy numbers cluster at integers [8, 67]; this is sometimes necessary because mCNVs can contain elements that frustrate both computational and PCR-based approaches, such as stretches of extensive homology and varying breakpoints [68]. WGS remains expensive though, so it may be some time before WGS studies reach the sample sizes necessary to discover genetic influences on highly polygenic diseases at the significance thresholds required, given genome-wide multiple hypothesis testing.
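The droplet-counting calculation behind ddPCR, described earlier in this section, can be sketched as follows. Because some droplets receive more than one template molecule, the fraction of positive droplets is usually converted to a mean number of copies per droplet with a Poisson correction, and the target is then scaled against the two-copy control locus. The droplet counts below are hypothetical, and the function names are illustrative rather than part of any instrument's software.

```python
import math

def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log(1.0 - positive / total)

def ddpcr_copy_number(target_positive, reference_positive, total, reference_copies=2):
    """Diploid copy number of the target relative to a control locus."""
    target = copies_per_droplet(target_positive, total)
    reference = copies_per_droplet(reference_positive, total)
    return reference_copies * target / reference

# Hypothetical duplexed reaction: 15000 droplets read for both probes.
estimate = ddpcr_copy_number(target_positive=4267, reference_positive=3000, total=15000)
print(f"estimated copy number: {estimate:.2f}")
```

With these made-up counts the estimate is very close to 3. The uncorrected ratio of positive droplets (4267/3000, times 2) would give about 2.84, which illustrates why the Poisson step matters for landing on integers.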

Understanding the structural alleles

mCNVs are often complex, involving combinations of duplications, deletions, insertions and inversions [5, 10, 13, 66, 68, 69]. For example, the 17q21.31 inversion region, at which genetic markers associate with female fertility [70], recombination rates [70-72] and neurological diseases [73, 74], has nine structural forms that affect five genes through various numbers of duplications, sequence changes and a megabase-scale inversion [66, 69]. The 17q21.31 locus is one of the few complex CNVs for which a long series of complex structural alleles has been inferred; however, initial investigations into other loci, such as the amylase locus, FCGR3B/3A, CCL3L1 and C4 [9–11, 26, 55, 75], suggest that such complexity might be widespread (Figure 2).

Figure 2

Examples of the alleles of complex loci. Boettger et al. [66] identified the common structural haplotypes of the 17q21.31 region using sequence analysis and ddPCR; similar conclusions were reached independently by Steinberg et al. [69]. Usher et al. [76] assembled the haplotypes of the amylase locus using ddPCR, sequence analysis and optical mapping; similar conclusions were reached independently by Carpenter et al. [58]. Both Perry et al. [9] and Aklillu et al. [11] performed fiber FISH experiments on the CCL3L1 locus, inferring the haplotypes displayed.

Designing an assay to a single gene within an mCNV without knowing all of the mCNV’s structural forms is analogous to flying blind. Sequence variants that are present on the haplotypes that are not in the human reference sequence can cause inaccurate gene measurements if an assay is in, or crosses a breakpoint of, one such variant. In the case of the amylase and C4 loci, insertions and deletions within the resident genes, as well as the extensive homology of the resident genes, can interfere with the genotyping of a single gene target [10, 27]. In the same vein, at the CCL3L1 locus, a CCL3L pseudogene may interfere with obtaining accurate copy number measurements [51]. Therefore, a challenging yet important first step of any association study will ideally be to identify the actual structural forms of the mCNV of interest. Though this is a challenging problem, it can be assisted by investigations of bacterial artificial chromosomes and cosmids [69, 77, 78], haplotype assembly from sequencing data [66, 79], techniques such as fiber FISH [9], optical mapping [80] or a combination of these approaches. Regardless of the method, knowing the alleles (the fundamental units of most genetic analysis) will be an important basis for conclusions about association.
Though identifying the structural forms of a complex CNV is a challenging problem, the scientific yield will likely reward the effort. Wherever an mCNV influences a disease phenotype, its various structural alleles are likely to have different magnitudes of effect (such as varying odds ratios), creating a natural allelic series of growing phenotypic impact. This could in principle be utilized to help determine whether an mCNV or the sequence variants around it are the true drivers of an association signal, a scientific opportunity that is not available with most SNPs and bi-allelic CNVs, which often have near perfect linkage disequilibrium (LD) with many other variants that hinders fine-mapping and the evaluation of causality. In addition, such an allelic series could give investigators a set of natural predictions about the direction of effect, and testable hypotheses about the extent to which each allele of an mCNV predisposes to a phenotype. These natural allelic series would most likely be based on the number of copies of a particular gene. However, a simple relationship to gene copy number may not be the only effect at an mCNV locus. For example, reduced FCGR3B copy number is associated with systemic lupus erythematosus, an effect that appears to be caused by a fusion gene created by the deletion of FCGR3B on one allele [81]. In this case, a lack of an allelic series of growing phenotypic impact based on FCGR3B copy number, which ranges from about 0 to 5, could have assisted in pinpointing the functional variant [81].
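One conventional way to ask whether disease risk rises steadily across an allelic series is a Cochran-Armitage trend test on case and control counts per copy-number class. The review does not prescribe a particular test, so the sketch below is only one possible choice; it uses the binomial form of the variance, and the counts in the usage example are invented for illustration.

```python
import math

def trend_test(copy_numbers, cases, totals):
    """Cochran-Armitage test for a linear trend in case proportion.

    copy_numbers: score for each class (here the integer copy number)
    cases:        number of cases in each class
    totals:       number of cases plus controls in each class
    Returns (z, two_sided_p).
    """
    n_all = sum(totals)
    p_bar = sum(cases) / n_all  # overall case proportion under the null
    t_stat = sum(s * (r - n * p_bar)
                 for s, r, n in zip(copy_numbers, cases, totals))
    sum_ns = sum(n * s for s, n in zip(copy_numbers, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(copy_numbers, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical counts: risk rising with copy number 2, 3 and 4.
z, p = trend_test([2, 3, 4], cases=[30, 50, 70], totals=[100, 100, 100])
print(f"z = {z:.2f}, p = {p:.2g}")
```

A locus like FCGR3B, where the effect appears to come from one specific structural allele rather than from dosage, would be expected to fit such a trend poorly; that mismatch is itself informative.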

Using information in SNPs and haplotypes

The SNPs near mCNVs may, at many loci, offer substantial information that is mostly unexploited [8]. SNP genotyping is a mature, reliable technology that has already been applied to millions of genomes [82, 83]. While an individual SNP cannot serve as a good proxy for a multi-allelic variant, it is nonetheless likely that the individual structural alleles of an mCNV arose on specific SNP haplotypes. Depending on the mutation rate of the mCNV, the frequency of recombination near the mCNV and the age and number of structural alleles at the locus, the structural alleles may continue to bear relationships to surrounding genetic markers [8, 76]. For some mCNVs, it may be possible to impute their alleles from flanking SNP haplotypes; in other words, using the genotypes of the surrounding SNPs, one may be able to estimate the copy number or structural allele present at the mCNV for a given individual [8, 66]. Unlike imputing SNPs, imputing mCNVs tends to be only partially (rather than perfectly) predictive, and its efficacy depends on the mCNV’s evolutionary history: the more alleles, the higher the copy number, and the wider the copy number range, the more limited imputation’s efficacy appears to be [8]. Importantly, a SNP or SNP haplotype’s ability to capture an mCNV should not be thought of as a binary ‘true’ or ‘false’, but as a continuum. With statistical power arising from both r² and sample size, and with some cohorts having SNP data from as many as 300 000 individuals [83, 84], even a SNP with a low r² might be used to evaluate the plausibility of an mCNV’s association with a disease.
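The trade-off between correlation and sample size can be made explicit with a rough power calculation. The sketch below relies on a standard approximation that is an assumption here, not something taken from the cited studies: for a quantitative trait, the non-centrality parameter of a 1-df test at a proxy SNP is about N × (variance explained by the causal variant) × r². The effect size and r² in the example are hypothetical.

```python
from statistics import NormalDist

def power_via_proxy(n, variance_explained, r2, alpha=5e-8):
    """Approximate power of a two-sided 1-df test at a proxy SNP.

    Assumes the test statistic is normal with mean sqrt(n * h * r2),
    where h is the trait variance explained by the causal variant.
    """
    normal = NormalDist()
    shift = (n * variance_explained * r2) ** 0.5
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)

# Hypothetical mCNV explaining 0.1% of trait variance, tagged at r2 = 0.1.
for n in (5_000, 50_000, 300_000):
    print(f"N={n}: power={power_via_proxy(n, 0.001, 0.1):.3f}")
```

Under this approximation a proxy with r² = 0.1 in 300 000 individuals carries the same information as direct genotyping in 30 000, which is the sense in which a weakly correlated SNP can still be useful in a very large cohort.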

Case study in next generation association techniques: AMY1 and obesity

Humans have three amylase genes (AMY2B, AMY2A and AMY1) responsible for digesting starch into sugar. Each amylase gene varies widely in copy number, with AMY1 varying from 2 to 17 copies [10, 58, 76, 85], AMY2A from 0 to 4 [58, 76] and AMY2B from 2 to 6 [58, 76]. Higher AMY1 copy number has been observed in three populations with starch-rich ancestral diets [10], and two recent studies from the same group reported that increased copy number of AMY1 decreases the risk of obesity [75, 86], though in different ways: in the initial study, the association involved a shifting of the entire copy-number distribution, but the result in the follow-up study arose entirely from a small subset of samples (all lean) with extremely high AMY1 copy number. Combined, these studies used almost 5000 samples, much larger than the average candidate-gene study; however, they used qPCR, a technique that has been shown, with PRT and fiber FISH, to give imprecise copy numbers at this locus [58]. Two follow-up studies applying analytical principles similar to those outlined in this review concluded that the pattern of copy-number variation at the locus was different from that reported in the earlier work [58, 76]. Whole genome sequence analysis, ddPCR and PRT each revealed an intriguing distribution of AMY1 copy number in which even copy numbers are four times more common than odd numbers. This distribution had never been detected with qPCR [10, 75], but optical mapping [76] and fiber FISH [58] confirmed the haplotypes inferred from the new analysis. Moreover, analysis using the data from the improved methods showed that some SNPs do correlate (modestly) with the copy number of AMY1, and that if AMY1 copy number influences body mass index (BMI), these SNPs would have been 99.9% likely to associate with BMI in the GIANT consortium GWAS of >300 000 individuals [84], yet did not [76].
In addition, association analyses using the improved molecular methods in three new cohorts with 99% power to detect the reported effect found no association [76].

An exciting future for mCNVs: toward genome-wide studies of mCNVs in disease

As the study of SNPs did over the past 10 years, the study of mCNVs might soon be able to move toward an effective genome-wide model. Two large-scale, array-based studies have addressed the challenges of obtaining genome-wide association information on mCNVs. The WTCCC analyzed approximately 2000 cases for each of eight common diseases, and Zanda et al. analyzed approximately 4000 families with type 1 diabetes [6, 87]. Though the sample sizes were much larger than earlier mCNV studies, they were smaller than what has been required to find most genetic influences on complex, polygenic phenotypes, and the studies found no novel associations. However, both studies identified CNVs at several loci already implicated in GWAS, which serves as an effective positive control [32, 88, 89]. The WTCCC and Zanda studies provide useful knowledge about how analysis methods can find and cope with potential artifacts. CNVs with duplications that had dispersed onto the sex chromosomes caused false associations when the sex ratio was not matched between cases and controls [6]. In addition, whether the DNA was isolated from blood or cultured cells affected the CNV measurement, causing false associations, particularly at the immunoglobulin heavy chain and T-cell receptor loci [6]. Zanda et al. found an additional artifact: age at sampling. Loci that were affected by somatic rearrangements had time in older people to accumulate mutations, thus skewing the results when cases and controls differed in age [87]. Directly measuring mCNVs need not be the only way to scan for phenotype associations genome-wide. With so much SNP information already available, it could be possible to build a genome-wide catalog of SNP-to-mCNV LD relationships and cross reference that with GWAS data. Querying this catalog for mCNV-associated SNPs with a nominal association to a phenotype could serve as a preliminary genome-wide survey for mCNV associations.
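A catalog query of the kind proposed here could be as simple as a join between two tables. The sketch below is purely illustrative: the catalog layout, the identifiers, the r² values and the P-values are all hypothetical, and the thresholds are arbitrary defaults rather than recommendations.

```python
def screen_catalog(ld_catalog, gwas_pvalues, min_r2=0.05, max_p=0.05):
    """Return mCNV-tagging SNPs that show nominal association in a GWAS.

    ld_catalog:   iterable of (mcnv_id, snp_id, r2) tuples
    gwas_pvalues: dict mapping snp_id to its GWAS P-value
    Hits are sorted from smallest to largest P-value.
    """
    hits = []
    for mcnv, snp, r2 in ld_catalog:
        p_value = gwas_pvalues.get(snp)
        if p_value is None:
            continue  # SNP was not tested in this GWAS
        if r2 >= min_r2 and p_value <= max_p:
            hits.append((mcnv, snp, r2, p_value))
    return sorted(hits, key=lambda hit: hit[3])

# Hypothetical catalog and summary statistics.
ld_catalog = [
    ("mCNV_A", "snp_1", 0.30),
    ("mCNV_A", "snp_2", 0.02),
    ("mCNV_B", "snp_3", 0.12),
    ("mCNV_C", "snp_4", 0.45),
]
gwas_pvalues = {"snp_1": 0.004, "snp_2": 0.001, "snp_3": 0.60, "snp_4": 0.03}

for hit in screen_catalog(ld_catalog, gwas_pvalues):
    print(hit)
```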
This would allow geneticists to make full use of already available SNP data sets while WGS data accumulates to an amount that enables systematic and well-powered analyses that reach a larger set of mCNVs.

We believe that a large number of mCNV–disease relationships remain to be discovered. Associations in complex, polygenic diseases tend to require very large cohorts (>10 000 samples) to discover novel relationships at genome-wide significance. Disease-mCNV studies on this scale have not been attempted yet. There is reason, though, to expect that such activity will be high-yield; with mCNVs accounting for 88% of human gene dosage variation and shaping RNA expression of the affected genes in almost all cases [8], it is reasonable to expect that there are many undiscovered influences still hiding in our genomes.

Key points

Multi-allelic CNVs (mCNVs) have the potential to affect phenotypes because of their large contribution to gene-dosage variation and their proclivity for recurrent mutation.

mCNVs are complex and have been challenging to measure and characterize experimentally.

The many structural forms of mCNVs and their complex relationships with single-nucleotide polymorphisms (SNPs) and SNP haplotypes can obscure their effects in genome-wide association studies.

Uncertain copy number measurements hide artifacts in association analyses and have likely contributed to false-positive mCNV association results.

New analytical methods, molecular and computational, are starting to enable precise measurements and an understanding of mCNVs that will facilitate more replicable associations and genome-wide scans for association.

Funding

This work was supported by a grant from the National Human Genome Research Institute (R01 HG006855).
