
A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer.

Abstract

Cumulative evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as the PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each single SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to achieve equal footing among multiple arrays. This method requires no between-array normalization, and thus, maintains data integrity and independence of samples among individual subjects. In addition to our efforts to apply new statistical technology to raw fluorescence values, the HMM has been applied to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease.


Keywords: Affymetrix high density SNP array; breast cancer; copy number standardization; copy number variants; hidden Markov model

Year: 2015 PMID: 26279618 PMCID: PMC4519351 DOI: 10.4137/CIN.S15203

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Introduction

Genome-wide association studies (GWAS) are useful for the discovery of genetic variants underlying complex human diseases, such as breast cancer and Type II diabetes.1,2 These genetic association studies typically compare the allele/genotype frequency for each single-nucleotide polymorphism (SNP) between cases and controls. Large projects such as the HapMap and 1,000 Genomes have shown that, in addition to single-nucleotide sequence variations (SNVs), structural alterations, such as copy number variants (CNVs), account for up to 7.3% of the genetic variation among humans and may be involved in the genetic susceptibility to diseases,3–7 including cancers.8,9 CNVs were first identified in the early 2000s,10,11 and have been found to exist pervasively in human genomes.12,13 Two major DNA microarray platforms have been commonly used for copy number estimation, namely, Affymetrix high density SNP arrays and Illumina Bead arrays.14,15 Both rely on relative intensity, an indirect measurement of the hybridization of fluorescently labeled DNA fragments to immobilized probes on the arrays. Sophisticated statistical models are therefore required to accurately infer the actual copy number within samples. In the past few years, several methods have been proposed for copy number inference. For example, smoothing methods were used in early studies in the field;16,17 these fit a smoothing curve to the intensities along a genomic region and apply a threshold to infer copy number levels.
The smoothing methods have been shown to be effective in detecting genomic regions with copy number changes.18 However, these methods suffer from two limitations, namely, difficulty in locating accurate boundaries and difficulty in testing the significance of the alterations.19 Another group of methods adopts change-point models for the underlying copy number levels.20,21 A change-point model usually assumes that the SNPs come from segments that are uniformly distributed in the human genome, and that their underlying copy numbers are piecewise constant with a series of jump points. By maximizing the likelihood function, the parameters as well as the change points can be estimated for copy number inference. Such models are further extended by various formulations of hidden Markov models (HMMs).22–25 The HMM assumes that the observed intensities of SNPs are emitted by an underlying Markov chain. It usually explicitly specifies the distribution of the waiting time between copy number changes and the jumping probabilities between copy number states. These methods have emerged as promising tools for copy number inference. Estimation of array intensity values is challenging due to the presence of experimental noise, both within an array and among arrays of different samples. For example, it is commonly known that the level of ozone affects hybridization reactions, which can affect interpretation of the results.26,27 In a recent study, we and others proposed a novel method to estimate copy number abundance on a single-array single-SNP basis, referred to as the probe intensity composition representation (PICR).28 This method models the cross-hybridization between DNA sequences via their physical binding affinities. It has shown great potential for differentiating copy number signals from background noise. In this article, we propose to extend the PICR method with hidden Markov modeling for copy number inference, referred to as the PICR-CNV.
The estimated copy number abundance at each SNP locus from PICR is first standardized to achieve parity between multiple samples, and an HMM is then applied to the standardized values. Our method has two major advantages: 1) By estimating the copy number abundance through PICR, we expect a reduction of background noise in the intensity values,28–30 which should in turn boost the performance of the HMM. 2) Our method does not require between-sample array normalization, which maintains the data integrity and the independence of individual samples. The proposed method is compatible with Affymetrix high density SNP arrays for detection of CNVs.

Methods

This section is organized as follows: We first describe the design of the Affymetrix 500K SNP array. Then we briefly review the estimation of copy number abundance for each single array by using the recently established PICR model.28 We introduce the multi-array standardization of the copy number abundance to achieve equal footing between individuals. Finally, we explain the PICR-CNV, which applies an HMM to integrate multiple SNPs for copy number inference.

Design of Affymetrix 500K SNP array

Oligonucleotide microarrays interrogate each SNP using a set of 24 probes, each a 25-mer nucleic acid sequence that is photolithographically synthesized and immobilized on the array. The target sequences are labeled with 3′-fluorescent dye before hybridization to the array, and their abundance is often measured by the fluorescent intensity on the array after hybridization.31–33 In a 500K SNP array, six quartets are adopted to interrogate a single dimorphic SNP site, with its possible alleles commonly denoted as A and B. Each quartet consists of four types of probes that are 25 base pairs in length. These probes are designed to either perfectly match (PM) the target sequence or mismatch (MM) it at a particular nucleotide site for each allele: perfect match A, mismatch A, perfect match B, and mismatch B, denoted, respectively, as PA, MA, PB, and MB for short. The probe sets are also designed to hybridize with either sense strands (s = 1) or antisense strands (s = –1). The quartets have different shifts (k) of the SNP nucleotide on the probe sequence from the center nucleotide of the probe (k = 0 at position 13 of the 25 base pairs); k may take the values –4, –3, –2, –1, 0, 1, 2, 3, 4 (see Fig. 1A of Matsuzaki et al.34 for a detailed illustration).

Figure 1

Distribution of the size of identified CNVs based on BRCA GWAS data.

Estimation for copy number abundance by PICR

The PICR method takes into account the cross-hybridization between DNA sequences via a positional-dependent nearest neighbor (PDNN) model.28 In PICR, the fluorescent intensity ($I$) of a particular probe set is decomposed into four terms: the baseline intensity ($b$), the products of the allelic copy number abundances ($N_A$, $N_B$) and the binding affinities between target and probe sequences with respect to the two alleles ($f_A$, $f_B$), and a measurement error ($\varepsilon$):

$$I = b + N_A f_A + N_B f_B + \varepsilon \qquad (1)$$

The binding affinities ($f_A$, $f_B$) are inherently determined by the physical properties of the DNA sequences. The allelic copy number abundance can then be estimated via a linear regression of the intensities on the binding affinities. Each probe set may be perfectly matched or mismatched to either allele as described above (PA, MA, PB, MB).
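The regression step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PICR implementation: the function names `solve_linear` and `fit_picr` are hypothetical, the binding affinities are made-up numbers (in PICR they come from the PDNN model), and ordinary least squares is assumed as the fitting method.

```python
def solve_linear(a, y):
    """Solve the square system a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_picr(intensities, f_a, f_b):
    """Least-squares fit of I = b + N_A * f_A + N_B * f_B; returns (b, N_A, N_B)."""
    rows = [[1.0, fa, fb] for fa, fb in zip(f_a, f_b)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, intensities)) for i in range(3)]
    b, n_a, n_b = solve_linear(xtx, xty)
    return b, n_a, n_b


# Hypothetical affinities for six probes of one SNP (illustrative values only).
f_a = [0.9, 0.1, 0.2, 0.05, 0.8, 0.15]
f_b = [0.2, 0.05, 0.9, 0.1, 0.1, 0.7]
# Noise-free intensities generated with b = 50, N_A = 100, N_B = 200.
intensities = [50.0 + 100.0 * fa + 200.0 * fb for fa, fb in zip(f_a, f_b)]
b_hat, n_a_hat, n_b_hat = fit_picr(intensities, f_a, f_b)
```

Because the example intensities contain no noise, the fit recovers the generating values; with real probe data the residuals would also provide the standard errors used for standardization.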

Multi-array equal footing by standardization

By using PICR, the allelic copy number abundance is estimated on a single-array single-SNP basis. Since all the raw fluorescence intensities are subject to experimental scales, which may vary among arrays, it is essential to achieve equal footing for multiple arrays before any further analysis. We propose to define a standardized copy number abundance (SCN) as

$$\mathrm{SCN}_{ij} = \frac{N_{A,ij} + N_{B,ij}}{\mathrm{se}(N_{A,ij} + N_{B,ij})} \qquad (2)$$

where $N_{A,ij}$ ($N_{B,ij}$) denotes the allelic copy number abundance for SNP j of subject i, and $\mathrm{se}(N_{A,ij} + N_{B,ij})$ denotes the standard deviation of $N_{A,ij} + N_{B,ij}$ estimated from the linear regression model of Equation (1). Assuming the raw intensities are normally distributed among probe sets, these standardized copy numbers are expected to have identical distributions for all subjects i = 1, …, N and all SNPs j = 1, …, K, and hence, are expected to be on the same scale.
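A minimal sketch of the standardization follows, assuming the regression supplies the variances and covariance of the two allelic estimates. The function name and argument names are hypothetical. The point of the sketch is the scale invariance: multiplying all intensities on an array by a constant multiplies the estimates and their standard error by the same constant, so the ratio is unchanged.

```python
import math


def standardized_copy_number(n_a, n_b, var_a, var_b, cov_ab):
    """Total copy number abundance divided by its estimated standard error.

    var_a, var_b and cov_ab are the variances and covariance of the two
    allelic estimates, assumed to come from the regression fit.
    """
    var_total = var_a + var_b + 2.0 * cov_ab  # Var(N_A + N_B)
    if var_total <= 0.0:
        raise ValueError("variance of N_A + N_B must be positive")
    return (n_a + n_b) / math.sqrt(var_total)


# The same SNP measured on two arrays whose intensity scales differ by a factor c.
c = 3.7
scn_array_1 = standardized_copy_number(1.2, 0.9, 0.04, 0.05, -0.01)
scn_array_2 = standardized_copy_number(c * 1.2, c * 0.9,
                                       c * c * 0.04, c * c * 0.05, c * c * -0.01)
```

Both calls give the same standardized value, which is the sense in which arrays are placed on an equal footing without between-array normalization.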

PICR-CNV: a hidden Markov model for copy number inference

Modeling strategy and copy number states

As illustrated by Equation (2), our objective is to detect total copy number changes among subjects. We assume that an interrogated locus covering an SNP may have five possible copy number states, with its total copy number ranging from 0 to 4 (Table 1). For simplicity, we also refer to the copy number at an interrogated locus as the copy number of the SNP locus in this article. Such copy number states are not observed directly, and hence, are latent. Following the same notation as existing methods,22,24 the inference of these hidden states is based on two types of observations, log R ratios (LRR) and B allele frequencies (BAF), which can be calculated from the estimated allelic copy number abundance. We first estimate the standardized copy number abundance for the jth SNP of subject i, and define its LRR as $R_{ij} = \log_2\left(\mathrm{SCN}_{ij} / \mathrm{SCN}^{\mathrm{ref}}_{j}\right)$, where $\mathrm{SCN}^{\mathrm{ref}}_{j}$ is the reference level for SNP j.

Table 1

Configuration of five possible copy number states.

STATE (Z)	COPY NUMBER	POSSIBLE GENOTYPES	EXPECTED LRR	EXPECTED BAF
1	0	− (Deletion)	log₂(0) = −∞	0
2	1	A; B	log₂(1/2) = −1	0; 1
3	2	AA; AB; BB	log₂(1) = 0	0; 0.5; 1
4	3	AAA; AAB; ABB; BBB	log₂(3/2) = 0.585	0; 0.33; 0.67; 1
5	4	AAAA; AAAB; AABB; ABBB; BBBB	log₂(2) = 1	0; 0.25; 0.5; 0.75; 1

The SCN estimates among controls are regarded as a reference level for each SNP locus. We further define the BAF as where and a, b are the corresponding thresholds for accurate genotyping of SNP j with the PICR. Similar to a few previous studies, an HMM is adopted to integrate LRR and BAF for copy number inference.22,24,25 Our method differs from the existing ones by using standardized copy number abundance to calculate corresponding LRR and BAF rather than the probe intensities.
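As a rough illustration of the LRR calculation, the sketch below takes the reference level for an SNP to be the median SCN among controls. The choice of the median is an assumption made for this example (the text only states that the control estimates serve as the reference), and the function name is hypothetical.

```python
import math
import statistics


def log_r_ratio(scn, control_scns):
    """log2 ratio of one subject's SCN to a control-based reference level.

    The reference is taken as the median SCN among controls, which is an
    assumption of this sketch.
    """
    reference = statistics.median(control_scns)
    if scn <= 0.0 or reference <= 0.0:
        raise ValueError("SCN values must be positive to take a log ratio")
    return math.log2(scn / reference)


# A subject whose abundance is 1.5 times the control reference (three copies
# against a two-copy baseline) and one that matches the reference.
lrr_gain = log_r_ratio(3.0, [1.9, 2.0, 2.1])
lrr_normal = log_r_ratio(2.0, [1.9, 2.0, 2.1])
```

With these inputs the gain corresponds to the expected LRR of log₂(3/2) listed in Table 1 for three copies, and the matching subject gives an LRR of zero.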

Transition probability for the hidden copy number states

We assume that the copy number states at SNP loci follow a time-dependent continuous Markov process, with the genomic position of SNPs as “time”. The transition probability is dependent on the distance between SNPs. Let $z_{ij}$ be the underlying copy number state for the jth SNP of subject i, and let $d_{jj'}$ be the physical distance between SNP j and SNP j′ on the chromosome based on the reference genome. The transition probability between the copy number states of SNPs j and j′ is defined in terms of $p_{ss}(d_{jj'})$, the probability for a hidden state s at SNP j to stay at the same state at SNP j′ over a distance of $d_{jj'}$, which is modeled by an exponential distribution with parameter $1/\lambda_s$. Therefore, $\lambda_s$ has the interpretation of the expected “time” (distance) for which the copy number remains at a particular state s. The longer the distance, the less likely the copy number states will remain the same. Similar modeling strategies have been commonly adopted in previous studies.22,24
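One way to read the distance-dependent transition model is sketched below. The staying probability is taken as the exponential survival function exp(−d/λ_s), which follows from an exponential waiting time with mean λ_s. How the remaining probability is divided among the other states is not recoverable from the text, so the sketch simply splits it evenly; that split and the function name are assumptions.

```python
import math


def transition_matrix(distance, lambdas):
    """Distance-dependent transition matrix for the hidden copy number states.

    lambdas[s] is the expected length (in base pairs) of a segment in state s.
    Staying probability: exp(-distance / lambdas[s]).
    Leaving probability: split evenly over the other states (an assumption).
    """
    n = len(lambdas)
    matrix = []
    for s, lam in enumerate(lambdas):
        stay = math.exp(-distance / lam)
        leave = (1.0 - stay) / (n - 1)
        matrix.append([stay if s2 == s else leave for s2 in range(n)])
    return matrix


# Expected lengths as in the simulation study: 50K for the two-copy state
# (state 3, index 2) and 5K for the other four states.
lambdas = [5000.0, 5000.0, 50000.0, 5000.0, 5000.0]
near = transition_matrix(100.0, lambdas)
far = transition_matrix(5000.0, lambdas)
```

Comparing `near` and `far` shows the behavior described in the text: the diagonal entries shrink as the distance between SNPs grows, and the two-copy state, with the largest expected length, is the most persistent.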

Emission probability for the observations

Since the copy number states are not observed directly, a set of emission probabilities is used to model the distribution of the observed variables (LRR and BAF) given the copy number states at SNP loci. Similar to a few previous studies, we modeled LRR and BAF by mixture distributions.22,24,25 Denote $z_{ij}$, $R_{ij}$, and $B_{ij}$ as the underlying copy number state, LRR, and BAF for the jth SNP of subject i. We first assume that the LRR and BAF at a particular SNP locus are conditionally independent given its underlying copy number state, so that

$$P(R_{ij}, B_{ij} \mid z_{ij}) = P(R_{ij} \mid z_{ij})\, P(B_{ij} \mid z_{ij}).$$

Further, the emission probability of LRR is modeled with the mixture of a uniform distribution and a normal distribution as

$$P(R_{ij} \mid z_{ij} = s) = \frac{\pi_R}{R_{\max} - R_{\min}} + (1 - \pi_R)\, f(R_{ij}; \mu_{R,s}, \sigma_{R,s}),$$

where $f(\cdot; \mu, \sigma)$ denotes the probability density function for a normal distribution with mean µ and variance σ². Here, we assume that the genotyping may fail with a small probability of $\pi_R$. In such a case, LRR is observed as background noise, which follows a uniform distribution between its possible minimum ($R_{\min}$) and maximum ($R_{\max}$) values. Otherwise, it follows a normal distribution with mean $\mu_{R,s}$ and variance $\sigma_{R,s}^2$ with respect to its underlying copy number state. As illustrated by Table 1, the expected mean and the variance of LRR observations vary by the underlying copy number states. Similarly, the expected values of BAF also vary by the underlying copy number states and the underlying genotypes (Table 1). We model the emission probability of BAF at a particular SNP locus with the mixture of a uniform distribution and normal or truncated normal distributions, in which $\Phi(\cdot; \mu, \sigma)$ denotes the cumulative distribution function for a normal distribution with mean µ and variance σ²; $G_s$ denotes the total number of all possible genotypes at a SNP locus with copy number state s; and $\mu_{B,s,g}$ and $\sigma_{B,s,g}$ are the mean and standard deviation of BAF for a SNP locus with copy number state s and genotype g (Table 1).
Further, $\psi_{s,g}$ denotes the prior probability of BAF for copy number state s and genotype g, which can be calculated by a binomial distribution based on the B allele frequency in the population (bpf).22,24,25 For example, an SNP with genotype AAB has copy number 3 and an expected BAF of 1/3. The prior probability of the BAF can be calculated as $\binom{3}{1}\,\mathrm{bpf}\,(1 - \mathrm{bpf})^{2} = 3\,\mathrm{bpf}\,(1 - \mathrm{bpf})^{2}$.
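The LRR mixture density and the binomial genotype priors can be sketched as follows. The function names are hypothetical, and the bounds `r_min` and `r_max` of the uniform noise component are placeholder values chosen for the example. The default failure probability of 0.01 is the value stated in the parameter estimation section.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lrr_emission(r, mu, sigma, pi_fail=0.01, r_min=-4.0, r_max=4.0):
    """Mixture of uniform background noise and a state-specific normal density.

    r_min and r_max are hypothetical bounds for the observable LRR range.
    """
    uniform = 1.0 / (r_max - r_min) if r_min <= r <= r_max else 0.0
    return pi_fail * uniform + (1.0 - pi_fail) * normal_pdf(r, mu, sigma)


def genotype_priors(copy_number, bpf):
    """Return (expected BAF, prior) for each genotype, indexed by the number of B alleles."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    priors = []
    for k in range(copy_number + 1):
        prior = math.comb(copy_number, k) * bpf ** k * (1.0 - bpf) ** (copy_number - k)
        priors.append((k / copy_number, prior))
    return priors


# Three-copy state with a population B allele frequency of 0.5:
# index 1 is genotype AAB, with expected BAF 1/3.
three_copy = genotype_priors(3, 0.5)
```

The uniform component keeps the emission density away from zero for outlying LRR values, so a single failed measurement does not by itself force a state change along the chromosome.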

Parameter estimation and copy number inference

In practice, we assume $\pi_R = \pi_B = 0.01$ as the empirical error rate for genotyping, and $\lambda_s$, 1 ≤ s ≤ 5, are predetermined to account for the size of copy number variants. The set of parameters that need to be estimated includes
Ω = {
ω(s) = p(z = s), the starting probability; s = 1, 2, 3, 4, 5
P = ($p_{ss'}$), the transition probability; 1 ≤ s, s′ ≤ 5
$\mu_{R,s}$, the mean of R; s = 1, 2, 3, 4, 5
$\sigma_{R,s}$, the standard deviation of R; s = 1, 2, 3, 4, 5
$\mu_{B,s,g}$, the mean of B; s = 1, 2, 3, 4, 5; g = 1, 2, …, G
$\sigma_{B,s,g}$, the standard deviation of B; s = 1, 2, 3, 4, 5; g = 1, 2, …, G
}
The parameters in Ω are optimized by using a forward-backward algorithm, also known as the Baum–Welch algorithm.35 After the parameter estimation, the inference of copy number states is carried out by the Viterbi algorithm.36 These computational algorithms have been commonly used in previous studies and are not detailed here.
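For readers unfamiliar with the decoding step, the following is a generic log-space Viterbi sketch that allows the transition matrix to change from one SNP to the next, as a distance-dependent model requires. It is a textbook formulation rather than the authors' code. The callable arguments `log_trans` and `log_emit` and the two-state toy data are hypothetical.

```python
import math


def viterbi(n_obs, n_states, log_start, log_trans, log_emit):
    """Most likely hidden state path, computed in log space.

    log_start[s]       : log starting probability of state s
    log_trans(t)[s][u] : log probability of moving from s at t-1 to u at t
    log_emit(t, s)     : log emission density of observation t under state s
    """
    score = [log_start[s] + log_emit(0, s) for s in range(n_states)]
    back = []
    for t in range(1, n_obs):
        lt = log_trans(t)
        new_score, pointers = [], []
        for u in range(n_states):
            best = max(range(n_states), key=lambda s: score[s] + lt[s][u])
            pointers.append(best)
            new_score.append(score[best] + lt[best][u] + log_emit(t, u))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: state 0 has mean LRR 0 (two copies), state 1 has mean LRR -1 (one copy).
obs = [0.0, 0.1, -0.9, -1.1, -1.0, 0.05]
means = [0.0, -1.0]
log_t = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
path = viterbi(
    len(obs), 2, [math.log(0.5), math.log(0.5)],
    lambda t: log_t,  # constant here; a distance-dependent matrix could be returned instead
    lambda t, s: -0.5 * ((obs[t] - means[s]) / 0.2) ** 2,
)
```

Working in log space avoids numerical underflow when products of many small probabilities are taken over thousands of SNPs.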

Results

Simulation study

In the simulation study, we simulated a segment of the genome with a length of $10^6$ base pairs. We first assumed 10K SNPs with their physical positions uniformly distributed in the genome. Each SNP was simulated for its underlying copy number state, and the observed probe intensities were measured by LRR and BAF. PICR-CNV was then applied to infer the underlying copy number states. In the simulation, the expected lengths of the copy number states were set at $\lambda_3$ = 50K for a normal copy number of two copies, and $\lambda_s$ = 5K, s = 1, 2, 4, 5, for the other copy number states. The transition probabilities between copy number states were set to fixed values. The parameters for the emission probability of LRR were set, for states 1 to 5, as µ = (log₂(1/10), log₂(1/2), log₂(1), log₂(3/2), log₂(2)) and σ = (log₂(1/10), log₂(1/2), log₂(1), log₂(3/2), log₂(2)). The parameters for the emission probability of BAF were likewise set to fixed values, and the observation of B was further truncated at 0 and 1. We simulated 100 subjects by using the above model parameters. For each subject, the underlying copy number states and genotypes of the 10K SNPs were first simulated in sequential order according to the transition probabilities. The frequencies of allele B in the population followed a uniform distribution on [0.1, 0.9]. For each SNP, the observations of LRR (R) and BAF (B) were then simulated by using the emission probability according to its underlying copy number state and genotype. Two subjects were randomly selected to estimate the parameters by using the Baum–Welch algorithm. The estimated parameters were then used to infer the underlying copy number states for all subjects by using the Viterbi algorithm. For computational reasons, convergence was declared when the sum of the absolute changes in all parameters was less than $10^{-3}$. We calculated the error rates for the inferred copy number states of all SNPs in all subjects.
Because the expected lengths of the copy number variants (λ) were predetermined and may have an impact on the performance of the inference, we also examined the error rates when they were incorrectly specified. The simulation results are summarized in Table 2. It is seen that the proposed method was accurate for inferring the underlying copy number states when λ was correctly specified. The overall error rate for all SNPs is 1.34e–04. When λ was incorrectly specified, the error rate increased with the level of mis-specification. In our simulation, we found that the error rate was not seriously inflated with an up to 10-fold overspecification of λ. It was also noted that the error rate for SNPs with normal states of two copies decreased by the level of overspecification of λ. This was because the normal states of two copies had the largest expected length, and an SNP was more likely to be inferred as two copies when λ was large. On the other hand, the error rate for SNPs with normal state of two copies increased when λ was underspecified. Overall, the error rate was still properly controlled when λ was incorrectly specified.
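The error rates discussed above can be computed with a short helper such as the one below. It assumes that the per-state rate is the fraction of SNPs truly in a given state whose inferred state differs, which is one natural reading of Table 2. The function name and the toy inputs are hypothetical.

```python
from collections import Counter


def error_rates(true_states, inferred_states):
    """Per-state and overall misclassification rates.

    The per-state rate is conditional on the true state (an assumption
    about how the rates in the simulation are defined).
    """
    if not true_states or len(true_states) != len(inferred_states):
        raise ValueError("state sequences must be non-empty and of equal length")
    totals = Counter(true_states)
    errors = Counter(t for t, z in zip(true_states, inferred_states) if t != z)
    per_state = {s: errors[s] / totals[s] for s in sorted(totals)}
    overall = sum(errors.values()) / len(true_states)
    return per_state, overall


# Toy sequences: one SNP truly in state 2 is called as state 3.
per_state, overall = error_rates([3, 3, 2, 3, 5], [3, 3, 3, 3, 5])
```

Reporting the rate separately for each state matters here because nearly 90% of the simulated SNPs are in the two-copy state, so the overall rate is dominated by that state.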

Table 2

Error rate for inference of copy number states with correctly and incorrectly specified expected length of copy number states.

AVERAGE NO. OF SNPs WITH EACH COPY NUMBER STATE IN EACH SUBJECT
HMM STATE	1	2	3	4	5	TOTAL
	557	163	8,875	185	220	10,000
λ USED IN HMM	ERROR RATES FOR COPY NUMBER STATE INFERENCE
λ_True	5.92e–04	1.53e–04	2.37e–05	1.40e–04	1.32e–03	1.34e–04
2λ_True (a)	3.97e–03	4.91e–04	1.69e–05	7.01e–04	4.46e–03	3.55e–04
5λ_True	4.18e–03	6.13e–04	1.80e–05	7.01e–04	4.51e–03	3.71e–04
10λ_True	4.38e–03	9.20e–04	1.80e–05	1.08e–03	4.87e–03	4.02e–04
0.5λ_True	9.69e–04	1.53e–04	3.27e–05	1.56e–04	1.32e–03	1.66e–04

Note:

(a) 2λ_True means that the λ specified in the model is twice the true λ.

Application to breast cancer data

We also applied the proposed method to study CNVs that are associated with breast cancer development, using recent GWAS data from a genetically isolated population of Ashkenazi Jews (AJ),37 in which all four grandparents of every participant were of Jewish and Eastern European ancestry. We limit our study to inherited genetic variation; potential somatic mutations are beyond the scope of the current study. The original study had three phases. The first phase included 249 breast cancer cases without BRCA1 and BRCA2 mutations, and 299 cancer-free AJ women as controls. The second phase was a replication study using 343 candidate SNPs among 950 AJ cases and 979 AJ controls. The third phase was also a GWAS, which included 243 AJ cases and 187 controls. The participants from phase I and phase III were genotyped with the Affymetrix 500K SNP array, while those from phase II were genotyped by the Illumina GoldenGate assay. We focused our analysis on the phase I and phase III data. It is also worthwhile to note that samples from phase I were genotyped by using a combination of a commercial version and an early access version of Affymetrix 500K SNP arrays. This mismatch of arrays poses an additional challenge to the application of existing methods that require between-array normalization. However, since PICR is a single-array method and does not require multiple-array algorithms, the application of PICR is straightforward as long as the raw fluorescent intensity values are valid. We used phase III as the initial study for the analysis. The proposed method was first applied to 10 randomly selected controls for parameter training. The initial genotype calling was conducted by PICR, and all parameters in Ω were optimized and then used to infer the copy number states among all participants. We first examined the distribution of the sizes of the identified CNVs (Fig. 1). The shape of this distribution is consistent with existing studies (Fig. 1 of Li et al.38).
For each SNP locus, we further conducted a Kolmogorov–Smirnov (KS) test to compare the inferred copy numbers between cases and controls. A region was selected as significant if three consecutive SNPs showed significant copy number differences at a level of 1e–07. After a region was selected, a global P-value was further calculated by conducting a KS test using the average copy number of the SNPs within the region. The results are summarized in Table 3. The findings included 34 genomic regions from 16 chromosomes. The region with the largest number of significant SNPs was 4q31.23. This region had 10 SNPs showing significant copy number differences between cases and controls. Besides region 4q31.23, two regions, 1p21.1 and 10q21.1, both have seven significant SNPs. Three regions have six SNPs with significant copy number differences (Table 3), namely 6q22.33, 6q27, and 11p12. These results indicate that copy number alterations on chromosomes 4, 6, 1, and 11 may have a significant impact on the development of breast cancer.
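The region selection rule can be illustrated with the sketch below, which scans per-SNP P-values (ordered by genomic position) for runs of consecutive significant SNPs. It interprets the rule as "at least three consecutive SNPs", which is an assumption. The function name and the example P-values are hypothetical, and the KS tests themselves are not reproduced here.

```python
def significant_regions(p_values, alpha=1e-7, min_run=3):
    """Return (start, end) index pairs of runs of consecutive SNPs with p < alpha.

    Only runs of at least min_run SNPs are kept. Indices refer to SNPs
    ordered by genomic position; end is inclusive.
    """
    regions = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(p_values) - start >= min_run:
        regions.append((start, len(p_values) - 1))
    return regions


# Hypothetical per-SNP P-values along one chromosome.
p_values = [1e-9, 1e-8, 1e-10, 0.5, 1e-9, 1e-9, 0.2, 1e-12, 1e-12, 1e-12, 1e-12]
regions = significant_regions(p_values)
```

In this toy input the run of two significant SNPs in the middle is discarded, while the runs of three and four SNPs are retained; each retained run would then receive a global P-value as described above.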

Table 3

Regions showing significant copy number variation in phase III data and their replication in phase I data.

CHR.	CYTOBAND	PHYSICAL LOCATION (a)	NO. OF SNPs	P-VALUE (PHASE III) (b)	P-VALUE (PHASE I) (c)
1	p21.1	102622376–102640646	7	2.62e–13	0.954
1	p12	120292824–120312909	3	7.62e–14	0.999
1	q22	154077091–154106555	3	2.453e–11	0.999
2	p21	45759616–45760637	3	1.106e–08	0.014
2	p12	81196767–81197522	3	7.232e–09	0.977
2	q21.1	131925407–131955270	3	4.872e–13	0.999
3	p14.3	57706175–57839689	3	1.228e–09	0.116
4	q26	117544365–117576957	3	4.577e–11	0.138
4	q31.23	148668320–148697327	10	9.43e–15	7.56e–05
4	q32.3	166885930–166957371	5	6.664e–11	0.189
5	q14.3	84350898–84398999	5	4.330e–14	0.720
5	q22.3	115145252–115178424	4	2.220e–16	0.893
6	q13	75247853–75311831	5	5.218e–15	0.034
6	q22.33	128476625–128533696	6	2.409e–13	0.806
6	q23.2	134651674–134672863	5	3.722e–10	0.999
6	q27	165234976–165247908	6	1.752e–09	0.996
7	q22.1	98318717–98361309	4	4.727e–11	0.103
7	q31.31	118754169–118754169	5	1e–17	0.524
8	q11.22	52786953–52796842	3	4.550e–10	0.840
8	q21.3	90963387–90964181	3	2.862e–08	0.772
8	q24.13	125649171–139914783	3	2.30e–08	0.973
8	q24.3	145891814–145948840	4	3.220e–15	7.96e–04
9	p21.3	22270796–22294230	5	6.249e–09	3.33e–03
10	q21.1	56853055–74432554	7	1.084e–09	0.998
11	p13	36306019–36366302	3	8.95e–11	0.223
11	p12	37905557–37916354	6	2.627e–09	0.968
11	q22.3	104741435–104806689	5	4.152e–14	0.999
12	q23.1	94977527–95052366	4	1.11e–16	1.07e–04
13	q13.3	34828145–34846106	4	8.975e–10	0.428
13	q14.3	51036156–51071687	4	5.268e–12	6.83e–09
13	q33.1	103334252–103344370	5	1.589e–09	0.964
14	q23.1	60136001–60140123	5	1.843e–12	0.996
18	p11.31	3597746–3635894	3	4.268e–10	0.417
X	q27.3	146596395–146646974	4	5.873e–14	0.086

Notes:

(a) Location based on Human Genome Assembly NCBI build 36.1.

(b) Phase III included 243 cases and 187 controls.

(c) Phase I included 249 cases and 299 controls.

We also applied the same procedure to the phase I data for replication. The results are also summarized in Table 3. Among the regions identified from phase III data, the copy number changes remained significant at five regions: 4q31.23, 6q13, 12q23.1, 13q14.3, and 2p21. These five regions contained 10, 5, 4, 4, and 3 SNPs, respectively.

Discussion and Conclusion

In this study, we have proposed an HMM-based method (PICR-CNV) for copy number inference. Through simulations, we have shown that the proposed method is highly accurate for copy number inference and robust against mis-specification of the predetermined model parameter. While it is not straightforward to evaluate the copy number inference with real data due to the unknown copy number status, we have evaluated the proposed standardization approach for genotyping accuracy. We applied PICR to 90 HapMap samples with Affymetrix Mapping 100K arrays, and found that the genotyping accuracies were improved by using standardized copy number abundance compared to using raw copy number abundance (99.70% vs 99.63%). Empirically, we also found that the standardized copy number abundance provided better genotype clustering than its alternative (Fig. 2). The proposed method was further illustrated with an application to breast cancer datasets. The analysis of breast cancer data also identified a few genomic regions that were significantly associated with breast cancer development. Most of these identified regions have been reported in the literature for potential involvement in breast cancer. One SNP in the region 4q31.23 has been recently reported to be significantly associated with breast cancer progression.39 A gene ARHGAP10-NR3C2, which was located in the region, was also known to be related to carcinogenesis through structure alteration.40 Possible copy number changes of the region were also observed from cancer cell line data.41 Regions 1p21.1 and 10q21.1 have also been reported repeatedly for potential association with breast cancer. 
Chromosome arm 1p has been suggested to contain multiple tumor suppressor genes.42 Structural alterations of 1p21.1 have been observed in many studies.42–45 Region 10q21.1 also harbors multiple candidate tumor suppressors, such as ANX7 and CDC2.46,47 Interestingly, region 6q22.33 was identified by the initial GWAS as a novel locus for breast cancer development.37 Our analysis confirmed this finding and suggested that copy number changes in the region may also play an important role.

Figure 2

Raw and standardized copy number abundance for a randomly selected HapMap sample (NA12892).

The associations of several identified regions, including 4q31.23, 12q23.1, 13q14.3, and 2p21, were also replicated by using an independent dataset. The region 4q31.23 was identified in phase III as the one with the largest number of significant SNPs. The long arm of chromosome 6 has been reported to be frequently rearranged in human cancers.48–50 The region 6q13 was among the important regions that showed copy number alterations.51,52 For region 12q23.1, the gene SLC5A8 was identified by a previous study as being frequently affected by structural changes.53,54 This gene is actively involved in the gene pathway for the development of primary human tumors.55,56 The region 13q14.3 has been reported for copy number changes in various cancers, such as prostate cancer and breast cancer.57–60 Structural changes of 2p12 were also suggested to be involved in cancer development.61 While it is biologically plausible that structural changes in these regions may play an important role in the development of breast cancer, additional studies are needed to further replicate the associations and verify the biological functions and mechanisms. We are also aware that our method has a few limitations. First, our copy number estimation method is based on the design of Affymetrix 500K SNP arrays. Further extension will be needed before applying it to the Illumina platform or Affymetrix 6.0 arrays. Our current study is a secondary analysis of an existing GWAS dataset, extending the previous genotype-based association study to a copy-number-based association study. The Affymetrix SNP array has been a major platform for SNP genotyping and copy number estimation. It was adopted by the Wellcome Trust Case Control Consortium (WTCCC) for an intensive GWAS of 14,000 cases of seven common diseases and 3,000 shared controls.62 Second, the current study only considered the total copy number changes at each locus.
However, copy number changes may still occur without changes in the total copy number, such as a balanced copy number with preferential loss of heterozygosity (LOH). Further extensions are needed to account for such copy number changes. Third, our method currently focuses on detecting total copy number changes in an unrelated population. Detecting copy number status for related individuals, or paternal (maternal) specific copy numbers, is beyond the scope of the current study.

60 in total

1. Structure of neurolysin reveals a deep channel that limits substrate access.

Authors: C K Brown; K Madauss; W Lian; M R Beck; W D Tolbert; D W Rodgers
Journal: Proc Natl Acad Sci U S A Date: 2001-03-06 Impact factor: 11.205

2. The evolutionary chromosome translocation 4;19 in Gorilla gorilla is associated with microduplication of the chromosome fragment syntenic to sequences surrounding the human proximal CMT1A-REP.

Authors: P Stankiewicz; S S Park; K Inoue; J R Lupski
Journal: Genome Res Date: 2001-07 Impact factor: 9.043

3. Genome-wide association study provides evidence for a breast cancer risk locus at 6q22.33.

Authors: Bert Gold; Tomas Kirchhoff; Stefan Stefanov; James Lautenberger; Agnes Viale; Judy Garber; Eitan Friedman; Steven Narod; Adam B Olshen; Peter Gregersen; Kristi Kosarin; Adam Olsh; Julie Bergeron; Nathan A Ellis; Robert J Klein; Andrew G Clark; Larry Norton; Michael Dean; Jeff Boyd; Kenneth Offit
Journal: Proc Natl Acad Sci U S A Date: 2008-03-07 Impact factor: 11.205

4. ANX7, a candidate tumor suppressor gene for prostate cancer.

Authors: M Srivastava; L Bubendorf; V Srikantan; L Fossom; L Nolan; M Glasman; X Leighton; W Fehrle; S Pittaluga; M Raffeld; P Koivisto; N Willi; T C Gasser; J Kononen; G Sauter; O P Kallioniemi; S Srivastava; H B Pollard
Journal: Proc Natl Acad Sci U S A Date: 2001-04-03 Impact factor: 11.205

5. Loss of heterozygosity at chromosome 6q in preinvasive and early invasive breast carcinomas.

Authors: S A Chappell; T Walsh; R A Walker; J A Shaw
Journal: Br J Cancer Date: 1997 Impact factor: 7.640

6. Evidence of chromosome regions and gene involvement in inflammatory breast cancer.

Authors: Florence Lerebours; Philippe Bertheau; Ivan Bieche; Keltouma Driouch; Hugues De The; Kamel Hacene; Marc Espie; Michel Marty; Rosette Lidereau
Journal: Int J Cancer Date: 2002-12-20 Impact factor: 7.396

7. Effects of atmospheric ozone on microarray data quality.

Authors: Thomas L Fare; Ernest M Coffey; Hongyue Dai; Yudong D He; Deborah A Kessler; Kristopher A Kilian; John E Koch; Eric LeProust; Matthew J Marton; Michael R Meyer; Roland B Stoughton; George Y Tokiwa; Yanqun Wang
Journal: Anal Chem Date: 2003-09-01 Impact factor: 6.986

8. Genome-wide association study identifies novel breast cancer susceptibility loci.

Authors: Douglas F Easton; Karen A Pooley; Alison M Dunning; Paul D P Pharoah; Deborah Thompson; Dennis G Ballinger; Jeffery P Struewing; Jonathan Morrison; Helen Field; Robert Luben; Nicholas Wareham; Shahana Ahmed; Catherine S Healey; Richard Bowman; Kerstin B Meyer; Christopher A Haiman; Laurence K Kolonel; Brian E Henderson; Loic Le Marchand; Paul Brennan; Suleeporn Sangrajrang; Valerie Gaborieau; Fabrice Odefrey; Chen-Yang Shen; Pei-Ei Wu; Hui-Chun Wang; Diana Eccles; D Gareth Evans; Julian Peto; Olivia Fletcher; Nichola Johnson; Sheila Seal; Michael R Stratton; Nazneen Rahman; Georgia Chenevix-Trench; Stig E Bojesen; Børge G Nordestgaard; Christen K Axelsson; Montserrat Garcia-Closas; Louise Brinton; Stephen Chanock; Jolanta Lissowska; Beata Peplonska; Heli Nevanlinna; Rainer Fagerholm; Hannaleena Eerola; Daehee Kang; Keun-Young Yoo; Dong-Young Noh; Sei-Hyun Ahn; David J Hunter; Susan E Hankinson; David G Cox; Per Hall; Sara Wedren; Jianjun Liu; Yen-Ling Low; Natalia Bogdanova; Peter Schürmann; Thilo Dörk; Rob A E M Tollenaar; Catharina E Jacobi; Peter Devilee; Jan G M Klijn; Alice J Sigurdson; Michele M Doody; Bruce H Alexander; Jinghui Zhang; Angela Cox; Ian W Brock; Gordon MacPherson; Malcolm W R Reed; Fergus J Couch; Ellen L Goode; Janet E Olson; Hanne Meijers-Heijboer; Ans van den Ouweland; André Uitterlinden; Fernando Rivadeneira; Roger L Milne; Gloria Ribas; Anna Gonzalez-Neira; Javier Benitez; John L Hopper; Margaret McCredie; Melissa Southey; Graham G Giles; Chris Schroen; Christina Justenhoven; Hiltrud Brauch; Ute Hamann; Yon-Dschun Ko; Amanda B Spurdle; Jonathan Beesley; Xiaoqing Chen; Arto Mannermaa; Veli-Matti Kosma; Vesa Kataja; Jaana Hartikainen; Nicholas E Day; David R Cox; Bruce A J Ponder
Journal: Nature Date: 2007-06-28 Impact factor: 49.962

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data.

Authors: Stefano Colella; Christopher Yau; Jennifer M Taylor; Ghazala Mirza; Helen Butler; Penny Clouston; Anne S Bassett; Anneke Seller; Christopher C Holmes; Jiannis Ragoussis
Journal: Nucleic Acids Res Date: 2007-03-06 Impact factor: 16.971