Carles Hernandez-Ferrer, Ines Quintela Garcia, Katharina Danielski, Ángel Carracedo, Luis A Pérez-Jurado, Juan R González.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Other structural variants, such as copy number variants and DNA inversions, either germ-line or in mosaicism events, are now being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) for structural variant studies.
Year: 2015 PMID: 25991004 PMCID: PMC4438530 DOI: 10.1186/s12859-015-0608-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Schema of the application of affy2sv to analyze CytoScan data. Part A of the figure shows the workflows available in the R package affy2sv. Each workflow consists of two steps: generating intermediate files and generating a specific output. The first step is done with the function Cyto2APT. The second step is done with the functions Cyto2Mad and Cyto2SnpMatrix. Part B of the figure shows the pipelines used to perform the two studies detailed in the article. The two CytoScan HD populations were pre-processed with affy2sv and then analyzed using different tools
Results of analyzing Dataset A with affy2sv and snpStats
| Name (Affy) | Name (dbSNP) | CHR | Position | P value | MAF (Nijmegen) | MAF (Toronto) |
|---|---|---|---|---|---|---|
| S-3KHLT | rs2445906 | 8 | 87901 | 7.995824e-10 | 0.4402516 | 0.35731132 |
| S-3FEKM | rs12429439 | 13 | 516673 | 7.957754e-08 | 0.2156250 | 0.09953162 |
| S-3TIFM | rs62459010 | 7 | 688240 | 1.007604e-07 | 0.1957831 | 0.08313253 |
| S-3QSBZ | rs4243640 | 14 | 650233 | 1.317467e-07 | 0.4108280 | 0.25768322 |
| S-3XDND | --- | 1 | 510137 | 2.857901e-07 | 0.1027778 | 0.03154206 |
| S-4FNLK | rs4239595 | 19 | 547436 | 3.786156e-07 | 0.2225806 | 0.10352941 |
| S-4LWCG | rs60081206 | 1 | 483264 | 3.804457e-07 | 0.1655844 | 0.06721698 |
| S-4HMDR | rs10868728 | 9 | 752489 | 3.880796e-07 | 0.4024390 | 0.25817757 |
| S-3FSKC | rs12402205 | 1 | 510072 | 5.298163e-07 | 0.1027778 | 0.03271028 |
| S-3KMMG | --- | 1 | 510066 | 5.298163e-07 | 0.1027778 | 0.03271028 |
Top 10 significant SNPs obtained from the GWAS Toronto vs. Nijmegen (Dataset A) using a complete set of 429 .CEL files from Affymetrix CytoScan HD
Fig. 2. Manhattan plot comparing the two populations in Dataset A. Result of the GWAS comparing the general population from Nijmegen versus Toronto (Dataset A), performed with affy2sv and snpStats. It shows the −log10 of the p-value assigned to each SNP on chromosomes 1 to X
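Manhattan plots conventionally place −log10(p) on the y-axis, so that smaller p-values appear as taller points. The following is a minimal sketch of that transform, applied to three p-values copied from the top-10 table above. The helper name `neg_log10` is illustrative and is not part of affy2sv or snpStats.

```python
import math

# P values copied from the top-10 table for Dataset A.
p_values = {
    "rs2445906": 7.995824e-10,
    "rs12429439": 7.957754e-08,
    "rs62459010": 1.007604e-07,
}


def neg_log10(p: float) -> float:
    """Return -log10(p), the usual y-axis value of a Manhattan plot."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log10(p)


# Hypothetical helper output: one score per SNP, larger means more significant.
scores = {snp: neg_log10(p) for snp, p in p_values.items()}
for snp, score in scores.items():
    print(f"{snp}: {score:.2f}")
```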
Results of analyzing Dataset A with affy2sv and MAD
| IniProbe | EndProbe | LenProbe | CHR | LRR | Bdev | State | Sample | Pop |
|---|---|---|---|---|---|---|---|---|
| 219677 | 33114837 | 279 | 8 | −0.14 | 0.088 | 2 | CyHD_022112T_SS199_400554WB | T |
| 43717666 | 78010194 | 285 | 18 | 0.15 | 0.07 | 3 | CyHD_022112T_SS199_400554WB | T |
| 20520198 | 107105043 | 1140 | 14 | 0 | 0.252 | 1 | N_Blood_control99 | N |
Mosaic events detected by MAD (T = 7, MinSegLen = 100) on the 627 .CEL files from Affymetrix CytoScan HD corresponding to the two general populations of Nijmegen and Toronto in Dataset A. IniProbe and EndProbe place the mosaic event on the chromosome given by the column CHR. LenProbe gives the number of probes in the region detected as a mosaic event. LRR and Bdev are the measures used to detect the mosaic event and to make a preliminary attempt to classify it. State shows the result of this classification (1 = uniparental disomy (UPD), 2 = deletion, 3 = duplication, 4 = trisomy and 5 = loss of heterozygosity (LOH)). Sample gives the sample in which the mosaic event was found, and Pop gives the population the sample belongs to, Toronto (T) or Nijmegen (N)
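The State codes described in the caption can be decoded with a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical and do not come from MAD or affy2sv, and the three records are the (CHR, State) pairs from the table above.

```python
# State codes as listed in the table caption.
MAD_STATES = {
    1: "UPD",
    2: "Deletion",
    3: "Duplication",
    4: "Trisomy",
    5: "LOH",
}

# (chromosome, state) pairs taken from the three rows of the MAD table.
events = [("8", 2), ("18", 3), ("14", 1)]


def decode_state(code: int) -> str:
    """Translate a numeric MAD state into its label; fail on unknown codes."""
    try:
        return MAD_STATES[code]
    except KeyError:
        raise ValueError(f"unknown MAD state code: {code}") from None


labels = [decode_state(state) for _, state in events]
for (chrom, _), label in zip(events, labels):
    print(f"chr{chrom}: {label}")
```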
Fig. 3. Two of the three mosaic events detected by MAD in Dataset A after pre-processing with affy2sv. Each plot represents the whole chromosome where the mosaic event is located. The black dots show the value of the LRR for each SNP, while the red points show the value of the BAF: at the top are those corresponding to the AA genotype (with a value close to 1), in the middle those corresponding to the AB genotype (with a value around 0.5) and at the bottom those corresponding to the BB genotype (with a value close to 0). Part A shows a 33 Mb mosaic deletion at terminal 8p. Part B shows a 25 Mb mosaic duplication at terminal 18q. The presence of both events in the same sample (from the Toronto general population) indicates that the individual carries an unbalanced chromosomal translocation (8p; 18q) in a proportion of cells
Results of analyzing Dataset A with affy2sv and R-GADA
| IniProbe | EndProbe | LenProbe | MeanAmp | CHR | State | Sample |
|---|---|---|---|---|---|---|
| 66690197 | 71078462 | 183 | 0.953 | 3 | −1 | 1F549 |
| 94236184 | 117023549 | 447 | 0.237 | X | −1 | 1J014 |
| 52942 | 15383670 | 602 | 0.086 | 17 | −1 | 1J567 |
| 17309881 | 21217575 | 121 | 0.132 | 22 | 1 | 1K397 |
| 17309881 | 21421319 | 127 | 0.377 | 22 | −1 | 2A419 |
| 143559 | 15049329 | 495 | 0.890 | 18 | −1 | 2B595 |
| 17309881 | 21364849 | 124 | 0.185 | 22 | 1 | 2D325 |
| 65997819 | 69181942 | 80 | 0.049 | 4 | −1 | 2F584 |
| 22759438 | 32409066 | 444 | 0.130 | 15 | −1 | 2G029 |
| 17309881 | 20192331 | 109 | 0.362 | 22 | −1 | 2G159 |
| 83885323 | 86767689 | 117 | 0.265 | 7 | 1 | 2H598 |
| 4723882 | 27966028 | 709 | 1.055 | 5 | −1 | 2L046 |
| 143559 | 11602053 | 416 | 0.072 | 18 | −1 | 2L217 |
| 22759438 | 32409066 | 444 | 0.172 | 15 | −1 | 3A913 |
| 12585825 | 23193309 | 337 | 0.837 | 8 | −1 | 3B558 |
| 218476969 | 249191732 | 1144 | 0.070 | 1 | −1 | 3C103 |
| 144131822 | 159100528 | 652 | 0.971 | 7 | −1 | 8D582 |
| 134476 | 18433821 | 591 | 0.466 | 20 | 1 | 8D582 |
| 15529890 | 21862551 | 208 | 0.367 | 8 | 1 | P609 |
CNVs detected by R-GADA (T = 7, MinSegLen = 100) on the 315 .CEL files from Affymetrix CytoScan HD corresponding to the population diagnosed with intellectual disability (ID) in Dataset B. The table is an export of the object created by R-GADA. The columns IniProbe and EndProbe place the CNV on the chromosome given by CHR, and LenProbe gives the number of probes contained in the region detected as a CNV event. The column Sample shows the name of the sample containing the CNV. The value given by MeanAmp is used to try to classify the event as a gain or a loss, and the result of this classification is shown in State (1: gain; −1: loss)
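As a quick check on the table, the State column can be tallied into gains and losses. This is a minimal sketch: the list below was transcribed by hand from the State column of the R-GADA table above, in row order, and the variable names are illustrative rather than part of R-GADA.

```python
from collections import Counter

# State column of the R-GADA table, in row order (1 = gain, -1 = loss).
states = [-1, -1, -1, 1, -1, -1, 1, -1, -1, -1,
          1, -1, -1, -1, -1, -1, -1, 1, 1]

# Labels follow the caption: 1 is a gain, -1 is a loss.
STATE_LABELS = {1: "gain", -1: "loss"}

counts = Counter(STATE_LABELS[s] for s in states)
print(f"gains: {counts['gain']}, losses: {counts['loss']}")
```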
Fig. 4. Two CNV events found with R-GADA in the Dataset B population (diagnosed with intellectual disability), and calling of the 8p23.1 inversion in the same population with invClust. Part A shows a subject from Dataset B with an interstitial gain on chromosome 7q. The black dots show the value of the LRR for each SNP, while the red points show the value of the BAF: at the top are those corresponding to the AA genotype (with a value close to 1), in the middle those corresponding to the AB genotype (with a value around 0.5) and at the bottom those corresponding to the BB genotype (with a value close to 0). Part B shows another subject from Dataset B with an interstitial loss on chromosome 8p. Part C shows the calling of the 8p23.1 inversion in the input dataset. The three groups correspond to the three genotypes: the blue points are the individuals that carry the inversion on both alleles, the green points are the heterozygous individuals and the red points are the individuals without the inversion. The plot is obtained after performing an MDS reduction over the population. The 2-D density curves indicate the probability of belonging to each genotype, with the innermost contours corresponding to the highest probability
Fig. 5. The three types of plots affy2sv can draw on CytoScan samples to perform a visual QC. Plot A shows the log2 of the intensities of both alleles for a single SNP across the whole population. In this case, a random probe (A-4DTYM) was selected and drawn across the population diagnosed with intellectual disability (Dataset B). Plot B shows the log2 of the intensity of both alleles for all the probes in a random subject (3C136, from Dataset B). Plot C draws the strength and the contrast of all the probes for a random individual (3C136), where strength is log(A + B) and contrast is (A − B)/(A + B)
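The strength and contrast definitions in the caption of Fig. 5 can be sketched as follows. This is not the affy2sv implementation: the function name and the example intensities are hypothetical, and the natural logarithm is assumed because the caption does not state the base.

```python
import math


def strength_contrast(a: float, b: float) -> tuple[float, float]:
    """Return (strength, contrast) for allele intensities A and B.

    strength = log(A + B)        (natural log assumed here)
    contrast = (A - B) / (A + B) (ranges from -1 to 1)
    """
    total = a + b
    if a < 0 or b < 0 or total <= 0:
        raise ValueError("intensities must be non-negative with a positive sum")
    return math.log(total), (a - b) / total


# Made-up intensities for three probes, for illustration only.
probes = {"mostly_A": (3.0, 1.0), "balanced": (2.0, 2.0), "mostly_B": (1.0, 3.0)}
for name, (a, b) in probes.items():
    strength, contrast = strength_contrast(a, b)
    print(f"{name}: strength={strength:.3f}, contrast={contrast:+.2f}")
```

With equal intensities the contrast is 0, and it approaches +1 or −1 as one allele dominates, which is why the three genotype clusters separate along the contrast axis.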