Carlos Ruiz-Arenas, Alejandro Cáceres, Marcos López-Sánchez, Ignacio Tolosana, Luis Pérez-Jurado, Juan R González.
Abstract
Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs), overcoming important limitations of current methods and outperforming them in accuracy and applicability. scoreInvHap calls individual inversion-genotypes from a similarity score to the SNPs of experimentally validated references. It can be used on different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adaptable to genotype new inversions, either in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to discover their role in complex human traits, and illustrate a first genome-wide association study of experimentally validated human inversions. scoreInvHap is implemented in R and is freely available from Bioconductor.
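The idea of calling an inversion genotype from a similarity score to reference SNPs can be sketched as follows. This is a simplified illustration in Python, not the scoreInvHap implementation or its API: the reference table, the SNP names, the frequencies and the scoring rule (the mean reference frequency of the observed SNP genotypes) are all assumptions made for the example.

```python
# Hypothetical reference table: REFERENCE[inversion_genotype][snp][snp_genotype]
# is the frequency of that SNP genotype among reference individuals carrying
# the given inversion genotype (NN: standard homozygous, NI: heterozygous,
# II: inverted homozygous). All values are invented for illustration.
REFERENCE = {
    "NN": {"snp1": {"AA": 0.95, "AG": 0.05, "GG": 0.00},
           "snp2": {"CC": 0.90, "CT": 0.10, "TT": 0.00}},
    "NI": {"snp1": {"AA": 0.05, "AG": 0.90, "GG": 0.05},
           "snp2": {"CC": 0.10, "CT": 0.85, "TT": 0.05}},
    "II": {"snp1": {"AA": 0.00, "AG": 0.05, "GG": 0.95},
           "snp2": {"CC": 0.00, "CT": 0.10, "TT": 0.90}},
}


def similarity_scores(sample, reference):
    """Score each inversion genotype by the mean reference frequency of the
    SNP genotypes observed in the sample, using only the SNPs the sample has."""
    scores = {}
    for inv_genotype, snp_freqs in reference.items():
        shared = [snp for snp in snp_freqs if snp in sample]
        if not shared:
            raise ValueError("sample shares no SNPs with the reference")
        matched = sum(snp_freqs[snp].get(sample[snp], 0.0) for snp in shared)
        scores[inv_genotype] = matched / len(shared)
    return scores


def call_inversion(sample, reference):
    """Return the best-scoring inversion genotype together with all scores."""
    scores = similarity_scores(sample, reference)
    return max(scores, key=scores.get), scores


# A sample typed at both SNPs, and a sparser one typed at a single SNP.
full_call, full_scores = call_inversion({"snp1": "AG", "snp2": "CT"}, REFERENCE)
sparse_call, _ = call_inversion({"snp2": "TT"}, REFERENCE)
print(full_call, sparse_call)
```

Because the score is averaged over whichever reference SNPs the sample contains, the same routine runs on sparse data, which is the property the abstract highlights for low-coverage sources such as exome sequencing.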
Year: 2019 PMID: 31269027 PMCID: PMC6608898 DOI: 10.1371/journal.pgen.1008203
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Twenty human inversions that can be genotyped from SNP data using scoreInvHap.
| Inversion | Loc. | Length (Kb) | Inv. Freq. (%) | N haps | Hap-Inv conc. | | | |
|---|---|---|---|---|---|---|---|---|
| inv1_004 | 1p22.1 | 0.77 | 11.23 | 2 | 99.8 | 99.8% | 99.6% | 98.4% |
| inv1_008 | 1q31.3 | 1.2 | 19.68 | 2 | 99.6 | 99.6% | 99.6% | 99.0% |
| inv2_002 | 2p22.3 | 0.72 | 15.11 | 2 | 99.8 | 99.8% | 72.2% | 72.2% |
| inv2_013 | 2q22.1 | 4.25 | 71.47 | 2 | 100 | 100% | 100.0% | 100.0% |
| inv3_003 | 3q26.1 | 2.28 | 56.16a | 4 | 100 | 100% | 62.2% | 62.2% |
| inv6_002 | 6p21.33 | 0.87 | 63.12 | 2 | 100 | 100% | 100.0% | 100.0% |
| inv6_006 | 6q23.1 | 4.12 | 6.56 | 2 | 99.8 | 99.8% | 87.3% | 87.3% |
| inv7_003 | 7p14.3 | 5.25 | 23.56 | 2 | 99.4 | 99.4% | 60.4% | 60.2% |
| inv7_005 | 7p11.2 | 73.9 | 50.39 | 4 | 100 | 100% | 50.0% | 50.0% |
| inv7_011 | 7q11.22 | 12.7 | 63.52 | 2 | 100 | 100% | 100.0% | 97.7% |
| inv7_014 | 7q36.1 | 2.08 | 19.88 | 2 | 99.2 | 98.4% | 98.4% | 96.2% |
| inv8_001 | 8p23.1 | 3,925 | 57.95 | 2 | 100 | 100% | 100.0% | 100.0% |
| inv11_001 | 11p12 | 4.75 | 15.81 | 2 | 98.8% (503) | 97.0% | 71.0% | 71.0% |
| inv11_004 | 11q13.2 | 1.38 | 34.39 | 3 | 95.8% (503) | 95.4% | 95.8% | 63.6% |
| inv12_004 | 12q13.11 | 19.3 | 7.46 | 2 | 99.8% (503) | 99.6% | 85.5% | 85.5% |
| inv12_006 | 12q21.2 | 1.03 | 36.98 | 3 | 93.4% (503) | 93.4% | 65.0% | 65.0% |
| inv14_005 | 14q23.3 | 0.86 | 29.42 | 2 | 99.6 | 99.6% | -c | 56.9% |
| inv17_007 | 17q21.31 | 711 | 23.96 | 2 | 99.5 | 99.8% | 99.8% | 97.4% |
| inv21_005 | 21q21.3 | 1.06 | 51.29 | 4 | 99.4 | 99.4% | 99.4% | 90.1% |
| invX_006 | Xq13.2 | 90.8 | 13.3b | 4 | 97.4 | 97.4% | 76.3% | 76.3% |
Loc.: Cytogenetic location. Length: inversion length in kb in hg19. Inv. Freq.: Frequency of the inverted allele in the European individuals of the 1000 Genomes Project Phase 3. N Haps: Number of different haplotype groups supported by the inversion. Hap-Inv conc.: Percentage of concordance between validated inversion genotypes and haplotype-genotype clusters in the first PCs of SNPs in the inverted region, showing the level of experimental support for inferences based on haplotype-to-inversion mappings. N: number of individuals with experimental validation of inversion genotypes. a. Inversion genotypes do not follow Hardy-Weinberg equilibrium. b. Inversion frequency was equal for males and females. c. invClust could not genotype this inversion.
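Footnote a refers to Hardy-Weinberg equilibrium. A minimal sketch of one common way to check it, a chi-square goodness-of-fit test with one degree of freedom on inversion-genotype counts, is given below in Python. The counts are hypothetical, and the source does not state which test the authors used.

```python
from math import erfc, sqrt


def hwe_chi_square(n_std_hom, n_het, n_inv_hom):
    """Chi-square test (1 degree of freedom) of Hardy-Weinberg proportions.

    Returns the inverted-allele frequency, the chi-square statistic and
    its p-value.
    """
    n = n_std_hom + n_het + n_inv_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    q = (2 * n_inv_hom + n_het) / (2 * n)  # inverted-allele frequency
    p = 1.0 - q
    observed = (n_std_hom, n_het, n_inv_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return q, chi2, p_value


# Hypothetical counts: 50 standard homozygous, 40 heterozygous, 10 inverted homozygous.
inv_freq, chi2, p_value = hwe_chi_square(50, 40, 10)
print(f"inverted allele frequency = {inv_freq:.2f}, chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value would flag an inversion whose genotype counts depart from the expected proportions, as noted for inv3_003 in the table.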
Fig 1. Representation of haplotype-genotype clustering mapped to inversion-genotypes.
Disks represent the expected haplotype-genotypes/clusters that can be found in an MDS analysis of SNPs in inverted regions. Di are the distances between the clusters that illustrate the equidistance of the heterozygous individuals from the homozygous groups. (A) Inversions supporting two haplotype groups (A and B). Two haplotype groups form three haplotype-genotypes in the first MDS component that map to the inversion-genotypes shown in color (green: standard homozygous, red: heterozygous, blue: inverted homozygous). (B) The first two MDS components show six possible haplotype-genotype groups for three haplotype groups (A, B and C). The homozygous group for the standard allele shows two possible haplotype subpopulations (A and C). (C-D) If a fourth haplotype group (D) is supported by the inversion, the clustering pattern on the first three MDS components should reveal a tetrahedron pattern where the inverted allele can be mapped to either one (1+3), two (2+2) or three (3+1) haplotype groups.
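The counts of haplotype-genotypes and the equidistance of heterozygotes described in the caption can be illustrated with a small Python sketch. It does not run an MDS analysis; it only assumes that each haplotype group is an idealized vector of allele counts (invented here) and that an individual's genotype is the sum of its two haplotypes.

```python
from itertools import combinations_with_replacement
from math import dist


def haplotype_genotypes(haplotypes):
    """Build the allele-dosage vector of every unordered pair of haplotype groups."""
    genotypes = {}
    for h1, h2 in combinations_with_replacement(sorted(haplotypes), 2):
        genotypes[h1 + h2] = tuple(
            a + b for a, b in zip(haplotypes[h1], haplotypes[h2])
        )
    return genotypes


# Hypothetical haplotype groups over four SNPs (1 = alternative allele).
TWO_GROUPS = {"A": (0, 0, 1, 0), "B": (1, 1, 0, 1)}
THREE_GROUPS = {"A": (0, 0, 1, 0), "B": (1, 1, 0, 1), "C": (0, 1, 1, 0)}

two = haplotype_genotypes(TWO_GROUPS)
three = haplotype_genotypes(THREE_GROUPS)

# The heterozygote AB is the midpoint of AA and BB, so both distances match.
d_aa_ab = dist(two["AA"], two["AB"])
d_ab_bb = dist(two["AB"], two["BB"])
print(len(two), len(three), d_aa_ab, d_ab_bb)
```

Under these assumptions, two haplotype groups give three haplotype-genotypes and three groups give six, matching panels (A) and (B).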
Comparison between scoreInvHap, invClust and PFIDO.
| Data type | Measure | Dataset | invClust 8p23.1 | invClust 17q21.31 | PFIDO 8p23.1 | PFIDO 17q21.31 | scoreInvHap 8p23.1 | scoreInvHap 17q21.31 |
|---|---|---|---|---|---|---|---|---|
| | Mendelian Errors proportion | AGP | 0.1% | 0% | - | - | 0.2% | 0% |
| | | SSC 1Mv1 | 0.2% | 0% | 0.2% | 2.8% | 0.2% | 0% |
| | | SSC 1Mv3 | 0.2% | 0% | 0.1% | 0.3% | 0.1% | 0% |
| | | SSC Omni | 0.2% | 0% | 0.1% | 0% | 0.3% | 0% |
| | Mendelian Errors proportion | AGP | 0.1% | 0% | - | - | 0.5% | 0% |
| | | SSC 1Mv1 | 0% | 0% | 0.2% | 0.9% | 1.1% | 0% |
| | | SSC 1Mv3 | 0.2% | 0% | 0.4% | 0.9% | 0.5% | 0% |
| | | SSC Omni | 0.2% | 0% | 0.3% | 1.3% | 0.9% | 0% |
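The Mendelian error proportions in the table above can be understood through a sketch like the following, written in Python. It assumes autosomal inversions, family trios, and genotypes coded as the number of inverted alleles (0, 1 or 2); the trio data are invented, and the source does not describe the authors' exact procedure.

```python
def transmissible(genotype):
    """Inverted-allele counts (0 or 1) that a parent with this genotype can transmit."""
    if genotype not in (0, 1, 2):
        raise ValueError(f"invalid inversion genotype: {genotype!r}")
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(
        a + b == child
        for a in transmissible(father)
        for b in transmissible(mother)
    )


def mendelian_error_proportion(trios):
    """Fraction of (father, mother, child) trios that violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trios supplied")
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)


# Hypothetical trios as (father, mother, child) inverted-allele counts.
TRIOS = [(0, 0, 0), (0, 1, 1), (1, 1, 2), (0, 0, 1), (2, 2, 1)]
error_rate = mendelian_error_proportion(TRIOS)
print(f"Mendelian error proportion: {error_rate:.1%}")
```

Here the last two trios are inconsistent (a heterozygous child of two homozygous parents of the same type), so two of five trios count as errors.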
Fig 2. Inversion-genotype frequencies of inv-8p23.1 in exome data as obtained by three genotyping methods.
(A) Inversion-genotype frequencies for inv-8p23.1 reported by scoreInvHap and PFIDO in the Aberdeen and Gallagher datasets, compared with the inversion-genotype frequencies of the European individuals in the 1000 Genomes Project (EUR). Error bars show the 95% confidence interval of the estimated frequencies. scoreInvHap-Gallagher: inversion-genotype frequencies obtained by scoreInvHap in the Gallagher dataset. scoreInvHap-Aberdeen: inversion-genotype frequencies obtained by scoreInvHap in the Aberdeen dataset. PFIDO-Gallagher: inversion-genotype frequencies obtained by PFIDO in the Gallagher dataset. PFIDO-Aberdeen: inversion-genotype frequencies obtained by PFIDO in the Aberdeen dataset. (B) First two MDS components of inv-8p23.1 SNPs in the Aberdeen and Gallagher datasets, showing that invClust could not return any genotype classification of the individuals.
Fig 3. Association studies of inversions genotyped with scoreInvHap.
(A) Association of inversions inv-8p23.1 and inv-17q21.31 with autism and schizophrenia using exome data in ten studies of UK10K. The horizontal line indicates no effect. (B) Genome-wide association study of 15 inversions on breast cancer. The horizontal line indicates nominal significance.