Joaquín Dopazo, Alicia Amadoz, Marta Bleda, Luz Garcia-Alonso, Alejandro Alemán, Francisco García-García, Juan A Rodriguez, Josephine T Daub, Gerard Muntané, Antonio Rueda, Alicia Vela-Boza, Francisco J López-Domingo, Javier P Florido, Pablo Arce, Macarena Ruiz-Ferrer, Cristina Méndez-Vidal, Todd E Arnold, Olivia Spleiss, Miguel Alvarez-Tejado, Arcadi Navarro, Shomi S Bhattacharya, Salud Borrego, Javier Santoyo-López, Guillermo Antiñolo.
Abstract
Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across populations. The need for a detailed catalog of local variability motivated the whole-exome sequencing of 267 unrelated individuals representative of the healthy Spanish population. As in other studies, a considerable number of rare variants were found (almost one-third of the described variants). There were also relevant differences in the allelic frequencies of polymorphic variants, including ∼10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes, and other pathologies) were overall similar to those of other populations. The trend is the opposite, however, for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies), which show marked frequency differences between populations. Interestingly, a correspondence was found between differences in allelic frequencies and differences in disease prevalence, highlighting the relevance of frequency differences to disease risk. These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistance or adverse effects. We have made publicly available the Spanish population variant server web page (http://spv.babelomics.org/), which contains population frequency information for the complete list of 170,888 variant positions we found. We show that it is fundamental to determine population-specific variant frequencies to distinguish real disease associations from population-specific polymorphisms.
Keywords: disease variants; exome sequencing; pharmacogenomic variants; population variability
Year: 2016 PMID: 26764160 PMCID: PMC4839216 DOI: 10.1093/molbev/msw005
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Variants in the Exonic Regions of the MGP Spanish Population.
| Variant class | Total variants (all) | Average per individual (all) | Average homozygous per individual (all) | Total variants (private MGP) | Average per individual (private MGP) | Average homozygous per individual (private MGP) |
|---|---|---|---|---|---|---|
| Exome positions with SNV | 170,888 | 18,875.8 | 6,906 | 63,243 | 835.8 | 59.4 |
| Exome monoallelic positions | 170,370 | 18,871.6 | 6,906 | 63,143 | 835.9 | 59.4 |
| Exome multiallelic positions | 518 | 4.2 | 0 | 100 | 0.8 | 0 |
| Exome SNV | 171,406 | 18,880.1 | 6,906 | 63,343 | 836.7 | 59.4 |
| Singletons | 54,214 | 202 | 59.4 | 54,214 | 202 | 59.4 |
| Nonsynonymous SNV | 97,589 | 9,193.7 | 3,335.5 | 40,564 | 538.6 | 41 |
| Synonymous SNV | 73,011 | 9,734 | 3,596.5 | 21,857 | 287.2 | 18 |
| Stop gain SNV | 1,852 | 95.8 | 22 | 1,060 | 15.9 | 0.4 |
| Stop loss SNV | 178 | 29.4 | 12 | 71 | 0.6 | 0.1 |
| Splicing SNV | 4,217 | 417.2 | 154.8 | 1,842 | 25.1 | 2 |
| LoF SNV | 32,736 | 1,163.8 | 211.2 | 17,314 | 141.8 | 3.3 |
| LoF strict SNV^a | 12,639 | 352.6 | 51.4 | 7,136 | 51 | 0.3 |
^a All three pathogenicity predictors (SIFT, PolyPhen, and conservation score) reported these SNVs as pathogenic, in contrast with loss-of-function (LoF) SNVs, for which only two pathogenicity predictions were required to consider the variant pathogenic.
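The footnote distinguishes "LoF" (two predictors agree) from "LoF strict" (all three agree). A minimal sketch of that voting rule follows, assuming each predictor has already been reduced to a True/False "pathogenic" call; the class and function names are hypothetical, and the score cut-offs used to produce those calls are an assumption left to the caller, since the footnote does not state them.

```python
from dataclasses import dataclass


@dataclass
class PredictorCalls:
    """Per-predictor pathogenic calls for one SNV (True = predicted pathogenic).

    Assumption: the cut-offs that turn raw SIFT, PolyPhen, and conservation
    scores into True/False are chosen by the caller (not given in the table).
    """
    sift: bool
    polyphen: bool
    conservation: bool


def pathogenic_votes(calls: PredictorCalls) -> int:
    """Count how many of the three predictors call the SNV pathogenic."""
    return int(calls.sift) + int(calls.polyphen) + int(calls.conservation)


def is_lof(calls: PredictorCalls) -> bool:
    """LoF rule from the footnote: at least two of the three predictors agree."""
    return pathogenic_votes(calls) >= 2


def is_lof_strict(calls: PredictorCalls) -> bool:
    """LoF strict rule from the footnote: all three predictors agree."""
    return pathogenic_votes(calls) == 3


if __name__ == "__main__":
    example = PredictorCalls(sift=True, polyphen=True, conservation=False)
    print(is_lof(example), is_lof_strict(example))  # True False
```

Under this rule every "LoF strict" SNV is also an "LoF" SNV, which is consistent with the strict counts in the table being a subset of the LoF counts.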
Figure: Cumulative number of new variants contributed by individuals. The red line represents the number of variants found as the number of sequenced individuals increases. The green line represents the number of already known variants among all the variants found. The blue line represents the number of new variants not present in the 1000G populations. New variants are decomposed into polymorphic variants (present in more than one individual in the MGP population), represented by the blue dashed line, and rare variants (present in only one MGP individual), represented by the blue dotted line.
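The curves in this figure can be reproduced by adding individuals one at a time and counting distinct variants. The sketch below is an illustration, not the authors' pipeline: it assumes each individual is given as a set of variant keys `(chrom, pos, ref, alt)` and that `known` stands in for the set of variants already present in 1000G. "Polymorphic" and "rare" follow the caption's definitions (more than one carrier versus exactly one carrier among the individuals seen so far).

```python
from collections import Counter
from typing import Iterable

# A variant is keyed by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def accumulation_curve(
    individuals: Iterable[set[Variant]], known: set[Variant]
) -> list[tuple[int, int, int, int, int]]:
    """One row per added individual: (total, known, new, new_polymorphic, new_rare).

    `known` stands in for the variants already described in 1000G.
    """
    carriers: Counter = Counter()
    curve = []
    for variants in individuals:
        # Each individual's variants form a set, so a carrier is counted once.
        carriers.update(variants)
        new = [v for v in carriers if v not in known]
        polymorphic = sum(1 for v in new if carriers[v] > 1)
        curve.append(
            (
                len(carriers),               # red line: all variants found
                len(carriers) - len(new),    # green line: already known
                len(new),                    # blue line: new variants
                polymorphic,                 # blue dashed: >1 carrier
                len(new) - polymorphic,      # blue dotted: 1 carrier
            )
        )
    return curve
```

Note that a variant can move from "rare" to "polymorphic" as more individuals are added, which is why the dotted and dashed curves are recomputed at every step rather than accumulated.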
Variants Associated with Diseases That Have Allele Frequencies in the Spanish Population at Least 2-Fold Higher Than in the 1000G Populations.
| chr | Start | CT | R | A | MGP R/R | MGP R/A | MGP A/A | MGP MAF | 1000G R/R | 1000G R/A | 1000G A/A | 1000G MAF | Ratio | Ratio E | HGMD disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | 36688178 | ns | G | A | 263 | 4 | 0 | 0.0075 | 1,078 | 0 | 0 | 0 | NA | NA | Epstein syndrome? |
| 6 | 42141500 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,078 | 0 | 0 | 0 | NA | NA | Cone dystrophy, autosomal dominant |
| 22 | 36688178 | syn | G | A | 263 | 4 | 0 | 0.0075 | 1,078 | 0 | 0 | 0 | NA | NA | Epstein syndrome? |
| 17 | 36104650 | ns | C | A | 264 | 3 | 0 | 0.0056 | 1,078 | 0 | 0 | 0 | NA | NA | Diabetes, Maturity onset diabetes of the young (MODY) |
| 15 | 48704816 | ns | G | A | 262 | 5 | 0 | 0.0094 | 1,077 | 1 | 0 | 5.00E-04 | 18.8000 | NA | Marfan syndrome |
| 6 | 162206852 | ns | G | A | 262 | 5 | 0 | 0.0094 | 1,077 | 1 | 0 | 5.00E-04 | 18.8000 | NA | Parkinsonism. juvenile. autosomal recessive |
| 1 | 216420460 | ns | C | A | 263 | 4 | 0 | 0.0075 | 1,077 | 1 | 0 | 5.00E-04 | 15.0000 | NA | Retinitis pigmentosa. recessive. no hearing loss |
| 17 | 41199716 | sg | A | T | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.2000 | NA | Ovarian cancer |
| 19 | 8436373 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | NA | Lower plasma triglyceride level |
| 11 | 18050850 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | NA | Attention deficit hyperactivity disorder |
| 3 | 123376066 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | NA | Aortic dissections? |
| 7 | 107329557 | ns | T | C | 262 | 5 | 0 | 0.0094 | 1,075 | 3 | 0 | 0.0014 | 6.71429 | NA | Pendred syndrome |
| 7 | 99032559 | ns | G | A | 250 | 17 | 0 | 0.0318 | 1,064 | 13 | 1 | 0.007 | 4.54286 | 12.23076 | Complex I deficiency |
| 18 | 2937867 | ns | C | A | 260 | 7 | 0 | 0.0131 | 1,075 | 3 | 0 | 0.0014 | 9.35714 | 10.07692 | Psoriasis |
| 15 | 58957371 | ns | C | G | 261 | 6 | 0 | 0.0112 | 1,076 | 2 | 0 | 9.00E-04 | 12.4444 | 8.615384 | Alzheimer disease. late onset |
| 15 | 42684875 | nc | C | T | 261 | 6 | 0 | 0.0112 | 1,072 | 6 | 0 | 0.0028 | 4.00000 | 8.615384 | Muscular dystrophy. limb girdle |
| 19 | 4159747 | ns | G | A | 262 | 5 | 0 | 0.0094 | 1,076 | 2 | 0 | 9.00E-04 | 10.4444 | 7.230769 | Hypertriglyceridemia |
| 12 | 6143978 | ns | C | T | 262 | 5 | 0 | 0.0094 | 1,075 | 3 | 0 | 0.0014 | 6.71429 | 7.230769 | Von Willebrand. Normandy variant |
| 1 | 115221116 | ns | C | A | 257 | 10 | 0 | 0.0187 | 1,073 | 5 | 0 | 0.0023 | 8.13043 | 7.192307 | Adenosine monophosphate deaminase deficiency |
| 12 | 32994073 | ns | G | A | 257 | 10 | 0 | 0.0187 | 1,073 | 5 | 0 | 0.0023 | 8.13043 | 7.192307 | Arrhythmogenic right ventricular dysplasia/cardiomyopathy |
| 4 | 5627493 | syn | G | T | 258 | 9 | 0 | 0.0169 | 1,074 | 3 | 1 | 0.0023 | 7.34783 | 6.5 | Ellis-van Creveld syndrome |
| 2 | 71825797 | ns | C | G | 263 | 4 | 0 | 0.0075 | 1,077 | 1 | 0 | 5.00E-04 | 15.0000 | 5.769230 | Muscular dystrophy. limb girdle 2B |
| 13 | 52534410 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,077 | 1 | 0 | 5.00E-04 | 15.0000 | 5.769230 | Wilson disease |
| 8 | 145699735 | ns | G | C | 263 | 4 | 0 | 0.0075 | 1,077 | 1 | 0 | 5.00E-04 | 15.0000 | 5.769230 | Congenital heart defects |
| 1 | 196709833 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | 5.769230 | Factor H deficiency |
| 1 | 94568686 | ns | C | T | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | 5.769230 | Stargardt disease |
| 5 | 110454719 | ns | A | G | 255 | 12 | 0 | 0.0225 | 1,074 | 4 | 0 | 0.0019 | 11.8421 | 5.625 | Glaucoma. primary open angle |
| 11 | 67799622 | ns | C | T | 260 | 7 | 0 | 0.0131 | 1,076 | 2 | 0 | 9.00E-04 | 14.5555 | 5.038461 | Complex I deficiency |
| 8 | 100832259 | ns | A | G | 260 | 7 | 0 | 0.0131 | 1,074 | 4 | 0 | 0.0019 | 6.89474 | 5.038461 | Cohen syndrome |
| 1 | 183532364 | ns | T | A | 260 | 7 | 0 | 0.0131 | 1,071 | 7 | 0 | 0.0032 | 4.09375 | 5.038461 | Chronic granulomatous disease |
| 1 | 76211574 | ns | C | A | 264 | 3 | 0 | 0.0056 | 1,078 | 0 | 0 | 0 | NA | 4.307692 | Medium chain acyl CoA dehydrogenase deficiency |
| 9 | 120475248 | ns | G | A | 261 | 6 | 0 | 0.0112 | 1,076 | 2 | 0 | 9.00E-04 | 12.444 | 4.307692 | Meningococcal disease? |
| 22 | 45691554 | ns | C | T | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Renal adysplasia |
| 11 | 88924465 | ns | C | A | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Albinism. oculocutaneous 1 |
| 14 | 21811213 | ns | A | G | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Leber congenital amaurosis |
| 14 | 21811213 | ns | A | G | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Retinitis pigmentosa? |
| 13 | 48939088 | ns | C | T | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Retinoblastoma |
| 14 | 23862646 | ns | C | A | 264 | 3 | 0 | 0.0056 | 1,077 | 1 | 0 | 5.00E-04 | 11.200 | 4.307692 | Cardiomyopathy. dilated |
| 5 | 70945029 | ns | T | C | 261 | 6 | 0 | 0.0112 | 1,074 | 4 | 0 | 0.0019 | 5.89474 | 4.307692 | Complex I deficiency |
| 18 | 2925359 | ns | C | T | 245 | 22 | 0 | 0.0412 | 1,061 | 17 | 0 | 0.0079 | 5.21519 | 4.12 | Psoriasis |
| 6 | 31729925 | ns | C | T | 225 | 40 | 2 | 0.0824 | 980 | 87 | 11 | 0.0506 | 1.62846 | 4.12 | Leukemia. risk. association with |
| 17 | 56348226 | sp | T | G | 259 | 8 | 0 | 0.015 | 1,075 | 3 | 0 | 0.0014 | 10.7142 | 3.75 | Myeloperoxidase deficiency |
| 1 | 203194834 | ns | C | T | 262 | 5 | 0 | 0.0094 | 1,076 | 2 | 0 | 9.00E-04 | 10.444 | 3.615384 | Chitotriosidase deficiency |
| 16 | 16259579 | syn | G | A | 262 | 5 | 0 | 0.0094 | 1,075 | 3 | 0 | 0.0014 | 6.71429 | 3.615384 | Pseudoxanthoma elasticum |
| 2 | 44513202 | ns | T | C | 262 | 5 | 0 | 0.0094 | 1,075 | 3 | 0 | 0.0014 | 6.71429 | 3.615384 | Cystinuria |
| 2 | 71738977 | ns | G | A | 262 | 5 | 0 | 0.0094 | 1,074 | 4 | 0 | 0.0019 | 4.94737 | 3.615384 | Muscular dystrophy. limb girdle/Miyoshi myopathy |
| 13 | 32914592 | ns | C | T | 262 | 5 | 0 | 0.0094 | 1,074 | 4 | 0 | 0.0019 | 4.94737 | 3.615384 | Breast and/or ovarian cancer? |
| 1 | 2234791 | ns | C | T | 260 | 7 | 0 | 0.0131 | 1,073 | 5 | 0 | 0.0023 | 5.69565 | 3.275 | Cleft lip? |
| 15 | 75012987 | ns | G | T | 233 | 33 | 1 | 0.0655 | 1,048 | 29 | 1 | 0.0144 | 4.54861 | 3.275 | Colorectal cancer. reduced risk. association with |
| 17 | 73837042 | ns | T | C | 263 | 4 | 0 | 0.0075 | 1,076 | 2 | 0 | 9.00E-04 | 8.33333 | 2.884615 | Hemophagocytic lymphohistiocytosis. Familial |
| 1 | 12064892 | ns | G | A | 261 | 6 | 0 | 0.0112 | 1,073 | 4 | 1 | 0.0028 | 4.00000 | 2.8 | Charcot-Marie-Tooth disease 2a |
| 21 | 44317156 | ns | A | C | 254 | 13 | 0 | 0.0243 | 1,070 | 8 | 0 | 0.0037 | 6.56757 | 2.43 | Complex I deficiency |
| 21 | 44317156 | sp | A | C | 254 | 13 | 0 | 0.0243 | 1,070 | 8 | 0 | 0.0037 | 6.56757 | 2.43 | Complex I deficiency |
| 18 | 58038832 | ns | T | G | 254 | 13 | 0 | 0.0243 | 1,069 | 9 | 0 | 0.0042 | 5.78571 | 2.43 | Obesity. autosomal dominant? |
| 12 | 6103650 | ns | G | A | 254 | 13 | 0 | 0.0243 | 1,069 | 9 | 0 | 0.0042 | 5.78571 | 2.43 | Von Willebrand disease 1? |
| 17 | 33430313 | ns | T | C | 254 | 13 | 0 | 0.0243 | 1,065 | 13 | 0 | 0.006 | 4.05000 | 2.43 | Breast cancer. increased risk. association with |
| 13 | 49281554 | ns | A | G | 254 | 13 | 0 | 0.0243 | 1,061 | 17 | 0 | 0.0079 | 3.07595 | 2.43 | Atopy. association with |
| 12 | 6458350 | ns | A | G | 242 | 25 | 0 | 0.0468 | 1,055 | 22 | 1 | 0.0111 | 4.21622 | 2.34 | Ischemic cerebrovascular events. association with |
| 17 | 42463054 | ns | G | C | 255 | 12 | 0 | 0.0225 | 1,068 | 10 | 0 | 0.0046 | 4.89130 | 2.25 | Glanzmann thrombasthenia |
| 12 | 22017410 | sp | C | T | 255 | 12 | 0 | 0.0225 | 1,066 | 12 | 0 | 0.0056 | 4.01786 | 2.25 | Myocardial infarction. association with |
| 12 | 22017410 | ns | C | T | 255 | 12 | 0 | 0.0225 | 1,066 | 12 | 0 | 0.0056 | 4.01786 | 2.25 | Myocardial infarction. association with |
| 1 | 158624528 | ns | G | T | 232 | 34 | 1 | 0.0674 | 1,042 | 34 | 2 | 0.0176 | 3.82955 | 2.246666 | Spherocytosis. association with? |
| 22 | 46614274 | ns | C | G | 224 | 40 | 3 | 0.0861 | 1,027 | 49 | 2 | 0.0246 | 3.50000 | 2.1525 | Elevated plasma lipid conc. assoc. in diabetes |
| 5 | 82491674 | ns | T | C | 233 | 34 | 0 | 0.0637 | 1,042 | 36 | 0 | 0.0167 | 3.81437 | 2.123333 | Lung cancer. susceptibility to. association with |
| 19 | 36341311 | ns | T | A | 256 | 11 | 0 | 0.0206 | 1,072 | 6 | 0 | 0.0028 | 7.35714 | 2.06 | Focal segmental glomerulosclerosis |
| 5 | 151202476 | ns | C | T | 256 | 11 | 0 | 0.0206 | 1,066 | 12 | 0 | 0.0056 | 3.67857 | 2.06 | Hyperekplexia |
| 5 | 110428060 | ns | T | C | 256 | 11 | 0 | 0.0206 | 1,064 | 14 | 0 | 0.0065 | 3.16923 | 2.06 | Glaucoma. primary open angle. association with? |
| 13 | 78475230 | ns | C | T | 256 | 11 | 0 | 0.0206 | 1,064 | 14 | 0 | 0.0065 | 3.16923 | 2.06 | Hirschsprung disease |
| 1 | 227170648 | syn | C | T | 257 | 9 | 1 | 0.0206 | 1,063 | 15 | 0 | 0.007 | 2.94286 | 2.06 | Ubiquinone deficiency with cerebellar ataxia |
Note: The columns are as follows. chr (column 1): chromosome; Start (column 2): position of the variant; CT (column 3): consequence type (ns, nonsynonymous SNV; syn, synonymous; sg, stop gain; sp, variant affecting splicing; nc, ncRNA_exonic); R (column 4): reference allele at the position; A (column 5): alternative allele; R/R, R/A, and A/A (columns 6 to 8): number of individuals in the Spanish population carrying a reference homozygote (R/R), a heterozygote (R/A), or an alternative homozygote (A/A), respectively; MAF (column 9): alternative allele frequency in the Spanish population; R/R, R/A, and A/A (columns 10 to 12): the same genotype counts in the 1000G populations; MAF (column 13): alternative allele frequency in the 1000G populations; Ratio (column 14): ratio between the Spanish and the 1000G MAFs; Ratio E (column 15): ratio between the Spanish MAF and the MAF of the 1000G populations with European ancestry only; HGMD disease (column 16): description of the disease caused by the variant, which can be a causal effect or an association (when the description ends in "association with") and can also be uncertain (when the description includes a question mark).
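The MAF and Ratio columns follow directly from the genotype counts. As a sketch (function names are illustrative, not from the source): with $n_{RR}$, $n_{RA}$, $n_{AA}$ diploid individuals, the alternative allele frequency is $(n_{RA} + 2n_{AA}) / 2(n_{RR} + n_{RA} + n_{AA})$. For example, the first row gives $4/534 \approx 0.0075$. The Ratio values in the table appear to be computed from MAFs already rounded to four decimal places (for the Marfan syndrome row, $0.0094 / 0.0005 = 18.8$), so the example rounds before dividing; a zero reference MAF corresponds to "NA".

```python
from typing import Optional


def alt_allele_freq(rr: int, ra: int, aa: int) -> float:
    """Alternative allele frequency from diploid genotype counts.

    Each heterozygote (R/A) carries one alternative allele and each
    alternative homozygote (A/A) carries two, out of two alleles per person.
    """
    n_alleles = 2 * (rr + ra + aa)
    return (ra + 2 * aa) / n_alleles


def fold_ratio(maf_spain: float, maf_reference: float) -> Optional[float]:
    """Spanish MAF divided by the reference MAF; None where the table shows NA."""
    if maf_reference == 0:
        return None
    return maf_spain / maf_reference


if __name__ == "__main__":
    # Marfan syndrome row (chr 15, 48704816): MGP 262/5/0, 1000G 1077/1/0.
    maf_mgp = round(alt_allele_freq(262, 5, 0), 4)     # 0.0094
    maf_1000g = round(alt_allele_freq(1077, 1, 0), 4)  # 0.0005
    print(maf_mgp, maf_1000g, fold_ratio(maf_mgp, maf_1000g))
```

Because the note defines this column as the alternative (not necessarily minor) allele frequency, the sketch does not fold values above 0.5; all rows in this table are well below that.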
Figure: Comparison of the allelic frequencies of HGMD-described variants between the MGP Spanish population and the 1000 Genomes populations for four diseases with more than five variants. The upper left panel shows the frequencies in the Spanish MGP samples of all the variants associated with Alzheimer disease in HGMD (X axis) versus the corresponding frequencies observed across all individuals of the 1000 Genomes populations (Y axis). The upper right panel presents a similar plot for variants described in HGMD as associated with leukemia risk. The lower left and right panels depict the same relationship for two rare diseases, Marfan syndrome and age-related macular degeneration, respectively.
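Each panel is a scatter of paired frequencies for one disease, restricted to diseases with more than five HGMD variants. The sketch below shows that grouping step, assuming records of the form `(disease, MGP frequency, 1000G frequency)`. The `fraction_fold_different` helper is a hypothetical summary added for illustration (it is not a statistic reported by the authors): it measures how many points lie at least 2-fold away from the diagonal, which is one way to quantify the contrast between complex-disease panels and rare-disease panels.

```python
from collections import defaultdict
from typing import Iterable


def frequency_pairs(
    records: Iterable[tuple[str, float, float]], min_variants: int = 5
) -> dict[str, list[tuple[float, float]]]:
    """Group (disease, MGP freq, 1000G freq) records by disease.

    Keeps only diseases with more than `min_variants` variants, mirroring the
    caption's selection. Each pair is one point: x = MGP, y = 1000G.
    """
    by_disease: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for disease, freq_mgp, freq_1000g in records:
        by_disease[disease].append((freq_mgp, freq_1000g))
    return {d: pts for d, pts in by_disease.items() if len(pts) > min_variants}


def fraction_fold_different(points: list[tuple[float, float]], fold: float = 2.0) -> float:
    """Hypothetical summary: share of points whose frequencies differ >= `fold`-fold.

    Pairs where both frequencies are zero are skipped; a zero paired with a
    non-zero frequency counts as different.
    """
    considered = [(x, y) for x, y in points if x > 0 or y > 0]
    if not considered:
        return 0.0
    differing = sum(
        1 for x, y in considered
        if min(x, y) == 0 or max(x, y) / min(x, y) >= fold
    )
    return differing / len(considered)
```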
Figure: Comparison of the relative prevalence and MAFs for several of the diseases showing the most extreme differences in allelic frequencies. For each disease, the first two bars represent the log2 of the ratio of the disease prevalence (DALYs) in Spain to the corresponding prevalence in Central and in East Europe, respectively, and the third bar represents the log2 of the ratio of the MAF of the disease alleles in Spain to the corresponding MAF in the European populations of 1000G. Disease abbreviations: Alzheimer disease (AD), attention deficit hyperactivity disorder (ADHD), Parkinson disease (PD), psoriasis (PSO), and cardiovascular diseases (CAD).
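Each disease in this figure is summarized by three log2 ratios, so a positive bar means "higher in Spain" and a negative bar "lower in Spain". A minimal sketch of those three values follows; the function names and any input numbers are hypothetical, and `same_direction` is an illustrative check (not from the source) of the prevalence/frequency correspondence the abstract describes.

```python
import math


def log2_ratio(value_spain: float, value_reference: float) -> float:
    """log2(Spain / reference): positive means higher in Spain, negative lower."""
    return math.log2(value_spain / value_reference)


def disease_bars(
    dalys_spain: float,
    dalys_central_europe: float,
    dalys_east_europe: float,
    maf_spain: float,
    maf_european_1000g: float,
) -> tuple[float, float, float]:
    """The three bars drawn for one disease, in the caption's order."""
    return (
        log2_ratio(dalys_spain, dalys_central_europe),
        log2_ratio(dalys_spain, dalys_east_europe),
        log2_ratio(maf_spain, maf_european_1000g),
    )


def same_direction(bars: tuple[float, float, float]) -> bool:
    """Hypothetical check: do all bars point the same way (all > 0 or all < 0)?"""
    return all(b > 0 for b in bars) or all(b < 0 for b in bars)
```

Using log2 makes a 2-fold excess and a 2-fold deficit symmetric (+1 and −1), which is why ratios of very different magnitudes can share one axis.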
Figure: Effect of filtering out variants with high MAFs using frequency data inferred either from the available databases (1000G) or from the MGP Spanish population sequenced here. (A) Pedigree of the studied family, with seven members affected by autosomal dominant retinitis pigmentosa (adRP). (B) Segregation analysis was carried out across the family, followed by a step that filtered out the variants whose MAF in a reference population was incompatible with the observed prevalence of adRP. The plot shows the number of candidate variants that segregate in the family as a growing number of affected members (from one to seven) is used to select the variants, when each of two reference populations (1000G, pale blue; the local MGP Spanish population, dark blue) is used to filter out variants with MAFs too high to be compatible with the prevalence of the disease (>0.001 in 1000G and >0.004 in the MGP population, respectively). Filtering with the local Spanish population was drastically more stringent than filtering with the 1000G population.
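The two-step procedure in panel B (segregation, then a MAF filter) can be sketched as follows. This is a simplified illustration under stated assumptions: "segregating" here only means "carried by every affected member used so far" (the caption does not say how unaffected relatives were handled), variant IDs and frequency tables are hypothetical, and variants absent from the reference table are treated as MAF 0 and kept. The thresholds 0.001 (1000G) and 0.004 (MGP) are the ones given in the caption.

```python
def segregating(variants_by_member: dict[str, set[str]], affected: list[str]) -> set[str]:
    """Variants carried by every listed affected member (simplified segregation)."""
    return set.intersection(*(variants_by_member[m] for m in affected))


def filter_by_maf(candidates: set[str], reference_maf: dict[str, float], max_maf: float) -> set[str]:
    """Drop candidates whose reference-population MAF exceeds `max_maf`.

    Variants absent from the reference are treated as MAF 0 and kept.
    """
    return {v for v in candidates if reference_maf.get(v, 0.0) <= max_maf}


def candidate_curve(
    variants_by_member: dict[str, set[str]],
    affected_order: list[str],
    reference_maf: dict[str, float],
    max_maf: float,
) -> list[int]:
    """Candidate counts after using the first 1, 2, ..., n affected members."""
    return [
        len(filter_by_maf(segregating(variants_by_member, affected_order[:k]), reference_maf, max_maf))
        for k in range(1, len(affected_order) + 1)
    ]


# Thresholds from the caption; the frequency tables themselves are inputs.
MAX_MAF_1000G = 0.001
MAX_MAF_MGP = 0.004
```

Running `candidate_curve` once with a 1000G frequency table and `MAX_MAF_1000G`, and once with an MGP table and `MAX_MAF_MGP`, gives the pale-blue and dark-blue series. In this setup the extra stringency of the local filter comes from variants that are common in the Spanish population but absent or rare in 1000G, not from the threshold value itself.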