T Nutile, D Ruggiero, A F Herzig, A Tirozzi, S Nappo, R Sorice, F Marangio, C Bellenguez, A L Leutenegger, M Ciullo.
Abstract
The present study describes the genetic architecture of the isolated populations of Cilento, through the analysis of exome sequence data of 245 representative individuals of these populations. By annotating the exome variants and cataloguing them according to their frequency and functional effects, we identified 347,684 variants, 67.4% of which are rare and low frequency variants, and 1% of them (corresponding to 319 variants per person) are classified as high functional impact variants; also, 39,946 (11.5% of the total) are novel variants, for which we determined a significant enrichment for deleterious effects. By comparing the allele frequencies in Cilento with those from the Tuscan population from the 1000 Genomes Project Phase 3, we highlighted an increase in allele frequency in Cilento especially for variants which map to genes involved in extracellular matrix formation and organization. Furthermore, among the variants showing increased frequency we identified several known rare disease-causing variants. By different population genetics analyses, we corroborated the status of the Cilento populations as genetic isolates. Finally, we showed that exome data of Cilento represents a useful local reference panel capable of improving the accuracy of genetic imputation, thus adding power to genetic studies of human traits in these populations.
Year: 2019 PMID: 30858532 PMCID: PMC6411969 DOI: 10.1038/s41598-019-41022-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Cilento variants. The percentage of variants found in the Cilento whole-exome sequencing study, categorized by functional impact and minor allele frequency.
Figure 2. Cilento novel variants. Percentage of Cilento novel variants (in pink) and shared variants (in blue) according to Minor Allele Frequency category and functional impact. The shared variants are grouped according to the number of reference databases in which they were found (indicated by different blue shades).
Figure 3. Functional enrichment of Cilento novel variants. The analysis was performed comparing novel variants with those shared with at least one reference database. The size of the circles represents the significance level of the two-sided test based on asymptotic normal distribution (Fisher exact test for the following categories: HIGH/MAF > 5%, MODERATE/MAF > 5%, and LOW/MAF > 5%). The x-axis indicates the fold enrichment, the vertical line indicates no enrichment (the proportions in the NOVEL and the SHARED set are equal). NS = not significant.
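The fold enrichment and the two-sided test based on the asymptotic normal distribution described for Figure 3 can be illustrated with a minimal sketch. It assumes a pooled two-proportion z-test, which may differ from the exact test statistic the authors used; the function name and all counts are hypothetical and are not taken from the study.

```python
import math

def fold_enrichment_test(novel_hits, novel_total, shared_hits, shared_total):
    """Fold enrichment of a category in NOVEL vs SHARED variants, with a
    two-sided two-proportion z-test (pooled variance, normal approximation).
    Assumption: this is one common form of the asymptotic test, not
    necessarily the one used in the paper."""
    p_novel = novel_hits / novel_total
    p_shared = shared_hits / shared_total
    fold = p_novel / p_shared  # 1.0 means no enrichment
    pooled = (novel_hits + shared_hits) / (novel_total + shared_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / novel_total + 1 / shared_total))
    z = (p_novel - p_shared) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return fold, z, p_value

# Hypothetical counts: 120 of 4,000 novel variants and 600 of 30,000
# shared variants fall in the category of interest.
fold, z, p_value = fold_enrichment_test(120, 4000, 600, 30000)
print(f"fold enrichment = {fold:.2f}, z = {z:.2f}, p = {p_value:.2e}")
```

For categories with small counts the normal approximation is unreliable, which is consistent with the caption's use of the Fisher exact test for the MAF > 5% categories.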
List of the 13 over-represented pathways in common between the ConsensusPathDB analyses performed on the three villages.
| Pathway name | Pathway size | Campora: genes contained | Campora: p-value | Campora: q-value | Gioi: genes contained | Gioi: p-value | Gioi: q-value | Cardile: genes contained | Cardile: p-value | Cardile: q-value | Genes contained in common | Increased variants in common |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Axon guidance | 357 | 154 (43.3%) | 2.53E-06 | 1.36E-03 | 126 (35.4%) | 1.97E-05 | 5.97E-03 | 171 (48.0%) | 1.09E-05 | 2.37E-03 | 53 | 18 |
| Beta1 integrin cell surface interactions | 66 | 36 (54.5%) | 1.02E-04 | 2.14E-02 | 32 (48.5%) | 4.98E-05 | 8.98E-03 | 40 (60.6%) | 8.20E-05 | 1.01E-02 | 13 | 3 |
| Collagen biosynthesis and modifying enzymes | 67 | 42 (62.7%) | 1.78E-07 | 1.39E-04 | 31 (46.3%) | 1.94E-04 | 1.75E-02 | 41 (61.2%) | 4.88E-05 | 7.64E-03 | 20 | 5 |
| Collagen chain trimerization | 44 | 33 (75.0%) | 4.57E-09 | 1.73E-05 | 27 (61.4%) | 5.55E-07 | 5.01E-04 | 33 (75.0%) | 3.20E-07 | 1.25E-04 | 20 | 5 |
| Collagen formation | 91 | 53 (58.2%) | 1.56E-07 | 1.39E-04 | 40 (44.0%) | 1.02E-04 | 1.22E-02 | 59 (64.8%) | 6.27E-08 | 3.06E-05 | 26 | 5 |
| ECM-receptor interaction - Homo sapiens | 82 | 44 (53.7%) | 3.10E-05 | 8.36E-03 | 43 (52.4%) | 1.70E-07 | 2.05E-04 | 54 (65.9%) | 1.04E-07 | 4.50E-05 | 20 | 5 |
| Extracellular matrix organization | 293 | 136 (46.4%) | 8.15E-08 | 1.39E-04 | 118 (40.3%) | 1.81E-08 | 6.55E-05 | 165 (56.3%) | 9.41E-12 | 3.68E-08 | 62 | 9 |
| Focal adhesion - Homo sapiens | 199 | 91 (45.7%) | 2.27E-05 | 6.58E-03 | 78 (39.2%) | 1.48E-05 | 5.92E-03 | 105 (52.8%) | 3.78E-06 | 9.86E-04 | 33 | 9 |
| Integrin | 124 | 62 (50.0%) | 1.67E-05 | 5.36E-03 | 59 (47.6%) | 9.41E-08 | 1.70E-04 | 77 (62.1%) | 1.13E-08 | 1.11E-05 | 35 | 7 |
| Protein digestion and absorption - Homo sapiens | 90 | 48 (53.3%) | 1.70E-05 | 5.36E-03 | 41 (45.6%) | 3.05E-05 | 6.94E-03 | 50 (55.6%) | 2.54E-04 | 2.24E-02 | 24 | 6 |
| Stimuli-sensing channels | 102 | 56 (54.9%) | 1.03E-06 | 6.49E-04 | 41 (40.2%) | 8.21E-04 | 4.36E-02 | 60 (58.8%) | 5.92E-06 | 1.45E-03 | 24 | 6 |
| Transport of small molecules | 666 | 265 (39.9%) | 3.17E-06 | 1.50E-03 | 206 (31.0%) | 6.83E-04 | 3.74E-02 | 324 (48.8%) | 1.38E-10 | 2.69E-07 | 86 | 19 |
| Vesicle-mediated transport | 620 | 237 (38.2%) | 2.73E-04 | 4.29E-02 | 205 (33.1%) | 1.18E-05 | 5.92E-03 | 295 (47.6%) | 2.38E-08 | 1.55E-05 | 74 | 22 |
p-values are calculated according to the hypergeometric test based on the number of genes present in both the pathway-based set and input list of genes. q-values represent the p-values corrected for multiple testing using the false discovery rate method. The last two columns in the table represent the genes used as input for the analyses and the increased allele frequency variants located in those genes, that are in common between the three isolates.
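The hypergeometric over-representation test and the false discovery rate correction described in this note can be sketched as follows. This is a minimal illustration, not the ConsensusPathDB implementation: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR method, and the example numbers (including the background size and input list size, which the table does not report) are invented.

```python
from math import comb

def hypergeom_pvalue(overlap, pathway_size, input_size, background_size):
    """P(X >= overlap): probability of seeing at least `overlap` pathway genes
    when `input_size` genes are drawn without replacement from a background
    of `background_size` genes, `pathway_size` of which are in the pathway."""
    upper = min(pathway_size, input_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values (assumed FDR method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical example: 3 pathways tested against a 2,000-gene background
# with a 300-gene input list.
p_values = [
    hypergeom_pvalue(20, 44, 300, 2000),
    hypergeom_pvalue(30, 91, 300, 2000),
    hypergeom_pvalue(50, 293, 300, 2000),
]
for p, q in zip(p_values, benjamini_hochberg(p_values)):
    print(f"p = {p:.3e}, q = {q:.3e}")
```

Exact integer binomial coefficients keep the sketch dependency-free; for genome-scale backgrounds a library routine working in log space would be the more practical choice.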
Rare disease-causing variants, reported as Pathogenic in the ClinVar database, with increased allele frequency in at least one Cilento isolate.
| Disease | Orphanet number | Orphanet classification | Gene | Variant | Allele frequency (Campora) | Allele frequency (Gioi) | Allele frequency (Cardile) | Allele frequency (TSI) | Fold increase (Campora) | Fold increase (Gioi) | Fold increase (Cardile) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2-methylbutyryl-CoA dehydrogenase deficiency | 79157 | inborn error of metabolism; neurological disease | ACADSB | rs58639322 | | | / | 0.005 | | | / |
| Autosomal recessive isolated neurosensory deafness type DFNB | 90636 | otorhinolaryngologic disease | MYO15A | rs121908970 | / | | 0.009 | / | / | / | / |
| Behcet’s syndrome | 117 | neurological disease; skin disease; renal disease; eye disease; systemic and rheumatological disease; circulatory system disease | ADA2 | rs146597836 | | / | | 0.005 | | / | |
| Butyrylcholinesterase | 132 | inborn error of metabolism; neurological disease | BCHE | rs28933390 | | 0.005 | | 0.019 | | 0.3 | 2.3 |
| Carnitine palmitoyltransferase II deficiency | 157 | inborn error of metabolism; neurological disease | CPT2 | rs74315294 | 0.005 | / | | 0.005 | 1.2 | / | |
| Cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) | 136 | neurological disease; eye disease | NOTCH3 | rs201680145 | / | / | | / | / | / | / |
| Corneal dystrophy Fuchs endothelial | 98974 | eye diseases | ZEB1 | rs118020901 | | / | / | / | / | / | / |
| Cowden syndrome | 201 | gastroenterological disease; skin disease; neoplastic disease; developmental anomalies during embryogenesis | SEC23B | rs36023150 | / | | 0.009 | 0.009 | / | | 0.9 |
| Delta-beta thalassemia | 231237 | hematological disease | HBD | rs35152987 | | / | 0.009 | / | / | / | / |
| Emery-Dreifuss muscular dystrophy | 261 | cardiac disease; neurological disease | SYNE1 | rs119103248 | 0.011 | / | | 0.005 | 2.3 | / | |
| Hereditary chronic pancreatitis | 676 | gastroenterological disease; endocrine disease | CFTR | rs1800111 | | 0.011 | | / | / | / | / |
| Keratoconus | 156071 | eye disease | ZNF469 | rs281865162 | / | | 0.009 | 0.005 | / | | 1.8 |
| Leber congenital amaurosis 4 | 65 | eye disease; ciliopathy | AIPL1 | rs62637014 | 0.011 | / | | / | / | / | / |
| Leber congenital amaurosis 6 | 65 | eye disease; ciliopathy | RPGRIP1 | rs17103671 | / | 0.011 | | 0.005 | / | 2.3 | |
| Microphthalmia syndromic 9 | 2470 | eye disease; respiratory disease; surgical thoracic and abdominal disease; developmental anomalies during embryogenesis | STRA6 | rs118203962 | | / | / | 0.005 | | / | / |
| Odontoonychodermal dysplasia | 99798 | odontological disease | WNT10A | rs121908120 | 0.011 | | | 0.005 | 2.3 | | |
| Primary ciliary dyskinesia | 244 | respiratory disease; infertility disorder; ciliopathy | RSPH1 | rs138320978 | 0.005 | | 0.017 | / | / | / | / |
| Pseudoxanthoma elasticum | 758 | eye disease; skin disease; renal disease; neurological disease; cardiac disease; circulatory system disease; developmental anomalies during embryogenesis | ABCC6 | rs72653706 | | 0.016 | | / | / | / | / |
| Rare hereditary thrombophilia | 217454 | hematological disease; systemic and rheumatological disease; bone disease | F5 | rs6025 | | / | | 0.005 | | / | |
| Tyrosinemia type I | 882 | inborn errors of metabolism; neurological disease; hepatic disease; renal disease; neoplastic disease | FAH | rs11555096 | 0.016 | 0.005 | | 0.005 | 3.5 | 1.1 | |
Fold increases ≥5 and allele frequencies ≥0.0223 are reported in bold.
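A minimal sketch of how a fold increase and the bold thresholds above could be computed, assuming the fold increase is the ratio of the isolate allele frequency to the TSI allele frequency and that "/" marks a variant not observed in that population. The function and constant names are hypothetical. Because the table reports rounded frequencies, ratios recomputed from it can differ slightly from the published fold increases.

```python
FOLD_BOLD_THRESHOLD = 5      # fold increases >= 5 are reported in bold
AF_BOLD_THRESHOLD = 0.0223   # allele frequencies >= 0.0223 are reported in bold

def fold_increase(isolate_af, tsi_af):
    """Ratio of isolate to TSI allele frequency (assumed definition).
    Returns None when either frequency is missing ('/' in the table)
    or when the TSI frequency is zero."""
    if isolate_af is None or tsi_af is None or tsi_af == 0:
        return None
    return isolate_af / tsi_af

def is_bold(value, threshold):
    """True when a value is present and meets the bold threshold."""
    return value is not None and value >= threshold

# Rounded frequencies 0.009 and 0.005 (TSI) from the ZNF469 row.
ratio = fold_increase(0.009, 0.005)
print(round(ratio, 1))                       # 1.8
print(is_bold(ratio, FOLD_BOLD_THRESHOLD))   # False

# A variant absent from TSI has no computable fold increase.
print(fold_increase(0.009, None))            # None
```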
Figure 4. PCA analysis. Principal components analysis of Campora, Gioi and Cardile, combined with the Tuscan (TSI) population from the 1000 Genomes Phase 3 v5 reference panel. The analysis was performed using (a) common (MAF > 5%) and (b) rare and low frequency (MAF ≤ 5%) variants in common between the Cilento isolates and TSI. We compare the first and second principal components (PC1 and PC2, respectively).
Figure 5. Runs Of Homozygosity (ROH) analysis. (a) Mean and (b) total length (Mb) of ROH in Campora, Gioi, Cardile and TSI populations. Only ROH with length >1 Mb are shown.
Figure 6. Improvement of imputation. Comparison of IMPUTE2 imputation quality score ‘info’ between two imputation strategies: firstly using the 1000 Genomes Phase 3 v5 reference panel (‘1KG_Ph3’ – red) and secondly using a combination of the 1000 Genomes Phase 3 v5 reference panel and a local reference panel of phased exome data from Cilento (‘1KG_Ph3 + WES’ – blue). Imputation was performed on the entirety of chromosome 10. Mean ‘info’ scores for 50 MAF bins are presented and results are split between all imputed variants (left) and only imputed exonic variants (right).