Elena A Vidal, Tomás C Moyano, Bernabé I Bustos, Eduardo Pérez-Palma, Carol Moraga, Eleodoro Riveras, Alejandro Montecinos, Lorena Azócar, Daniela C Soto, Mabel Vidal, Alex Di Genova, Klaus Puschel, Peter Nürnberg, Stephan Buch, Jochen Hampe, Miguel L Allende, Verónica Cambiazo, Mauricio González, Christian Hodar, Martín Montecino, Claudia Muñoz-Espinoza, Ariel Orellana, Angélica Reyes-Jara, Dante Travisany, Paula Vizoso, Mauricio Moraga, Susana Eyheramendy, Alejandro Maass, Giancarlo V De Ferrari, Juan Francisco Miquel, Rodrigo A Gutiérrez.
Abstract
Whole human genome sequencing initiatives help us understand population history and the basis of genetic diseases. Current data mostly focus on Old World populations, and information on the genomic structure of Native Americans, especially those from the Southern Cone, is scant. Here we present annotation and variant discovery from high-quality complete genome sequences of a cohort of 11 Mapuche-Huilliche individuals (HUI) from Southern Chile. We found approximately 3.1 × 10^6 single nucleotide variants (SNVs) per individual and identified 403,383 (6.9%) novel SNV events. Analyses of large-scale genomic events detected 680 copy number variants (CNVs) and 4,514 structural variants (SVs), including 398 and 1,910 novel events, respectively. Global ancestry composition of HUI genomes revealed that the cohort represents a sample from a marginally admixed population from the Southern Cone, whose main genetic component derives from Native American ancestors. Additionally, we found that HUI genomes contain variants in genes associated with 5 of the 6 leading causes of noncommunicable diseases in Chile, which may have an impact on the risk of prevalent diseases in Chilean and Amerindian populations. Our data represent a useful resource that can contribute to population-based studies and to the design of early diagnostics or prevention tools for Native and admixed Latin American populations.
Year: 2019 PMID: 30765821 PMCID: PMC6376018 DOI: 10.1038/s41598-019-39391-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Details of HUI individuals selected for this study and genome sequencing statistics.
| Assembly ID | Sex | Mapuche surnames | Age at screening | Called genome fraction | Genome coverage 30X | Exome coverage 30X | Mapping yield (Gb) |
|---|---|---|---|---|---|---|---|
| GS000011194 | F | 4 | 22 | 0.97 | 0.87 | 0.98 | 180.69 |
| GS000011195 | F | 4 | 47 | 0.97 | 0.87 | 0.98 | 181.74 |
| GS000011196 | F | 4 | 42 | 0.97 | 0.85 | 0.98 | 180.88 |
| GS000011198 | F | 4 | 46 | 0.97 | 0.84 | 0.98 | 180.61 |
| GS000011200 | F | 4 | 18 | 0.97 | 0.84 | 0.98 | 177.51 |
| GS000011201 | F | 4 | 53 | 0.96 | 0.83 | 0.98 | 175.46 |
| GS000011215 | F | 4 | 43 | 0.97 | 0.88 | 0.98 | 180.44 |
| GS000012210* | M | 4 | 49 | 0.96 | 0.86 | 0.99 | 266.52 |
| GS000012242* | F | 4 | 27 | 0.96 | 0.92 | 0.99 | 357.46 |
| GS000020403 | F | 4 | 39 | 0.97 | 0.82 | 0.98 | 151.00 |
| GS000020711 | F | 3 | 60 | 0.97 | 0.84 | 0.98 | 177.72 |
Called genome fraction: Fraction of the reference genome with full (diploid) calls in the sequenced sample following assembly; Genome coverage 30X: Fraction of the reference genome bases where coverage is greater than or equal to 30X; Exome coverage 30X: Fraction of the reference exome bases where coverage is greater than or equal to 30X; Mapping yield: Total base-pairs of sequence reads mapped to the reference genome. Samples marked with an asterisk were sequenced twice, thus they have greater mapping yields and average coverage than the other samples.
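The coverage definitions above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the per-base depth list, and the approximate 3.1 Gb reference genome size are assumptions introduced here for illustration only.

```python
def fraction_covered(depths, min_depth=30):
    """Fraction of reference bases whose read depth is >= min_depth.

    `depths` is a hypothetical per-base depth list (one integer per base).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def mean_coverage(mapping_yield_gb, genome_size_gb):
    """Average depth implied by a mapping yield; both arguments in Gb."""
    return mapping_yield_gb / genome_size_gb


# Toy example: 3 of 5 positions reach 30X.
print(fraction_covered([10, 30, 45, 29, 60]))

# Assumption: a reference genome of roughly 3.1 Gb (not stated in the table).
print(round(mean_coverage(180.69, 3.1), 1))
```

Under the assumed genome size, a mapping yield near 180 Gb corresponds to an average depth of roughly 58X, which is consistent with the samples sequenced twice showing proportionally higher yields.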
Figure 1. Genetic and structural variants in Mapuche-Huilliche genomes. Circos plot of the spatial distribution of SNV densities (i), deletions and insertions (ii), structural variant (SV) losses and gains (iii), copy number variant (CNV) losses and gains (iv), inversions (v) and translocations (vi). Light or dark colors in different tracks indicate known or novel variants, respectively. Tandem (red lines) and distal duplications (blue arrows) are shown within the inner circle of the plot. Translocation events are shown as green arrows.
Figure 2. Ancestry analysis of HUI and Chilean Latino individuals. (A) ADMIXTURE plots for K = 5 (Continental model) and K = 10 (minimum error model). All 3,706 samples included are depicted as thin vertical bars colored by their corresponding ancestry percentage. HUI genomes are highlighted at the left with thicker bars, followed by Chilean Latino genotyped individuals and samples included in 1kGP-phase 3, which are clustered in 5 super-populations (AMR, EUR, EAS, SAS and AFR). For K = 5, the colors were defined as follows: red for “Amerindian”, yellow for “European”, blue for “East Asian”, green for “South Asian” and purple for “African”. For K = 10, light colors are used to show subcomponents within super-populations EUR, EAS, SAS and AFR. Grey is used to represent the AMR component common to PEL, MXL, CLM and PUR populations but almost absent in HUI. Bottom thick bars define key colors used in the PCA. (B) Principal Component (PC) analysis including the same set of samples (colored dots) and markers. The color legend and the number of samples belonging to each super-population defined in (A) are provided in the legend, inside brackets. Left panel: PC1 vs. PC2; right panel: PC3 vs. PC4. The percentage of variance explained by each component is given in parentheses on the corresponding axis.
Figure 3. Analysis of the genetic distance (Fst) between the HUI cohort and 1kGP-phase 3 populations. (A) World map showing all 26 populations from 1kGP-phase 3 coming from the 5 super-populations (AFR, SAS, EAS, EUR and AMR) and their Weir and Cockerham’s Fst statistic (weighted Fst), colored from yellow to red according to their genetic distance obtained from the comparison with the HUI sequenced individuals. This figure was created in Adobe Illustrator® CS5 (https://www.adobe.com/) based on a figure made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication (Blank map of the world Equirectangular, https://en.wikipedia.org/wiki/File:BlankMap-World6-Equirectangular.svg). (B) Violin plots comparing SNV density between HUI and the 26 populations from 1kGP-phase 3. Fst distributions are sorted by decreasing genetic distance from HUI (top to bottom). Vertical bars on each population plot indicate the 95th percentile cutoff. SuperPop = Super-populations from 1kGP-phase 3: AFR = Africans, AMR = Admixed Americans, ASN = Asians, EUR = Europeans.
Disease enrichment analyses of genes with potential deleterious SNVs.
| ID | Name | Hypergeometric enrichment (C, O, E, R) | FDR |
|---|---|---|---|
| umls:C0022521 | Kartagener Syndrome | 28, 11, 0.89, 12.41 | 3.26E-07 |
| umls:C0031117 | Peripheral Neuropathy | 293, 27, 9.27, 2.91 | 2.55E-04 |
| umls:C0041755 | Adverse reaction to drug | 54, 9, 1.71, 5.27 | 9.38E-03 |
| umls:C0339527 | Leber Congenital Amaurosis | 22, 6, 0.7, 8.62 | 9.38E-03 |
| umls:C0026850 | Muscular Dystrophy | 8, 4, 0.25, 15.8 | 9.38E-03 |
| umls:C0238198 | Gastrointestinal Stromal Tumors | 8, 4, 0.25, 15.8 | 9.38E-03 |
| umls:C0029422 | Osteochondrodysplasias | 15, 5, 0.47, 10.53 | 9.38E-03 |
| umls:C0007193 | Cardiomyopathy, Dilated | 47, 8, 1.49, 5.38 | 1.03E-02 |
| umls:C0020445 | Hypercholesterolemia, Familial | 16, 5, 0.51, 9.87 | 1.03E-02 |
| umls:C0027672 | Neoplastic Syndromes, Hereditary | 48, 8, 1.52, 5.27 | 1.06E-02 |
List of enriched diseases identified in the gene set with potentially deleterious SNVs in the HUI genomes. The hypergeometric enrichment column lists: C, the number of reference genes in the disease category; O, the number of genes observed in the category; E, the expected number in the category; and R, the ratio of enrichment (O/E). FDR: P-value adjusted by the Benjamini-Hochberg multiple-testing correction.
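The quantities in the enrichment table can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the tool used by the authors: the function names are hypothetical, and the background size and gene-set size needed to compute E and the P-value are not reported in the table, so they appear only as parameters.

```python
from math import comb


def expected_overlap(category_size, gene_set_size, background_size):
    """E: overlap expected by chance between the gene set and a category."""
    return category_size * gene_set_size / background_size


def enrichment_ratio(observed, expected):
    """R: observed overlap divided by expected overlap (O / E)."""
    return observed / expected


def hypergeom_upper_tail(observed, category_size, gene_set_size, background_size):
    """P(X >= observed) when drawing gene_set_size genes without replacement
    from a background containing category_size category genes."""
    total = comb(background_size, gene_set_size)
    upper = min(category_size, gene_set_size)
    hits = sum(
        comb(category_size, k)
        * comb(background_size - category_size, gene_set_size - k)
        for k in range(observed, upper + 1)
    )
    return hits / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Kartagener Syndrome row: O = 11, E = 0.89 (E as rounded in the table).
print(round(enrichment_ratio(11, 0.89), 2))
```

Recomputing R from the rounded E gives 12.36 rather than the tabulated 12.41; the small gap is what one would expect if the reported ratio was computed from an unrounded expected value.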