Cristiane S Rocha, Rodrigo Secolin, Maíra R Rodrigues, Benilton S Carvalho, Iscia Lopes-Cendes.
Abstract
The development of precision medicine strategies requires prior knowledge of the genetic background of the target population. However, although large reference population databases include data from admixed Americans, these data cannot serve as a surrogate for the Brazilian population. This lack of transferability arises mainly from differences in ancestry proportions between Brazilian and other admixed American populations. To address this issue, a coalition of research centres created the Brazilian Initiative on Precision Medicine (BIPMed). In this study, we characterise two datasets obtained from 358 BIPMed individuals using two different platforms: whole-exome sequencing (WES) and a single nucleotide polymorphism (SNP) array. We estimated allele frequencies and variant pathogenicity values from both datasets and compared the BIPMed results with other public databases. We show that the BIPMed WES dataset contains variants absent from dbSNP, including 6480 variants with alternative allele frequencies (AAFs) >1%. Furthermore, after merging the BIPMed WES and SNP array data, we identified 809,589 variants (47.5%) not present in the 1000 Genomes dataset. Our results demonstrate that, by incorporating Brazilian individuals into public genomic databases, BIPMed not only provides valuable knowledge needed to implement precision medicine but may also enhance our understanding of human genome variability and of the relationship between genetic variation and disease predisposition.
Keywords: Medical genomics; Molecular medicine
Year: 2020 PMID: 33083011 PMCID: PMC7532430 DOI: 10.1038/s41525-020-00149-6
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
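The counts in the abstract rest on two simple steps: estimating each variant's alternative allele frequency (AAF) and checking whether the variant appears in a reference catalogue such as dbSNP or the 1000 Genomes Project. The sketch below is a minimal illustration, not BIPMed's pipeline. It assumes diploid genotypes coded as 0, 1 or 2 copies of the alternative allele and variants keyed by a hypothetical (chromosome, position, ref, alt) tuple; the helper names and toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternative allele).
Variant = Tuple[str, int, str, str]


def alt_allele_frequency(genotypes: Iterable[Optional[int]]) -> float:
    """AAF from diploid calls coded as 0, 1 or 2 copies of the alternative allele.

    Missing calls (None) are skipped, so the denominator is 2 x called individuals.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


def minor_allele_frequency(aaf: float) -> float:
    """MAF is the frequency of whichever allele is rarer in the sample."""
    return min(aaf, 1.0 - aaf)


def absent_from_reference(
    cohort: Dict[Variant, float], reference: Iterable[Variant], min_aaf: float = 0.01
) -> List[Variant]:
    """Cohort variants (mapped to their AAF) missing from the reference set,
    keeping only those with AAF strictly above min_aaf."""
    known = set(reference)
    return sorted(v for v, aaf in cohort.items() if v not in known and aaf > min_aaf)


# Toy data (hypothetical): each platform maps variant -> AAF in the cohort.
wes = {("1", 1000, "A", "G"): alt_allele_frequency([0, 1, 0, 0]),  # 1 / 8
       ("1", 2000, "C", "T"): 0.002}
array = {("2", 500, "G", "A"): 0.30}
merged = {**array, **wes}  # union of both platforms; the WES value is kept on overlap
reference = [("2", 500, "G", "A")]
novel = absent_from_reference(merged, reference)
print(novel, len(novel) / len(merged))
```

Passing `min_aaf=0.0` counts every polymorphic variant absent from the reference, which is closer to the unfiltered "not present in 1000 Genomes" count; the default of 0.01 mirrors the AAF >1% cut-off quoted for dbSNP.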
Fig. 1 Comparison between principal components (PCs) of WES and SNP array data.
a, b Scatterplots of the first two principal components identified using the WES and SNP array datasets (black) and 1000 Genomes populations, including Europeans (EUR = blue), sub-Saharan Africans (AFR = red), admixed Americans (AMR = green), East Asians (EAS = orange), and South Asians (SAS = purple). c, d Correlation plots assessing the relationship between the WES and SNP array results for PC1 (c) and PC2 (d). Each point represents one individual. Solid lines indicate the best fit to the data via local regression (LOESS), with the 95% confidence interval shown by the grey area.
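Figure 1 compares PCs computed independently on two platforms for the same individuals. The following is a minimal sketch under stated assumptions: a numpy genotype matrix (individuals × variants, 0/1/2 alternative-allele counts), each variant simply centred on its mean (many genetics tools, such as EIGENSOFT's smartpca, additionally scale each variant by its expected standard deviation), and cross-platform agreement summarised with an absolute Pearson correlation rather than the LOESS fit drawn in the figure. The function names and simulated data are hypothetical.

```python
import numpy as np


def top_pcs(genotypes, k=2):
    """First k principal-component scores for an individuals x variants matrix
    of alternative-allele counts (0, 1, 2)."""
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)  # centre every variant on its cohort mean
    # Thin SVD: centred = U S V^T; the PC scores are the columns of U S.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]


def pc_agreement(scores_a, scores_b):
    """Absolute Pearson correlation between matching PCs from two platforms.

    The sign of each PC is arbitrary, so |r| is compared rather than r.
    """
    return [
        abs(np.corrcoef(scores_a[:, i], scores_b[:, i])[0, 1])
        for i in range(scores_a.shape[1])
    ]


rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(30, 200))  # 30 simulated individuals
wes_scores = top_pcs(genotypes)
# Reordering the variants must not change the PCs, so agreement should be ~1.
array_scores = top_pcs(genotypes[:, rng.permutation(200)])
print(pc_agreement(wes_scores, array_scores))
```

With real data the two platforms cover different variants, so agreement below 1 is expected; the permutation here only checks that the computation is invariant to variant order.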
Fig. 2 Comparison of Euclidean distance estimates between the BIPMed datasets and continental populations from the 1000 Genomes Project (1KGP).
Estimates were based on minor allele frequencies (MAFs) from the BIPMed WES (a) and SNP array (b) data. Red and blue in the legend indicate positive and negative correlations between two populations, respectively; values closer to 1.0 indicate a stronger correlation between the two populations. AFR, sub-Saharan Africans; AMR, admixed Americans; EAS, East Asians; EUR, Europeans; SAS, South Asians.
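Figure 2 rests on comparing populations through their MAF vectors over the same variants. The caption does not state how the distances were scaled onto the 0 to 1 colour range, so this minimal sketch assumes a plain, unscaled Euclidean distance between two equal-length MAF lists; the function names and the three-variant toy vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def maf_distance(maf_a: List[float], maf_b: List[float]) -> float:
    """Euclidean distance between two populations' MAF vectors.

    Both lists must describe the same variants in the same order.
    """
    if len(maf_a) != len(maf_b):
        raise ValueError("MAF vectors must cover the same variants")
    return math.dist(maf_a, maf_b)


def pairwise_distances(pops: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """One distance per unordered pair of populations, keyed alphabetically."""
    return {(a, b): maf_distance(pops[a], pops[b]) for a, b in combinations(sorted(pops), 2)}


# Hypothetical MAFs at three shared variants.
pops = {"BIPMed": [0.10, 0.20, 0.30], "EUR": [0.10, 0.50, 0.70], "AFR": [0.40, 0.20, 0.30]}
for pair, dist in pairwise_distances(pops).items():
    print(pair, round(dist, 3))
```

A smaller distance means more similar allele-frequency profiles; any rescaling or conversion to a correlation-style score would be an additional step not shown here.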
Distribution of minor allele frequencies (MAFs) among variants.
| MAF distribution | Common in BIPMed | Rare in BIPMed | Total overlap | P value |
|---|---|---|---|---|
| Common in EUR | 595,874 (72.9%) | 7493 (0.9%) | 817,240 (100%) | 2.2e−16 |
| Rare in EUR | 75,584 (9.2%) | 138,289 (16.9%) | | |
| Common in AFR | 604,349 (74.0%) | 65,565 (8.0%) | 817,240 (100%) | 2.2e−16 |
| Rare in AFR | 67,109 (8.2%) | 80,217 (9.8%) | | |
| Common in AMR | 637,098 (78.0%) | 12,132 (1.5%) | 817,240 (100%) | 2.2e−16 |
| Rare in AMR | 34,360 (4.2%) | 133,650 (16.3%) | | |
The 817,240 variants shared between BIPMed and the 1000 Genomes Project are classified as common or rare in BIPMed and in each of three 1000 Genomes populations: Europeans (EUR), Africans (AFR), and admixed Americans (AMR). We defined common variants as those with a MAF greater than or equal to 0.01, and rare variants as all others. We counted all variants obtained from high-density SNP genotyping and whole-genome sequencing, and removed high-density SNP genotypes with MAF values below 0.01 to avoid bias from genotyping errors. P values were calculated using Fisher's exact test.
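The table note describes a two-step procedure: label each shared variant as common (MAF ≥ 0.01) or rare in each population, then test the resulting 2 × 2 table with Fisher's exact test. The sketch below is a pure-Python illustration under those assumptions; the function names are hypothetical, and in practice a library routine such as `scipy.stats.fisher_exact` would normally be used. The two-sided p-value sums, in log space, the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed one. For tables as large as these, the exact p-value underflows to 0.0 in double precision; R, for instance, prints such results as `p-value < 2.2e-16`.

```python
import math
from typing import Dict, Hashable, List

COMMON_MAF = 0.01  # table note: common if MAF >= 0.01, rare otherwise


def is_common(maf: float) -> bool:
    return maf >= COMMON_MAF


def contingency_table(ref: Dict[Hashable, float], cohort: Dict[Hashable, float]) -> List[List[int]]:
    """2 x 2 counts over variants present in both MAF dictionaries.

    Rows: common / rare in the reference population (e.g. EUR);
    columns: common / rare in the cohort (e.g. BIPMed).
    """
    table = [[0, 0], [0, 0]]
    for variant in ref.keys() & cohort.keys():
        row = 0 if is_common(ref[variant]) else 1
        col = 0 if is_common(cohort[variant]) else 1
        table[row][col] += 1
    return table


def _log_comb(n: int, k: int) -> float:
    # Log of the binomial coefficient C(n, k), via log-gamma to avoid huge integers.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def _log_pmf(x: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric log-probability that the top-left cell equals x, margins fixed.
    return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)


def fisher_exact_two_sided(table: List[List[int]]) -> float:
    """Sum P(x) over all tables with the observed margins that are no more
    likely than the observed table (relative tolerance 1e-7)."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    observed = _log_pmf(a, row1, col1, n)
    logs = []
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = _log_pmf(x, row1, col1, n)
        if lp <= observed + 1e-7:
            logs.append(lp)
    top = max(logs)  # log-sum-exp keeps the summation numerically stable
    return min(1.0, math.exp(top) * sum(math.exp(lp - top) for lp in logs))


# Counts from the EUR rows of the table above.
p_eur = fisher_exact_two_sided([[595874, 7493], [75584, 138289]])
print(p_eur)
```

On a small table such as `[[3, 1], [1, 3]]` the same function returns 34/70, which is a convenient sanity check against any reference implementation.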
Distribution of birthplaces of BIPMed reference individuals across the five Brazilian geographic regions.
| Brazilian region | Number of individuals |
|---|---|
| North | 1 (0.28%) |
| Northeast | 20 (5.59%) |
| Centre West | 3 (0.84%) |
| Southeast | 177 (49.44%) |
| South | 10 (2.79%) |
| Unknown | 147 (41.06%) |
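Each percentage in the birthplace table is that region's count divided by the 358 individuals in the cohort. A minimal sketch of the arithmetic, using the counts from the table (the function name is hypothetical):

```python
from typing import Dict


def region_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of individuals per region, rounded to two decimals."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 2) for region, n in counts.items()}


counts = {"North": 1, "Northeast": 20, "Centre West": 3,
          "Southeast": 177, "South": 10, "Unknown": 147}
shares = region_shares(counts)
print(sum(counts.values()), shares)
```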