EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow.

Wagner C S Magalhães¹,², Nathalia M Araujo¹, Thiago P Leal¹, Gilderlanio S Araujo¹, Paula J S Viriato¹, Fernanda S Kehdy¹,³, Gustavo N Costa⁴, Mauricio L Barreto⁴,⁵, Bernardo L Horta⁶, Maria Fernanda Lima-Costa⁷, Alexandre C Pereira⁸, Eduardo Tarazona-Santos¹, Maíra R Rodrigues¹,⁹.

Abstract

EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel, the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNP data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high-confidence SNPs (info score ≥ 0.8). Thus, the major effect of the inclusion of the EPIGEN-5M data set in this new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources.

Year: 2018 PMID: 29903722 PMCID: PMC6028131 DOI: 10.1101/gr.225458.117

Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.438

The EPIGEN-Brazil Initiative (https://epigen.grude.ufmg.br/) is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present how we are addressing two challenges to global dissemination of precision medicine and to the development of the bioinformatics know-how to support it. These challenges are (1) the persistent and severe underrepresentation of non-European individuals in human genome diversity studies and well-designed genetic epidemiology studies (Alexander et al. 2009; Bustamante et al. 2011; Check Hayden 2016; Popejoy and Fullerton 2016); and (2) the lack of transparency and reproducibility in the entire scientific process, including bioinformatics protocols (Iqbal et al. 2016). The underrepresentation of globally diverse individuals in genomic studies is not simply due to their lack of enrollment in these studies. Much more compelling is the need for a more global distribution of research groups with a strong background in genomics and bioinformatics, leading and performing this kind of study. In this context, the overarching goal of the EPIGEN-Brazil Initiative is to study genomic diversity and its effects on complex phenotypes in Brazil, the most populous Latin American country (Borges et al. 2016; Lima-Costa et al. 2016; Marques et al. 2017). Brazil's more than 200 million inhabitants are the product of admixture that occurred during the last 500 years between Amerindians, Europeans, Africans, and their descendants. Interestingly, Brazil was the largest destination of the African diaspora, and we have recently shown that Brazilians carry in their genomes the diversity of African groups that have not yet been included in population genomics studies, such as Bantu Angola and Mozambique populations, two sources of the slave trade that originated in territories controlled by the Portuguese Crown (Kehdy et al. 2015). 
The EPIGEN-Brazil Initiative is studying 6487 Brazilians from the three largest population-based cohorts of the country (Fig. 1; Supplemental Table S1; Supplemental Material Sections 1, 2.1): (1) Salvador-SCAALA in northeast Brazil, with predominant African ancestry (18 years of follow-up) (Barreto et al. 2006); (2) the Bambuí Cohort Study of Aging in Minas Gerais in the southeast of the country (15 years of follow-up) (Lima-Costa et al. 2011); and (3) the 1982 Pelotas Birth-Cohort Study in southern Brazil (30 years of follow-up) (Victora and Barros 2006).

Figure 1.

Continental admixture of the EPIGEN-Brazil population-based cohorts. Ancestry was estimated using the ADMIXTURE software (Alexander et al. 2009), as in Kehdy et al. (2015). European, African, and Native American ancestry are, respectively: 42.8%, 50.8%, and 6.4% in Salvador; 78.5%, 14.8%, and 6.7% in Bambuí; and 76.1%, 15.9%, and 8% in Pelotas. Figure adapted from Kehdy et al. (2015).

The EPIGEN-Brazil Initiative is a strategic project funded by the Brazilian Ministry of Health, and it integrates research areas well established in the country, such as epidemiology, public health, and human genetics (Salzano and Freire-Maia 1967; Barreto 2004; Salzano 2018), with bioinformatics, which is a vigorous emerging area in Brazil. To address the need for more global research groups, one of the main goals of the EPIGEN-Brazil Initiative is to strengthen research capabilities in these research areas in Brazil, and we are training dozens of graduate students and postdoctoral researchers from Brazil and other Latin American countries. In Latin America, we are collaborating with the National Institute of Health from Peru to study the genomic diversity of the Peruvian population (Harris et al. 2017), which differs from the Brazilian population in having a predominant Native American ancestry.

The failing on diversity of human genomics and the EPIGEN-Brazil imputation panel

Imputation is the prediction of missing genotypes based on the pattern of linkage disequilibrium of a reference panel. For GWAS and fine-mapping studies, cosmopolitan public panels for imputation exist, such as the 1000 Genomes Project (1KGP) Phase 3 (Sudmant et al. 2015), based on whole-genome sequencing (WGS) data. In addition to the 1092 individuals from Phase 1, Phase 3 of the 1KGP panel has incorporated 1412 new individuals, including four new populations from Africa, one from admixed Latin America, two from East Asia, and five from South Asia, each with 61–113 individuals (Supplemental Table S3; Supplemental Material Section 2.2.2). Notwithstanding this improvement in the coverage of global genetic diversity, studies continue to show that imputation accuracy may be improved by using WGS or high-density SNP data from individuals with similar genetic background to the target population (Thornton and Bermejo 2014; Ahmad et al. 2017; Mitt et al. 2017). However, for studies performed in non-European populations, WGS or high-density array data are still rare. Next, we present a new imputation panel specific for admixed Brazilian and Latin American populations and show that the inclusion of high-density array data from the Brazilian population improves imputation quality with respect to the use of the 1KGP (Phase 3) panel alone.
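To make the idea of predicting a missing genotype from a reference panel concrete, the following toy sketch copies the allele carried by the reference haplotypes that best match the target at its typed sites. It is only an illustration: the function name `impute_site` and the tiny panel are hypothetical, and production tools such as IMPUTE2 use hidden Markov models over many haplotypes rather than this nearest-haplotype vote.

```python
from collections import Counter


def impute_site(reference_haplotypes, target, missing_index):
    """Toy imputation: majority vote among the closest reference haplotypes.

    reference_haplotypes: list of equal-length strings of '0'/'1' alleles.
    target: list of '0'/'1' alleles, with None at untyped sites.
    Returns (predicted_allele, fraction_of_closest_haplotypes_supporting_it).
    """
    # Sites where the target was actually genotyped.
    typed = [i for i, allele in enumerate(target)
             if allele is not None and i != missing_index]

    def mismatches(haplotype):
        return sum(haplotype[i] != target[i] for i in typed)

    # Keep only the reference haplotypes with the fewest mismatches.
    best = min(mismatches(h) for h in reference_haplotypes)
    closest = [h for h in reference_haplotypes if mismatches(h) == best]

    # The alleles they carry at the missing site vote on the prediction.
    votes = Counter(h[missing_index] for h in closest)
    allele, count = votes.most_common(1)[0]
    return allele, count / len(closest)


# Hypothetical five-haplotype panel and a target missing its third site.
panel = ["0010", "0011", "1100", "1101", "0010"]
allele, support = impute_site(panel, ["0", "0", None, "0"], missing_index=2)
print(allele, support)
```

The sketch also hints at why panel composition matters: if no reference haplotype resembles the target's ancestry, the closest matches are poor and the prediction is less reliable.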

Addressing lack of transparency and reproducibility of genomic studies

A second challenge faced by global dissemination of bioinformatics and the know-how to support precision medicine is the lack of transparency and reproducibility of the entire scientific process (Iqbal et al. 2016). This limits the worldwide flow of bioinformatics knowledge necessary to build and train research groups with a solid bioinformatics background. Although there are several claims for more transparency and reproducibility of all the scientific process in biomedical literature (Sandve et al. 2013; Kolker et al. 2014; Iqbal et al. 2016), advances from genomic initiatives to share bioinformatics protocols are still rare. A still valid and compelling claim and concept were formulated by Bourne (2010), proposing to move away from the classical scientific articles to a more interactive publication of Scientific Workflows. Bourne defined a Scientific Workflow as “part process and part container for content (or pointers to that content), that is significantly broader and more integrated than what is sent for publication today, namely, a manuscript and supplemental information in an essentially computationally unusable form.” Thus, a Scientific Workflow is a more complex concept than, and should not be confused with, a bioinformatics Workflow/Pipeline Management System such as Taverna (Wolstencroft et al. 2013) or Galaxy (Afgan et al. 2016), although the latter may be used to implement Scientific Workflows. Here, we present the EPIGEN-Brazil Scientific Workflow (http://www.ldgh.com.br/scientificworkflow), a tool for transparent and reproducible bioinformatics analyses, and exemplify it in the context of our EPIGEN-5M+1KGP imputation panel. Our Scientific Workflow includes four self-contained components—scientific publications, flowcharts, masterscripts, and documents—that represent different stages of the scientific process. The scientific publications include both the final research products and the scientific hypotheses. 
The flowcharts are conceptual visualizations of research tasks performed as part of scientific publications, and the masterscripts are the operational computational execution (programs) of tasks represented by the flowcharts. Documents comprise other information such as technical reports, workshop presentations, and intermediate results.

Results and discussion

Imputation experiments

We genotyped 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative (90, 88, and 87 individuals randomly selected from the Salvador, Bambuí, and Pelotas cohorts, respectively) (Fig. 1; Supplemental Table S2; Supplemental Material Section 2.2.1). We present a new imputation reference panel (hereafter, the EPIGEN-5M+1KGP panel), which is the fusion of the haplotypes derived from the EPIGEN-5M data set with the public 1KGP Phase 3 imputation panel (Supplemental Table S4; Supplemental Fig. S1; Supplemental Material Sections 2.3, 2.4, 2.5.1). Hereafter, the 1KGP Phase 3 panel will be simply called 1KGP. In the context of GWAS and fine-mapping studies in Brazilian and other Latin American populations with a predominant mix of European and African ancestries, we tested whether using the EPIGEN-5M+1KGP imputation panel improves imputation with respect to the 1KGP imputation panel alone. The EPIGEN-5M+1KGP and the 1KGP imputation panels have a similar number of variants and allele frequency spectra (Fig. 2A; Supplemental Fig. S2), although the EPIGEN-5M+1KGP has 14,970 more SNPs and 530 (∼10%) more haplotypes than the 1KGP imputation panel (5538 versus 5008 haplotypes, respectively) (Supplemental Table S4). More importantly, after phase inference (Supplemental Tables S5, S6; Supplemental Material Section 2.5.2), when we imputed a target SNP data set (the 6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) (Fig. 1; Kehdy et al. 2015) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total and 788,873 additional high-confidence SNPs (info score ≥0.8) than when using the 1KGP panel alone (Fig. 2B; Supplemental Tables S7, S8; Supplemental Material Section 2.5.3). Thus, the major effect of the inclusion of the EPIGEN-5M data set in a new imputation panel is not only to gain more SNPs but also to improve the quality of imputation. 
In particular, the EPIGEN-5M+1KGP panel improves imputation quality with respect to 1KGP across a wide range of allele frequencies (Fig. 2C; Supplemental Figs. S3–S6). Therefore, imputation quality (i.e., info score) improves with the inclusion of the EPIGEN-5M data set even though it derives from high-density array data, rather than from WGS (which would be optimal). Imputation quality improves whether we impute the entire EPIGEN-Brazil target data set or each of the cohorts separately. This suggests that the assembled EPIGEN-5M+1KGP imputation panel performs better than the 1KGP panel for a variety of study sizes, admixture levels, and post-Columbian demographic histories. Moreover, because high-density array data improve imputation quality, the 2.2 million SNP data set previously published by Kehdy et al. (2015) may also be used for imputation for GWAS performed in Latin American populations with lower-density arrays.
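The two summaries used in this comparison, the number of SNPs with info score ≥0.8 and the mean info score per MAF bin of width 0.01, can be computed from a per-SNP table in a few lines. The sketch below is not the authors' masterscript. It assumes a whitespace-delimited file with a header row containing at least the columns `exp_freq_a1` and `info`, as in IMPUTE2 `_info` output. The function name `summarize_info` and the four example rows are hypothetical.

```python
import io
from collections import defaultdict


def summarize_info(handle, threshold=0.8, bin_width=0.01):
    """Count high-confidence SNPs and average the info score per MAF bin."""
    header = handle.readline().split()
    n_bins = round(0.5 / bin_width)  # MAF ranges from 0 to 0.5
    total = 0
    high_confidence = 0
    info_sum = defaultdict(float)
    info_count = defaultdict(int)

    for line in handle:
        if not line.strip():
            continue
        row = dict(zip(header, line.split()))
        freq = float(row["exp_freq_a1"])
        info = float(row["info"])
        maf = min(freq, 1.0 - freq)
        # Small epsilon guards against floating-point error at bin edges;
        # MAF = 0.5 is placed in the last bin.
        bin_index = min(int(maf / bin_width + 1e-9), n_bins - 1)

        total += 1
        if info >= threshold:
            high_confidence += 1
        info_sum[bin_index] += info
        info_count[bin_index] += 1

    mean_by_bin = {b: info_sum[b] / info_count[b] for b in sorted(info_sum)}
    return total, high_confidence, mean_by_bin


# Hypothetical four-SNP table, for illustration only.
example = io.StringIO(
    "snp_id rs_id position exp_freq_a1 info certainty type\n"
    "--- rs1 100 0.005 0.95 0.99 0\n"
    "--- rs2 200 0.995 0.60 0.90 0\n"
    "--- rs3 300 0.250 0.85 0.97 0\n"
    "--- rs4 400 0.500 0.40 0.80 0\n"
)
total, high_confidence, mean_by_bin = summarize_info(example)
print(total, high_confidence, mean_by_bin)
```

Running the same function on the output of each reference panel and comparing the two results gives the kind of contrast summarized in Figure 2B and 2C.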

Figure 2.

Comparison between the 1000 Genomes Project (1KGP) and EPIGEN-5M+1KGP imputation reference panels for autosomal chromosomes. The EPIGEN-5M+1KGP panel is the fusion of the haplotypes derived from the EPIGEN-5M data set (the genotyping of 265 EPIGEN-Brazil individuals for 4.3 million SNPs) with the public 1KGP Phase 3 imputation panel. (A) Allele frequency spectrum of variants by their minor allele frequency (MAF) in each imputation reference panel. The number of SNPs is described in each category, and the percentages are calculated dividing the number of SNPs in each MAF class by the total number of SNPs of each imputation reference panel (top). (B) Distribution of the info score quality metric for imputation results. The dashed vertical line indicates the 0.8 threshold info score value, and the horizontal line indicates the highest number of SNPs with info score ≥0.8 achieved by a reference panel. (C) Imputation quality (mean info score) as a function of MAF for the target data set after imputation with each of the tested reference panels (MAF bin sizes of 0.01).

The case of the EPIGEN-5M+1KGP imputation panel exemplifies the applicability of the Scientific Workflow (Supplemental Material Section 3). All methodological steps to obtain the panel are delineated in Methods and are also visualized as a Scientific Workflow flowchart in http://www.ldgh.com.br/scientificworkflow/flowcharts.php (Fig. 3). The corresponding masterscripts that computationally operationalize the flowchart are available at http://www.ldgh.com.br/scientificworkflow/master_scripts.php (Supplemental Material Section 3; Supplemental Figs. S7, S8).

Figure 3.

Flowchart of the whole imputation process (see the EPIGEN-Brazil Scientific Workflow: http://www.ldgh.com.br/scientificworkflow/flowcharts.php). (A) Overview of the complete imputation process. (B,C) Two previous tasks may be required for imputation if it is necessary to create or merge reference panels. The Reference Panel Creation task (B, and orange color process in A) converts a data set of unphased genotypes into a reference panel, producing the EPIGEN-5M Reference Panel of haplotypes from the EPIGEN-5M data set. The Merge Reference Panels task (C, and pink color process in A) produces combinations of two different panels using IMPUTE2 software, generating the EPIGEN-5M+1KGP Reference Panel. The imputation process itself consists of three main tasks: pre-phasing, haplotype phase inference, and imputation. The pre-phasing task (D, and green color processes in A) performs strand alignment between target and reference panel using software SHAPEIT2, PLINK, and the scripting language AWK. The haplotype phase inference task (yellow color processes in A) of the target data set uses the methodology implemented in the software SHAPEIT2, generating .haps and .sample files (target data set aligned and phased with the Reference Panel). The latter files serve as input for the imputation task (red color processes in A) conducted with software IMPUTE2, following the “best practices” guidelines in the software documentation.

In conclusion, although high-coverage WGS data from populations underrepresented in genomic studies are the optimal source of haplotypes to be used for imputation in genome-wide/fine-mapping association studies, we show here that, in the absence of this kind of data, high-density array data from a few hundred individuals from the same populations, used together with the public 1KGP data set, are an alternative to improve imputation quality. Therefore, we expect that the EPIGEN-5M+1KGP imputation panel will allow for better GWAS and admixture mapping/fine-mapping studies in Latin American populations with ancestries that are similar to the Brazilian population studied by the EPIGEN-Brazil Initiative. We also use the EPIGEN-5M+1KGP imputation panel to exemplify our implementation of the concept of Scientific Workflow, sensu Bourne (2010), which has the goal of making publicly available as much of the scientific process as possible. Since the Scientific Workflow represents different steps of the scientific process, from project development to publication, and with different levels of abstraction and detail, it emerges as a concrete initiative that moves us toward more transparency and reproducibility in bioinformatics analyses.

Methods

Imputation overview

Target data set

The EPIGEN-2.5M data set comprises 2,235,109 SNPs for 6487 Brazilians from three population-based cohorts (1309, 1442, and 3736 individuals from Salvador, Bambuí, and Pelotas, respectively) (Supplemental Table S1, published in Kehdy et al. 2015). EPIGEN-Brazil genome-wide data genotyped for the Illumina Omni 2.5M array are available in the European Nucleotide Archive under EPIGEN Committee Controlled Access mode.

Reference panels

We used two reference panels: (1) the public 1000 Genomes Project Phase 3 haplotypes, version 20130502 (1KGP) (Sudmant et al. 2015); and (2) the EPIGEN-5M+1KGP reference panel, which merges the 1KGP panel with our unpublished EPIGEN-5M panel and bears 14,970 more SNPs than the public panel alone. The EPIGEN-5M data set was genotyped with the Illumina HumanOmni5-4v1 array. After quality control, the data set comprises 4,102,271 SNPs for 265 Brazilians from the three cohorts (90, 88, and 87 individuals from Salvador, Bambuí, and Pelotas, respectively) (Supplemental Table S2). We used SHAPEIT2 (Delaneau et al. 2013) to infer the chromosome phase of the EPIGEN-5M data set (Supplemental Tables S4–S8).

Pre-phasing between the target and reference panels

We used SHAPEIT2 (Delaneau et al. 2013) to check the strand consistency of SNPs in the target and reference panels against the human genome reference sequence (GRCh37/hg19), and PLINK (Purcell et al. 2007) to flip strands where inconsistencies were found. Because our data were genotyped with the highest-density array (Omni 5.0) and are not NGS-based, a new alignment to GRCh38 would not significantly affect the conclusions.
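The logic behind a strand check can be illustrated by comparing each SNP's allele pair in the target with the pair in the reference. The sketch below is a generic illustration, not the procedure implemented in SHAPEIT2 or PLINK. The function name `classify_strand` and the four category labels are hypothetical.

```python
# Watson-Crick complements used to express alleles on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_strand(target_alleles, reference_alleles):
    """Classify one biallelic SNP by comparing target and reference alleles.

    Returns one of:
      "ambiguous" - A/T or C/G SNP, identical on both strands, so the
                    strand cannot be resolved from the alleles alone
      "match"     - same alleles, same strand
      "flip"      - alleles agree only after complementing the target
      "mismatch"  - alleles disagree on either strand
    """
    target = frozenset(target_alleles)
    reference = frozenset(reference_alleles)
    flipped = frozenset(COMPLEMENT[a] for a in target)

    if target == flipped:
        return "ambiguous"
    if target == reference:
        return "match"
    if flipped == reference:
        return "flip"
    return "mismatch"


print(classify_strand(("A", "G"), ("T", "C")))
```

SNPs classified as "flip" are the ones whose strand would be flipped in the target data set before phasing. Ambiguous and mismatching SNPs need a separate decision, for example exclusion.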

Haplotype phase inference of the target data set

We phased the target EPIGEN-2.5M data set using (1) the 1KGP haplotypes as phasing references, for the imputation with the 1KGP reference panel; and (2) the EPIGEN-5M data set as phasing reference, for the imputation with the EPIGEN-5M+1KGP reference panel.

Imputation

We performed the imputation using IMPUTE2 v.2.3.2 (Howie et al. 2009) on chromosome chunks of 7 Mb, with additional 250 kb of buffer on both sides (these were used for imputation inference but omitted from the results). We used the effective size parameter (Ne) set to 20,000 and the IMPUTE2 info score as a metric of imputation quality (Supplemental Fig. S1).
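The chunking scheme described above can be sketched as a small generator that yields 7-Mb intervals and builds one IMPUTE2 command per interval. This is an illustrative Python sketch, not the authors' Perl masterscript. All file names are hypothetical placeholders, and the commands are only assembled as argument lists, not executed. The options shown (`-use_prephased_g`, `-known_haps_g`, `-m`, `-h`, `-l`, `-int`, `-Ne`, `-buffer`, `-o`) are standard IMPUTE2 command-line options, with `-buffer` given in kb.

```python
def chunk_intervals(chrom_length_bp, chunk_bp=7_000_000):
    """Yield consecutive (start, end) intervals covering a chromosome."""
    start = 1
    while start <= chrom_length_bp:
        end = min(start + chunk_bp - 1, chrom_length_bp)
        yield start, end
        start = end + 1


def impute2_command(chrom, start, end, ne=20000, buffer_kb=250):
    """Build an IMPUTE2 argument list for one chunk (file names hypothetical)."""
    return [
        "impute2",
        "-use_prephased_g",
        "-known_haps_g", f"target_chr{chrom}.haps",   # pre-phased target
        "-m", f"genetic_map_chr{chrom}.txt",          # recombination map
        "-h", f"panel_chr{chrom}.hap.gz",             # reference haplotypes
        "-l", f"panel_chr{chrom}.legend.gz",          # reference legend
        "-int", str(start), str(end),                 # chunk to impute
        "-Ne", str(ne),                               # effective population size
        "-buffer", str(buffer_kb),                    # flanking buffer in kb
        "-o", f"chr{chrom}.{start}_{end}.impute2",
    ]


# Example: a hypothetical 20-Mb chromosome split into three chunks.
for start, end in chunk_intervals(20_000_000):
    print(" ".join(impute2_command(22, start, end)))
```

Keeping the buffer as a parameter makes explicit that flanking markers inform the inference but are not written to the results, as stated above.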

Data access

The data generated in this study have been submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) under accession number PRJEB9080 in EPIGEN Committee Controlled Access mode. All imputation tasks were performed using our Perl masterscript available as Supplemental Material (Supplemental Scripts) and also at our Scientific Workflow website (http://www.ldgh.com.br/scientificworkflow/master_scripts.php). The EPIGEN-5M+1KGP imputation panel in haplotype format is freely available at http://www.ldgh.com.br/scientificworkflow/documents.html.

Brazilian EPIGEN Consortium

Isabela O. Alvim¹³, Victor Borda¹³,¹⁴, Mateus H. Gouveia¹³,¹⁵, Moara Machado¹³,¹⁶, Rennan G. Moreira¹³,¹⁷, Fernanda Rodrigues-Soares¹³, Hanaisa P. Sant Anna¹³, Meddly L. Santolalla¹³, Marilia O. Scliar¹³, Giordano B. Soares-Souza¹³, Roxana Zamudio¹³, Camila Zolini¹³,¹⁸

References

1. Cohort profile: the Bambui (Brazil) Cohort Study of Ageing.

Authors: Maria Fernanda Lima-Costa; Josélia O A Firmo; Elizabeth Uchoa
Journal: Int J Epidemiol Date: 2010-08-30 Impact factor: 7.196

2. Genomics for the world.

Authors: Carlos D Bustamante; Esteban González Burchard; Francisco M De la Vega
Journal: Nature Date: 2011-07-13 Impact factor: 49.962

3. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

4. Fast model-based estimation of ancestry in unrelated individuals.

Authors: David H Alexander; John Novembre; Kenneth Lange
Journal: Genome Res Date: 2009-07-31 Impact factor: 9.043

5. Improved whole-chromosome phasing for disease and population genetic studies.

Authors: Olivier Delaneau; Jean-Francois Zagury; Jonathan Marchini
Journal: Nat Methods Date: 2013-01 Impact factor: 28.547

6. Toward more transparent and reproducible omics studies through a common metadata checklist and data publications.

Authors: Eugene Kolker; Vural Özdemir; Lennart Martens; William Hancock; Gordon Anderson; Nathaniel Anderson; Sukru Aynacioglu; Ancha Baranova; Shawn R Campagna; Rui Chen; John Choiniere; Stephen P Dearth; Wu-Chun Feng; Lynnette Ferguson; Geoffrey Fox; Dmitrij Frishman; Robert Grossman; Allison Heath; Roger Higdon; Mara H Hutz; Imre Janko; Lihua Jiang; Sanjay Joshi; Alexander Kel; Joseph W Kemnitz; Isaac S Kohane; Natali Kolker; Doron Lancet; Elaine Lee; Weizhong Li; Andrey Lisitsa; Adrian Llerena; Courtney Macnealy-Koch; Jean-Claude Marshall; Paola Masuzzo; Amanda May; George Mias; Matthew Monroe; Elizabeth Montague; Sean Mooney; Alexey Nesvizhskii; Santosh Noronha; Gilbert Omenn; Harsha Rajasimha; Preveen Ramamoorthy; Jerry Sheehan; Larry Smarr; Charles V Smith; Todd Smith; Michael Snyder; Srikanth Rapole; Sanjeeva Srivastava; Larissa Stanberry; Elizabeth Stewart; Stefano Toppo; Peter Uetz; Kenneth Verheggen; Brynn H Voy; Louise Warnich; Steven W Wilhelm; Gregory Yandl
Journal: OMICS Date: 2014-01

7. Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations.

Authors: Fernanda S G Kehdy; Mateus H Gouveia; Moara Machado; Wagner C S Magalhães; Andrea R Horimoto; Bernardo L Horta; Rennan G Moreira; Thiago P Leal; Marilia O Scliar; Giordano B Soares-Souza; Fernanda Rodrigues-Soares; Gilderlanio S Araújo; Roxana Zamudio; Hanaisa P Sant Anna; Hadassa C Santos; Nubia E Duarte; Rosemeire L Fiaccone; Camila A Figueiredo; Thiago M Silva; Gustavo N O Costa; Sandra Beleza; Douglas E Berg; Lilia Cabrera; Guilherme Debortoli; Denise Duarte; Silvia Ghirotto; Robert H Gilman; Vanessa F Gonçalves; Andrea R Marrero; Yara C Muniz; Hansi Weissensteiner; Meredith Yeager; Laura C Rodrigues; Mauricio L Barreto; M Fernanda Lima-Costa; Alexandre C Pereira; Maíra R Rodrigues; Eduardo Tarazona-Santos
Journal: Proc Natl Acad Sci U S A Date: 2015-06-29 Impact factor: 11.205

8. Ten simple rules for reproducible computational research.

Authors: Geir Kjetil Sandve; Anton Nekrutenko; James Taylor; Eivind Hovig
Journal: PLoS Comput Biol Date: 2013-10-24 Impact factor: 4.475

9. Reproducible Research Practices and Transparency across the Biomedical Literature.

Authors: Shareen A Iqbal; Joshua D Wallach; Muin J Khoury; Sheri D Schully; John P A Ioannidis
Journal: PLoS Biol Date: 2016-01-04 Impact factor: 8.029

10. An integrated map of structural variation in 2,504 human genomes.

Authors: Peter H Sudmant; Tobias Rausch; Eugene J Gardner; Robert E Handsaker; Alexej Abyzov; John Huddleston; Yan Zhang; Kai Ye; Goo Jun; Markus Hsi-Yang Fritz; Miriam K Konkel; Ankit Malhotra; Adrian M Stütz; Xinghua Shi; Francesco Paolo Casale; Jieming Chen; Fereydoun Hormozdiari; Gargi Dayama; Ken Chen; Maika Malig; Mark J P Chaisson; Klaudia Walter; Sascha Meiers; Seva Kashin; Erik Garrison; Adam Auton; Hugo Y K Lam; Xinmeng Jasmine Mu; Can Alkan; Danny Antaki; Taejeong Bae; Eliza Cerveira; Peter Chines; Zechen Chong; Laura Clarke; Elif Dal; Li Ding; Sarah Emery; Xian Fan; Madhusudan Gujral; Fatma Kahveci; Jeffrey M Kidd; Yu Kong; Eric-Wubbo Lameijer; Shane McCarthy; Paul Flicek; Richard A Gibbs; Gabor Marth; Christopher E Mason; Androniki Menelaou; Donna M Muzny; Bradley J Nelson; Amina Noor; Nicholas F Parrish; Matthew Pendleton; Andrew Quitadamo; Benjamin Raeder; Eric E Schadt; Mallory Romanovitch; Andreas Schlattl; Robert Sebra; Andrey A Shabalin; Andreas Untergasser; Jerilyn A Walker; Min Wang; Fuli Yu; Chengsheng Zhang; Jing Zhang; Xiangqun Zheng-Bradley; Wanding Zhou; Thomas Zichner; Jonathan Sebat; Mark A Batzer; Steven A McCarroll; Ryan E Mills; Mark B Gerstein; Ali Bashir; Oliver Stegle; Scott E Devine; Charles Lee; Evan E Eichler; Jan O Korbel
Journal: Nature Date: 2015-10-01 Impact factor: 49.962
