Visualizing the geography of genetic variants.

Joseph H Marcus, John Novembre.

Abstract

Summary: One of the key characteristics of any genetic variant is its geographic distribution. The geographic distribution can shed light on where an allele first arose, what populations it has spread to, and in turn on how migration, genetic drift, and natural selection have acted. The geographic distribution of a genetic variant can also be of great utility for medical/clinical geneticists, and, collectively, many genetic variants can reveal population structure. Here we develop an interactive visualization tool for rapidly displaying the geographic distribution of genetic variants. Through a REST API and dynamic front-end, the Geography of Genetic Variants (GGV) browser ( http://popgen.uchicago.edu/ggv/ ) provides maps of allele frequencies in populations distributed across the globe.
Availability and Implementation: GGV is implemented as a website ( http://popgen.uchicago.edu/ggv/ ) that employs an API to access frequency data ( http://popgen.uchicago.edu/freq_api/ ). Python and JavaScript source code for the website and the API is available at: http://github.com/NovembreLab/ggv/ and http://github.com/NovembreLab/ggv-api/ .
Contact: jnovembre@uchicago.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2017        PMID: 27742697      PMCID: PMC5408806          DOI: 10.1093/bioinformatics/btw643

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Genetics researchers often face the problem that they have identified one or many genetic variants of interest using an approach such as a genome-wide association study and then would like to know the geographic distribution of those variants. For example, the researcher may hope to address: (i) implications for genomic medicine (e.g. Is a risk allele geographically localized to a certain patient population? Which population should be studied to observe variant carriers? (Rosenberg et al., 2010)); or (ii) the evolutionary history of the variant in question (e.g. Does the variant correlate with a known environmental factor in a manner suggestive of a geographically localized selection pressure? (Novembre and Di Rienzo, 2009)). A simple geographic map of the distribution of a genetic variant can be highly informative for these questions.
Contemporary population genetics researchers also face the challenge of large, high-dimensional datasets. For example, it is not uncommon for a researcher in human genetics to have a dataset comprising thousands of individuals genotyped at hundreds of thousands or even millions of single nucleotide variants (SNVs). One common approach to visualizing such high-dimensional data is to compress the SNV dimensions down to a small number of latent factors, using a method such as principal components analysis (Patterson et al., 2006) or a model-based clustering method such as STRUCTURE (Pritchard et al., 2000). While these methods are extremely valuable, researchers too often use them without inspecting the underlying variant data in more detail. A natural way to gain more insight into the overall structure of a population genetic dataset is to visually inspect the geographic patterns that arise in allele frequency maps. Unfortunately, generating geographic allele frequency maps is time-consuming for the average researcher, as it requires a combination of data-wrangling methods (Kandel) and map-making techniques that are unfamiliar to most.
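The data-wrangling step described above usually starts from individual genotypes and ends with one allele frequency per population. Below is a minimal Python sketch of that aggregation, assuming diploid genotypes coded as alternate-allele counts (0, 1 or 2) with `None` for missing calls. The function name and the toy genotypes are hypothetical and are not taken from GGV; the labels YRI, CEU and CHB are 1000 Genomes population codes used here only as illustrative keys.

```python
from collections import defaultdict


def allele_frequencies_by_population(genotypes, populations):
    """Aggregate per-individual genotypes into per-population allele frequencies.

    genotypes: alternate-allele count per individual (0, 1 or 2; None = missing)
    populations: population label for each individual, in the same order
    Returns {population: (alt_allele_frequency, n_genotyped_individuals)}.
    """
    alt_counts = defaultdict(int)
    n_called = defaultdict(int)
    for g, pop in zip(genotypes, populations):
        if g is None:  # skip missing genotype calls
            continue
        alt_counts[pop] += g
        n_called[pop] += 1
    # Each diploid individual contributes two allele copies.
    return {pop: (alt_counts[pop] / (2 * n), n) for pop, n in n_called.items()}


toy_genotypes = [0, 1, 2, 0, 0, None, 1, 1]
toy_populations = ["YRI", "YRI", "YRI", "CEU", "CEU", "CEU", "CHB", "CHB"]
print(allele_frequencies_by_population(toy_genotypes, toy_populations))
# {'YRI': (0.5, 3), 'CEU': (0.0, 2), 'CHB': (0.5, 2)}
```

Returning the number of genotyped individuals alongside each frequency keeps the sample size available for judging how reliable each estimate is.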
Our aim here is to produce a tailored system for rapidly constructing informative geographic maps of allele frequency variation. Our work is inspired by past tools such as the ALFRED database (Rajeevan) and the maps available on the HGDP Selection Browser (Pickrell et al., 2009), whose allele frequency output and plots have been used in research articles (e.g. Coop et al., 2009; Pickrell et al., 2009) and books (e.g. Dudley and Karczewski, 2013), and have been made available on the UCSC Genome Browser (under the HGDP Allele Freq track of the browser; Kent et al., 2002). Taking advantage of recent advances in web-based visualization tools (Bostock et al., 2011), we aim to address the significant visualization challenges that are inherent in producing geographic allele frequency maps for large population genomic datasets, including dynamic interaction, display of rare genetic variation, and representation of uncertainty in estimated allele frequencies due to variable sample sizes.
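To make the sample-size problem concrete: an allele frequency estimated from n diploid individuals has a binomial standard error of sqrt(p(1 - p) / (2n)), assuming the 2n sampled chromosomes are independent. The hypothetical helper below (not part of GGV) evaluates it for a variant at frequency 0.1.

```python
import math


def allele_frequency_se(p_hat, n_individuals):
    """Binomial standard error of an allele frequency estimated from
    n_individuals diploid samples, i.e. 2 * n_individuals chromosomes
    assumed to be sampled independently."""
    return math.sqrt(p_hat * (1 - p_hat) / (2 * n_individuals))


for n in (5, 30, 100):
    print(n, round(allele_frequency_se(0.1, n), 3))
# 5 0.095
# 30 0.039
# 100 0.021
```

With only five individuals the standard error is nearly as large as the frequency itself, which is why estimates from small samples need to be displayed differently from well-sampled ones.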

2 Approach

The Geography of Genetic Variants browser (GGV) uses the scalable vector graphics and mapping utilities of D3.js (Bostock et al., 2011). The front-end provides map legends and configuration boxes that let users query different datasets or choose visualization options. To allow easy access to commonly used public genomic datasets, such as the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2015) or the Human Genome Diversity Project (Li), we have developed a REST API (Grinberg, 2014). The API allows retrieval of SNVs by position, by rsID (Sherry et al., 2001) or at random. After a query, GGV displays the allele frequencies as a collection of pie charts, each representing the frequency of the globally minor allele in a single population (Fig. 1).
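The pie-chart encoding can be illustrated with a minimal sketch, assuming per-population allele counts for a biallelic SNV are already in hand. The function name and toy counts are hypothetical. The globally minor allele is determined by pooling counts over all populations, so it can still be the majority allele within an individual population.

```python
def global_minor_allele_frequencies(counts):
    """For a biallelic SNV with per-population allele counts
    {population: (count_allele_0, count_allele_1)}, return the index of the
    globally minor allele (0 or 1, from counts pooled over all populations)
    and that allele's frequency in each population."""
    pooled_0 = sum(c0 for c0, _ in counts.values())
    pooled_1 = sum(c1 for _, c1 in counts.values())
    # Ties go to allele 0; this is an arbitrary convention.
    minor = 0 if pooled_0 <= pooled_1 else 1
    freqs = {
        pop: c[minor] / (c[0] + c[1])
        for pop, c in counts.items()
        if c[0] + c[1] > 0  # skip populations with no observed alleles
    }
    return minor, freqs


toy_counts = {"YRI": (10, 90), "CEU": (60, 40), "CHB": (5, 95)}
minor, freqs = global_minor_allele_frequencies(toy_counts)
print(minor, freqs)  # 0 {'YRI': 0.1, 'CEU': 0.6, 'CHB': 0.05}
```

Each pie chart would then draw `freqs[pop]` as the minor-allele slice and `1 - freqs[pop]` as the remainder. In the toy data, CEU illustrates the case where the globally minor allele is locally the majority.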
Fig. 1.

Example screenshot from the Geography of Genetic Variants browser using The 1000 Genomes Project Consortium (2015) data. Each pie chart represents a population with the blue slice of the pie displaying the frequency of the global minor allele

We implement a variety of features:
(1) Rare variants. Many alleles are rare (e.g. The 1000 Genomes Project Consortium, 2015), and they are hard to display on a proportional scale that ranges from zero to one. To address this challenge, we re-scale frequencies so that small frequencies become visible. Specifically, the frequency scale in use is indicated in a legend below the map and is represented by varying color in the pie charts (Fig. 1, Supplementary Fig. S1, Supplementary Table S1).
(2) Uncertainty in frequency data. We vary the transparency of each population's pie chart: estimated frequencies with higher levels of sampling error (e.g. those from samples with n < 30) are made more transparent, and hence less visible, on the map (Fig. 1, Supplementary Fig. S2).
(3) Overlapping populations. We use a force-directed layout of the populations so that no two points overlap, while each point is still pulled towards its true origin (Fig. 1, Supplementary Fig. S3).
In addition, hovering the mouse cursor over any population displays its label and precise frequency information.
By allowing rapid generation of allele frequency maps, we hope to help practicing geneticists interpret variant function and history. Also, students of human diversity often find it difficult to conceptualize the classic statements that most human genetic variation is shared among populations (Lewontin, 1972) and that the fixation index FST is relatively low globally (10–15%; The 1000 Genomes Project Consortium, 2015).
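The three display features can be sketched in Python as follows. The text does not give the exact frequency scale (see Supplementary Table S1) or the transparency rule, so the scale ladder (1, 0.1, 0.01, 0.001), the linear opacity ramp that saturates at 30 individuals, and all function names are assumptions for illustration. The layout function is a plain pairwise-repulsion loop with a weak pull toward each origin, a simple stand-in for a force-directed layout rather than D3's force algorithm.

```python
import math

# Hypothetical scale ladder, in descending order: each population's frequency
# is drawn on the smallest maximum that still contains it.
SCALE_MAXIMA = (1.0, 0.1, 0.01, 0.001)


def rescale(freq):
    """Return (scale_max, slice_fraction) for one population's frequency."""
    scale = SCALE_MAXIMA[0]
    for m in SCALE_MAXIMA:
        if freq <= m:
            scale = m
    return scale, freq / scale


def opacity(n_individuals, full_at=30, floor=0.2):
    """Hypothetical rule: opacity rises linearly with sample size, reaching 1.0
    at `full_at` individuals, and never drops below `floor`."""
    return max(floor, min(1.0, n_individuals / full_at))


def relax_layout(origins, radius, iterations=200, pull=0.05):
    """Separate overlapping circles of equal radius while weakly pulling each
    one back toward its true (x, y) origin."""
    pos = [list(p) for p in origins]
    min_dist = 2 * radius
    for _ in range(iterations):
        # Weak spring toward each circle's true position.
        for p, (ox, oy) in zip(pos, origins):
            p[0] += pull * (ox - p[0])
            p[1] += pull * (oy - p[1])
        # Push each overlapping pair apart symmetrically along the line joining them.
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy)
                if d < min_dist:
                    shift = (min_dist - d) / 2
                    # Coincident points get an arbitrary direction.
                    ux, uy = (dx / d, dy / d) if d > 0 else (1.0, 0.0)
                    pos[i][0] -= ux * shift
                    pos[i][1] -= uy * shift
                    pos[j][0] += ux * shift
                    pos[j][1] += uy * shift
    return [tuple(p) for p in pos]


print(rescale(0.6), rescale(0.005))
print(opacity(10), opacity(100))
print(relax_layout([(0, 0), (0, 0), (10, 10)], radius=1.0))
```

In this sketch the returned `scale_max` is what a legend color would encode, so a 0.5 slice drawn on the 0.01 scale reads as a frequency of 0.005.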
We hope that the ability to query random variants from major human population genetic samples will allow students to appreciate the structure of human genetic diversity in an approachable and intuitive form.
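To connect single-variant maps to the genome-wide FST figure, here is a minimal sketch of a simple Nei-style estimator, FST = (H_T - H_S) / H_T, with populations weighted equally and sampling error ignored. SNVs are combined as a ratio of averages. This is not necessarily the estimator behind the 10–15% reported by The 1000 Genomes Project Consortium (2015), and the function names and toy frequencies are hypothetical.

```python
def _heterozygosities(freqs):
    """Expected heterozygosity of the pooled population (H_T) and the mean
    within-population heterozygosity (H_S), populations weighted equally."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return h_t, h_s


def fst_single_snv(freqs):
    """FST = (H_T - H_S) / H_T for one biallelic SNV (0 if monomorphic)."""
    h_t, h_s = _heterozygosities(freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def fst_ratio_of_averages(per_snv_freqs):
    """Combine SNVs as sum(H_T - H_S) / sum(H_T), a common choice that avoids
    averaging unstable per-SNV ratios from low-diversity SNVs."""
    num = den = 0.0
    for freqs in per_snv_freqs:
        h_t, h_s = _heterozygosities(freqs)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


print(round(fst_single_snv([0.1, 0.6, 0.05]), 3))  # 0.329
print(fst_ratio_of_averages([[0.5, 0.5], [0.0, 1.0]]))  # 0.5
```

The toy frequencies are deliberately divergent. Averaging over many randomly queried SNVs in the same way is what yields a single genome-wide summary of the kind quoted above.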

Acknowledgements

We acknowledge the Research Computing Center at the University of Chicago, especially Jeff Tharsen and Alex Mueller, for ongoing support and development, as well as John Zekos for server administration support and members of the Novembre Lab.

Funding

Support for this work was provided by the National Institutes of Health via the Big Data to Knowledge initiative (1U01 CA198933-0, JN) and the National Institute of General Medical Sciences under training grant award number T32GM007197 (JHM). Conflict of Interest: none declared.
References

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01

2.  Inference of population structure using multilocus genotype data.

Authors:  J K Pritchard; M Stephens; P Donnelly
Journal:  Genetics       Date:  2000-06

3.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06

4.  D³: Data-Driven Documents.

Authors:  Michael Bostock; Vadim Ogievetsky; Jeffrey Heer
Journal:  IEEE Trans Vis Comput Graph       Date:  2011-12

5.  Genome-wide association studies in diverse populations.

Authors:  Noah A Rosenberg; Lucy Huang; Ethan M Jewett; Zachary A Szpiech; Ivana Jankovic; Michael Boehnke
Journal:  Nat Rev Genet       Date:  2010-05

6.  Spatial patterns of variation due to natural selection in humans.

Authors:  John Novembre; Anna Di Rienzo
Journal:  Nat Rev Genet       Date:  2009-10-13

7.  Signals of recent positive selection in a worldwide sample of human populations.

Authors:  Joseph K Pickrell; Graham Coop; John Novembre; Sridhar Kudaravalli; Jun Z Li; Devin Absher; Balaji S Srinivasan; Gregory S Barsh; Richard M Myers; Marcus W Feldman; Jonathan K Pritchard
Journal:  Genome Res       Date:  2009-03-23

8.  Population structure and eigenanalysis.

Authors:  Nick Patterson; Alkes L Price; David Reich
Journal:  PLoS Genet       Date:  2006-12

9.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01

10.  The role of geography in human adaptation.

Authors:  Graham Coop; Joseph K Pickrell; John Novembre; Sridhar Kudaravalli; Jun Li; Devin Absher; Richard M Myers; Luigi Luca Cavalli-Sforza; Marcus W Feldman; Jonathan K Pritchard
Journal:  PLoS Genet       Date:  2009-06-05