
GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals.

Dongmei Tian1,2, Pei Wang1,2,3, Bixia Tang1,2, Xufei Teng1,2,3, Cuiping Li1,2, Xiaonan Liu1,2,4, Dong Zou1,2, Shuhui Song1,2,3,5, Zhang Zhang1,2,3,4,5.   

Abstract

GWAS Atlas (https://bigd.big.ac.cn/gwas/) is a manually curated resource of genome-wide variant-trait associations for a wide range of species. Unlike existing related resources, it features comprehensive integration of a high-quality collection of 75 467 variant-trait associations for 614 traits across 7 cultivated plants (cotton, Japanese apricot, maize, rapeseed, rice, sorghum and soybean) and two domesticated animals (goat and pig), which were manually curated from 254 publications. We integrated these associations into GWAS Atlas and presented them in terms of variants, genes, traits, studies and publications. More importantly, all associations and traits were annotated and organized based on a suite of ontologies (Plant Trait Ontology, Animal Trait Ontology for Livestock, etc.). Taken together, GWAS Atlas integrates high-quality curated GWAS associations for animals and plants and provides user-friendly web interfaces for data browsing and downloading, accordingly serving as a valuable resource for genetic research of important traits and breeding application.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31566222      PMCID: PMC6943065          DOI: 10.1093/nar/gkz828

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Genome-wide association study (GWAS) is a key technique for exploring the genetic basis of complex traits and diseases by detecting genotype-phenotype associations in a group of individuals or natural inbred lines (1). As a result, GWAS has been widely applied in model organisms, primarily in human (2,3) and Arabidopsis (4). With the rapid development of phenotyping and genotyping technologies, a number of high-quality genotype-phenotype associations have also been identified in plants and animals, including maize, rice, sorghum, cotton, soybean, goat and pig (5,6). Taking maize as an example, diverse traits ranging from the molecular and cellular scale (e.g. metabolites) through the individual morphological scale (agronomic, yield or reproductive characteristics) to interactions with different environmental factors (biotic or abiotic stress tolerance) have been comprehensively studied and associated with different genetic variants (5,7). Therefore, a resource that integrates GWAS associations, as well as their associated information, is of fundamental significance for systematically capturing the whole picture of genotype-phenotype associations and improving our understanding of the genetic architecture of complex quantitative traits. Over the past decade, several resources, such as GWASdb (8,9), GWAS Catalog (3,10,11), GWASCentral (12) and AraGWAS Catalog (13), have been developed to provide publicly available GWAS associations. Among them, GWAS Catalog and AraGWAS Catalog are two popular representatives. GWAS Catalog, a dedicated resource for human that includes a total of 149 855 significant associations (P < 1 × 10⁻⁵) derived from 7230 studies in 4085 publications (as of August 2019) (3), has been widely used by the global community for deciphering the genetic basis and molecular mechanisms of human phenotypes and diseases.
AraGWAS Catalog (13), a curated and standardized database of GWAS associations for Arabidopsis, has integrated more than 167 traits and 222 000 variant-trait associations, making it an important and useful resource for the Arabidopsis community. However, existing resources focus mainly on human and other model organisms and do not comprehensively integrate GWAS associations identified in both crops and domesticated animals. Therefore, a specialized resource that houses comprehensive GWAS associations as well as their associated information in a variety of plants and animals is highly desirable. Here, we present GWAS Atlas (https://bigd.big.ac.cn/gwas), a curated resource incorporating high-quality GWAS associations with a particular focus on plants and animals. In the current release, it houses a total of 75 467 GWAS associations in seven cultivated plants and two domesticated animals, manually curated from 254 publications. Moreover, all associations and traits are annotated and organized based on a suite of ontologies. To facilitate data access and query, GWAS Atlas is equipped with user-friendly web interfaces that provide genetic markers and genes that have significant associations (P < 10⁻³) with specific traits of interest. Thus, GWAS Atlas can serve as a valuable resource for studying complex agronomic traits and conducting genomic breeding applications in plants and animals.

IMPLEMENTATION

GWAS Atlas is implemented using MySQL (http://www.mysql.org; a free and popular relational database management system) and Apache Tomcat Server (http://tomcat.apache.org; an open source software implementation of Java Servlet and Java Server Pages). Web user interfaces are developed using JSP (Java Server Pages; a technology facilitating rapid development of dynamic web pages based on the Java programming language), HTML5, CSS3, AJAX (Asynchronous JavaScript and XML; a set of web development techniques to create asynchronous applications without interfering with the display and behavior of the existing page), jQuery (a cross-platform and feature-rich JavaScript library; http://jquery.com, version 3.2.1) and Bootstrap (an open source toolkit for developing web projects with HTML, CSS and JS; https://getbootstrap.com, version 3.3.7). For dynamic data visualization, ECharts (a declarative framework for rapid construction of web-based visualization; http://echarts.baidu.com, version 4.1.0) is adopted to generate interactive charts.

DATA CURATION

To provide high-quality information curated from GWAS publications, we set up a standardized curation process involving four major steps: literature search, information retrieval, integration and annotation, and database construction (Figure 1). First, we perform a literature search in PubMed using the species name and GWAS as keywords, which yields a total of 1850 publications. Among them, 1767 publications published after 2009 are retained. Publications are eligible for inclusion in GWAS Atlas if they contain significant GWAS associations with the necessary description of biological traits. Consequently, a total of 254 publications qualify, and their basic bibliographic information (e.g. title, journal, year, citation) is automatically obtained through the Europe PMC API (https://europepmc.org/developers/) (14,15). Then, we manually curate the study and genotype-to-phenotype (G2P) association information from the publications. As one publication may contain multiple studies with different experimental designs, we record species name, sampling spot, year, condition, population, sample size, genotyping technology, association model, association number and PMID for each study. For each GWAS association, we collect species name, genome version, genomic position, variant ID, trait, association P-value, R² and mapped genes. Considering the possible inconsistency of reference genome versions used in different studies, all genomic variants for a given species are annotated based on the latest version. Finally, to unify the representation of biological traits, trait entities are mapped to a suite of reference ontologies (PTO, Plant Trait Ontology; ATOL, Animal Trait Ontology for Livestock) (16–18) and a species-specific ontology (CO, Crop Ontology) (19) using the ‘term search’ in the Planteome API and Livestock Ontologies. Since not all curated traits are included in existing ontologies, we additionally establish PPTO (Plant Phenotype and Trait Ontology) and APTO (Animal Phenotype and Trait Ontology) by integrating more comprehensive terms based on the Open Biological and Biomedical Ontologies (OBO) format.
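The curated fields listed above can be pictured as two record types, one per study and one per association. The following is a minimal sketch only, not the schema used by GWAS Atlas: the class names, attribute names, the split of genomic position into chromosome and coordinate, the optional ontology identifier and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    """One GWAS study; a single publication may contribute several."""
    species: str
    sampling_spot: str
    year: int
    condition: str
    population: str
    sample_size: int
    genotyping_technology: str
    association_model: str
    association_number: int
    pmid: str


@dataclass
class Association:
    """One curated variant-trait association (field names are hypothetical)."""
    species: str
    genome_version: str
    chromosome: str          # assumption: position stored as chromosome + coordinate
    position: int
    variant_id: str
    trait: str
    p_value: float
    r2: Optional[float] = None                      # not every study reports R²
    mapped_genes: List[str] = field(default_factory=list)
    ontology_id: Optional[str] = None               # filled in after trait mapping


def is_significant(assoc: Association, threshold: float = 1e-3) -> bool:
    """Apply the P < 10^-3 display threshold mentioned in the text."""
    return assoc.p_value < threshold


# Entirely made-up example record, for illustration only.
example = Association(
    species="Oryza sativa",
    genome_version="hypothetical-v1",
    chromosome="1",
    position=123456,
    variant_id="var_example_1",
    trait="plant height",
    p_value=2.5e-8,
    mapped_genes=["gene_example_1"],
)
```

Keeping the ontology identifier optional reflects the point made above that some curated traits had no term in the existing ontologies at curation time.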
Figure 1.

Data curation process adopted by GWAS Atlas.


DATABASE CONTENT AND USAGE

Based on the standardized curation process, GWAS Atlas integrates a high-quality collection of 75 467 variant-trait associations for 614 traits across seven cultivated plants and two domesticated animals, which were manually curated from 254 publications (Table 1). To facilitate users in browsing these data, GWAS Atlas provides eight modules, where data are organized and presented in terms of species, associations, variants, genes, traits, ontologies, studies and publications, respectively.
Table 1.

Data statistics in GWAS Atlas (as of 10 August 2019)

Species                           | # Publications | # Studies | # Traits | # Associations | # Variants | # Genes
Cotton (Gossypium hirsutum)       | 4              | 23        | 24       | 21 955         | 6115       | 991
Japanese apricot (Prunus mume)    | 1              | 2         | 9        | 1740           | 1432       | 541
Maize (Zea mays)                  | 86             | 308       | 205      | 28 310         | 22 969     | 13 654
Rapeseed (Brassica napus)         | 26             | 89        | 67       | 2396           | 1709       | 1791
Rice (Oryza sativa)               | 96             | 456       | 284      | 19 524         | 14 528     | 12 803
Sorghum (Sorghum bicolor)         | 13             | 52        | 41       | 754            | 652        | 688
Soybean (Glycine max)             | 15             | 46        | 40       | 422            | 337        | 354
Goat (Capra hircus)               | 4              | 6         | 12       | 40             | 40         | 9
Pig (Sus scrofa)                  | 9              | 10        | 35       | 326            | 284        | 122
The ‘Associations’ module provides a comprehensive overview of associated hits (P < 10⁻³) for each species, where data are organized in a table (Figure 2A) and displayed as a chromosome-scale heatmap (Figure 2B). Each association contains its variant ID, chromosomal position, consequence type(s), mapped gene(s), associated trait, P-value and PMID. Links to external websites are provided to help users access more details about variants and genes. In addition, a set of filters on P-value, trait and variant position helps users narrow down the browsing list, and associations can be sorted by a variety of fields, including VarID, trait, species, P-value and R²%. In the ‘Variants’ (Figure 2C) and ‘Genes’ (Figure 2D) modules, we summarize all associated hits detected in non-redundant variants and genes (or in close proximity to genes) and group the results by variant ID and gene name, respectively. Detailed information can be obtained by clicking on any variant ID or gene ID. By sorting on trait or association count, users can quickly identify genes or sites that are associated with multiple traits.
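The filtering and sorting behaviour described for the ‘Associations’ module can be mimicked on downloaded records with a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary keys (`var_id`, `trait`, `chrom`, `pos`, `p_value`, `r2`), the function names and the example rows are hypothetical and do not reflect the actual column names of GWAS Atlas downloads.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


def filter_associations(
    records: List[Record],
    max_p: float = 1e-3,
    trait: Optional[str] = None,
    chrom: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[Record]:
    """Keep records below a P-value cutoff, optionally restricted by trait and region."""
    kept = []
    for rec in records:
        if rec["p_value"] >= max_p:
            continue
        if trait is not None and rec["trait"] != trait:
            continue
        if chrom is not None and rec["chrom"] != chrom:
            continue
        if start is not None and rec["pos"] < start:
            continue
        if end is not None and rec["pos"] > end:
            continue
        kept.append(rec)
    return kept


def sort_associations(records: List[Record], key: str = "p_value",
                      descending: bool = False) -> List[Record]:
    """Return a new list sorted by one field (e.g. P-value or R2%)."""
    return sorted(records, key=lambda rec: rec[key], reverse=descending)


# Made-up rows for illustration only.
EXAMPLE = [
    {"var_id": "v1", "trait": "plant height", "chrom": "1", "pos": 1200, "p_value": 3e-9, "r2": 12.5},
    {"var_id": "v2", "trait": "plant height", "chrom": "1", "pos": 98000, "p_value": 5e-4, "r2": 4.1},
    {"var_id": "v3", "trait": "grain weight", "chrom": "2", "pos": 450, "p_value": 2e-2, "r2": 1.0},
    {"var_id": "v4", "trait": "grain weight", "chrom": "1", "pos": 5000, "p_value": 1e-6, "r2": 8.3},
]

hits = filter_associations(EXAMPLE, trait="plant height", chrom="1", start=0, end=10000)
ranked = sort_associations(filter_associations(EXAMPLE), key="r2", descending=True)
```

Both functions return new lists and leave the input untouched, so several filters can be chained on the same download.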
Figure 2.

Screenshots for Associations (A), GWAS heatmap across chromosomes (B), Variants (C) and Genes (D). Variant-trait associations in cotton are used to depict the GWAS heatmap, where each row represents a trait and each dot illustrates an associated hit. The log(P) is adopted to reflect the statistical significance.

The ‘Traits’ module provides an overview of all traits collected in GWAS Atlas. For each trait, both general details (e.g. trait label, trait ID, description) and summary information (e.g. numbers of associations, studies and publications) are recorded in the trait table (Figure 3A). As one trait may have multiple associated variants and genes, users can access these data by clicking on the corresponding hyperlinks. The traits are sortable by trait label, trait ID and numbers of associations, studies and publications. According to the current collection in GWAS Atlas (as of August 2019), one of the most extensively studied traits to date is plant height, involving 68 studies, 4238 variants, 2207 genes and 4316 associations. Moreover, we provide an interactive visualization for each species that displays the count distribution of sub-traits and their corresponding associations.
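Per-trait summaries of the kind quoted above for plant height (numbers of associations, studies, variants and genes) amount to a grouped count over association records. The sketch below illustrates that aggregation; the record keys (`trait`, `study_id`, `variant_id`, `genes`) and the example rows are assumptions for illustration, not the GWAS Atlas data model.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_traits(associations: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count associations and distinct studies, variants and genes per trait."""
    acc = defaultdict(lambda: {"associations": 0, "studies": set(),
                               "variants": set(), "genes": set()})
    for assoc in associations:
        entry = acc[assoc["trait"]]
        entry["associations"] += 1
        entry["studies"].add(assoc["study_id"])
        entry["variants"].add(assoc["variant_id"])
        entry["genes"].update(assoc.get("genes", []))
    # Collapse the sets into counts for display.
    return {
        trait: {
            "associations": e["associations"],
            "studies": len(e["studies"]),
            "variants": len(e["variants"]),
            "genes": len(e["genes"]),
        }
        for trait, e in acc.items()
    }


# Made-up rows for illustration only.
EXAMPLE = [
    {"trait": "plant height", "study_id": "S1", "variant_id": "v1", "genes": ["g1"]},
    {"trait": "plant height", "study_id": "S2", "variant_id": "v1", "genes": ["g1", "g2"]},
    {"trait": "plant height", "study_id": "S2", "variant_id": "v2", "genes": []},
    {"trait": "grain weight", "study_id": "S1", "variant_id": "v3", "genes": ["g2"]},
]

summary = summarize_traits(EXAMPLE)
most_studied = max(summary, key=lambda t: summary[t]["associations"])
```

Because a variant can be reported by several studies, the number of associations for a trait can exceed its number of distinct variants, as is the case for plant height in the current release.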
Figure 3.

Screenshots for Traits (A), Ontologies (B), Studies (C) and Publications (D).

To facilitate structured querying and visualization, we also develop the ‘Ontologies’ module (Figure 3B) to organize all traits based on a suite of ontologies: PTO, CO and PPTO for plants, and ATOL and APTO for animals (see details in Data Curation). In each ontology module, traits with associations (the number of associations is shown in brackets) are displayed in a hierarchical structure in the left panel, where users can explore the ontology hierarchy and associated data using the ‘drill-down’ browser. When a trait term is selected, basic descriptive information on the trait, associations, studies and publications is automatically retrieved and displayed in the right panel, where users can view the detailed information for different species. By gathering all related associations across different species, the mapping between GWAS traits and ontology terms in GWAS Atlas can help identify new candidate genetic variants.
The ‘Studies’ module displays an overview of all GWAS studies together with related information such as study population, sample size, sampling spot and condition (Figure 3C). Additionally, for each publication, its bibliographic details (e.g. title, year, journal, PubMed ID, citation) are summarized in the ‘Publications’ module (Figure 3D).
In all modules, hyperlinks to external databases, such as GVM (20) in BIGD (21), NCBI Gene, PubMed and PTO (16), are provided to offer convenient access to additional information. In addition, to ease data downloading, all query results that are displayed on web pages can be exported as a tab-delimited file (MS-Excel, CSV, TXT) or a JSON-format file. To fully benefit the global scientific community, all relevant data in GWAS Atlas are open access and publicly available at https://bigd.big.ac.cn/gwas/downloads.
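For readers who post-process query results, the two export flavours mentioned above (tab-delimited text and JSON) can be reproduced with the Python standard library. This is a minimal sketch, not the export code of GWAS Atlas; the column names and the example rows are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def to_tsv(rows: List[Dict[str, object]], columns: List[str]) -> str:
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing fields are written as empty cells.
        writer.writerow([row.get(col, "") for col in columns])
    return buf.getvalue()


def to_json(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2, sort_keys=True)


# Made-up rows for illustration only.
ROWS = [
    {"var_id": "v1", "trait": "plant height", "pmid": "0000001"},
    {"var_id": "v2", "trait": "grain weight"},
]

tsv_text = to_tsv(ROWS, ["var_id", "trait", "pmid"])
json_text = to_json(ROWS)
```

The tab-delimited form is convenient for spreadsheets, whereas the JSON form preserves the field names of each record and round-trips through `json.loads` without a separate header.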

DISCUSSION AND FUTURE DIRECTIONS

GWAS Atlas incorporates a large number of high-quality variant-trait associations in multiple plants and animals through manual curation. It is equipped with user-friendly web interfaces for browsing, searching and visualization, enabling users to navigate the GWAS associations and investigate molecular mechanisms underlying complex traits. Therefore, GWAS Atlas should be helpful for capturing pleiotropic loci and for better understanding the similarities and differences in genetic mechanisms between different species and/or traits. The current release of GWAS Atlas includes 9 species, 614 traits and 75 467 variant-trait associations. Future directions include integrating more GWAS findings from a broader range of species and continuously updating associations as well as related information. To further evaluate the effect of variants on other biological processes and facilitate the discovery of potential molecular mechanisms for these variant sites, we plan to add additional curated information (such as RNA-binding protein and microRNA binding sites). In addition, as the parent-child relationships of several ontologies used in this study have not yet been determined, we will further refine these relationships according to the ontology standards and with reference to PTO and ATO. Moreover, we plan to develop a submission functionality that allows users to submit their own data. Meanwhile, we call for worldwide collaboration to build GWAS Atlas into a valuable resource covering more comprehensive associations and traits across a wider range of plants and animals.
  21 in total

1.  Contributions to an animal trait ontology.

Authors:  B Hulsegge; M A Smits; M F W te Pas; H Woelders
Journal:  J Anim Sci       Date:  2012-01-06       Impact factor: 3.159

2.  Complement factor H polymorphism in age-related macular degeneration.

Authors:  Robert J Klein; Caroline Zeiss; Emily Y Chew; Jen-Yue Tsai; Richard S Sackler; Chad Haynes; Alice K Henning; John Paul SanGiovanni; Shrikant M Mane; Susan T Mayne; Michael B Bracken; Frederick L Ferris; Jurg Ott; Colin Barnstable; Josephine Hoh
Journal:  Science       Date:  2005-03-10       Impact factor: 47.728

3.  Animal trait ontology: The importance and usefulness of a unified trait vocabulary for animal species.

Authors:  L M Hughes; J Bao; Z-L Hu; V Honavar; J M Reecy
Journal:  J Anim Sci       Date:  2008-02-13       Impact factor: 3.159

Review 4.  Crop genome-wide association study: a harvest of biological relevance.

Authors:  Hai-Jun Liu; Jianbing Yan
Journal:  Plant J       Date:  2018-12-17       Impact factor: 6.417

5.  Bridging the phenotypic and genetic data useful for integrated breeding through a data annotation using the Crop Ontology developed by the crop communities of practice.

Authors:  Rosemary Shrestha; Luca Matteis; Milko Skofic; Arllet Portugal; Graham McLaren; Glenn Hyman; Elizabeth Arnaud
Journal:  Front Physiol       Date:  2012-08-25       Impact factor: 4.566

6.  The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Authors:  Annalisa Buniello; Jacqueline A L MacArthur; Maria Cerezo; Laura W Harris; James Hayhurst; Cinzia Malangone; Aoife McMahon; Joannella Morales; Edward Mountjoy; Elliot Sollis; Daniel Suveges; Olga Vrousgou; Patricia L Whetzel; Ridwan Amode; Jose A Guillen; Harpreet S Riat; Stephen J Trevanion; Peggy Hall; Heather Junkins; Paul Flicek; Tony Burdett; Lucia A Hindorff; Fiona Cunningham; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

Review 7.  Chapter 11: Genome-wide association studies.

Authors:  William S Bush; Jason H Moore
Journal:  PLoS Comput Biol       Date:  2012-12-27       Impact factor: 4.475

8.  GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies.

Authors:  Tim Beck; Robert K Hastings; Sirisha Gollapudi; Robert C Free; Anthony J Brookes
Journal:  Eur J Hum Genet       Date:  2013-12-04       Impact factor: 4.246

9.  The AraGWAS Catalog: a curated and standardized Arabidopsis thaliana GWAS catalog.

Authors:  Matteo Togninalli; Ümit Seren; Dazhe Meng; Joffrey Fitz; Magnus Nordborg; Detlef Weigel; Karsten Borgwardt; Arthur Korte; Dominik G Grimm
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

10.  Database Resources of the BIG Data Center in 2019.

Authors: 
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971
