Jacqueline A L MacArthur, Annalisa Buniello, Laura W Harris, James Hayhurst, Aoife McMahon, Elliot Sollis, Maria Cerezo, Peggy Hall, Elizabeth Lewis, Patricia L Whetzel, Orli G Bahcall, Inês Barroso, Robert J Carroll, Michael Inouye, Teri A Manolio, Stephen S Rich, Lucia A Hindorff, Ken Wiley, Helen Parkinson.
Abstract
Genome-wide association studies (GWASs) have enabled robust mapping of complex traits in humans. The open sharing of GWAS summary statistics (SumStats) is essential in facilitating the larger meta-analyses needed for increased power in resolving the genetic basis of disease. However, most GWAS SumStats are not readily accessible because of limited sharing and a lack of defined standards. With the aim of increasing the availability, quality, and utility of GWAS SumStats, the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog organized a community workshop to address the standards, infrastructure, and incentives required to promote and enable sharing. We evaluated the barriers to SumStats sharing, both technological and sociological, and developed an action plan to address those challenges and ensure that SumStats and study metadata are findable, accessible, interoperable, and reusable (FAIR). We encourage early deposition of datasets in the GWAS Catalog as the recognized central repository. We recommend standard requirements for reporting elements and formats for SumStats and accompanying metadata as guidelines for community standards and a basis for submission to the GWAS Catalog. Finally, we provide recommendations to enable, promote, and incentivize broader data sharing, standards, and FAIRness in order to advance genomic medicine.
Year: 2021 PMID: 36082306 PMCID: PMC9451133 DOI: 10.1016/j.xgen.2021.100004
Source DB: PubMed Journal: Cell Genom ISSN: 2666-979X
Figure 1. Workshop attendees
(A and B) Breakdown of workshop attendees by stakeholder category (A) and planned uses of GWAS SumStats (B) for 37 workshop attendees (35 for planned uses) who completed the pre-workshop survey. Attendees were able to select multiple stakeholder categories and planned uses.
Recommended standard reporting elements for GWAS SumStats
| Data element | Column header | Mandatory/Optional |
|---|---|---|
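| variant id | variant_id | One form of variant ID is mandatory: either an rsID, or chromosome, base pair location, and genome build (see note below on other variant ID formats) |
| chromosome | chromosome | Mandatory if no rsID is given (see variant id) |
| base pair location | base_pair_location | Mandatory if no rsID is given (see variant id) |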
| p value | p_value | Mandatory |
| effect allele | effect_allele | Mandatory |
| other allele | other_allele | Mandatory |
| effect allele frequency | effect_allele_frequency | Mandatory |
| effect (odds ratio or beta) | odds_ratio or beta | Mandatory |
| standard error | standard_error | Mandatory |
| upper confidence interval | ci_upper | Optional |
| lower confidence interval | ci_lower | Optional |
Data elements have been recommended as mandatory if >50% of pre-workshop survey respondents indicated that preference.
We agreed that other variant ID formats should be supported. Implementation of those standards will be addressed by the working group “Data Content and Format.”
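Because the reporting elements above are defined by fixed column headers, a submitter can check a file against them before deposition. The Python sketch below shows one way to do this. It assumes a tab-separated file whose header uses the column names from the table verbatim; the function names (`check_header`, `check_row`, `validate`) and the numeric sanity checks (p values and allele frequencies in [0, 1], positive standard errors and odds ratios) are illustrative choices of ours, not part of the recommended standard or of any GWAS Catalog tool. The genome build is treated as accompanying metadata and is not checked.

```python
import csv
import io

# Column headers from the recommended reporting elements (assumed used verbatim).
MANDATORY = ("p_value", "effect_allele", "other_allele",
             "effect_allele_frequency", "standard_error")
EFFECT_COLUMNS = ("odds_ratio", "beta")             # at least one must be present
POSITION_COLUMNS = ("chromosome", "base_pair_location")


def check_header(columns):
    """Return a list of header problems; an empty list means the header passes."""
    cols = set(columns)
    problems = [f"missing mandatory column: {c}" for c in MANDATORY if c not in cols]
    if not cols.intersection(EFFECT_COLUMNS):
        problems.append("missing effect column: odds_ratio or beta")
    # Variant ID rule: an rsID column, or chromosome + base pair location.
    # The genome build is assumed to be recorded in the accompanying metadata,
    # so it is not checked here.
    if "variant_id" not in cols and not all(c in cols for c in POSITION_COLUMNS):
        problems.append("need variant_id, or chromosome and base_pair_location")
    return problems


def _number(row, col, problems):
    """Parse a numeric cell; record a problem and return None if it is not numeric."""
    try:
        return float(row[col])
    except (TypeError, ValueError):
        problems.append(f"{col} is not numeric: {row[col]!r}")
        return None


def check_row(row):
    """Basic sanity checks on one data row (a dict keyed by column header)."""
    problems = []
    for col in ("p_value", "effect_allele_frequency"):
        value = _number(row, col, problems)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{col} out of [0, 1]: {value}")
    se = _number(row, "standard_error", problems)
    if se is not None and se <= 0:
        problems.append(f"standard_error must be positive: {se}")
    if "odds_ratio" in row:                          # only if that column exists
        odds = _number(row, "odds_ratio", problems)
        if odds is not None and odds <= 0:
            problems.append(f"odds_ratio must be positive: {odds}")
    return problems


def validate(text):
    """Validate a tab-separated SumStats file passed in as a string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    problems = check_header(reader.fieldnames or [])
    if problems:
        return problems          # row checks assume the mandatory columns exist
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        problems += [f"line {line_no}: {p}" for p in check_row(row)]
    return problems
```

For example, running `validate` on a made-up two-row file with a `beta` column, whose second data row has a `p_value` of 1.7, returns `['line 3: p_value out of [0, 1]: 1.7']`. Header problems are reported before any row is read, since the row checks depend on the mandatory columns being present.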
FAIR indicators
| Core FAIR principle | FAIR principle | FAIR indicator |
|---|---|---|
| Findable | F1. (Meta)data are assigned a globally unique and persistent identifier | Each GWAS is assigned a unique identifier that can be resolved externally through (…) |
| | F2. Data are described with rich metadata (defined by R1 below) | Each GWAS is described by the metadata elements listed in “Proposed metadata standard reporting elements” […] |
| | F3. Metadata clearly and explicitly include the identifier of the data they describe | Metadata include the accession ID and are linked to the GWAS SumStats they describe |
| | F4. (Meta)data are registered or indexed in a searchable resource | Each GWAS is searchable in the GWAS Catalog by accession ID, trait, publication, author, or locus (variant, gene, cytogenetic region, or chr:bp-bp region) |
| Accessible | A1. (Meta)data are retrievable by their identifier using a standardized communications protocol | Metadata can be viewed on the GWAS Catalog web interface, where each GWAS has its own page at a stable URL that includes the accession ID and provides a download link for the SumStats. Metadata can also be retrieved from the GWAS Catalog’s REST API (…) |
| | A1.1. The protocol is open, free, and universally implementable | The GWAS Catalog (…) |
| | A1.2. The protocol allows for an authentication and authorization procedure, where necessary | Not applicable |
| | A2. Metadata are accessible, even when the data are no longer available | Metadata will remain accessible via the accession ID, even if the SumStats are no longer available. Archived versions of GWAS Catalog metadata are available (…) |
| Interoperable | I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | Metadata are accessible from the GWAS Catalog REST API (…) |
| | I2. (Meta)data use vocabularies that follow the FAIR principles | Traits are represented using the Experimental Factor Ontology […] |
| | I3. (Meta)data include qualified references to other (meta)data | Links are provided to relevant external data, e.g., Europe PMC, and to relevant GWAS Catalog data, e.g., the trait and publication pages |
| Reusable | R1. (Meta)data are richly described with a plurality of accurate and relevant attributes | Each GWAS is described by the metadata elements listed in “Proposed metadata standard reporting elements” […] |
| | R1.1. (Meta)data are released with a clear and accessible data usage license | All GWAS Catalog data are made available through EMBL-EBI’s standard terms of use (…) |
| | R1.2. (Meta)data are associated with detailed provenance | Each GWAS is linked to a source publication that can be accessed by either its digital object identifier (DOI) or its PubMed ID (PMID) |
| | R1.3. (Meta)data meet domain-relevant community standards | Metadata and SumStats are made available using the standards agreed to in this workshop […] |
Our recommended FAIR indicators for GWAS SumStats. We list each core FAIR principle and the associated indicators and provide examples of how they are implemented in the GWAS Catalog.
This indicator is not currently met in full by the GWAS Catalog. The data standards agreed to in this workshop require extensions or modifications to GWAS Catalog data content or formats, which we plan to implement soon.
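Several of these indicators can be checked per study from its metadata record. The Python sketch below is illustrative only. The record layout (`accession_id`, `sumstats_url`, `trait_ids`, `doi`, `pmid`), the `GCST` accession pattern, and the restriction of trait terms to `EFO_` identifiers are assumptions made for the example, not the GWAS Catalog’s actual schema or validation rules. A format check is also only a proxy: whether an identifier is globally unique and persistent (F1) cannot be verified from a single record.

```python
import re

# Illustrative patterns; both are assumptions for this sketch.
ACCESSION_PATTERN = re.compile(r"GCST\d+")   # assumed study accession format
EFO_PATTERN = re.compile(r"EFO_\d+")         # simplification: EFO term IDs only


def fair_report(record):
    """Check a (hypothetical) metadata record against a few FAIR indicators.

    Returns a dict mapping indicator label to True/False.
    """
    accession = record.get("accession_id") or ""
    has_accession = ACCESSION_PATTERN.fullmatch(accession) is not None
    traits = record.get("trait_ids") or []
    return {
        # F1: each GWAS has an identifier in the expected format.
        "F1": has_accession,
        # F3: metadata carry that identifier and link to the SumStats.
        "F3": has_accession and bool(record.get("sumstats_url")),
        # I2: traits use ontology terms (here, EFO IDs only).
        "I2": bool(traits) and all(EFO_PATTERN.fullmatch(t) for t in traits),
        # R1.2: provenance through a source publication DOI or PMID.
        "R1.2": bool(record.get("doi") or record.get("pmid")),
    }


if __name__ == "__main__":
    # Hypothetical record; every value is a placeholder.
    example = {
        "accession_id": "GCST000001",
        "sumstats_url": "https://example.org/GCST000001.tsv",
        "trait_ids": ["EFO_0000001"],
        "pmid": "12345678",
    }
    print(fair_report(example))
```

Returning a per-indicator dictionary, rather than a single pass/fail flag, mirrors the table’s structure and makes it easy to see which indicator a record falls short on.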