Paul Lacaze, Mark Pinese, Warren Kaplan, Andrew Stone, Marie-Jo Brion, Robyn L Woods, Martin McNamara, John J McNeil, Marcel E Dinger, David M Thomas.
Abstract
Allele frequency data from human reference populations are of increasing value for the filtering, interpretation, and assignment of pathogenicity to genetic variants. Aged and healthy populations are more likely to be selectively depleted of pathogenic alleles and are therefore particularly suitable as reference populations for the major diseases of clinical and public health importance. However, reference studies of confirmed healthy elderly individuals have remained under-represented in human genetics. Here we describe the Medical Genome Reference Bank (MGRB), a large-scale comprehensive whole-genome data set of healthy elderly individuals. The MGRB provides an accessible data resource for health-related research and clinical genetics and a powerful platform for studying the genetics of healthy ageing. The MGRB comprises 4000 healthy, older individuals, mostly of European descent, recruited from two Australian community-based cohorts. Each participant lived ≥70 years with no reported history of cancer, cardiovascular disease, or dementia. DNA derived from blood samples has been subjected to whole-genome sequencing. The MGRB has committed to a policy of data sharing, employing a hierarchical data management system to maintain participant privacy and confidentiality while maximising research and clinical usage of the database. The MGRB represents a resource of international significance, which will be made broadly accessible to the clinical and genetic research community.
Year: 2018 PMID: 30353151 PMCID: PMC6336775 DOI: 10.1038/s41431-018-0279-z
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
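The abstract describes using reference-population allele frequencies to filter candidate variants. A minimal sketch of that idea follows, assuming a simple in-memory lookup table; the `Variant` class, the `filter_rare` function, the 0.001 threshold, and all frequency values are hypothetical illustrations and are not taken from the MGRB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant identified by chromosome, position, and alleles (hypothetical model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_rare(candidates, reference_af, max_af=0.001):
    """Keep candidates whose reference-population allele frequency is at most max_af.

    Variants absent from the reference table are treated as having frequency 0.0.
    """
    return [v for v in candidates if reference_af.get(v, 0.0) <= max_af]


# Illustrative frequencies only; these are not MGRB values.
reference_af = {
    Variant("1", 12345, "A", "G"): 0.12,
    Variant("2", 67890, "C", "T"): 0.0004,
}

candidates = [
    Variant("1", 12345, "A", "G"),   # common in the reference, filtered out
    Variant("2", 67890, "C", "T"),   # rare in the reference, kept
    Variant("3", 13579, "G", "A"),   # absent from the reference, kept
]

rare = filter_rare(candidates, reference_af)
```

A variant that is common in a healthy elderly reference population is unlikely to cause a severe early-onset disease, which is why the common variant is dropped here; the appropriate threshold depends on the disease model and is left as a parameter.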
Features of human genetic reference populations (according to public domain websites and peer-reviewed literature, February 2018)
| | MGRB | ExAC | GnomAD | UKBB SNPs | HLI-JCVI | Wellderly STSI | SweGen | HGVD |
|---|---|---|---|---|---|---|---|---|
| Approx. cohort size (Feb 2018) | 4000 | 60,000 | 140,000 | 500,000 | 10,000 | 600 | 1000 | 3200 |
| Purpose-built cohort (versus data aggregation) | ✓ | X | X | ✓ | ✓ | ✓ | ✓ | ✓ |
| Whole genome sequencing | ✓ | X | ✓ | X | ✓ | ✓ | ✓ | X |
| Ability to detect complex and SV | ✓ | X | ✓ | X | ✓ | X | ✓ | X |
| Phenotype data to confirm absence of disease | ✓ | X | X | ✓ | ✓ | ✓ | X | ? |
| Confirmed healthy elderly population | ✓ | X | X | X | X | ✓ | X | X |
| Allele frequencies made readily accessible | ✓ | ✓ | ✓ | X | X | X | ✓ | ✓ |
| Formal data access and approval policy | ✓ | X | X | ✓ | X | X | ✓ | X |
| Access provided to individual VCFs | ✓ | X | X | X | X | X | X | X |
| | ✓ | ✓ | ✓ | ✓ | ✓ | X | X | X |
| Consistent and compatible seq. technology | ✓ | ✓ | ✓ | X | ✓ | X | ✓ | ✓ |
MGRB Medical Genome Reference Bank, ExAC Exome Aggregation Consortium, GnomAD Genome Aggregation Database, UKBB SNPs U.K. Biobank SNP data set, HLI-JCVI Human Longevity Inc - J. Craig Venter Institute, STSI Wellderly Scripps Translational Science Institute Wellderly study, SweGen Swedish Genome reference population project, HGVD Human Genetic Variation Database (Japan)
Fig. 1 The Medical Genome Reference Bank: Project overview
MGRB summary demographics, by cohort
| | 45 and Up | ASPREE |
|---|---|---|
| Year of birth | | |
| 1910–1915 | 0 | 0 |
| 1915–1920 | 2 | 5 |
| 1920–1925 | 11 | 89 |
| 1925–1930 | 79 | 490 |
| 1930–1935 | 108 | 1388 |
| 1935–1940 | 181 | 1153 |
| 1940–1945 | 356 | 0 |
| 1945–1950 | 77 | 0 |
| Sex | | |
| Female | 492 (60.4%) | 1653 (52.9%) |
| Male | 322 (39.6%) | 1472 (47.1%) |
| Age at last follow-up (years; approx. 2016) | | |
| 70–75 | 324 | 0 |
| 75–80 | 235 | 349 |
| 80–85 | 132 | 1778 |
| 85–90 | 83 | 787 |
| 90–95 | 38 | 192 |
| 95–100 | 2 | 19 |
MGRB timeline for whole-genome sequencing and data release (progress timeline: H1 = first half of year, H2 = second half of year)
| MGRB | WGS target sample number | Sequencing completion | Tier 1 open data release | Tier 2/3 approval |
|---|---|---|---|---|
| Phase I | 1500 | H2 2016 | H2 2016 | H1 2017 |
| Phase II | 3000 | H2 2017 | H1 2018 | H2 2018 |
| Phase III | 4000 | H2 2018 | H1 2019 | H2 2019 |
MGRB tiered Data Access Policy
| Tier | 1. Open Access | 2. Controlled Access | 3. Restricted Access |
|---|---|---|---|
| Access | Institutional email address required for MGRB data portal access (not required for Beacon) | Data Access Application (DAA) must be approved by the MGRB Data Access Committee (DAC) | DAA must be approved by the MGRB DAC and referred to the applicable cohort governing body for further approval |
| Clinical data | Basic demographic data are provided, genomic queries can be filtered according to these fields | Basic demographic data and minimal clinical information (where available) are provided per individual record | Comprehensive clinical data that is potentially specific to a participating cohort is provided per individual record |
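| Genomic data | Beacon and pre-processed variant frequencies | Individual record data provided: either processed (VCF/gVCF format) or unprocessed (FASTQ or BAM format) (dependent on justification criteria being met) | Individual record data provided: either processed (VCF/gVCF format) or unprocessed (FASTQ or BAM format) (dependent on justification criteria being met) |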
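The access rules in the table above can be read as a small decision procedure. The sketch below is one hedged way to encode them; the `Tier` enum, the `REQUIRED_APPROVALS` mapping, and the `can_access` function are hypothetical names introduced for illustration and do not describe the MGRB's actual implementation.

```python
from enum import Enum


class Tier(Enum):
    OPEN = 1
    CONTROLLED = 2
    RESTRICTED = 3


# Approval steps per tier, paraphrased from the "Access" row of the table.
REQUIRED_APPROVALS = {
    Tier.OPEN: [],
    Tier.CONTROLLED: ["MGRB Data Access Committee"],
    Tier.RESTRICTED: ["MGRB Data Access Committee", "cohort governing body"],
}


def can_access(tier, approvals_held, has_institutional_email=False, via_beacon=False):
    """Return True if the requester satisfies the access rule for the given tier.

    Tier 1: Beacon queries need nothing; the data portal needs an institutional email.
    Tiers 2 and 3: every approval step listed for the tier must already be held.
    """
    if tier is Tier.OPEN:
        return via_beacon or has_institutional_email
    return all(step in approvals_held for step in REQUIRED_APPROVALS[tier])


dac_only = {"MGRB Data Access Committee"}
dac_and_cohort = {"MGRB Data Access Committee", "cohort governing body"}
```

Keeping the approval steps in a lookup table, separate from the checking function, means that a change in policy would only touch the table.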
Fig. 2 MGRB database functionality and Vectis platform
Features of the Vectis open-source analytical framework
| Feature | Description |
|---|---|
| Secure login | Two-factor authentication |
| Search | Querying of cohorts using chromosomal coordinates and gene annotations |
| Beacon | Integrated with the Global Alliance for Genomics and Health Beacon Network |
| Explore function | Highly interactive, low latency exploration of cohorts. Explore currently supports the querying of 84 million variants in real time |
| Interactive graphics | Including lollipop plots of allelic frequencies as well as gene transcripts |
| Variant annotations | Including links out to the original supporting evidence |
| Scalable variant store | Enables authorised users to subset patients based on clinical attributes and query actual genotypes at the individual patient level |
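Two of the features listed above, the Beacon and coordinate-based search, can be illustrated with a toy in-memory variant set. This is a sketch under stated assumptions: `Variant`, `beacon_exists`, and `search_region` are hypothetical names, the data are invented, and the code does not reproduce the Vectis code base or the GA4GH Beacon API.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def beacon_exists(variants, chrom, pos, ref, alt):
    """Beacon-style query: report only whether the allele has been observed."""
    return Variant(chrom, pos, ref, alt) in variants


def search_region(variants, chrom, start, end):
    """Return variants on chrom with start <= pos <= end, sorted by position."""
    return sorted(v for v in variants if v.chrom == chrom and start <= v.pos <= end)


# Invented example data, not drawn from the MGRB.
cohort_variants = {
    Variant("7", 1000, "A", "T"),
    Variant("7", 2500, "G", "C"),
    Variant("7", 9000, "C", "T"),
    Variant("X", 1500, "T", "G"),
}

hits = search_region(cohort_variants, "7", 900, 3000)
```

The Beacon-style function returns a bare yes or no rather than genotypes, which is what allows this kind of query to sit in the open-access tier while individual-level data remain controlled.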