Seave: a comprehensive web platform for storing and interrogating human genomic variation.

Velimir Gayevskiy^1, Tony Roscioli^2,3,4, Marcel E Dinger^1,5,6, Mark J Cowley^1,5,7.

Abstract

Motivation: Genome sequencing has had a remarkable impact on our ability to study the effects of human genetic variation; however, variant interpretation remains the major bottleneck. Understanding the potential impact of variants, including structural variants, requires extensive annotation from disparate sources of knowledge and from in silico prediction algorithms.
Results: We introduce Seave, an intuitive web platform that enables all types of variants to be securely stored, annotated and filtered. Variants are annotated with allele frequencies and pathogenicity assessments from many popular databases and in silico pathogenicity prediction scores. Seave enables filtering of variants with specific inheritance patterns, including somatic variants, by quality, allele frequencies and gene lists which can be curated and saved. Seave was made for whole genome data and is capable of storing and querying copy number and structural variants.
Availability and implementation: To demo Seave with public data, see https://www.seave.bio. Source code is available at http://code.seave.bio and extensive documentation is available at http://documentation.seave.bio. Seave can be locally installed on an Apache server with PHP and MySQL, or we provide an Amazon Machine Image for quick deployment. For commercial and clinical diagnostic licensing, contact the corresponding author.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2019 PMID: 30561546 PMCID: PMC6298057 DOI: 10.1093/bioinformatics/bty540

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

The rapid adoption of human genome sequencing has substantially advanced our understanding of the impact of genetic variation on health and disease (Delaney ). Vast catalogues of genetic variants now exist, tens of thousands of which have been unequivocally linked to disease. Through advances in genomic technologies and bioinformatic methodologies, it is now feasible to comprehensively identify all classes of genomic variation within an individual’s genome, ranging from single nucleotide variants (SNVs) and short insertions or deletions (Indels), to large copy number variants (CNVs), structural variants (SVs) and mobile element insertions (MEIs), any of which may contribute to that individual’s phenotype. Interpreting the potential impact of any variant is a difficult task (Amendola ), and is a major impediment to widespread adoption of genomic medicine. Variant interpretation requires assessing data quality and consulting dozens of resources, including databases of genomic variation in healthy controls and in patients with disease, resources linking genes to phenotype or disease, and the latest literature. All of this information must be kept up to date. Interpreting the impact of novel variants can be supported by potentially dozens of in silico pathogenicity scores. CNVs of any size and SVs are important sources of pathogenic variants, and should be considered alongside short variants. Importantly, this genomic complexity must be distilled and presented in a way that is accessible to all researchers, clinicians and laboratory staff. To address these challenges, we developed Seave, a web-based variant filtration platform that stores, queries and annotates genomic variation of all sizes. It is designed for clinicians and researchers, primarily for rare disease and cancer, and requires no knowledge of bioinformatics to use.

2 Seave description

2.1 Scope

Seave was designed from the outset to handle whole-genome-sized variant callsets from tens of thousands of patients, and it also supports data from targeted sequencing panels of any size. Seave supports the following classes of genetic variants: SNVs, Indels, CNVs, SVs and runs of homozygosity (ROH), from the nuclear and mitochondrial genomes. Because whole-genome sequencing produces large files (∼5M variants per patient and >3 Gb compressed VCF files), Seave is designed to automatically receive data generated by production analysis pipelines via an API (Fig. 1).

Fig. 1.

Schematic overview of Seave’s main features. Arrows represent the flow of information through sequencing, variant detection, data storage, filtration, annotation, interpretation and outcomes

2.2 Data import and annotation

Seave uses GEMINI (Paila ) databases to store, manage and query genome data, where each database represents a cohort (Supplementary Fig. S1). GEMINI databases are portable, convenient and sharable, and they allow Seave to scale to vast numbers of individuals simply by increasing disk space. Users can group samples within a cohort into families, and annotate individuals with their gender and affected status, either interactively or via a PED file. Importing data starts with annotating VCF files containing SNVs and Indels from one or more individuals using Variant Effect Predictor (VEP; McLaren ) or SnpEff (Cingolani ), then converting these into a GEMINI database. These databases are then imported into Seave using an API or the administration interface. Seave manages a MySQL annotation database, which complements the annotations from VEP and GEMINI, to provide pre-computed, extensive in silico prediction scores, gene-phenotype-disease links and allele frequencies in healthy controls and various diseases (Fig. 1, Supplementary Data). Because some of these annotation sources are regularly updated, we provide tools for keeping the corresponding annotations up to date. Seave supports many popular variant callers including GATK HaplotypeCaller (McKenna ), FreeBayes (Garrison and Marth, 2012) and somatic variant callers including Strelka (Saunders ), MuTect2 (Cibulskis ) and VarDict (Lai ). Seave currently supports only the GRCh37 (hg19) reference genome. To support large CNVs and SVs, we developed the Genome Block Store (GBS). The GBS is a scalable MySQL database designed to store large genomic segments or blocks (e.g. deletions, duplications, inversions, MEIs or ROH), or linked blocks (e.g. gene fusion breakpoints), with additional annotations (e.g. copy number or breakpoint read depth).
A number of popular tools are supported, including CNVnator (Abyzov ), LUMPY (Layer ), Sequenza (Favero ), ClinSV (Minoche et al., manuscript in preparation), Manta (Chen ), CNVkit (Talevich ) and ROHmer (Puttick et al., manuscript in preparation).
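
The block-and-link idea behind the GBS can be illustrated with a small sketch. The table names, column names, sample identifiers and caller labels below are hypothetical and chosen only for illustration; they are not Seave's actual schema, and SQLite is used as a stand-in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not Seave's
# actual Genome Block Store layout. SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE blocks (
    id INTEGER PRIMARY KEY,
    sample TEXT NOT NULL,
    method TEXT NOT NULL,
    event_type TEXT NOT NULL,
    chrom TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos INTEGER NOT NULL
);
CREATE TABLE block_links (
    block_a INTEGER NOT NULL REFERENCES blocks(id),
    block_b INTEGER NOT NULL REFERENCES blocks(id)
);
CREATE TABLE block_annotations (
    block_id INTEGER NOT NULL REFERENCES blocks(id),
    tag TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def add_block(conn, sample, method, event_type, chrom, start_pos, end_pos,
              **annotations):
    """Store one genomic block plus any key/value annotations; return its id."""
    cur = conn.execute(
        "INSERT INTO blocks (sample, method, event_type, chrom, start_pos, end_pos) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sample, method, event_type, chrom, start_pos, end_pos),
    )
    block_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO block_annotations (block_id, tag, value) VALUES (?, ?, ?)",
        [(block_id, tag, str(value)) for tag, value in annotations.items()],
    )
    return block_id


def blocks_overlapping(conn, sample, chrom, start_pos, end_pos):
    """Return ids of a sample's blocks that overlap a closed interval."""
    rows = conn.execute(
        "SELECT id FROM blocks WHERE sample = ? AND chrom = ? "
        "AND start_pos <= ? AND end_pos >= ? ORDER BY id",
        (sample, chrom, end_pos, start_pos),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A single block with a copy-number annotation.
deletion = add_block(conn, "S1", "callerA", "deletion", "1", 1000, 5000,
                     copy_number=1)

# Two breakpoint blocks linked together, as for a candidate gene fusion.
left = add_block(conn, "S1", "callerB", "breakpoint", "2", 700, 700)
right = add_block(conn, "S1", "callerB", "breakpoint", "9", 9000, 9000)
conn.execute("INSERT INTO block_links (block_a, block_b) VALUES (?, ?)",
             (left, right))
```

Keeping annotations in a separate key/value table is one way to let different callers attach different fields to a block without changing the core table; whether Seave does this is not stated in the text.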

2.3 Filtering

After selecting a cohort for analysis (Supplementary Fig. S1), users can optionally filter short variants by inheritance pattern, including heterozygous dominant, homozygous recessive, de novo dominant and compound heterozygous (Supplementary Fig. S2). On the Query page, the genomic search space can be restricted or specifically excluded by using any number of genomic coordinates, curated gene lists from Seave’s gene list management system, or custom gene symbols (Supplementary Fig. S3). Variants can be restricted by their impact: low (e.g. synonymous), medium (e.g. missense), high (e.g. nonsense, frameshift and essential splice region), or coding, by CADD score (Kircher ) and by population allele frequencies. Technical filtering parameters include minimum sequencing depth in all samples, minimum variant quality, excluding failed variants and the type and number of variants to return. A typical family trio sequenced by whole genome sequencing yields 6 million variants, and this number is rapidly reduced to below 200 just by filtering on rarity, impact and inheritance pattern (Supplementary Fig. S4). Most queries take 0–5 s to execute, but this can stretch up to 2 min for large cohorts of whole genomes with inheritance patterns specified. Variants that pass all filters are displayed in a dynamic table, with the extensive annotations noted in Fig. 1 and hyperlinks where applicable (Supplementary Fig. S5). The Impact Summary column visually summarizes the pathogenicity evidence relating to a variant and its cognate gene (Fig. 1, Supplementary Fig. S5). Annotations place variants in the context of the functional genome, and can be dynamically shown using toggle buttons (Supplementary Fig. S6), and sorted by strength of evidence. A unique strength of Seave is that short variants that overlap CNVs or SVs from the same individual in the GBS are highlighted, allowing variants and CNVs to be jointly interrogated. Hyperlinks to control an IGV session are also provided.
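
As a rough illustration of how filtering on rarity, impact, depth and inheritance pattern combine, the following sketch applies all four to a toy trio. It is not Seave's or GEMINI's implementation: the `Variant` fields, the threshold defaults, the gene placeholders and the simplified de novo dominant rule (heterozygous in the proband, absent from both parents) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    impact: str          # "low", "medium" or "high"
    allele_freq: float   # population allele frequency
    min_depth: int       # lowest sequencing depth across the trio
    genotypes: dict      # role -> number of alternate alleles (0, 1 or 2)


def is_de_novo_dominant(variant):
    """Simplified rule: heterozygous in the proband, absent in both parents."""
    g = variant.genotypes
    return g["proband"] == 1 and g["mother"] == 0 and g["father"] == 0


def filter_variants(variants, max_af=0.01, impacts=("medium", "high"),
                    min_depth=10):
    """Keep variants passing rarity, impact, depth and inheritance filters.

    The default thresholds are illustrative, not recommended values.
    """
    return [
        v for v in variants
        if v.allele_freq <= max_af
        and v.impact in impacts
        and v.min_depth >= min_depth
        and is_de_novo_dominant(v)
    ]


# Toy trio data with placeholder gene names.
trio_variants = [
    Variant("GENE_A", "high", 0.0001, 30, {"proband": 1, "mother": 0, "father": 0}),
    Variant("GENE_B", "high", 0.2000, 35, {"proband": 1, "mother": 0, "father": 0}),
    Variant("GENE_C", "low", 0.0001, 40, {"proband": 1, "mother": 0, "father": 0}),
    Variant("GENE_D", "medium", 0.0005, 25, {"proband": 1, "mother": 1, "father": 0}),
    Variant("GENE_E", "medium", 0.0005, 4, {"proband": 1, "mother": 0, "father": 0}),
]

kept = filter_variants(trio_variants)
print([v.gene for v in kept])
```

Each filter removes a different toy variant (common, low impact, inherited, low depth), which mirrors how stacking independent criteria shrinks a callset quickly.
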
Results can be shared via their URL, or downloaded as TSV files that include important auditing information such as timestamps, the exact queries used and the versions of all annotations. There are a number of dedicated queries to interrogate CNVs and SVs, which partially rely on BEDTools (Quinlan and Hall, 2010) to perform interval querying logic. CNVs or SVs can be restricted by gene lists or genomic coordinates, copy number thresholds and minimum CNV size. The SV Fusions search mode is a powerful way to identify candidate gene fusions due to CNV or SV. The Method Overlaps query allows CNVs or SVs identified by multiple callers to be prioritized, whereas the Sample Overlaps query allows CNVs or SVs segregating in families to be prioritized. Finally, the ROHmer query is useful for variant filtering in consanguineous families, and identifies genomic regions of homozygosity that are shared by all affected individuals in a family but not by any unaffected individuals.
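
The interval logic described for the ROHmer query (regions shared by all affected individuals but by no unaffected individual) can be sketched in plain Python. This is an illustration of the set operations only, under the assumptions that intervals are half-open `(start, end)` pairs on a single chromosome and that each individual's list is sorted and non-overlapping; it does not reproduce ROHmer or the BEDTools calls that Seave uses, and the function names are hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Remove from the intervals in a every part covered by intervals in b."""
    out = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with the remaining part of this interval
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            out.append((cursor, end))
    return out


def shared_roh(affected, unaffected):
    """Regions homozygous in every affected and in no unaffected individual."""
    shared = affected[0]
    for regions in affected[1:]:
        shared = intersect(shared, regions)
    for regions in unaffected:
        shared = subtract(shared, regions)
    return shared


affected = [
    [(100, 500), (800, 1200)],
    [(200, 600), (900, 1000)],
]
unaffected = [
    [(300, 400)],
]
candidate_regions = shared_roh(affected, unaffected)
print(candidate_regions)
```

For a whole genome the same functions would be applied per chromosome.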

2.4 Sharing and security

Seave has a user management system that allows fine-grained control of data sharing. Databases are owned by a single group, and users can be members of any number of groups. Administrators can import data and manage users, groups, databases, GBS data and custom gene lists. All login events, data imports and exports, queries, gene list changes and user or group changes are audited, and all data is transferred and stored using encryption. Seave is written in PHP to be run on an Apache web server with a MySQL database.
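
The ownership rule described above (one owning group per database, any number of groups per user) and the auditing of access can be sketched as follows. Seave itself is written in PHP; this Python fragment is only an illustration, and the group names, user names, database names and log fields are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical example data: group -> members, database -> owning group.
group_members = {"lab_a": {"alice", "bob"}, "lab_b": {"bob"}}
database_owner = {"cohort_1.db": "lab_a", "cohort_2.db": "lab_b"}
audit_log = []


def can_access(user, database):
    """Allow access only if the user belongs to the database's owning group.

    Every decision, allowed or denied, is appended to the audit log.
    """
    owner = database_owner.get(database)
    allowed = owner is not None and user in group_members.get(owner, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "database": database,
        "allowed": allowed,
    })
    return allowed
```

Logging denied attempts alongside granted ones is a design choice made for this sketch; the text states only that queries and other events are audited.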

3 Conclusion

Seave was built to enable gene discovery research, diagnostics and precision cancer medicine from whole genome sequences. As a component of Australia’s first clinically accredited whole genome pathology service, Seave has met the rigorous demands of ISO 15189 clinical accreditation. In a research setting, it has been successfully used to discover novel disease genes and variants, and to rapidly support the diagnosis of patients with previously reported pathogenic variants (Balasubramaniam ,b; De Sousa ; Ewans ; Heimer ; Kumar ; Riley ). Cancer research requires the comprehensive interrogation of large numbers of somatic variants of all sizes and types at differing variant allele frequencies. Accordingly, Seave has been used for characterizing tumour evolution (Merlevede ) and as part of two precision cancer genomics programs: the Lions Kids Cancer Genome Project (LKCGP), part of the Zero Childhood Cancer Program for children with high-risk cancers, which uses whole genome sequencing; and the Molecular Screening and Therapeutics (MoST) program, which uses a targeted genomics screen to test targeted anti-cancer agents in patients with rare or advanced cancers. Finally, Seave has been used for training purposes in Australia and Hong Kong, across multiple clinical genomics data analysis workshops for clinical geneticists, researchers, laboratory scientists and other health professionals.

24 in total

1. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors: Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal: Fly (Austin) Date: 2012 Apr-Jun Impact factor: 2.160

2. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Authors: Alexej Abyzov; Alexander E Urban; Michael Snyder; Mark Gerstein
Journal: Genome Res Date: 2011-02-07 Impact factor: 9.043

3. MECR Mutations Cause Childhood-Onset Dystonia and Optic Atrophy, a Mitochondrial Fatty Acid Synthesis Disorder.

Authors: Gali Heimer; Juha M Kerätär; Lisa G Riley; Shanti Balasubramaniam; Eran Eyal; Laura P Pietikäinen; J Kalervo Hiltunen; Dina Marek-Yagel; Jeffrey Hamada; Allison Gregory; Caleb Rogers; Penelope Hogarth; Martha A Nance; Nechama Shalva; Alvit Veber; Michal Tzadok; Andreea Nissenkorn; Davide Tonduti; Florence Renaldo; Ichraf Kraoua; Celeste Panteghini; Lorella Valletta; Barbara Garavaglia; Mark J Cowley; Velimir Gayevskiy; Tony Roscioli; Jonathon M Silberstein; Chen Hoffmann; Annick Raas-Rothschild; Valeria Tiranti; Yair Anikster; John Christodoulou; Alexander J Kastaniotis; Bruria Ben-Zeev; Susan J Hayflick
Journal: Am J Hum Genet Date: 2016-11-03 Impact factor: 11.025

4. A SLC39A8 variant causes manganese deficiency, and glycosylation and mitochondrial disorders.

Authors: Lisa G Riley; Mark J Cowley; Velimir Gayevskiy; Tony Roscioli; David R Thorburn; Kristina Prelog; Melanie Bahlo; Carolyn M Sue; Shanti Balasubramaniam; John Christodoulou
Journal: J Inherit Metab Dis Date: 2016-12-19 Impact factor: 4.982

5. EPG5-Related Vici Syndrome: A Primary Defect of Autophagic Regulation with an Emerging Phenotype Overlapping with Mitochondrial Disorders.

Authors: Shanti Balasubramaniam; Lisa G Riley; Anand Vasudevan; Mark J Cowley; Velimir Gayevskiy; Carolyn M Sue; Caitlin Edwards; Edward Edkins; Reimar Junckerstorff; C Kiraly-Borri; P Rowe; J Christodoulou
Journal: JIMD Rep Date: 2017-11-21

6. A general framework for estimating the relative pathogenicity of human genetic variants.

Authors: Martin Kircher; Daniela M Witten; Preti Jain; Brian J O'Roak; Gregory M Cooper; Jay Shendure
Journal: Nat Genet Date: 2014-02-02 Impact factor: 38.330

7. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.

Authors: F Favero; T Joshi; A M Marquard; N J Birkbak; M Krzystanek; Q Li; Z Szallasi; A C Eklund
Journal: Ann Oncol Date: 2014-10-15 Impact factor: 32.976

8. Mutation allele burden remains unchanged in chronic myelomonocytic leukaemia responding to hypomethylating agents.

Authors: Jane Merlevede; Nathalie Droin; Tingting Qin; Kristen Meldi; Kenichi Yoshida; Margot Morabito; Emilie Chautard; Didier Auboeuf; Pierre Fenaux; Thorsten Braun; Raphael Itzykson; Stéphane de Botton; Bruno Quesnel; Thérèse Commes; Eric Jourdan; William Vainchenker; Olivier Bernard; Noemie Pata-Merci; Stéphanie Solier; Velimir Gayevskiy; Marcel E Dinger; Mark J Cowley; Dorothée Selimoglu-Buet; Vincent Meyer; François Artiguenave; Jean-François Deleuze; Claude Preudhomme; Michael R Stratton; Ludmil B Alexandrov; Eric Padron; Seishi Ogawa; Serge Koscielny; Maria Figueroa; Eric Solary
Journal: Nat Commun Date: 2016-02-24 Impact factor: 14.919

9. LUMPY: a probabilistic framework for structural variant discovery.

Authors: Ryan M Layer; Colby Chiang; Aaron R Quinlan; Ira M Hall
Journal: Genome Biol Date: 2014-06-26 Impact factor: 13.583

10. Defining the genetic basis of early onset hereditary spastic paraplegia using whole genome sequencing.

Authors: Kishore R Kumar; G M Wali; Mahesh Kamate; Gautam Wali; André E Minoche; Clare Puttick; Mark Pinese; Velimir Gayevskiy; Marcel E Dinger; Tony Roscioli; Carolyn M Sue; Mark J Cowley
Journal: Neurogenetics Date: 2016-09-28 Impact factor: 2.660
