Velimir Gayevskiy (1), Tony Roscioli (2,3,4), Marcel E Dinger (1,5,6), Mark J Cowley (1,5,7)
1. Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.
2. Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, NSW, Australia.
3. Prince of Wales Clinical School, University of New South Wales, UNSW Sydney, NSW, Australia.
4. Neuroscience Research Australia, University of New South Wales, UNSW Sydney, NSW, Australia.
5. St Vincent's Clinical School, University of New South Wales, UNSW Sydney, NSW, Australia.
6. Genome.One, Darlinghurst, NSW, Australia.
7. Children's Cancer Institute, UNSW Sydney, NSW, Australia.
Abstract
Motivation: Genome sequencing has had a remarkable impact on our ability to study the effects of human genetic variation; however, variant interpretation remains the major bottleneck. Understanding the potential impact of variants, including structural variants, requires extensive annotation from disparate sources of knowledge and from in silico prediction algorithms.
Results: We introduce Seave, an intuitive web platform that enables all types of variants to be securely stored, annotated and filtered. Variants are annotated with allele frequencies and pathogenicity assessments from many popular databases and with in silico pathogenicity prediction scores. Seave enables filtering of variants with specific inheritance patterns, including somatic variants, by quality, allele frequencies and gene lists that can be curated and saved. Seave was made for whole genome data and is capable of storing and querying copy number and structural variants.
Availability and implementation: To demo Seave with public data, see https://www.seave.bio. Source code is available at http://code.seave.bio and extensive documentation is available at http://documentation.seave.bio. Seave can be locally installed on an Apache server with PHP and MySQL, or we provide an Amazon Machine Image for quick deployment. For commercial and clinical diagnostic licensing, contact the corresponding author.
Supplementary information: Supplementary data are available at Bioinformatics online.
The rapid adoption of human genome sequencing has made substantial inroads in our understanding of the impact of genetic variation on health and disease (Delaney ). Vast catalogues of genetic variants now exist, tens of thousands of which have been unequivocally linked to disease. Through advances in genomic technologies and bioinformatic methodologies, it is feasible to comprehensively identify all classes of genomic variation within an individual’s genome, ranging from single nucleotide variants (SNVs) and short insertions or deletions (Indels), to large copy number variants (CNVs), structural variants (SVs) and mobile element insertions (MEIs), any of which may contribute to their phenotype.
Interpreting the potential impact of any variant is a difficult task (Amendola ), and is a major impediment to widespread adoption of genomic medicine. Variant interpretation requires an assessment of data quality, and investigating dozens of resources, including databases of genomic variation in healthy controls, from patients with disease, resources linking genes to phenotype or disease and the latest literature. All of this information must be kept up to date. Interpreting the impact of novel variants can be supported by potentially dozens of in silico pathogenicity scores. CNVs of any size and SVs are important sources of pathogenic variants, and should be considered alongside short variants. Importantly, this genomic complexity must be distilled, and presented in a way which is accessible to all researchers, clinicians and laboratory staff.
To address these challenges, we developed Seave, a web-based variant filtration platform that stores, queries and annotates genomic variation of all sizes. It is designed for clinicians and researchers, primarily for rare disease and cancer, and requires no knowledge of bioinformatics to use.
2 Seave description
2.1 Scope
Seave was designed from the outset to handle whole-genome-sized variant callsets from tens of thousands of patients, and also supports data from targeted sequencing panels of any size. Seave supports the following classes of genetic variants: SNVs, Indels, CNVs, SVs and runs of homozygosity (ROH), from the nuclear and mitochondrial genomes. Due to the large file sizes from whole-genome sequencing (i.e. ∼5M variants per patient, and >3 Gb compressed VCF files), Seave is designed to automatically receive data generated by production analysis pipelines via an API (Fig. 1).
Fig. 1.
Schematic overview of Seave’s main features. Arrows represent the flow of information through sequencing, variant detection, data storage, filtration, annotation, interpretation and outcomes
2.2 Data import and annotation
Seave uses GEMINI (Paila ) databases to store, manage and query genome data, where each database represents a cohort (Supplementary Fig. S1). GEMINI databases are portable, convenient, sharable and allow Seave to scale to vast numbers of individuals by simply increasing disk space. Users can group samples within a cohort into families, and annotate individuals with their gender and affected status, interactively, or via a PED file.
Importing data starts with annotating VCF files containing SNVs and Indels from one or more individuals using Variant Effect Predictor (VEP; McLaren ) or SnpEff (Cingolani ), then converting these into a GEMINI database. These databases are then imported into Seave using an API, or the administration interface. Seave manages a MySQL annotation database, which complements the annotations from VEP and GEMINI, to provide pre-computed, extensive in silico prediction scores, gene-phenotype-disease links and allele frequencies in healthy controls and various diseases (Fig. 1, Supplementary Data). To keep these annotations current, we provide tools for refreshing those annotations whose source databases are regularly updated. Seave supports many popular variant callers including GATK HaplotypeCaller (McKenna ), FreeBayes (Garrison and Marth, 2012) and somatic variant callers including Strelka (Saunders ), MuTect2 (Cibulskis ) and VarDict (Lai ). Seave currently supports only the GRCh37 (hg19) reference genome.
To support large CNVs and SVs, we developed the Genome Block Store (GBS). The GBS is a scalable MySQL database designed to store large genomic segments or blocks (e.g. deletions, duplications, inversions, MEIs or ROH), or linked blocks (e.g. gene fusion breakpoints), with additional annotations (e.g. copy number or breakpoint read depth). A number of popular tools are supported, including CNVnator (Abyzov ), LUMPY (Layer ), Sequenza (Favero ), ClinSV (Minoche et al., manuscript in preparation), Manta (Chen ), CNVkit (Talevich ) and ROHmer (Puttick et al., manuscript in preparation).
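The block and linked-block model of the GBS can be illustrated with a minimal Python sketch. This is not Seave's actual schema: the table names, column names, helper functions and half-open coordinate convention below are all assumptions made for illustration, and an in-memory SQLite database stands in for MySQL so that the example is self-contained. The sample name and annotation values are made up.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not Seave's.
SCHEMA = """
CREATE TABLE blocks (
    id INTEGER PRIMARY KEY,
    sample TEXT NOT NULL,
    method TEXT NOT NULL,       -- calling tool that reported the block
    event_type TEXT NOT NULL,   -- e.g. deletion, duplication, ROH, breakpoint
    chrom TEXT NOT NULL,
    start_pos INTEGER NOT NULL, -- assumed half-open interval [start_pos, end_pos)
    end_pos INTEGER NOT NULL
);
CREATE TABLE block_annotations (
    block_id INTEGER NOT NULL REFERENCES blocks(id),
    tag TEXT NOT NULL,          -- e.g. copy_number, read_depth
    value TEXT NOT NULL
);
CREATE TABLE block_links (
    block_a INTEGER NOT NULL REFERENCES blocks(id),
    block_b INTEGER NOT NULL REFERENCES blocks(id),
    link_type TEXT NOT NULL     -- e.g. fusion
);
"""

def open_store():
    """Create an empty in-memory store (SQLite stands in for MySQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_block(conn, sample, method, event_type, chrom, start, end, annotations=None):
    """Insert one genomic block with optional key/value annotations; return its id."""
    cur = conn.execute(
        "INSERT INTO blocks (sample, method, event_type, chrom, start_pos, end_pos) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sample, method, event_type, chrom, start, end),
    )
    block_id = cur.lastrowid
    for tag, value in (annotations or {}).items():
        conn.execute(
            "INSERT INTO block_annotations (block_id, tag, value) VALUES (?, ?, ?)",
            (block_id, tag, str(value)),
        )
    return block_id

def link_blocks(conn, block_a, block_b, link_type):
    """Record that two blocks belong together, e.g. the two ends of a fusion."""
    conn.execute(
        "INSERT INTO block_links (block_a, block_b, link_type) VALUES (?, ?, ?)",
        (block_a, block_b, link_type),
    )

def blocks_overlapping(conn, sample, chrom, start, end, min_size=0):
    """Return (id, event_type, start, end) for blocks overlapping [start, end)."""
    rows = conn.execute(
        "SELECT id, event_type, start_pos, end_pos FROM blocks "
        "WHERE sample = ? AND chrom = ? AND start_pos < ? AND end_pos > ? "
        "AND (end_pos - start_pos) >= ? ORDER BY start_pos",
        (sample, chrom, end, start, min_size),
    )
    return rows.fetchall()

def linked_partners(conn, block_id):
    """Return the ids of all blocks linked to block_id, in either direction."""
    rows = conn.execute(
        "SELECT block_b FROM block_links WHERE block_a = ? "
        "UNION SELECT block_a FROM block_links WHERE block_b = ?",
        (block_id, block_id),
    )
    return sorted(row[0] for row in rows)

# Usage with made-up values: one deletion with a copy-number annotation.
store = open_store()
deletion = add_block(store, "proband", "CNVnator", "deletion", "1", 1000, 5000,
                     {"copy_number": 1})
print(blocks_overlapping(store, "proband", "1", 4000, 6000, min_size=1000))
```

Keeping annotations in a separate key/value table is one way to let different callers attach different fields to the same kind of block; whether the GBS does this is not stated in the text.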
2.3 Filtering
After selecting a cohort for analysis (Supplementary Fig. S1), users can optionally filter short variants by inheritance pattern, including heterozygous dominant, homozygous recessive, de novo dominant and compound heterozygous (Supplementary Fig. S2). On the Query page, the genomic search space can be restricted or specifically excluded by using any number of genomic coordinates, curated gene lists from Seave’s gene list management system, or custom gene symbols (Supplementary Fig. S3). Variants can be restricted by their impact: low (e.g. synonymous), medium (e.g. missense), high (e.g. nonsense, frameshift and essential splice region), or coding, by CADD score (Kircher ) and population allele frequencies. Technical filtering parameters include minimum sequencing depth in all samples, minimum variant quality, excluding failed variants and the type and number of variants to return.
A typical family trio sequenced by whole genome sequencing yields 6 million variants, and this number is rapidly reduced to below 200 simply by filtering on rarity, impact and inheritance pattern (Supplementary Fig. S4). Most queries take 0–5 s to execute, but this can stretch up to 2 min for large cohorts of whole genomes with inheritance patterns specified.
Variants that pass all filters are displayed in a dynamic table, with the extensive annotations noted in Fig. 1 and hyperlinks where applicable (Supplementary Fig. S5). The Impact Summary column visually summarizes the pathogenicity evidence relating to a variant and its cognate gene (Fig. 1, Supplementary Fig. S5). Annotations place variants in the context of the functional genome, and can be dynamically shown using toggle buttons (Supplementary Fig. S6), and sorted by strength of evidence. A unique strength of Seave is that short variants that overlap CNVs or SVs from the same individual in the GBS are highlighted, allowing variants and CNVs to be jointly interrogated. Hyperlinks to control an IGV session are also provided.
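The combination of rarity, impact and inheritance filters described in this section can be illustrated with a minimal Python sketch. It is not Seave's implementation, which queries GEMINI databases. The Variant and Trio classes, the genotype encoding, the 1% allele frequency cut-off and the two simplified inheritance rules are assumptions for illustration, and they presume an affected child with two unaffected parents.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed genotype encoding: number of alternate alleles carried.
HOM_REF, HET, HOM_ALT = 0, 1, 2

@dataclass
class Variant:
    gene: str
    impact: str                # "low", "medium" or "high"
    max_af: float              # highest population allele frequency seen
    genotypes: Dict[str, int]  # sample name -> genotype code

@dataclass
class Trio:
    child: str   # affected
    mother: str  # assumed unaffected
    father: str  # assumed unaffected

def is_de_novo_dominant(v: Variant, trio: Trio) -> bool:
    """Child is heterozygous; neither parent carries the variant."""
    g = v.genotypes
    return (g[trio.child] == HET
            and g[trio.mother] == HOM_REF
            and g[trio.father] == HOM_REF)

def is_homozygous_recessive(v: Variant, trio: Trio) -> bool:
    """Child is homozygous for the variant; both parents are carriers."""
    g = v.genotypes
    return (g[trio.child] == HOM_ALT
            and g[trio.mother] == HET
            and g[trio.father] == HET)

INHERITANCE = {
    "de_novo_dominant": is_de_novo_dominant,
    "homozygous_recessive": is_homozygous_recessive,
}

def filter_variants(variants: List[Variant], trio: Trio, pattern: str,
                    max_af: float = 0.01,
                    impacts: Tuple[str, ...] = ("medium", "high")) -> List[Variant]:
    """Keep variants that are rare, impactful and fit the inheritance pattern."""
    matches = INHERITANCE[pattern]
    return [v for v in variants
            if v.max_af <= max_af and v.impact in impacts and matches(v, trio)]

# Usage with made-up variants in placeholder genes.
trio = Trio(child="child", mother="mum", father="dad")
variants = [
    Variant("GENE_A", "high", 0.0, {"child": HET, "mum": HOM_REF, "dad": HOM_REF}),
    Variant("GENE_B", "medium", 0.2, {"child": HET, "mum": HOM_REF, "dad": HOM_REF}),
    Variant("GENE_C", "medium", 0.001, {"child": HOM_ALT, "mum": HET, "dad": HET}),
]
print([v.gene for v in filter_variants(variants, trio, "de_novo_dominant")])
```

Each filter is independent, so most of the reduction in variant count comes from applying them together; compound heterozygous filtering is omitted here because it requires pairing variants within a gene.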
Results can be shared via their URL, or downloaded as TSV, which includes important auditing information, including timestamps, exact queries used and the versions of all annotations.
There are a number of dedicated queries to interrogate CNVs and SVs, which partially rely on BEDTools (Quinlan and Hall, 2010) to perform interval querying logic. CNVs or SVs can be restricted by gene lists or genomic coordinates, copy number thresholds and minimum CNV size. The SV Fusions search mode is a powerful way to identify candidate gene fusions due to CNV or SV. The Method Overlaps query allows CNVs or SVs identified by multiple callers to be prioritized, whereas the Sample Overlaps query allows CNVs or SVs segregating in families to be prioritized. Finally, the ROHmer query is useful for variant filtering in consanguineous families, and identifies genomic regions of homozygosity that are shared by all affected individuals in a family but not by any unaffected individuals.
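The interval logic behind a query for regions of homozygosity shared by all affected but no unaffected individuals can be sketched in plain Python as an intersection followed by a subtraction. This is an illustration of the idea only, not the ROHmer or Seave implementation (the text notes that Seave partially relies on BEDTools for interval logic). The function names and the half-open, single-chromosome coordinate convention are assumptions, and each individual's ROH list is assumed to contain non-overlapping intervals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end) on one chromosome

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions present in both lists (each list must be non-overlapping)."""
    a, b = sorted(a), sorted(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Parts of the intervals in a that are not covered by any interval in b."""
    b = sorted(b)
    out: List[Interval] = []
    for start, end in sorted(a):
        cur = start
        for b_start, b_end in b:
            if b_end <= cur:
                continue          # b interval lies entirely before the cursor
            if b_start >= end:
                break             # b interval lies entirely after this region
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
            if cur >= end:
                break
        if cur < end:
            out.append((cur, end))
    return out

def shared_roh(affected: List[List[Interval]],
               unaffected: List[List[Interval]]) -> List[Interval]:
    """ROH shared by every affected individual and absent from all unaffected."""
    if not affected:
        return []
    shared = sorted(affected[0])
    for roh in affected[1:]:
        shared = intersect(shared, roh)
    for roh in unaffected:
        shared = subtract(shared, roh)
    return shared

# Usage with made-up coordinates: two affected siblings, one unaffected parent.
sib1 = [(100, 500), (800, 1000)]
sib2 = [(200, 600), (900, 1200)]
parent = [(300, 400)]
print(shared_roh([sib1, sib2], [parent]))
```

In a real cohort the same computation would be repeated per chromosome, and the surviving regions would then be used to restrict the short-variant search space.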
2.4 Sharing and security
Seave has a user management system, allowing fine grained data sharing control. Databases are owned by a single group, and users can be members of any number of groups. Administrators can import data, manage users, groups, databases, GBS data and custom gene lists. All login events, data import/export, queries, gene list and user or group changes are audited, and all data is transferred and stored using encryption. Seave is written in PHP to be run on an Apache web server with a MySQL database.
3 Conclusion
Seave was built to enable gene discovery research, diagnostics and precision cancer medicine from whole genome sequences. As a component of Australia’s first clinically accredited whole genome pathology service, Seave has met the rigorous demands of ISO 15189 clinical accreditation. In a research setting, it has been successfully used to discover novel disease genes and variants, as well as rapidly supporting the diagnosis of patients with previously reported pathogenic variants (Balasubramaniam ,b; De Sousa ; Ewans ; Heimer ; Kumar ; Riley ). Cancer research requires the comprehensive interrogation of large numbers of somatic variants of all sizes and types at differing variant allele frequencies. Accordingly, Seave has been used for characterizing tumour evolution (Merlevede ) and as part of two precision cancer genomics programs: the Lions Kids Cancer Genome Project (LKCGP), part of the Zero Childhood Cancer Program, which uses whole genome sequencing for children with high-risk cancers; and the Molecular Screening and Therapeutics (MoST) program, which uses a targeted genomics screen to test targeted anti-cancer agents in patients with rare or advanced cancers. Finally, Seave has been used for training purposes in Australia and Hong Kong, across multiple clinical genomics data analysis workshops for clinical geneticists, researchers, laboratory scientists and other health professionals.
Authors: Gali Heimer; Juha M Kerätär; Lisa G Riley; Shanti Balasubramaniam; Eran Eyal; Laura P Pietikäinen; J Kalervo Hiltunen; Dina Marek-Yagel; Jeffrey Hamada; Allison Gregory; Caleb Rogers; Penelope Hogarth; Martha A Nance; Nechama Shalva; Alvit Veber; Michal Tzadok; Andreea Nissenkorn; Davide Tonduti; Florence Renaldo; Ichraf Kraoua; Celeste Panteghini; Lorella Valletta; Barbara Garavaglia; Mark J Cowley; Velimir Gayevskiy; Tony Roscioli; Jonathon M Silberstein; Chen Hoffmann; Annick Raas-Rothschild; Valeria Tiranti; Yair Anikster; John Christodoulou; Alexander J Kastaniotis; Bruria Ben-Zeev; Susan J Hayflick Journal: Am J Hum Genet Date: 2016-11-03 Impact factor: 11.025
Authors: Lisa G Riley; Mark J Cowley; Velimir Gayevskiy; Tony Roscioli; David R Thorburn; Kristina Prelog; Melanie Bahlo; Carolyn M Sue; Shanti Balasubramaniam; John Christodoulou Journal: J Inherit Metab Dis Date: 2016-12-19 Impact factor: 4.982
Authors: Shanti Balasubramaniam; Lisa G Riley; Anand Vasudevan; Mark J Cowley; Velimir Gayevskiy; Carolyn M Sue; Caitlin Edwards; Edward Edkins; Reimar Junckerstorff; C Kiraly-Borri; P Rowe; J Christodoulou Journal: JIMD Rep Date: 2017-11-21
Authors: Martin Kircher; Daniela M Witten; Preti Jain; Brian J O'Roak; Gregory M Cooper; Jay Shendure Journal: Nat Genet Date: 2014-02-02 Impact factor: 38.330
Authors: F Favero; T Joshi; A M Marquard; N J Birkbak; M Krzystanek; Q Li; Z Szallasi; A C Eklund Journal: Ann Oncol Date: 2014-10-15 Impact factor: 32.976
Authors: Jane Merlevede; Nathalie Droin; Tingting Qin; Kristen Meldi; Kenichi Yoshida; Margot Morabito; Emilie Chautard; Didier Auboeuf; Pierre Fenaux; Thorsten Braun; Raphael Itzykson; Stéphane de Botton; Bruno Quesnel; Thérèse Commes; Eric Jourdan; William Vainchenker; Olivier Bernard; Noemie Pata-Merci; Stéphanie Solier; Velimir Gayevskiy; Marcel E Dinger; Mark J Cowley; Dorothée Selimoglu-Buet; Vincent Meyer; François Artiguenave; Jean-François Deleuze; Claude Preudhomme; Michael R Stratton; Ludmil B Alexandrov; Eric Padron; Seishi Ogawa; Serge Koscielny; Maria Figueroa; Eric Solary Journal: Nat Commun Date: 2016-02-24 Impact factor: 14.919
Authors: Kishore R Kumar; G M Wali; Mahesh Kamate; Gautam Wali; André E Minoche; Clare Puttick; Mark Pinese; Velimir Gayevskiy; Marcel E Dinger; Tony Roscioli; Carolyn M Sue; Mark J Cowley Journal: Neurogenetics Date: 2016-09-28 Impact factor: 2.660