NCBI's Virus Discovery Codeathon: Building "FIVE" - The Federated Index of Viral Experiments API Index.

Joan Martí-Carreras, Alejandro Rafael Gener, Sierra D Miller, Anderson F Brito, Christiam E Camacho, Ryan Connor, Ward Deboutte, Cody Glickman, David M Kristensen, Wynn K Meyer, Sejal Modha, Alexis L Norris, Surya Saha, Anna K Belford, Evan Biederstedt, James Rodney Brister, Jan P Buchmann, Nicholas P Cooley, Robert A Edwards, Kiran Javkar, Michael Muchow, Harihara Subrahmaniam Muralidharan, Charles Pepe-Ranney, Nidhi Shah, Migun Shakya, Michael J Tisza, Benjamin J Tully, Bert Vanmechelen, Valerie C Virta, J L Weissman, Vadim Zalunin, Alexandre Efremov, Ben Busby.

Abstract

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, one consequence of previously decentralized (unfederated) data is a lack of consensus on, and comparison between, feature annotations. Unifying annotations, or displaying the alternatives side by side, should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for efficient access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier ways to access or query information. To address this, a Python API query system was developed to make FIVE more accessible.
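
To illustrate the idea of a federated index that keeps alternative annotations side by side, the following is a minimal sketch only. It uses Python's built-in sqlite3 as a local stand-in for BigQuery, and the table names, column names, accessions, and hosts are all hypothetical placeholders, not the actual FIVE schema or API.

```python
import sqlite3


def build_index(conn):
    """Create two hypothetical source tables that share an accession key."""
    conn.executescript(
        """
        CREATE TABLE domain_annotation (accession TEXT, source TEXT, domain TEXT);
        CREATE TABLE host_pairing (accession TEXT, host TEXT);
        """
    )
    # Two sources annotate the same record differently; both rows are kept.
    conn.executemany(
        "INSERT INTO domain_annotation VALUES (?, ?, ?)",
        [
            ("ACC_0001", "source_a", "capsid"),
            ("ACC_0001", "source_b", "capsid-like"),
            ("ACC_0002", "source_a", "polymerase"),
        ],
    )
    conn.executemany(
        "INSERT INTO host_pairing VALUES (?, ?)",
        [("ACC_0001", "host_x"), ("ACC_0002", "host_y")],
    )


def annotations_for(conn, accession):
    """Return (source, domain, host) rows for one accession, ordered by source."""
    cursor = conn.execute(
        """
        SELECT d.source, d.domain, h.host
        FROM domain_annotation AS d
        LEFT JOIN host_pairing AS h ON h.accession = d.accession
        WHERE d.accession = ?
        ORDER BY d.source
        """,
        (accession,),  # parameterized to avoid building SQL from raw strings
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn)
for row in annotations_for(conn, "ACC_0001"):
    print(row)
```

The join on a shared accession key is what lets annotations from separate sources be compared in a single query; a thin Python wrapper such as the hypothetical `annotations_for` spares users from writing the SQL themselves.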

Keywords:  CRISPR; HIV-1; data federation; genome graphs; metagenomics; protein domain; virus

Year:  2020        PMID: 33322070      PMCID: PMC7764237          DOI: 10.3390/v12121424

Source DB:  PubMed          Journal:  Viruses        ISSN: 1999-4915            Impact factor:   5.048

