Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information.

Thomas Schmitt, David N Messina, Fabian Schreiber, Erik L L Sonnhammer.   

Abstract

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, which are the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
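
To make the contrast with FASTA headers concrete, the following is a minimal sketch of how a SeqXML-style record keeps metadata in named fields and lets a consumer flag erroneous sequence content. The embedded document, its element and attribute names (`entry`, `species`, `ncbiTaxID`, `AAseq`), the identifiers, and the helper functions are illustrative assumptions rather than a copy of the official schema or of the authors' BioJava, BioPerl, or Biopython implementations. The residue check is a hand-written stand-in for the schema validation the abstract mentions, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical SeqXML-style document. Element and attribute names are
# assumptions for illustration; consult the published schema for the real ones.
SEQXML_DOC = """<seqXML source="ExampleDB" sourceVersion="1">
  <entry id="P00001" source="ExampleDB">
    <species name="Homo sapiens" ncbiTaxID="9606"/>
    <description>Example protein</description>
    <AAseq>MKTAYIAKQR</AAseq>
  </entry>
  <entry id="P00002" source="ExampleDB">
    <species name="Mus musculus" ncbiTaxID="10090"/>
    <description>Entry with a corrupted residue</description>
    <AAseq>MKT1YIAKQR</AAseq>
  </entry>
</seqXML>
"""

# Permissive amino-acid alphabet (assumed here, not taken from the paper).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")


def read_entries(xml_text):
    """Return one dict per <entry>, with metadata in named fields."""
    records = []
    for entry in ET.fromstring(xml_text).iter("entry"):
        species = entry.find("species")
        records.append({
            "id": entry.get("id"),
            "species": species.get("name"),
            "taxon": int(species.get("ncbiTaxID")),
            "description": entry.findtext("description", default=""),
            "sequence": entry.findtext("AAseq", default=""),
        })
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not in the assumed alphabet."""
    return sorted(set(sequence.upper()) - AMINO_ACIDS)


records = read_entries(SEQXML_DOC)

# Map entry id -> offending characters, only for entries that have any.
problems = {}
for record in records:
    bad = invalid_residues(record["sequence"])
    if bad:
        problems[record["id"]] = bad

print(problems)
```

Because species name and taxon identifier are separate attributes, no header-splitting convention has to be guessed, which is the problem with unstructured FASTA headers that the abstract describes.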

Year:  2011        PMID: 21666252     DOI: 10.1093/bib/bbr025

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  30 in total

1.  GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis.

Authors:  Bruno Contreras-Moreira; Pablo Vinuesa
Journal:  Appl Environ Microbiol       Date:  2013-10-04       Impact factor: 4.792

2.  Linking genome annotation projects with genetic disorders using ontologies.

Authors:  María del Carmen Legaz-García; José Antonio Miñarro-Giménez; Marisa Madrid; Marcos Menárguez-Tortosa; Santiago Torres Martínez; Jesualdo Tomás Fernández-Breis
Journal:  J Med Syst       Date:  2012-11       Impact factor: 4.460

3.  Scripting Analyses of Genomes in Ensembl Plants.

Authors:  Bruno Contreras-Moreira; Guy Naamati; Marc Rosello; James E Allen; Sarah E Hunt; Matthieu Muffato; Astrid Gall; Paul Flicek
Journal:  Methods Mol Biol       Date:  2022

4.  Roundup 2.0: enabling comparative genomics for over 1800 genomes.

Authors:  Todd F DeLuca; Jike Cui; Jae-Yoon Jung; Kristian Che St Gabriel; Dennis P Wall
Journal:  Bioinformatics       Date:  2012-01-13       Impact factor: 6.937

5.  Toward community standards in the quest for orthologs.

Authors:  Christophe Dessimoz; Toni Gabaldón; David S Roos; Erik L L Sonnhammer; Javier Herrero
Journal:  Bioinformatics       Date:  2012-02-12       Impact factor: 6.937

6.  NeXML: rich, extensible, and verifiable representation of comparative data and metadata.

Authors:  Rutger A Vos; James P Balhoff; Jason A Caravas; Mark T Holder; Hilmar Lapp; Wayne P Maddison; Peter E Midford; Anurag Priyam; Jeet Sukumaran; Xuhua Xia; Arlin Stoltzfus
Journal:  Syst Biol       Date:  2012-02-22       Impact factor: 15.683

7.  The Quest for Orthologs orthology benchmark service in 2022.

Authors:  Yannis Nevers; Tamsin E M Jones; Dushyanth Jyothi; Bethan Yates; Meritxell Ferret; Laura Portell-Silva; Laia Codo; Salvatore Cosentino; Marina Marcet-Houben; Anna Vlasova; Laetitia Poidevin; Arnaud Kress; Mark Hickman; Emma Persson; Ivana Piližota; Cristina Guijarro-Clarke; Wataru Iwasaki; Odile Lecompte; Erik Sonnhammer; David S Roos; Toni Gabaldón; David Thybert; Paul D Thomas; Yanhui Hu; David M Emms; Elspeth Bruford; Salvador Capella-Gutierrez; Maria J Martin; Christophe Dessimoz; Adrian Altenhoff
Journal:  Nucleic Acids Res       Date:  2022-05-12       Impact factor: 19.160

8.  Exploring the utility of cross-laboratory RAD-sequencing datasets for phylogenetic analysis.

Authors:  Serap Gonen; Stephen C Bishop; Ross D Houston
Journal:  BMC Res Notes       Date:  2015-07-08

9.  OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis.

Authors:  Matthew D Whiteside; Geoffrey L Winsor; Matthew R Laird; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2012-11-29       Impact factor: 16.971

10.  PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome.

Authors:  Jaime Huerta-Cepas; Salvador Capella-Gutiérrez; Leszek P Pryszcz; Marina Marcet-Houben; Toni Gabaldón
Journal:  Nucleic Acids Res       Date:  2013-11-25       Impact factor: 16.971
