OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs.

Robert M Waterhouse, Fredrik Tegenfeldt, Jia Li, Evgeny M Zdobnov, Evgenia V Kriventseva.

Abstract

The concept of orthology provides a foundation for formulating hypotheses on gene and genome evolution, and thus forms the cornerstone of comparative genomics, phylogenomics and metagenomics. We present the update of OrthoDB, the hierarchical catalog of orthologs (http://www.orthodb.org). From its conception, OrthoDB promoted delineation of orthologs at varying resolution by explicitly referring to the hierarchy of species radiations, now also adopted by other resources. The current release provides comprehensive coverage of animals and fungi representing 252 eukaryotic species, and is now extended to prokaryotes with the inclusion of 1115 bacteria. Functional annotations of orthologous groups are provided through mapping to InterPro, GO, OMIM and model organism phenotypes, with cross-references to major resources including UniProt, NCBI and FlyBase. Uniquely, OrthoDB provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates and sibling groups, now extended with exon-intron architectures, syntenic orthologs and parent-child trees. The interactive web interface allows navigation along the species phylogenies, complex queries with various identifiers, annotation keywords and phrases, as well as with gene copy-number profiles and sequence homology searches. With the explosive growth of available data, OrthoDB also provides mapping of newly sequenced genomes and transcriptomes to the current orthologous groups.

Year: 2012 PMID: 23180791 PMCID: PMC3531149 DOI: 10.1093/nar/gks1116

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Homology in molecular biology refers to a common ancestry. In practice, homologous genes are recognized through the assessment of the statistical significance of sequence similarities of aligned nucleotides or amino acids. With reference to a specific species radiation, homologous relations define orthologs—‘equivalent’ genes in different species descended from a single ancestral gene (1–3). Speciation events, gene duplications, losses and sequence mutations lead to the diversity of genes encoded in the genomes of modern species. For any given set of species, all the descendants of a single gene from their last common ancestor constitute an orthologous group of genes. Orthology is therefore inherently hierarchical, referring explicitly to the last common ancestor, such that mostly one-to-one orthologs are identified among closely related species, whereas among more distantly related species orthologous groups comprise all surviving descendants of the ancestral gene. There are two main approaches for orthology delineation: (i) algorithms that cluster all-against-all pairwise sequence comparisons, usually first identifying best-reciprocal matches between genomes that correspond to the shortest path over the speciation node of a distance-based tree, e.g. (4–12); and (ii) phylogeny-based methods that first define homologous gene families, build gene trees for each family, and then explicitly or implicitly reconcile them with the species tree often employing assumptions on rates of gene losses and duplications, e.g. (13–18). Phylogeny-based approaches have more parameters and may therefore yield better accuracy given sufficient data, but are often limited by the quality of multiple sequence alignments. This approach also considerably increases computational demands and becomes impractical for hundreds of species. 
Recent benchmarking of prominent orthology resources (19,20) shows that in the trade-off between specificity and sensitivity, OrthoDB assignments favor greater specificity with reasonable sensitivity, a balance that is well-suited to the goal of inferring gene functions. Although orthology is strictly an evolutionary concept, it can support the tentative transfer of functional annotations from well-studied organisms to orthologs in newly sequenced species. The confidence of such hypotheses on gene function may be qualitatively gauged by the genes’ evolutionary histories, e.g. more confident inferences may be made for orthologs that are preserved across many species mostly as single-copy genes, with relatively low levels of sequence divergence, and consistent protein domain architectures. Gene duplicates in multi-copy orthologous groups often exhibit greater sequence divergence than single-copy orthologs (21), and as this may reflect biological innovation, any inferences on gene function should be made cautiously. OrthoDB classifications have proved to be accurate and biologically relevant as assessed within the framework of several recent genome projects, e.g. (22–26). Thus, the evolutionary characterization of orthologous groups in OrthoDB, collated with available gene functional annotations, provides a strong basis for making informed hypotheses that can drive evolutionary and molecular biology research.

SPECIES SAMPLING

The current OrthoDB release includes more than 250 eukaryotes and now also extends to cover prokaryotes with a total of 1115 bacterial species (Table 1, Supplementary Table S1). The predicted protein-coding gene sets and their corresponding General Feature Format (GFF) annotations for 52 vertebrate species were retrieved from Ensembl (27) (Release 67, May 2012). Data for the 45 arthropods were sourced from AphidBase (28), BeetleBase (29), FlyBase (30), Hymenoptera Genome Database (31), SilkDB (32), VectorBase (33), wFleaBase (34) and several genome consortia (as of July 2012). Gene sets for an additional 13 basal animal species were retrieved from Ensembl Genomes (35) and the Joint Genome Institute (36) (as of July 2012). The 142 fungal gene sets were retrieved from UniProt (37) (July 2012 release) and the bacteria were retrieved from NCBI (38) (Supplementary Table S1).

Table 1.

OrthoDB species and gene content

Lineage / representative species	Input genes (total)	Input genes (average per species)	Classified genes (%)	Classified genes in groups with annotation(s)^a (%)	Classified genes in groups with phenotype(s)^b (%)
52 Vertebrates	951 245	18 293	92.7	96.3	48.4
Homo sapiens	20 827	na	94.9	93.5	45.6
Mus musculus	23 075	na	87.0	96.5	47.9
Danio rerio	26 206	na	80.7	96.9	48.5
45 Arthropods	746 324	16 585	71.1	87.1	25.1
Drosophila melanogaster	13 927	na	96.1	86.5	26.6
110^c Metazoa	1 974 947	17 954	81.9	93.5	60.8
Caenorhabditis elegans	20 517	na	71.5	84.7	61.4
142 Fungi	1 223 848	8619	85.0	86.8	49.3
Saccharomyces cerevisiae	6652	na	96.2	91.9	94.8
1115 Bacteria	3 532 434	3168	91.0	91.6	47.1
Escherichia coli	4149	na	97.8	97.7	98.8
Haemophilus influenzae	1657	na	98.2	98.8	85.3
Mycobacterium tuberculosis	3977	na	95.5	93.3	35.9

Statistics describing OrthoDB species coverage of vertebrate, arthropod, basal metazoan, fungal and bacterial orthologs with rich functional annotations.

^a GO terms or InterPro domains.

^b From Online Mendelian Inheritance in Man, the Mouse Genome Database, the Zebrafish Model Organism Database, FlyBase, WormBase, Saccharomyces Genome Database, EcoGene or the Database of Essential Genes.

^c 13 basal metazoan species plus 52 vertebrates and 45 arthropods.

HIERARCHICAL ORTHOLOGOUS GROUPS

The OrthoDB orthology delineation procedure is based on clustering of best-reciprocal-hits (BRHs) between genes from each species pair, determined from all-against-all Smith–Waterman protein sequence comparisons now using SWIPE (39). The clustering procedure considers only the longest transcript per gene, and only the longest of all gene copies in a single genome with over 97% amino acid identity as determined by CD-HIT (40). Clusters are built progressively, with an e-value cutoff of 1e-3 for triangulating BRHs, and 1e-6 for pair-only BRHs, requiring an overall minimum sequence alignment overlap of 30 amino acids. The clusters of BRHs are subsequently further expanded to include all in-paralogs recognized as within-species homologs that are more closely related than the clustered BRHs. Since its conception, OrthoDB (41) has promoted the concept of hierarchical orthology classifications by applying the clustering procedure at each radiation point of the considered species phylogeny and allowing users to explicitly select the most relevant level. It is rewarding to note that other resources e.g. (7,8) have embraced this concept and now provide orthology classifications at several major radiations across the tree of life. To determine the OrthoDB hierarchy, the species phylogenies in the current release were empirically computed using a maximum-likelihood approach as implemented in FastTree (42) over the super-alignment of mostly single-copy orthologs defined at the root node, multiply-aligned using MAFFT (43), and filtered using TrimAl (44), and corroborated with known taxonomies from the literature. The hierarchical orthology delineation procedure of the sampled lineages of vertebrates, arthropods and fungi classified 84% of a total of 2 921 417 protein-coding genes into 25 371, 33 393 and 55 793 orthologous groups, respectively (Table 1). 
Root-level delineation across the 110 animal species defined 58 308 orthologous groups covering 82% of the 3 198 795 metazoan genes, and clustering of the 1115 bacteria classified 91% of the 3 532 434 bacterial genes. In addition to the root-level orthologs, 11 subgroups of bacteria, corresponding to the NCBI taxonomy ‘class’ levels, were clustered to provide more fine-grained orthologous groups for Actinobacteria, Spirochetes, Tenericutes, Thermotogae, two classes of Cyanobacteria and Firmicutes, and three classes of Proteobacteria.
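
The BRH clustering step described above can be illustrated with a minimal Python sketch. It is not the OrthoDB implementation: the input format (tuples of query gene, target gene, e-value and alignment overlap), the function names and the use of a simple union-find are assumptions made for illustration, and the subsequent expansion with in-paralogs is omitted. Only the three thresholds (1e-3, 1e-6 and 30 amino acids) are taken from the text.

```python
from collections import defaultdict

# Thresholds quoted in the text; everything else in this sketch is assumed.
TRIANGLE_EVALUE = 1e-3  # BRHs supported by a BRH triangle with a third species
PAIR_EVALUE = 1e-6      # BRHs without triangle support (pair-only)
MIN_OVERLAP = 30        # minimum alignment overlap in amino acids


def best_reciprocal_hits(hits):
    """hits: iterable of (query, target, evalue, overlap); a gene is (species, gene_id)."""
    best = {}  # (query gene, target species) -> (evalue, target gene)
    for query, target, evalue, overlap in hits:
        if query[0] == target[0] or overlap < MIN_OVERLAP:
            continue  # ignore within-species hits and short alignments
        key = (query, target[0])
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, target)
    brhs = {}
    for (query, _species), (evalue, target) in best.items():
        back = best.get((target, query[0]))
        if back is not None and back[1] == query:
            # record the weaker (larger) e-value of the two directions
            brhs[frozenset((query, target))] = max(evalue, back[0])
    return brhs


def cluster_brhs(brhs):
    """Join BRHs into clusters, applying the triangle and pair-only cutoffs."""
    neighbours = defaultdict(set)
    for pair, evalue in brhs.items():
        if evalue <= TRIANGLE_EVALUE:
            a, b = pair
            neighbours[a].add(b)
            neighbours[b].add(a)
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for pair, evalue in brhs.items():
        if evalue > TRIANGLE_EVALUE:
            continue
        a, b = pair
        triangulated = bool(neighbours[a] & neighbours[b])  # shared BRH partner
        if triangulated or evalue <= PAIR_EVALUE:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())


def both_ways(a, b, evalue, overlap=100):
    """Helper for the toy example: the same hit reported in both directions."""
    return [(a, b, evalue, overlap), (b, a, evalue, overlap)]


toy_hits = (
    both_ways(("fly", "f1"), ("bee", "b1"), 1e-4)
    + both_ways(("fly", "f1"), ("ant", "a1"), 1e-4)
    + both_ways(("bee", "b1"), ("ant", "a1"), 1e-4)
    + both_ways(("fly", "f2"), ("bee", "b2"), 1e-4)  # pair-only and too weak
    + both_ways(("fly", "f3"), ("bee", "b3"), 1e-8)  # pair-only but strong
)
toy_groups = cluster_brhs(best_reciprocal_hits(toy_hits))
```

In this toy input the three mutually best genes f1, b1 and a1 form one cluster at the relaxed cutoff because each BRH is supported by a triangle, whereas the unsupported f2/b2 pair is rejected and the f3/b3 pair passes the stricter pair-only cutoff.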

MAPPED FUNCTIONAL ANNOTATIONS

As orthologous groups comprise genes descended from a common ancestor, functional attributes ascribed to one or more members can be tentatively extrapolated to the last common ancestor and describe the group as a whole. In this way, orthologous group summary annotations provide an overview of mapped functional attributes with links to respective source databases to allow further investigations of the putative biological roles of their member genes (Figure 1).

Figure 1.

Screenshot of a sample orthologous group results page, featuring functional and evolutionary annotations, the inferred parent–child gene tree and syntenic orthologs.

Concise descriptors

Gene functional descriptions sourced from UniProt (37) and NCBI (38) provide succinct indications of known or inferred biological functions with coherent nomenclatures based on data from the literature as well as biocurator-evaluated and automatic computational classifications and annotations. In this OrthoDB release, each orthologous group is labeled with a meaningful descriptor built from phrases that occur frequently in its member-gene descriptions.

Gene ontologies and InterPro domains

Molecular function, biological process and cellular component Gene Ontology (GO) (45) terms were retrieved from UniProt (37) and InterPro (46) protein domain signatures were sourced from the UniProt Archive of sequences. The available functional evidence for each orthologous group is summarized by listing the frequencies of associated GO terms and InterPro domains with concise attribute descriptions. Additionally, InterPro matches are displayed with domains ordered sequentially from the N- to C-terminus, describing the complete domain architecture of multi-domain genes, thereby allowing database queries with specific domain combinations. More than 85% of orthologs from each of the lineages are classified in groups that can be described by either GO terms or InterPro domains (Table 1).
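
As an illustration of this kind of group-level summary, the following Python sketch counts how many member genes carry each GO term and InterPro domain, and derives a domain architecture per gene by ordering its domains from the N- to the C-terminus. The record layout (a `go` list and a `domains` list of start position and identifier) and the placeholder identifiers are assumptions for the example and do not reflect the OrthoDB data model.

```python
from collections import Counter


def summarise_annotations(members):
    """members: list of dicts with 'go' (list of ids) and 'domains' (list of (start, id))."""
    # Count each term or domain once per gene, so frequencies are gene counts.
    go_counts = Counter(term for gene in members for term in set(gene["go"]))
    domain_counts = Counter(
        domain for gene in members for domain in {d for _, d in gene["domains"]}
    )
    # Sorting by start coordinate orders domains from the N- to the C-terminus.
    architectures = Counter(
        tuple(domain for _, domain in sorted(gene["domains"])) for gene in members
    )
    return go_counts, domain_counts, architectures


# Placeholder identifiers, not real GO or InterPro accessions.
example_members = [
    {"go": ["GO:term1", "GO:term2"], "domains": [(120, "IPR:domB"), (5, "IPR:domA")]},
    {"go": ["GO:term1"], "domains": [(8, "IPR:domA"), (130, "IPR:domB")]},
    {"go": [], "domains": [(10, "IPR:domA")]},
]
go_counts, domain_counts, architectures = summarise_annotations(example_members)
```

Keeping the ordered tuple of domains per gene is what makes a query for a specific domain combination a simple lookup.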

Model organism phenotypes

OrthoDB gene annotations are enhanced with detailed functional data from well-studied model organisms in each lineage to highlight phenotypes associated with genes from Mus musculus, Drosophila melanogaster and Saccharomyces cerevisiae, sourced from the Mouse Genome Database (47), FlyBase (30) and Saccharomyces Genome Database (48), respectively. Eukaryotic model organism phenotypes now also include Danio rerio from the Zebrafish Model Organism Database (49) and Caenorhabditis elegans from WormBase (50). For bacteria, gene annotations are extended with phenotype data from EcoGene (51) for Escherichia coli genes and from the Database of Essential Genes (52) which covers 16 bacteria including E. coli, Haemophilus influenzae and Mycobacterium tuberculosis (Table 1).

Online Mendelian Inheritance in Man

Human gene annotations are now enhanced with links to Online Mendelian Inheritance in Man (OMIM®) (53), the catalog of associations between causative genes and human disease phenotypes, which describes thousands of allelic variants linked to numerous different disorders or susceptibilities. Mapping of human genes in OrthoDB to OMIM® records highlights known disease associations for almost 3000 genes (Table 1).

COMPUTED EVOLUTIONARY ANNOTATIONS

OrthoDB presents quantified orthologous group characteristics that describe evolutionary properties such as gene duplications or losses and rates of sequence divergence. These detail the groups’ evolutionary histories and provide a basis for assessing the confidence with which inferences on gene function may be made (Figure 1).

Phyletic profiles

Orthologous group phyletic profiles contrast the number of species with single-copy versus multi-copy orthologs and indicate the species coverage at the selected radiation point. The profiles thus highlight how descendant genes have been preserved across the phylogeny and whether gene duplications are widespread (‘multi-copy license’) or restricted (‘single-copy control’) as discussed in (21).
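
A phyletic profile of this kind reduces to counting gene copies per species. The Python sketch below is one way to compute it; the function name, the representation of members as (species, gene) pairs and the returned keys are assumptions made for illustration.

```python
from collections import Counter


def phyletic_profile(group_members, clade_species):
    """group_members: iterable of (species, gene_id); clade_species: species at the radiation."""
    copies = Counter(species for species, _gene in group_members)
    single = sum(1 for species in clade_species if copies[species] == 1)
    multi = sum(1 for species in clade_species if copies[species] > 1)
    present = single + multi
    return {
        "single_copy": single,
        "multi_copy": multi,
        "absent": len(clade_species) - present,
        "coverage": present / len(clade_species),  # fraction of species with an ortholog
    }


example_profile = phyletic_profile(
    [("sp1", "g1"), ("sp2", "g2"), ("sp2", "g3"), ("sp3", "g4")],
    ["sp1", "sp2", "sp3", "sp4"],
)
```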

Evolutionary rates

The relative divergence among orthologous group member genes is quantified as the average of inter-species protein sequence identities normalized to the average identity of all inter-species BRHs. Appreciably higher or lower rates of divergence distinguish groups of orthologs with restrained or relaxed rates of protein sequence evolution, e.g. essential-gene-containing groups usually exhibit greater sequence conservation than those without.
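
Read literally, this measure is a ratio of two averages. The short Python sketch below assumes that the percentage identities of the inter-species pairs within a group, and of all inter-species BRHs, are already available as plain lists; the function name is hypothetical.

```python
from statistics import mean


def relative_identity(group_identities, all_brh_identities):
    """Average inter-species identity within a group, normalized to the
    average identity over all inter-species BRHs."""
    return mean(group_identities) / mean(all_brh_identities)


# Toy values: the group averages 75% identity against a 50% BRH-wide average.
example_ratio = relative_identity([90, 60], [40, 60])
```

Under this reading, a ratio above 1 marks a group whose members are more conserved than a typical pair of BRHs, and a ratio below 1 marks a faster-diverging group.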

Sibling groups

Homologous relations among genes from different orthologous groups at a given species radiation identify homologous or ‘sibling’ orthologous groups. These relations are quantified using data from all-against-all sequence comparisons by averaging over all pairs of homologs that link two orthologous groups with an e-value cutoff of 1e-3. This allows the user to retrieve sets of sibling orthologous groups that share significant sequence homology—which may therefore have some functional similarities—in an unbiased way that does not rely on protein domain or gene functional annotations.

Parent–child trees

Orthology delineation at each radiation along a given phylogeny hierarchically defines groups of orthologs with increasing resolution from the root level with the complete set of species to the most closely related species pairs. Parent–child relationships among orthologous groups delineated at each descendant radiation may therefore be defined by stepping along the phylogeny to identify orthologous groups with common subsets of genes (Figure 2). This new feature of OrthoDB represents these relationships as parent–child trees that illustrate the hierarchy of orthologous groups and their member genes, thereby building an inferred gene tree for a parent group by taking advantage of the greater resolution of its child groups. Users may view and edit the parent–child trees, as well as retrieve tree data formatted using Newick Utilities (54), from the ‘Display Tree’ window (Figure 1) that integrates the PhyloWidget (55) tool for the visualization and manipulation of phylogenetic tree data.
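
The idea of linking groups through shared genes can be sketched as follows. This Python example assumes that groups at a parent radiation and at one descendant radiation are given as dictionaries from group identifier to a set of gene identifiers; the function names and the plain Newick serialization are illustrative choices rather than the OrthoDB procedure.

```python
def link_children(parent_groups, child_groups):
    """Map each parent group to the child groups that share genes with it."""
    children = {parent_id: [] for parent_id in parent_groups}
    for child_id, child_genes in sorted(child_groups.items()):
        for parent_id, parent_genes in parent_groups.items():
            if child_genes & parent_genes:
                children[parent_id].append(child_id)
    return children


def to_newick(parent_id, parent_groups, child_groups, children):
    """Serialize one parent group as a Newick string with child groups as subtrees."""
    covered, parts = set(), []
    for child_id in children[parent_id]:
        shared = child_groups[child_id] & parent_groups[parent_id]
        covered |= shared
        parts.append("(" + ",".join(sorted(shared)) + ")" + child_id)
    # Parent genes from species outside the descendant clade stay as direct leaves.
    parts.extend(sorted(parent_groups[parent_id] - covered))
    return "(" + ",".join(parts) + ")" + parent_id + ";"


example_parents = {"P1": {"hs1", "mm1", "dr1", "dr2", "dm1"}}
example_children = {"C1": {"hs1", "mm1"}, "C2": {"dr1", "dr2"}, "C3": {"zz9"}}
example_links = link_children(example_parents, example_children)
example_tree = to_newick("P1", example_parents, example_children, example_links)
```

Applying the same linking step at every successive radiation would nest the subtrees further, which is how the finer resolution of child groups can refine the tree of a parent group.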

Figure 2.

Hierarchical parent–child trees.

Gene architectures

Evolutionary annotations now also feature summary tables of protein lengths (all lineages) and exon counts (metazoan lineages) that detail quantified mean, median and standard deviation values for each orthologous group, effectively describing a ‘consensus’ gene architecture. Amino acid and exon counts are also listed for each member gene, flagging those that are significantly shorter or longer than the consensus as potentially inaccurate gene model predictions.
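
A consensus architecture table of this kind can be approximated with the Python standard library. In the sketch below, the cutoff of two standard deviations from the mean is an assumption, since the text does not state how ‘significantly’ shorter or longer is defined, and the function name is hypothetical.

```python
from statistics import mean, median, stdev


def consensus_architecture(lengths, cutoff=2.0):
    """lengths: dict gene_id -> protein length (or exon count); needs at least two genes.
    The cutoff (in standard deviations) is an assumed threshold, not from the text."""
    values = list(lengths.values())
    summary = {"mean": mean(values), "median": median(values), "stdev": stdev(values)}
    flagged = {}
    for gene, value in lengths.items():
        deviation = value - summary["mean"]
        if summary["stdev"] > 0 and abs(deviation) > cutoff * summary["stdev"]:
            flagged[gene] = "shorter" if deviation < 0 else "longer"
    return summary, flagged


example_lengths = {
    "g1": 500, "g2": 510, "g3": 490, "g4": 505, "g5": 495,
    "g6": 500, "g7": 498, "g8": 502, "g9": 507, "g10": 120,
}
example_summary, example_flagged = consensus_architecture(example_lengths)
```

Here only g10 is flagged, as a candidate truncated gene model. With very few member genes a single outlier inflates the standard deviation, so a median-based rule may be more robust in practice.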

Syntenic orthologs

Comparing the chromosomal arrangements of orthologous genes among sets of species from the OrthoDB arthropod lineage identifies conserved blocks of syntenic orthologs. Such genes have maintained their local gene neighborhoods in the face of continual genomic evolution through sequence deletions, insertions and inversions, which may suggest selective advantages associated with their genomic arrangements, e.g. the TipE gene cluster of insect Para sodium channel auxiliary subunits (56). Ortholog-anchored synteny delineation (57) first identifies pairwise blocks with a minimum of two orthologs, allowing at most two intervening orthologs for each pair of genomes, and then successively projects these blocks through each pair of species across the phylogeny. The ‘OrthoBlock’ viewer (Figure 1) displays the best block—weighted according to the evolutionary span of the species and the number of orthologous groups in the block—selected from all the resulting blocks with at least five species for each orthologous group.
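
The pairwise step of ortholog-anchored synteny delineation can be sketched in Python as below. This is a simplified illustration under stated assumptions: each genome is a dictionary from chromosome name to the ordered list of orthologous group identifiers of its genes, only groups that occur exactly once in both genomes are used as anchors, and the projection of blocks across the phylogeny is not attempted. The minimum of two orthologs and the maximum of two intervening orthologs are taken from the text.

```python
def unique_positions(genome):
    """Map each orthologous group seen exactly once to its (chromosome, index)."""
    positions, repeated = {}, set()
    for chromosome, groups in genome.items():
        for index, group in enumerate(groups):
            if group in positions:
                repeated.add(group)
            positions[group] = (chromosome, index)
    return {g: p for g, p in positions.items() if g not in repeated}


def pairwise_blocks(genome_a, genome_b, min_orthologs=2, max_intervening=2):
    """Chain anchors that are close together in both genomes into blocks."""
    pos_a, pos_b = unique_positions(genome_a), unique_positions(genome_b)
    # Anchors sorted by chromosome and index in genome A.
    anchors = sorted((pos_a[g], pos_b[g], g) for g in pos_a.keys() & pos_b.keys())
    blocks, current = [], []
    for anchor in anchors:
        if current:
            (chrom_a, idx_a), (chrom_b, idx_b), _ = current[-1]
            (next_chrom_a, next_idx_a), (next_chrom_b, next_idx_b), _ = anchor
            contiguous = (
                chrom_a == next_chrom_a
                and chrom_b == next_chrom_b
                and next_idx_a - idx_a - 1 <= max_intervening
                # abs() tolerates inverted gene order in genome B
                and abs(next_idx_b - idx_b) - 1 <= max_intervening
            )
            if not contiguous:
                if len(current) >= min_orthologs:
                    blocks.append([g for _, _, g in current])
                current = []
        current.append(anchor)
    if len(current) >= min_orthologs:
        blocks.append([g for _, _, g in current])
    return blocks


example_a = {"chr1": ["g1", "g2", "x1", "g3", "x2", "x3", "x4", "g4", "g5"]}
example_b = {"chrA": ["g3", "g2", "g1"], "chrB": ["g4", "g5"]}
example_blocks = pairwise_blocks(example_a, example_b)
```

In the toy genomes, g1 to g3 form one block despite the inverted order in the second genome, and g4 and g5 form a second block because they lie on a different chromosome there.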

ORTHODB ONLINE

Selecting any species radiation point of interest from the interactive species trees, users can navigate through the hierarchy of orthologous groups defined at each radiation of the eukaryotic species phylogenies and for 11 major bacterial clades. At each orthology level, text searches return results from matches to various database identifiers and annotation keywords or phrases that can be combined through logical operator syntax to build more complex queries (e.g. [‘cytochrome c'-mitochondrial]) using Sphinx indexing technology (http://sphinxsearch.com/). In addition, database cross-referencing of gene identifiers enhances search term matches through available gene names and synonyms, InterPro, or GO identifiers, as well as secondary identifiers from UniProt, Entrez GeneID, RefSeq, Protein Data Bank, OMIM, PubMed and model organism databases. Copy-number profile searches retrieve groups matching specific user-defined or general pre-defined phyletic profiles by combining the criteria of absent, present, single-copy, multi-copy or no restriction, for each species within any selected clade. BLAST (58) sequence similarity searches identify the best matches to genes from different species classified in OrthoDB, thereby allowing database querying with protein sequence data from any species. Importantly, although such sequence similarity searches with a single gene can recognize its homologs, accurate mapping to the defined orthologous groups requires assessment of the organism’s complete gene set (see ortholog mapping section below). Searches stored during each user’s web browser session provide a query history facility to allow recently executed queries to be reviewed, re-run or combined, e.g. a profile search for ‘single-copy in >90% of species’ could be combined with a text search with the GO identifier for ‘receptor activity’ to retrieve groups of mostly single-copy receptors. 
All search results may be easily exported as either Fasta-formatted files of protein sequences or tab-delimited text files of gene annotations, and the complete datasets are provided for download. All OrthoDB features are described in a comprehensive online help page and users may contact support@orthodb.org for additional information or specific requests. They may also subscribe to the low-traffic ‘orthodb-news’ mailing list (https://list.unige.ch/mailman/listinfo/orthodb-news) to keep abreast of the latest developments.
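
The copy-number profile search described above combines one criterion per species. The following Python sketch shows how such a query could be evaluated against per-group copy numbers. The five criterion names come from the text, while the data layout and function names are assumptions and do not describe the OrthoDB web service.

```python
# One predicate per criterion named in the text; n is the gene copy number.
CRITERIA = {
    "absent": lambda n: n == 0,
    "present": lambda n: n >= 1,
    "single-copy": lambda n: n == 1,
    "multi-copy": lambda n: n > 1,
    "no restriction": lambda n: True,
}


def matches_profile(copy_numbers, profile):
    """copy_numbers: dict species -> gene count; profile: dict species -> criterion."""
    return all(
        CRITERIA[criterion](copy_numbers.get(species, 0))
        for species, criterion in profile.items()
    )


def profile_search(groups, profile):
    """Return the identifiers of all groups whose copy numbers satisfy the profile."""
    return sorted(gid for gid, counts in groups.items() if matches_profile(counts, profile))


example_groups = {
    "OG1": {"fly": 1, "bee": 1, "ant": 0},
    "OG2": {"fly": 3, "bee": 1, "ant": 1},
    "OG3": {"fly": 1, "bee": 2},
}
example_query = {"fly": "single-copy", "bee": "present", "ant": "absent"}
example_hits = profile_search(example_groups, example_query)
```

Species missing from a group's counts are treated as having zero copies, so OG3 matches the `absent` criterion for `ant`.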

OrthoDB links

Search results present annotations for each orthologous group and tabulate all member genes with links to their respective sources, e.g. Ensembl, UniProt, NCBI and FlyBase. Concise descriptors displayed for GO terms and InterPro domains are hyperlinked to their source records, and hyperlinks to OMIM and model organism databases provide direct access to all supporting data for genes with mapped phenotypes and synonyms. OrthoDB now provides FlyBase with orthology calls for the 12 Drosophila species as well as for selected arthropods and other animals. In addition, classified genes in OrthoDB are referenced with link-outs from UniProt records and NCBI gene link-outs.

Mapping of new species

Through a recently developed ortholog mapping procedure and corresponding web interfaces, OrthoDB now provides orthology classifications for genes from species with newly sequenced genomes mapped to existing orthologous groups. The mapping procedure first compares all genes from the new organism to all genes in OrthoDB groups, and then performs the BRH clustering procedure only allowing new genes to be added to existing clusters. The web interfaces list mapped genes and mirror OrthoDB data from the lineage(s) to which the new species is mapped. Thus, OrthoDB now provides online browsing of mapped orthologs for new species with publicly available gene sets such as the Chinese softshell turtle, Pelodiscus sinensis (from Ensembl Release 68) (Supplementary Figure S1). Portals with restricted access provide the same functionality for private gene sets from organisms with recently sequenced genomes. For example, mapping the initial gene annotations of the genome of the alfalfa leafcutting bee, Megachile rotundata, helped to assess their quality and completeness, as well as providing a user-friendly portal to identify orthologs from other insects (G. Robinson, personal communication).

BENCHMARKING SETS OF UNIVERSAL SINGLE-COPY ORTHOLOGS

Sequenced genomes and transcriptomes, which are fast growing in number, vary substantially in their completeness of sequencing, quality of read assembly and accuracy of gene annotation. A complementary approach to technical statistics, such as the widely used N50 measure of genome assemblies, is to gauge the quality by examining the coverage of an expected gene set. This approach can assess not only completeness of genome coverage and fragmentation of the assembly, but also misassembly of haplotypes when the marker genes are known to exist only in single-copy, as well as the accuracy of annotation of such genes. For this purpose of quality assessment of genomic data, we compiled benchmarking sets of universal single-copy orthologs (abbreviated BUSCOs) identified using OrthoDB for the Metazoan, Vertebrate, Arthropod and Fungal lineages (respectively, named BUSCO-Me, -Ve, -Ar, -Fu). Although these sets are intentionally conservative, they comprehensively sample each lineage and select representative genes from orthologous groups with single-copy orthologs in at least 90% of the species. The BUSCOs are available for download as Fasta-formatted protein sequences with corresponding gene, species and orthologous group identifiers.
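
The selection criterion stated above, single-copy orthologs in at least 90% of the species, can be expressed as a simple filter. The Python sketch below assumes per-group copy numbers keyed by species and uses hypothetical names. It covers only the group-selection step, not the choice of representative genes.

```python
def select_single_copy_groups(groups, species, min_fraction=0.9):
    """groups: dict group_id -> dict species -> gene count.
    Keep groups that are single-copy in at least min_fraction of the species."""
    selected = []
    for group_id, counts in groups.items():
        single = sum(1 for s in species if counts.get(s, 0) == 1)
        if single / len(species) >= min_fraction:
            selected.append(group_id)
    return sorted(selected)


example_species = [f"sp{i}" for i in range(10)]
example_busco_groups = {
    "OG_all": {s: 1 for s in example_species},                             # single-copy in 10/10
    "OG_lost": {s: 1 for s in example_species[:9]},                        # absent from one species
    "OG_dup": {**{s: 1 for s in example_species[:9]}, "sp9": 2},           # duplicated in one species
    "OG_two": {**{s: 1 for s in example_species[:8]}, "sp8": 2, "sp9": 2}, # single-copy in 8/10
}
example_selected = select_single_copy_groups(example_busco_groups, example_species)
```

A group with one loss or one duplication among ten species still qualifies at the 90% threshold, which is what allows an expected gene set of this kind to tolerate occasional lineage-specific events.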

PERSPECTIVES

The current OrthoDB release demonstrates the scalability of our computational procedures for the ab initio analysis of several millions of genes within a reasonable timeframe, e.g. with a 150 CPU-core computer cluster the total all-against-all sequence comparisons took about 1 month and the subsequent clustering procedures required from 1 day for the arthropod set to 4 weeks for the largest bacteria dataset on a single machine using a multi-threaded algorithm. Nevertheless, its comprehensive application to all emerging data will become prohibitive in a few years due to the exponential scaling of genome sequencing as well as to the variable completeness and quality of new genome annotations. Thus, our approach will be to focus the complete clustering analyses on only a representative selection of the best annotated species and those that maximize phylogenetic coverage, corroborating the results with curated classifications. These will form a comprehensive set of well-annotated and trusted orthologies to which genes from the other genomes, e.g. the thousands of insects to be sequenced through the i5K initiative (59), and new transcriptomes, e.g. from the 1KITE project (http://www.1kite.org), can be mapped.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Table 1 and Supplementary Figure 1.

FUNDING

Swiss National Science Foundation [31003A-125350]; ‘Commission Informatique’ of the University of Geneva; and Schmidheiny Foundation. Funding for open access charge: Swiss Institute of Bioinformatics. Conflict of interest statement. None declared.

59 in total

1. FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors: Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal: PLoS One Date: 2010-03-10 Impact factor: 3.240

2. The genome portal of the Department of Energy Joint Genome Institute.

Authors: Igor V Grigoriev; Henrik Nordberg; Igor Shabalov; Andrea Aerts; Mike Cantor; David Goodstein; Alan Kuo; Simon Minovitsky; Roman Nikitin; Robin A Ohm; Robert Otillar; Alex Poliakov; Igor Ratnere; Robert Riley; Tatyana Smirnova; Daniel Rokhsar; Inna Dubchak
Journal: Nucleic Acids Res Date: 2011-11-22 Impact factor: 16.971

3. InterPro in 2011: new developments in the family and domain prediction database.

Authors: Sarah Hunter; Philip Jones; Alex Mitchell; Rolf Apweiler; Teresa K Attwood; Alex Bateman; Thomas Bernard; David Binns; Peer Bork; Sarah Burge; Edouard de Castro; Penny Coggill; Matthew Corbett; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D Finn; Matthew Fraser; Julian Gough; Daniel Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Ivica Letunic; David Lonsdale; Rodrigo Lopez; Martin Madera; John Maslen; Craig McAnulla; Jennifer McDowall; Conor McMenamin; Huaiyu Mi; Prudence Mutowo-Muellenet; Nicola Mulder; Darren Natale; Christine Orengo; Sebastien Pesseat; Marco Punta; Antony F Quinn; Catherine Rivoire; Amaia Sangrador-Vegas; Jeremy D Selengut; Christian J A Sigrist; Maxim Scheremetjew; John Tate; Manjulapramila Thimmajanarthanan; Paul D Thomas; Cathy H Wu; Corin Yeats; Siew-Yit Yong
Journal: Nucleic Acids Res Date: 2011-11-16 Impact factor: 16.971

4. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges.

Authors: Sean Powell; Damian Szklarczyk; Kalliopi Trachana; Alexander Roth; Michael Kuhn; Jean Muller; Roland Arnold; Thomas Rattei; Ivica Letunic; Tobias Doerks; Lars J Jensen; Christian von Mering; Peer Bork
Journal: Nucleic Acids Res Date: 2011-11-16 Impact factor: 16.971

5. The Gene Ontology: enhancements for 2011.

Authors:
Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971

6. Ensembl 2012.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Laurent Gil; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas K Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Monika Komorowska; Gautier Koscielny; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Matthieu Muffato; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Harpreet Singh Riat; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Y Amy Tang; Kieron Taylor; Stephen Trevanion; Jana Vandrovcova; Simon White; Mark Wilson; Steven P Wilder; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Giulietta Spudich; Jan Vogel; Andy Yates; Amonida Zadissa; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2011-11-15 Impact factor: 16.971

7. Reorganizing the protein space at the Universal Protein Resource (UniProt).

Authors:
Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971

8. The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse.

Authors: Janan T Eppig; Judith A Blake; Carol J Bult; James A Kadin; Joel E Richardson
Journal: Nucleic Acids Res Date: 2011-11-10 Impact factor: 16.971

9. Saccharomyces Genome Database: the genomics resource of budding yeast.

Authors: J Michael Cherry; Eurie L Hong; Craig Amundsen; Rama Balakrishnan; Gail Binkley; Esther T Chan; Karen R Christie; Maria C Costanzo; Selina S Dwight; Stacia R Engel; Dianna G Fisk; Jodi E Hirschman; Benjamin C Hitz; Kalpana Karra; Cynthia J Krieger; Stuart R Miyasato; Rob S Nash; Julie Park; Marek S Skrzypek; Matt Simison; Shuai Weng; Edith D Wong
Journal: Nucleic Acids Res Date: 2011-11-21 Impact factor: 16.971

10. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium.

Authors: Huaiyu Mi; Qing Dong; Anushya Muruganujan; Pascale Gaudet; Suzanna Lewis; Paul D Thomas
Journal: Nucleic Acids Res Date: 2009-12-16 Impact factor: 16.971
