GLAD: an Online Database of Gene List Annotation for Drosophila.

Yanhui Hu1, Aram Comjean1, Lizabeth A Perkins1, Norbert Perrimon2, Stephanie E Mohr1.   

Abstract

We present a resource of high quality lists of functionally related Drosophila genes, e.g. based on protein domains (kinases, transcription factors, etc.) or cellular function (e.g. autophagy, signal transduction). To establish these lists, we relied on different inputs, including curation from databases or the literature and mapping from other species. Moreover, as an added curation and quality control step, we asked experts in relevant fields to review many of the lists. The resource is available online for scientists to search and view, and is editable based on community input. Annotation of gene groups is an ongoing effort and scientific need will typically drive decisions regarding which gene lists to pursue. We anticipate that the number of lists will increase over time; that the composition of some lists will grow and/or change over time as new information becomes available; and that the lists will benefit the scientific community, e.g. at experimental design and data analysis stages. Based on this, we present an easily updatable online database, available at www.flyrnai.org/glad, at which gene group lists can be viewed, searched and downloaded.

Keywords:  Drosophila; GLAD; genes

Year:  2015        PMID: 26157507      PMCID: PMC4495321          DOI: 10.7150/jgen.12863

Source DB:  PubMed          Journal:  J Genomics


Introduction

The Drosophila genome was first published in 2000 1, 2, and five major updates to the genome assembly have been released so far (versions 2-6) 3, 4. Based on a recent FlyBase release (FB2015_02, May 4, 2015), the Drosophila genome is thought to contain 17,622 annotated genes, of which 13,903 are protein-coding. There are many advantages to interrogating the full Drosophila genome, for example in genetic or RNAi screens. However, it is often more appropriate and/or more feasible to screen a subset of genes, for example because of limits on time, availability (e.g. of reagents) and/or cost. The choice of an appropriate subset is largely guided by scientific interest. In some cases, researchers build lists for functional studies based on other 'omics data, such as transcriptomics or proteomics data, prior to a functional genomics study. In other cases, genes are grouped based on common features such as biochemical functions (e.g. kinases) or biological processes. In either case, the quality and completeness of the library will impact results: genes inadvertently left off a list will not be included in the study, and genes that do not belong on the list will needlessly use up resources and/or affect the analysis of the results. The availability of high-quality annotated groups of related genes (hereafter, “gene groups”) allows scientists to quickly focus on relevant genes, for example in the context of functional genomics screens in tissue culture cells or in vivo. However, mechanisms for distribution and update of gene groups have been limited. At the time the Drosophila genome was published, efforts were made to compile several gene groups (see for example 5-8). However, given the number of changes to gene annotations since then, both in terms of defining genes and understanding their functions, updating and adding to existing lists has become important.
Over the past years, the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School (HMS) has put together several gene groups based on the needs of specific screening projects, as well as to support organization of reagent collections at the Transgenic RNAi Project (TRiP) at HMS. We have recognized over time a) that gene groups are of value to the community for applications additional to RNAi screens; b) that the gene groups benefit from careful curation and review by experts; and c) that the lists change over time, such that they would benefit from being available in an easily updateable database rather than as static lists. We expect gene groups to be of value to researchers at the study design stage, where the lists can help guide decisions regarding what genes are interrogated in a given screen or other assay, and at data analysis stages, such as by providing a supplement to existing groups used in gene set enrichment analyses.
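As an illustration of the data analysis use mentioned above, the following is a minimal sketch of a one-sided hypergeometric over-representation test, one common way to ask whether a screen hit list overlaps a gene group more than expected by chance. It is not a method described in this paper. The background size of 13,903 protein-coding genes comes from the text; the group size, hit count and overlap are hypothetical values chosen only for the example.

```python
from math import comb


def hypergeom_enrichment_p(background, group_size, hits, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    `hits` genes are drawn without replacement from `background` genes,
    of which `group_size` belong to the gene group of interest.
    """
    max_overlap = min(group_size, hits)
    total = comb(background, hits)
    tail = sum(
        comb(group_size, i) * comb(background - group_size, hits - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
p_small = hypergeom_enrichment_p(background=10, group_size=4, hits=3, overlap=2)
print(round(p_small, 4))  # 0.3333

# Hypothetical screen: 200 hits, 15 of which fall in a 300-gene group.
p_screen = hypergeom_enrichment_p(
    background=13903, group_size=300, hits=200, overlap=15
)
print(p_screen)
```

In practice the background should be the set of genes actually assayed, which may be smaller than the full set of protein-coding genes.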

Results & Discussion

Compilation and annotation strategy for gene groups

As mentioned above, several gene groups were annotated in conjunction with release of the Drosophila genome in 2000 (see for example 5-8). More recently, FlyBase has begun to associate a number of genes with gene groups. The first release of the FlyBase gene groups (FB2015_02, released May 4, 2015) includes 178 gene groups, with the number of genes in a group ranging from 1 to 168. As we had high-throughput functional genomics screening in mind, the approaches we took to defining, building and annotating gene groups draw on knowledge not available in 2000 and are complementary to the approaches taken by FlyBase. In general, our focus is on larger sets of genes and, given the goals of large-scale functional genomics, we tend to cast a broader net, applying less stringent cut-offs for inclusion in a gene group. So far, we have annotated 23 major gene groups with 29 sub-groups. For example, kinases are annotated as belonging to one of two sub-groups, protein kinases and non-protein kinases, and the transcription factors (TFs), related proteins and other DNA-binding proteins are organized into four sub-groups: DNA-binding with transcription factor activity; transcriptional co-factors; chromatin regulation; and possible TFs, a label we assign to proteins predicted to be TFs based only on low-confidence data (Table 1). The number of genes in major gene groups ranges from 53 to 3,683. Currently, GO annotation is the major resource for identification of groups of genes relevant to a particular molecular function, biological process or sub-cellular localization. Several individual laboratories have also built databases in particular areas (e.g. GlycoFly 9 and FlyTF 10). In addition, a large amount of information exists in free text format in the literature.
Although our strategy differed for each group, depending on available resources, in general we built the lists using one or more of the following approaches: a) mining of organized, digitized information from existing annotation resources and databases, including generic gene and protein annotation resources (e.g. Gene Ontology and UniProt) as well as specialized resources (e.g. TransporterDB 11 or FlyTF 10, 12); b) mining of information in free text format from the literature; c) mining lists from relevant publications on Drosophila or other species; d) direct curation or review by experts (Table 1). The strategy used to build a Drosophila kinases gene group is outlined in Fig. 1. To help guide studies or analyses that use the gene groups, when possible we have assigned confidence scores that help separate high- and low-confidence associations of a gene with a given group. See Methods and Table 1 for additional details regarding annotation.
Table 1

Summary of approaches to gene group compilation, annotation and expert review.

| Gene group | Sub-group | Source |
|---|---|---|
| Autophagy-related* | | Mapped from literature 17 |
| Chaperone and heat shock proteins | | GO/UniProt annotation supplemented by human chaperone/HSP list from Dr. Susan Lindquist lab |
| Cytoskeletal | | Interactive Fly, UniProt and publication 6, reviewed by Dr. Norbert Perrimon |
| Glycoproteins | | GlycoFly database 9 |
| GPCRs* | | GO annotation, in collaboration with Dr. Mathias Beller |
| Kinases* | Non-protein Kinase | Nomenclature/GO/domain annotation supplemented with human KP list 23 and publications 7, 22, reviewed by Dr. Richelle Sopko |
| | Protein Kinase | |
| Phosphatases* | Non-protein Phosphatase | |
| | Protein Phosphatase | |
| Major signaling pathways | Imd pathway | flyReactome, reviewed by Dr. Herve Agaisse |
| | Toll pathway | flyReactome, reviewed by Dr. Herve Agaisse |
| | Planar Cell Polarity pathway | flyReactome, reviewed by Dr. Jeff Axelrod |
| | Circadian Clock pathway | flyReactome, reviewed by Dr. Phillip Karpowicz |
| | EGFR and PVR RTK signaling pathway | Manually assembled by Dr. Norbert Perrimon |
| | FGFR signaling pathway | |
| | HEDGEHOG signaling pathway | |
| | HIPPO signaling pathway | |
| | INSULIN signaling pathway | |
| | JAK/STAT signaling pathway | |
| | NOTCH signaling pathway | |
| | TGF beta signaling pathway | |
| | TNF alpha signaling pathway | |
| | WNT signaling pathway | |
| | Nuclear hormone receptor | SignaLink 24, reviewed by Dr. Henry Krause |
| Metabolic | Enzyme | KEGG, UniProt and mapped from publication 25 |
| | Other | |
| Mitochondrial | | GO/UniProt/Mito databases 26, 27 and publications 18, 19, 20, 28 |
| Nuclear-encoded oxidative phosphorylation | | MitoComp2 database 29 |
| Peroxisomal | | Publication 30, UniProt/GO |
| Proteasome | | GO annotation, reviewed by Dr. Jonathan Zirin |
| Receptors | | GO/UniProt/GPCR/mapped from Human Receptome 31 |
| Ribosome | | GO annotation, reviewed by Dr. Ralph Neumuller |
| RNA-binding* | | GO/UniProt/InterPro, in collaboration with Dr. Bing Ye |
| Secreted proteins | | UniProt annotation, reviewed by Dr. Norbert Perrimon |
| Serine proteases | | UniProt annotation, reviewed by Dr. Norbert Perrimon |
| Spliceosome | | GO/UniProt/Publication 32 |
| Transcription factors, related proteins and other DNA-binding proteins* | DNA-binding with transcription factor activity | GO/domain annotation and FlyTF.org 10 |
| | Co-factor | |
| | Chromatin regulation | |
| | Maybe TF | |
| Trans-membrane proteins* | | DRSC in collaboration with NYU supplemented by TMHMM prediction 33 |
| Transporters | ATP-Dependent | TransporterDB 11 |
| | Ion Channels | |
| | Secondary Transporter | |
| | Unclassified | |
| Ubiquitin-related* | | DRSC in collaboration with NYU |

* Indicates availability of a corresponding DRSC RNAi library for cell-based screens.

Figure 1

Strategy for assembly of a Drosophila kinases gene group. As outlined, we incorporated information from several sources. See main text and Table 1 for relevant URLs and reference citations.

Features of the user interface

The gene group resource, which we call GLAD for Gene List Annotation for Drosophila, is available online at www.flyrnai.org/glad. Users can choose a gene group of interest from a drop-down menu. The results page (Fig. 2) indicates how the list was built and gives detailed information about members of the group. The table of genes includes FlyBase gene identifiers, gene symbols, sub-group annotations and, if available, a confidence score. Tables can be downloaded for off-line analysis. The user interface also provides a link to UP-TORR 13 so that users can quickly identify corresponding cell-based or in vivo RNAi reagents from public resources. A form provides an opportunity for the research community at large to suggest changes or additions to a given list (see below).
Figure 2

User interface for the GLAD gene group resource. Features of the user interface include search or download of specific lists, one-click transfer of the list to UP-TORR for identification of corresponding RNAi reagents, and the option to provide feedback regarding a list (e.g. suggest new genes or relevant publications).
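For off-line analysis of a downloaded table, a short script can filter the list by sub-group and confidence score. The sketch below is a minimal example under stated assumptions: the tab-delimited layout, the column names (`fbgn`, `symbol`, `sub_group`, `confidence`), the confidence labels and the gene identifiers are all hypothetical placeholders, since the actual download format is not specified here.

```python
import csv
import io

# Hypothetical stand-in for a downloaded gene group table (placeholder values).
SAMPLE = (
    "fbgn\tsymbol\tsub_group\tconfidence\n"
    "FBgn_EXAMPLE_1\tgeneA\tProtein Kinase\thigh\n"
    "FBgn_EXAMPLE_2\tgeneB\tNon-protein Kinase\tlow\n"
    "FBgn_EXAMPLE_3\tgeneC\tProtein Kinase\t\n"
)


def load_gene_group(handle):
    """Read a tab-delimited gene group table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select(rows, sub_group=None, confidence=None):
    """Keep rows matching the requested sub-group and/or confidence label."""
    keep = []
    for row in rows:
        if sub_group is not None and row["sub_group"] != sub_group:
            continue
        if confidence is not None and row["confidence"] != confidence:
            continue
        keep.append(row)
    return keep


rows = load_gene_group(io.StringIO(SAMPLE))
high_conf_protein_kinases = select(
    rows, sub_group="Protein Kinase", confidence="high"
)
print([row["symbol"] for row in high_conf_protein_kinases])
```

Rows without a confidence score (such as `geneC` above) are excluded whenever a confidence filter is given, which mirrors the fact that scores are only assigned where possible.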

Feedback will improve the quality of the resource

Although we have made a concerted effort to evaluate all available resources and have used the best available methods for building each list, there remains room for improvement, in particular as new knowledge places new genes in a given group. To facilitate community updates, we welcome and encourage researchers to use the form at each gene group list to provide feedback and/or alert us to relevant publications. We will evaluate feedback and modify gene groups accordingly. Annotation of new gene groups is also an ongoing effort, and scientific interest will typically drive decisions regarding which gene groups to build next. We welcome feedback from the community regarding which groups not already covered by GLAD or FlyBase gene groups should be added. With community input as well as continued curation by bioinformatics experts, we anticipate that the GLAD resource will improve and expand over time, further increasing its value to the community. Examples of studies that used these gene groups include a primary cell-based screen of autophagy-related factors 14 and an in vivo screen of transcription factors 15.

Methods

Compilation of gene groups

Specific resources used to annotate the gene groups are shown in Table 1. The five groups listed below exemplify the range of approaches we took. 1) Most of the major signal transduction pathways were assembled manually by Dr. N. Perrimon. 2) The main source for the autophagy-related factors list was orthologs 16 of factors identified in a mammalian proteomics study 17. 3) The mitochondrial gene group was built by combining gene annotation, relevant databases (MitoDrome, MitoMiner), mapping of mammalian orthologs and experimental proteomics data 18-20. 4) The transcription factors and other DNA-binding factors list was built based on GO annotation and domain annotation. More specifically, we selected genes annotated with relevant GO terms such as “sequence-specific DNA binding transcription factor activity” and/or genes annotated with a relevant domain in the InterPro database. The list was then supplemented with genes annotated in the FlyTF database 10, 12. Sub-categories were assigned based on GO terms, and high confidence was assigned to TFs associated with experimental evidence in the FlyTF database. 5) As outlined in Fig. 1, the kinase list was initially assembled from genes annotated as having “kinase activity,” then supplemented with genes annotated at InterPro 21 as containing a “kinase” domain, with information from two publications 7, 22, and with Drosophila orthologs of human genes included on a list of kinases 23. The compiled list was reviewed by Dr. Richelle Sopko, who has specific expertise and interest in the field. Family assignment was applied according to Manning et al. 22.
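The multi-source assembly described for the kinase list can be sketched as a union of source lists that records which sources support each gene. This is an illustrative sketch only: the source names, the placeholder gene symbols and, in particular, the rule that two or more supporting sources means "high" confidence are assumptions made for the example, not the scoring scheme used for GLAD.

```python
from collections import defaultdict


def compile_gene_group(sources):
    """Union several source lists, recording the sources behind each gene."""
    support = defaultdict(set)
    for source_name, genes in sources.items():
        for gene in genes:
            support[gene].add(source_name)
    return dict(support)


def assign_confidence(support, min_sources_for_high=2):
    """Hypothetical rule: 'high' if enough independent sources agree."""
    return {
        gene: "high" if len(names) >= min_sources_for_high else "low"
        for gene, names in support.items()
    }


# Toy inputs; gene symbols are placeholders, not real annotations.
sources = {
    "GO_kinase_activity": {"geneA", "geneB", "geneC"},
    "InterPro_kinase_domain": {"geneB", "geneC", "geneD"},
    "human_ortholog_mapping": {"geneC", "geneE"},
}

support = compile_gene_group(sources)
confidence = assign_confidence(support)
for gene in sorted(support):
    print(gene, sorted(support[gene]), confidence[gene])
```

Keeping the provenance for each gene makes it easier for an expert reviewer to see why a gene was included and to remove entries supported only by weak evidence.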

User interface implementation

The GLAD website was created at the DRSC and is hosted on web servers provided and maintained by the Harvard Medical School (HMS) Research Computing Group. The website was written in PHP using Silex as the web framework. The PHP code pulls the lists from tables in the flyrnai MySQL database and prepares the data for display. HTML, JavaScript and jQuery were used to create sortable output tables.
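The pattern of pulling one gene group from a database table and preparing rows for display can be sketched as follows. This is not the GLAD code, which is written in PHP against MySQL: the sketch uses Python's built-in `sqlite3` module as a stand-in, and the table name, column names and rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical membership table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_group_member ("
        "group_name TEXT, fbgn TEXT, symbol TEXT, sub_group TEXT, confidence TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene_group_member VALUES (?, ?, ?, ?, ?)",
        [
            ("Kinases", "FBgn_EXAMPLE_2", "geneB", "Non-protein Kinase", "low"),
            ("Kinases", "FBgn_EXAMPLE_1", "geneA", "Protein Kinase", "high"),
            ("Ribosome", "FBgn_EXAMPLE_3", "geneC", "", ""),
        ],
    )
    return conn


def fetch_group(conn, group_name):
    """Return one gene group as a list of dicts, sorted by gene symbol."""
    cur = conn.execute(
        "SELECT fbgn, symbol, sub_group, confidence "
        "FROM gene_group_member WHERE group_name = ? ORDER BY symbol",
        (group_name,),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


conn = build_demo_db()
kinase_rows = fetch_group(conn, "Kinases")
print([row["symbol"] for row in kinase_rows])
```

The group name is passed as a bound parameter rather than interpolated into the SQL string, which is the usual way to keep a user-selected value from a drop-down menu out of the query text.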
  33 in total

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  FlyTF: a systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster.

Authors:  Boris Adryan; Sarah A Teichmann
Journal:  Bioinformatics       Date:  2006-04-13       Impact factor: 6.937

3.  Dynamic regulation of alternative splicing and chromatin structure in Drosophila gonads revealed by RNA-seq.

Authors:  Qiang Gan; Iouri Chepelev; Gang Wei; Lama Tarayrah; Kairong Cui; Keji Zhao; Xin Chen
Journal:  Cell Res       Date:  2010-05-04       Impact factor: 25.617

4.  Building a human kinase gene repository: bioinformatics, molecular cloning, and functional validation.

Authors:  Jaehong Park; Yanhui Hu; T V S Murthy; Fredrik Vannberg; Binghua Shen; Andreas Rolfs; Jessica E Hutti; Lewis C Cantley; Joshua Labaer; Ed Harlow; Leonardo Brizuela
Journal:  Proc Natl Acad Sci U S A       Date:  2005-05-31       Impact factor: 11.205

5.  An inventory of peroxisomal proteins and pathways in Drosophila melanogaster.

Authors:  Joseph E Faust; Avani Verma; Chengwei Peng; James A McNew
Journal:  Traffic       Date:  2012-07-25       Impact factor: 6.215

Review 6.  Evolution of protein kinase signaling from yeast to man.

Authors:  Gerard Manning; Gregory D Plowman; Tony Hunter; Sucha Sudarsanam
Journal:  Trends Biochem Sci       Date:  2002-10       Impact factor: 13.807

Review 7.  Drosophila melanogaster G protein-coupled receptors.

Authors:  T Brody; A Cravchik
Journal:  J Cell Biol       Date:  2000-07-24       Impact factor: 10.539

8.  SignaLink 2 - a signaling pathway resource with multi-layered regulatory networks.

Authors:  Dávid Fazekas; Mihály Koltai; Dénes Türei; Dezső Módos; Máté Pálfy; Zoltán Dúl; Lilian Zsákai; Máté Szalay-Bekő; Katalin Lenti; Illés J Farkas; Tibor Vellai; Péter Csermely; Tamás Korcsmáros
Journal:  BMC Syst Biol       Date:  2013-01-18

9.  The nuclear OXPHOS genes in insecta: a common evolutionary origin, a common cis-regulatory motif, a common destiny for gene duplicates.

Authors:  Damiano Porcelli; Paolo Barsanti; Graziano Pesole; Corrado Caggese
Journal:  BMC Evol Biol       Date:  2007-11-08       Impact factor: 3.260

10.  Quantitative evaluation of the mitochondrial proteomes of Drosophila melanogaster adapted to extreme oxygen conditions.

Authors:  Songyue Yin; Jin Xue; Haidan Sun; Bo Wen; Quanhui Wang; Guy Perkins; Huiwen W Zhao; Mark H Ellisman; Yu-hsin Hsiao; Liang Yin; Yingying Xie; Guixue Hou; Jin Zi; Liang Lin; Gabriel G Haddad; Dan Zhou; Siqi Liu
Journal:  PLoS One       Date:  2013-09-12       Impact factor: 3.240
