
InsectBase: a resource for insect genomes and transcriptomes.

Chuanlin Yin1, Gengyu Shen2, Dianhao Guo1, Shuping Wang3, Xingzhou Ma4, Huamei Xiao5, Jinding Liu6, Zan Zhang7, Ying Liu8, Yiqun Zhang6, Kaixiang Yu4, Shuiqing Huang6, Fei Li9.   

Abstract

The genomes and transcriptomes of hundreds of insects have been sequenced. However, the insect community lacks an integrated, up-to-date collection of insect gene data. Here, we introduce the first release of InsectBase, available online at http://www.insect-genome.com. The database encompasses 138 insect genomes, 116 insect transcriptomes, 61 insect gene sets, 36 gene families of 60 insects, 7544 miRNAs of 69 insects, 96,925 piRNAs of Drosophila melanogaster and Chilo suppressalis, 2439 lncRNAs of Nilaparvata lugens, 22,536 pathways of 78 insects, 678,881 untranslated regions (UTRs) of 84 insects and 160,905 coding sequences (CDSs) of 70 insects. This release contains over 12 million sequences and provides search functionality, a BLAST server, GBrowse, insect pathway construction, a Facebook-like network for the insect community (iFacebook), and phylogenetic analysis of selected genes.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year: 2015 | PMID: 26578584 | PMCID: PMC4702856 | DOI: 10.1093/nar/gkv1204

Source DB: PubMed | Journal: Nucleic Acids Res | ISSN: 0305-1048 | Impact factor: 16.971


INTRODUCTION

Insects are essential for maintaining agricultural ecosystems, but many are also pests that damage >30% of agricultural, forestry and livestock production and cause billions in economic losses annually. Insects are also vectors of many devastating diseases that claim numerous human lives. The availability of insect genomes and transcriptomes provides valuable resources for entomological research. However, the insect community lacks an integrated, up-to-date database of gene resources. Currently, the genomes of at least 138 insects have been sequenced and deposited in public databases such as the NCBI genome database (1), FlyBase (2), i5k Workspace@NAL (3), VectorBase (4), SilkDB (5), ButterflyBase (6), BeetleBase (7), MonarchBase (8), AphidBase (9), NasoniaBase, BeeBase and Ant Genomes Portal (10), Hessian Fly Base and Manduca Base (http://www.agripestbase.org), ChiloDB (11), DBM-DB (12), KAIKObase (13) and KONAGAbase (14). Because the cost of whole-genome sequencing has decreased dramatically, many insect genome-sequencing projects have been initiated in recent years (Supplementary Figure S1). These projects were completed by small research groups with technical assistance from sequencing companies, and such groups normally construct an organism-specific database to organize and manage their genome data (8,11,12). Some organism-specific databases also contain gene data for closely related insect species (2,4,6,10). Although genome scaffolds, contigs and official gene sets (OGSs) must be submitted to the NCBI genome database to meet publication requirements, these copies are not updated frequently and often lack good annotation. The lack of an integrated, up-to-date collection of insect genomes and transcriptomes has hampered entomological research.
Here, we introduce InsectBase (http://www.insect-genome.com/), which is intended to meet the needs of the insect community, especially for studies on molecular biology, evolution, development, immunity, pest control and insecticide resistance. To the best of our knowledge, InsectBase collects almost all insect genomes and most insect transcriptomes from publicly available databases. Besides offering widely used Web services such as a search tool, BLAST and GBrowse, InsectBase is also a platform for comparative genomic analysis of gene families, pathways and orthologs. Additionally, iFacebook provides a Facebook-like social network for the insect community by linking researchers, genes and insect species.

MATERIALS AND METHODS

Data sources

InsectBase harvests insect gene data from dozens of databases. We also developed software and pipelines to identify miRNAs, piRNAs, lncRNAs, insect pathways and orthologous groups from insect genomes and transcriptomes (Table 1).
Table 1.

Summary of the data content of InsectBase

Category | Species | Sequences
Genome | 138 | 1,090,915
Transcriptome | 79 | 5,140,642
EST | 235 | 4,108,911
Pathway | 78 | 352,700
Ortholog | 7 | 6,811
Gene family | 60 | 39,105
miRNA | 69 | 7,544
miRNA family | 54 | 4,637
piRNA | 2 | 96,925
lncRNA | 1 | 2,439
Transposon | 2 | 2,880
UTR | 84 | 679,881
CDS | 74 | 160,905
Insect genome sequences were downloaded from the NCBI genome database (1), Ensembl, VectorBase (4), FlyBase (2), Hessian Fly Base (www.agripestbase.org), AphidBase (9), Ant Genomes Portal (10), BeeBase (10), NasoniaBase (10), SilkDB (5), ChiloDB (11), the Heliconius Genome Project (http://www.butterflygenome.org/), Manduca Base (www.agripestbase.org) and DBM-DB (12). Stick insect genomes were downloaded from http://nosil-lab.group.shef.ac.uk/ (15) (Supplementary Table S1). This yielded a collection of 138 insect genomes (Table 2, Supplementary Figure S2), of which gene annotation files were obtained for 61 (Supplementary Table S2). However, 31 insect gene sets lacked functional annotation; these gene sets were therefore annotated by BLASTP searches against the Swiss-Prot database.
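The Swiss-Prot annotation step can be illustrated with a short sketch. Assuming BLASTP was run with BLAST+ tabular output (`-outfmt 6`, whose 12 standard columns are listed in the code), the snippet keeps the highest-bitscore hit per query as that gene's annotation source. The e-value cutoff (`max_evalue=1e-5`) and all gene and subject IDs in the example are hypothetical; the source does not state which thresholds were used.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} keeping the top-bitscore hit.

    max_evalue is a hypothetical cutoff, not a value taken from InsectBase.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as produced by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to transfer an annotation
        query = rec["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bits)
    return best


if __name__ == "__main__":
    # Hypothetical hits: geneA has two Swiss-Prot hits, geneB only a weak one.
    example = (
        "geneA\tsp_hit1\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t400\n"
        "geneA\tsp_hit2\t60.0\t300\t120\t2\t1\t300\t5\t305\t1e-20\t150\n"
        "geneB\tsp_hit3\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t20\n"
    )
    print(best_hits(example))
```

Ranking by bitscore rather than e-value is a common choice because bitscores do not depend on database size; genes with no hit under the cutoff simply stay unannotated.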
Table 2.

Distribution of insect genome resources

Database | Species | Genomes with gene sets | URL
NCBI (GenBank & RefSeq) | 134 | 42 | http://www.ncbi.nlm.nih.gov
Ensembl | 31 | 31 | http://metazoa.ensembl.org
FlyBase | 12 | 12 | http://flybase.org/
i5k Workspace | 35 | 35 | http://i5k.nal.usda.gov/
VectorBase | 42 | 33 | https://www.VectorBase.org/
Hymenoptera Genome Database: HymenopteraMine | 17 | 12 | http://hymenopteragenome.org/hymenopteramine
Hymenoptera Genome Database: BeeBase | 1 | 1 | http://hymenopteragenome.org/beebase/
Hymenoptera Genome Database: NasoniaBase | 1 | 1 | http://hymenopteragenome.org/nasonia
Hymenoptera Genome Database: Ant Genomes Portal | 8 | 8 | http://hymenopteragenome.org/ant_genomes
Agripestbase: Hessian Fly Base | 1 | 1 | http://agripestbase.org/hessianfly/
Agripestbase: Manduca Base | 1 | 1 | http://agripestbase.org/manduca/
BeetleBase | 1 | 1 | http://beetlebase.org/
DBM-DB | 1 | 1 | http://www.iae.fafu.edu.cn/DBM/
KONAGAbase | 1 | - | http://dbm.dna.affrc.go.jp/px/
MonarchBase | 1 | 1 | http://monarchbase.umassmed.edu/
AphidBase | 1 | 1 | http://www.aphidbase.com/
SilkDB | 1 | 1 | http://www.silkdb.org/silkdb/
KAIKObase | 1 | - | http://sgp.dna.affrc.go.jp/KAIKObase
Heliconius Genome Project | 1 | 1 | http://butterflygenome.org/
ChiloDB | 1 | 1 | http://ento.njau.edu.cn/ChiloDB
InsectBase | 138 | 61 | http://www.insect-genome.com
Protein information for 61 insects was obtained by InterProScan analysis (16), including Coils, Gene3D, Hamap, Pfam, PIRSF, PRINTS, ProDom, ProSitePatterns, ProSiteProfiles, SMART, SUPERFAMILY and TIGRFAM.

The raw reads of 46 insect transcriptomes were downloaded from the NCBI SRA database (18) and assembled using Trinity with default parameters (17), and 70 assembled insect transcriptomes were obtained from the NCBI TSA database (1) (Supplementary Figure S3). The transcriptomes were annotated by BLASTX searches against the NCBI nr database (1) (Supplementary Table S3). Insect pathway information was obtained by analyzing the transcriptome data of 78 insects using KAAS (19) and iPathCons (20). Expressed sequence tags (ESTs) of 235 insects were downloaded from the NCBI EST database (1).

Insect orthologs were identified by analyzing the official gene sets (OGSs) of seven insects, Bombyx mori, Danaus plexippus, Linepithema humile, Nasonia vitripennis, Tribolium castaneum, Aedes aegypti and Pediculus humanus, with OrthoMCL (21). This produced 973 1:1:1 ortholog groups, each containing one gene from each of the seven species.

Insect miRNA sequences were downloaded from miRBase (22) and were also collected from the supplementary files of published references, because many miRNAs have not been submitted to miRBase. The conserved miRNAs of 54 insects were identified by homology searches of RNA-seq data. After redundancy was removed, the miRNAs of 69 insects were stored in InsectBase (Supplementary Table S4).

For piRNAs, 987 piRNAs of D. melanogaster were downloaded from the NCBI GenBank database (GI: 157361675–157362817) and 13 299 Drosophila piRNAs were obtained from the NCBI Gene Expression Omnibus (accession number GSE9138) (1). The piRNAs of C. suppressalis were predicted with Piano (23). For long noncoding RNAs (lncRNAs), we developed a pipeline to identify lncRNAs from 12 transcriptomes of N. lugens.
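A 1:1:1 ortholog group, as described above, contains exactly one gene from each of the seven species. A minimal sketch of that filter is shown below, assuming OrthoMCL's groups file format (one group per line, `GROUP_ID: taxon|gene taxon|gene ...`). The four-letter taxon codes and all gene IDs are hypothetical, and this is an illustration of the selection criterion rather than the authors' script.

```python
from collections import Counter

# Hypothetical taxon codes for the seven species used in the ortholog analysis.
SEVEN_TAXA = ["bmor", "dple", "lhum", "nvit", "tcas", "aaeg", "phum"]


def single_copy_groups(lines, taxa):
    """Return IDs of groups holding exactly one gene from every taxon in `taxa`."""
    wanted = set(taxa)
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        # Each member is written as "taxon|gene_id"; count genes per taxon.
        per_taxon = Counter(m.split("|", 1)[0] for m in members.split())
        # Keep the group only if every taxon is present exactly once.
        if set(per_taxon) == wanted and all(n == 1 for n in per_taxon.values()):
            result.append(group_id)
    return result


if __name__ == "__main__":
    groups = [
        "G1: bmor|g1 dple|g2 lhum|g3 nvit|g4 tcas|g5 aaeg|g6 phum|g7",
        "G2: bmor|g8 bmor|g9 dple|g10 lhum|g11 nvit|g12 tcas|g13 aaeg|g14 phum|g15",
        "G3: bmor|g16 dple|g17",
    ]
    print(single_copy_groups(groups, SEVEN_TAXA))
```

Here only `G1` qualifies: `G2` has two B. mori paralogs and `G3` lacks five of the species. A set of 973 such groups yields 973 × 7 = 6811 sequences, matching the Ortholog row of Table 1.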

Transposons

A total of 1572 Drosophila transposons were downloaded from the Berkeley Drosophila Genome Project (http://www.fruitfly.org/p_disrupt/TE.html) (24), and 1308 silkworm transposons were obtained from the BmTEdb database (25).

Coding sequences (CDS) and untranslated regions (UTR)

We downloaded the UTR sequences of ten insects from UTRBase (26): Acyrthosiphon pisum, Aedes aegypti, Anopheles gambiae, Apis mellifera, Bombyx mori, Culex quinquefasciatus, D. melanogaster, Nasonia vitripennis, Pediculus humanus and Tribolium castaneum. A pipeline was developed to predict CDSs and UTRs from transcriptome data, producing CDS and UTR sequences for 74 insects (Supplementary Table S5).
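The CDS/UTR prediction pipeline is not described in detail here. One common simplification, sketched below under stated assumptions (forward strand only, ORF defined as ATG through the first in-frame stop codon, standard genetic code), treats the longest open reading frame of a transcript as its CDS and the flanking sequence as the 5' and 3' UTRs. This is an illustrative stand-in, not the authors' pipeline.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_transcript(seq):
    """Split a transcript into (5' UTR, CDS, 3' UTR) around its longest ORF.

    Assumptions: forward strand only; ORF = ATG ... in-frame stop codon.
    Returns None when no complete ORF is found.
    """
    seq = seq.upper()
    best_start, best_end = 0, 0  # end is exclusive; (0, 0) means "none yet"
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG of a new candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_end - best_start:
                    best_start, best_end = start, i + 3
                start = None  # continue scanning for the next ORF
    if best_end == 0:
        return None
    return seq[:best_start], seq[best_start:best_end], seq[best_end:]


if __name__ == "__main__":
    print(split_transcript("GGCATGAAATTTTAGCCA"))
```

For the toy transcript above the result is `("GGC", "ATGAAATTTTAG", "CCA")`. A production pipeline would also scan the reverse strand, apply a minimum ORF length and handle transcripts whose ORF runs off either end.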

RESULTS

Structure of InsectBase

InsectBase provides Web services such as a search tool, BLAST, visualization with GBrowse, insect pathway construction and phylogenetic analysis. Gene information for widely studied gene families, noncoding RNAs (ncRNAs), transposons, UTRs and CDSs is collected and presented in the database. Genome annotation tools optimized for insects are incorporated into InsectBase. InsectBase also includes iFacebook, a Web-based gene–researcher–species network (Figure 1).
Figure 1.

The structure of InsectBase. It provides Search, Blast, GBrowse, iFacebook, Ortholog, Gene family, iPathway and insect gene information.

Web services

The search tool can be used to find relevant information on genes, transcriptomes, pathways, gene families and orthologs using either a gene ID or a gene name. Besides gene sequences, users can obtain related information for a gene: when a gene is found by its gene ID, its Swiss-Prot annotation and its SUPERFAMILY, Gene3D, Pfam, SMART, ProSite Profiles and Coils results are provided. The ‘Advanced’ option enables users to restrict a search to one or more species.

BLAST is provided through a Web-based BLAST 2.2.28+ server (27). The nucleotide data used for BLASTN and TBLASTN searches comprise 138 insect genomes, 61 insect OGSs and 116 insect transcriptomes. The protein data used for amino acid BLAST searches (BLASTP, TBLASTX, BLASTX) comprise the protein sequences of 61 insects. The BLAST results page shows the top five hits; all results can be viewed by clicking ‘more’ under the table. InsectBase ‘guesses’ the gene from the BLAST results and recommends ‘Your interested Gene set’, presenting the Swiss-Prot annotation and the KEGG, SUPERFAMILY, Gene3D, Pfam, PRINTS and ProSitePatterns information for the gene. Based on the BLAST results, InsectBase also recommends related researchers (‘You might be interested in these researchers’) and references (‘You might be interested in these references’).

GBrowse provides visualization of 58 insect genomes (28). The GBrowse tracks are customized according to the genome annotation available for each species. Three basic tracks, mRNA, CDS and exon, are provided for all insects, and more tracks are provided for insects with richer annotation. For example, A. gambiae has 10 additional tracks: tRNA, miRNA, snRNA, snoRNA, tRNA_pseudogene, rRNA, misc_RNA, RNase_p_RNA, pseudogene and SRP_RNA. C. suppressalis has additional tracks including piRNA, miRNA, repeat_sequence, homologs in SilkDB, exon structures in SilkDB, homologs in FlyBase, exon structures in FlyBase and similar polypeptides in ChiloDB.

Phylogenetic analysis enables users to construct an evolutionary tree from gene sequences of interest (Figure 2). The tree is constructed with ClustalW2 (29) using the neighbor-joining clustering method with 500 bootstrap replicates and is displayed with Newick Utilities 1.6 (30). The phylogenetic analysis function is incorporated into the search results, BLAST results, ortholog and gene family webpages.
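InsectBase delegates tree building to ClustalW2. To show what the neighbor-joining step computes, here is a from-scratch sketch of the classic neighbor-joining algorithm (Saitou and Nei) applied to a precomputed distance matrix and emitting a Newick string. It is an illustration only: alignment, distance estimation and bootstrapping are omitted, and the five-taxon matrix is a standard textbook example rather than InsectBase data.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted neighbor-joining tree and return it in Newick format.

    labels: taxon names; dist: symmetric distance matrix (list of lists).
    """
    if len(labels) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(labels)
    # Distance table keyed by node label (leaf name or Newick subtree string).
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}

    while len(nodes) > 2:
        n = len(nodes)
        # r(i): total distance from node i to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to a and to b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from u to every remaining node k.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Connect the last two nodes by a single edge of length d(a, b).
    a, b = nodes
    return f"({a}:{d[a][b]:g},{b}:0);"


if __name__ == "__main__":
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(list("abcde"), matrix))
```

For this matrix the sketch prints `((c:4,(a:2,b:3):3):2,(d:2,e:1):0);`, i.e. leaf branches a:2, b:3, c:4, d:2, e:1 with internal edges of 3 and 2. Bootstrap support, as used in InsectBase, would repeat the distance and tree steps on resampled alignment columns and count how often each clade recurs.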
Figure 2.

The server for phylogenetic analysis of selected genes.

Insect pathway construction is indispensable for gene function analysis. InsectBase incorporates a Web service, iPathCons, for knowledge-based construction of pathways from transcriptomes or from the OGSs of genomes (20). A voting system is used for Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) assignment, and the pathways of 52 insects are used as templates. Users can select one or more templates to map the queried sequences to known pathways according to their requirements. When fewer than 10 sequences are submitted, the results are displayed directly; if there are more than 10 sequences, a URL link to the iPathCons results is sent to the user's e-mail address.
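The voting rules of iPathCons are not spelled out here. As a hedged illustration of the general idea, the sketch below assumes each query sequence has hits to template-insect genes with known KO identifiers and assigns the KO that receives the most votes, provided it wins at least a minimum fraction of the hits. The `min_fraction` threshold and the KO IDs in the example are hypothetical.

```python
from collections import Counter


def vote_ko(hit_kos, min_fraction=0.5):
    """Assign a KEGG Orthology (KO) ID to one query by majority vote.

    hit_kos: KO IDs of the template genes hit by the query, with None for
    hits that carry no KO. min_fraction is a hypothetical support threshold.
    Returns the winning KO ID, or None if no KO has enough support.
    """
    votes = Counter(ko for ko in hit_kos if ko)
    if not votes:
        return None
    # most_common keeps first-seen order among ties, so earlier hits win ties.
    ko, count = votes.most_common(1)[0]
    return ko if count / len(hit_kos) >= min_fraction else None


if __name__ == "__main__":
    print(vote_ko(["K00001", "K00001", "K00002", None]))
    print(vote_ko(["K00001", "K00002", "K00003"]))
```

Requiring a minimum share of votes is one way to avoid assigning a KO when a query's hits disagree; weighting votes by bitscore or by template quality would be a natural refinement.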

iFacebook: Facebook for the insect community

A major roadblock in the insect field is the lack of efficient communication. iFacebook is intended to construct a gene–researcher–species network for entomologists (Figure 3). In total, 94,758 references on 143 insects were downloaded from the PubMed database (1), and the gene names, researchers and species were extracted from them to build a gene–researcher–species network, which is deposited in InsectBase. Users can find the collaborators, studied species and publications of a researcher. For a queried gene or species, InsectBase returns a results page listing the researchers who have studied that gene or species, together with their publications. In this way, an initial set of gene–researcher, researcher–researcher and species–researcher relationships has been constructed. The network is still at an early stage, and we encourage users to submit their information to improve iFacebook.
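The network construction described above can be sketched as a few adjacency maps. The snippet assumes that author names, gene names and species have already been extracted from each PubMed record into a dict (the text-mining step itself is not shown), and treats every pair of co-authors on a paper as a researcher–researcher link. The record structure and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(references):
    """Build researcher-researcher, gene-researcher and species-researcher links.

    Each reference is a dict with 'authors', 'genes' and 'species' lists,
    assumed to have been extracted from PubMed records beforehand.
    """
    coauthors = defaultdict(set)            # researcher -> collaborators
    gene_researchers = defaultdict(set)     # gene name -> researchers
    species_researchers = defaultdict(set)  # species -> researchers
    for ref in references:
        authors = set(ref["authors"])
        # Every pair of co-authors on a paper becomes a collaboration edge.
        for a, b in combinations(sorted(authors), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
        for gene in ref["genes"]:
            gene_researchers[gene].update(authors)
        for sp in ref["species"]:
            species_researchers[sp].update(authors)
    return coauthors, gene_researchers, species_researchers


if __name__ == "__main__":
    refs = [
        {"authors": ["A", "B"], "genes": ["geneX"], "species": ["Bombyx mori"]},
        {"authors": ["A", "C"], "genes": ["geneX", "geneY"],
         "species": ["Chilo suppressalis"]},
    ]
    co, by_gene, by_species = build_network(refs)
    print(sorted(co["A"]), sorted(by_gene["geneX"]))
```

A query for a gene then reduces to a lookup in `gene_researchers`, and a researcher's collaborators to a lookup in `coauthors`; in a production system these maps would live in database tables rather than in memory.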
Figure 3.

iFacebook provides the researcher–researcher (A) and gene–researcher (B) networks.

Insect gene information

Focusing on current research hotspots in entomology, we mined important gene information from all insect genomes, genes, transcriptomes and ESTs, and also mined the literature for important genes. These efforts produced collections of insect genes that should be of great interest to entomologists, including 22,536 pathways from 78 insects, 96,925 piRNAs from D. melanogaster and C. suppressalis, 2880 transposons from D. melanogaster and B. mori, 7544 miRNAs from 69 insects, 118 miRNA families from 54 insects, 2439 lncRNAs of N. lugens, 973 ortholog groups of seven insects, 36 protein-coding gene families from 60 species, 679,881 UTRs from 84 insects and 160,905 CDSs from 74 insects. All these sequences are available for download and search in InsectBase.

Tools, software and insect databases

Several tools for insect genome annotation and RNA-seq analysis are collected and provided in InsectBase. Optimized Maker-based Insect Genome Annotation (OMIGA) can be used to annotate insect genomes (31). iPathCons is a tool for constructing insect pathways (20). Triplet-SVM is a support vector machine (SVM)-based de novo classifier for miRNA identification (32). Piano is an SVM-based tool for piRNA annotation (23). We developed these tools and intend to provide additional Web-based services for entomologists in the future. The links webpage collects 180 software tools, including genome assemblers, gene predictors, genome browsers, multiple sequence aligners, sequence analyzers and structure modeling modules, and also provides links to 18 insect genome databases.
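To make the SVM-based miRNA classifier less abstract, the sketch below computes the kind of input vector Triplet-SVM is generally described as using: for each nucleotide, the paired/unpaired states of it and its two neighbours form a "triplet", giving 4 nucleotides × 8 structure patterns = 32 features, normalised to frequencies. This is our reading of that encoding, not code from the tool; the secondary structure is assumed to be supplied in dot-bracket form (for example from an RNA folding program), and the hairpin in the example is a toy sequence.

```python
from itertools import product


def triplet_features(seq, structure):
    """Return 32 structure-sequence triplet frequencies for one hairpin.

    seq: RNA sequence; structure: dot-bracket string of the same length.
    Paired positions ('(' or ')') are treated alike; unpaired is '.'.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    states = structure.replace(")", "(")
    # Feature order: middle nucleotide, then the 3-position structure pattern.
    keys = [n + "".join(t) for n in "ACGU" for t in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    # Skip the first and last positions, which lack a neighbour on one side.
    for i in range(1, len(seq) - 1):
        key = seq[i].upper().replace("T", "U") + states[i - 1:i + 2]
        if key in counts:  # ignore ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    features = triplet_features("GGGAAACCC", "(((...)))")
    print(len(features), round(sum(features), 6))
```

Vectors like these, computed for known pre-miRNA hairpins and for pseudo-hairpins, would be the training examples for an SVM (for instance scikit-learn's `SVC`); that training step is outside this sketch.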

The features of InsectBase

InsectBase is intended to provide users with information related to a queried gene, including sequences, gene features, domains, homologs, gene family, researchers and references. This ‘data consumer’-oriented function gives entomologists an overview of a queried gene or gene family, which should help in designing experiments or even in finding potential collaborators. At present, 18 insect genome databases have been constructed and reported, most of which contain one or a few closely related species with good annotation. To the best of our knowledge, InsectBase has an almost complete list of insect genomes and genes: more than 100 Gb of insect gene data have been collected and deposited in it. We also provide user-oriented software and Web servers that could significantly facilitate gene analysis for insect molecular biologists. In comparison, the i5k Workspace@NAL hosts a wide range of arthropod genomes (3), mainly from species in the Baylor College of Medicine's i5k pilot project; at present, 42 arthropod species are included. The major goal of the i5k Workspace@NAL is to provide an efficient and well-organized platform for genome assembly, annotation and RNA-seq mapping in new genome-sequencing projects, while data consumers can easily follow the progress of the annotated genomes and access the data. The i5k Workspace@NAL is therefore ‘data producer’- or ‘genome sequencer’-oriented, whereas InsectBase is ‘data consumer’-oriented.

Submitting data to InsectBase

We strongly encourage researchers to submit genome and transcriptome sequences. Technical assistance will be provided for uploading and handling sequences.

Implementation

InsectBase was developed on an Apache HTTP server running Linux (Red Hat 6.5), and the database was deployed on PostgreSQL. The webpages were written using PHP, HTML, Bootstrap, Cascading Style Sheets (CSS), JavaScript (JS) and Layer JS, and Perl scripts were used to support the interactive user interface. The Apache server handles queries from Web clients through PHP scripts that perform the searches. Chado, a relational database schema designed to handle complex representations of biological knowledge, is used to store the data (33). The Generic Genome Browser (GBrowse 2.0), a component of the Generic Model Organism Database (GMOD) project, was used for genome visualization (28) and allows researchers to view gene structure information. A local Basic Local Alignment Search Tool (BLAST 2.2.28+) server has been installed in the InsectBase system (27). ClustalW 2.1 has also been installed for multiple alignment of nucleic acid and protein sequences (29).
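InsectBase stores its data in a Chado schema on PostgreSQL and queries it from PHP. To keep the example self-contained, the sketch below uses Python's built-in `sqlite3` module with a heavily reduced subset of Chado's `organism` and `feature` tables (real Chado also has `type_id`, `dbxref_id`, checksums, timestamps and more) and shows how a lookup by gene ID or gene name, with the optional species restriction of the ‘Advanced’ search, might be expressed. The table contents, gene IDs and query shape are hypothetical, not InsectBase's actual code.

```python
import sqlite3

# A heavily reduced, Chado-like subset of the organism and feature tables.
SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    genus       TEXT NOT NULL,
    species     TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id),
    name        TEXT,
    uniquename  TEXT NOT NULL,
    residues    TEXT
);
"""


def demo_connection():
    """Build an in-memory database holding two hypothetical gene records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO organism VALUES (?, ?, ?)",
                     [(1, "Bombyx", "mori"), (2, "Chilo", "suppressalis")])
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "hypoGene1", "BMOGENE_HYPO1", "ATG"),
         (2, 2, "hypoGene1", "CSUGENE_HYPO1", "ATG")])
    return conn


def find_features(conn, term, species=None):
    """Find features whose ID (uniquename) or name equals `term`.

    `species` is an optional list of "Genus species" strings that restricts
    the search, mirroring an 'Advanced' search over selected insects.
    """
    sql = ("SELECT f.uniquename, f.name, (o.genus || ' ' || o.species) "
           "FROM feature AS f JOIN organism AS o "
           "ON f.organism_id = o.organism_id "
           "WHERE (f.uniquename = ? OR f.name = ?)")
    params = [term, term]
    if species:
        marks = ",".join("?" * len(species))  # one placeholder per species
        sql += f" AND (o.genus || ' ' || o.species) IN ({marks})"
        params.extend(species)
    sql += " ORDER BY f.uniquename"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(find_features(conn, "hypoGene1"))
    print(find_features(conn, "hypoGene1", ["Chilo suppressalis"]))
```

Using bound parameters (`?`) rather than string concatenation keeps user-supplied search terms from being interpreted as SQL; PostgreSQL drivers offer the same mechanism with their own placeholder syntax.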

FUTURE PERSPECTIVES

InsectBase is intended to provide a platform for researchers interested in analyzing insect gene data. Future development will be both experiment-oriented and integrative, along the following lines. First, we wish to integrate all related information for each gene. When a gene is queried by search or BLAST, InsectBase will be designed to answer questions such as: What is this gene? Which gene family does it belong to? Which pathways does it participate in? How many homologs does it have? What is the evolutionary tree of this gene and its homologs? Who has studied this gene, and in which species? What are the expression patterns of this gene or its orthologs? What is the RNAi phenotype of this gene or its orthologs? The answers will be presented to users on a webpage. Second, we will assign every insect gene a unique InsectBase ID, which will facilitate the use and management of the rapidly accumulating insect genes. Third, we will harvest all newly published insect genomes and transcriptomes to keep the database up-to-date. Fourth, InsectBase will pay special attention to insect pathways and ncRNAs, and pathway construction and ncRNA annotation will be improved.

AVAILABILITY

All data in InsectBase are available for download. InsectBase can be accessed at http://www.insect-genome.com.
REFERENCES (33 in total; partial list)

AphidBase: a centralized bioinformatic resource for annotation of the pea aphid genome.
Authors: F Legeai; S Shigenobu; J-P Gauthier; J Colbourne; C Rispe; O Collin; S Richards; A C C Wilson; T Murphy; D Tagu
Journal: Insect Mol Biol; Date: 2010-03

Stick insect genomes reveal natural selection's role in parallel speciation.
Authors: Víctor Soria-Carrasco; Zachariah Gompert; Aaron A Comeault; Timothy E Farkas; Thomas L Parchman; J Spencer Johnston; C Alex Buerkle; Jeffrey L Feder; Jens Bast; Tanja Schwander; Scott P Egan; Bernard J Crespi; Patrik Nosil
Journal: Science; Date: 2014-05-15

Hymenoptera Genome Database: integrated community resources for insect species of the order Hymenoptera.
Authors: Monica C Munoz-Torres; Justin T Reese; Christopher P Childers; Anna K Bennett; Jaideep P Sundaram; Kevin L Childs; Juan M Anzola; Natalia Milshina; Christine G Elsik
Journal: Nucleic Acids Res; Date: 2010-11-10

Chado controller: advanced annotation management with a community annotation system.
Authors: Valentin Guignon; Gaëtan Droc; Michael Alaux; Franc-Christophe Baurens; Olivier Garsmeur; Claire Poiron; Tim Carver; Mathieu Rouard; Stéphanie Bocs
Journal: Bioinformatics; Date: 2012-01-28

VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases.
Authors: Gloria I Giraldo-Calderón; Scott J Emrich; Robert M MacCallum; Gareth Maslen; Emmanuel Dialynas; Pantelis Topalis; Nicholas Ho; Sandra Gesing; Gregory Madey; Frank H Collins; Daniel Lawson
Journal: Nucleic Acids Res; Date: 2014-12-15

iPathCons and iPathDB: an improved insect pathway construction tool and the database.
Authors: Zan Zhang; Chuanlin Yin; Ying Liu; Wencai Jie; Wenjie Lei; Fei Li
Journal: Database (Oxford); Date: 2014-11-11

Using GBrowse 2.0 to visualize and share next-generation sequence data.
Authors: Lincoln D Stein
Journal: Brief Bioinform; Date: 2013-02-01

ButterflyBase: a platform for lepidopteran genomics.
Authors: Alexie Papanicolaou; Steffi Gebauer-Jung; Mark L Blaxter; W Owen McMillan; Chris D Jiggins
Journal: Nucleic Acids Res; Date: 2007-10-12

miRBase: annotating high confidence microRNAs using deep sequencing data.
Authors: Ana Kozomara; Sam Griffiths-Jones
Journal: Nucleic Acids Res; Date: 2013-11-25

ChiloDB: a genomic and transcriptome database for an important rice insect pest Chilo suppressalis.
Authors: Chuanlin Yin; Ying Liu; Jinding Liu; Huamei Xiao; Shuiqing Huang; Yongjun Lin; Zhaojun Han; Fei Li
Journal: Database (Oxford); Date: 2014-07-04