VectorBase: a home for invertebrate vectors of human pathogens.

Daniel Lawson1, Peter Arensburger, Peter Atkinson, Nora J Besansky, Robert V Bruggner, Ryan Butler, Kathryn S Campbell, George K Christophides, Scott Christley, Emmanuel Dialynas, David Emmert, Martin Hammond, Catherine A Hill, Ryan C Kennedy, Neil F Lobo, M Robert MacCallum, Greg Madey, Karine Megy, Seth Redmond, Susan Russo, David W Severson, Eric O Stinson, Pantelis Topalis, Evgeny M Zdobnov, Ewan Birney, William M Gelbart, Fotis C Kafatos, Christos Louis, Frank H Collins.   

Abstract

VectorBase (http://www.vectorbase.org/) is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes, providing an integrated resource for the research community. Currently, VectorBase contains genome information for two organisms: Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing yellow fever and dengue fever.

Year:  2006        PMID: 17145709      PMCID: PMC1751530          DOI: 10.1093/nar/gkl960

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Even before the completion of the human genome, a number of laboratories initiated projects to sequence the genomes of important human pathogens: Plasmodium, Trypanosoma and Leishmania species (1–3). The aim of these projects was to better understand the biology of each pathogen through its genome, with the goal of identifying new therapeutics and thus shortening the time from therapeutic lead to marketable product, a notoriously slow process. A more holistic approach to improving our understanding of these pathogens needs to include intermediary vectors where they exist. Over the past few years the cost of genome sequencing has fallen dramatically, making it feasible to sequence the genomes of vectors and complete our knowledge of the triumvirate of species (pathogen, vector and human host) involved in many parasitic diseases. VectorBase is funded by the National Institute of Allergy and Infectious Diseases (NIAID) as part of a group of Bioinformatics Resource Centres (BRCs) aiming to provide web-based resources to the scientific community for organisms considered to cause or transmit emerging or re-emerging infectious diseases. In parallel, NIAID has funded a number of genome projects for important vector species that are destined to be housed within the VectorBase system (Table 1).
Table 1

List of vector species scheduled for inclusion into VectorBase

Vector                                Disease                    Status
Aedes aegypti (a)                     Yellow and Dengue fever    Complete
Anopheles gambiae PEST                Malaria                    Complete
A.gambiae M form (b)                  Malaria                    Initiated
A.gambiae S form (b)                  Malaria                    Initiated
Culex pipiens quinquefasciatus (a)    Lymphatic filariasis       Assembly
Glossina morsitans (c)                Sleeping sickness          Initiated
Ixodes scapularis (a)                 Lyme disease               Sequencing
Lutzomyia longipalpis (d)             Leishmaniasis              Planned
Pediculus humanus (b)                 Typhus                     Initiated
Phlebotomus papatasi (d)              Leishmaniasis              Planned
Rhodnius prolixus (b)                 Chagas disease             Initiated

(a) Funded by NIAID.
(b) Funded by NHGRI.
(c) Funded by Wellcome Trust.
(d) Funded by NHGRI and Wellcome Trust.
VectorBase is involved in all the stages of genome analysis: first-pass annotation of new genome sequences in collaboration with the sequencers, re-annotation of existing genome sequences and submission of these data sets to the public nucleotide databanks. VectorBase acts as the repository for the genome and predicted gene set, providing web access for browsing and data mining. VectorBase participates in teaching workshops (supporters include WHO-TDR, MR4, EMBO and BioMalPar) and has undertaken ‘hands-on’ demonstrations at international meetings. VectorBase strives to improve the accuracy and scope of the annotations, expanding controlled vocabularies for the vectors and incorporating new data types (expression, population and variation data).

RESULTS

Data storage

VectorBase uses the Generic Model Organism Database (GMOD) construction set for the storage of genome sequence and annotations. The GMOD CHADO schema facilitates the rapid incorporation of diverse data types (e.g. literature and controlled vocabularies) and offers good inter-operability with the manual annotation effort, which uses Apollo, as well as with the Ensembl system. For web display and access to the genome data and annotation, VectorBase utilises the Ensembl database schema, API and web code (4).
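
The idea of typing features through a controlled vocabulary, rather than through hard-coded table names, can be sketched as follows. This is a heavily reduced illustration that only borrows the CHADO table names cvterm, feature and featureloc; the real schema has many more tables and columns, and the in-memory SQLite database, the identifiers (arm_demo, GENE_DEMO_1, GENE_DEMO_2) and the coordinates are assumptions made up for the example, not VectorBase data.

```python
import sqlite3

# Reduced, CHADO-inspired layout (assumption: simplified for illustration).
# Feature types are rows in a vocabulary table, so adding a new data type
# means adding a row, not changing the schema.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin   INTEGER NOT NULL,  -- 0-based start (interbase coordinates)
    fmax   INTEGER NOT NULL,  -- end
    strand INTEGER NOT NULL   -- 1 or -1
);
"""


def build_demo_db():
    """Create an in-memory database holding two hypothetical genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "chromosome_arm"), (2, "gene")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(1, "arm_demo", 1),
                      (2, "GENE_DEMO_1", 2),
                      (3, "GENE_DEMO_2", 2)])
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
                     [(2, 1, 1000, 2500, 1),
                      (3, 1, 8000, 9200, -1)])
    return conn


def features_of_type(conn, type_name):
    """Return (feature, source sequence, fmin, fmax, strand) for one type."""
    cursor = conn.execute(
        """SELECT f.uniquename, src.uniquename, fl.fmin, fl.fmax, fl.strand
           FROM feature f
           JOIN cvterm t      ON t.cvterm_id = f.type_id
           JOIN featureloc fl ON fl.feature_id = f.feature_id
           JOIN feature src   ON src.feature_id = fl.srcfeature_id
           WHERE t.name = ?
           ORDER BY fl.fmin""",
        (type_name,),
    )
    return cursor.fetchall()


conn = build_demo_db()
genes = features_of_type(conn, "gene")
```

Because the chromosome arm is itself a row in the feature table, locating a gene is a join of the feature table against itself through featureloc; the same query works unchanged for any other feature type named in cvterm.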

Genome browsing and data mining

The Ensembl system provides a good model for handling genomic data from a number of species in a consistent and unified manner, together with a highly sophisticated set of interlinked web pages. Entry points into the genome are through text searches of gene names, symbols and descriptions, pairwise similarity searches and cross-references in the public nucleotide and protein databanks. The VectorBase website contains standard Ensembl-style gene and transcript pages. Gene pages contain information about the prediction, including gene orthologues and protein features (signal peptides, trans-membrane domains, InterPro domains). Gene Ontology (GO) codes and Enzyme Commission (EC) numbers are assigned where possible. Batch file downloads are available for both the raw sequence data (fasta files of genomic sequence, including repeat-masked sequence and ESTs) and the annotation (GFF3 files or a MySQL dump for use with the Ensembl API). Batch searching capabilities are handled by the powerful data mining tool BioMart (5) and through two spreadsheets, AnoXcel and AegyXcel (6), which contain gene-based information about the presence of signal peptides, trans-membrane domains, protein domains and the best similarity with yeast, Drosophila and human.
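
As an illustration of how a downloaded GFF3 annotation file might be processed, the following minimal sketch parses the nine tab-separated GFF3 columns and counts features by type. The sample records, sequence name (demo_arm) and identifiers are invented for the example and do not come from a VectorBase download; a real file would be read from disk instead of from the embedded string.

```python
from collections import Counter
from urllib.parse import unquote

# Hypothetical GFF3 content; not taken from any VectorBase file.
SAMPLE_GFF3 = """\
##gff-version 3
demo_arm\tsketch\tgene\t1001\t2500\t.\t+\t.\tID=gene1;Name=demo%20gene
demo_arm\tsketch\tmRNA\t1001\t2500\t.\t+\t.\tID=mrna1;Parent=gene1
demo_arm\tsketch\texon\t1001\t1300\t.\t+\t.\tID=exon1;Parent=mrna1
demo_arm\tsketch\texon\t2001\t2500\t.\t+\t.\tID=exon2;Parent=mrna1
"""


def parse_gff3(text):
    """Yield one dict per feature line, skipping comments and directives."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # GFF3 attribute values are percent-encoded.
                attributes[key] = unquote(value)
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),  # 1-based, inclusive
            "end": int(end),      # 1-based, inclusive
            "strand": strand,
            "attributes": attributes,
        }


records = list(parse_gff3(SAMPLE_GFF3))
type_counts = Counter(record["type"] for record in records)
```

The Parent attribute links exons to transcripts and transcripts to genes, so the same records can be regrouped into gene models if needed.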

Annotation

The two genomes currently available through VectorBase are good examples of the multiple roles undertaken by the group.

Anopheles gambiae annotation

The A.gambiae PEST genome was published in 2002 (7) with a genome size of 260 Mb and ∼14 000 genes. The predicted gene set has been reviewed and updated several times since publication. This process involves a blend of automated evidence-based gene prediction (8) and manual approaches. Manual appraisal of gene models is targeted first at regions of interest to the community and at regions where automated approaches are known to fail. A manual re-annotation of chromosome arm 2L is finished. Manually appraised gene models are highlighted for the user as a separate track on the genome browser.

Aedes aegypti annotation

The A.aegypti Liverpool strain was sequenced and assembled by The Institute for Genomic Research (TIGR) and the Broad Institute. Aedes has a large genome size of 1.3 Gb and a predicted gene complement of ∼16 000 genes. The current gene set represents the first-pass annotation of the genome using automated approaches. Improvements in quality will be achieved by manual efforts and enhancements to the evidence-based automated gene predictions.

Sequence comparisons between Anopheles and Aedes

The presence of two related mosquito genomes in VectorBase allows for comparative analysis to identify conserved regions between the genomes. This can be useful in verifying and correcting gene models and for studying gene family expansions. Translated BLAT (9) similarities between the two mosquito species and with Drosophila have been identified. Figure 1 shows a view of an orthologous locus between the two mosquito genomes and highlights the expanded intron size in A.aegypti and the associated higher repeat content.
Figure 1

Comparative display of Aedes aegypti gene AAEL003853 (upper panel) with the Anopheles gambiae orthologue (lower panel). Green lines join blocks of similarity between the two genomes and highlight the expansion of intron size due to an increased frequency of repeat sequences.
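
The kind of intron expansion visible in such a comparison can be quantified from exon coordinates alone. The sketch below derives intron lengths as the gaps between consecutive exons of one transcript. The two coordinate sets are invented placeholders for a compact and an expanded orthologue; they are not the coordinates of AAEL003853 or of its Anopheles orthologue.

```python
def intron_lengths(exons):
    """Return intron lengths for one transcript.

    exons: list of (start, end) pairs, 1-based inclusive, all on the same
    sequence. Introns are the gaps between consecutive exons.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap < 0:
            raise ValueError("exons overlap; expected a single transcript")
        lengths.append(gap)
    return lengths


# Hypothetical exon coordinates (assumptions for illustration only).
compact_exons = [(100, 300), (401, 600), (701, 900)]
expanded_exons = [(100, 300), (2301, 2500), (7501, 7700)]

compact_introns = intron_lengths(compact_exons)
expanded_introns = intron_lengths(expanded_exons)
expansion_ratio = sum(expanded_introns) / sum(compact_introns)
```

In this made-up case the exons have identical lengths in both species, so the whole difference in locus size comes from the introns, which mirrors the pattern described for the two mosquito genomes.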


Use of Distributed Annotation System (DAS) in VectorBase

The DAS protocol (10) allows community researchers to integrate and display their data sets in the genome browser window. This is especially powerful for alternative sets of gene predictions. As an example, the Anopheles browser contains DAS tracks for alternative EST-based gene predictions from AnoEST (11), an independent re-annotation effort by Li et al. (12) and predictions based on mass spectrometry data (13).
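
A DAS client essentially issues an HTTP request for the features on a segment and reads back an XML document. The sketch below builds such a request URL following the DAS 1.x pattern (server/das/source/features?segment=ref:start,stop) and parses a DASGFF-style response. The server name das.example.org, the source name demo_predictions and the embedded response are all assumptions made up for illustration; no request is sent, and a real server may return additional elements.

```python
import xml.etree.ElementTree as ET


def das_features_url(server, source, segment, start, stop):
    """Build a DAS 1.x 'features' request URL for one segment range."""
    base = server.rstrip("/")
    return f"{base}/das/{source}/features?segment={segment}:{start},{stop}"


# Hypothetical response in the DASGFF layout (not from a real server).
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo_predictions/features">
    <SEGMENT id="demo_arm" start="1" stop="10000">
      <FEATURE id="pred1.exon1" label="pred1">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="demo">demo</METHOD>
        <START>1001</START>
        <END>1300</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="pred1.exon2" label="pred1">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="demo">demo</METHOD>
        <START>2001</START>
        <END>2500</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def parse_das_features(xml_text):
    """Extract id, type, coordinates and strand for every FEATURE element."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "type": feat.find("TYPE").get("id"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "strand": feat.findtext("ORIENTATION"),
            })
    return features


url = das_features_url("http://das.example.org/", "demo_predictions",
                       "demo_arm", 1, 10000)
das_features = parse_das_features(SAMPLE_RESPONSE)
```

Because each data provider only needs to answer requests of this form, an alternative gene set can be shown as a browser track without being loaded into the central database.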

Microarray data

Microarray data exist for both Anopheles and Aedes, and array probes from both species are mapped to the genome. These alignments are displayed in the browser and can be queried via BioMart. Concise expression summaries for probes and genes are made available as experiments are published.
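
Once probes have genomic coordinates, relating them to genes reduces to an interval-overlap test. The following sketch assigns each probe to the genes it overlaps on the same sequence. It is not the VectorBase mapping pipeline; the probe names, gene identifiers and coordinates are hypothetical, and strand is ignored for simplicity.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def assign_probes(probes, genes):
    """Map each probe name to the list of genes it overlaps.

    probes, genes: lists of (name, seq_region, start, end) tuples.
    """
    hits = {}
    for p_name, p_region, p_start, p_end in probes:
        hits[p_name] = [
            g_name
            for g_name, g_region, g_start, g_end in genes
            if g_region == p_region
            and overlaps(p_start, p_end, g_start, g_end)
        ]
    return hits


# Hypothetical coordinates (assumptions for illustration only).
demo_genes = [
    ("GENE_DEMO_1", "demo_arm", 1001, 2500),
    ("GENE_DEMO_2", "demo_arm", 8001, 9200),
]
demo_probes = [
    ("probe_A", "demo_arm", 1200, 1259),  # inside GENE_DEMO_1
    ("probe_B", "demo_arm", 5000, 5059),  # intergenic
    ("probe_C", "demo_arm", 9190, 9249),  # overlaps the end of GENE_DEMO_2
]

probe_to_genes = assign_probes(demo_probes, demo_genes)
```

This pairwise scan compares every probe with every gene, which is adequate for a small example; at genome scale a sorted or indexed interval structure would be the more practical choice.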

FUTURE DIRECTIONS

At least four new arthropod vector genomes will soon be incorporated into VectorBase: the mosquito Culex pipiens quinquefasciatus, the tick Ixodes scapularis, the kissing bug Rhodnius prolixus and the human body louse, Pediculus humanus (see Table 1 for more details). Furthermore, the genomes of two molecular forms of A.gambiae (the S and M forms, which are considered to be incipient or possibly distinct species) will soon be completed and integrated into VectorBase. We will continue to re-annotate the existing mosquito genomes to improve gene prediction, drawing more on manual/community annotation and comparative analysis with additional arthropod genomes. The increased importance of manual/community annotation is being addressed by the development of a CHADO-based database for tracking internal VectorBase manual annotation and submissions from the community. Other material of interest to the vector community is being incorporated, including the newly developed controlled vocabulary of mosquito anatomy and other vector-related ontologies. VectorBase is an ongoing project and the scope and usability of the site are improving rapidly. The coming year will see a significant expansion in the number of vector genomes housed.
  13 in total

1.  Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms.

Authors:  Jun Li; Michelle M Riehle; Yan Zhang; Jiannong Xu; Frederick Oduol; Shawn M Gomez; Karin Eiglmeier; Beatrix M Ueberheide; Jeffrey Shabanowitz; Donald F Hunt; José M C Ribeiro; Kenneth D Vernick
Journal:  Genome Biol       Date:  2006-03-27       Impact factor: 13.583

2.  AnoEST: toward A. gambiae functional genomics.

Authors:  Evgenia V Kriventseva; Anastasios C Koutsos; Claudia Blass; Fotis C Kafatos; George K Christophides; Evgeny M Zdobnov
Journal:  Genome Res       Date:  2005-05-17       Impact factor: 9.043

3.  The genome of the kinetoplastid parasite, Leishmania major.

Authors:  Alasdair C Ivens; Christopher S Peacock; Elizabeth A Worthey; Lee Murphy; Gautam Aggarwal; Matthew Berriman; Ellen Sisk; Marie-Adele Rajandream; Ellen Adlem; Rita Aert; Atashi Anupama; Zina Apostolou; Philip Attipoe; Nathalie Bason; Christopher Bauser; Alfred Beck; Stephen M Beverley; Gabriella Bianchettin; Katja Borzym; Gordana Bothe; Carlo V Bruschi; Matt Collins; Eithon Cadag; Laura Ciarloni; Christine Clayton; Richard M R Coulson; Ann Cronin; Angela K Cruz; Robert M Davies; Javier De Gaudenzi; Deborah E Dobson; Andreas Duesterhoeft; Gholam Fazelina; Nigel Fosker; Alberto Carlos Frasch; Audrey Fraser; Monika Fuchs; Claudia Gabel; Arlette Goble; André Goffeau; David Harris; Christiane Hertz-Fowler; Helmut Hilbert; David Horn; Yiting Huang; Sven Klages; Andrew Knights; Michael Kube; Natasha Larke; Lyudmila Litvin; Angela Lord; Tin Louie; Marco Marra; David Masuy; Keith Matthews; Shulamit Michaeli; Jeremy C Mottram; Silke Müller-Auer; Heather Munden; Siri Nelson; Halina Norbertczak; Karen Oliver; Susan O'neil; Martin Pentony; Thomas M Pohl; Claire Price; Bénédicte Purnelle; Michael A Quail; Ester Rabbinowitsch; Richard Reinhardt; Michael Rieger; Joel Rinta; Johan Robben; Laura Robertson; Jeronimo C Ruiz; Simon Rutter; David Saunders; Melanie Schäfer; Jacquie Schein; David C Schwartz; Kathy Seeger; Amber Seyler; Sarah Sharp; Heesun Shin; Dhileep Sivam; Rob Squares; Steve Squares; Valentina Tosato; Christy Vogt; Guido Volckaert; Rolf Wambutt; Tim Warren; Holger Wedler; John Woodward; Shiguo Zhou; Wolfgang Zimmermann; Deborah F Smith; Jenefer M Blackwell; Kenneth D Stuart; Bart Barrell; Peter J Myler
Journal:  Science       Date:  2005-07-15       Impact factor: 47.728

4.  The genome of the African trypanosome Trypanosoma brucei.

Authors:  Matthew Berriman; Elodie Ghedin; Christiane Hertz-Fowler; Gaëlle Blandin; Hubert Renauld; Daniella C Bartholomeu; Nicola J Lennard; Elisabet Caler; Nancy E Hamlin; Brian Haas; Ulrike Böhme; Linda Hannick; Martin A Aslett; Joshua Shallom; Lucio Marcello; Lihua Hou; Bill Wickstead; U Cecilia M Alsmark; Claire Arrowsmith; Rebecca J Atkin; Andrew J Barron; Frederic Bringaud; Karen Brooks; Mark Carrington; Inna Cherevach; Tracey-Jane Chillingworth; Carol Churcher; Louise N Clark; Craig H Corton; Ann Cronin; Rob M Davies; Jonathon Doggett; Appolinaire Djikeng; Tamara Feldblyum; Mark C Field; Audrey Fraser; Ian Goodhead; Zahra Hance; David Harper; Barbara R Harris; Heidi Hauser; Jessica Hostetler; Al Ivens; Kay Jagels; David Johnson; Justin Johnson; Kristine Jones; Arnaud X Kerhornou; Hean Koo; Natasha Larke; Scott Landfear; Christopher Larkin; Vanessa Leech; Alexandra Line; Angela Lord; Annette Macleod; Paul J Mooney; Sharon Moule; David M A Martin; Gareth W Morgan; Karen Mungall; Halina Norbertczak; Doug Ormond; Grace Pai; Chris S Peacock; Jeremy Peterson; Michael A Quail; Ester Rabbinowitsch; Marie-Adele Rajandream; Chris Reitter; Steven L Salzberg; Mandy Sanders; Seth Schobel; Sarah Sharp; Mark Simmonds; Anjana J Simpson; Luke Tallon; C Michael R Turner; Andrew Tait; Adrian R Tivey; Susan Van Aken; Danielle Walker; David Wanless; Shiliang Wang; Brian White; Owen White; Sally Whitehead; John Woodward; Jennifer Wortman; Mark D Adams; T Martin Embley; Keith Gull; Elisabetta Ullu; J David Barry; Alan H Fairlamb; Fred Opperdoes; Barclay G Barrell; John E Donelson; Neil Hall; Claire M Fraser; Sara E Melville; Najib M El-Sayed
Journal:  Science       Date:  2005-07-15       Impact factor: 47.728

5.  The genome sequence of the malaria mosquito Anopheles gambiae.

Authors:  Robert A Holt; G Mani Subramanian; Aaron Halpern; Granger G Sutton; Rosane Charlab; Deborah R Nusskern; Patrick Wincker; Andrew G Clark; José M C Ribeiro; Ron Wides; Steven L Salzberg; Brendan Loftus; Mark Yandell; William H Majoros; Douglas B Rusch; Zhongwu Lai; Cheryl L Kraft; Josep F Abril; Veronique Anthouard; Peter Arensburger; Peter W Atkinson; Holly Baden; Veronique de Berardinis; Danita Baldwin; Vladimir Benes; Jim Biedler; Claudia Blass; Randall Bolanos; Didier Boscus; Mary Barnstead; Shuang Cai; Angela Center; Kabir Chaturverdi; George K Christophides; Mathew A Chrystal; Michele Clamp; Anibal Cravchik; Val Curwen; Ali Dana; Art Delcher; Ian Dew; Cheryl A Evans; Michael Flanigan; Anne Grundschober-Freimoser; Lisa Friedli; Zhiping Gu; Ping Guan; Roderic Guigo; Maureen E Hillenmeyer; Susanne L Hladun; James R Hogan; Young S Hong; Jeffrey Hoover; Olivier Jaillon; Zhaoxi Ke; Chinnappa Kodira; Elena Kokoza; Anastasios Koutsos; Ivica Letunic; Alex Levitsky; Yong Liang; Jhy-Jhu Lin; Neil F Lobo; John R Lopez; Joel A Malek; Tina C McIntosh; Stephan Meister; Jason Miller; Clark Mobarry; Emmanuel Mongin; Sean D Murphy; David A O'Brochta; Cynthia Pfannkoch; Rong Qi; Megan A Regier; Karin Remington; Hongguang Shao; Maria V Sharakhova; Cynthia D Sitter; Jyoti Shetty; Thomas J Smith; Renee Strong; Jingtao Sun; Dana Thomasova; Lucas Q Ton; Pantelis Topalis; Zhijian Tu; Maria F Unger; Brian Walenz; Aihui Wang; Jian Wang; Mei Wang; Xuelan Wang; Kerry J Woodford; Jennifer R Wortman; Martin Wu; Alison Yao; Evgeny M Zdobnov; Hongyu Zhang; Qi Zhao; Shaying Zhao; Shiaoping C Zhu; Igor Zhimulev; Mario Coluzzi; Alessandra della Torre; Charles W Roth; Christos Louis; Francis Kalush; Richard J Mural; Eugene W Myers; Mark D Adams; Hamilton O Smith; Samuel Broder; Malcolm J Gardner; Claire M Fraser; Ewan Birney; Peer Bork; Paul T Brey; J Craig Venter; Jean Weissenbach; Fotis C Kafatos; Frank H Collins; Stephen L Hoffman
Journal:  Science       Date:  2002-10-04       Impact factor: 47.728

6.  AnoXcel: an Anopheles gambiae protein database.

Authors:  J M C Ribeiro; P Topalis; C Louis
Journal:  Insect Mol Biol       Date:  2004-10       Impact factor: 3.585

7.  Genome sequence of the human malaria parasite Plasmodium falciparum.

Authors:  Malcolm J Gardner; Neil Hall; Eula Fung; Owen White; Matthew Berriman; Richard W Hyman; Jane M Carlton; Arnab Pain; Karen E Nelson; Sharen Bowman; Ian T Paulsen; Keith James; Jonathan A Eisen; Kim Rutherford; Steven L Salzberg; Alister Craig; Sue Kyes; Man-Suen Chan; Vishvanath Nene; Shamira J Shallom; Bernard Suh; Jeremy Peterson; Sam Angiuoli; Mihaela Pertea; Jonathan Allen; Jeremy Selengut; Daniel Haft; Michael W Mather; Akhil B Vaidya; David M A Martin; Alan H Fairlamb; Martin J Fraunholz; David S Roos; Stuart A Ralph; Geoffrey I McFadden; Leda M Cummings; G Mani Subramanian; Chris Mungall; J Craig Venter; Daniel J Carucci; Stephen L Hoffman; Chris Newbold; Ronald W Davis; Claire M Fraser; Bart Barrell
Journal:  Nature       Date:  2002-10-03       Impact factor: 49.962

8.  Ensembl 2006.

Authors:  E Birney; D Andrews; M Caccamo; Y Chen; L Clarke; G Coates; T Cox; F Cunningham; V Curwen; T Cutts; T Down; R Durbin; X M Fernandez-Suarez; P Flicek; S Gräf; M Hammond; J Herrero; K Howe; V Iyer; K Jekosch; A Kähäri; A Kasprzyk; D Keefe; F Kokocinski; E Kulesha; D London; I Longden; C Melsopp; P Meidl; B Overduin; A Parker; G Proctor; A Prlic; M Rae; D Rios; S Redmond; M Schuster; I Sealy; S Searle; J Severin; G Slater; D Smedley; J Smith; A Stabenau; J Stalker; S Trevanion; A Ureta-Vidal; J Vogel; S White; C Woodwark; T J P Hubbard
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

9.  Genome annotation of Anopheles gambiae using mass spectrometry-derived data.

Authors:  Dário E Kalume; Suraj Peri; Raghunath Reddy; Jun Zhong; Mobolaji Okulate; Nirbhay Kumar; Akhilesh Pandey
Journal:  BMC Genomics       Date:  2005-09-19       Impact factor: 3.969

10.  The distributed annotation system.

Authors:  R D Dowell; R M Jokerst; A Day; S R Eddy; L Stein
Journal:  BMC Bioinformatics       Date:  2001-10-10       Impact factor: 3.169
