
Influenza sequence and epitope database.

In Seok Yang, Joo-Yeon Lee, Joon Seung Lee, Wayne P Mitchell, Hee-Bok Oh, Chun Kang, Kyung Hyun Kim.

Abstract

Influenza epidemics arise through the acquisition of viral genetic changes to overcome immunity from previous infections. An increasing number of complete genomes of influenza viruses have been sequenced in Asia in recent years. Knowledge about the genomes of the seasonal influenza viruses from different countries in Asia is valuable for monitoring and understanding the emergence, migration and evolution of strains. In order to make full use of the wealth of information from such data, we have developed an integrated, user-friendly relational database, the Influenza Sequence and Epitope Database (ISED), that catalogs the influenza sequence and epitope information obtained in Asia. ISED currently hosts a total of 13 020 influenza A and 2984 influenza B virus sequences collected in 17 countries, including 9 Asian countries, and a total of approximately 545 amantadine-resistant influenza virus sequences collected in Korea. ISED provides users with prebuilt application tools to analyze sequence alignments and difference patterns, and allows users to visualize epitope-matching structures. It is freely accessible at http://influenza.korea.ac.kr and http://influenza.cdc.go.kr.

Year: 2008 PMID: 19015128 PMCID: PMC2686482 DOI: 10.1093/nar/gkn881

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Influenza is one of the most important respiratory infectious diseases of humans. It is estimated that influenza is responsible for 250 000–500 000 deaths annually (1). The 1918 pandemic caused 20–50 million deaths worldwide, making it one of the most devastating disease outbreaks in human history (2). Influenza viruses of the family Orthomyxoviridae contain eight single-stranded negative-sense RNA molecules which encode a total of 11 proteins. Three antigenically distinct virus types (A, B and C) circulate in human populations (3). Antigenic drift of the viruses makes existing vaccines ineffective, and antigenic shift creates new strains that may cause a worldwide pandemic. Genome sequences of currently circulating virus isolates are important sources of information about influenza. Recent developments in viral genome sequencing, antigenic mapping and epidemiological modeling are greatly improving our knowledge of the evolution of human influenza virus (4–6). However, our understanding of many aspects of the evolutionary and epidemiological dynamics of influenza viruses is still far from complete. Significant efforts have been made to build public resources of influenza viruses, such as the Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html) at the National Center for Biotechnology Information (NCBI), the Influenza Sequence Database at Los Alamos National Laboratory, the Influenza Virus Database (http://influenza.genomics.org.cn) at the Beijing Institute of Genomics and the BioHealthBase Bioinformatics Resource Center (http://www.biohealthbase.org) (7–10). An increasing number of genomes of influenza viruses have been sequenced in Asia in recent years. Southern China has long been considered a potential epicenter for the emergence of pandemic influenza viruses (11) and has become one of the major foci for viral surveillance. 
Tropical regions may function as permanent mixing pools for viruses from around the world, providing ideal source populations because of extended viral transmission (12). Knowledge about the genomes of the seasonal influenza viruses from different countries in Asia is valuable for monitoring and understanding the evolution and migration of strains. Since 1968, the Korea National Institute of Health (KNIH) has performed influenza virus isolation as part of the World Health Organization's influenza surveillance network. In 2000, the Korean Influenza Surveillance Scheme was established as an integrated clinical and laboratory surveillance network involving public health centers and private clinics (13). Sentinel physicians report cases of influenza-like illness weekly and forward specimens to KNIH for virus isolation and characterization. KNIH has sequenced the isolates of influenza viruses collected in Korea, and these sequences have been registered with GenBank at the NCBI. New insights into immunity initiated by host–pathogen interaction are changing the way we think about the pathogenesis of influenza. The immune response to influenza virus infection is directed against various epitopes of antigens. Two important surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), mutate at high frequencies under the strong selective pressure of the host's immune system (14). Epitopes can be used to monitor the immune response, and a single amino acid mutation at a key residue of an epitope is frequently sufficient to cause an antigenic change (15). High-level antiviral drug resistance can also be conferred by single amino acid substitutions (16). Over the years, resistance to influenza antiviral drugs has grown rapidly, even though the efficacy of these drugs is comparable to that of vaccines. 
In order to leverage the wealth of information from such data, we have developed an integrated, user-friendly relational database, the Influenza Sequence and Epitope Database (ISED), focusing particularly on the genomes of the seasonal influenza viruses from Asian countries. We have added value by implementing a suite of bioinformatics tools that can be used to analyze and visualize the influenza data. This freely accessible resource will augment influenza research and contribute to improved public health.

OVERVIEW OF THE DATABASE

ISED was designed to collect, store and provide sequence information on influenza viruses, including drug-resistant strains, coupled with research tools for sequence pattern and epitope structural analyses of the data. At present, ISED includes information on 16 004 influenza sequences (13 020 influenza A and 2984 influenza B viruses) from 17 countries, including nine Asian countries (China, Japan, Korea, Malaysia, Philippines, Singapore, Taiwan, Thailand and Vietnam) (Table 1). It also hosts 545 amantadine-resistant influenza sequences collected in Korea (Table 2). No influenza isolates resistant to oseltamivir or zanamivir were found in Korea. Influenza virus sequences collected in Korea have been registered with GenBank at the NCBI, and the remainder will be registered immediately upon publication (currently an additional 184 segment sequences as well as 545 drug-resistant sequences). Sequences from other countries are collected by searching the NCBI GenBank database. ISED also contains a total of 179 T cell epitopes and 5 antibody epitopes, experimentally determined or curated from the scientific literature, which are useful for epitope matching.

Table 1.

Current number of influenza virus sequence data in ISED

Nation	Host	Segment^a								Total
		PB2	PB1	PA	HA	NP	NA	M1/M2	NS1/NS2
Australia	Human	103/4	93/4	211/4	259/70	106/4	219/9	0	0	991/95
Canada	Human	4/3	4/3	4/3	7/4	2/3	49/5	0	0	70/21
China	Human	32/22	33/22	32/23	810/164	45/22	343/49	78/42	67/40	1440/384
France	Human	1/1	1/1	1/1	205/6	1/1	33	0	0	242/10
Germany	Human	33/0	17/0	28/0	116/1	33/0	35/0	0	0	262/1
Italy	Human	0/2	0/1	0/2	91/86	0/2	4/40	0	0	95/133
Japan	Human	14/40	14/40	14/40	655/200	15/48	15/52	32/89	29/109	788/615
Korea	Human	3/0	3/0	3/0	265/81	3/0	41/5	48/2	6/2	372/90
Malaysia	Human	0	0	0	59/22	0	0	0	0	59/22
Philippines	Human	0	0	0	66/60	0	0	0	0	66/60
Singapore	Human	0	0	0	86/13	0	0	0	0	86/13
Spain	Human	0/1	0/1	0/1	72/23	0/1	6/3	0	0	78/30
Taiwan	Human	6/2	6/2	6/2	254/336	6/2	6/2	12/12	11/11	307/354
Thailand	Human	0/6	0/6	0/6	124/70	0/6	0/12	0/12	0/11	124/129
USA	Human	1277/120	1062/120	1386/117	1799/272	1347/120	908/264	0	0	7779/1013
United Kingdom	Human	12/1	11/1	8/1	123/9	15/1	15/1	0	0	184/14
Vietnam	Human	0	0	0	77/0	0	0	0	0	77/0
Total		1485/202	1244/201	1693/200	5068/1417	1573/210	1674/442	170/149	113/163	13 020/2984

^a Influenza virus type A/B.

Table 2.

Current number of drug-resistant virus data in Korea

Amantadine-resistant influenza virus strains in Korea
Season	A/H1N1			A/H3N2
	Total number of isolates	Resistant/ tested	Percent resistance	Total number of isolates	Resistant/ tested	Percent resistance
2003–2008^a	1858	156/302	51.7	4418	389/684	56.9

^a The 2008 data included the number of isolates determined by the 7th week.

^b The values in parentheses represent the number of isolates against zanamivir.

The data are updated on a regular basis by a curation team, composed of researchers at the Center for Infectious Diseases at KNIH and at Korea University, in order to ensure consistently high data quality. The data in ISED are open and freely accessible to the general public; offering users easy Web access and graphical user interfaces is one of the chief goals of ISED. ISED is part of the National BioBank project, which is being developed at KNIH and is intended to integrate a framework for identifying, collecting, distributing and managing biomaterials.

DATABASE DESIGN AND CONTENTS

The virus sequences in ISED are categorized into tables according to countries, each of which is characterized by a number of attributes: strain name, target host, virus type, virus subtype or lineage (B type only), RNA segment, amino acid sequence, start number of amino acid sequence, aligned amino acid sequences, NCBI accession number (amino acid sequence), nucleotide sequence, start number of nucleotide sequence, aligned nucleotide sequences, NCBI accession number (nucleotide sequence), reference, author list, isolated region, isolated year and isolated season, followed by oseltamivir/zanamivir-resistant and amantadine-resistant viral sequences if available (data not shown). Reference, one of the attributes, is linked to the PubMed abstract and in some instances to the full text of the article if the journal is available online. Target host and isolated region (nation) tables have one-to-many relationships with the virus sequence table, which are frequently used to extract statistical information. Both vaccine and drug-resistant strain sequences are included in the sequence table. The sequences of 46 vaccine strains (9 strains in A/H1N1, 23 in A/H3N2 and 14 in B) are separately grouped as a vaccine strain table. Since 2002, drug-susceptibility surveillance has been routinely undertaken in the characterization of influenza virus isolates submitted to KNIH. Earlier surveillance showed a low incidence of resistance to amantadine (below 10%). However, as of August 2008, 156 amantadine-resistant influenza sequences in A/H1N1 and 389 amantadine-resistant strain sequences in A/H3N2 were collected in Korea (Table 2). Epitope data were obtained from the Immune Epitope Database and Analysis Resource (IEDB) (http://www.immuneepitope.org/home.do) with 14 reference strain data (15). 
The database fields in the epitope data table contain epitope residue, start residue number of epitope, number of residues (B cell response only), virus strain, source protein, protein sequence, start residue number of source protein, epitope type (T cell, B cell response or MHC binding), NCBI accession number of source protein and reference. The reference strain table has a one-to-many relationship with the epitope table (data not shown).
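The one-to-many relationship between the nation table and the virus sequence table, and the kind of statistical query it supports, can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table names, column names and demo rows are hypothetical, not ISED's actual schema or data.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Table and column names are
# illustrative assumptions, not the actual ISED (Oracle) definitions.
SCHEMA = """
CREATE TABLE nation (
    nation_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE virus_sequence (
    seq_id        INTEGER PRIMARY KEY,
    strain_name   TEXT NOT NULL,
    virus_type    TEXT NOT NULL,   -- 'A' or 'B'
    rna_segment   TEXT NOT NULL,   -- e.g. 'HA', 'NA'
    isolated_year INTEGER,
    nation_id     INTEGER NOT NULL REFERENCES nation(nation_id)
);
"""


def build_demo():
    """Create an in-memory database with a few made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO nation (nation_id, name) VALUES (?, ?)",
        [(1, "Korea"), (2, "Japan")],
    )
    conn.executemany(
        "INSERT INTO virus_sequence "
        "(strain_name, virus_type, rna_segment, isolated_year, nation_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("demo-strain-1", "A", "HA", 2005, 1),
            ("demo-strain-2", "A", "NA", 2006, 1),
            ("demo-strain-3", "B", "HA", 2006, 1),
            ("demo-strain-4", "A", "HA", 2007, 2),
        ],
    )
    return conn


def count_by_nation(conn):
    """Count sequences per (nation, virus type) via the one-to-many join."""
    rows = conn.execute(
        "SELECT n.name, s.virus_type, COUNT(*) "
        "FROM virus_sequence s JOIN nation n ON n.nation_id = s.nation_id "
        "GROUP BY n.name, s.virus_type"
    ).fetchall()
    return {(name, vtype): count for name, vtype, count in rows}
```

Calling `count_by_nation(build_demo())` yields per-nation A and B counts of the same shape as the A/B cells in Table 1, here computed over the placeholder rows only.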

DATA RETRIEVAL AND TOOLKIT

ISED consists of a framework for advanced web-based retrieval, analysis and visualization of related influenza data, with sequence browse, sequence analysis and epitope matching arranged in one Oracle schema (17). Sequence data can be retrieved efficiently through the sequence browse mode (Figure 1). Users can combine various options, such as virus type, nation, host, RNA segment, subtype and collection year. The website then provides access to individual influenza sequence records characterized by a number of database fields, such as accession number, sequence length, virus type, target host, RNA segment, subtype, collected nation and year, virus name and potential N-glycosylation site. The sequence browse results are displayed in chronological order and can also be sorted by column by clicking the table header. Two different search options are provided: individual and collective selection. The amino acid sequences of the selected strains in the displayed list can be retrieved in a separate window by clicking the ‘View fasta format’ button, or can be downloaded (Figure 1). Users can prepare input data by clicking ‘Sequence alignment’ and run a multiple sequence alignment either by direct submission or by uploading a file of the chosen sequences to the CLUSTALW tool at EBI (18). A user's past search history is kept by the Web server and can be located and accessed later. Drug-resistant influenza sequences can also be retrieved in the sequence browse mode, where alignment or difference patterns of selected resistant sequences can be examined (Figure 2A).
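The ‘potential N-glycosylation site’ field mentioned above can be illustrated with the standard N-X-S/T sequon rule, where X is any residue except proline. The sketch below is an assumption about how such a field could be computed; the source does not state which rule ISED applies, and the function name is hypothetical.

```python
import re

# Standard N-linked glycosylation sequon: Asn, any residue except Pro,
# then Ser or Thr. The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the Asn in each N-X-S/T sequon.

    Expects an unaligned amino acid sequence (no gap characters).
    """
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]
```

For example, `find_sequons("MNGTANPSNNST")` reports positions 2, 9 and 10 and skips the asparagine at position 6, which is followed by proline.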

Figure 1.

Snapshots showing the interrelation of data retrieval tools in ISED. Users can access the data through search options and the results can be selectively saved. The results can be subjected to further analysis, such as multiple sequence alignment.

Figure 2.

The mutant and epitope analysis tools. (A) Mutant sequence search, alignment and difference patterns are shown. (B) Epitope sequence and structure analysis showing sequence alignment with reference strains, sequence-matching frequency and an epitope 3D conformation superimposed on an HA structure viewed with Jmol.

The contents of the epitope resource can be searched via a user-friendly interface. For epitope matching, users can select the virus subtype and reference strains in the reference table, which contains database fields such as virus strain, collected area, virus type and target host. Users can query by a strain or a location selected from pull-down menus, or upload and submit their own sequences (Figure 2B). Details of epitope information can be viewed with the strings of amino acid sequences highlighted in green for antibody epitopes and in blue for T cell epitopes. More detailed information can be retrieved by clicking each epitope segment. The epitope 3D structure is visualized using the interactive viewer Jmol (http://www.jmol.org), superimposed on an HA tertiary structure model provided by the Protein Data Bank (19). Users can also easily examine matching frequencies between the selected strain and reference strains.
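The epitope-matching step described above can be sketched as an exact peptide search over a query sequence. This is a simplified illustration under stated assumptions: the source does not define how ISED computes its matching frequency, so the definition used here (the fraction of epitopes found at least once) and both function names are hypothetical.

```python
def match_epitopes(sequence, epitopes):
    """Map each epitope name to the 1-based start positions of exact matches.

    `epitopes` is a dict of {name: peptide string}.
    """
    hits = {}
    for name, peptide in epitopes.items():
        positions = []
        start = sequence.find(peptide)
        while start != -1:
            positions.append(start + 1)
            # Continue one residue further so overlapping matches are found.
            start = sequence.find(peptide, start + 1)
        hits[name] = positions
    return hits


def matching_frequency(hits):
    """Fraction of epitopes with at least one exact match (assumed definition)."""
    if not hits:
        return 0.0
    matched = sum(1 for positions in hits.values() if positions)
    return matched / len(hits)
```

With the toy sequence `"ACDEFGHIKACDE"` and the toy epitopes `{"e1": "ACDE", "e2": "WWW"}`, the first peptide is found at positions 1 and 10, the second is not found, and the matching frequency is 0.5.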

DATA ANALYSIS

ISED allows access to sequence analysis tools by clicking ‘Sequence analysis’ on the top menu bar. Users can select virus sequence resources via a graphical interface according to virus type, collected region (nation) and RNA segment, with a collection year range (Figure 3A), and conduct sequence alignment or sequence difference analysis by clicking the ‘alignment’ or ‘difference’ button. Users can also combine sequences from different sources. On the result page, the alignment can be viewed with color-coded amino acids, so that viral mutations can be seen as changes in color when scanning from the N- to the C-terminus along the sequence (Figure 3B). Differences can also be viewed in a separate full-screen window with color coding, where the mutated amino acids are displayed together with their mutation frequencies and antigenic sites. Notably, the sequence analysis also includes a vaccine strain tool, with which users can conduct either sequence difference or sequence comparison with vaccine strains (Figure 3C). Sequence difference among vaccine strains returns a list of the differences in amino acid sequences among the vaccine strains, with mutation frequencies and antigenic sites. In addition, the results of sequence comparison between a sample circulating strain and vaccine strains are illustrated by changes in color patterns against the vaccine strains (the lighter the color, the lower the difference). Thus, ISED provides a convenient tool for evaluating the relative closeness of a currently circulating strain to known vaccine strains. Users can then export the results as an Excel or Word file.
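The sequence difference and mutation frequency views described above can be approximated on pre-aligned sequences as follows. This is a minimal sketch, not ISED's implementation: it assumes equal-length aligned sequences with `-` as the gap character, ignores gap positions, and uses hypothetical function names.

```python
from collections import Counter


def differences(reference, sample):
    """List (position, reference residue, sample residue) for each mismatch.

    Both sequences must come from the same alignment (equal length).
    Positions are 1-based alignment columns; gap columns are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, res)
        for i, (ref, res) in enumerate(zip(reference, sample))
        if ref != res and ref != "-" and res != "-"
    ]


def mutation_frequencies(reference, samples):
    """Fraction of samples that differ from the reference at each position."""
    counts = Counter()
    for sample in samples:
        for position, _, _ in differences(reference, sample):
            counts[position] += 1
    total = len(samples)
    return {pos: n / total for pos, n in sorted(counts.items())}
```

A vaccine strain sequence can be passed as `reference` and circulating strains as `samples`; the resulting per-position fractions are the kind of value a color scale (lighter for lower difference) could be driven by.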

Figure 3.

Snapshots showing the interrelation of ISED sequence analysis tools. (A) Users can select sequences or combine sequences from different sources and (B) conduct sequence alignment or sequence comparisons. (C) Sequence differences with vaccine strains are displayed in a separate window with color coding.

FUTURE DIRECTIONS

It is still unclear which features of influenza viruses are responsible for their global spread and, more specifically, how the dominant strain arises. For instance, A/Fujian/411/02, collected in southern China, is believed to have caused significant outbreaks in China, Japan and Korea in 2002 and to have spread worldwide during the following winter season of 2003–2004. The main challenge in the future is to keep ISED up to date with the growing number of complete influenza virus sequences that are experimentally verified and registered in other databases such as NCBI GenBank. We will thus implement text mining support for database curation in the near future. Toward this goal, a network of influenza expert groups at the Center for Infectious Diseases at KNIH and at Korea University, together with an advisory committee outside KNIH, will coordinate validation of new virus strains. Another challenge is to provide ISED with regional epidemiological features of drug-resistant viruses. Amantadine and rimantadine have been used for the prevention and treatment of influenza A virus infection for >30 years (20). Widespread use of antiviral drugs drawn from pandemic stockpiles has the potential to promote the emergence of resistant strains, for which epidemiological surveillance is key to monitoring and control. Open sharing of resistant viral genome information has become increasingly important in preventing and controlling the spread of resistant viruses. In addition, an antiviral drug resistance analysis tool can be developed and linked to the records in the database, allowing users to analyze influenza sequences for mutations known to confer drug resistance or sensitivity. The recent H5N1 outbreaks in Asia and the worst outbreak in Korea in 2008 have spurred our interest in surveillance among wild and domestic birds. Avian influenza surveillance may provide early warning signals for any possible introduction of avian viruses into new regions. 
Importantly, a large number of genome sequences of avian influenza viruses have accumulated in Asia. Given the region's potential as an epicenter for the emergence of new influenza virus strains, we particularly intend to extend the ISED platform to enable epidemiological monitoring of avian influenza virus sequences.
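The proposed drug resistance analysis tool, which would scan sequences for mutations known to confer resistance, could take a form like the sketch below. The marker positions are not taken from this article: they are the M2 residues (26, 27, 30, 31 and 34) at which adamantane resistance substitutions are generally reported in the literature, with the commonly cited wild-type residues. Flagging any change at these positions is a deliberate simplification, and the function name is hypothetical.

```python
# Assumption: M2 positions generally associated with adamantane resistance,
# paired with their commonly cited wild-type residues. Not from the source.
M2_WILD_TYPE = {26: "L", 27: "V", 30: "A", 31: "S", 34: "G"}


def adamantane_markers(m2_protein):
    """Return substitutions such as 'S31N' found at the marker positions.

    Expects an unaligned M2 amino acid sequence numbered from residue 1.
    Positions beyond the sequence end and ambiguous residues are skipped.
    """
    found = []
    for position, wild_type in sorted(M2_WILD_TYPE.items()):
        if position > len(m2_protein):
            continue
        residue = m2_protein[position - 1].upper()
        if residue != wild_type and residue not in ("-", "X"):
            found.append(f"{wild_type}{position}{residue}")
    return found
```

A production tool would restrict the report to specific substitutions with documented phenotypes and would number residues against a reference alignment rather than the raw sequence.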

USER MANAGEMENT

The ISED management system allows users to access the influenza virus sequence database without registration, except for drug-resistant virus sequence data. However, user registration is required for adding and editing database contents, and user support can be obtained by e-mailing graduate@korea.ac.kr or khkim@korea.ac.kr. Readers are encouraged to contact us if they wish to provide new data for inclusion in ISED, assist with curation or have any suggestions for improvements.

IMPLEMENTATION

ISED was developed as a relational database using Oracle 10g applications (14) on the Windows operating system. Two open source programs, the Apache HTTP Server and Apache Tomcat, were used as the HTTP server and servlet container for the web service, respectively. Perl scripts were used to provide a common gateway interface for sequence alignment using ClustalW, and a Java applet was used to link Jmol for displaying 3D models. ISED can be publicly accessed from any Web browser at http://influenza.korea.ac.kr.

FUNDING

Korea National Institute of Health (2008-E00179); BioGreen 21 program grant (20080401-034-008) and the Basic Research Program of the Korea Science & Engineering Foundation. Funding for open access charge: Korea National Institute of Health. Conflict of interest statement. None declared.

18 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Influenza--WHO cares.

Authors: Klaus Stöhr
Journal: Lancet Infect Dis Date: 2002-09 Impact factor: 25.071

Review 3. Avian influenza A (H5N1) infection in humans.

Authors: John H Beigel; Jeremy Farrar; Aye Maung Han; Frederick G Hayden; Randy Hyer; Menno D de Jong; Sorasak Lochindarat; Thi Kim Tien Nguyen; Tran Hien Nguyen; Tinh Hien Tran; Angus Nicoll; Sok Touch; Kwok-Yung Yuen
Journal: N Engl J Med Date: 2005-09-29 Impact factor: 91.245

Review 4. Evolution and ecology of influenza A viruses.

Authors: R G Webster; W J Bean; O T Gorman; T M Chambers; Y Kawaoka
Journal: Microbiol Rev Date: 1992-03

5. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.

Authors: Elodie Ghedin; Naomi A Sengamalay; Martin Shumway; Jennifer Zaborsky; Tamara Feldblyum; Vik Subbu; David J Spiro; Jeff Sitz; Hean Koo; Pavel Bolotov; Dmitry Dernovoy; Tatiana Tatusova; Yiming Bao; Kirsten St George; Jill Taylor; David J Lipman; Claire M Fraser; Jeffery K Taubenberger; Steven L Salzberg
Journal: Nature Date: 2005-10-05 Impact factor: 49.962

Review 6. Global epidemiology of influenza: past and present.

Authors: N J Cox; K Subbarao
Journal: Annu Rev Med Date: 2000 Impact factor: 13.739

7. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors: J D Thompson; D G Higgins; T J Gibson
Journal: Nucleic Acids Res Date: 1994-11-11 Impact factor: 16.971

8. Incidence of adamantane resistance among influenza A (H3N2) viruses isolated worldwide from 1994 to 2005: a cause for concern.

Authors: Rick A Bright; Marie-jo Medina; Xiyan Xu; Gilda Perez-Oronoz; Teresa R Wallis; Xiaohong M Davis; Laura Povinelli; Nancy J Cox; Alexander I Klimov
Journal: Lancet Date: 2005-09-22 Impact factor: 79.321

9. Mapping the antigenic and genetic evolution of influenza virus.

Authors: Derek J Smith; Alan S Lapedes; Jan C de Jong; Theo M Bestebroer; Guus F Rimmelzwaan; Albert D M E Osterhaus; Ron A M Fouchier
Journal: Science Date: 2004-06-24 Impact factor: 47.728

10. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

Authors: Susie M Stephens; Jake Y Chen; Marcel G Davidson; Shiby Thomas; Barry M Trute
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
