A Nonlinear Pattern Recognition of Pandemic H1N1 Using a State Space Based Methods.

Mai S Mabrouk

Abstract

Genomic signal processing is a relatively new field in bioinformatics in which signal processing algorithms and methods are used to study functional structures in DNA. An appropriate mapping of a DNA sequence into one or more numerical sequences enables the use of many digital signal processing tools in the analysis of genomic sequences. A novel influenza A (H1N1) virus of swine origin emerged in the spring of 2009 and spread very rapidly among people. The severity of the disease and the number of deaths caused by a pandemic virus vary greatly and can change over time. In this work, pandemic H1N1 genomic sequences were characterized according to nonlinear dynamical features, namely moment invariants and largest Lyapunov exponents, and then compared with the same features extracted from classical H1N1 genomic sequences. The proposed methods were applied to a number of sequences encoded into time series using a coding measure scheme employing the Electron-Ion Interaction Pseudopotential (EIIP). The aim of this work is to extract genomic features that can distinguish the new swine flu from the classical H1N1 that existed before it. The sequences used come from segment 8 of the influenza genome, which consists of 8 RNA segments; segment 8 encodes two proteins that are important for immune system attack (NS1 and NS2). According to the obtained results, a significance test indicates variability between the two groups, pandemic and classical H1N1 sequences.

Keywords:  DNA; Genome; H1N1 Subtype; Pandemics; Sequence

Year:  2011        PMID: 23407581      PMCID: PMC3558171     

Source DB:  PubMed          Journal:  Avicenna J Med Biotechnol        ISSN: 2008-2835


Introduction

Variations in the pandemic H1N1 influenza virus result from mutations that occur during viral replication (1). The polymerase of this RNA virus lacks proofreading activity (2); this gives rise to considerable viral variability, culminating in three types (A, B and C) and many subtypes based on variations in the hemagglutinin (HA) and neuraminidase (NA) surface proteins (3). The influenza genome consists of 8 RNA segments and encodes 10 polypeptides. The internal structural proteins, the nucleocapsid protein (NP) and the two matrix proteins (M), are used to classify influenza viruses into types A, B and C. The surface proteins NA and HA have been studied extensively, and antigenic variations in these surface glycoproteins are used to subtype influenza A. Additionally, three of the influenza polypeptides are associated with RNA polymerase activity (PA, PB1, PB2), and the RNA-binding non-structural protein (NS) contributes to viral pathogenicity and plays a central role in preventing the interferon-mediated antiviral response.

Influenza A virus (IAV) undergoes both minor and major genetic variations. Yearly antigenic drift is a minor variation that can be as small as a single amino acid mismatch. Major variations, known as antigenic shifts, cause serious outbreaks and pandemics such as the worldwide outbreaks of 1918, 1957 and 1968 (4). Changes in genetic and antigenic composition pose challenges for the development of influenza vaccines and antiviral medications (5).

In the last two decades, there has been increasing interest in applying techniques from nonlinear analysis and chaos theory in different fields of research. In this work, chaos theory was applied to both pandemic and classical H1N1 genomic sequences in order to discriminate between them according to their nonlinear dynamical features, namely moment invariants and Largest Lyapunov Exponents (LLE).

Materials and Methods

The conversion of DNA sequences into digital signals makes it possible to apply signal processing methods to the analysis of genomic data (6, 7). Genomic signal processing provides an efficient tool for extracting features of DNA sequences that are maintained over whole genomes (8). In this work, EIIP sequence indicators were used: the energy of delocalized electrons in amino acids and nucleotides has been calculated as the Electron-Ion Interaction Pseudopotential (EIIP). In protein analysis, EIIP values are substituted for the corresponding amino acids in a protein sequence, and the power spectrum of the resulting signal is taken to extract its information content (9).

To study the dynamics of the proposed system, the state space trajectory was first reconstructed. Phase space reconstruction is fundamental to the analysis of nonlinear signals; it embeds a time series in an n-dimensional space. The basic steps of the reconstruction were as follows. First, sequences of pandemic H1N1 and of the classical H1N1 that existed before it were encoded into time series signals using EIIP sequence indicators. Second, the delay time was chosen as the first minimum of the auto mutual information function, which was found at a delay of four. Third, the minimal embedding dimension for the pandemic and classical H1N1 time series was calculated using Cao's method with a delay time of four, a maximal dimension of eight, three nearest neighbors, and a number of reference points depending on the length of each signal. Cao's method produced a kink at 3, which corresponds to a time-delay reconstruction of the pandemic and classical H1N1 time series with an embedding dimension of 3 and a delay of 4. Finally, the phase space trajectory was obtained for the time series of both types of H1N1 genomic sequences (pandemic and classical). Once the phase space trajectory has been obtained, the next step is feature extraction (10).
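The encoding and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in the study: the four nucleotide EIIP values are the ones commonly quoted in the EIIP literature and are not listed in this text, the function names `eiip_encode` and `delay_embed` are hypothetical, and the 12-base input is a toy string rather than an influenza sequence. The embedding dimension of 3 and delay of 4 follow the values reported above.

```python
# EIIP values per nucleotide as commonly quoted in the EIIP literature
# (assumption: the source text does not list them).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(sequence):
    """Map a nucleotide string to a numeric series (KeyError on other symbols)."""
    return [EIIP[base] for base in sequence.upper()]


def delay_embed(series, dim=3, delay=4):
    """Return delay vectors (x[i], x[i + delay], ..., x[i + (dim - 1) * delay])."""
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and delay")
    return [
        tuple(series[i + k * delay] for k in range(dim))
        for i in range(n_vectors)
    ]


toy_sequence = "ATGCATGCATGC"  # hypothetical toy input, not real data
signal = eiip_encode(toy_sequence)
trajectory = delay_embed(signal, dim=3, delay=4)
print(len(trajectory), trajectory[0])
```

Each tuple in `trajectory` is one point of the reconstructed phase space; a sequence of length N yields N - 8 such points for these settings.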

Feature extraction

The TSTOOL software package was used to estimate the nonlinear dynamical features; it is a package for signal processing with an emphasis on nonlinear time-series analysis (11).

Moment invariants

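Features obtained by moment invariants are simply calculated features that do not change under translation, scaling or rotation (12). These invariants are constructed using the generalized fundamental theorem of moment invariants (GFTMI), which was formulated as in (13). The n-dimensional moments of order p of an intensity function $\rho(x_1, \ldots, x_n) = \rho(x)$ are defined in terms of the Riemann integral as:

$$m_{p_1 \cdots p_n} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1^{p_1} \cdots x_n^{p_n} \, \rho(x_1, \ldots, x_n) \, dx_1 \cdots dx_n$$

where $p_1 + \cdots + p_n = p$, $0 < p < \infty$. It is assumed that $\rho(x)$ is a piecewise continuous, and therefore bounded, function that can have nonzero values only in a finite part of $R^n$; then the moments of all orders exist. The central moments are:

$$\mu_{p_1 \cdots p_n} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - \bar{x}_1)^{p_1} \cdots (x_n - \bar{x}_n)^{p_n} \, \rho(x_1, \ldots, x_n) \, dx_1 \cdots dx_n$$

where $\bar{x}_i = m_{0 \cdots 1 \cdots 0} / m_{0 \cdots 0}$, with the 1 in the i-th position, is the i-th coordinate of the centroid. The seven moment invariant features are computed from these central moments.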
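As a numerical illustration of the invariance property, the sketch below computes central moments of a small 3-dimensional point set and checks that they are unchanged by translation, and that one classical second-order combination is unchanged by rotation. Several things here are assumptions: the integral is replaced by a sum over trajectory points with unit weight, the point set is a hypothetical toy example, and the combination $\mu_{200} + \mu_{020} + \mu_{002}$ is shown only as a well-known rotation invariant, not as one of the seven features used in the study.

```python
import math


def central_moment(points, powers):
    """Discrete central moment: sum over points of prod_i (x_i - mean_i) ** p_i."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(pt[d] for pt in points) / n for d in range(dim)]
    total = 0.0
    for pt in points:
        term = 1.0
        for x, c, p in zip(pt, centroid, powers):
            term *= (x - c) ** p
        total += term
    return total


def second_order_trace(points):
    """mu_200 + mu_020 + mu_002, a classical rotation-invariant combination."""
    return (central_moment(points, (2, 0, 0))
            + central_moment(points, (0, 2, 0))
            + central_moment(points, (0, 0, 2)))


def rotate_z(points, angle):
    """Rotate 3-D points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


# Hypothetical toy trajectory, not data from the study.
cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.0), (0.3, 0.5, 0.2), (0.0, 0.3, 0.6)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in cloud]
rotated = rotate_z(cloud, 0.7)

# Translation leaves every central moment unchanged (up to rounding).
print(central_moment(cloud, (1, 1, 0)), central_moment(shifted, (1, 1, 0)))
# Rotation leaves the second-order trace unchanged (up to rounding).
print(second_order_trace(cloud), second_order_trace(rotated))
```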

Largest Lyapunov exponent (LLE)

In this work, a set of genomic sequences from segment 8 of the influenza genome of both pandemic and classical H1N1 was downloaded from the NCBI. Sequences with lengths of 800-1000 bp were chosen. These sequences were first encoded using EIIP sequence indicators, and the phase space trajectory was then reconstructed for each resulting time series. The TSTOOL largelyap algorithm was used to estimate the Largest Lyapunov Exponent (LLE). This algorithm is similar to Wolf's algorithm and provides an efficient estimate of the LLE by calculating the rate of increase of the prediction error versus the prediction time (14).
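The idea of reading the LLE from the growth of the prediction error can be sketched as follows. This is not the TSTOOL implementation and makes no claim to match its output: it is a simplified nearest-neighbor divergence estimate, with the function name `lle_sketch` and the parameters `horizon` and `min_sep` chosen here for illustration. For each embedded point it finds the nearest neighbor that is not close in time, follows both points forward, and fits a line to the mean log distance versus the number of steps. The demonstration uses the logistic map, whose largest Lyapunov exponent at r = 4 is known to be ln 2 (about 0.693), rather than any sequence data.

```python
import math


def lle_sketch(series, dim=3, delay=4, horizon=5, min_sep=10):
    """Slope of the mean log distance between neighbouring orbits versus time step."""
    n_vec = len(series) - (dim - 1) * delay
    vectors = [tuple(series[i + k * delay] for k in range(dim)) for i in range(n_vec)]
    usable = n_vec - horizon  # points that can be followed for `horizon` steps
    log_sum = [0.0] * (horizon + 1)
    count = [0] * (horizon + 1)
    for i in range(usable):
        # Nearest neighbour of point i, excluding temporally close points.
        best_j, best_d = -1, math.inf
        for j in range(usable):
            if abs(i - j) < min_sep:
                continue
            d = math.dist(vectors[i], vectors[j])
            if 0.0 < d < best_d:
                best_j, best_d = j, d
        if best_j < 0:
            continue
        # Follow both orbits and accumulate the log of their separation.
        for k in range(horizon + 1):
            d = math.dist(vectors[i + k], vectors[best_j + k])
            if d > 0.0:  # exact coincidences are skipped (a simplification)
                log_sum[k] += math.log(d)
                count[k] += 1
    mean_log = [s / c for s, c in zip(log_sum, count)]
    # Least-squares slope of mean_log against the step index.
    steps = list(range(horizon + 1))
    k_bar = sum(steps) / len(steps)
    y_bar = sum(mean_log) / len(mean_log)
    num = sum((k - k_bar) * (y - y_bar) for k, y in zip(steps, mean_log))
    den = sum((k - k_bar) ** 2 for k in steps)
    return num / den


# Demonstration on the logistic map x -> 4x(1 - x), a standard chaotic test signal.
x = 0.3
logistic = []
for _ in range(400):
    logistic.append(x)
    x = 4.0 * x * (1.0 - x)

estimate = lle_sketch(logistic, dim=1, delay=1, horizon=5)
print(estimate)
```

A positive slope indicates exponential divergence of neighboring orbits. An EIIP-encoded series takes only four distinct values, so many neighbor distances are exactly zero; how a production tool handles that case is not described in the text, and the skip rule above is only one possible choice.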

Results

Results of moment invariants

Features based on moment invariants were computed after constructing the phase space of both the pandemic and classical H1N1 EIIP-encoded sequences. The seven features are labeled φ1, φ2, φ3, φ4, φ5, φ7 and φ8. A significance test (t-test) was performed on these features to assess their use for discriminating between the two groups. The p-values for all seven features are less than 0.05, as shown in Table 1. Figure 1 compares the average moment invariant features of pandemic and classical H1N1 and shows a significant difference between the two types. The small vertical bars represent standard deviations across features.
Table 1

P-values of the t-test on a set of pandemic and classical H1N1 EIIP-encoded sequences for features extracted using moment invariants

Moment invariant feature    p-value
φ1                          1.8235e-004
φ2                          2.2912e-005
φ3                          1.3674e-010
φ4                          0.0012
φ5                          0.0288
φ7                          2.2912e-005
φ8                          6.7141e-005
Figure 1

Features extracted based on moment invariants for pandemic and classical H1N1. The small vertical bars represent standard deviations across features.


Results of largest Lyapunov exponent (LLE)

The LLE estimates for a set of pandemic and classical H1N1 genomic sequences were calculated using the TSTOOL largelyap algorithm, as shown in Table 2. The algorithm is very similar to the Wolf algorithm: it computes the average exponential growth of the distance between neighboring orbits via the prediction error. The increase of the prediction error versus the prediction time allows the Largest Lyapunov Exponent to be estimated. A t-test was applied to assess the use of LLE estimates for discriminating between pandemic and classical H1N1.
Table 2

Largest Lyapunov Exponent estimates of pandemic and classical H1N1 encoded sequences

LLE (Pandemic H1N1)    LLE (Classical H1N1)
2.6932                 0.3428
2.7103                 0.3601
2.7113                 0.3628
2.7142                 0.3667
2.7153                 0.3795
2.7280                 0.3854
2.7392                 0.4429
2.7475                 0.4491
2.7506                 0.4533
2.7567                 0.5569
2.7577                 0.6274
2.7722                 0.7417
2.8972                 0.7509
2.9075                 0.8078
2.9178                 0.8891
2.9472                 0.9501
2.9508                 0.9677
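The kind of two-sample comparison applied to the LLE estimates can be sketched with the values listed in Table 2. The text does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the function name `welch_t` is hypothetical. Only the t statistic and the degrees of freedom are computed, using the Python standard library. Because the table may not contain every sequence used in the study, the output is not expected to reproduce the reported p-value.

```python
import math
import statistics

# LLE estimates copied from Table 2.
pandemic = [2.6932, 2.7103, 2.7113, 2.7142, 2.7153, 2.7280, 2.7392, 2.7475, 2.7506,
            2.7567, 2.7577, 2.7722, 2.8972, 2.9075, 2.9178, 2.9472, 2.9508]
classical = [0.3428, 0.3601, 0.3628, 0.3667, 0.3795, 0.3854, 0.4429, 0.4491, 0.4533,
             0.5569, 0.6274, 0.7417, 0.7509, 0.8078, 0.8891, 0.9501, 0.9677]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, dof = welch_t(pandemic, classical)
print(t_stat, dof)
```

For a two-sided test at the 0.05 level with between 16 and 32 degrees of freedom, the critical value of |t| lies between roughly 2.0 and 2.1, so a statistic far above that range corresponds to p < 0.05.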

Significance test

The accuracy of discriminating between pandemic H1N1 and classical H1N1 using the moment invariant and Largest Lyapunov Exponent features was evaluated. The features were grouped into three feature vectors as follows:

V1 = {φ1, φ2, φ3, φ4, φ5, φ7, φ8}
V2 = {LLE}
V3 = {φ1, φ2, φ3, φ4, φ5, φ7, φ8, LLE}

The feature vectors were fed into a K-means clustering classifier. The resulting accuracies are shown in Table 3.
Table 3

Accuracy of the proposed nonlinear pattern recognition method using the K-means classifier

                  V1       V2      V3
Pandemic H1N1     100%     100%    100%
Classical H1N1    70.8%    100%    100%
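For the single-feature vector V2, the clustering step can be sketched on the Table 2 values. The text does not state the number of clusters, the initialization, or how clusters were mapped to classes, so the choices below (two clusters, centroids started at the smallest and largest value, the hypothetical function name `kmeans_1d`) are assumptions made for illustration only.

```python
# LLE estimates copied from Table 2.
classical = [0.3428, 0.3601, 0.3628, 0.3667, 0.3795, 0.3854, 0.4429, 0.4491, 0.4533,
             0.5569, 0.6274, 0.7417, 0.7509, 0.8078, 0.8891, 0.9501, 0.9677]
pandemic = [2.6932, 2.7103, 2.7113, 2.7142, 2.7153, 2.7280, 2.7392, 2.7475, 2.7506,
            2.7567, 2.7577, 2.7722, 2.8972, 2.9075, 2.9178, 2.9472, 2.9508]


def kmeans_1d(values, iterations=20):
    """Two-cluster k-means on scalars; centroids start at the min and max value."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the nearer centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for cluster in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == cluster]
            updated.append(sum(members) / len(members) if members else centroids[cluster])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


labels, centroids = kmeans_1d(classical + pandemic)
classical_labels = labels[:len(classical)]
pandemic_labels = labels[len(classical):]
print(centroids)
print(set(classical_labels), set(pandemic_labels))
```

On these values every classical estimate falls in one cluster and every pandemic estimate in the other, because the two ranges do not overlap.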

Discussion

The proposed techniques were implemented and applied to a number of EIIP-encoded sequences of pandemic and classical H1N1 from segment 8 of the influenza genome to identify their genomic signatures; continuous detection of these signatures is important for analyzing the process of adaptation from nonhumans to humans. For the chaotic features based on moment invariants, the seven features are labeled φ1, φ2, φ3, φ4, φ5, φ7 and φ8. A p-value below 0.05 indicates a significant difference, and a p-value above 0.05 indicates no significant difference. All seven p-values are below 0.05, which supports the hypothesis that these features have the potential to discriminate between pandemic and classical H1N1. For the chaotic features based on LLE estimates, the p-value of the t-test was 2.1546e-019, which is also below 0.05. To validate this result, a random DNA sequence of length 1000 bp was generated; its LLE was estimated at 1.4046 and compared with the average LLE estimates of pandemic H1N1 (2.8218) and classical H1N1 (0.4697). The results confirm that pandemic H1N1 genomic sequences can be statistically differentiated from classical H1N1 genomic sequences by LLE dynamical features.

Conclusion

Analyzing the genomic mutations of pandemic H1N1 sequences is very important for studying the possibility of virus adaptation from nonhumans to humans. This study of the nonlinear dynamics of pandemic and classical H1N1 genomic sequences from segment 8 of the influenza genome discriminated between the two by their moment invariants and Largest Lyapunov Exponent (LLE) estimates. The results were supported by statistical analysis, which indicates that such nonlinear dynamical features have the potential to discriminate between these two types of H1N1 with high accuracy. The study shows that these nonlinear dynamical features open the door to extracting more patterns for use in monitoring and extracting all H1N1 genomic signatures.

1.  Conversion of nucleotides sequences into genomic signals.

Authors:  P D Cristea
Journal:  J Cell Mol Med       Date:  2002 Apr-Jun       Impact factor: 5.310

2.  Study of features based on nonlinear dynamical modeling in ECG arrhythmia detection and classification.

Authors:  Mohamed I Owis; Ahmed H Abou-Zied; Abou-Bakr M Youssef; Yasser M Kadah
Journal:  IEEE Trans Biomed Eng       Date:  2002-07       Impact factor: 4.538

Review 3.  Influenza.

Authors:  N J Cox; K Subbarao
Journal:  Lancet       Date:  1999-10-09       Impact factor: 79.321

4.  Time lines of infection and disease in human influenza: a review of volunteer challenge studies.

Authors:  Fabrice Carrat; Elisabeta Vergu; Neil M Ferguson; Magali Lemaitre; Simon Cauchemez; Steve Leach; Alain-Jacques Valleron
Journal:  Am J Epidemiol       Date:  2008-01-29       Impact factor: 4.897

5.  Triple-reassortant swine influenza A (H1) in humans in the United States, 2005-2009.

Authors:  Vivek Shinde; Carolyn B Bridges; Timothy M Uyeki; Bo Shu; Amanda Balish; Xiyan Xu; Stephen Lindstrom; Larisa V Gubareva; Varough Deyde; Rebecca J Garten; Meghan Harris; Susan Gerber; Susan Vagasky; Forrest Smith; Neal Pascoe; Karen Martin; Deborah Dufficy; Kathy Ritger; Craig Conover; Patricia Quinlisk; Alexander Klimov; Joseph S Bresee; Lyn Finelli
Journal:  N Engl J Med       Date:  2009-05-07       Impact factor: 91.245

6.  Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans.

Authors:  Rebecca J Garten; C Todd Davis; Colin A Russell; Bo Shu; Stephen Lindstrom; Amanda Balish; Wendy M Sessions; Xiyan Xu; Eugene Skepner; Varough Deyde; Margaret Okomo-Adhiambo; Larisa Gubareva; John Barnes; Catherine B Smith; Shannon L Emery; Michael J Hillman; Pierre Rivailler; James Smagala; Miranda de Graaf; David F Burke; Ron A M Fouchier; Claudia Pappas; Celia M Alpuche-Aranda; Hugo López-Gatell; Hiram Olivera; Irma López; Christopher A Myers; Dennis Faix; Patrick J Blair; Cindy Yu; Kimberly M Keene; P David Dotson; David Boxrud; Anthony R Sambol; Syed H Abid; Kirsten St George; Tammy Bannerman; Amanda L Moore; David J Stringer; Patricia Blevins; Gail J Demmler-Harrison; Michele Ginsberg; Paula Kriner; Steve Waterman; Sandra Smole; Hugo F Guevara; Edward A Belongia; Patricia A Clark; Sara T Beatrice; Ruben Donis; Jacqueline Katz; Lyn Finelli; Carolyn B Bridges; Michael Shaw; Daniel B Jernigan; Timothy M Uyeki; Derek J Smith; Alexander I Klimov; Nancy J Cox
Journal:  Science       Date:  2009-05-22       Impact factor: 47.728

7.  Genomic signatures of human versus avian influenza A viruses.

Authors:  Guang-Wu Chen; Shih-Cheng Chang; Chee-keng Mok; Yu-Luan Lo; Yu-Nong Kung; Ji-Hung Huang; Yun-Han Shih; Ji-Yi Wang; Chiayn Chiang; Chi-Jene Chen; Shin-Ru Shih
Journal:  Emerg Infect Dis       Date:  2006-09       Impact factor: 6.883

8.  A coding measure scheme employing electron-ion interaction pseudopotential (EIIP).

Authors:  Achuthsankar S Nair; Sivarama Pillai Sreenadhan
Journal:  Bioinformation       Date:  2006-10-07