Yixiang Zhang, Kent M Eskridge, Shunpu Zhang, Guoqing Lu.
Abstract
BACKGROUND: Influenza A viruses (IAV) exhibit vast genetic mutability, have great zoonotic potential to infect avian and mammalian hosts, and are known to be responsible for a number of pandemics. A key computational issue in influenza prevention and control is the identification of molecular signatures with cross-species transmission potential. We propose an adjusted entropy-based method for identifying host-specific signatures. The method uses a similarity coefficient to incorporate amino acid substitution information and thereby improve identification performance. Mutations in the polymerase genes (e.g., PB2) are known to play a major role in the adaptation of avian influenza viruses to mammalian hosts. We therefore focus on the analysis of PB2 protein sequences and identify host-specific PB2 amino acid signatures.
Keywords: Adjusted entropy; Amino acid signatures; Host specificity; Influenza A virus
Year: 2022 PMID: 35962315 PMCID: PMC9372975 DOI: 10.1186/s12859-022-04885-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Amino acid composition, proportion, entropy, similarity coefficient (SC) and adjusted entropy at example positions of the PB2 protein sequence alignment
| Attribute | Position 1 (n = 2000) | Position 2 (n = 2000) |
|---|---|---|
| Composition+ | 1600 Pro, 200 Phe and 200 Asn | 1000 Tyr, 500 Phe and 500 Trp |
| Proportion# | 0.8 Pro, 0.1 Phe and 0.1 Asn | 0.5 Tyr, 0.25 Phe and 0.25 Trp |
| Entropy | 0.639 | 1.040 |
| SC& | 0.625 | 3.605 |
| Adjusted entropy* | 1.022 | 0.288 |
+The number preceding the amino acid is the observed number of residues for that amino acid
#The number preceding the amino acid is the observed proportion of residues out of 2000 observed for that amino acid
&SC = Similarity Coefficient
*Adj. Entropy = Entropy/SC. See the methods section for a detailed explanation of adjusted entropy and a simple example
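The entropy and adjusted entropy values in the table can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the entropy uses the natural logarithm (which matches the tabulated 0.639 and 1.040), and it takes the SC values directly from the table rather than deriving them, since the SC calculation is described in the methods section and not here. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(residues):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def adjusted_entropy(residues, sc):
    """Adjusted entropy = entropy / SC, as in the table footnote.

    The similarity coefficient `sc` is supplied by the caller; this sketch
    does not compute it.
    """
    return shannon_entropy(residues) / sc


# Example columns built from the compositions in the table.
position_1 = ["Pro"] * 1600 + ["Phe"] * 200 + ["Asn"] * 200
position_2 = ["Tyr"] * 1000 + ["Phe"] * 500 + ["Trp"] * 500

h1 = shannon_entropy(position_1)
h2 = shannon_entropy(position_2)
adj1 = adjusted_entropy(position_1, sc=0.625)  # SC taken from the table
adj2 = adjusted_entropy(position_2, sc=3.605)  # SC taken from the table

print(f"Position 1: entropy={h1:.3f}, adjusted={adj1:.3f}")  # 0.639, 1.022
print(f"Position 2: entropy={h2:.3f}, adjusted={adj2:.3f}")  # 1.040, 0.288
```

Note how the ranking flips: Position 2 has the higher raw entropy, but its larger SC gives it the lower adjusted entropy.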
PB2 positions identified as host-specific signatures using unadjusted and adjusted entropy with two thresholds (0.33 and 0.15)
| Method | Signatures |
|---|---|
| Unadjusted Entropy (< 0.33) | 44 199 271 475 567 588 613 627 674* 702 |
| Adjusted Entropy (< 0.33) | 44 |
| Adjusted Entropy (< 0.15) | 44 199 |
*Position 674 identified by Chen et al. [7]
+Bold figures indicate new signatures identified by adjusted entropy
Signature positions and mutation patterns of PB2 identified by the unadjusted method
| Strain | 44 | 199 | 271 | 475 | 567 | 588 | 613 | 627 | 674 | 702 | Mutations+ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AAK49374(A/Hong Kong/482/97(H5N1)) | A | T | L | E | A | V | E | A | K | 1 | |
| AAK49375(A/Hong Kong/483/1997(H5N1)) | A | A | T | L | E | A | V | A | K | 1 | |
| AAF74312(A/Hong Kong/483/1997(H5N1)) | A | A | T | L | E | A | V | A | K | 1 | |
| ACZ45427(A/Hong Kong/483/1997(H5N1)) | A | A | T | L | E | A | V | A | K | 1 | |
| CAB95862(A/Hong Kong/485/1997(H5N1)) | A | A | T | L | E | A | V | A | K | 1 |
+Number of mutations
Signature positions and mutation patterns of PB2 identified by the adjusted method
| Strain | 44 | 199 | 227 | 382 | 475 | 627 | 697 | Mutations+ |
|---|---|---|---|---|---|---|---|---|
| AAK49374(A/Hong Kong/482/97(H5N1)) | A | V | I | L | E | L | 1 | |
| AAK49375(A/Hong Kong/483/1997(H5N1)) | A | A | V | L | L | 2 | ||
| AAF74312(A/Hong Kong/483/1997(H5N1)) | A | A | V | L | L | 2 | ||
| ACZ45427(A/Hong Kong/483/1997(H5N1)) | A | A | V | L | L | 2 | ||
| CAB95862(A/Hong Kong/485/1997(H5N1)) | A | A | V | I | L | L | 1 |
+Number of mutations
False positive and false negative rates for both unadjusted and adjusted entropy methods
| Training dataset | Unadjusted entropy: false positive rate | Unadjusted entropy: false negative rate | Adjusted entropy: false positive rate | Adjusted entropy: false negative rate |
|---|---|---|---|---|
| Highly divergent | 0.13 | 0.091 | 0.09 | 0 |
| Median divergent | 0 | 0.49 | 0 | 0 |
| Less divergent | 0 | 1 | 0 | 0.101 |
Swine-human signature positions identified using unadjusted (U) and adjusted (A) entropy for PB2 proteins during 2004–2014
| Year | Type+ | Signatures | n++ |
|---|---|---|---|
| 2004 | U | 9 44 81 91 105 114 199 354 355 395 399 411 447 475 490 491 547 567 627 702 | 20 |
| 2004 | A | 9 44 81 91 105 109 114 199 340 354 355 368 395 399 411 447 475 478 490 491 535 547 567 591 627 645 667 702 | 28 |
| 2005 | U | 44 64 81 91 105 114 199 354 395 399 411 447 475 490 491 567 627 702 | 18 |
| 2005 | A | 9 44 64 65 91 109 114 199 340 354 368 395 399 411 475 478 490 491 535 547 567 591 627 667 674 702 | 25 |
| 2006 | U | 9 44 81 91 114 199 354 355 395 399 411 447 475 490 491 547 567 627 702 | 19 |
| 2006 | A | 9 44 65 91 109 114 199 340 354 355 368 395 399 411 443 447 475 478 490 491 547 560 567 591 627 645 702 | 27 |
| 2007 | U | 9 44 64 105 106 109 114 199 354 355 368 395 399 447 475 490 491 547 567 627 661 674 702 | 23 |
| 2007 | A | 9 44 64 81 91 105 106 109 114 199 292 340 354 355 368 375 395 399 411 447 475 478 490 491 535 547 560 567 591 627 645 661 667 674 702 | 35 |
| 2008 | U | 9 44 64 81 105 114 354 355 395 399 447 475 490 491 547 567 627 674 702 | 19 |
| 2008 | A | 9 44 64 65 73 81 105 109 114 127 199 292 340 354 355 395 399 411 447 451 456 475 478 490 491 547 560 567 591 627 645 667 674 702 | 34 |
| 2009 | U | NA | 0 |
| 2009 | A | 54 315 | 2 |
| 2010 | U | NA | 0 |
| 2010 | A | 54 | 1 |
| 2011 | U | NA | 0 |
| 2011 | A | 54 315 354 | 3 |
| 2012 | U | 344 | 1 |
| 2012 | A | 54 315 344 354 | 4 |
| 2013 | U | NA | 0 |
| 2013 | A | 66 293 315 354 560 731 | 6 |
| 2014 | U | NA | 0 |
| 2014 | A | 66 315 354 560 731 | 5 |
+U = unadjusted, A = adjusted entropy
++n = number of signatures
PB2 amino acid mutations from 2008 to 2010 at three positions
| Year | Position | Dominant AA type (swine) | Dominant AA type (human) | Identified as signature (unadjusted entropy) | Identified as signature (adjusted entropy) |
|---|---|---|---|---|---|
| 2008 | 354 | I | L | Yes | Yes |
| 2008 | 344 | V | V | No | No |
| 2008 | 560 | L | V | No | Yes |
| 2009 | 354 | I | I | No | No |
| 2009 | 344 | V | V | No | No |
| 2009 | 560 | V | V | No | No |
| 2010 | 354 | I | I | No | No |
| 2010 | 344 | V | V | No | No |
| 2010 | 560 | V | V | No | No |
Fig. 1 Host-specific signature identification method based on both adjusted and unadjusted (Shannon) entropy measurement
Fig. 2 The BLOSUM62 scoring matrix for amino acid substitution. The table value for a particular pair of amino acids is the log odds, defined as 2 log2(P(O)/P(E)), where P(O) is the observed probability of occurrence of the pair and P(E) is the expected probability of occurrence of the pair assuming independence [18]. Similarities between amino acid pairs are based on log odds as described in the text
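To make the log odds definition in the Fig. 2 caption concrete, the following sketch evaluates 2 log2(P(O)/P(E)) for a few pairs. The probabilities are invented for illustration and are not taken from the BLOSUM62 source data, and the function name is hypothetical. Rounding to the nearest integer reflects how BLOSUM matrix entries are conventionally published.

```python
import math


def log_odds_score(p_observed, p_expected):
    """Log odds score 2 * log2(P(O) / P(E)), rounded to the nearest integer."""
    return round(2 * math.log2(p_observed / p_expected))


# Hypothetical probabilities, chosen only to show the sign of the score.
more_than_chance = log_odds_score(p_observed=0.02, p_expected=0.005)   # ratio 4
at_chance = log_odds_score(p_observed=0.01, p_expected=0.01)           # ratio 1
less_than_chance = log_odds_score(p_observed=0.005, p_expected=0.02)   # ratio 1/4

print(more_than_chance, at_chance, less_than_chance)
```

A positive score means the pair is observed more often than independence would predict, a score of zero means it occurs at the chance rate, and a negative score means the substitution is observed less often than expected.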