Rui Yin, Xinrui Zhou, Shamima Rashid, Chee Keong Kwoh.
Abstract
BACKGROUND: Influenza reassortment, a mechanism by which influenza viruses exchange RNA segments when co-infecting a single cell, has been implicated in several major pandemics since the 19th century. Owing to its significant impact on public health and social stability, the identification of influenza reassortment has received great attention.
Keywords: Host tropism; Influenza; Random forest; Reassortment estimation
Year: 2020 PMID: 31973709 PMCID: PMC6979075 DOI: 10.1186/s12920-019-0656-7
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Schematic overview of the analysis workflow in HopPER. a General diagram of the host prediction model based on seven physicochemical properties and of reassortment probability estimation in the random forest. b Specific algorithmic steps of the probability estimation model for influenza genome reassortment detection
The number of influenza sequences per selected segment for avian, human and swine hosts and for the combined dataset
| Protein | Avian | Human | Swine | Combined |
|---|---|---|---|---|
| HA | 12248 | 13607 | 6257 | 32112 |
| NA | 9452 | 10107 | 5734 | 25293 |
| NP | 4841 | 2659 | 2292 | 9792 |
| PA | 8428 | 5498 | 3059 | 16985 |
| PB1 | 7699 | 4869 | 2892 | 15460 |
| PB2 | 8106 | 5490 | 2901 | 16497 |
| NS1 | 6115 | 4133 | 2662 | 12910 |
| M2 | 2237 | 1404 | 1534 | 5175 |
Fig. 2 The structure of random forest T for probability estimation. θ is an independent random draw and f(θ,x0) stands for the probability estimate by associated tree t at point x0. (y|x0) characterizes the aggregation of conditional probability of all trees for label y
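The aggregation step described in this caption can be illustrated with a minimal sketch. It assumes that the forest combines its trees by a plain average of their per-class estimates, which is the usual choice for random forests but is not stated in the caption. The function name `aggregate_tree_estimates` and the three example trees are hypothetical.

```python
from typing import Dict, List


def aggregate_tree_estimates(tree_estimates: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-tree class-probability estimates into one forest estimate.

    Each dict plays the role of f(theta, x0) for one tree: it maps a host
    label y to that tree's probability estimate at the query point x0.
    Plain averaging is an assumption, not taken from the source.
    """
    if not tree_estimates:
        raise ValueError("need at least one tree estimate")
    labels = list(tree_estimates[0].keys())
    n_trees = len(tree_estimates)
    return {y: sum(t[y] for t in tree_estimates) / n_trees for y in labels}


# Hypothetical estimates from three trees for a single sequence x0.
trees = [
    {"avian": 0.8, "human": 0.1, "swine": 0.1},
    {"avian": 0.6, "human": 0.1, "swine": 0.3},
    {"avian": 0.7, "human": 0.1, "swine": 0.2},
]
forest = aggregate_tree_estimates(trees)
predicted_host = max(forest, key=forest.get)  # label with the highest averaged probability
```

Because every tree returns a distribution over the same labels, the averaged estimate is again a distribution that sums to one.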
Performance of host tropism predictive models for individual proteins on independent training and testing data
| Model | Training accuracy | Training precision | Training recall | Training G-means | Training MCC | Testing accuracy | Testing precision | Testing recall | Testing G-means | Testing MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| HA | 0.966 | 0.967 | 0.956 | 0.953 | 0.943 | 0.965 | 0.969 | 0.956 | 0.955 | 0.947 |
| NA | 0.961 | 0.962 | 0.953 | 0.953 | 0.939 | 0.957 | 0.958 | 0.95 | 0.949 | 0.933 |
| NP | 0.947 | 0.944 | 0.933 | 0.931 | 0.912 | 0.954 | 0.951 | 0.944 | 0.943 | 0.927 |
| PA | 0.929 | 0.916 | 0.893 | 0.89 | 0.881 | 0.922 | 0.906 | 0.892 | 0.888 | 0.875 |
| PB1 | 0.931 | 0.927 | 0.907 | 0.902 | 0.887 | 0.937 | 0.933 | 0.914 | 0.912 | 0.898 |
| PB2 | 0.943 | 0.937 | 0.912 | 0.913 | 0.906 | 0.945 | 0.938 | 0.923 | 0.921 | 0.911 |
| NS1 | 0.934 | 0.928 | 0.917 | 0.916 | 0.896 | 0.931 | 0.93 | 0.919 | 0.917 | 0.896 |
| M2 | 0.876 | 0.866 | 0.856 | 0.854 | 0.805 | 0.865 | 0.86 | 0.853 | 0.848 | 0.795 |
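The five measures reported in this table can be computed from a confusion matrix. The sketch below is illustrative only: the source does not give the metric definitions, so it assumes macro-averaged precision and recall, G-means as the geometric mean of per-class recalls, and the standard multi-class generalisation of MCC. The function name `multiclass_metrics` and the example matrix are hypothetical.

```python
import math
from typing import Dict, List


def multiclass_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute summary metrics from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Assumes every class has at least one true sample and one prediction.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    recalls = [cm[i][i] / true_counts[i] for i in range(k)]
    precisions = [cm[j][j] / pred_counts[j] for j in range(k)]
    # Multi-class Matthews correlation coefficient.
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(total ** 2 - sum(p * p for p in pred_counts)) * math.sqrt(
        total ** 2 - sum(t * t for t in true_counts)
    )
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,      # macro average (assumption)
        "recall": sum(recalls) / k,            # macro average (assumption)
        "g_mean": math.prod(recalls) ** (1 / k),
        "mcc": numerator / denominator if denominator else 0.0,
    }


# Hypothetical avian/human/swine confusion matrix, not taken from the paper.
example_cm = [
    [90, 5, 5],
    [4, 92, 4],
    [6, 6, 88],
]
metrics = multiclass_metrics(example_cm)
```

Unlike accuracy, G-means and MCC both drop sharply when one host class is predicted poorly, which is useful here because the avian, human and swine classes differ in size.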
Reassortant strains, previously validated by alternative methods of reassortment analysis, that were identified using HopPER
| Datasets | Number of genomes | Original Methods | Identified number by HopPER | TPV |
|---|---|---|---|---|
| Karasin et al. | 18 | Genetic and phylogenetic analyses with cycle sequencing and amplification by reverse transcription-PCR. | 16 | 0.889 |
| Kingsford et al. | 16 | Enumerating maximal bicliques with a defined incompatibility graph to detect high-probability inconsistencies between the distributions of trees. | 14 | 0.875 |
| Olsen et al. | 6 | Phylogenetic analysis by the method of maximum parsimony with bootstrap resampling for the genetic characterization of reassortant H3N2 viruses. | 6 | 1.000 |
| Khiabanian et al. | 39 | Applying statistical methods such as diversity and entropy measures of each segment and its correlations to investigate reassortment patterns. | 33 | 0.846 |
| de Silva et al. | 36 | Comprehensive analysis based on neighbourhood of each segment and using only nucleotide distance matrix as input to formulate the phylogeny. | 29 | 0.806 |
| Niranjan et al. | 93 | Graph-incompatibility based reassortment finder that searches large collections of Markov chain Monte Carlo-sampled trees for groups of incompatible splits using a graph mining technique. | 80 | 0.860 |
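The TPV column is consistent with the ratio of genomes identified by HopPER to the number of genomes in each dataset. The short check below reproduces the column under that assumption. The counts are copied from the table, while the helper name `tpv` and the pooled figure are illustrative additions.

```python
# (dataset, number of genomes, number identified by HopPER), copied from the table.
rows = [
    ("Karasin et al.", 18, 16),
    ("Kingsford et al.", 16, 14),
    ("Olsen et al.", 6, 6),
    ("Khiabanian et al.", 39, 33),
    ("de Silva et al.", 36, 29),
    ("Niranjan et al.", 93, 80),
]


def tpv(identified: int, total: int) -> float:
    """Fraction of known reassortants that were identified (assumed definition)."""
    return identified / total


per_dataset = {name: round(tpv(found, n), 3) for name, n, found in rows}

# Pooled over all six datasets (not reported in the table).
total_genomes = sum(n for _, n, _ in rows)
total_identified = sum(found for _, _, found in rows)
overall_tpv = tpv(total_identified, total_genomes)
```

The pooled counts come to 178 identified out of 208 genomes.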
Reassortment patterns in the host distribution of selected avian (0), human (1) and swine (2) strains; a gap '-' denotes a missing sequence in the genome
| Host | Strain | Subtype | HA | M2 | NA | NP | NS1 | PA | PB1 | PB2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avian | A/domestic teal/Hunan/79/2005 | H5N1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.701 |
| Avian | A/pekin duck/California/P30/2006 | H4N2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0.856 |
| Avian | A/mallard/Pennsylvania/454069-12/2006 | H5N4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.804 |
| Avian | A/chicken/Hubei/C1/2007 | H9N2 | 0 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 0.976 |
| Human | A/California/05/2009 | H1N1 | 1 | 1 | 1 | 1 | - | 1 | 1 | 1 | 0.888 |
| Human | A/Texas/05/2009 | H1N1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 0.993 |
| Human | A/California/04/2009 | H1N1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0.885 |
| Human | A/New Jersey/1976 | H1N1 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0.984 |
| Swine | A/Thailand/271/2005 | H1N1 | 1 | - | 1 | 0 | 2 | 2 | 2 | 2 | 0.995 |
| Swine | A/swine/Ontario/00130/97 | H3N2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 |
| Swine | A/swine/Ontario/53518/03 | H1N1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 0.959 |
| Swine | A/swine/Hong Kong/273/1994 | H1N1 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 0.999 |
The number of predicted reassortant strains identified by HopPER for complete and incomplete genomes in both real and synthetic datasets
| Genomes | Integrity of genome | Predicted reassortants/total number |
|---|---|---|
| Real | Complete | 154/173 |
| Real | Incomplete | 24/35 |
| Synthetic | Complete | 83/85 |
| Synthetic | Incomplete | 19/25 |
Fig. 3 The RSR of three distinct host species of influenza strains across different years detected by HopPER. a The RSR of avian species from 1988 to 2017. b The RSR of human species from 1975 to 2017. c The RSR of swine species from 2000 to 2017