Sk Sarif Hassan, Vaishnavi Kodakandla, Elrashdy M Redwan, Kenneth Lundstrom, Pabitra Pal Choudhury, Tarek Mohamed Abd El-Aziz, Kazuo Takayama, Ramesh Kandimalla, Amos Lal, Ángel Serrano-Aroca, Gajendra Kumar Azad, Alaa A A Aljabali, Giorgio Palù, Gaurav Chauhan, Parise Adadi, Murtaza Tambuwala, Adam M Brufsky, Wagner Baetas-da-Cruz, Debmalya Barh, Vasco Azevedo, Nikolas G Bazan, Bruno Silva Andrade, Raner José Santana Silva, Vladimir N Uversky.
Abstract
Open reading frame 8 (ORF8) shows one of the highest levels of variability among accessory proteins in Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of Coronavirus Disease 2019 (COVID-19). It was previously reported that the ORF8 protein inhibits the presentation of viral antigens by the major histocompatibility complex class I (MHC-I), which interacts with host factors involved in pulmonary inflammation. The ORF8 protein assists SARS-CoV-2 in evading immunity and plays a role in SARS-CoV-2 replication. Among many contributing mutations, Q27STOP, a mutation in the ORF8 protein, defines the B.1.1.7 lineage of SARS-CoV-2, which engendered the second wave of COVID-19. In the present study, 47 unique truncated ORF8 proteins (T-ORF8) with the Q27STOP mutation were identified among 49,055 available B.1.1.7 SARS-CoV-2 sequences. The results show that only one of the 47 T-ORF8 variants spread to over 57 geo-locations in North America and to other continents, including Africa, Asia, Europe and South America. Based on various quantitative features, such as amino acid homology, polar/non-polar sequence homology, Shannon entropy conservation, and other physicochemical properties of all 47 T-ORF8 protein variants, nine possible T-ORF8 unique variants were defined. Whether T-ORF8 variants function similarly to the wild-type ORF8 is yet to be investigated. A positive answer to this question could exacerbate future COVID-19 waves, necessitating severe containment measures.
Keywords: COVID-19; Continent distribution; Intrinsically disordered region; ORF8; SARS-CoV-2; Truncated; Truncation mutation
Year: 2022 PMID: 35341060 PMCID: PMC8944340 DOI: 10.7717/peerj.13136
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
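The abstract lists Shannon entropy among the quantitative features used to compare the T-ORF8 variants. A minimal sketch of one common form, the entropy of a sequence's residue composition in bits, is given below. The function name and the toy strings are hypothetical, and the exact entropy definition and normalization used in the study are not stated in this record, so this should be read as an illustration only.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (bits) of the residue composition of one sequence.

    Assumption: entropy is taken over observed residue frequencies with a
    base-2 logarithm; the study may use a different base or normalization.
    """
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    # sum of p * log2(1/p) over the residue types that occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical toy strings, not real T-ORF8 sequences.
print(shannon_entropy("AAAA"))  # one residue type: no uncertainty
print(shannon_entropy("ACDE"))  # four equally frequent residue types
```

A sequence made of a single residue type has zero entropy, and entropy grows as the composition becomes more even, which is why the measure can serve as a rough conservation feature.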
Frequency and percentages of unique T-ORF8 variants (continent-wise).
| Continent | Total T-ORF8 sequences (T) | Unique T-ORF8 variants (U) | U/T | % of unique variants |
|---|---|---|---|---|
| | 108 | 1 | 0.926% | 1.96% |
| | 99 | 1 | 1.01% | 1.96% |
| | 156 | 1 | 0.641% | 1.96% |
| | 1 | 1 | 100% | 1.96% |
| | 48,691 | 47 | 0.096% | 92.16% |
| Total | 49,055 | 47 | 0.104% | |
Note:
Here ‘U’ stands for the number of unique T-ORF8 variants and ‘T’ for the total number of available T-ORF8 sequences.
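The percentages in the continent-wise table can be checked with a few lines of arithmetic. The sketch below uses only the (T, U) pairs printed in the table; the variable names are illustrative. Interpreting the last column as U divided by the sum of the per-continent U values (51) is an assumption that happens to reproduce the printed 1.96% and 92.16%.

```python
# (T, U) pairs copied from the table rows, in the order they appear.
rows = [(108, 1), (99, 1), (156, 1), (1, 1), (48_691, 47)]


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total_t = sum(t for t, _ in rows)  # total number of sequences
sum_u = sum(u for _, u in rows)    # per-continent unique counts added up

u_over_t = [percent(u, t) for t, u in rows]
# Assumption: last column = U / (sum of per-continent U values).
share_of_unique = [percent(u, sum_u) for _, u in rows]

for (t, u), ratio, share in zip(rows, u_over_t, share_of_unique):
    print(f"T={t:>6}  U={u:>2}  U/T={ratio:.3f}%  share={share:.2f}%")
print(f"total T = {total_t}, summed U = {sum_u}")
```

The row totals add up to the 49,055 sequences reported in the abstract. The first four U/T values match the table at the printed precision, while the 48,691 row evaluates to about 0.0965%, which the table prints as 0.096%.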
Truncated ORF8 variants of length other than 26 amino acids.
| Length (number of amino acid residues) | Date of collection | Geo-location | Remarks |
|---|---|---|---|
| 22 | 20-10-2020 | USA: KS | Identical sequence |
| 22 | 24-09-2020 | USA: MO | Identical sequence |
| 24 | 27-04-2021 | USA: Colorado | Worldwide frequency: 01 |
| 40 | 13-12-2020 | USA: MD | Worldwide frequency: 01 |
| 41 | 30-10-2020 | USA: OK | Worldwide frequency: 01 |
| 41 | 09-04-2020 | USA | Identical sequence |
| 41 | 16-04-2020 | USA | Identical sequence |
Distribution of cumulative frequency of P15 variants across North America.
| Geo-location | Frequency | Geo-location | Frequency | Geo-location | Frequency |
|---|---|---|---|---|---|
| Wyoming | 56 | North Carolina | 776 | Iowa | 141 |
| Wisconsin | 383 | New York | 887 | Indiana | 823 |
| West Virginia | 289 | New Mexico | 250 | Illinois | 1,426 |
| Washington | 83 | New Jersey | 1,815 | Idaho | 85 |
| Virginia | 917 | New Hampshire | 234 | Hawaii | 16 |
| Vermont | 209 | Nevada | 157 | Guam | 7 |
| Utah | 97 | Nebraska | 105 | Georgia | 1,232 |
| Texas | 3,420 | Montana | 25 | Florida | 6,884 |
| Tennessee | 993 | Missouri | 254 | District of Columbia | 61 |
| South Dakota | 86 | Mississippi | 40 | Delaware | 70 |
| South Carolina | 261 | Minnesota | 7,416 | Connecticut | 496 |
| Rhode Island | 339 | Michigan | 5,084 | Colorado | 533 |
| Puerto Rico | 224 | Massachusetts | 2,761 | California | 1,727 |
| Pennsylvania | 3,285 | Maryland | 1,171 | CA, Santa Clara County | 4 |
| Oregon | 166 | Maine | 79 | CA, Humboldt | 20 |
| Oklahoma | 81 | Louisiana | 223 | Arkansas | 62 |
| Ohio | 1,191 | Kentucky | 145 | Arizona | 290 |
| North Dakota | 13 | Kansas | 100 | Alaska | 65 |
| | | | | Alabama | 168 |
| | | | | USA | 654 |
Frequency distribution of unique T-ORF8 variants over the USA.
| USA: states | Unique T-ORF8 variants | USA: states | Unique T-ORF8 variants |
|---|---|---|---|
| USA: California | P1, P30, P40 | USA: Missouri | P28 |
| USA: Connecticut | P32, P33 | USA: New Jersey | P10, P41 |
| USA: Florida | P4, P14, P16 | USA: Ohio | P2 |
| USA: Georgia | P21 | USA: North Carolina | P47 |
| USA: Illinois | P18 | USA: Pennsylvania | P7, P17, P19, P20, P23, P27, P39 |
| USA: Kentucky | P36 | USA: Puerto Rico | P24 |
| USA: Louisiana | P31 | USA: Tennessee | P5, P22 |
| USA: Maryland | P15, P26, P29, P34 | USA: Rhode Island | P35 |
| USA: Massachusetts | P35, P42 | USA: Texas | P6, P9, P11, P25 |
| USA: Michigan | P8, P38, P43, P46 | USA: Utah | P45 |
| USA: Minnesota | P3, P12, P13, P37, P44 | | |
Figure 1. Maximum likelihood phylogenetic tree for the 47 truncated ORF8 sequences, using 500 bootstrap replications and the Hasegawa-Kishino-Yano model.
Nine clades were found, while sequences 35 and 46 (marine blue and purple arrows, respectively) are phylogenetically close to the RaTG13 ORF8 sequence. Sequence 15 is indicated by a red arrow.
Figure 2. Analysis of intrinsic disorder predisposition of 47 T-ORF8 proteins.
(A) Disorder profiles generated using the PONDR-VSL2 disorder predictor. Three thresholds of predicted disorder scores (PDSs) are shown, 0.15, 0.25, and 0.5, which are used for the classification of protein residues as highly disordered (PDS ≥ 0.5), flexible (0.25 ≤ PDS < 0.5), moderately flexible (0.15 ≤ PDS < 0.25) and mostly ordered (PDS < 0.15). (B) Ranking of 47 T-ORF8 proteins based on their mean disorder scores.
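The threshold scheme in this caption maps directly onto a small classifier. The sketch below applies the three stated cut-offs to per-residue predicted disorder scores and ranks proteins by mean score. The function names and the toy profiles are hypothetical; the scores would in practice come from a disorder predictor's output, and the descending sort order is an assumption, since the caption does not state the ranking direction.

```python
from statistics import mean


def classify_residue(pds: float) -> str:
    """Classify one residue by its predicted disorder score (PDS)."""
    if pds >= 0.5:
        return "highly disordered"
    if pds >= 0.25:
        return "flexible"
    if pds >= 0.15:
        return "moderately flexible"
    return "mostly ordered"


def rank_by_mean_disorder(profiles: dict) -> list:
    """Return (name, mean PDS) pairs, highest mean first (assumed order)."""
    scored = [(name, mean(scores)) for name, scores in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-residue scores, not real predictor output.
toy_profiles = {
    "toy_a": [0.10, 0.20],
    "toy_b": [0.60, 0.40],
}
print([classify_residue(s) for s in toy_profiles["toy_b"]])
print(rank_by_mean_disorder(toy_profiles))
```

Checking the thresholds from highest to lowest keeps each residue in exactly one class, with boundary values such as 0.25 falling into the higher class, as the inequalities in the caption specify.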
Figure 3. Pairwise distance matrix of amino acid frequency vectors of the unique T-ORF8 variants.
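A pairwise distance matrix of this kind can be sketched as follows: each sequence is reduced to a 20-dimensional vector of amino acid frequencies, and a distance is computed for every pair of vectors. The use of relative frequencies and of the Euclidean distance are both assumptions, since this record does not state which normalization or metric the study used. The function names and toy strings are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequency_vector(sequence: str) -> list:
    """Relative frequency of each standard amino acid in the sequence."""
    n = len(sequence)
    return [sequence.count(aa) / n for aa in AMINO_ACIDS]


def euclidean(u: list, v: list) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(sequences: list) -> list:
    """Symmetric matrix of pairwise distances between frequency vectors."""
    vectors = [frequency_vector(s) for s in sequences]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Hypothetical toy strings, not real T-ORF8 sequences.
matrix = distance_matrix(["AAAA", "AACC", "CCCC"])
for row in matrix:
    print([round(d, 3) for d in row])
```

The resulting matrix has zeros on the diagonal and is symmetric, so only the upper triangle carries information. Sequences with identical composition have distance zero even when their residue order differs, which is a known limitation of composition-based comparison.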
Figure 4. Distance matrix of property vectors and derived phylogenetic tree of 45 T-ORF8 variants.
(A) The distance matrix; (B) phylogenetic tree based on physicochemical properties.
Figure 5. A schematic representation of a possible cluster of unique T-ORF8 variants that lie close to the P15 variant.
Note: the frequency of each length of T-ORF8 protein is given in parentheses. The T-ORF8 variants listed in each box were found to lie close to the P15 T-ORF8 variant.