Yu-Nong Gong1, Guang-Wu Chen2, Chi-Jene Chen3, Rei-Lin Kuo3, Shin-Ru Shih3.
Abstract
The influenza A virus contains 8 segmented genomic RNAs and was considered to encode 10 viral proteins until investigators identified the 11th viral protein, PB1-F2, which uses an alternative reading frame of the PB1 gene. The recently identified PB1-N40, PA-N155 and PA-N182 influenza A proteins have shown the potential for using a leaky ribosomal scanning mechanism to generate novel open reading frames (ORFs). These novel ORFs provide examples of the manner in which the influenza A virus expands its coding capacity by using overlapping reading frames. In this study, we performed a computational search, based on a ribosome scanning mechanism, on all influenza A coding sequences to identify possible forward-reading ORFs that could be translated into novel viral proteins. We specified that the translated products had a prevalence ≥5% to eliminate sporadic ORFs. A total of 1,982 ORFs were thus identified and presented in terms of their locations, lengths and Kozak sequence strengths. We further provided an abridged list of ORFs by requiring every candidate to have an upstream start codon (within the upstream third of the primary transcript), a strong Kozak consensus sequence and high prevalence (≥95% and ≥50% for in-frame and alternative-frame ORFs, respectively). The PB1-F2, PB1-N40, PA-N155 and PA-N182 proteins all fulfilled our filtering criteria. Subject to these three stringent settings, we additionally named 16 novel ORFs for all influenza A genomes except for HA and NA, for which 43 HA and 11 NA ORFs from their respective subtypes were also recognized.
Year: 2014 PMID: 25506939 PMCID: PMC4266615 DOI: 10.1371/journal.pone.0115016
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
ORF data for influenza A viruses.
| Gene | Length of transcripts (aa) | Sequence counts analyzed | Total ORF counts (F1, F2, F3) | Initiation sites per amino acid (×10⁻²) |
| PB2 | 759 | 19,320 | 70 (37, 26, 7) | 9.2 |
| PB1 | 757 | 19,378 | 81 (38, 33, 10) | 10.7 |
| PA | 716 | 20,404 | 77 (26, 35, 16) | 10.8 |
| NP | 498 | 20,219 | 58 (26, 28, 4) | 11.6 |
| M1 | 252 | 28,216 | 24 (14, 8, 2) | 9.5 |
| M2 | 97 | 24,218 | 8 (2, 3, 3) | 8.2 |
| M42 | 99 | 18,349 | 8 (2, 3, 3) | 8.1 |
| NS1 | 230 | 23,125 | 28 (17, 7, 4) | 12.2 |
| NS2 | 121 | 21,536 | 15 (10, 2, 3) | 12.4 |
| NS3 | 194 | 512 | 16 (11, 4, 1) | 8.2 |
| H1 | 566 | 11,238 | 62 (9, 38, 15) | 11.0 |
| H2 | 562 | 468 | 76 (17, 44, 15) | 13.5 |
| H3 | 566 | 8025 | 69 (11, 46, 12) | 12.2 |
| H4 | 564 | 909 | 50 (6, 32, 12) | 8.9 |
| H5 | 568 | 1501 | 55 (17, 30, 8) | 9.7 |
| H6 | 566 | 1085 | 71 (16, 42, 13) | 12.5 |
| H7 | 560 | 750 | 67 (19, 36, 12) | 12.0 |
| H8 | 566 | 92 | 60 (14, 36, 10) | 10.6 |
| H9 | 560 | 1180 | 66 (16, 37, 13) | 11.8 |
| H10 | 561 | 509 | 57 (17, 27, 13) | 10.2 |
| H11 | 565 | 382 | 59 (8, 36, 15) | 10.4 |
| H12 | 564 | 124 | 60 (12, 34, 14) | 10.6 |
| H13 | 566 | 38 | 68 (12, 44, 12) | 12.0 |
| H14 | 568 | 11 | 42 (7, 29, 6) | 7.4 |
| H15 | 570 | 10 | 48 (14, 24, 10) | 8.4 |
| H16 | 565 | 31 | 65 (12, 38, 15) | 11.5 |
| H17 | 564 | 3 | 40 (11, 24, 5) | 7.1 |
| H18 | 561 | 1 | 39 (8, 23, 8) | 7.0 |
| N1 | 469 | 9918 | 55 (13, 25, 17) | 11.7 |
| N2 | 469 | 9651 | 52 (10, 27, 15) | 11.1 |
| N3 | 469 | 699 | 55 (9, 33, 13) | 11.7 |
| N4 | 470 | 188 | 60 (11, 30, 19) | 12.8 |
| N5 | 473 | 167 | 48 (10, 24, 14) | 10.1 |
| N6 | 470 | 1182 | 57 (15, 23, 19) | 12.1 |
| N7 | 470 | 471 | 44 (9, 22, 13) | 9.4 |
| N8 | 470 | 1362 | 61 (10, 31, 20) | 13.0 |
| N9 | 470 | 421 | 53 (7, 25, 21) | 11.3 |
| N10 | 442 | 3 | 28 (9, 12, 7) | 6.3 |
| N11 | 447 | 1 | 28 (6, 16, 6) | 6.3 |
An ORF is defined as containing a start codon AUG at a given genomic position, with ≥5% prevalence from all analyzed sequences from the NCBI.
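The ORF definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the input coding sequences share one coordinate system (position 1 is the A of the annotated start codon), and the function names `aug_prevalence`, `frame_of` and `sites_per_aa_x100` are hypothetical. The last helper reflects an apparent relation in the table, where the final column is consistent with total ORF count divided by transcript length (for example, 70/759 ≈ 9.2 × 10⁻² for PB2).

```python
from collections import Counter


def aug_prevalence(seqs, min_prevalence=0.05):
    """Map each 1-based AUG position to its prevalence across seqs,
    keeping only positions seen in at least min_prevalence of them.
    Assumes all sequences use the same coordinates (an assumption)."""
    counts = Counter()
    for seq in seqs:
        rna = seq.upper().replace("T", "U")
        for i in range(len(rna) - 2):
            if rna[i:i + 3] == "AUG":
                counts[i + 1] += 1
    n = len(seqs)
    return {pos: c / n for pos, c in sorted(counts.items())
            if c / n >= min_prevalence}


def frame_of(position):
    """Forward reading frame (1, 2 or 3) of a 1-based nucleotide position."""
    return (position - 1) % 3 + 1


def sites_per_aa_x100(orf_count, length_aa):
    """Initiation sites per amino acid, in units of 10^-2."""
    return round(100 * orf_count / length_aa, 1)
```

Raising `min_prevalence` tightens the filter in the same spirit as the ≥5% cutoff, which discards sporadic start codons found in only a few isolates.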
Figure 1. ORF maps of PB1 and PA genes.
a) Eighty-one putative ORFs of the PB1 gene are graphed in 3 forward-reading frames: F1, F2 and F3. Each colored box represents the start codon of an ORF, with red, green and blue indicating strong, medium and weak Kozak sequences, respectively. Each start codon is labeled with its position, numbered from 1. For ORFs with mixed populations of different Kozak strengths, multiple colors are used to represent the population proportions. For example, the ORF at position 520 is 74% green and 26% blue. F2 and F3 ORFs of major prevalence (>50%) are additionally labeled with their frequencies. ORFs of lower prevalence (<95% for F1, <50% for F2/F3) are labeled in grey. Each ORF ends with a solid line. For F2 or F3 ORFs of various sizes, the dominant length is used to close the ORF. Four previously reported PB1 ORFs, PB1-F2, PB1-N40, sORF1 and sORF2, are highlighted in bold. b) Seventy-seven putative ORFs of the PA gene. The previously reported PA-N155 and PA-N182 are highlighted in bold at positions 463 and 544, respectively. Note that some ORFs overlap, such as those at positions 114 and 132. In such cases, 114* and 132*, in grey, mark the ends of the individual ORFs starting from positions 114 and 132, respectively.
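The legend distinguishes strong, medium and weak Kozak sequences without stating the criteria. The sketch below uses a commonly cited convention as an assumption, not the authors' confirmed rule: a purine (A or G) at position −3 and a G at position +4 relative to the A of AUG give a strong context, one of the two gives a medium context, and neither gives a weak one. The function name `kozak_strength` is hypothetical.

```python
def kozak_strength(seq, start):
    """Classify the context of the AUG whose A sits at 1-based `start`.

    Assumed convention (not taken from the source):
      strong = purine at -3 AND G at +4
      medium = exactly one of the two
      weak   = neither
    """
    rna = seq.upper().replace("T", "U")
    i = start - 1
    if rna[i:i + 3] != "AUG":
        raise ValueError("no AUG at the given position")
    # Positions that fall outside the sequence count as non-matching.
    minus3 = rna[i - 3] if i >= 3 else ""
    plus4 = rna[i + 3] if i + 3 < len(rna) else ""
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "medium", "strong")[hits]
```

Applied to every sequence carrying a given start codon, the proportions of each class would give the mixed colors described in the legend, such as 74% medium and 26% weak at PB1 position 520.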
An abridged ORF list with additional constraints.
| ORF | Position (Frame) | Dominant length (aa) | Prevalence (%) |
| PB2-N51 | 151 (1) | 709 | 99 |
| PB2-N90 | 268 (1) | 670 | 98 |
| PB2-F3 | 291 (3) | 3 | 72 |
| PB2-F2 | 305 (2) | 23 | 75 |
| PB2-N164 | 490 (1) | 596 | 99 |
| PB1-N92 | 274 (1) | 666 | 99 |
| PB1-N111 | 331 (1) | 647 | 97 |
| PB1-N171 | 511 (1) | 587 | 96 |
| PB1-N199 | 595 (1) | 559 | 99 |
| PA-F2 | 71 (2) | 37 | 51 |
| PA-N86 | 256 (1) | 631 | 97 |
| NP-N66 | 196 (1) | 433 | 99 |
| NP-F2 | 302 (2) | 4 | 52 |
| NP-F3 | 309 (3) | 8 | 53 |
| NP-N159 | 475 (1) | 340 | 99 |
| M1-F3 | 132 (3) | 12 | 81 |
The ORFs in Table 1 were screened with three additional filters: a strong Kozak consensus sequence, an upstream AUG location (within the first third of the transcript), and high prevalence (≥95% and ≥50% for in-frame and alternative-frame ORFs, respectively). This yielded 16 novel ORFs, which were named following the conventions established by PB1-N40, PA-N155 and PA-N182 (in-frame) and PB1-F2 (alternative-frame). The 4 known ORFs are excluded from this table.
Figure 2. Length variability of PB1-F2.
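The naming convention and the three filters can be illustrated with a short Python sketch. It is inferred from the table entries rather than taken from the authors' code: an in-frame start at nucleotide position p is named after codon (p − 1)/3 + 1, and an alternative-frame start after its frame. The function names are hypothetical, and treating "first third of the transcript" as the first `transcript_aa` nucleotides of the coding sequence is an assumption.

```python
def orf_name(gene, position):
    """Name an ORF from its gene and 1-based start position."""
    frame = (position - 1) % 3 + 1
    if frame == 1:
        # In-frame start: N-terminally truncated product, named by codon.
        return f"{gene}-N{(position - 1) // 3 + 1}"
    return f"{gene}-F{frame}"


def in_frame_length(transcript_aa, position):
    """Length (aa) of an in-frame product starting at `position`."""
    codon = (position - 1) // 3 + 1
    return transcript_aa - codon + 1


def passes_filters(position, transcript_aa, prevalence_pct, kozak):
    """Apply the three filters: strong Kozak context, start within the
    first third of the transcript, and frame-dependent prevalence."""
    frame = (position - 1) % 3 + 1
    # A transcript of N codons spans 3N nt, so its first third is N nt
    # (assumption: the stop codon is not counted).
    in_first_third = position <= transcript_aa
    min_prevalence = 95 if frame == 1 else 50
    return (kozak == "strong" and in_first_third
            and prevalence_pct >= min_prevalence)
```

For example, `orf_name("PB2", 151)` gives `PB2-N51` and `in_frame_length(759, 151)` gives 709, matching the first row of the table.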
Of 19,378 PB1 sequences analyzed, 19,339 contain a start codon at position 95, signaling the beginning of the PB1-F2 ORF. The downstream stop codons vary, resulting in different ORF sizes. The 3 major PB1-F2 lengths are 90, 57 and 11 aa. It is generally assumed that an intact PB1-F2 is ≥79 aa in length.
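The length variability described here comes from reading each sequence from the shared start codon to its first in-frame stop codon. A minimal Python sketch of that measurement follows; the function names are hypothetical, and the 79-aa threshold is the one quoted in the legend.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_length_aa(seq, start):
    """Length in amino acids of the ORF starting at 1-based `start`.
    Returns None if there is no AUG there or no in-frame stop codon."""
    rna = seq.upper().replace("T", "U")
    i = start - 1
    if rna[i:i + 3] != "AUG":
        return None
    for j in range(i, len(rna) - 2, 3):
        if rna[j:j + 3] in STOP_CODONS:
            return (j - i) // 3  # the stop codon itself is not counted
    return None


def dominant_length(lengths):
    """Most frequent ORF length among sequences that have the ORF."""
    observed = [n for n in lengths if n is not None]
    return Counter(observed).most_common(1)[0][0] if observed else None


def is_intact_pb1_f2(length_aa, min_len=79):
    """Apply the >=79 aa convention for an intact PB1-F2."""
    return length_aa is not None and length_aa >= min_len
```

Under this convention, the 90-aa form counts as intact while the 57-aa and 11-aa forms do not.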
BLASTP hits against the NR database for all putative ORFs.
| Case | Gene/Position (frame) | Accession numbers of database hits | Database hit info |
| 1 | PB1/47 (2) | Q288Y7, P0C5U7, AGR49480, AGO51397, AFV71377, AFR42732, AFJ12638, AEZ01190, AEX35415, AEO89179, ACL12403, ACJ26075, ACH58918, AGQ83473, AFO83333, ACO36494, AHB51242, AHB22709, ABI85213 | PB1-F2 |
| 2 | PB1/1925 (2), PB1/1946 (2), PB1/1970 (2), PB1/1973 (2), PB1/2054 (2) | AHB23179, AHB23700, AHB23730, AHB23769, AHB23797, AHB23845, AHB24489, AHB24622, AHB51934, AHB51987, AHB52010, AHB52086, AHB52204, AHB52240, AHB52291 | PB1-F2 |
| 3 | PA/1949 (2) | BAA01430 | DI-2 protein, A/WSN/1933(H1N1) |
| 4 | H3/1596 (3) | AAA72667, AAA72249 | Fusion protein, synthetic construct |
| 5 | PB1/1680 (3), PB1/1737 (3) | AEI29961 | A/environment/Korea/CSM3/2002(H3N6) |
| 6 | NP/605 (2), NP/632 (2), NP/656 (2) | AAV68025 | A/swine/Korea/S452/2004(H9N2) |
| 7 | H4/1277 (2) | AAA43224 | A/seal/Massachusetts/133/1982(H4N5) |
| 8 | N2/728 (3) | ACA04672 | A/duck/Eastern China/48/2002(H11N2) |
All hits for the putative ORFs listed in Table 1 are grouped into 8 cases according to the queried ORFs and hits returned from the database.
HHblits hits against PDB database for alternative-frame ORFs.
| Query (Count)/Frame No. | Avg. E-value | Avg. identity (%) | Avg. prob. (%) |
| PDB ID: Definition/Classification/Organism/Molecule Name | |||
| One representative alignment | |||
| PB1-1946 (22), PB1-1973 (6), PB1-2054 (1194)/Frame-2 | 6.74E-05 | 68.7 | 95.7 |
| 1 | |||
| NP-1103 (5), NP-1109 (37)/Frame-2 | 3.11E-06 | 53.0 | 96.7 |
| 1 | |||
| H1-1316 (32)/Frame-2 | 1.82E-07 | 50.5 | 97.3 |
| 2DS5: Structure of the ZBD in the orthorhomibic crystal form/Metal Binding Protein, Protein Binding/Escherichia coli/ATP-dependent Clp protease ATP-binding subunit clpX | |||
| 2 | |||
| N7-282 (40)/Frame-3 | 5.00E-06 | 42.0 | 96.6 |
| 1ZY4: Crystal Structure of eIF2alpha Protein Kinase GCN2/Transferase/Saccharomyces cerevisiae/Serine/threonine-protein kinase GCN2 | |||
| 1 | |||
HHblits hits were grouped by PDB ID or putative ORF. The E-value, sequence identity and probability are averages over the HHblits hits in each group. The alignments between putative ORFs and structures were generated by the HHblits tool. The number in parentheses following a sequence alignment is the length of either the query or the subject. Consensus sequences are shown in HMM format for both the query and the subject, in which capital and lowercase letters represent conserved columns with ≥60% and ≥40% probabilities, respectively, and a tilde “~” represents a non-conserved column. With default settings and two iterations, the HHblits tool adds significant hits from the previous iteration to the HMM query for the next iteration; as a result, our query may not be a single sequence. Five symbols show the alignment quality: “|”, “+”, “.”, “−” and “=”, ordered from the best to the worst.
Two groups of putative ORFs, from the PB1 and NP genes, hit the 1WRG structure.
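Two conventions in the legend above lend themselves to a short Python sketch: the letter-case rule for consensus columns and the per-group averaging of hit statistics. This is an illustration only; the function names and the dictionary keys `evalue`, `identity` and `prob` are hypothetical, and a plain arithmetic mean is assumed because the legend does not specify the averaging method.

```python
from statistics import mean


def consensus_symbol(residue, probability_pct):
    """Render one consensus column: uppercase at >=60%, lowercase at
    >=40%, and a tilde for a non-conserved column."""
    if probability_pct >= 60:
        return residue.upper()
    if probability_pct >= 40:
        return residue.lower()
    return "~"


def average_hits(hits):
    """Average E-value, identity and probability over a group of hits.
    Assumes an arithmetic mean (not stated in the source)."""
    return {
        "evalue": mean(h["evalue"] for h in hits),
        "identity": mean(h["identity"] for h in hits),
        "prob": mean(h["prob"] for h in hits),
    }
```

Because E-values span orders of magnitude, an arithmetic mean is dominated by the weakest hit in a group; a geometric mean would behave differently, so averaged E-values are best read alongside the probability column.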