Andrew D Davidson1, Maia Kavanagh Williamson2, Sebastian Lewis2, Deborah Shoemark3, Miles W Carroll4,5, Kate J Heesom6, Maria Zambon7, Joanna Ellis7, Philip A Lewis2, Julian A Hiscox5,8,9, David A Matthews10.
Year: 2020 PMID: 32723359 PMCID: PMC7386171 DOI: 10.1186/s13073-020-00763-0
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Overview of nanopore inferred transcriptome. a The classical transcription map of coronaviruses adapted for SARS-CoV-2. The genome is itself an mRNA which, when translated, gives rise to polyproteins pp1a and, upon a ribosomal frameshift, pp1ab. These polyproteins are proteolytically processed into a range of non-structural proteins termed nsp1–16, some of which form the viral replication-transcription complex (RTC). The RTC then generates subgenomic mRNA, which canonically contains a sequence present at the 5′ end of the viral genome known as the leader sequence. The 3′ end of the leader sequence has a motif, the transcription regulatory sequence (TRS), and similar sequences precede each of the functional ORFs downstream of the replicase gene (pp1ab). The TRS in the leader associates with one of the TRS regions adjacent to each of the other functional ORFs, and this mediates discontinuous transcription between the two during minus-strand RNA synthesis. These minus-strand RNA molecules are used as templates to generate positive-sense mRNA, and in this manner, the remaining ORFs on the viral genome are placed 5′ most on the resulting subgenomic mRNAs and are subsequently translated. Orange boxes represent structural proteins and yellow boxes represent accessory proteins. b The total read depth across the viral genome for all reads; the maximum read depth was 511,129. c The structure of only the dominant transcript that codes for each of the identified ORFs. Only transcripts that start inside the leader TRS sequence are considered here. The rectangles represent mapped nucleotides and the arrowed lines represent regions of the genome that are not transcribed during the generation of mRNAs. To the right is noted the 5′ most ORF encoded in the transcript; in parentheses we note how many individual transcripts were observed. Transcripts coding for proteins we subsequently detected by MS/MS are coloured in green
Count of transcripts where the 5′ most ORF is a recognised ORF
| Feature | Count | Percent of total | Average poly A length of the dominant transcript |
|---|---|---|---|
| Total of all features | 72,172 | | |
| N | 27,882 | 38.6327 | 55 |
| None from list | 10,453 | 14.4834 | 58 |
| M | 10,367 | 14.3642 | 55 |
| ORF 7a | 5,162 | 7.1523 | 60 |
| ORF 7b | 4,786 | 6.6313 | 61 |
| ORF 3a | 3,449 | 4.7788 | 56 |
| ORF 6 | 2,649 | 3.6703 | 56 |
| ORF 8 | 1,447 | 2.0049 | 57 |
| E | 930 | 1.2885 | 56 |
| ORF 9a | 530 | 0.7343 | 56 |
| S glycoprotein Bristol deletion | 86 | 0.1191 | 55 |
| S glycoprotein | 33 | 0.0457 | 54 |
| ORF 3b | 6 | 0.0083 | 43 |
| ORF 9b | 6 | 0.0083 | 56 |
| ORF 10 | 2 | 0.0027 | 51 |
Only transcripts that start to map within the expected leader TRS are considered. For each transcript group, the average polyA length is also shown for the dominant transcript that codes for the indicated ORF. Note that around 14% of transcripts do not apparently code for a known ORF, noted as “none from list”
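The "Percent of total" column can be reproduced from the counts. The following is a minimal sketch, assuming each percentage is simply the feature count divided by the "Total of all features" value (72,172); the variable and function names are illustrative and not taken from the paper. Values computed this way agree with the table to within the last printed decimal place.

```python
# Counts copied from the table above: 5'-most ORF -> number of transcripts.
counts = {
    "N": 27882,
    "None from list": 10453,
    "M": 10367,
    "ORF 7a": 5162,
    "ORF 7b": 4786,
    "ORF 3a": 3449,
    "ORF 6": 2649,
    "ORF 8": 1447,
    "E": 930,
    "ORF 9a": 530,
    "S glycoprotein Bristol deletion": 86,
    "S glycoprotein": 33,
    "ORF 3b": 6,
    "ORF 9b": 6,
    "ORF 10": 2,
}

# Denominator: the "Total of all features" row of the table.
TOTAL_FEATURES = 72172


def percent_of_total(count, total=TOTAL_FEATURES):
    """Return count as a percentage of total (assumed definition of the column)."""
    return 100.0 * count / total


percentages = {feature: percent_of_total(n) for feature, n in counts.items()}

for feature, pct in percentages.items():
    print(f"{feature:<34s}{counts[feature]:>8,d}{pct:>10.2f}%")
```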
Fig. 2 Deletions within the viral mRNAs encoding the S glycoprotein and N protein. a The read depth over the region deleted in the S glycoprotein together with information on the sequence in the region and the translation in all three frames. b A Clustal alignment of four proteins over this region: wild type SARS-CoV, wild type SARS-CoV-2, the artificially deleted version of the wild type SARS-CoV-2 S glycoprotein as reported in Walls et al. [38] and finally the predicted sequence of the deleted protein described here. Highlighted in yellow is the sequence of the unique peptide generated by chymotrypsin digest of the protein which was identified by tandem mass spectrometry. The positions of predicted protease cleavage sites [23] at the S1/S2 boundary are shown. c A proposed deletion in the N protein predicted by multiple aligned transcripts and subsequently identified in trypsin-digested protein samples as indicated by the unique peptide highlighted in yellow
Peptide counts for viral proteins
| Protein name | Unique peptides | PSMs | Polyprotein 1ab component | Unique peptides | PSMs | Polyprotein 1ab component | Unique peptides | PSMs |
|---|---|---|---|---|---|---|---|---|
For each protein, the total number of unique peptides is indicated alongside how many PSMs support the peptides identified. In the case of the viral polyprotein pp1ab, we also list how many peptides uniquely mapped to each nsp region
Peptides unique to processed proteins
| Protein | Contributing PSMs | Sequence identified |
|---|---|---|
The polyprotein pp1ab is processed into matured smaller proteins nsp1–16 during infection; this table indicates which unique peptides were identified that could only arise as a result of full polyprotein processing
Phosphopeptide counts
| Protein | Number of distinct identified phosphorylation sites | Location of phospho sites | Number of contributing PSMs |
|---|---|---|---|
| N | 20 | S2, S105, T141, S176, S180, S183, S184, T391, S78, S79, T76, S206, T205, S23, T24, T166, S194, S201, S202, T198 | 74 |
| M | 5 | S211, S212, T208, S213, S214 | 43 |
| S glycoprotein | 13 | S1261, S1161, S1196, T791, Y789, S459, S816, S349, T240, S31, T29, S637, S640 | 21 |
| nsp3 | 5 | S794, S661, T504, S1826, S660 | 16 |
| ORF 3a | 0 | QGEIKDATPSDFVR | 1 |
| nsp9 | 1 | S5 | 1 |
| nsp12 | 0 | GFFKEGSSVELK | 1 |
Listed for each protein is the number of distinct phospho sites identified along with the locations and amino acid modified as well as how many contributing PSMs there are in total. Each named site has confidence of at least 70% as defined by the PhosphoRS node of Proteome Discoverer software. For proteins ORF3a and nsp12, no distinct site could be identified despite a phosphorylated peptide being found; in these cases, the peptide sequence is provided
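As a consistency check on the table above, the site strings can be parsed and tallied. This is an illustrative sketch only: the dictionaries are transcribed from the table, the helper name `parse_sites` is hypothetical, and ORF 3a and nsp12 are omitted because no distinct site was assigned to them.

```python
from collections import Counter

# Site lists transcribed from the "Location of phospho sites" column.
phospho_sites = {
    "N": "S2, S105, T141, S176, S180, S183, S184, T391, S78, S79, T76, "
         "S206, T205, S23, T24, T166, S194, S201, S202, T198",
    "M": "S211, S212, T208, S213, S214",
    "S glycoprotein": "S1261, S1161, S1196, T791, Y789, S459, S816, S349, "
                      "T240, S31, T29, S637, S640",
    "nsp3": "S794, S661, T504, S1826, S660",
    "nsp9": "S5",
}

# "Number of distinct identified phosphorylation sites" column.
reported_counts = {"N": 20, "M": 5, "S glycoprotein": 13, "nsp3": 5, "nsp9": 1}


def parse_sites(text):
    """Split 'S2, T141, ...' into (residue, position) tuples sorted by position."""
    sites = []
    for token in text.split(","):
        token = token.strip()
        sites.append((token[0], int(token[1:])))
    return sorted(sites, key=lambda site: site[1])


parsed = {protein: parse_sites(text) for protein, text in phospho_sites.items()}

# Tally of modified residue types (S, T or Y) per protein.
residue_tally = {
    protein: Counter(residue for residue, _ in sites)
    for protein, sites in parsed.items()
}

for protein, sites in parsed.items():
    matches = len(sites) == reported_counts[protein]
    print(protein, len(sites), dict(residue_tally[protein]), "count matches:", matches)
```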
Fig. 3 A space-filling model of the SARS-CoV-2 S glycoprotein in a trimeric form using the sequence of a the native or b the spike deletant virus, in which amino acids 679NSPRRARSV687 have been replaced with isoleucine. The model was built using a cryo-EM structure (6VSB.pdb) of the S glycoprotein in the prefusion form [25]. Each of the monomers is coloured differently. The loop containing the furin cleavage site (or the shortened loop in the deleted version in b) is indicated in red. The positions of phosphorylation sites identified by mass spectrometry and located on the surface were mapped on the native structure and are shown in yellow in a
Fig. 4 Schematic of the location of phosphorylation sites. Proteins M, N, nsp3, nsp9 and S glycoprotein are shown as we have accurate phospho-site data for these proteins. For each location, we indicate the amino acid (S, T or Y) and the amino acid numbering. The S glycoprotein is shown as S1 and S2 to illustrate where the sites would be relative to the major cleavage site on the wild type S glycoprotein
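The S1/S2 grouping of the S glycoprotein sites can be sketched as follows. The caption does not give the boundary residue, so the split used here (S1 ending at residue 685, S2 starting at 686) is an assumption; the site list is the one reported in the phosphopeptide table, and the function name is hypothetical.

```python
# Assumption (not stated in the caption): the major S1/S2 cleavage lies
# between residues 685 and 686, so S1 = 1..685 and S2 = 686 onwards.
S1_LAST_RESIDUE = 685

# S glycoprotein phosphorylation sites as listed in the phosphopeptide table.
spike_sites = [
    "S1261", "S1161", "S1196", "T791", "Y789", "S459", "S816",
    "S349", "T240", "S31", "T29", "S637", "S640",
]


def split_by_subunit(sites, s1_last=S1_LAST_RESIDUE):
    """Group sites into S1 and S2 by residue number, each sorted by position."""
    subunits = {"S1": [], "S2": []}
    for site in sorted(sites, key=lambda s: int(s[1:])):
        position = int(site[1:])
        subunit = "S1" if position <= s1_last else "S2"
        subunits[subunit].append(site)
    return subunits


by_subunit = split_by_subunit(spike_sites)
print("S1:", by_subunit["S1"])
print("S2:", by_subunit["S2"])
```

Under this assumed boundary, seven of the thirteen sites fall in S1 and six in S2.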
Fig. 5 Modelling phosphorylation on the RNA binding domain of N protein. The positions of phosphorylation sites identified by mass spectrometry were mapped on the X-ray crystal structure of the N-terminal RNA binding domain of the N protein (aa residues 47–173) from SARS-CoV-2 (6YVO.pdb). The four monomer units in one asymmetric unit are distinctly coloured and shown as side (a, b) and top (c, d) views as ribbon (left-hand figures) and space-filling models (right-hand figures)