Toshimichi Ikemura, Yuki Iwasaki, Kennosuke Wada, Yoshiko Wada, Takashi Abe.
Abstract
Among the mutations that occur in SARS-CoV-2, efficiently identifying those advantageous for viral replication and transmission is important for characterizing and defeating this rampant virus. Mutations whose frequency expands rapidly in a viral population are candidates for advantageous mutations, but neutral mutations hitchhiking with advantageous ones are also likely to be included. To distinguish these, we focus on mutations that appear to have occurred independently in different lineages and expanded in frequency in a convergent evolutionary manner. Batch-learning SOM (BLSOM) can separate SARS-CoV-2 genome sequences by lineage given only their oligonucleotide composition. Focusing on remarkably expanding 20-mers, each of which is represented by only one copy in the viral genome, allows us to correlate the expanding 20-mers with mutations. Using the visualization functions of BLSOM, we can efficiently identify mutations that have expanded remarkably both in the Omicron lineage, which is phylogenetically distinct from other lineages, and in other lineages. Most of these mutations involved amino acid changes, but a few did not, such as an intergenic mutation.
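The 20-mer bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the definition of frequency as the fraction of genomes containing a 20-mer, and the use of a simple difference as the measure of expansion are all assumptions made for illustration.

```python
from collections import Counter

K = 20  # oligonucleotide length named in the abstract


def single_copy_kmers(genome: str, k: int = K) -> set[str]:
    """Return the k-mers that occur exactly once in one genome sequence."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}


def population_frequency(genomes: list[str], k: int = K) -> dict[str, float]:
    """Fraction of genomes that carry each single-copy k-mer (assumed definition)."""
    if not genomes:
        return {}
    presence = Counter()
    for genome in genomes:
        presence.update(single_copy_kmers(genome.upper(), k))
    return {kmer: n / len(genomes) for kmer, n in presence.items()}


def expanding_kmers(early: list[str], late: list[str],
                    min_increase: float, k: int = K) -> dict[str, float]:
    """k-mers whose frequency rose by at least min_increase from early to late."""
    f_early = population_frequency(early, k)
    f_late = population_frequency(late, k)
    increases = {kmer: f - f_early.get(kmer, 0.0) for kmer, f in f_late.items()}
    return {kmer: d for kmer, d in increases.items() if d >= min_increase}
```

Because each retained 20-mer occurs once per genome, a 20-mer that newly expands can be traced to the single genomic position it covers, which is what lets an expanding 20-mer be tied to a specific mutation.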
Year: 2022 PMID: 36044525 PMCID: PMC9432735 DOI: 10.1371/journal.pone.0273860
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Histogram analysis of 20-mers.
(A) Histogram of the level of increase in the frequency of each 20-mer in the Jan. 2022 population compared with that in the Dec. 19 population. The horizontal axis is divided into bins of 0.05 in the level of increase. The vertical axis shows the number of 20-mers on a linear scale (i) or a logarithmic scale (Log) (ii). For convenience, empty bins are shown as 0 in the logarithmic display. (B) The level of increase in the frequency of each 20-mer in the Feb. 2022 population compared with that in the Dec. 19 population, displayed as described in A.
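The binning described in this caption can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, base-10 logarithms are assumed, and the level of increase is taken as a plain number per 20-mer because the caption does not define how it is computed.

```python
import math

BIN_WIDTH = 0.05  # bin width stated in the caption


def increase_histogram(increases, bin_width: float = BIN_WIDTH) -> dict[int, int]:
    """Count values per bin; bin i covers [i * bin_width, (i + 1) * bin_width)."""
    counts: dict[int, int] = {}
    for value in increases:
        # The small epsilon keeps values such as 0.15 from falling into the
        # bin below because of floating-point rounding.
        index = math.floor(value / bin_width + 1e-9)
        counts[index] = counts.get(index, 0) + 1
    return counts


def log_display(counts: dict[int, int], first_bin: int, last_bin: int) -> dict[int, float]:
    """log10 of each bin count, with empty bins shown as 0 as in the caption."""
    return {
        i: math.log10(counts[i]) if counts.get(i, 0) > 0 else 0.0
        for i in range(first_bin, last_bin + 1)
    }
```

One consequence of showing empty bins as 0 is that a bin holding exactly one 20-mer also displays as 0 on the logarithmic panel, so the linear panel is needed to tell the two cases apart.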
Fig 2. BLSOM for 1169 Omicron 20-mers.
(A) The total number of nodes (grid points) was set to approximately 1/100 of the total number of genomes: 817 nodes. Nodes that included sequences from more than one lineage are indicated in black, and those containing sequences from a single lineage are shown in a lineage-specific color. The number of genomes for each lineage is given in parentheses: Omicron (9191), Alpha (10,000), Beta (8978), Delta (10,000), Epsilon (10,000), Eta (2304), Gamma (10,000), Iota (10,000), Kappa (2924), L (125), Lambda (1942), Mu (3889), S (271), Theta (155), V (28) and Zeta (1422). Nodes that included no sequences were left blank (white). (B) Mapping of newly downloaded BA.1 and BA.2 sequences on the BLSOM presented in A. (C) U-matrix. The BA.1, BA.2, Alpha and Iota territories, which are visually separated by the black lines of the U-matrix, are specified. (D) Heatmaps for five 20-mers. Those for all 1169 20-mers are presented in S1 Fig.
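The node-count rule and the black versus lineage-colored node labeling in panel A can be sketched in Python. The genome counts are copied from the caption; the function names, the `"mixed"` label, and the representation of a node as a grid coordinate are assumptions made for illustration, not details of the BLSOM software.

```python
from collections import defaultdict

# Genomes per lineage, as listed in the caption.
GENOMES_PER_LINEAGE = {
    "Omicron": 9191, "Alpha": 10000, "Beta": 8978, "Delta": 10000,
    "Epsilon": 10000, "Eta": 2304, "Gamma": 10000, "Iota": 10000,
    "Kappa": 2924, "L": 125, "Lambda": 1942, "Mu": 3889,
    "S": 271, "Theta": 155, "V": 28, "Zeta": 1422,
}


def approximate_node_count(total_genomes: int, genomes_per_node: int = 100) -> int:
    """Roughly one node per 100 genomes, following the rule in the caption."""
    return round(total_genomes / genomes_per_node)


def label_nodes(assignments):
    """Map each node to its lineage if pure, or to "mixed" (drawn black) otherwise.

    `assignments` is an iterable of (node, lineage) pairs, one per genome.
    Nodes that receive no genome never appear, matching the blank nodes.
    """
    lineages_at_node = defaultdict(set)
    for node, lineage in assignments:
        lineages_at_node[node].add(lineage)
    return {
        node: next(iter(lineages)) if len(lineages) == 1 else "mixed"
        for node, lineages in lineages_at_node.items()
    }
```

The listed counts sum to 81,229 genomes, so the 1/100 rule gives about 812 nodes. The caption reports 817, which is consistent with "approximately"; the exact grid dimensions are not stated there.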
Fig 3. BLSOM for heatmap patterns.
(A) BLSOM of the 1169 heatmaps from the BLSOM presented in Fig 2A. Nodes that include more than one heatmap are indicated in black, and those that include no patterns were left blank (white). The 2D pattern is deliberately tilted to make its relationship with the following 3D display easy to understand. (B) 3D display of the BLSOM presented in A. The number of patterns belonging to each node is indicated by the height of the colored column, and for each column, a representative example of the heatmaps attributed to the corresponding node is shown.
Mutations shared between Omicron and some other lineages.
| Mutation | Gene | Product | A.A change | BA.2 | BA.1 | Lambda | Iota | Delta | Mu | Alpha | Beta | Gamma | Epsilon | S | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C10029T | ORF1ab | nsp4 | T492I | | | | | | | | | | | | |
| 11288_11296del | ORF1ab | nsp6 | SGF3675del | | | | | | | | | | | | |
| C23604A | S | spike | P681H | | | | | | | | | | | | |
| C28311T | N | nucleocapsid | P13L | | | | | | | | | | | | |
| G22813T | S | spike | K417N | | | | | | | | | | | | |
| G22992A | S | spike | S477N | | | | | | | | | | | | |
| A28271T | Intergenic | | | | | | | | | | | | | | |
| C23525T | S | spike | H655Y | | | | | | | | | | | | |
| C22995A | S | spike | T478K | | | | | | | | | | | | |
| A23013C | S | spike | E484A | | | | | | | | | | | | |
| A23055G | S | spike | Q498R | | | | | | | | | | | | |
| A23063T | S | spike | N501Y | | | | | | | | | | | | |
| 28363_28374del | N | nucleocapsid | ERS31del | | | | | | | | | | | | |
| C17410T | ORF1ab | nsp13 | R5716C | | | | | | | | | | | | |
| C9344T | ORF1ab | nsp4 | L3027F | | | | | | | | | | | | |
| C12880T | ORF1ab | nsp9 | | | | | | | | | | | | | |
| C21618T | S | spike | T19I | | | | | | | | | | | | |
| C22792T | S | spike | | | | | | | | | | | | | |
| C26060T | ORF3a | ORF3a | T223V | | | | | | | | | | | | |
*Intergenic between ORF8 and N.
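The labels in the Mutation column follow two patterns: a single-nucleotide substitution written as reference base, genome position, new base (for example C10029T), and a deletion written as a start and end position followed by "del" (for example 11288_11296del). A minimal Python sketch of a parser for these two patterns is given below; the `Mutation` class and `parse_mutation` function are hypothetical helpers, not part of any published tool.

```python
import re
from dataclasses import dataclass

SUBSTITUTION = re.compile(r"([ACGT])(\d+)([ACGT])")
DELETION = re.compile(r"(\d+)_(\d+)del")


@dataclass(frozen=True)
class Mutation:
    kind: str        # "substitution" or "deletion"
    start: int       # first affected genome position (1-based)
    end: int         # last affected genome position (inclusive)
    ref: str = ""    # reference base, substitutions only
    alt: str = ""    # replacement base, substitutions only

    @property
    def length(self) -> int:
        """Number of nucleotide positions covered by the mutation."""
        return self.end - self.start + 1


def parse_mutation(label: str) -> Mutation:
    """Parse a label such as 'C10029T' or '11288_11296del'."""
    match = SUBSTITUTION.fullmatch(label)
    if match:
        position = int(match.group(2))
        return Mutation("substitution", position, position, match.group(1), match.group(3))
    match = DELETION.fullmatch(label)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        if end < start:
            raise ValueError(f"deletion end precedes start: {label!r}")
        return Mutation("deletion", start, end)
    raise ValueError(f"unrecognized mutation label: {label!r}")
```

For example, `parse_mutation("11288_11296del").length` is 9, a multiple of three, which fits the three-residue deletion SGF3675del listed in the same row.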