Application of Statistical Learning to Identify Omicron Mutations in SARS-CoV-2 Viral Genome Sequence Data From Populations in Africa and the United States.

Lue Ping Zhao, Terry P Lybrand, Peter Gilbert, Margaret Madeleine, Thomas H Payne, Seth Cohen, Daniel E Geraghty, Keith R Jerome, Lawrence Corey.

Abstract

Importance: With timely collection of SARS-CoV-2 viral genome sequences, it is important to apply efficient data analytics to detect emerging variants at the earliest time.

Objective: To evaluate the application of a statistical learning strategy (SLS) to improve early detection of novel SARS-CoV-2 variants using viral sequence data from global surveillance.

Design, Setting, and Participants: This case series applied an SLS to viral genomic sequence data collected from 63 686 individuals in Africa and 531 827 individuals in the United States with SARS-CoV-2. Data were collected from January 1, 2020, to December 28, 2021.

Main Outcomes and Measures: The outcome was an indicator of Omicron variant derived from viral sequences. Centering on a temporally collected outcome, the SLS used the generalized additive model to estimate locally averaged Omicron caseload percentages (OCPs) over time to characterize Omicron expansion and to estimate when OCP exceeded 10%, 25%, 50%, and 75% of the caseload. Additionally, an unsupervised learning technique was applied to visualize Omicron expansions, and temporal and spatial distributions of Omicron cases were investigated.

Results: In total, there were 2698 cases of Omicron in Africa and 12 141 in the United States. The SLS found that Omicron was detectable in South Africa as early as December 31, 2020. With 10% OCP as a threshold, it may have been possible to declare Omicron a variant of concern as early as November 4, 2021, in South Africa. In the United States, the application of SLS suggested that the first case was detectable on November 21, 2021.

Conclusions and Relevance: The application of SLS demonstrates how the Omicron variant may have emerged and expanded in Africa and the United States. Earlier detection could help the global effort in disease prevention and control. To optimize early detection, efficient data analytics, such as SLS, could assist in the rapid identification of new variants as soon as they emerge, with or without lineages designated, using viral sequence data from global surveillance.

Year: 2022; PMID: 36069983; PMCID: PMC9453543; DOI: 10.1001/jamanetworkopen.2022.30293

Source DB: PubMed; Journal: JAMA Netw Open; ISSN: 2574-3805


Introduction

The Omicron variant of SARS-CoV-2 rapidly became a dominant variant in the United States after South Africa first officially reported it to the World Health Organization (WHO) on November 24, 2021. WHO designated it a variant of concern (VOC) on November 26, 2021, and the US Centers for Disease Control and Prevention (CDC) followed on November 30, 2021. Reports of infections, hospitalizations, and deaths in South Africa, the United Kingdom, and other countries[1,2] suggest that Omicron is highly transmissible and causes numerous breakthrough infections but relatively minor disease among vaccinated patients. Its rapid spread suggests that earlier detection might help future planning for such highly transmissible variants. We present here a statistical learning strategy (SLS) for detecting new variants based on an established data resource, the Global Initiative on Sharing All Influenza Data (GISAID), which archives SARS-CoV-2 sequences worldwide.[3,4] We applied this strategy to assess whether we could have detected the expansion of the Omicron variant earlier in Africa.

Methods

GISAID and Patient Populations

GISAID is a global science initiative that provides open access to genomic data of SARS-CoV-2 in the COVID-19 pandemic.[3,4,5] We accessed GISAID for full viral genome sequences collected from January 1, 2020, to December 28, 2021, from 63 686 patients in more than 30 African countries and used them to trace the origin of Omicron in Africa. Similarly, we retrieved 531 827 full viral genome sequences from the United States (October 1 to December 27, 2021) to track the expansion of the Omicron variant in all US states and territories. This study was determined to be exempt under 45 CFR § 46.104(d)(4), exempt from informed consent, and waived from further review by the Fred Hutchinson Research Center institutional review board.

Viral Sequences and Haplotypes

After obtaining sequences, we aligned them against the SARS-CoV-2 reference genome (Covid-ref-NC_045512) and performed quality control on aligned sequences. We then extracted nucleotides in the spike protein (base pairs 21 563-25 483) and translated them into protein sequences. We identified 28 amino acid substitutions, known as polymutants (PM) here, that constitute a core Omicron haplotype (A67V, T95I, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, and L981F)[6] because they were consistently observed among most Omicron genomes (L.P.Z., unpublished data, 2022). For missing PMs, we performed imputation to fill in missing residues (if posterior probabilities exceeded 95%) using observed polymorphic nucleotides (outside of the spike protein) and their haplotype structures. If imputation was not successful, remaining missing residues are denoted as x in sequences. Haplotypes of 28 Omicron mutations were used to define Omicron variants. We found that a core Omicron haplotype (VIDLPFNKSNKARSRYHKGYKHKYKHKF) characterizes 90% of all Omicron viruses (L.P.Z., unpublished data, 2022). However, this haplotype remained polymorphic, with newly emerging mutations or missing residues. To balance sensitivity and specificity, we set up a base set of Omicron haplotypes and then expanded the base set to include additional haplotypes if the maximum sequence similarity to any initial base haplotype exceeded 80% with 4 or fewer missing observations, or exceeded 90% with more than 4 missing observations. As a result, we identified a total of 343 polymorphic haplotypes, each of which had from 10 to 28 mutations in the spike protein relative to the original SARS-CoV-2 sequence. Figure 1 presents 20 haplotypes observed 10 or more times, and eTable 1 in the Supplement includes a complete list.

Among 12 141 Omicron viruses identified in the United States using the haplotype protocol, 98% were of the BA.1 (n = 11 900) or B.1.1.529 (n = 44) lineages assigned by Phylogenetic Assignment of Named Global Outbreak Lineages (PANGO) software[7] (eTable 2 in the Supplement). We additionally identified 1 case of BA.2, 2 cases of BA.3, 4 cases of B.1, 2 cases of B.1.1, 1 case each of B.1.1.161 and B.1.1.523, and 1 case each of the AY.1, AY.103, and AY.42 lineages. The remaining 183 cases were unclassified. The primary reason for choosing this haplotype protocol to identify the Omicron cases, rather than using the PANGO-assigned lineages, is that this strategy can rapidly identify new variants before any lineage is established and assigned, as explained in the Discussion section.
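The haplotype rule described above can be illustrated with a short Python sketch. It is not the authors' code: the function names, the treatment of X as a missing residue, and the definition of similarity as the fraction of matching residues among observed positions are assumptions made for illustration. The reference residues and the core haplotype are taken from the 28 sites listed in the text and in Figure 1.

```python
# Reference (wild-type) residues at the 28 spike sites, in Figure 1 order
# (A67, T95, G339, ..., N969, L981).
REFERENCE = "ATGSSSKNGSTEQGQNYTDHNPNDNQNL"
# Core Omicron haplotype reported in the text.
CORE = "VIDLPFNKSNKARSRYHKGYKHKYKHKF"


def _check(hap, other):
    if len(hap) != len(other):
        raise ValueError("haplotypes must have the same length")
    return hap.upper()


def count_mutations(hap, reference=REFERENCE):
    """Count observed residues that differ from the reference (X = missing)."""
    hap = _check(hap, reference)
    return sum(1 for h, r in zip(hap, reference) if h != "X" and h != r)


def similarity(hap, base):
    """Assumed definition: share of observed residues that match `base`."""
    hap = _check(hap, base)
    observed = [(h, b) for h, b in zip(hap, base) if h != "X"]
    if not observed:
        return 0.0
    return sum(h == b for h, b in observed) / len(observed)


def is_omicron_like(hap, base_set=(CORE,), min_mutations=10):
    """Apply the thresholds from the text: similarity above 80% with 4 or
    fewer missing residues, above 90% otherwise, and at least 10 mutations."""
    n_missing = hap.upper().count("X")
    best = max(similarity(hap, base) for base in base_set)
    threshold = 0.80 if n_missing <= 4 else 0.90
    return count_mutations(hap) >= min_mutations and best > threshold
```

With only the core haplotype in `base_set`, this sketch accepts close relatives of the core and rejects the reference sequence; the study started from a larger base set, so its final list of 343 haplotypes cannot be reproduced from this example alone.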
Figure 1.

Omicron Haplotype and Polymorphic Variations in US Sequences Detected in 10 or More Viruses

Omicron samples were collected from October 1 to December 24, 2021. Omicron haplotypes were required to have at least 10 mutations from the reference and, in the event of missing amino acids, the remaining amino acids were required to be at least 80% similar to a selected Omicron haplotype. A full set of haplotypes in the United States is listed in eTable 1 in the Supplement, while those in Africa are listed in eTable 7 in the Supplement. A total of 28 polymutants (PMs) were selected, as follows: 1, A67; 2, T95; 3, G339; 4, S371; 5, S373; 6, S375; 7, K417; 8, N440; 9, G446; 10, S477; 11, T478; 12, E484; 13, Q493; 14, G496; 15, Q498; 16, N501; 17, Y505; 18, T547; 19, D614; 20, H655; 21, N679; 22, P681; 23, N764; 24, D796; 25, N856; 26, Q954; 27, N969; and 28, L981. ID indicates identifier.

Outcome

We used the set of 343 Omicron haplotypes to identify Omicron viruses. In total, there were 2698 Omicron cases in Africa (eTable 3 in the Supplement) and 12 141 Omicron cases in the United States (eTable 4 in the Supplement).

Sample Collection Time and Location

Metadata associated with each viral sequence included sample collection date and location. Their distributions in Africa are listed in eTable 3 in the Supplement, and those in the United States in eTable 4 in the Supplement. The location was organized by continent, country, region, and subregion. Metadata also included the clade assigned by GISAID[5] and the lineage assigned by PANGO.[7]

Statistical Analysis

To analyze viral sequences, we applied an SLS comprising several analytic approaches[8]: modeling of Omicron expansion over time,[8,9] an unsupervised learning technique,[10] a haplotype-based imputation method, and bootstrapped confidence intervals. Technical details are described in the eAppendix in the Supplement. All statistical analyses were performed with functions in R version 4.0.5 (R Project for Statistical Computing). In temporal analyses, a PM was selected if P for nonlinear trend was less than .05.
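To make the idea of a locally averaged OCP and a threshold-crossing day concrete, the following Python sketch may help. It is only an illustration: the analysis in this study used a generalized additive model in R, whereas the sketch substitutes a simple Gaussian kernel average. The function names, the bandwidth, and the toy data are all assumptions.

```python
import math


def local_percentage(days, is_omicron, grid, bandwidth=3.0):
    """Gaussian-kernel weighted percentage of Omicron samples at each grid day.

    This is a stand-in smoother, not the generalized additive model
    used in the study.
    """
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((d - g) / bandwidth) ** 2) for d in days]
        total = sum(weights)
        hits = sum(w for w, y in zip(weights, is_omicron) if y)
        curve.append(100.0 * hits / total if total > 0 else 0.0)
    return curve


def first_crossing(grid, curve, threshold):
    """First grid day on which the curve reaches the threshold, else None."""
    for g, value in zip(grid, curve):
        if value >= threshold:
            return g
    return None


# Toy data (hypothetical): one sample per day, Omicron from day 20 onward.
days = list(range(40))
is_omicron = [d >= 20 for d in days]
grid = list(range(40))

curve = local_percentage(days, is_omicron, grid)
crossings = {t: first_crossing(grid, curve, t) for t in (10, 25, 50, 75)}
```

Subtracting the day of first detection from each entry of `crossings` would give a "time to targeted percentage" in the sense used in this study, with `None` playing the role of NA when the curve never reaches a threshold.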

Results

Omicron Haplotypes

Most Omicron viruses shared 28 mutations in the spike protein (L.P.Z., 2022, unpublished data). Their haplotypes were polymorphic, with a single dominant haplotype, VIDLPFNKSNKARSRYHKGYKHKYKHKF, in the United States (Figure 1), which accounted for 85% of all Omicron viruses. The second most common haplotype deviated from the dominant haplotype at K417 (K417K, ie, the reference residue was retained), while the third deviated at N440 and G446 (N440N and G446G). Each of these Omicron haplotypes had at least 23 mutated sites, as indicated by the value in the last column of Figure 1. In the expanded set of 343 haplotypes (eTable 1 in the Supplement), the Omicron haplotypes had variable numbers of mutations (with at least 10 mutations as the threshold).

Geospatial and Temporal Distributions of Omicron Viruses in Africa

To investigate geospatial and temporal distributions of the Omicron variant in Africa, we focused on Omicron-positive cases and tabulated their collection dates within countries (eTable 5 in the Supplement). There was a single Omicron case collected on December 31, 2020, in South Africa, and all other Omicron cases were collected during 2021 (eTable 5 in the Supplement). It is noteworthy that the first Omicron sample was collected on December 31, 2020, but only 12 residues in the haplotype were mutated (Table 1). Nearly 10 months later, South Africa began experiencing an escalation of Omicron cases, starting on September 30, 2021. Shortly afterwards, Nigeria identified its first Omicron case on October 17, 2021; Senegal its first case on November 9, 2021; and Botswana its first case on November 11, 2021.
Table 1.

First Omicron Cases Detected in Africa and the United States

First 13 cases in Africa

| ID | Collection date | Location | Gender | Age, y | Lineage | Clade | Spike haplotype | Mutations, No. |
|---|---|---|---|---|---|---|---|---|
| 1 | 12/31/2020 | South Africa, Eastern Cape | Male | 57 | B.1.576 | GRA | VIGSSSNXGSTKQGQYYXGYKHXDXHKF | 12 |
| 2 | 9/30/2021 | South Africa, Gauteng | Female | 52 | BA.1 | G | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 3 | 10/12/2021 | South Africa, Eastern Cape | Female | 16 | BA.1 | GRA | VIDLPFNKSSKEQGQYYKGYKHKYKHKF | 22 |
| 4 | 10/17/2021 | Nigeria, Abuja | Male | 32 | BA.1 | GRA | VIGSSSKKSNKARSRYHKGYKHKYKHKF | 23 |
| 5 | 10/24/2021 | South Africa, Eastern Cape | Female | 22 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 6 | 11/2/2021 | Nigeria, Abuja | Male | 51 | BA.1 | GRA | VIXXXXXXXXXXXXXXXKGYKHKYNHKF | 12 |
| 7 | 11/2/2021 | South Africa, Northern Cape | Female | 28 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 8 | 11/5/2021 | South Africa, Gauteng | Male | 26 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 9 | 11/8/2021 | South Africa, Gauteng | Unknown | Unknown | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 10 | 11/9/2021 | South Africa, Gauteng | Male | 23 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 11 | 11/9/2021 | South Africa, Gauteng | Male | 34 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 12 | 11/9/2021 | South Africa, Gauteng | Unknown | Unknown | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 13 | 11/9/2021 | Senegal/Dakar, Iressef Diamniadio | Female | 42 | B.1.1.529 | G | VIDLPFKKSNKARSRYHKGYKHKYKHKF | 27 |

First 8 cases in the United States

| ID | Collection date | Location | Gender | Age, y | Lineage | Clade | Spike haplotype | Mutations, No. |
|---|---|---|---|---|---|---|---|---|
| 1 | 11/21/2021 | Maryland | Female | 40 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF (a) | 28 |
| 2 | 11/22/2021 | New York City | Male | 33 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 3 | 11/24/2021 | Minnesota | Unknown | Unknown | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 4 | 11/24/2021 | New York City | Female | 32 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 5 | 11/24/2021 | Missouri | Female | 25 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 6 | 11/24/2021 | Virginia | Female | 23 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 7 | 11/25/2021 | New York City | Male | 30 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |
| 8 | 11/25/2021 | New York | Male | 26 | BA.1 | GRA | VIDLPFNKSNKARSRYHKGYKHKYKHKF | 28 |

Abbreviation: ID, identification.

(a) This Omicron haplotype is imputed to be the same core haplotype from VIDXXXXXXXXXXXXXXKGYKHKYKHKF, in which multiple missing residues were consecutive and were likely due to sequencing quality.

First 13 Omicron Cases in Africa

Given interest in the origin of the Omicron variant, we listed the first 13 cases detected by November 9, 2021 (Table 1). Ten samples were collected in South Africa, 2 in Nigeria, and 1 in Senegal. Both male and female patients were included. Among patients with a recorded age, ages ranged from 16 to 57 years. Their haplotypes included at least 12 mutated amino acids, as indicated in the last column. As noted previously, the first example of the extreme mutation profile characteristic of the Omicron variant was reported in South Africa on December 31, 2020. These early case reports were from Eastern Cape, South Africa (Table 1 and eTable 6 in the Supplement). Case 1, with 12 mutations, and case 3, with 22 mutations, shared 11 mutations, and case 3 shared all 22 of its mutations with the core Omicron haplotype. Both temporal and geographic factors suggest that these 3 cases may be related, and this may merit further investigation.

Temporal Trends and Detection Times in Africa

We proceeded to model the temporal trends of Omicron occurrence in 12 African countries that deposited 5 or more Omicron genomes (Figure 2A). All countries had significant temporal trends (data not shown). Omicron caseload percentages (OCPs) were computed from January 1, 2020, to December 28, 2021. Most African countries had fewer than 100 Omicron cases during this surveillance period, and Botswana exhibited emergence of Omicron later than South Africa. These considerations led us to use South Africa as a benchmark to evaluate the increase in Omicron caseload over time and the progression time required for the OCP to reach 10%, 25%, 50%, or 75% of total coronavirus cases (Table 2). Following the initial Omicron case identification on December 31, 2020, the OCP reached 10% by November 4 (95% CI, October 31 to November 6), 2021, or in 306 days (95% CI, 302 to 308) from first Omicron sequence identification. Similarly, the OCP reached 25%, 50%, and 75%, respectively, by day 311 (95% CI, 309 to 313), 317 (95% CI, 315 to 318), and 322 (95% CI, 321 to 323) from the initial identification date, corresponding to November 8, 12, and 18, 2021. The emergence patterns in other countries were similar, and their estimated days to reach the respective threshold percentages were computed (Figure 2A and Table 2). Note that estimated times in several countries, such as Zambia, were negative: with sparse data, the fitted curve extrapolated the threshold crossing to a date before the first detected Omicron sample (eFigures 1 and 2 in the Supplement).
Figure 2.

Locally Averaged Omicron Caseload Percentages

A, Omicron variant expansion in African countries. B, Omicron percentages over time across US states, organized by temporal pattern similarities.

Table 2.

Estimated Time for Omicron Caseload Percentages to Reach the 10%, 25%, 50%, and 75% Thresholds Across 12 African Countries

| African country | Negative, No. (a) | Positive, No. (a) | First detection | Time to 10%, d (95% CI) (b) | Time to 25%, d (95% CI) (b) | Time to 50%, d (95% CI) (b) | Time to 75%, d (95% CI) (b) |
|---|---|---|---|---|---|---|---|
| South Africa | 24 132 | 1908 | 12/31/2020 | 306 (302 to 308) | 311 (309 to 313) | 317 (315 to 318) | 322 (321 to 323) |
| Nigeria | 3626 | 84 | 10/17/2021 | 31 (25 to 38) | 37 (34 to 41) | 43 (41 to 45) | 49 (46 to 51) |
| Senegal | 704 | 29 | 11/9/2021 | 5 (−6 to 19) | 15 (7 to 24) | 25 (17 to 31) | 35 (27 to ∞) |
| Ghana | 2264 | 77 | 11/10/2021 | 0 (−4 to 5) | 5 (1 to 9) | 11 (7 to 15) | 20 (16 to ∞) |
| Botswana | 1781 | 388 | 11/11/2021 | 7 (−1 to 9) | 12 (6 to 14) | 17 (15 to 19) | 21 (20 to 23) |
| Reunion | 5096 | 7 | 11/22/2021 | NA | NA | NA | NA |
| Kenya | 5465 | 35 | 11/26/2021 | −7 (−10 to −4) | −2 (−4 to 1) | 3 (0 to 7) | 8 (4 to 14) |
| Mozambique | 910 | 17 | 11/26/2021 | −3 (−21 to 2) | −2 (−20 to 2) | −1 (−20 to 2) | 1 (−19 to 3) |
| Uganda | 926 | 18 | 11/29/2021 | −9 (−18 to 3) | 11 (2 to ∞) | NA | NA |
| Zambia | 876 | 46 | 11/30/2021 | −46 (−49 to −45) | −44 (−47 to −43) | −42 (−45 to −41) | −41 (−43 to −39) |
| Malawi | 773 | 34 | 12/2/2021 | −20 (−41 to −19) | −19 (−40 to −18) | −18 (−38 to −18) | −18 (−36 to −17) |
| Morocco | 458 | 21 | 12/14/2021 | −8 (−15 to −2) | −3 (−8 to 1) | 2 (−2 to 6) | 7 (3 to ∞) |

Abbreviation: NA, not applicable.

(a) Frequencies of viruses carrying Omicron (positive) or not (negative).

(b) The 95% CIs were obtained from the 2.5% and 97.5% percentiles of the empirically computed distribution from 1000 bootstraps. The time was inestimable if the Omicron caseload had not crossed the target percentage. The upper bound of the 95% CI was left as ∞ if it was beyond December 28, 2021. A negative day is an extrapolated value that arises when the model is fitted to sparse data in which the Omicron percentage jumped from 0% to 100% over a short time (see the example for Zambia in eFigures 1 and 2 in the Supplement).
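The percentile bootstrap behind the 95% CIs in Table 2 can be sketched in a few lines of Python. This is a generic illustration rather than the procedure in the eAppendix: the resampling unit, the statistic being bootstrapped (here a plain Omicron percentage on toy data rather than a threshold-crossing day), the seed, and the helper names are assumptions.

```python
import random


def percentile(sorted_values, q):
    """Simple nearest-index percentile (q in [0, 1]) of a sorted list."""
    idx = round(q * (len(sorted_values) - 1))
    idx = min(len(sorted_values) - 1, max(0, idx))
    return sorted_values[idx]


def bootstrap_ci(samples, statistic, n_boot=1000, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, and read off the 2.5% and 97.5% percentiles."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]
        estimates.append(statistic(resample))
    estimates.sort()
    return percentile(estimates, 0.025), percentile(estimates, 0.975)


def omicron_percentage(flags):
    """Percentage of samples flagged as Omicron (1) among all samples."""
    return 100.0 * sum(flags) / len(flags)


# Toy data (hypothetical): 30 Omicron-positive and 70 negative samples.
flags = [1] * 30 + [0] * 70
low, high = bootstrap_ci(flags, omicron_percentage)
```

In the study, the bootstrapped quantity is the time for the OCP to reach a target percentage, so a resample in which the fitted curve never crosses the target would contribute an undefined value; this is one plausible reading of the ∞ upper bounds in Tables 2 and 3, although the text reports only that ∞ marks a bound beyond December 28, 2021.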

First 8 Omicron Cases in the United States Identified Prior to November 26, 2021

When the CDC declared Omicron a VOC on November 30, 2021, there were no officially reported Omicron cases in the United States. According to the collection dates recorded in GISAID, however, 8 Omicron samples that were later sequenced had been collected between November 21 and 25, 2021 (Table 1). Among those with a recorded age, these patients were 23 to 40 years old and included both male and female patients. They were from Maryland, New York City, Virginia, New York State, Minnesota, and Missouri. They all shared the same core Omicron haplotype.

OCP in the United States

Excluding states or territories with fewer than 10 Omicron cases, we computed OCP from October 1 to December 27, 2021, in 38 states and organized their Omicron expansions by their temporal similarities. Since few Omicron cases were identified in November 2021, Figure 2B shows the Omicron caseload percentage from December 1 to 27, 2021. There were no other Omicron cases identified prior to November 26, 2021, except for the 8 cases discussed previously (Table 1). Once an Omicron case was identified in a state, its percentage of all COVID-19 cases rose rapidly. Seven states and territories (Hawaii, Louisiana, Texas, New York, District of Columbia, Georgia, and Florida) were clustered into an early group. The next cluster of 8 states represented the next Omicron wave, the middle group (Figure 2B), while the remaining 23 states had not yet experienced the spread of the Omicron variant by December 27. To quantify the spread of the Omicron variant in individual states, we computed the number of days for OCP to reach 25%, 50%, or 75%, following the first case report within that state (Table 3). Hawaii, one of the first states to experience Omicron, documented its first Omicron case on November 27, and the OCP rose to 25% within 7 days (95% CI, 6-8 days), then reached 50% and 75% within 9 days (95% CI, 8-10 days) and 11 days (95% CI, 10-13 days), respectively. Washington State, in the middle phase cluster, reported its first Omicron case on November 27, and its OCP rose to 25% and 50% within 15 days (95% CI, 14-15 days) and 18 days (95% CI, 18-19 days), respectively. Oregon, one of the last states to document Omicron cases, reported its first case on December 7, and its OCP increased to 25% and 50% within 8 days (95% CI, 7-9 days) and 11 days (95% CI, 10-13 days), respectively.
The overall trend for Omicron progression in the United States is similar to that observed in South Africa and most other African nations, but temporal patterns vary notably from state to state.
Table 3.

Estimated Time for Omicron Caseload Percentages to Reach the 25%, 50%, and 75% Thresholds Across 38 US States

| State | Negative, No. (a) | Positive, No. (a) | First detection | Time to 25%, d (95% CI) (b) | Time to 50%, d (95% CI) (b) | Time to 75%, d (95% CI) (b) | Cluster group |
|---|---|---|---|---|---|---|---|
| Alabama | 930 | 15 | 12/5/2021 | 9 (−1 to 11) | 12 (10 to ∞) | NA | Middle |
| Arizona | 15 452 | 168 | 12/5/2021 | 11 (11 to 12) | 14 (14 to ∞) | NA | Late |
| California | 78 404 | 3426 | 11/26/2021 | 18 (18 to 18) | 20 (20 to 21) | NA | Middle |
| Colorado | 49 426 | 396 | 11/29/2021 | 13 (12 to 13) | 15 (14 to 16) | NA | Middle |
| Connecticut | 8596 | 162 | 11/28/2021 | 18 (16 to ∞) | NA | NA | Late |
| DC | 1697 | 307 | 11/30/2021 | 10 (9 to 11) | 12 (11 to 13) | 14 (14 to 15) | Early |
| Florida | 9690 | 275 | 11/26/2021 | 14 (14 to 15) | 17 (16 to 17) | 19 (19 to 21) | Early |
| Georgia | 4687 | 274 | 11/30/2021 | 11 (10 to 12) | 13 (12 to 13) | 14 (14 to 15) | Early |
| Hawaii | 1364 | 93 | 11/27/2021 | 7 (6 to 8) | 9 (8 to 10) | 11 (10 to 13) | Early |
| Idaho | 3166 | 21 | 12/5/2021 | 12 (10 to 14) | NA | NA | Late |
| Illinois | 13 425 | 248 | 11/30/2021 | 12 (12 to 14) | NA | NA | Late |
| Indiana | 11 120 | 93 | 12/8/2021 | 8 (7 to 8) | 10 (9 to 11) | NA | Late |
| Iowa | 4365 | 31 | 12/6/2021 | 10 (8 to ∞) | NA | NA | Late |
| Kansas | 4289 | 20 | 12/13/2021 | 6 (5 to ∞) | 8 (7 to ∞) | NA | Late |
| Kentucky | 4168 | 38 | 12/11/2021 | 5 (4 to 6) | 7 (6 to ∞) | NA | Late |
| Louisiana | 1742 | 333 | 12/1/2021 | 5 (4 to 6) | 7 (6 to 8) | 12 (9 to 16) | Early |
| Maryland | 8982 | 395 | 11/21/2021 | 21 (20 to 21) | 25 (24 to 26) | NA | Middle |
| Massachusetts | 40 270 | 288 | 11/27/2021 | 18 (17 to 18) | 19 (19 to 20) | 22 (21 to ∞) | Middle |
| Michigan | 19 568 | 21 | 12/1/2021 | NA | NA | NA | Late |
| Minnesota | 34 004 | 76 | 11/24/2021 | NA | NA | NA | Late |
| Mississippi | 1442 | 13 | 11/29/2021 | 20 (15 to ∞) | NA | NA | Late |
| Nebraska | 5447 | 21 | 11/29/2021 | NA | NA | NA | Late |
| Nevada | 6691 | 23 | 12/8/2021 | NA | NA | NA | Late |
| New Jersey | 11 479 | 203 | 11/26/2021 | 18 (16 to 19) | NA | NA | Late |
| New York | 23 536 | 1975 | 11/22/2021 | 17 (17 to 18) | 21 (21 to 22) | 25 (24 to 25) | Early |
| North Carolina | 10 833 | 145 | 12/2/2021 | 14 (13 to 14) | 16 (15 to 17) | 18 (17 to ∞) | Late |
| Ohio | 9686 | 376 | 11/29/2021 | 14 (13 to 14) | 16 (16 to 17) | 19 (18 to 20) | Middle |
| Oregon | 4935 | 83 | 12/7/2021 | 8 (7 to 9) | 11 (10 to 13) | NA | Late |
| Pennsylvania | 13 210 | 96 | 11/28/2021 | 15 (15 to 17) | NA | NA | Late |
| Rhode Island | 3507 | 15 | 11/30/2021 | NA | NA | NA | Late |
| South Carolina | 2732 | 56 | 12/4/2021 | 11 (9 to 12) | NA | NA | Late |
| Tennessee | 4602 | 35 | 11/26/2021 | 19 (18 to 21) | NA | NA | Late |
| Texas | 16 336 | 1096 | 11/27/2021 | 12 (12 to 13) | 14 (14 to 15) | 17 (16 to 17) | Early |
| Utah | 9338 | 20 | 11/29/2021 | NA | NA | NA | Late |
| Virginia | 5088 | 94 | 11/24/2021 | 20 (19 to 21) | 22 (21 to ∞) | NA | Middle |
| Washington | 13 643 | 688 | 11/27/2021 | 15 (14 to 15) | 18 (18 to 19) | NA | Middle |
| West Virginia | 6415 | 19 | 12/2/2021 | NA | NA | NA | Late |
| Wisconsin | 13 477 | 435 | 11/27/2021 | 18 (18 to 18) | 22 (21 to 22) | NA | Late |

Abbreviation: NA, not applicable.

(a) Frequencies of viruses carrying Omicron (positive) or not (negative).

(b) The 95% CIs were obtained from the 2.5% and 97.5% percentiles of the empirically computed distribution from 1000 bootstraps. The time was inestimable if the Omicron caseload had not crossed the targeted percentage. The upper bound of the 95% CI was left as ∞ if it was beyond December 28, 2021.

Discussion

This case series using viral sequences from African countries and the United States suggests that the first Omicron sequence was identified in South Africa on December 31, 2020. The OCP in South Africa reached 10%, 25%, 50%, and 75% by November 4, 8, 12, and 18, 2021, respectively. Because the first Omicron progenitor, with 12 Omicron mutations, was collected on December 31, 2020, and was unlikely to be informative of future expansion, its collection date should not be treated as the first date of detecting the Omicron variant. Instead, we should treat the second identified sequence, with 22 Omicron mutations, as a potential progenitor (Figure 2A). By November 4, 2021, the OCP reached 10%, a value that has been suggested as an empirical threshold for a public health alert.[8] Our retrospective study illustrates that South Africa and other African nations collected valuable sequence data that could have been used to track the emergence of Omicron variants, providing potentially useful information as many as 22 days prior to the official declaration on November 26, 2021. Earlier detection of the Omicron variant and documentation of its rapid transmission might have been of great benefit. Public health policies might have been implemented to limit or delay rapid global transmission, and clinicians would have had more advance warning of yet another caseload surge. Since structural information is available for several SARS-CoV-2 proteins, particularly the spike protein, early information regarding key mutations could be used in protein homology modeling studies to anticipate potential effects on vaccine or therapeutic antibody effectiveness. For example, the Omicron variants carry mutations in spike protein regions that are known binding sites for some current therapeutic antibodies, and earlier information that the effectiveness of these therapeutic antibodies against Omicron variants might be compromised would be valuable for both clinicians and scientists involved in antibody development.

In the current study, we chose to use a set of Omicron haplotypes to define the Omicron virus as opposed to conventional lineages assigned by PANGO.[7] Indeed, use of conventionally designated lineages for this retrospective study is straightforward given that all viral sequences are automatically assigned lineages by GISAID when sequences are submitted. The phylogenetic approach accounts for both nucleotide mutations and insertions and deletions. However, the rapid mutation rate observed for coronaviruses can make it challenging to revise and update lineage designations in a timely manner. In contrast, the SLS,[8] relying on PM, identifies new mutation haplotypes with as few as 1 to 3 samples if the haplotype includes, for example, 10 or more mutated residues. Essentially, the haplotype approach enables the discovery of new variants without designated lineages. For example, the first Omicron virus was assigned the lineage B.1.576, rather than BA.1. Additionally, when haplotype-tagged variants become dominant, this approach could facilitate revising lineages and designating appropriate variants. Despite its advantages, SLS has benefited from assigned lineages in identifying Omicron core haplotypes and is best viewed as a complementary approach to phylogenetic analysis at this time.

Metadata in the GISAID database includes several fields for disease severity or vaccination status. If such data were routinely submitted, it would facilitate correlation of new variants with clinical metrics, such as our recent discovery of a viral haplotype that associates with hospitalization risk.[11] Unfortunately, data submissions at this point contain many missing or incoherent values and remain to be improved.

An interesting and potentially important observation is that there were 3 early Omicron cases in Eastern Cape, South Africa (case identifications 1, 3, and 5) with 12, 22, and 28 mutations, respectively, in a relatively short period of time (Table 1; eTable 6 in the Supplement). This temporal clustering implies that the original isolate may have been transmitted and that the infected person may have accumulated additional mutations with limited intermediaries. This observation appears to suggest that there may have been micro-outbreaks in a community of immunocompromised persons who are at risk of prolonged persistent infection, instead of originating in 1 or 2 persons.

Limitations

This study has several limitations. Despite a large collection of samples, it was an observational study, and results are descriptive in nature. Given the voluntary nature of submission to GISAID, we have limited information on the sampling population, ie, the denominator, and hence are unable to estimate disease prevalence or incidence rates. Instead, what can be reliably estimated are percentages as OCP, which are nevertheless indicative of the prevalence or incidence of patients with the Omicron variant of COVID-19. Another limitation is that viral sequences in several African countries were not collected continuously, and sparse data collection necessitates extrapolations that may rely too much on assumptions in the generalized additive model. Hence, estimated days that OCP reached a designated percentage need to be interpreted with caution. The third limitation is that the Omicron haplotypes used only 28 mutations in the spike protein. When deploying the strategy, it may be necessary to monitor all PM in all viral genes in addition to the spike protein and also to consider synonymous nucleotide substitutions in or outside of genes; although synonymous mutations yield an unchanged protein sequence, previous experimental studies have shown that synonymous mutations can impact ribosomal translation kinetics, which may lead to altered protein folding in some cases.[12,13,14] Whether due to altered protein folding or simply enhanced protein production via accelerated ribosomal translation, synonymous mutations may increase what is known as viral fitness, leading to viral variants with clinical significance. The fourth limitation is that submission date may be delayed by days or weeks after collection date.[15] Any significant lag time between data collection and submission will impede our ability to detect new viral variants in a timely manner, so it may be important to take steps as a community to improve data generation and submission efficiency. The fifth limitation is that determining OCP benefited from fitted temporal OCP curves and relied on all observations. Therefore, estimated timing may be thought of as a best-case scenario when applying this method for detecting future variants.

Conclusions

This study suggests that given the amount and quality of sequence data available from South Africa and other African countries, it may have been possible to detect initial Omicron cases earlier than reported. Building on GISAID, SLS with or without assigned lineages may be used to identify emerging variants.
  11 in total

1.  Genetic code-guided protein synthesis and folding in Escherichia coli.

Authors:  Shaoliang Hu; Mingrong Wang; Guoping Cai; Mingyue He
Journal:  J Biol Chem       Date:  2013-09-03       Impact factor: 5.157

2.  A "silent" polymorphism in the MDR1 gene changes substrate specificity.

Authors:  Chava Kimchi-Sarfaty; Jung Mi Oh; In-Wha Kim; Zuben E Sauna; Anna Maria Calcagno; Suresh V Ambudkar; Michael M Gottesman
Journal:  Science       Date:  2006-12-21       Impact factor: 47.728

Review 3.  Big data bioinformatics.

Authors:  Casey S Greene; Jie Tan; Matthew Ung; Jason H Moore; Chao Cheng
Journal:  J Cell Physiol       Date:  2014-12       Impact factor: 6.384

4.  GISAID: Global initiative on sharing all influenza data - from vision to reality.

Authors:  Yuelong Shu; John McCauley
Journal:  Euro Surveill       Date:  2017-03-30

5.  A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.

Authors:  Andrew Rambaut; Edward C Holmes; Áine O'Toole; Verity Hill; John T McCrone; Christopher Ruis; Louis du Plessis; Oliver G Pybus
Journal:  Nat Microbiol       Date:  2020-07-15       Impact factor: 17.745

6.  Mutations in viral nucleocapsid protein and endoRNase are discovered to associate with COVID19 hospitalization risk.

Authors:  Lue Ping Zhao; Pavitra Roychoudhury; Peter Gilbert; Joshua Schiffer; Terry P Lybrand; Thomas H Payne; April Randhawa; Sara Thiebaud; Margaret Mills; Alex Greninger; Chul-Woo Pyo; Ruihan Wang; Renyu Li; Alexander Thomas; Brandon Norris; Wyatt C Nelson; Keith R Jerome; Daniel E Geraghty
Journal:  Sci Rep       Date:  2022-01-24       Impact factor: 4.379

7.  The omicron variant of SARS-CoV-2: Understanding the known and living with unknowns.

Authors:  Nicholas E Ingraham; David H Ingbar
Journal:  Clin Transl Med       Date:  2021-12

8.  Tracking SARS-CoV-2 Spike Protein Mutations in the United States (January 2020-March 2021) Using a Statistical Learning Strategy.

Authors:  Lue Ping Zhao; Terry P Lybrand; Peter B Gilbert; Thomas R Hawn; Joshua T Schiffer; Leonidas Stamatatos; Thomas H Payne; Lindsay N Carpp; Daniel E Geraghty; Keith R Jerome
Journal:  Viruses       Date:  2021-12-21       Impact factor: 5.818

9.  Omicron SARS-CoV-2 variant: a new chapter in the COVID-19 pandemic.

Authors:  Salim S Abdool Karim; Quarraisha Abdool Karim
Journal:  Lancet       Date:  2021-12-03       Impact factor: 202.731
