Tiago Gräf1, Gonzalo Bello2, Felipe Gomes Naveca3, Marcelo Gomes4, Vanessa Leiko Oikawa Cardoso1, Alexandre Freitas da Silva5, Filipe Zimmer Dezordi5, Mirleide Cordeiro Dos Santos6, Katia Correa de Oliveira Santos7, Érika Lopes Rocha Batista8, Alessandro Leonardo Álvares Magalhães8, Fernando Vinhal9, Fábio Miyajima10, Helisson Faoro11, Ricardo Khouri1, Gabriel Luz Wallau5, Edson Delatorre12, Marilda Mendonça Siqueira13, Paola Cristina Resende13. 1. Instituto Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil. 2. Laboratório de AIDS e Imunologia Molecular, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro, Brazil. 3. Laboratório de Ecologia de Doenças Transmissíveis na Amazônia (EDTA), Leônidas e Maria Deane Institute, Fiocruz, Manaus, Brazil. 4. Grupo de Métodos Analíticos em Vigilância Epidemiológica, Programa de Computação Científica (PROCC), Fiocruz, Rio de Janeiro, Brazil. 5. Instituto Aggeu Magalhães, Fundação Oswaldo Cruz, Recife, Pernambuco, Brazil. 6. Instituto Evandro Chagas, Belem, Para, Brazil. 7. Instituto Adolfo Lutz, São Paulo, Brazil. 8. Secretaria de Saúde de Aparecida de Goiânia, Goiás, Brazil. 9. Lab HLAGYN - Goiânia, Goias, Brazil. 10. Fundação Oswaldo Cruz - Fiocruz Ceará, Fortaleza, Brazil. 11. Instituto Carlos Chagas (ICC), Fiocruz-PR, Curitiba, Parana, Brazil. 12. Departamento de Biologia. Centro de Ciencias Exatas, Naturais e da Saude, Universidade Federal do Espirito Santo, Alegre, Brazil. 13. Laboratory of Respiratory Viruses and Measles, Oswaldo Cruz Institute (IOC), Oswaldo Cruz Foundation (FIOCRUZ), Rio de Janeiro, RJ, Brazil.
Abstract
The COVID-19 epidemic in Brazil experienced two major lineage replacements until mid-2021. The first was driven by lineage P.2, in late 2020, and the second by lineage Gamma, in early 2021. To understand how these SARS-CoV-2 lineages spread in Brazil, we analyzed 11,724 genomes collected throughout the country between September 2020 and April 2021. Our findings indicate that lineage P.2 probably emerged in July 2020 in the Rio de Janeiro state and Gamma in November 2020 in the Amazonas state. Both states were the main hubs of viral disseminations to other Brazilian locations. We estimate that Gamma was 1.56-3.06 times more transmissible than P.2 in Rio de Janeiro and that the median effective reproductive number (Re) of Gamma varied according to the geographic context (Re = 1.59-3.55). In summary, our findings support that lineage Gamma was more transmissible and spread faster than P.2 in Brazil.
The COVID-19 epidemic in Brazil experienced two major lineage replacements until mid-2021. The first was driven by lineage P.2, in late 2020, and the second by lineage Gamma, in early 2021. To understand how these SARS-CoV-2 lineages spread in Brazil, we analyzed 11,724 genomes collected throughout the country between September 2020 and April 2021. Our findings indicate that lineage P.2 probably emerged in July 2020 in the Rio de Janeiro state and Gamma in November 2020 in the Amazonas state. Both states were the main hubs of viral disseminations to other Brazilian locations. We estimate that Gamma was 1.56-3.06 times more transmissible than P.2 in Rio de Janeiro and that the median effective reproductive number (Re) of Gamma varied according to the geographic context (Re = 1.59-3.55). In summary, our findings support that lineage Gamma was more transmissible and spread faster than P.2 in Brazil.
The COVID-19 epidemic in Brazil during 2020 was mostly driven by SARS-CoV-2 lineages B.1.1.28 and B.1.1.33 in most country regions, and by lineages B.1.1 and B.1.195 at some specific states (Resende et al., 2020a; Candido et al., 2020; Naveca et al., 2021). In late 2020, two descendant sub-lineages of B.1.1.28 emerged in Brazil and replaced dominant lineages across the country. The first lineage turnover observed during late 2020 was associated with the emergence and dissemination of the lineage P.2, previously designated variant of interest (VOI) Zeta, which harbors mutation E484K in the spike protein (Voloch et al., 2021). The lineage P.2 was first detected in the Rio de Janeiro (RJ) state in October 2020 and became the most prevalent lineage in several Brazilian states from November 2020 to January 2021 (Voloch et al., 2021; Lamarca et al., 2021; http://www.genomahcov.fiocruz.br/dashboard-en/). The second country-wide lineage turnover event took place after the emergence of the lineage P.1, currently designated variant of concern (VOC) Gamma, in the Amazonas (AM) state in November 2020 (Naveca et al., 2021; Faria et al., 2021). The VOC Gamma harbors multiple mutations in the spike protein and becomes the dominant lineage in all Brazilian regions by early 2021(Lamarca et al., 2021; Martins et al., 2021; Silva et al., 2021; Tosta et al., 2021; Dos Santos et al., 2021).Phylodynamic reconstruction of the spatiotemporal spread pattern of lineages B.1.1.28 and B.1.1.33 supports that densely populated and well-connected urban centers in the Southeastern region were the main sources of inter-regional SARS-CoV-2 spread during the first epidemic wave in 2020 in Brazil (Candido et al., 2020). Mathematical modeling also supports that São Paulo (SP) and RJ were the main superspreader cities of SARS-CoV-2 during the first months of the pandemic in Brazil (Nicolelis et al., 2021). Even though lineage P.2 and VOC Gamma spread widely through Brazil since late 2020, the routes of regional dissemination and the onset date of community transmission across different states remain unclear. Furthermore, despite previous studies estimated that the effective reproductive number (Re) of the VOC Gamma was nearly two times higher than non-VOC co-circulating in the AM state (Naveca et al., 2021; Faria et al., 2021; Coutinho et al., 2021), the Re of this VOC in other states and its relative transmissibility concerning the lineage P.2 has not been addressed yet.Here, we used a large dataset of SARS-CoV-2 positive samples sequenced by the Brazilian Ministry of Health COVID-19 Genomic Surveillance Network (see STAR Methods for details). Genomes retrieved from patients of all Brazilian states sampled between September 2020 and April 2021 was analyzed alongside publicly available SARS-CoV-2 sequences sampled throughout Brazil in the same period. Our findings show the changing prevalence of lineage P.2 and VOC Gamma in different country regions, -reconstruct the pathways of inter-state spread, estimate the onset date of community transmission in such locations, and compare the relative transmissibility of both co-circulating variants.
Results
Genomic surveillance of SARS-CoV-2 in Brazil: September 2020 to March 2021
The COVID-19 epidemic in Brazil displayed two distinct phases over the period of investigation: a decreasing phase from around 900 deaths/day in early September to 300 deaths/day in mid-November 2020, followed by an increasing phase up to 3,000 deaths/day in late March 2021 (Figure 1A). A total of 11,724 high-quality (<5% N) Brazilian SARS-CoV-2 whole-genomes (29 kb) sampled between September 01st, 2020 and April 16th, 2021 were analyzed, including 5,472 generated by our collaborative genomic surveillance network (Figure 1B and Data S2) and 6,252 obtained from the EpiCoV database in GISAID (https://www.gisaid.org/), covering all Brazilian states (Table S1. Genomes sampling spatial distribution). Based on this sampling, our genomic analysis revealed a changing pattern of SARS-CoV-2 lineage frequencies in the studied period with a high overall prevalence of lineages B.1.1.28 and B.1.1.33 during the decreasing phase, a subsequent predominance of lineage P.2 from December 2020 to January 2021, and the dominance of the VOC Gamma in February and March 2021 (Figure 1C). In addition, it is interesting to note that the overall proportion of local variants harboring the mutation Spike:E484K, which comprises the VOC Gamma and the lineages P.2, N.9, and N.10, sharply increased in Brazil from <1% in September 2020 to >90% in March 2021 (Figure 1D).
Figure 1
Genomic epidemiology of COVID-19 in Brazil from September 2020 to March 2021
(A) Daily number of deaths due to COVID-19 and 7-day rolling average (line).
(B) Weekly number of genomes generated in this study.
(C) Proportion of SARS-CoV-2 lineages circulating in Brazil in the study period.
(D) Proportion of SARS-CoV-2 genomes harboring mutation Spike:484E/K in Brazil in the study period.
Genomic epidemiology of COVID-19 in Brazil from September 2020 to March 2021(A) Daily number of deaths due to COVID-19 and 7-day rolling average (line).(B) Weekly number of genomes generated in this study.(C) Proportion of SARS-CoV-2 lineages circulating in Brazil in the study period.(D) Proportion of SARS-CoV-2 genomes harboring mutation Spike:484E/K in Brazil in the study period.
Timing the origin of variants Gamma and P.2 in Brazil
To estimate the precise onset date of the lineage P.2 and the VOC Gamma, we first excluded all Gamma or P.2 sequences that did not display the full set of variant-defining mutations and those recently classified as Gamma-like (Naveca et al., 2022). Next, we assessed the temporal signal of the P.2 and Gamma datasets and excluded several outliers, including some of the sequences with the earliest recorded sampling dates (Figure 2A). The leaf-dating Bayesian molecular clock method confirmed that the earliest Gamma sequence recovered in November 2020 and three P.2 genomes sampled between April and August 2020 clearly violate the assumption of a molecular clock as displayed median expected collection dates much more recent than their recorded sampling dates (Figure 2B). This quality control process resulted in two final datasets of 1,357 P.2 and 2,477 Gamma Brazilian sequences sampled between 25th September 2020–26th March 2021 and 03rd December 2020–16th April 2021, respectively, mostly (>70%) generated by our collaborative genomic surveillance network. Finally, to estimate the time of the most recent common ancestor (TMRCA) of each variant, we generated spatiotemporal representative subsets containing 219 P.2 and 191 Gamma sequences. Bayesian molecular clock analyses traced the TMRCA of the lineage P.2 to 28th July 2020 [95% high posterior density (HPD): 11th June–3rd September 2020] and of that of the VOC Gamma to 30th November 2020 (95% HPD: 14th November–3rd December 2020) (Figure 2C).
Figure 2
Temporal analysis of the P.2 and Gamma sequence datasets
(A) Plots of the root-to-tip divergence against dates of sample collection for P.2 and Gamma datasets. Outliers excluded from the further analysis were colored gray.
(B) Leaf-dating Bayesian molecular clock of the four sequences excluded due to probably wrong collection date annotation. Reported collection dates are indicated by the dotted lines. Boxplots represent the median and the 95% HPD interval of the estimated collection dates.
(C) Time-calibrated maximum clade credibility tree for P.2 and Gamma spatiotemporal representative datasets. The median estimated TMRCA (with 95% HPD in parentheses) for each clade are shown on the root.
Temporal analysis of the P.2 and Gamma sequence datasets(A) Plots of the root-to-tip divergence against dates of sample collection for P.2 and Gamma datasets. Outliers excluded from the further analysis were colored gray.(B) Leaf-dating Bayesian molecular clock of the four sequences excluded due to probably wrong collection date annotation. Reported collection dates are indicated by the dotted lines. Boxplots represent the median and the 95% HPD interval of the estimated collection dates.(C) Time-calibrated maximum clade credibility tree for P.2 and Gamma spatiotemporal representative datasets. The median estimated TMRCA (with 95% HPD in parentheses) for each clade are shown on the root.
Inter-state spread and domestic dissemination of variants Gamma and P.2 in Brazil
To describe the pathways that shaped the country-wide dissemination of the lineage P.2 and the VOC Gamma in Brazil, we performed a discrete Bayesian phylogeographic reconstruction. Bayesian analyses support that the lineage P.2 most probably [Posterior State Probability (PSP) = 0.71] arose in RJ (Southeastern region). This state was also the main epicenter of the regional spread of this lineage in Brazil, followed by the Southern states of Santa Catarina (SC) and Rio Grande do Sul (RS) and the Southeastern state of SP (Figure 3A and Video S1. SARS-CoV-2 lineage P.2 migration history and SARI cases in Brazil). RJ contributed with ∼50% of all inter-state disseminations of lineage P.2 between October 2020 and February 2021, while the four major spreading states (RJ, SP, SC, and RS) combined accounted for >90% of all inter-state viral disseminations in that period (Figure 3B). The relative contribution of the secondary hubs SC, RS, and SP to the overall dissemination of P.2 remained roughly constant between mid-October 2020 and late February 2021. The P.2 migrations from RJ to other states started in September 2020 but were mostly distributed between October 2020 and January 2021 (Figure 3C). The earliest onset date of community transmission of the lineage P.2 was traced to October 2020 in most Brazilian states (Figure 3D).
Figure 3
Spatiotemporal spread of SARS-CoV-2 lineage P.2 in Brazil
(A) Inter-state circular migration flow plot as estimated from the Markov jumps analysis complementary to the phylogeographic model. Arrows indicate the direction of the migration and thickness is relative to the number of jumps.
(B) Contribution of each state in seeding P.2 to other locations through time measured as the proportion of the total Markov per day.
(C) Time distribution (95% HPD) of jumps from RJ to other Brazilian states.
(D) TMRCA of each highly supported [Shimodaira–Hasegawa-like approximate Likelihood Ratio Test (SH-aLRT) > 80] state-specific clades with more than three genomes.
Spatiotemporal spread of SARS-CoV-2 lineage P.2 in Brazil(A) Inter-state circular migration flow plot as estimated from the Markov jumps analysis complementary to the phylogeographic model. Arrows indicate the direction of the migration and thickness is relative to the number of jumps.(B) Contribution of each state in seeding P.2 to other locations through time measured as the proportion of the total Markov per day.(C) Time distribution (95% HPD) of jumps from RJ to other Brazilian states.(D) TMRCA of each highly supported [Shimodaira–Hasegawa-like approximate Likelihood Ratio Test (SH-aLRT) > 80] state-specific clades with more than three genomes.
Video S1. SARS-CoV-2 lineage P.2 migration history and SARI cases in Brazil, related to Figure 3
Map of Brazil with different colors for each state is shown on the left. Inferred movements of SARS-CoV-2 P.2 lineage are indicated with tapered arrows, colored by its origin state. Phylogenetic tree in the upper right shows virus migration history as reconstructed by the phylogeographical analysis, with branches colored by location (Brazilian states). Migrations inferred between any two locations in the tree are animated on the map on the left. Plot on the lower right shows the sum of weekly SARI cases per 100k inhabitants reported for each Brazilian state. SARI cases are the sum of any SARS-CoV-2 lineages circulating in each week; thus, they do not specifically represent lineage P.2 circulation.Bayesian analyses support that the VOC Gamma most probably (PSP = 1.0) arose in the AM (Northern region) and this state was also the main hub of Gamma spread throughout the country, followed by the Southeastern states of SP and RJ (Figure 4A and Video S2. SARS-CoV-2 VOC Gamma migration history and SARI cases in Brazil). AM was responsible for 60%–70% of the inter-state disseminations of the VOC Gamma between December 2020 and March 2021, while the three major spreading states combined accounted for >90% of all inter-state viral transmissions in that period (Figure 4B). The relative contribution of the secondary hubs (SP and RJ) to the overall dissemination of Gamma in Brazil increased from mid-December 2020 to late March 2021. The first transition events of VOC Gamma from AM into other Brazilian states occurred in December 2020 and reached a peak between January and February 2021 (Figure 4C), while the earliest timing of community transmission of this VOC in most Brazilian states was traced to December 2020 (Figure 4D). By early January 2021, this variant was already disseminated and established local transmission chains in nine out of 23 Brazilian states from all major country regions: North (Pará), Northeastern (Bahia, Maranhão, and Sergipe), Southeastern (SP and RJ), Southern (SC and Paraná), and Central-Western (Goiás) regions.
Figure 4
Spatiotemporal spread of SARS-CoV-2 VOC Gamma in Brazil
(A) Inter-state circular migration flow plot as estimated from the Markov jumps analysis complementary to the phylogeographic model. Arrows indicate the direction of the migration and thickness is relative to the number of jumps.
(B) Contribution of each state in seeding Gamma to other locations through time measured as the proportion of the total Markov jumps per day.
(C) Time distribution (95% HPD) of jumps from AM to other Brazilian states. The vertical green line shows the time of Gamma first notification and vertical red line shows the starting date of patients with COVID-19 transferring from AM to other states.
(D) TMRCA of each highly supported (SH-aLRT > 80) state-specific clades with more than three genomes.
Spatiotemporal spread of SARS-CoV-2 VOC Gamma in Brazil(A) Inter-state circular migration flow plot as estimated from the Markov jumps analysis complementary to the phylogeographic model. Arrows indicate the direction of the migration and thickness is relative to the number of jumps.(B) Contribution of each state in seeding Gamma to other locations through time measured as the proportion of the total Markov jumps per day.(C) Time distribution (95% HPD) of jumps from AM to other Brazilian states. The vertical green line shows the time of Gamma first notification and vertical red line shows the starting date of patients with COVID-19 transferring from AM to other states.(D) TMRCA of each highly supported (SH-aLRT > 80) state-specific clades with more than three genomes.
Video S2. SARS-CoV-2 VOC Gamma migration history and SARI cases in Brazil, related to Figure 4
Map of Brazil with different colors for each state is shown on the left. Inferred movements of SARS-CoV-2 VOC Gamma are indicated with tapered arrows, colored by its origin state. Phylogenetic tree in the upper right shows virus migration history as reconstructed by the phylogeographical analysis, with branches colored by location (Brazilian states). Migrations inferred between any two locations in the tree are animated on the map on the left. Plot on the lower right shows the sum of weekly SARI cases per 100k inhabitants reported for each Brazilian state. SARI cases are the sum of any SARS-CoV-2 lineages circulating in each week; thus, they do not specifically represent VOC Gamma circulation.
Estimating the Re of variants Gamma and P.2 in Brazil
We applied the birth-death skyline (BDSKY) model to estimate the Re of P. 2 and Gamma at source locations by selecting all P.2 sequences from RJ (P.2RJ) and all Gamma sequences from AM (GammaAM), and at secondary outbreaks by combining large (n > 10) highly supported (SH-aLRT > 0.80) clades representative of the circulation in a single state. Only Gamma clusters from SP and RJ, and none for P.2, met the criteria for analysis of secondary outbreaks (see STAR Methods). The temporal trajectory of the median Re of clade P.2RJ supports an initial expansion phase (Re > 1), followed by a phase of stabilization (Re ∼ 1) and subsequent decrease (Re < 1) (Figure 5A). Interestingly, when clade P.2RJ reached the highest median Re in September 2020, the lineage P.2 was still undetected 4 and the number of severe acute respiratory illness (SARI) cases remained relatively constant in RJ (Figure 5A). When lineage P.2 became the dominant variant in RJ in December 2020, the clade P.2RJ displayed a median Re∼1.
Figure 5
Phylodynamics analyses of SARS-CoV-2 variants P.2 and Gamma
(A) in Rio de Janeiro state.
(B) in Amazonas state.
(C) in São Paulo state. Effective reproductive number (Re) [median (solid lines) and 95% HPD (highlighted area)], as estimated by Bayesian phylogenetics methods, is shown in green and brown for Gamma and P.2, respectively. Along with Re, SARI cases are shown as a black line. PANGO lineages proportions per month are shown in the bottom panel, colored according to the legend.
Phylodynamics analyses of SARS-CoV-2 variants P.2 and Gamma(A) in Rio de Janeiro state.(B) in Amazonas state.(C) in São Paulo state. Effective reproductive number (Re) [median (solid lines) and 95% HPD (highlighted area)], as estimated by Bayesian phylogenetics methods, is shown in green and brown for Gamma and P.2, respectively. Along with Re, SARI cases are shown as a black line. PANGO lineages proportions per month are shown in the bottom panel, colored according to the legend.Similar to clade P.2RJ, the clade GammaAM showed an epidemic dynamic with three phases of expansion (Re > 1), stabilization (Re ∼ 1), and subsequent decrease (Re < 1) (Figure 5B). When clade GammaAM reached the highest median Re in December 2020, the relative prevalence of Gamma in Amazonas was low (<10%), but the number of SARI cases already started to increase (Figure 5B). When Gamma became the dominant variant in AM in January 2021, the clade GammaAM also displayed a median Re∼1. The initial expansion phase of major local Gamma clades in RJ (GammaRJ) and SP (GammaSP) from late December 2020 to late January 2021 occurred when Gamma was barely detected, and the number of SARI cases remained relatively constant in those states (Figures 5A and 5C). The clade GammaRJ expanded in RJ during a period (December 2020 to March 2021) when the prevalence of clade P.2RJ was declining and the ratio Re(GammaRJ)/Re(P.2RJ) was 1.56 (December-January), 3.06 (February), and 2.33 (March) along the co-circulating period, supporting the higher transmissibility of the VOC Gamma concerning lineage P.2. Comparison of the median Re of VOC Gamma in different states revealed that clade GammaAM displayed a much faster initial expansion (Re = 3.55) than clades GammaRJ and GammaSP (Re = 1.59 and 1.91, respectively). The growth rate of clade GammaAM, however, rapidly decreased in the second half of December 2020 (Re = 1.40) and during early 2021 (Re =<1), while clades GammaRJ and GammaSP maintained longer phases of epidemic growth (Re = 1.45–2.01) until March 2021.
Discussion
We combine spatial and genomic data to reconstruct the spatiotemporal dynamics of the VOC Gamma and lineage P.2 in Brazil from September 2020 through April 2021, a study period that covers a decreasing phase of COVID-19 incidence from September to November 2020 and a subsequent increasing phase up to March–April 2021. Although both SARS-CoV-2 variants harbor the same mutation of concern Spike:E484K, they displayed different spatiotemporal dynamics probably shaped by the higher transmissibility of the VOC Gamma concerning lineage P.2.Our study indicates that variants P.2 and Gamma probably arose in RJ and AM states around July and November 2020, respectively. This is consistent with initial estimations (Voloch et al., 2021; Faria et al., 2021; Resende et al., 2021) and rejects the hypothesis of a much older origin of lineage P.2 in February 2020 and of VOC Gamma in August 2020 (Lamarca et al., 2021). Our analyses suggest that some Gamma and P.2 Brazilian sequences with the earliest recorded sampling times available at the EpiCoV database in GISAID are probably the result of sample contamination or spelling errors in metadata entries. The inclusion of those problematic sequences, as well as of Gamma sequences belonging to Gamma-like clades (Lamarca et al., 2021), certainly pushed back the TMRCA of Brazilian variants to unreliable old dates. The emergence of lineage P.2 in February 2020 is highly improbable because it overlaps with the estimated origin of the parental lineage B.1.1.28 (Candido et al., 2020), while the emergence of the VOC Gamma in August 2020 is inconsistent with the absence of this variant among hundreds of Amazonian SARS-CoV-2 positive samples genotyped between August and November 2020(Naveca et al., 2021).Although the densely populated and well-connected states in the Southeastern region (RJ and SP) were the main sources of dissemination of lineage P.2 in Brazil, this variant spread at a slower rate than the VOC Gamma that was mainly disseminated from the poorly connected state of AM. According to our estimations, the earliest onset date of community transmission of lineage P.2 in most sampled states was traced to October 2020 (∼3 months after its emergence), coinciding with the first detection of this variant in such locations (Resende et al., 2021), and it only became the most prevalent variant in some states around December 2020 (∼5 months after its emergence). By comparison, the earliest timing of community transmission of the VOC Gamma in many Brazilian states was traced to December 2020 (∼1 month after its emergence) and became the most prevalent variant in all Brazilian states by February 2021 (∼3 months after its emergence). By the time when the VOC Gamma was first described on 12th January 2021 in Japanese travelers returning from the Amazonas state (Fujino et al., 2021), this variant had already established local transmission chains in at least nine out of 23 Brazilian states from all major country regions.The rapid geographic spread of the VOC Gamma and the consistent fixation as the most prevalent SARS-CoV-2 lineage across all Brazilian regions during 2021 supports that this VOC is more transmissible than lineage P.2. The comparison of the Re of local clades of variants Gamma and P.2 that co-circulated in the RJ state from December 2020 to March 2021 allowed us to quantify the relative transmissibility of those lineages while controlling for confounding ecological factors such as local levels of immunity and social distancing measures. Our phylodynamics estimates indicate that the VOC Gamma displayed a median Re that was between 1.56 and 3.06 higher than that of lineage P.2 during the period of co-circulation. Notably, this result is consistent with a previous study conducted in the AM state that estimated that the median Re of the VOC Gamma was 2.2 times higher than that estimated for local B.1.1.28 clades that co-circulated during December 2020 (Naveca et al., 2021). These findings confirm the notion that the VOC Gamma was more transmissible than pre-existing SARS-CoV-2 lineages in Brazil, including the lineage P.2, in different local settings.Even though our phylogeographic reconstructions support that RJ and SP were important hubs of inter-regional spread of SARS-CoV-2 during the epidemic wave in late 2020 and early 2021 as observed in early 2020 (Candido et al., 2020), they also pointed that other Brazilian states from the Northern (AM) and Southern (SC and RS) regions accounted for a substantial fraction of all inter-state disseminations of variants Gamma and P.2, respectively. Our findings also support that inter-state viral transmissions not only occurred between close geographic locations but also between distant states of Northern, Southern, and Northeastern regions. The diversification of geographic hubs of viral dispersion and the detection of long-distance viral migrations suggest that reactivation of domestic passenger air traffic since the second half of 2020 may have been an important driver of dissemination of variants P.2 and Gamma in Brazil. These findings also emphasize that implementation of more strict control measures for domestic air passengers, like the mandatory exigence of a negative SARS-CoV-2 RT-PCR/antigen test, could be an effective measure to reduce the dissemination rate of SARS-CoV-2 within Brazil (Murano et al., 2021).Our findings revealed that variants P.2 and Gamma were disseminated for several weeks before being detected in a new location and their early phase of cryptic expansion is challenging to detect or predict by epidemiological surveillance systems. When the lineage P.2 reached the highest median Re in September 2020 in RJ, the mean daily number of SARS-CoV-2 cases in the state was decreasing and the incidence of SARI cases only started to increase in mid-October 2020. Similarly, the VOC Gamma in RJ and SP probably started to expand in December 2020, well-before the first detection of this variant in January 2021 and the peak of SARI cases associated with Gamma in March 2021 in both states. We hypothesize that the unnoticed early expansion of variants P.2 or Gamma in many locations happened in an epidemic scenario where previous dominant lineages were declining, so the growth of daily SARI cases only became apparent when these new variants were already established as the most prevalent lineage. It may also be possible that initial circulation of variants P.2 and Gamma might have been driven by young people, and SARI cases only increased when older individuals got infected.Because the levels of immunity and social distancing varied over time among different Brazilian states, we compared the Re of the VOC Gamma in AM, RJ, and SP between December 2020 and March 2021 to quantify the potential impact of ecological factors on Gamma spread. Although the Re estimates displayed large credibility intervals with considerable overlap, our data suggest that the VOC Gamma displayed a much faster initial expansion in AM (Re ∼ 3.6) than in RJ or SP (Re ∼ 1.5–2.0), but the growth phase in AM extends over a shorter time than in the Southeastern states. We speculate that the weaker mitigation measures implemented in AM with respect to RJ and SP at late 2020 and/or large-scale gatherings may have amplified the intrinsic higher transmissibility of the VOC Gamma in this Northern state (Naveca et al., 2021). The steep decrease of Re in AM in January 2021 might be the result of non-pharmaceutical interventions (NPIs) implemented after the health system collapse (Naveca et al., 2021) and/or of high levels of population immunity estimated at ∼70% in Manaus (Sabino et al., 2021). On the other hand, the larger number of susceptible individuals in the Southeastern region may have sustained a longer phase of Gamma expansion.In summary, this molecular epidemiological study showed that the COVID-19 epidemic in Brazil from September 2020 to March 2021 was characterized by the emergence and spread of the lineage P.2 and the VOC Gamma that were the most prevalent SARS-CoV-2 variants at different time periods in the country. The spatial dispersion of these variants in Brazil was driven by short and long-distance viral transmission and both variants circulated cryptically in several locations for some weeks before being detected. The VOC Gamma displayed a higher transmissibility than lineage P.2 which explained the faster rate of spatial spread of this variant and its establishment as the dominant SARS-CoV-2 lineage in Brazil within a few (∼3) months after its emergence. Our findings also support that the transmissibility of the VOC Gamma varied according to the geographic context probably due to regional differences in social distancing measures and/or fraction of population previously infected. These findings also strengthen the need for ongoing genomic surveillance to provide early warnings about spread of emergent SARS-CoV-2 variants in Brazil.
Limitations of the study
The most important limitation of our study was the uneven sampling among Brazilian states and throughout the time, which may have introduced some bias in phylogeographic analyses. Although the number of genomes analyzed in this study roughly follows the number of COVID-19 confirmed cases in Brazil (Figures 1A and 1B), the nonrandom temporal sampling across locations might limit the potential of our analyses to identify large state-specific viral clusters and to accurately estimate the earliest onset date of communitarian transmission of SARS-CoV-2 variants in several locations. In addition, even though the datasets here analyzed comprise SARS-CoV-2 sequences from 24 out of 27 Brazilian states, the uneven distribution of these genomes among locations might have biased contributions to the overall inter-states’ transmissions.
STAR★Methods
Key resources table
Resource availability
Lead contact
Further information and requests for laboratory resources and reagents should be directed to and will be fulfilled by Paola Cristina Resende (paola@ioc.fiocruz.br) and/or Felipe Gomes Naveca (felipe.naveca@fiocruz.br). Requests for computational resources and files should be directed to and will be fulfilled by the corresponding authors, Tiago Gräf (tiago.graf@fiocruz.br) and/or Gonzalo Bello (gbello@ioc.fiocruz.br).
Materials availability
This study did not generate new unique reagents.
Experimental model and subject details
The genomic surveillance network
The Brazilian Ministry of Health COVID-19 Genomic Surveillance Network is composed by Oswaldo Cruz Foundation (Fiocruz), Adolfo Lutz Institute (IAL), Evandro Chagas Institute (IEC), the Central Laboratories of each State of Brazil (LACENs) and other partners, such as Health Secretariat of Aparecida de Goiânia, Goiás. Saliva or nasopharyngeal swabs (NPS) samples were collected from suspect COVID-19 cases by sentinel hospitals or health care units and initially tested by real time RT-PCR as a routine diagnostic for SARS-CoV-2 by LACENs using any of the following different commercial assays: SARS-CoV2 (E/RP) (Biomanguinhos, https://www.bio.fiocruz.br/index.php/br/produtos/reativos/testes-moleculares/novo-coronavirus-sars-cov2/kit-molecular-sars-cov-2-e-rp), Allplex 2019-nCoV Assay (Seegene) or an in-house protocol following the USA/CDC guidelines (https://www.fda.gov/media/134922/download). SARS-CoV-2 positive samples with cycling threshold (Ct) below 25 were randomly selected for genome sequencing. Alternatively, targeted cases, such as reinfection cases and fatal cases without comorbidities, were also selected for sequencing. Samples are then sent to one of the genomic reference laboratories participating in the network for whole genome sequencing.
Ethical aspects
This study was approved by the Ethics Committee of the of the FIOCRUZ (CAAE: 68118417.6.0000.5248 and CAAE:32333120.4.0000.5190) and Amazonas State University (CAAE: 25430719.6.0000.5016), which waived the signed informed consent. SISGEN AAE5985.
Method details
SARS-CoV-2 amplification and sequencing
The viral RNA was subjected to reverse transcription and PCR amplification using in-house protocols developed by COVID-19 Fiocruz Genomic Network (Nascimento et al., 2020; Resende et al., 2020b) or the Illumina COVIDSeq Test (Illumina), including some primers to cover regions with dropout (Naveca et al., 2022). Normalized pooled amplicons of each sample were used to prepare NGS libraries with Nextera XT (FC-131-1096) and clustered with MiSeq Reagent Kit v2 (500-cycles - MS-102-2003) on 2 × 250 cycles (in-house protocols) or 2 × 150 cycles (MS-103-1002) paired-end runs. All sequencing data was collected using the Illumina MiSeq sequencing platforms and MiSeq Control software v2.6.2.1 (Illumina).
SARS-CoV-2 whole-genome consensus sequences and genotyping
FastQ files generated by the Illumina pipeline at BaseSpace (Illumina) were downloaded and submitted to the Viralflow v.0.0.6 workflow (Dezordi et al., 2022). SARS-CoV-2 genome assembly was performed using the reference NC_045512.2 after trimming primers sequences of the ARTIC v3 primer set and the additional primer sequences used in the COVIDSeq adaptation performed by COVID-19 Fiocruz Genomic Surveillance network (Naveca et al., 2022). Alternatively, some sequencing hubs from downloaded and imported the FASTQ files into Geneious v10.2.6 (Biomatters) for trimming and assembling. Consensus sequences were initially assigned to viral lineages according to the nomenclature proposed by Rambaut et al. (2020), using the Pangolin software (https://cov-lineages.org/resources/pangolin.html) and later confirmed using phylogenetic analyses.
Dataset assembling
We complemented the Gamma and P.2 genomes dataset identified in our genomic surveillance by retrieving all Brazilian high quality (<5% N) whole-genomes (29 kb) identified as Gamma and P.2 from EpiCoV database in GISAID (https://www.gisaid.org) and that were deposited up to April 16th. Mutational composition of these sequences was analyzed in Nextclade (https://clades.nextstrain.org) (Aksamentov et al., 2021) and due to highly variable patterns of substitutions, we have retained in the dataset only those genomes with the full-set of P.2 and Gamma lineages defining synapomorphies, thus excluding from the analysis all Gamma-like sequences (Table S2. Gamma lineage defining synapomorphies and Table S3. P.2 lineage defining synapomorphies).
Analysis of temporal signal
SARS-CoV-2 Gamma and P.2 complete genome sequences were aligned using MAFFT v7.467 (Katoh Standley, 2013) and subject to maximum likelihood (ML) phylogenetic analysis using IQ-TREE v2.1.2 (Nguyen et al., 2015) under the general time-reversible (GTR) model of nucleotide substitution with a gamma-distributed rate variation among sites, four rate categories (G4), a proportion of invariable sites (I) and empirical base frequencies (F) nucleotide substitution model, as selected by the ModelFinder application (Kalyaanamoorthy et al., 2017). The branch support was assessed by the approximate likelihood-ratio test based on the Shimodaira–Hasegawa-like procedure (SH-aLRT) with 1,000 replicates. The temporal signal of the Gamma and P.2 assembled datasets was assessed from the ML tree by performing a regression analysis of the root-to-tip divergence against sampling time using TempEst (Rambaut et al., 2016) and excluding outlier sequences that deviate more than 1.5 interquartile ranges from root-to-tip regression line, which included those Gamma and P.2 sequences with the oldest sampling dates. The expected collection dates of the oldest Gamma and P.2 sequences were next estimated by using the Bayesian leaf-dating method developed to estimate the age of sequences with unknown collection date (Shapiro et al., 2011). For this analysis, the oldest Gamma and P.2 sequences were combined with sequences from Amazonas (Gamma epicenter) and RJ P.2 epicenter), respectively, and were analyzed with the BEAST v1.10 software (Suchard et al., 2018) using a GTR + F + I + G4 nucleotide substitution model, an exponential growth coalescent model for the tree prior and a strict molecular clock with a uniform substitution rate prior (8–10 × 10−4 substitutions/site/year). To estimate the expected sampling dates of the oldest sequences we set a precision value of two years for the recorded date (year and month) of sampling and select the tip date option sampling uniformly from precision. Sequences for which the recorded collection date was not contained within the 95% High Posterior Density (HPD) interval of the leaf-age estimate were excluded. Markov Chain Monte Carlo (MCMC) was run sufficiently long to ensure convergence (effective sample size> 200) in all parameter estimates as assessed in TRACER v1.7 (Rambaut et al., 2018).
Estimating time-scaled phylogenetic trees
Due to the size of the datasets, we firstly estimated the time of the most recent common ancestor (TMRCA) of each lineage by applying a full phylogenetic Bayesian analysis in downsampled datasets. To do so, 10 genomes per Brazilian state per week were randomly chosen using Augur (Huddleston et al., 2021), resulting in a dataset of 191 Gamma and 219 P.2 sequences. Alignments were generated using MAFFT v7.475 (Katoh Standley, 2013) and visually inspected in AliView (Larsson, 2014). Time-scaled phylogenetic trees were estimated in BEAST v.1.10 (Suchard et al., 2018) under a GTR + F + I + G4 model. The evolutionary model was complemented with the non-parametric Bayesian skyline (BSKL) model as the coalescent tree prior (Drummond et al., 2005) and a strict molecular clock model with a uniform substitution rate prior (8–10 × 10−4 substitutions/site/year). MCMC was run sufficiently long to ensure convergence (effective sample size> 200) in all parameter estimates as assessed in TRACER v1.7 (Rambaut et al., 2018). The maximum clade credibility (MCC) tree was summarized with TreeAnnotator v1.10. ML and MCC trees were visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). The time-scaled tree for the full Gamma (N = 2,477) and P.2 (N = 1,357) datasets was performed using a modified version of BEAST (https://beast.community/thorney_beast) that makes feasible the analysis of big datasets. In summary the method alleviates most of the computational burden by fixing the tree topology and then re-scales the branch lengths based on a clock and coalescent models. To do so, ML phylogenetic trees constructed for both Gamma and P.2 datasets as explained above, were inputted in the BEAST xml file as starting and data trees and analyses were performed under a logistic coalescent prior, which outperformed [Bayes Factor (BF) > 3] the exponential prior in a Marginal Likelihood Estimation (MLE) of model fitness, and a strict molecular clock, as specified above. Additionally, we set a uniform prior on the root height estimates based on the 95% HPD determined in the BEAST analysis of the downsampled datasets. Four and three MCMC were run for 100 million generations and then combined to ensure stationarity and good mixing for the Gamma and P.2 datasets, respectively.
Discrete bayesian phylogeography
A set of 1000 and 900 trees was randomly selected from the posterior distribution of trees resulting from the BEAST analysis of the full Gamma and P.2 datasets, respectively. Sampling locations (Brazilian states) were used as traits in the phylogeographic model and reconstruction of the ancestral states was performed using a discrete symmetric model with BSSVS (Lemey et al., 2009) on the posterior sampling of trees. SPREAD software (Bielejec et al., 2011) was used to identify the well-supported transition rates based (BF > 3). We complemented this analysis with Markov jump estimation of the number of location transitions throughout evolutionary history (O’Brien et al., 2009), which was used to explore directionality of the transitions. Viral migration history between all supported location transitions was then visualized in circular migration flow plots using the package “circlize” (Gu et al., 2014) available in R software (https://www.r-project.org). The temporal history of transitions from the lineage epicenter to other Brazilian states was summarized as histograms in one-week periods. Because many spatial transitions connected nodes with low support, we used a conservative approach to estimate the onset date of community transmission in each state from the TMRCA of only highly supported (SH-aLRT > 80) clusters comprising at least three sequences from a given state and whose location root was most probably traced (Posterior State Probability [PSP] > 0.50) to that state. Additionally, we summarize P.2 and Gamma lineages spatial-temporal history alongside with SARI cases trajectories in Brazil, by generating an animation with baltic library (https://github.com/evogytis/baltic).
Effective reproductive number (Re) estimation
To estimate the Re of lineages Gamma and P.2 through time in the source locations, we selected all Gamma sequences from Amazonas and all P.2 sequences from RJ whose state of the ancestral node in the phylogeographic trees remained in Amazonas or RJ, respectively. To estimate the Re of Gamma and P.2 dissemination outside the source locations, state specific clusters in the phylogenetic trees were selected based on the following criteria: a) high branch support (SH-aLRT > 80); b) cluster size bigger than 10 sequences; c) > 80% of sequences from a single state and origin of the cluster in the same state. Selected clusters were then combined and analyzed when the dataset comprised more than 50 sequences. Following these criteria, we could only analyze the phylodynamic of local dissemination of Gamma in the states of RJ and SP. The assembled datasets comprised 298 and 199 Gamma genomes for RJ and SP, respectively. For P.2 we could not analyze local dissemination outside the source region. The assembled datasets were then analyzed using the birth-death skyline (BDSKY) model (Stadler et al., 2013) which reconstructs Re trajectories and is implemented within BEAST 2 v2.6.5 (Bouckaert et al., 2019). The sampling rate (d) was set to zero for the period before the oldest sample and estimated from the data afterward. The BDSKY prior settings were as follows: Become Uninfectious Rate (exponential, mean = 36); Reproductive Number (log normal, mean = 0.8, sd = 0.5); Sampling Proportion (beta, alpha = 1, beta = 100). Origin parameter was conditioned to the root height, and the Re was estimated in a piecewise manner over intervals defining to match state specific epidemiological trajectories. Molecular clock was as in the time-scaled trees analysis and the HKY + G4+F substitution model was used. MCMC chains were run until all relevant parameters reached ESS >200, as explained above.
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Biological samples
Saliva or nasopharyngeal swabs (NPS) sample
State Central Laboratories (LACENs)
N/A
Critical commercial assays
Maxwell® RSC Viral Total Nucleic Acid Purification Kit
Authors: Felipe Gomes Naveca; Valdinete Nascimento; Victor Costa de Souza; André de Lima Corado; Fernanda Nascimento; George Silva; Ágatha Costa; Débora Duarte; Karina Pessoa; Matilde Mejía; Maria Júlia Brandão; Michele Jesus; Luciana Gonçalves; Cristiano Fernandes da Costa; Vanderson Sampaio; Daniel Barros; Marineide Silva; Tirza Mattos; Gemilson Pontes; Ligia Abdalla; João Hugo Santos; Ighor Arantes; Filipe Zimmer Dezordi; Marilda Mendonça Siqueira; Gabriel Luz Wallau; Paola Cristina Resende; Edson Delatorre; Tiago Gräf; Gonzalo Bello Journal: Nat Med Date: 2021-05-25 Impact factor: 53.440
Authors: Remco Bouckaert; Timothy G Vaughan; Joëlle Barido-Sottani; Sebastián Duchêne; Mathieu Fourment; Alexandra Gavryushkina; Joseph Heled; Graham Jones; Denise Kühnert; Nicola De Maio; Michael Matschiner; Fábio K Mendes; Nicola F Müller; Huw A Ogilvie; Louis du Plessis; Alex Popinga; Andrew Rambaut; David Rasmussen; Igor Siveroni; Marc A Suchard; Chieh-Hsi Wu; Dong Xie; Chi Zhang; Tanja Stadler; Alexei J Drummond Journal: PLoS Comput Biol Date: 2019-04-08 Impact factor: 4.475
Authors: Valdinete Alves do Nascimento; André de Lima Guerra Corado; Fernanda Oliveira do Nascimento; Ágatha Kelly Araújo da Costa; Debora Camila Gomes Duarte; Sérgio Luiz Bessa Luz; Luciana Mara Fé Gonçalves; Michele Silva de Jesus; Cristiano Fernandes da Costa; Edson Delatorre; Felipe Gomes Naveca Journal: Mem Inst Oswaldo Cruz Date: 2020-09-25 Impact factor: 2.743
Authors: Nuno R Faria; Thomas A Mellan; Charles Whittaker; Ingra M Claro; Darlan da S Candido; Swapnil Mishra; Myuki A E Crispim; Flavia C Sales; Iwona Hawryluk; John T McCrone; Ruben J G Hulswit; Lucas A M Franco; Mariana S Ramundo; Jaqueline G de Jesus; Pamela S Andrade; Thais M Coletti; Giulia M Ferreira; Camila A M Silva; Erika R Manuli; Rafael H M Pereira; Pedro S Peixoto; Moritz U Kraemer; Nelson Gaburo; Cecilia da C Camilo; Henrique Hoeltgebaum; William M Souza; Esmenia C Rocha; Leandro M de Souza; Mariana C de Pinho; Leonardo J T Araujo; Frederico S V Malta; Aline B de Lima; Joice do P Silva; Danielle A G Zauli; Alessandro C de S Ferreira; Ricardo P Schnekenberg; Daniel J Laydon; Patrick G T Walker; Hannah M Schlüter; Ana L P Dos Santos; Maria S Vidal; Valentina S Del Caro; Rosinaldo M F Filho; Helem M Dos Santos; Renato S Aguiar; José L P Modena; Bruce Nelson; James A Hay; Melodie Monod; Xenia Miscouridou; Helen Coupland; Raphael Sonabend; Michaela Vollmer; Axel Gandy; Marc A Suchard; Thomas A Bowden; Sergei L K Pond; Chieh-Hsi Wu; Oliver Ratmann; Neil M Ferguson; Christopher Dye; Nick J Loman; Philippe Lemey; Andrew Rambaut; Nelson A Fraiji; Maria do P S S Carvalho; Oliver G Pybus; Seth Flaxman; Samir Bhatt; Ester C Sabino Journal: medRxiv Date: 2021-03-03