Carolina M Voloch, Ronaldo da Silva Francisco, Luiz G P de Almeida, Otavio J Brustolini, Cynthia C Cardoso, Alexandra L Gerber, Ana Paula de C Guimarães, Isabela de Carvalho Leitão, Diana Mariani, Victor Akira Ota, Cristiano X Lima, Mauro M Teixeira, Ana Carolina F Dias, Rafael Mello Galliez, Débora Souza Faffe, Luís Cristóvão Pôrto, Renato S Aguiar, Terezinha M P P Castiñeira, Orlando C Ferreira, Amilcar Tanuri, Ana Tereza R de Vasconcelos.
Abstract
Long-term infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a challenge to containing viral spread and controlling the coronavirus disease 2019 (COVID-19) pandemic. Why some people develop prolonged infection, and how the virus persists for so long, is still not fully understood. Recent studies have suggested that the accumulation of intra-host single nucleotide variants (iSNVs) over the course of infection might play an important role in viral persistence and in the emergence of mutations of concern. We therefore investigated the intra-host evolution of SARS-CoV-2 during prolonged infection. Thirty-three patients who remained reverse transcription polymerase chain reaction (RT-PCR) positive in the nasopharynx for an average of 18 days after symptom onset were included in this study. Whole-genome sequences were obtained for each patient at two different time points. Phylogenetic, population, and computational analyses of the viral sequences were consistent with prolonged infection, with no evidence of coinfection in our cohort. Within-host genomic diversity was elevated in the second-time-point samples and was positively correlated with cycle threshold (Ct) values (that is, with lower viral load). The presence of shared iSNVs also confirmed direct transmission within a small cluster of healthcare professionals who worked in the same workplace. Missense variants accumulated differentially between the time points in crucial structural and non-structural proteins, such as Spike and helicase. Interestingly, the longitudinal acquisition of iSNVs in the Spike protein coincided in many cases with SARS-CoV-2-reactive and predicted T cell epitopes. Over the course of infection, we observed a distinctive mutational pattern driven mainly by increasing A→U and decreasing G→A signatures.
G→A mutations may be associated with the activity of RNA-editing enzymes; the mutational profiles observed in our analysis are therefore suggestive of innate immune mechanisms of host cell defense. Together, these results reveal a dynamic and complex landscape of host-pathogen interaction during prolonged SARS-CoV-2 infection and suggest that the host's innate immunity shapes the increase in intra-host diversity. Our findings may also shed light on possible mechanisms underlying the emergence and spread of new variants resistant to the host immune response, as recently observed in the COVID-19 pandemic.
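At its simplest, a mutational signature of the kind described above is the share of each substitution class among a sample's iSNVs. The Python sketch below computes those shares and the transition fraction. It assumes each iSNV is given as a (reference base, alternative base) pair; the function names and toy input are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter

# RNA alphabet; DNA-style "T" calls are converted to "U" before classification.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def normalize(base: str) -> str:
    """Upper-case a base and convert T to U."""
    return base.upper().replace("T", "U")


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    ref, alt = normalize(ref), normalize(alt)
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def substitution_spectrum(variants):
    """Proportion of each 'REF>ALT' class among (ref, alt) pairs (hypothetical helper)."""
    counts = Counter(f"{normalize(r)}>{normalize(a)}" for r, a in variants)
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()} if total else {}


def transition_fraction(variants) -> float:
    """Fraction of substitutions that are transitions (0.0 if there are none)."""
    pairs = list(variants)
    if not pairs:
        return 0.0
    return sum(is_transition(r, a) for r, a in pairs) / len(pairs)


# Toy iSNVs for one sample (hypothetical, not study data).
toy = [("G", "A"), ("G", "A"), ("C", "T"), ("A", "U")]
print(substitution_spectrum(toy))  # {'G>A': 0.5, 'C>U': 0.25, 'A>U': 0.25}
print(transition_fraction(toy))    # 0.75
```

Computing these proportions separately for T1 and T2 samples yields the kind of A→U and G→A contrast the abstract describes; A→U is a transversion and G→A a transition, so the two signatures move different parts of the transition/transversion balance.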
Keywords: COVID-19; RNA-editing enzymes; Spike gene; helicase gene; prolonged infection
Year: 2021 PMID: 34642605 PMCID: PMC8500031 DOI: 10.1093/ve/veab078
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Characterization of patients with prolonged SARS-CoV-2 infection. (A) Time intervals between the onset of symptoms (blue circles), the first sequenced sample, T1 (orange circles), the first positive serological test (black diamonds), the second sequenced sample, T2 (white circles), and the last positive RT-PCR test (purple circles). Coinciding dates appear as overlapping circles, as seen, for example, for the white and purple circles of some patients. Blue lines represent the interval between symptom onset and the first sequenced sample, black lines the interval between the two sequenced samples, and purple lines the interval between the second sample and the last positive RT-PCR test. (B) Key clinical features of the patients analyzed in this study. First row: gender and occupation; second to fourth rows: symptoms; fifth to seventh rows: comorbidities. Patients are represented in the columns.
Figure 2. Intra-host genetic evaluation of SARS-CoV-2 genomes. (A) Distribution of iSNVs across the SARS-CoV-2 genome. Vertical lines represent within-host iSNV frequencies along the viral protein products. Lineage-defining sites of each lineage identified in our samples are shown in red. The dashed line indicates an allele frequency of 95 per cent. (B) Number of uniquely mapped reads versus the number of iSNVs with a minor allele frequency (MAF) > 5 per cent in each of the 66 samples. (C) Spearman’s correlation test between Ct values and the number of iSNVs. (D) Bar plot showing the distribution of iSNVs across the 66 samples.
Figure 3. Phylogenetic analysis of prolonged-infection samples. Maximum likelihood tree inferred from the consensus dataset under the GTR + I model, containing a consensus genome for each time point of the 33 patients plus 135 population samples. Names of the population samples are omitted for clarity. Text boxes indicate the Pangolin lineage classification, and each sampled lineage is shown in a different color. Numbers identify each patient’s samples. Orange and purple circles represent T1 and T2 samples, respectively. Patients whose two time-point samples cluster together are highlighted in light-gray boxes. Red numbers indicate patients whose samples are not monophyletic in the haplotype tree. The inset on the right shows the number of SNPs between the T1 and T2 consensus sequences of each patient.
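The per-patient SNP count in the inset can be read as a Hamming-style distance between two aligned consensus genomes. The sketch below assumes the T1 and T2 consensus sequences are already aligned to equal length and that ambiguous calls (N) and gaps (-) should be skipped; that masking rule and the function name are assumptions for illustration, not the authors' documented procedure.

```python
def consensus_snp_distance(seq_t1: str, seq_t2: str, masked: str = "N-") -> int:
    """Count aligned positions where two consensus genomes carry different
    bases, ignoring any position where either sequence has a masked symbol.
    (Hypothetical helper; the masking rule is an assumption.)"""
    if len(seq_t1) != len(seq_t2):
        raise ValueError("consensus sequences must be aligned to equal length")
    return sum(
        1
        for a, b in zip(seq_t1.upper(), seq_t2.upper())
        if a != b and a not in masked and b not in masked
    )


# Toy aligned fragments (hypothetical): one true difference (G vs C) and one
# position skipped because T1 has an ambiguous N.
print(consensus_snp_distance("ACGTNA", "ACCTAA"))  # 1
```

Skipping masked positions keeps low-coverage stretches from inflating the count; whether the original analysis treated them this way is not stated in the caption.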
Figure 4. Differential mutational signatures and classification of prolonged-infection samples using machine learning models. (A) Overall proportions of transitions and transversions in the SARS-CoV-2 genomes of this study. (B) A→U and (C) G→A proportions in T1 and T2 samples. (D) ROC curve showing the relationship between sensitivity and specificity for classifying samples by time point (T1 versus T2); the metrics table displays model performance. (E) Overall feature importance, showing the variables that best separate the T1 and T2 classes.
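Panel D summarizes a binary time-point classifier with a ROC curve. The caption does not specify which machine learning models were used, so the sketch below only evaluates scores that some already-trained model has produced. It assumes T2 is coded as the positive class (1) and T1 as 0; all names and toy scores are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Call a sample T2 (positive) when its score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(labels, scores):
    """(1 - specificity, sensitivity) pairs, sweeping the threshold high to low."""
    if 1 not in labels or 0 not in labels:
        raise ValueError("need at least one sample of each class")
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(labels, scores, t)
        points.append((1 - spec, sens))
    return points


def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a random T2 sample
    outscores a random T1 sample (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (0 = T1, 1 = T2) and classifier scores; hypothetical values.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(labels, scores))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise AUC is quadratic in sample size, which is negligible for a cohort of 66 samples; the trapezoidal area under the points from `roc_points` gives the same 0.75 here.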