Liguo Zhang, Alexsia Richards, M Inmaculada Barrasa, Stephen H Hughes, Richard A Young, Rudolf Jaenisch.
Abstract
Prolonged detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA and recurrence of PCR-positive tests have been widely reported in patients after recovery from COVID-19, but some of these patients do not appear to shed infectious virus. We investigated the possibility that SARS-CoV-2 RNAs can be reverse-transcribed and integrated into the DNA of human cells in culture and that transcription of the integrated sequences might account for some of the positive PCR tests seen in patients. In support of this hypothesis, we found that DNA copies of SARS-CoV-2 sequences can be integrated into the genome of infected human cells. We found target site duplications flanking the viral sequences and consensus LINE1 endonuclease recognition sequences at the integration sites, consistent with a LINE1 retrotransposon-mediated, target-primed reverse transcription and retroposition mechanism. We also found, in some patient-derived tissues, evidence suggesting that a large fraction of the viral sequences is transcribed from integrated DNA copies of viral sequences, generating viral-host chimeric transcripts. The integration and transcription of viral sequences may thus contribute to the detection of viral RNA by PCR in patients after infection and clinical recovery. Because we have detected only subgenomic sequences derived mainly from the 3' end of the viral genome integrated into the DNA of the host cell, infectious virus cannot be produced from the integrated subgenomic SARS-CoV-2 sequences.
Keywords: LINE1; SARS-CoV-2; chimeric RNAs; genomic integration; reverse transcription
Year: 2021 PMID: 33958444 PMCID: PMC8166107 DOI: 10.1073/pnas.2105968118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. SARS-CoV-2 RNA can be reverse transcribed and integrated into the host cell genome. (A) Experimental workflow. (B) Chimeric sequence from a Nanopore sequencing read showing integration of a full-length SARS-CoV-2 NC subgenomic RNA sequence (magenta) and human genomic sequences (blue) flanking both sides of the integrated viral sequence. Features indicative of LINE1-mediated “target-primed reverse transcription” include the target site duplication (yellow highlight) and the LINE1 endonuclease recognition sequence (underlined). Sequences that could be mapped to both genomes are shown in purple with mismatches to the human genomic sequences in italics. The arrows indicate sequence orientation with regard to the human and SARS-CoV-2 genomes as shown in C and D. (C) Alignment of the Nanopore read in B with the human genome (chromosome X) showing the integration site. The human sequences at the junction region show the target site, which was duplicated when the SARS-CoV-2 cDNA was integrated (yellow highlight) and the LINE1 endonuclease recognition sequence (underlined). (D) Alignment of the Nanopore read in B with the SARS-CoV-2 genome showing the integrated viral DNA is a copy of the full-length NC subgenomic RNA. The light blue highlighted regions are enlarged to show TRS-L (I) and TRS-B (II) sequences (underlined, these are the sequences where the viral polymerase jumps to generate the subgenomic RNA) and the end of the viral sequence at the poly(A) tail (III). These viral sequence features (I–III) show that a DNA copy of the full-length NC subgenomic RNA was retro-integrated. (E) A human–viral chimeric read pair from Illumina paired-end whole-genome sequencing. The read pair is shown with alignment to the human (blue) and SARS-CoV-2 (magenta) genomes. The arrows indicate the read orientations relative to the human and SARS-CoV-2 genomes. The highlighted (light blue) region of the human read mapping is enlarged to show the LINE1 recognition sequence (underlined). (F) Distributions of human–CoV2 chimeric junctions from Nanopore (Left) and Illumina (Right) sequencing with regard to features of the human genome.
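As a rough illustration of the junction feature described in the Fig. 1 caption, the sketch below checks whether a LINE1 endonuclease recognition motif lies near a human–viral junction. The function names, the window size, and the choice of `TTTTA` (with its reverse complement `TAAAA`) as the motif are assumptions made for illustration only; this is not the authors' analysis pipeline, and the test sequences used with it are invented.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def has_line1_motif(human_seq: str, junction: int,
                    window: int = 20, motif: str = "TTTTA") -> bool:
    """Check for a motif (either strand) within `window` bases of a junction.

    human_seq : human genomic sequence around the integration site
    junction  : 0-based index of the human-viral junction in human_seq
    window    : hypothetical search radius; the source only says "at/near"
    motif     : assumed recognition sequence, taken from the "TTTT/A" example
    """
    seq = human_seq.upper()
    start = max(0, junction - window)
    end = min(len(seq), junction + window)
    region = seq[start:end]
    return motif in region or reverse_complement(motif) in region
```

A real analysis would also need to confirm the target site duplication on both flanks, which this sketch does not attempt.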
Summary of the human-CoV2 chimeric sequences obtained by Nanopore DNA sequencing of infected LINE1-overexpressing HEK293T cells
| Chromosome | Number of sequences with human-CoV2 junction | With LINE1 recognition sequence at/near junction (e.g., TTTT/A) | Junction at human intergenic | Junction at human intron | Junction at human exon/UTR |
| chr1 | 10 | 6 | 0 | 6 | 4 |
| chr2 | 2 | 2 | 0 | 2 | 0 |
| chr3 | 3 | 3 | 0 | 3 | 0 |
| chr4 | 2 | 2 | 0 | 1 | 1 |
| chr5 | 1 | 1 | 0 | 1 | 0 |
| chr6 | 4 | 2 | 3 | 0 | 1 |
| chr7 | 2 | 2 | 1 | 1 | 0 |
| chr8 | 0 | 0 | 0 | 0 | 0 |
| chr9 | 4 | 2 | 0 | 2 | 2 |
| chr10 | 5 | 1 | 2 | 1 | 2 |
| chr11 | 3 | 2 | 1 | 1 | 1 |
| chr12 | 6 | 4 | 2 | 2 | 2 |
| chr13 | 3 | 3 | 3 | 0 | 0 |
| chr14 | 2 | 2 | 1 | 1 | 0 |
| chr15 | 0 | 0 | 0 | 0 | 0 |
| chr16 | 2 | 1 | 1 | 1 | 0 |
| chr17 | 2 | 0 | 1 | 0 | 1 |
| chr18 | 2 | 1 | 0 | 2 | 0 |
| chr19 | 1 | 1 | 0 | 0 | 1 |
| chr20 | 0 | 0 | 0 | 0 | 0 |
| chr21 | 2 | 1 | 1 | 1 | 0 |
| chr22 | 1 | 1 | 0 | 1 | 0 |
| chrX | 6 | 5 | 2 | 1 | 3 |
| Total | 63 | 42 | 18 | 27 | 18 |
| Fraction | | 66.7% | 28.6% | 42.9% | 28.6% |
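The percentages in the Fraction row follow from dividing each column total by the 63 junction-containing sequences. A minimal check, using only the totals reported in the table (the variable names are illustrative):

```python
# Column totals taken from the "Total" row of the Nanopore summary table.
total_junctions = 63
counts = {
    "with LINE1 recognition sequence": 42,
    "intergenic": 18,
    "intron": 27,
    "exon/UTR": 18,
}

# Each fraction is the column total divided by the number of junction reads.
fractions = {name: round(100 * n / total_junctions, 1) for name, n in counts.items()}

for name, pct in fractions.items():
    print(f"{name}: {pct}%")

# The three genomic-feature categories partition all 63 junctions.
feature_sum = counts["intergenic"] + counts["intron"] + counts["exon/UTR"]
```

Note that the three genomic-feature columns sum to 63, whereas the LINE1 column is an independent annotation of the same reads.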
Summary of the human-CoV2 chimeric sequences obtained by Illumina paired-end whole-genome DNA sequencing of infected LINE1-overexpressing HEK293T cells
| Region features (human) | Intergenic | Intron | Exon/UTR |
| Region number | 4 | 9 | 4 |
| With L1 recognition sequence at/near junction | 2 | 3 | 2 |
Fig. 2. Evidence for integration of SARS-CoV-2 cDNA in cultured cells that do not overexpress a reverse transcriptase. (A) Experimental workflow. (B) Experimental design for the Tn5 tagmentation-mediated enrichment sequencing method used to map integration sites in the host cell genome. (C) A human–viral chimeric read pair supporting viral integration. The reads are aligned with the human (blue) and SARS-CoV-2 (magenta) genomic sequences. The arrows indicate the read orientations relative to the human and SARS-CoV-2 genomes as shown in D and E. Sequence of the viral primer used for enrichment is shown with green highlight in the read (corresponding to the green arrow illustrated in B). Sequences that could be mapped to both genomes are shown in purple. (D) Alignment of the read pair in C with the human genome (chromosome 15, blue arrow). The highlighted (light blue) region of the human sequence is enlarged to show the LINE1 recognition sequence (underlined) with a 19-base poly-dT sequence (purple highlight) that could be annealed by the viral poly-A tail for “target-primed reverse transcription.” Additional 5-bp human sequence (GAATG, blue) was captured in read 2 (C), supporting a bona fide integration site. (E) Alignment of the read pair in C with the SARS-CoV-2 genome (magenta). The viral primer sequence is shown with green highlight. (F) Summary of seven human–viral chimeric sequences identified by the enrichment sequencing method in the two cell lines showing the integrated human chromosomes, LINE1 recognition sequences close to the chimeric junction, and human genomic features at the read junction.
Fig. 3. Negative-strand viral RNA-seq reads suggest that integrated SARS-CoV-2 sequences are expressed. (A) Schema predicting fractions of positive- or negative-strand SARS-CoV-2 RNA-seq reads that are derived from viral (sub)genomic RNAs or from transcripts of integrated viral sequences. The arrows (Right) show the orientation of an integrated SARS-CoV-2 (magenta) positive strand relative to the orientation of the host cellular gene (blue). (B) Fractions of SARS-CoV-2 sequences integrated into human genes with same (n = 15) or opposite (n = 13) orientation of the viral positive strand relative to the positive strand of the human gene. A total of 28 integration events at human genes with LINE1 endonuclease recognition sequences were identified from our Nanopore DNA sequencing of infected LINE1-overexpressing HEK293T cells (Fig. 1). (C) Fraction of total viral reads that are derived from negative-strand viral RNA in acutely infected cells or organoids (see for details). (D) Fraction of human–viral chimeric reads that contain viral sequences derived from negative-strand viral RNA in acutely infected cells or organoids (see for details). (E) Fraction of total viral reads that are derived from negative-strand viral RNA in published patient RNA-seq data (autopsy FFPE samples, GSE150316, samples with no viral reads or of low library strandedness quality not included; see for details; reanalysis results consistent with the original publication). (F) Fraction of human–viral chimeric reads that contain viral sequences derived from negative-strand viral RNA in published patient RNA-seq data (autopsy FFPE samples, GSE150316; see for details). (G) Fraction of total viral reads that are derived from negative-strand viral RNA in published patient RNA-seq data (BALF samples, GSE145926; see for details). The red dashed lines in E–G indicate the level at which 50% of all viral reads (E and G) or viral sequences in human–viral chimeric reads (F) were from negative-strand viral RNAs, a level expected if all the viral sequences were derived from integrated sequences.
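The reasoning behind the 50% reference line in Fig. 3 can be sketched numerically. The first function below runs an exact two-sided binomial test of the 15 same-orientation versus 13 opposite-orientation integrations from panel B against a 50:50 expectation. The second is a simple linear mixture for the expected negative-strand read fraction. Both are illustrative assumptions rather than the authors' method: the mixture assumes integrated copies are transcribed from host promoters in random orientation (giving 50% negative-strand reads), and `neg_frac_replication` is a hypothetical parameter for the negative-strand fraction produced by viral replication.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials, p = 0.5."""
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def expected_negative_fraction(integrated_share: float,
                               neg_frac_replication: float) -> float:
    """Expected negative-strand fraction under an assumed two-source mixture.

    integrated_share     : share of viral reads transcribed from integrated DNA
    neg_frac_replication : hypothetical negative-strand fraction among reads
                           that come from replicating virus
    """
    return 0.5 * integrated_share + neg_frac_replication * (1 - integrated_share)


# Orientation counts reported in Fig. 3B.
same, opposite = 15, 13
p_value = binom_two_sided_p(same, same + opposite)
print(f"two-sided p-value for 15 vs 13: {p_value:.3f}")
```

Under these assumptions, a p-value this large means the 15:13 split gives no evidence against random orientation, and the mixture reaches 0.5 only when `integrated_share` is 1, which is the limiting case marked by the red dashed lines.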