Jalees A Nasir, Robert A Kozak, Patryk Aftanas, Amogelang R Raphenya, Kendrick M Smith, Finlay Maguire, Hassaan Maan, Muhannad Alruwaili, Arinjay Banerjee, Hamza Mbareche, Brian P Alcock, Natalie C Knox, Karen Mossman, Bo Wang, Julian A Hiscox, Andrew G McArthur, Samira Mubareka.
Abstract
Genome sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is increasingly important to monitor the transmission and adaptive evolution of the virus. The accessibility of high-throughput methods and polymerase chain reaction (PCR) has facilitated a growing ecosystem of protocols. Two differing approaches are tiling multiplex PCR and bait capture enrichment. Each method has advantages and disadvantages, but a direct comparison across different viral RNA concentrations has not been performed to assess the performance of these approaches. Here we compare Liverpool amplification, ARTIC amplification, and bait capture using clinical diagnostic samples. All libraries were sequenced using an Illumina MiniSeq, with data analyzed using a standardized bioinformatics workflow (SARS-CoV-2 Illumina GeNome Assembly Line; SIGNAL). One sample showed poor SARS-CoV-2 genome coverage and consensus, reflective of low viral RNA concentration. In contrast, the second sample had a higher viral RNA concentration, which yielded good genome coverage and consensus. ARTIC amplification showed the highest depth of coverage for both samples, suggesting this protocol is effective for low concentrations. Liverpool amplification provided more even read coverage of the SARS-CoV-2 genome, but at a lower depth of coverage. Bait capture enrichment of SARS-CoV-2 cDNA provided results on par with amplification. While only two clinical samples were examined in this comparative analysis, the Liverpool and ARTIC amplification methods showed differing efficacy for high and low concentration samples. In addition, amplification-free bait capture enriched sequencing of cDNA is a viable method for generating a SARS-CoV-2 genome sequence and for identification of amplification artifacts.
Keywords: SARS-CoV-2; amplicon sequencing; bait capture; genome sequencing
Year: 2020 PMID: 32824272 PMCID: PMC7472420 DOI: 10.3390/v12080895
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Plot showing the percent of sequencing reads mapping to the SARS-CoV-2 reference genome against the total number of paired reads acquired from each library preparation. Each data point is additionally labelled with a percent fraction and average read coverage of the SARS-CoV-2 genome.
Sequencing read and genome assembly statistics including the total raw read pairs obtained and fraction captured from SARS-CoV-2 RNA, the fraction of 29,903 bp MN908947.3 genome sequence covered, depth of coverage, and number of variants detected relative to MN908947.3.
| Sample | Amplification | Enrichment | Number of Paired Reads | Reads from SARS-CoV-2 (%) | SARS-CoV-2 Genome Fraction (%) | Average Depth of Coverage | 0–100x Coverage (%) | 101–1000x Coverage (%) | >1000x Coverage (%) | # iVar Variants |
|---|---|---|---|---|---|---|---|---|---|---|
| Negative | ARTIC | No | 938,693 | 0.01 | 0 | 4.1x | 99.2 | 0.8 | 0.1 | n/a |
| Wuhan | Liverpool | No | 883,212 | 0.52 | 19.587 | 37.9x | 93.88 | 6.08 | 0.04 | 1 |
| Wuhan | Liverpool | Yes | 22,119 | 58.73 | 20.811 | 98.6x | 89.6 | 6.8 | 3.6 | 1 |
| Wuhan | Hexamers | No | 585,396 | 0.01 | 0 | 0.3x | 99.9 | 0.1 | 0.00 | n/a |
| Wuhan | Hexamers | Yes | 1536 | 1.56 | 0 | n/a | n/a | n/a | n/a | n/a |
| Wuhan | ARTIC | No | 2,271,152 | 73.86 | 59.104 | 15,604.0x | 10.6 | 35.5 | 53.9 | 5 |
| Iran | Liverpool | No | 813,975 | 90.13 | 98.53 | 6528.3x | 1.2 | 3.1 | 95.6 | 6 |
| Iran | Liverpool | Yes | 901,124 | 89.76 | 98.54 | 8214.4x | 0.7 | 0.2 | 99.1 | 6 |
| Iran | Hexamers | No | 1,091,011 | 2.77 | 99.89 | 215.3x | 0.43 | 99.56 | 0.00 | 7 |
| Iran | Hexamers | Yes | 619,661 | 89.17 | 99.83 | 4383.9x | 0.2 | 0.3 | 99.6 | 7 |
| Iran | ARTIC | No | 1,935,748 | 88.25 | 99.31 | 14,032.7x | 0.2 | 1.7 | 98.1 | 7 |
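The depth columns in the table above (average depth and the share of positions in the 0–100x, 101–1000x, and >1000x bins) can be derived from a list of per-position read depths. The following is a minimal sketch of that arithmetic, not the SIGNAL workflow's own code; the function name `summarize_coverage`, the dictionary keys, and the toy depth values are hypothetical, and the source does not state exactly how the authors computed these columns.

```python
from typing import Dict, Sequence


def summarize_coverage(depths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-position depths into average depth and three depth bins.

    Bin edges follow the table headers: 0-100x, 101-1000x, and >1000x.
    The function name and return keys are illustrative assumptions.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")

    low = sum(1 for d in depths if d <= 100)          # 0-100x
    mid = sum(1 for d in depths if 100 < d <= 1000)   # 101-1000x
    high = n - low - mid                              # >1000x

    return {
        "average_depth": sum(depths) / n,
        "pct_0_100x": 100.0 * low / n,
        "pct_101_1000x": 100.0 * mid / n,
        "pct_gt_1000x": 100.0 * high / n,
    }


if __name__ == "__main__":
    # Toy depths for eight positions; real input would have one value
    # per reference position (29,903 for MN908947.3).
    toy_depths = [0, 50, 100, 101, 1000, 1001, 5000, 20000]
    for key, value in summarize_coverage(toy_depths).items():
        print(f"{key}: {value:.1f}")
```

In practice the per-position depths would come from a read alignment against the reference, for example the third column of a depth report from an alignment tool; which tool produced the published values is not stated here.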
Predicted mutations relative to the MN908947.3 SARS-CoV-2 genome for each library for the high titre Iran-derived sample identified by BreSeq analysis of sequencing reads. Mutations within codons are underlined. All mutations were predicted by 100% of sequencing reads mapping to that position unless otherwise noted. Mutations in bold existed in the final iVar-called genome sequence, while those in italics exist in the final iVar-called genome sequence but were obscured by deletion predictions in the minority reads for BreSeq.
| Mutation | Liverpool Alone | Liverpool + Enrichment | Hexamers Alone | Hexamers + Enrichment | ARTIC Amplification | Clinical Diagnostic Primer Mismatch |
|---|---|---|---|---|---|---|
| Unresolved 5′ sequence | 259 bp | 258 bp | 40 bp | 0 bp | 49 bp | |
| Unresolved 3′ sequence | 200 bp | 190 bp | 77 bp | 139 bp | 67 bp | |
| pos. 835 (orf1ab polyprotein) | F190F (TT | F190F (TT | F190F (TT | F190F (TT | F190F (TT | NIID_WH-1_R854 |
| pos. 884 (orf1ab polyprotein) | R207C ( | R207C ( | R207C ( | R207C ( | R207C ( | NIID_WH-1_R913 |
| pos. 1397 (orf1ab polyprotein) | V378I ( | V378I ( | V378I ( | V378I ( | V378I ( | |
| pos. 8653 (orf1ab polyprotein) | M2796I (AT | M2796I (AT | M2796I (AT | M2796I (AT | M2796I (AT | Spike_F1 |
| pos. 9502 (orf1ab polyprotein) | 5.0% of reads suggest | Spike_F1 | ||||
| pos. 11,074 (orf1ab polyprotein) | 11.8% of reads suggest a deletion between positions 10,809 and 13,203 | 11.8% of reads suggest a deletion between positions 10,809 and 13,203 | 10.9% of reads suggest a deletion between positions 10,809 and 13,203 | Spike_F1 | ||
| pos. 11,082 (orf1ab polyprotein) | 18.1% of reads suggest a deletion between positions 10,817 and 10,819 | 22.8% of reads suggest a deletion between positions 10,817 and 10,819 | Spike_F1 | |||
| pos. 11,083 (orf1ab polyprotein) | | | L3606F (TT | L3606F (TT | L3606F (TT | Spike_F1 |
| pos. 19,285–19,603 (orf1ab polyprotein) | 319 bp coverage gap (no aligned reads); amplicon 64 | |||||
| pos. 27,156 (membrane glycoprotein) | 5.3% of reads suggest | |||||
| pos. 28,688 (nucleocapsid phosphoprotein) | L139L ( | L139L ( | L139L ( | L139L ( | L139L ( | 2019-nCoV_N3-F |
| pos. 29,742 (intergenic) | | | G→T | G→T | G→T | |
Figure 2. Mapping and semi-log depth of coverage of trimmed sequencing reads for each library preparation against the first Wuhan SARS-CoV-2 genome sequence (NCBI accession: MN908947.3). Y-axis dimensions vary among samples (maximum indicated beside label) and colored positions reflect frequency of SNPs relative to the MN908947.3 genome among the reads (green = A, blue = C, orange = G, red = T). The plus (+) symbol indicates secondary bait capture enrichment. SARS-CoV-2 genome length and organization are highlighted on top.
Figure 3. Uniform manifold approximation and projection (UMAP) involving the aligned genomes of 8075 SARS-CoV-2 isolates labelled by country of origin. The Iran-derived sample is indicated by an arrow. The top inset illustrates the analysis of all 8075 isolates, labelled by region, with the zoomed region indicated by the hashed box.