Sureshnee Pillay1, Jennifer Giandhari1, Houriiyah Tegally1, Eduan Wilkinson1, Benjamin Chimukangara1,2,3, Richard Lessells1,4, Yunus Moosa4, Stacey Mattison1, Inbal Gazy1, Maryam Fish1, Lavanya Singh1, Khulekani Sedwell Khanyile1, James Emmanuel San1, Vagner Fonseca1,5,6, Marta Giovanetti6, Luiz Carlos Alcantara5,6, Tulio de Oliveira1,2,7.
Abstract
The COVID-19 pandemic has spread rapidly around the world. A few days after the first case was detected in South Africa, an infection started a large hospital outbreak in Durban, KwaZulu-Natal (KZN). Phylogenetic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes can be used to trace the path of transmission within a hospital. It can also identify the source of the outbreak and provide lessons to improve infection prevention and control strategies. This manuscript outlines the obstacles encountered in genotyping SARS-CoV-2 in near-real time during an urgent outbreak investigation. These included the length of the original genotyping protocol, the unavailability of reagents, and problems with sample degradation and storage. Despite this, three different library preparation methods for Illumina sequencing were set up, and the hands-on library preparation time was decreased from twelve to three hours, which enabled the outbreak investigation to be completed in just a few weeks. Furthermore, the new protocols increased the success rate of sequencing whole viral genomes. A simple bioinformatics workflow for the assembly of high-quality genomes in near-real time was also fine-tuned. To allow other laboratories to learn from our experience, all of the library preparation and bioinformatics protocols are publicly available at protocols.io and have been distributed to other laboratories of the Network for Genomics Surveillance in South Africa (NGS-SA) consortium.
Keywords: COVID-19; Illumina; SARS-CoV2; bioinformatics; protocols; sequencing
Year: 2020 PMID: 32824573 PMCID: PMC7464704 DOI: 10.3390/genes11080949
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Processes to generate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes and qPCR diagnostics in the KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP) laboratory. The figure also shows the number of days needed by two senior scientists to generate 24 whole genomes by using an Illumina MiSeq Nano Kit v2. It is possible to generate 94 whole genomes with one extra day of sequencing with the use of a MiSeq Reagent Kit v2 (500 cycles).
Figure 2. Three-step workflow for generation of high-quality genomes. Step 1: Raw reads from Illumina and Nanopore sequencing were assembled by using the web-based Genome Detective 1.126 (https://www.genomedetective.com/) platform and its coronavirus typing tool. Step 2: The initial assembly obtained from Genome Detective was polished by aligning mapped reads to the references and filtering out mutations with low genotype likelihoods, using the bcftools 1.7-2 mpileup method. This calculation determines the probability of each genotype at sites where the reads carry different bases (e.g., the probability that position 27,784 is A vs. T in the illustration). Step 3: All mutations were validated visually with BAM files viewed in Geneious software, to ensure that called mutations were true and not part of lingering adapter sites.
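The genotype-likelihood idea in Step 2 can be illustrated with a minimal sketch. It assumes a simplified haploid model in which each read base is an independent observation with an error probability derived from its Phred quality score; this is not the model bcftools itself implements, and the function names and the example pileup are hypothetical.

```python
import math

def genotype_log_likelihoods(bases, quals):
    """Return log10 P(observed bases | genotype) for each of A, C, G, T.

    Simplified haploid model (an assumption of this sketch): every read
    base is independent, and an erroneous base is equally likely to be
    any of the three other nucleotides.
    """
    loglik = {}
    for genotype in "ACGT":
        total = 0.0
        for base, q in zip(bases, quals):
            err = 10 ** (-q / 10)  # Phred score -> error probability
            p = 1 - err if base == genotype else err / 3
            total += math.log10(p)
        loglik[genotype] = total
    return loglik

def call_base(bases, quals):
    """Pick the genotype with the highest likelihood at one site."""
    ll = genotype_log_likelihoods(bases, quals)
    best = max(ll, key=ll.get)
    return best, ll

# Hypothetical pileup at one site: seven reads show A, one shows T,
# all with Phred quality 30.
best, ll = call_base("AAAAAAAT", [30] * 8)
print(best, ll)
```

A site where the best and second-best likelihoods are close would be a candidate for filtering; the cutoff to apply is a design choice that this sketch leaves open.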
Figure 3. Association between cycle threshold (Ct) value and genome length. (A) Regression plot of mean Ct value of all unique samples against their genome lengths (% coverage against SARS-CoV-2 reference). Samples with missing Ct value information (n = 8) are shown in red. Forty-four assembled genomes of >90% coverage were produced from samples with Ct value <27 (blue); six genomes had >90% coverage and Ct value >27 (green); 12 genomes had <90% coverage and Ct value <27 (purple); and 37 genomes had <90% coverage and Ct value >27 (orange). (B) Box plot and statistical comparison of genome coverage obtained from samples grouped in three mean Ct value thresholds (25, 27, and 30), showing statistically significant (t-tests) differences between lower and higher Ct value samples. ****: level of significance.
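The four colour groups in panel (A) amount to a two-way split on Ct value and coverage. A minimal sketch of that grouping follows, assuming cutoffs of Ct 27 and 90% coverage as stated in the caption. The function name and the input values are hypothetical, and the handling of values exactly at a cutoff is an assumption, since the caption does not specify it.

```python
def classify(ct, coverage, ct_cutoff=27.0, cov_cutoff=90.0):
    """Assign a sample to one of the groups described in the caption.

    ct: mean Ct value, or None when Ct information is missing
    coverage: genome coverage as % of the SARS-CoV-2 reference
    Values exactly at a cutoff fall in the 'high Ct' / 'low coverage'
    groups here; that boundary choice is an assumption of this sketch.
    """
    if ct is None:
        return "missing Ct"
    cov_group = "high coverage" if coverage > cov_cutoff else "low coverage"
    ct_group = "low Ct" if ct < ct_cutoff else "high Ct"
    return f"{cov_group}, {ct_group}"

# Hypothetical (Ct, coverage) pairs, for illustration only.
samples = [(22.5, 98.0), (31.0, 95.0), (24.0, 70.0), (33.0, 40.0), (None, 99.0)]
groups = [classify(ct, cov) for ct, cov in samples]
print(groups)
```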
Comparison of coverage and Ct values between the different library preparation methods for repeat samples only.
Coverage is given as % of the SARS-CoV-2 genome.
| Sequence | TruSeq DNA Nano coverage (%) | NEBNext Ultra II coverage (%) | Nextera Flex coverage (%) | Ct value |
|---|---|---|---|---|
| KRISP_0002 | 97.5 | 98.3 | 98.5 | 24.0 |
| KRISP_0004 | 99.5 | 98.3 | 98.1 | 24.1 |
| KRISP_0019 | 97.4 | 89.9 | 90.2 | NA |
| KRISP_0021 | 63.5 | 99.1 | 82.7 | 14.1 |
| KRISP_0024 | - | 95.2 | 94.6 | 21.5 |
| KRISP_0026 | - | 99.8 | 99.9 | 17.9 |
| KRISP_0028 | - | 96.1 | 98.1 | 21.4 |
| KRISP_0031 | - | 86.2 | 84.1 | 24.3 |
| KRISP_016 | 99.8 | - | 99.2 | NA |
| KRISP_017 | 99.9 | - | 99.9 | NA |
| KRISP_006 | 99.5 | - | 96.3 | 21.0 |
| KRISP_007 | 99.9 | - | 99.6 | 25.5 |
| KRISP_003 | 25.3 | 92.4 | - | Undetermined |
| KRISP_010 | 93.4 | 92.3 | - | 25.6 |
| KRISP_014 | 81.9 | 70.6 | - | NA |
| KRISP_013 | 63.4 | 73.0 | - | NA |
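As a reading aid, the table can be summarised per method with a short script. This is a sketch: the coverage values are transcribed from the table, `None` stands for "-" (method not run on that sample), and the 90% cutoff is borrowed from the figure captions. Because each method was run on a different subset of samples, the per-method summaries are descriptive only and are not a like-for-like comparison.

```python
# Coverage (% of genome) per sample: (TruSeq DNA Nano, NEBNext Ultra II, Nextera Flex).
# None marks a method that was not run on that sample ("-" in the table).
coverage = {
    "KRISP_0002": (97.5, 98.3, 98.5),
    "KRISP_0004": (99.5, 98.3, 98.1),
    "KRISP_0019": (97.4, 89.9, 90.2),
    "KRISP_0021": (63.5, 99.1, 82.7),
    "KRISP_0024": (None, 95.2, 94.6),
    "KRISP_0026": (None, 99.8, 99.9),
    "KRISP_0028": (None, 96.1, 98.1),
    "KRISP_0031": (None, 86.2, 84.1),
    "KRISP_016": (99.8, None, 99.2),
    "KRISP_017": (99.9, None, 99.9),
    "KRISP_006": (99.5, None, 96.3),
    "KRISP_007": (99.9, None, 99.6),
    "KRISP_003": (25.3, 92.4, None),
    "KRISP_010": (93.4, 92.3, None),
    "KRISP_014": (81.9, 70.6, None),
    "KRISP_013": (63.4, 73.0, None),
}
methods = ("TruSeq DNA Nano", "NEBNext Ultra II", "Nextera Flex")

summary = {}
for i, method in enumerate(methods):
    values = [row[i] for row in coverage.values() if row[i] is not None]
    summary[method] = {
        "n": len(values),                             # samples run with this method
        "above_90": sum(v > 90.0 for v in values),    # genomes with >90% coverage
        "mean": sum(values) / len(values),            # mean coverage in %
    }

for method, stats in summary.items():
    print(f"{method}: n={stats['n']}, >90%: {stats['above_90']}, mean={stats['mean']:.1f}%")
```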
Figure 4. Association between Ct value and genome length by library preparation method. (A) Regression plot of mean Ct value of all unique samples against their genome lengths (% coverage against SARS-CoV-2 reference). Samples with missing Ct value information (n = 8) are shown in red. A total of 114 assembled genomes of >90% coverage were produced (80 with Ct value <27, 29 with Ct value >27, and five with missing Ct values). (B) Box plot and statistical comparison of genome coverage obtained from samples grouped in three mean Ct value thresholds (25, 27, and 30) by library preparation method, showing statistically significant (t-tests) differences between lower and higher Ct value samples. ****: level of significance.
Figure 5. Phylogenetic tree. Maximum-likelihood (ML) tree of the 54 genomes (orange circles) against publicly available SARS-CoV-2 genomes as reference. The 54 genomes fall mostly in the B.1 lineage (n = 50), with the remainder in the B (n = 3) and B.2 (n = 1) lineages.