Joshua Quick1, Nicholas J Loman1, Sophie Duraffour2,3, Jared T Simpson4,5, Ettore Severi6, Lauren Cowley7, Joseph Akoi Bore2, Raymond Koundouno2, Gytis Dudas8, Amy Mikhail7, Nobila Ouédraogo9, Babak Afrough2,10, Amadou Bah2,11, Jonathan Hj Baum2, Beate Becker-Ziaja2,3, Jan-Peter Boettcher2,12, Mar Cabeza-Cabrerizo2,3, Alvaro Camino-Sanchez2, Lisa L Carter2,13, Juiliane Doerrbecker2,3, Theresa Enkirch2,14, Isabel Graciela García Dorival2,15, Nicole Hetzelt2,12, Julia Hinzmann2,12, Tobias Holm2,3, Liana Eleni Kafetzopoulou2,16, Michel Koropogui2,17, Abigail Kosgey2,18, Eeva Kuisma2,10, Christopher H Logue2,10, Antonio Mazzarelli2,19, Sarah Meisel2,3, Marc Mertens2,20, Janine Michel2,12, Didier Ngabo2,10, Katja Nitzsche2,3, Elisa Pallash2,3, Livia Victoria Patrono2,3, Jasmine Portmann2,21, Johanna Gabriella Repits2,22, Natasha Yasmin Rickett2,15,23, Andrea Sachse2,12, Katrin Singethan2,24, Inês Vitoriano2,10, Rahel L Yemanaberhan2,3, Elsa G Zekeng2,15,23, Racine Trina25, Alexander Bello25, Amadou Alpha Sall26, Ousmane Faye26, Oumar Faye26, N'Faly Magassouba27, Cecelia V Williams28,29, Victoria Amburgey28,29, Linda Winona28,29, Emily Davis29,30, Jon Gerlach29,30, Franck Washington29,30, Vanessa Monteil31, Marine Jourdain31, Marion Bererd31, Alimou Camara31, Hermann Somlare31, Abdoulaye Camara31, Marianne Gerard31, Guillaume Bado31, Bernard Baillet31, Déborah Delaune32,33, Koumpingnin Yacouba Nebie34, Abdoulaye Diarra34, Yacouba Savane34, Raymond Bernard Pallawo34, Giovanna Jaramillo Gutierrez35, Natacha Milhano36,6, Isabelle Roger34, Christopher J Williams37,6, Facinet Yattara17, Kuiama Lewandowski10, Jamie Taylor38, Philip Rachwal38, Daniel Turner39, Georgios Pollakis15,23, Julian A Hiscox15,23, David A Matthews40, Matthew K O'Shea1,41, Andrew McD Johnston41, Duncan Wilson41, Emma Hutley42, Erasmus Smit43, Antonino Di Caro19, Roman Woelfel2,44, Kilian Stoecker3,44, Erna Fleischmann2,44, Martin Gabriel2,3, Simon A Weller38, Lamine Koivogui45, Boubacar Diallo34, Sakoba 
Keita17, Andrew Rambaut8,46,47, Pierre Formenty34, Stephan Gunther2,3, Miles W Carroll2,10,48,49.
Abstract
The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16-27 mutations in each genome per year, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
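The conversion from a per-site substitution rate to a per-genome count can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a genome length of 18,959 sites, inferred from the primer caption in this record (primer 38_R ends at position 18578, 381 bases from the end of the genome), and it assumes the per-genome figure is an annual one. The function name is hypothetical.

```python
# Genome length is an assumption inferred from the primer caption:
# primer 38_R ends at position 18578, 381 bases before the genome end.
GENOME_LENGTH = 18578 + 381  # 18959 sites


def substitutions_per_genome_per_year(rate_per_site_per_year: float,
                                      genome_length: int = GENOME_LENGTH) -> float:
    """Scale a per-site, per-year rate to a whole-genome, per-year count."""
    return rate_per_site_per_year * genome_length


# Lower and upper rate estimates quoted in the abstract.
low = substitutions_per_genome_per_year(0.87e-3)
high = substitutions_per_genome_per_year(1.42e-3)
print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Rounded to whole substitutions, the two bounds reproduce the 16-27 range given in the abstract.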
Year: 2016 PMID: 26840485 PMCID: PMC4817224 DOI: 10.1038/nature16996
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 69.504
Primer schemes employed during the study
We designed PCR primers to generate amplicons that would span the EBOV genome. We initially designed 38 primer pairs, which were used in the initial validation study and which cover >98% of the EBOV genome (Panel A). During in-field sequencing we used a 19-reaction scheme or an 11-reaction scheme, both of which generated longer products. The predicted amplicon products are shown with forward primers and reverse primers indicated by green bars on the forward and reverse strand respectively, scaled according to the EBOV genome coordinates. The expected amplicon product sizes are shown for the 19-reaction scheme (Panel B) and the 11-reaction scheme (Panel C). No amplicon covers the extreme 3′ region of the genome. The last primer, 38_R, ends at position 18578, 381 bases away from the end of the virus genome. The primer diagram was created with Biopython [33].
List of equipment and consumables to establish the genome surveillance system
We show the list of equipment (Panel A), disposable consumables (Panel B) and reagents (Panel C) to establish in-field genomic surveillance. Sufficient reagents were shipped for 20 samples. MinION sequencing requires a mix of chilled and frozen reagents. Recommended shipping conditions are specified. The picture underneath depicts MinION flowcells ready for shipping with insulating material (left) and frozen reagents (right).
Figure 1. Deployment of the portable genome surveillance system in Guinea
We were able to pack all instruments, reagents and disposable consumables within aircraft baggage (Panel A). We initially established the genomic surveillance laboratory in Donka Hospital, Conakry (Panel B). Later we moved the laboratory to a dedicated sequencing laboratory in Coyah prefecture (Panel C). Within this laboratory (Panel D) we separated the sequencing instruments (on the left) from the PCR bench (to the right). An uninterruptible power supply, visible in the middle, provides power to the thermocycler. (Photographs taken by Josh Quick and Sophie Duraffour.)
Bioinformatics workflow
This figure summarises the steps performed during bioinformatics analysis (ordered from top to bottom), in order to generate consensus sequences. The right column shows the example UNIX command executed at each step.
Results of MinION validation
The results of comparing four MinION sequences with Illumina sequences generated as part of a previous study [3] are shown in Panel A. Each row in the table gives the number of true positives, false positives and false negatives for a sample. False negatives may result in masked sequences because they lie outside the regions covered by the amplicon scheme, have low coverage or fall within a primer binding site. Results before and after quality filtering (log-likelihood ratio of >200) are shown. After quality filtering, no false positive calls were detected. All detected false negatives were masked with Ns in the final consensus sequence. No positions were called incorrectly. The four consensus sequences, plus an additional sample that had missing coverage in one amplicon, are shown as part of a phylogenetic reconstruction with genomes from Carroll et al. [3]. Sample labels in red, blue, pink, yellow and blue represent pairs of sequences generated on MinION and Illumina that fall into identical clusters.
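The counting of true positives, false positives and false negatives before and after quality filtering can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name, the data layout and the toy variant calls are all hypothetical, and only the log-likelihood ratio cut-off of 200 comes from the caption.

```python
LLR_THRESHOLD = 200  # quality cut-off quoted in the caption


def score_calls(calls, truth, llr_threshold=LLR_THRESHOLD):
    """Compare filtered variant calls against a truth set.

    calls: dict mapping position -> (alt_base, log_likelihood_ratio)
    truth: dict mapping position -> alt_base (e.g. from Illumina data)
    Returns (true_positives, false_positives, false_negatives).
    """
    # Keep only calls whose quality exceeds the threshold.
    kept = {pos: alt for pos, (alt, llr) in calls.items() if llr > llr_threshold}
    tp = sum(1 for pos, alt in kept.items() if truth.get(pos) == alt)
    fp = len(kept) - tp
    fn = sum(1 for pos, alt in truth.items() if kept.get(pos) != alt)
    return tp, fp, fn


# Hypothetical toy data: positions, bases and scores are invented.
calls = {100: ("A", 850.0), 2500: ("T", 150.0), 7000: ("G", 40.0)}
truth = {100: "A", 2500: "T"}

print(score_calls(calls, truth, llr_threshold=0))  # before filtering
print(score_calls(calls, truth))                   # after filtering
```

In this toy case the filter removes the spurious call at the cost of one missed variant, which mirrors the trade-off in the caption: missed variants are masked rather than called incorrectly.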
Relationship between coverage and log-likelihood ratio for sample 076769
Line plot showing the relationship between sequence depth of coverage (x axis) and the log-likelihood ratio for detected SNPs, derived by subsampling reads from a single sequencing run to simulate the effect of low coverage. The horizontal and vertical lines indicate the cut-offs (quality and coverage respectively) for consensus calling. All variants are already detected below 25x coverage, and the vast majority meet the quality threshold at 25x coverage or slightly above. Any combination of log-likelihood ratio and coverage that places a variant in the grey box would be represented as a masked position in the final consensus sequence.
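The masking rule described here can be expressed as a small consensus-calling routine. The sketch below is an assumption-laden illustration rather than the authors' code: it takes the 25x coverage cut-off from this caption and the log-likelihood ratio cut-off of 200 from the validation caption, and the function name, input layout and toy sequence are hypothetical. It also assumes that low-coverage positions are masked whether or not a variant was called there.

```python
MIN_COVERAGE = 25  # coverage cut-off from the caption
MIN_LLR = 200      # quality cut-off from the validation caption


def call_consensus(reference, coverage, variants,
                   min_coverage=MIN_COVERAGE, min_llr=MIN_LLR):
    """Build a consensus string, masking unreliable positions with N.

    reference: reference sequence as a string
    coverage:  list of per-position read depths (same length as reference)
    variants:  dict mapping 0-based position -> (alt_base, log_likelihood_ratio)
    """
    consensus = []
    for i, ref_base in enumerate(reference):
        if coverage[i] < min_coverage:
            consensus.append("N")  # too few reads to trust any call
        elif i in variants:
            alt, llr = variants[i]
            # A variant below the quality cut-off is masked, not reverted.
            consensus.append(alt if llr > min_llr else "N")
        else:
            consensus.append(ref_base)
    return "".join(consensus)


# Hypothetical toy input.
reference = "ACGTACGT"
coverage = [30, 30, 10, 30, 30, 30, 30, 30]
variants = {1: ("T", 500.0), 5: ("A", 90.0)}
print(call_consensus(reference, coverage, variants))  # ATNTANGT
```

Masking instead of falling back to the reference base keeps uncertain positions out of downstream phylogenetic comparisons.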
Duration of MinION sequencing runs
The sequencing duration of each run, measured as the difference between the timestamp of the first read seen and that of the last read transferred for analysis. 127 runs are shown; 15 outliers with a duration greater than 200 minutes are excluded.
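The duration measure and the outlier cut-off can be sketched in a few lines. The run names and timestamps below are hypothetical placeholders, and the helper name is invented; only the definition (last read transferred minus first read seen) and the 200-minute cut-off come from the caption.

```python
from datetime import datetime

OUTLIER_MINUTES = 200  # runs longer than this were excluded from the plot


def run_duration_minutes(first_read_seen: datetime,
                         last_read_transferred: datetime) -> float:
    """Duration of a run in minutes, from first read seen to last read transferred."""
    return (last_read_transferred - first_read_seen).total_seconds() / 60


# Hypothetical timestamps for two runs.
runs = {
    "run_a": (datetime(2015, 6, 1, 9, 0), datetime(2015, 6, 1, 9, 45)),
    "run_b": (datetime(2015, 6, 2, 14, 0), datetime(2015, 6, 2, 18, 30)),
}

durations = {name: run_duration_minutes(*stamps) for name, stamps in runs.items()}
shown = {name: d for name, d in durations.items() if d <= OUTLIER_MINUTES}
excluded = {name: d for name, d in durations.items() if d > OUTLIER_MINUTES}
print(shown, excluded)
```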
Figure 2. Real-time genomics surveillance in the context of the Guinea EVD epidemic
Here we show the number of reported cases of EVD in Guinea (red) in relation to the number of new EBOV patient samples (n=137, in blue) generated during this study (Panel A). For each of the 142 sequenced samples, we show the relationship between sample collection date (red) and the date of sequencing (blue) (Panel B). Twenty-eight samples were sequenced within three days of the sample being taken, and sixty-eight samples within a week. Larger gaps represent retrospective sequencing of cases to provide additional epidemiological context.
Histogram of Ct values for study samples
Ct values for samples in the study (where information was available) ranged between 13.8 and 35.7, with a mean of 22.
Sequence accuracy for samples
Accuracy measurements for the entire set of two-direction reads were made for the validation samples, sequenced in the United Kingdom (Panel A) and each of the 142 samples from real-time genomic surveillance (Panel B). Accuracy is defined according to the definition from Quick et al. [11]. Vertical dashed lines indicate the mean accuracy for the sample.
Figure 3. Evolution of EBOV over the course of the EVD epidemic
Time-scaled phylogeny of 603 published sequences with 125 high-quality sequences from this study (Panel A). The shape of nodes on the tree indicates country of origin. Our results show that Guinean samples (coloured circles) belong to two previously identified lineages, GN1 and SL3. GN1 is deeply branching with early epidemic samples (Panel B). SL3 is related to cases identified in Sierra Leone (Panel C). Samples are frequently clustered by geography (indicated by colour of circle), which provides information on the origins of new introductions, such as in the Boké epidemic in May 2015. Map figure adapted from SimpleMaps website (http://simplemaps.com/resources/svg-gn).
Maximum Likelihood phylogenetic inference of 125 Ebola virus samples from this study with 603 previously published sequences
Coloured nodes are from this study. Node shape reflects country of origin. Panel A depicts the entire dataset, with zoomed regions focusing on lineages GN1 (Panel B) and SL3 (Panel C) identified during real-time sequencing. Map figure adapted from SimpleMaps website (http://simplemaps.com/resources/svg-gn).
Root-to-tip divergence plot for the 728 Ebola samples generated through Maximum-Likelihood analysis (Panel A). Samples from real-time genomic surveillance are coloured as per Figure 3 and Extended Figure 2. Panel B: mean evolutionary rate estimate (in substitutions per site per year) across the EBOV phylogeny recovered using BEAST under a relaxed lognormal molecular clock. The blue area corresponds to the 95% highest posterior density (HPD) interval (mean of the distribution is 1.19 × 10⁻³; 95% HPD: 1.09 × 10⁻³ to 1.29 × 10⁻³ substitutions per site per year). Hatched regions in red are outside the 95% HPD interval.
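A root-to-tip plot relates each sample's genetic distance from the root to its sampling date, and the slope of a straight-line fit gives a rough evolutionary rate. The sketch below uses ordinary least squares as a simple stand-in; it is not the BEAST relaxed-clock analysis that produced the estimate in Panel B. The function name and the toy data, which are generated from an invented rate and root date, are hypothetical.

```python
def fit_root_to_tip(dates, divergences):
    """Least-squares line through (sampling date, root-to-tip divergence).

    Returns (slope, root_date): the slope approximates substitutions per
    site per year, and root_date is where the fitted line reaches zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, -intercept / slope


# Synthetic, noise-free toy data built from an invented rate and root date.
TOY_RATE = 1.2e-3   # substitutions per site per year (illustrative only)
TOY_ROOT = 2013.9   # decimal year (illustrative only)
dates = [2014.2, 2014.6, 2015.0, 2015.4]
divergences = [TOY_RATE * (d - TOY_ROOT) for d in dates]

rate, root = fit_root_to_tip(dates, divergences)
print(f"rate = {rate:.2e} per site per year, root date = {root:.1f}")
```

Real data scatter around the line, so a regression of this kind is usually treated as a quick check of clock-like signal before a full Bayesian analysis.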