
Real-time analysis of nanopore-based metagenomic sequencing from infected orthopaedic devices.

Nicholas D Sanderson¹, Teresa L Street², Dona Foster², Jeremy Swann², Bridget L Atkins³,⁴, Andrew J Brent²,³, Martin A McNally³, Sarah Oakley⁴, Adrian Taylor⁴, Tim E A Peto²,⁵, Derrick W Crook²,⁵, David W Eyre²,⁵.

Abstract

BACKGROUND: Prosthetic joint infections are clinically difficult to diagnose and treat. Previously, we demonstrated metagenomic sequencing on an Illumina MiSeq replicates the findings of current gold standard microbiological diagnostic techniques. Nanopore sequencing offers advantages in speed of detection over MiSeq. Here, we report a real-time analytical pathway for Nanopore sequence data, designed for detecting bacterial composition of prosthetic joint infections but potentially useful for any microbial sequencing, and compare detection by direct-from-clinical-sample metagenomic nanopore sequencing with Illumina sequencing and standard microbiological diagnostic techniques.
RESULTS: DNA was extracted from the sonication fluids of seven explanted orthopaedic devices, and additionally from two culture negative controls, and was sequenced on the Oxford Nanopore Technologies MinION platform. A specific analysis pipeline was assembled to overcome the challenges of identifying the true infecting pathogen, given high levels of host contamination and unavoidable background laboratory and kit contamination. The majority of classified DNA (> 90%) was host contamination and was discarded. Using filtering thresholds set with the negative controls, the species identified corresponded with both routine microbiological diagnosis and MiSeq results. By analysing sequences in real time, causes of infection were robustly detected within minutes of the start of sequencing.
CONCLUSIONS: We demonstrate a novel, scalable pipeline for real-time analysis of MinION sequence data and use this pipeline to show initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis of prosthetic joint infections. The high proportion of human DNA in prosthetic joint infection extracts prevents complete coverage of pathogen genomes, and therefore full genome analysis; methods to reduce this could increase genome depth and allow antimicrobial resistance profiling. The nine samples sequenced in this pilot study provide a proof of concept for sequencing and analysis that will guide further sequencing studies aimed at improving specificity and sensitivity.


Keywords: Clinical; Device-related infection; Metagenomics; Nanopore; Prosthetic joint infection; Real-time


Substances:
DNA, Bacterial

Year: 2018 PMID: 30261842 PMCID: PMC6161345 DOI: 10.1186/s12864-018-5094-y

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

Infection remains a feared and devastating complication of orthopaedic implant surgery. It occurs in up to 2% of prosthetic joint replacements [1] and may present several years after implantation [2]. Recent studies in England of joint revisions undertaken for infection report an increase in prevalence for both knee and hip revisions between 2003 and 2014 (2.5-fold and 7.5-fold increases for knees, and 2.3-fold and 3.0-fold increases for hips, following primary and revision replacements, respectively [3, 4]). It has been estimated that in the USA there will be more than 65,500 infected joint replacements per year by 2020 [5]. Improvements in the speed and accuracy of diagnosis may improve outcomes following revision surgery by allowing more targeted therapy. Diagnosis of prosthetic joint infection (PJI) can be challenging, as infections may be associated with biofilms that colonise the orthopaedic devices [6], and a small but potentially problematic number are caused by fastidious or slow-growing organisms that are not detectable by culture, or occur in patients who have received prior antibiotics. Although culture of multiple periprosthetic tissue (PPT) samples remains the gold standard for microbial detection, it is relatively insensitive, detecting only approximately 65% of causative bacteria even when multiple PPT samples are collected [7-9]. Molecular methods, such as 16S rRNA sequencing, can be more sensitive in detecting PJI [10]. An alternative is metagenomic shotgun sequencing, which can detect whole bacterial genomes directly from a sample. Sequencing directly from samples can provide accurate diagnostic information for PJIs when compared with laboratory culture, can detect additional organisms [11, 12], and can potentially provide additional information such as the presence of antimicrobial resistance genes [12].
Third-generation sequencing technologies, developed by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), offer longer reads and faster turnaround. The ONT MinION could potentially allow analysis to be conducted in real time, with obvious advantages for the clinical diagnosis of infection. Examples of metagenomic pathogen studies using the MinION include detection of viruses from serum [13] and bacteria from urine [12]. These studies have shown proof-of-principle for direct-from-sample clinical sequencing using the ONT MinION. However, PJI sequencing presents the further challenge of high human DNA contamination, which requires specific laboratory preparation and bioinformatic analyses to overcome. A previous study that used ONT MinION sequencing to identify pathogens in pleural effusion samples heavily contaminated with human DNA relied on 16S rDNA sequencing [14]. This showed that rapid identification was possible in samples with high levels of host DNA, but could not provide further genomic information. Here we describe proof-of-principle for the use of ONT MinION sequencing for the diagnosis of PJI, compared with standard microbiological culture and Illumina sequencing. We describe an analysis workflow that differentiates between predicted infecting species and background contamination and can be run during sequencing for real-time species detection.

Methods

Samples

Samples used in this study were collected by the Bone Infection Unit at the Nuffield Orthopaedic Centre (NOC) in Oxford University Hospitals (OUH), UK, as previously described [11]. Nine samples previously assessed by Illumina MiSeq sequencing were chosen for further analysis by ONT MinION sequencing. Samples were chosen from the remaining DNA extracts that had sufficient DNA to either be sequenced directly, or amplified and sequenced, and to represent a range of disparate species and compositions.

DNA preparation and sequencing

Libraries were prepared for sequencing on an Oxford Nanopore MinION (Oxford Nanopore Technologies (ONT)) using genomic DNA previously extracted from sonication fluid samples [11]. Samples 259, 312, 335, 352 and 354 were prepared using the 1D genomic DNA by ligation protocol (SQK-LSK108) (ONT). Samples 229, 249, 506 and 509 had insufficient DNA for this protocol and were instead prepared using either a PCR-based protocol for low input genomic DNA with modified primers (DP006_revB_14Aug2015), followed by rapid sequencing adapter ligation (ONT) (sample 229), or the 1D low input genomic DNA with PCR protocol (SQK-LSK108) (ONT) (samples 249, 506 and 509). Briefly, the protocols comprise DNA end-repair and dA-tailing (NEBNext Ultra II End Repair/dA-Tailing Module, New England Biolabs (NEB), Ipswich, MA, USA), followed by purification using AMPure XP solid phase reversible immobilisation (SPRI) beads (Beckman Coulter, High Wycombe, UK), then sequencing adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by a further SPRI bead purification. For the samples with insufficient DNA that required PCR amplification, the additional steps between end-repair and sequencing adapter ligation were: PCR adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by SPRI bead purification; and PCR amplification (Phusion High Fidelity PCR Master Mix, NEB) with 18 cycles (samples 229 and 249) or 24 cycles (samples 506 and 509), followed by a further SPRI bead purification. Samples were sequenced on FLO-MIN105 (v.R9) (sample 229) or FLO-MIN106 (v.R9.4) (all other samples) SpotON flowcells.

PCR analysis of sample 354a

Quantitative real-time PCR (qPCR) was performed for sample 354a to determine the relative amounts of Arcanobacterium haemolyticum and Fusobacterium nucleatum DNA in the original sonication fluid genomic DNA extract. qPCR was performed on a Stratagene MX3005P QPCR System (Agilent Technologies, Santa Clara, CA, USA) using Luna Universal Probe qPCR Master Mix (New England Biolabs, Ipswich, MA, USA). For A. haemolyticum, primers and probe were designed to target the phospholipase D gene: forward primer ATGTACGACGATGAAGACGCG (previously published [15]), reverse primer TTGATTGCGTCATCGACACT, probe [6FAM]-TTGGTAGTGCGGCTGCTGCGCC-[TAM]. For F. nucleatum, primers and probe were designed to target the nusG gene: forward primer CAGCAACTTGTCCTTCTTGATCC, reverse primer CTGGATTTGTAGGAGTTGGTTC, probe [6FAM]-AGACCCTATTCCTATGGAAGAGGAAGAAGTA-[TAM]. Reactions were performed in 20 μl volumes containing 2 μl of template DNA, 0.4 μM of each primer and 0.2 μM of probe. Cycling conditions were an initial denaturation at 95 °C for 1 min, followed by 40 cycles of denaturation at 95 °C for 15 s and extension at 60 °C for 30 s. Genomic DNA extracted from cultures of A. haemolyticum (Type Strain NCTC 8452) and F. nucleatum subspecies vincentii (Type Strain ATCC 49256) was diluted to 100,000 genome copies per μl, then serially diluted to 10 genome copies per μl, and used to create copy number standard curves for both species. Negative controls, with water replacing template DNA, were also performed. All reactions were performed in triplicate.
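
The copy numbers reported for these standard curves come from the usual qPCR calculation: fit Ct against log10(copies), derive amplification efficiency from the slope, and invert the line for unknowns. The following is a minimal sketch of that calculation, assuming an ordinary least-squares fit and the common formula E = 10^(-1/slope) - 1; the dilution Ct values are synthetic (generated from an assumed ideal line), not the study's data, and the function names are hypothetical rather than the MX3005P software's procedure.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept; returns (slope, intercept, R^2)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination of the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ct_values))
    ss_tot = sum((y - mean_y) ** 2 for y in ct_values)
    return slope, intercept, 1.0 - ss_res / ss_tot


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 means ~100% (doubling every cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate genome copies for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series (100,000 down to 10 genome copies per ul) with
# synthetic Ct values from an assumed line Ct = 38.0 - 3.32 * log10(copies).
dilutions = [100_000, 10_000, 1_000, 100, 10]
ct_standards = [38.0 - 3.32 * math.log10(c) for c in dilutions]

slope, intercept, r_squared = fit_standard_curve(dilutions, ct_standards)
efficiency = amplification_efficiency(slope)
unknown_copies = copies_from_ct(28.04, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}, E={efficiency:.1%}")
print(f"estimated copies at Ct 28.04: {unknown_copies:.0f}")
```

Under this formula, the efficiencies of 89.2% and 86.0% reported in Table 4 correspond to slopes of roughly -3.6 and -3.7, i.e. somewhat less than perfect doubling per cycle.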

Bioinformatics analysis

We assembled an analysis pipeline for detection of bacterial pathogens using ONT MinION sequencing of orthopaedic device infections. The pipeline includes filtering steps for the sequence data that were tuned on seven positive samples with known infections and two culture-negative samples. The analysis was performed within a Nextflow workflow [16], with the software contained within a Singularity [17] image generated from a Docker repository [18]. This workflow and software are available for public use [19], with the intention that the analysis be reproducible or replicable with other datasets on most systems. The workflow, CRuMPIT, has three major components, as shown in Fig. 1. The first monitors the output of one or more MinION devices and creates batches of fast5 files (default 1000) as they are written to a storage location (Fig. 1a, b). The second receives the fast5 files and runs a Nextflow workflow that basecalls the data, classifies the reads and aligns them to specific reference sequences, with results pushed to a database (Fig. 1c). The third determines the analysis results, including the species identified, and continually updates them as the run progresses (Fig. 1d).
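
The first component is described only at the level of "watch a directory and emit batches of 1000 fast5 files". A minimal sketch of that idea follows; the function names, the polling approach and the `run_finished` callback are hypothetical illustrations, not CRuMPIT's actual code, and the sketch assumes a fast5 file is complete once it appears in the directory.

```python
import time
from pathlib import Path


def batch_new_files(directory, seen, batch_size=1000, suffix=".fast5", flush=False):
    """Group not-yet-seen files into batches of `batch_size`, oldest first.

    A partial batch is held back until more files arrive, unless `flush` is True
    (e.g. once the sequencing run has finished). Assumes files are complete on arrival.
    """
    pending = sorted(
        (p for p in Path(directory).rglob(f"*{suffix}") if p not in seen),
        key=lambda p: p.stat().st_mtime,
    )
    batches = []
    while len(pending) >= batch_size or (flush and pending):
        batch, pending = pending[:batch_size], pending[batch_size:]
        seen.update(batch)  # mark as submitted so later polls skip these files
        batches.append(batch)
    return batches


def watch(directory, submit, batch_size=1000, poll_seconds=30, run_finished=lambda: False):
    """Poll `directory` and hand each batch to `submit` (hypothetically, a job launcher)."""
    seen = set()
    while True:
        finished = run_finished()
        for batch in batch_new_files(directory, seen, batch_size, flush=finished):
            submit(batch)
        if finished:
            break
        time.sleep(poll_seconds)
```

Holding back partial batches keeps job sizes uniform while the run is active, and flushing at the end ensures the final reads are not stranded.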

Fig. 1

Diagram of analysis process. a MinION sequencing using MinKNOW (runs outside of CRuMPIT). b Fast5 files are detected and submitted as batches to the Nextflow workflow. c Nextflow workflow, which is contained within a Singularity image and can be distributed across a cluster (SLURM used here) or run on a local machine. d Run analysis using data pushed to a MongoDB database; this can be conducted separately on any machine with network access to the database. Each component (green or blue rounded rectangle) of CRuMPIT can be run independently from the same or different networked computers (e), or the entire process can be run from a single program. Square-cornered rectangles represent programs, some of which are within Python wrappers. Arrows represent the direction of data transfer within the workflow or between components.

During the course of this project, ONT released several different software applications for basecalling, with each version improving accuracy [20]; we used the most up-to-date and reliable version available at the time of sequencing. Basecalling from the fast5 files used different versions of Metrichor (dragonet), MinKNOW-Live or ONT Albacore (Table 1). Fastq files were generated from the Metrichor- or MinKNOW-basecalled fast5 files using fast5watcher.py (commit b88e14a) [21] for downstream analysis. Albacore is now used as the basecaller within the CRuMPIT workflow, with sequences basecalled directly to fastq files for analysis. Guppy (version 0.3.0; ONT developer access required) was used experimentally as a basecaller to compare speeds, and an additional Porechop [22] (v0.2.3) step for demultiplexing barcodes was added for use with Guppy.

Table 1

Nanopore basecallers and versions used for each sample

Sample	Basecaller	Software version
229a	Metrichor (dragonet)	1.22.4
249a	MinKNOW-Live-Basecalling	1.4.3
259a	MinKNOW-Live-Basecalling	1.3.30
312a	Metrichor (dragonet)	1.23.0
335a	Metrichor (dragonet)	1.23.0
352a	MinKNOW-Live-Basecalling	1.1.21
354a	MinKNOW-Live-Basecalling	1.1.20
509a	ONT Albacore Sequencing Software	1.1.0
506a	ONT Albacore Sequencing Software	1.1.0

To minimise spurious read classifications caused by repeat regions, reads in the fastq files were separated by sequence complexity, and only high-complexity reads were analysed further. Complexity was calculated with prinseq-lite-0.20.4 [23] using a dust score threshold of seven, which removes reads consisting only of homopolymer, dipolymer and tripolymer repeats. Centrifuge [24] was used to classify sequencing reads to a taxonomic identifier. We used Centrifuge instead of Kraken [25] for this analysis because its initial match uses k-mers of length 16, which is better suited to the Nanopore error profile than Kraken, whose databases are built with a default k-mer size of 31. Additionally, Centrifuge indexes require significantly less storage and memory than Kraken databases. A Centrifuge index [24] was constructed using bacterial and viral genomes downloaded from NCBI RefSeq as of 03-March-2017 and the human reference genome (GRCh38). Low-complexity regions with a dust score greater than 20 in the reference sequences were masked using dustmasker (v 1.0.0, NCBI). The precompiled “P_compressed_b + v + h” index available to download from the Centrifuge authors was also tested and yielded very similar results to our database. We used our database for this analysis because it is a more recent and complete dataset; however, for ease of reproducibility, the precompiled databases can also be used. Reads assigned a taxonomic ID (or a descendant of one) belonging to a list of bacterial reference genome sequences downloaded from NCBI RefSeq were mapped using minimap2 [26] (v2.2-r409). To be considered for detection, a bacterial species first had to be classified by Centrifuge with a score of 150 or greater and account for over 10% of the classified bacterial bases. The score of 150 was chosen as a suitable cutoff after several thresholds were tested (Additional file 1: Figure S1).
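
The complexity filter can be illustrated with a simplified DUST-style score. This sketch is a hypothetical re-implementation, not prinseq-lite's code: it counts overlapping triplets in half-overlapping 64 bp windows, scores each window as the sum of c(c-1)/2 over triplet counts c divided by (number of triplets - 1), rescales so that a homopolymer scores 100, and averages across windows. The 0-100 scaling and the windowing are assumptions meant to mirror how prinseq-lite reports dust scores, so exact values may differ from prinseq-lite's output.

```python
from collections import Counter


def _window_dust(window):
    """DUST-style score of one window, rescaled so a homopolymer scores 100."""
    n = len(window) - 2  # number of overlapping triplets
    if n < 2:
        return 0.0
    counts = Counter(window[i:i + 3] for i in range(n))
    raw = sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)
    return 100.0 * raw / (n / 2)  # n / 2 is the raw score of a homopolymer window


def dust_score(seq, window=64):
    """Mean window score over half-overlapping windows (bases after the last step are not scored)."""
    seq = seq.upper()
    step = window // 2
    starts = range(0, max(len(seq) - window, 0) + 1, step)
    scores = [_window_dust(seq[s:s + window]) for s in starts]
    return sum(scores) / len(scores)


def high_complexity(reads, threshold=7.0):
    """Keep reads at or below the (assumed) dust threshold of seven."""
    return [r for r in reads if dust_score(r) <= threshold]


diverse = "ATGCGTACCTGAAGTCCATGGTTACAGCTTGACCGATAGGCTTCAAGTGCATCCGTAACGGTTA"
print(dust_score("A" * 64), dust_score("ACGT" * 16), dust_score(diverse))
```

A homopolymer scores 100 and a simple tetranucleotide repeat scores well above seven, whereas a diverse sequence scores close to zero, so only the latter passes `high_complexity`.
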
To remove spurious hits and background laboratory contamination, species were reported only if they accounted for over 10% of the bacterial bases classified by Centrifuge, which also removed the majority of negative control hits (Additional file 2: Figure S2). A read-number threshold could have been chosen instead; however, the margin in proportional read numbers between positive samples and negative controls was deemed too narrow. Therefore, a further mapping step was added to validate the Centrifuge classification. To be confirmed as positive, the mapped reads required a mapping quality score (mapq) of 50 or above and had to account for greater than 1% of the classified bacterial bases. A mapq of 50 was used to ensure high-quality alignments and helped to remove any remaining indiscriminate alignments (Additional file 3: Figure S3). The 1% bases threshold was chosen after plotting bases over reads for positive samples and negative controls (Additional file 4: Figure S4). Once a species meets these criteria, its mapped reads are included in further analysis regardless of their Centrifuge score, so more reads can be retained when mapping provides a satisfactory alignment that Centrifuge classification alone did not. This filtering method was tuned to remove all hits from the negative controls while retaining as many reads from validated positive species as possible. It is therefore a heuristic method and can be tuned with greater power as more samples are processed. Running the workflow in Nextflow [16] with the software contained inside a Singularity [17] image enabled the entire pipeline to run on a distributed cluster (SLURM [27]), with the flexibility to run on other platforms, including locally on a single computer. A SLURM cluster was set up to handle the high computational demands of basecalling with Albacore; the remaining pipeline steps required less compute time.
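
The two-stage threshold logic described above can be sketched as follows. The record layout (per-read species, score and base count), the function names and all example values are hypothetical; only the cut-offs (Centrifuge score of 150 or greater with over 10% of classified bacterial bases, then mapq 50 or greater alignments covering over 1% of those bases) come from the text. The sketch does not model the subsequent step in which a confirmed species' mapped reads are kept regardless of Centrifuge score.

```python
from collections import defaultdict


def bases_per_species(records, min_value):
    """Sum bases per species over (species, score_or_mapq, bases) records with value >= min_value."""
    totals = defaultdict(int)
    for species, value, bases in records:
        if value >= min_value:
            totals[species] += bases
    return totals


def detect_species(classified, alignments, min_score=150, min_mapq=50,
                   min_classified_fraction=0.10, min_mapped_fraction=0.01):
    """Two-stage filter: Centrifuge share of bacterial bases, then minimap2 confirmation."""
    classified_bases = bases_per_species(classified, min_score)
    total_bacterial = sum(classified_bases.values())
    if total_bacterial == 0:
        return []
    mapped_bases = bases_per_species(alignments, min_mapq)
    detected = []
    for species, bases in classified_bases.items():
        # Stage 1: over 10% of Centrifuge-classified bacterial bases
        if bases / total_bacterial <= min_classified_fraction:
            continue
        # Stage 2: high-mapq alignments must exceed 1% of the same denominator
        if mapped_bases.get(species, 0) / total_bacterial <= min_mapped_fraction:
            continue
        detected.append(species)
    return sorted(detected)


# Hypothetical per-read Centrifuge results: (species, Centrifuge score, bases)
classified = [
    ("Staphylococcus aureus", 900, 5000),
    ("Staphylococcus aureus", 400, 3000),
    ("Cutibacterium acnes", 200, 1500),
    ("Ralstonia pickettii", 160, 400),
    ("Escherichia coli", 100, 9999),  # below the score cutoff, ignored
]
# Hypothetical minimap2 results: (reference species, mapq, aligned bases)
alignments = [
    ("Staphylococcus aureus", 60, 6000),
    ("Cutibacterium acnes", 20, 1400),
    ("Cutibacterium acnes", 55, 50),
]
print(detect_species(classified, alignments))
```

In this toy example C. acnes passes the Centrifuge stage (about 15% of bases) but is rejected because almost none of its alignments reach mapq 50, which mirrors how the mapping step removes hits that classification alone would keep.
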
The cluster consisted of a head node and four worker nodes with a total of 21 worker cores. Centrifuge was run on only two of the nodes, each with at least 16 GB of memory. The workflow can run in real time, detecting new fast5 files from a MinION sequencing run, processing them and pushing the data to a MongoDB database for analysis.

Results

Sample composition after analysis

Nine samples previously sequenced with an Illumina MiSeq were sequenced using the Oxford Nanopore MinION platform. Seven samples were extracted from bacterial culture positive sonication fluid. The remaining two samples, extracted from culture negative sonication fluid, were used as negative controls. Between 0.2 and 2.8 gigabases were basecalled for each sequencing run, with read lengths averaging between 500 bp and 1.7 kb (Table 2).

Table 2

Oxford Nanopore Technologies MinION sequencing yields, read length statistics and breakdown of Centrifuge classification

Sample	Total bases	Total reads	Mean read length (bp)	Median read length (bp)	Low complexity bases	Low complexity reads	Human bases	Human reads	Bacterial bases	Bacterial reads
229a	204,346,556	124,218	1645.06	1745	113,836	69	198,972,861	117,821	1,692,097	914
249a	723,925,562	585,098	1237.27	1006	403,668	370	563,888,189	411,612	44,502,912	34,773
259a	1,057,865,247	600,291	1762.25	1321	390,209	289	949,663,786	502,426	512,827	312
312a	1,121,119,742	1,004,818	1115.74	674	1,905,423	1044	1,038,235,876	882,763	30,426,198	14,948
335a	2,847,687,425	1,717,810	1657.74	1171	2,835,054	1362	2,783,128,118	1,605,466	1,388,748	989
352a	803,638,340	986,867	814.33	609	567,656	630	669,796,136	752,022	459,779	579
354a	706,380,170	945,929	746.76	596	680,560	848	570,485,740	717,662	2,151,551	2443
509a	2,740,060,527	4,940,241	554.64	439	16,355,839	24,413	1,199,779,866	1,352,438	6,240,425	2628
506a	2,451,399,949	4,700,013	521.57	431	20,014,343	23,631	1,161,796,584	1,671,726	4,705,919	2139

Human and bacterial counts are for reads with a Centrifuge score greater than 150; totals include unclassified reads. Samples 509a and 506a are culture negative and were used as negative controls. Results are shown after removal of low-complexity reads.

The majority of classified reads were human (Table 2), with 80% to 97% of bases in the sequenced culture-positive samples coming from host contamination. Between 0.04% and over 6% of bases in the culture-positive samples were classified as bacterial by Centrifuge (Table 2). Our analysis workflow identified one or more bacterial species in every sample except the two culture-negative samples, 509a and 506a (Table 3). One sample, 354a, was polymicrobial, with Enterococcus faecalis, Arcanobacterium haemolyticum and Fusobacterium nucleatum identified. Two species of the same genus, Bacillus cereus and Bacillus thuringiensis, were identified in sample 352a. All other samples had a single bacterial species identified.
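
Because these percentages depend on the denominator, a quick recomputation helps. The sketch below takes the bacterial share of all basecalled bases for each culture-positive run, using the total and bacterial base columns of Table 2; it is an illustrative check, not part of CRuMPIT.

```python
# (total basecalled bases, Centrifuge-classified bacterial bases) for culture-positive runs, from Table 2
table2 = {
    "229a": (204_346_556, 1_692_097),
    "249a": (723_925_562, 44_502_912),
    "259a": (1_057_865_247, 512_827),
    "312a": (1_121_119_742, 30_426_198),
    "335a": (2_847_687_425, 1_388_748),
    "352a": (803_638_340, 459_779),
    "354a": (706_380_170, 2_151_551),
}

bacterial_pct = {sample: 100 * bac / total for sample, (total, bac) in table2.items()}
for sample, pct in sorted(bacterial_pct.items(), key=lambda item: item[1]):
    print(f"{sample}: {pct:.3f}% of bases classified as bacterial")
```

The lowest shares (259a and 335a) fall just under 0.05%, and the highest (249a) is about 6.1%.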

Table 3

Species detected after read classification and reference genome alignment in CRuMPIT

Sample	ONT minion species	TaxID	Mapped reads (% of identified bacterial)	Mapped bases (% of identified bacterial)	Sonication species	Tissue culture species	MiSeq reads (% of bacterial)
229a	Staphylococcus aureus	1280	815 (89)	1,912,820 (113)	S. aureus	S. aureus	6038 (98)
249a	Cutibacterium acnes	1747	23,500 (68)	29,443,269 (66)	P. acnes	P. acnes	108,940 (100)
259a	Staphylococcus epidermidis	1282	155 (50)	223,611 (44)	S. epidermidis	S. epidermidis	749 (86)
312a	Citrobacter koseri	545	11,629 (78)	24,631,203 (81)	C. koseri	C. koseri	221,516 (95)
335a	Morganella morganii	582	613 (62)	515,991 (37)	M. morganii	M. morganii	3555 (94)
352a	Bacillus thuringiensis	1428	41 (7)	27,026 (6)	Bacillus species	Bacillus species	1109 (86*)
352a	Bacillus cereus	1396	119 (21)	85,627 (19)	Bacillus species	Bacillus species	1109 (86*)
354a	Arcanobacterium haemolyticum	28264	584 (24)	547,413 (25)	A. haemolyticum		11,182 (72)
354a	Fusobacterium nucleatum	851	529 (22)	493,717 (23)			1156 (7)
354a	Enterococcus faecalis	1351	225 (9)	223,665 (10)	E. faecalis	E. faecalis	1173 (8)
506a	None detected				No growth	No growth	None detected
509a	None detected				No growth	No growth	None detected

Samples 509a and 506a are culture negative and were used as negative controls; no bacterial species were detected in them after the filtering thresholds were applied. Sonication fluid, tissue culture and MiSeq (Kraken) results are adapted from [11]. (*) indicates the percentage of bacterial reads assigned at the Bacillus cereus group level (taxonomic ID 86661).

The results from ONT MinION sequencing correspond with previously published analysis of the same samples by conventional microbiological culture and metagenomic Illumina MiSeq sequencing (Table 3) [11]. A notable difference between the two molecular analyses is seen in sample 352a, where ONT MinION sequencing enabled species-level detection. Illumina short-read sequencing identified Bacillus spp. only (agreeing with the corresponding culture results), whereas ONT MinION sequencing identified two species from the Bacillus cereus group: Bacillus cereus and Bacillus thuringiensis. Speciation within the Bacillus cereus group is problematic, as species within this group share a high level of genome sequence identity [28], so further investigation would be required to determine whether both species are actually present in this sample. Another difference between the two sequencing techniques is seen in sample 354a and concerns the relative abundance of sequencing reads/bases for the multiple species classified in this polymicrobial sample. Illumina MiSeq sequencing identified A. haemolyticum as the most abundant species, at 72% of bacterial reads, with F. nucleatum representing 7% of bacterial reads. However, ONT MinION sequencing classified very similar numbers of bases for F. nucleatum and A. haemolyticum (493,717 and 547,413 bases, respectively).

We hypothesised that this difference in the proportions of reads for F. nucleatum and A. haemolyticum was caused by platform sequencing bias, possibly as a result of differing genome GC content: the A. haemolyticum genome is 54% GC, compared with 27% for F. nucleatum. We used qPCR to test this hypothesis and to determine which platform's estimate of the relative genome abundance of these two species is closest to the original DNA extract from sample 354a. qPCR detected approximately equal copy numbers of A. haemolyticum and F. nucleatum genomes in the original DNA extract, suggesting that ONT MinION sequencing gave a more accurate representation of species abundance in sample 354a (Table 4). However, the standard deviations were high, so further investigation will be needed to confirm this.
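
The platform comparison for sample 354a can be made explicit by converting each estimate to an A. haemolyticum : F. nucleatum ratio and measuring its log2 distance from the qPCR ratio. This is a rough, illustrative calculation using counts from Tables 3 and 4: it treats reads, bases and genome copies as interchangeable abundance proxies and ignores differences in genome size and read length, which a careful comparison would normalise for.

```python
import math

# A. haemolyticum : F. nucleatum counts for sample 354a, copied from Tables 3 and 4
estimates = {
    "MiSeq bacterial reads": (11_182, 1_156),
    "MinION mapped bases": (547_413, 493_717),
    "qPCR mean genome copies": (3_214, 3_421),
}
ratios = {name: a / f for name, (a, f) in estimates.items()}
reference = ratios["qPCR mean genome copies"]
for name, ratio in ratios.items():
    # log2 distance from the qPCR ratio: 0 is perfect agreement, 1 is a twofold difference
    print(f"{name}: ratio {ratio:.2f}, log2 difference from qPCR {math.log2(ratio / reference):+.2f}")
```

On this crude measure the MinION ratio lies within about 0.25 log2 units of the qPCR ratio, whereas the MiSeq ratio differs by more than 3 log2 units (roughly tenfold).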

Table 4

qPCR results

Species	Standard curve R²	Efficiency	Replicate	Ct	Copies	Mean ± SD
Arcanobacterium haemolyticum	0.991	89.20%	1	29.12	2356	3214 ± 965
			2	28.72	3028
			3	28.19	4258
Fusobacterium nucleatum	0.999	86.00%	1	28.93	4269	3421 ± 1304
			2	30.22	1919
			3	29.01	4075
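
The Mean ± SD column can be reproduced from the replicate copy numbers. The sketch below, using only the values in the table above, confirms that the reported spread matches the sample standard deviation (n - 1 denominator) and expresses it as a coefficient of variation.

```python
from statistics import mean, stdev

# Copy-number replicates for sample 354a, from Table 4
replicates = {
    "Arcanobacterium haemolyticum": [2356, 3028, 4258],
    "Fusobacterium nucleatum": [4269, 1919, 4075],
}
summary = {}
for species, copies in replicates.items():
    m, sd = mean(copies), stdev(copies)  # stdev uses the n - 1 (sample) denominator
    summary[species] = (m, sd, sd / m)
    print(f"{species}: {m:.0f} ± {sd:.0f} copies (CV {sd / m:.0%})")
```

Both coefficients of variation are around 30-40%, consistent with the caution that the standard deviations are high.
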


Real time analysis

Using the ONT MinION platform, it was possible to analyse sequences in real time and predict the species composition of culture-positive samples within minutes of data acquisition. Samples with a larger yield of bacterial DNA, such as 354a and 249a, produced several hundred kilobases of sequence within the first two hours (Fig. 2a, b). Samples with lower yields, such as 352a, produced less sequence data, with several kilobases generated in the first two hours (Fig. 2c). Nevertheless, for all species that passed the analysis thresholds, the sequences generated after data acquisition were consistent with the species identified by traditional culture methods and MiSeq sequencing (Fig. 3). Each batch analysed within the Nextflow workflow took between four and fifteen minutes to process using a single core, depending on the node to which the job was submitted (Additional file 5: Figure S5a). Real-time analysis in this context must therefore include this bioinformatics processing time, the majority of which is basecalling. Encouragingly, basecalling speed was improved dramatically by using Guppy on the graphics card of a single local PC (Additional file 5: Figure S5b). This enabled CRuMPIT analysis to be conducted entirely on a single computer and more than halved the time to detection.
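
To make "time to detection" concrete, the sketch below estimates when each species would first be flagged from a time-ordered stream of classified reads. The abundance rule (more than 10% of the bacterial bases seen so far), the hypothetical `min_bases` floor of 10,000 bases and the single fixed processing delay are simplifying assumptions, not CRuMPIT's reporting logic; in practice the delay depends on batch size, hardware and basecaller.

```python
from collections import defaultdict


def time_to_detection(reads, min_fraction=0.10, min_bases=10_000, processing_minutes=0.0):
    """Earliest minute at which each species first passes a cumulative-abundance rule.

    reads: iterable of (minutes since run start, species, bases) for classified bacterial reads.
    A species is flagged the first time it holds more than `min_fraction` of all bacterial
    bases seen so far and at least `min_bases` bases (a hypothetical floor); a fixed
    `processing_minutes` delay is added for basecalling and analysis. Species stay flagged
    even if their share later falls.
    """
    totals = defaultdict(int)
    all_bases = 0
    detected = {}
    for minute, species, bases in sorted(reads):
        totals[species] += bases
        all_bases += bases
        for sp, sp_bases in totals.items():
            if sp not in detected and sp_bases >= min_bases and sp_bases / all_bases > min_fraction:
                detected[sp] = minute + processing_minutes
    return detected


# Hypothetical read stream; a 10-minute delay sits within the 4-15 minute batch times reported above
reads = [(1, "species_A", 6000), (2, "species_B", 500), (3, "species_A", 6000),
         (5, "species_B", 9500), (6, "species_C", 100)]
print(time_to_detection(reads, processing_minutes=10))
```
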

Fig. 2

Cumulative bases classified by Centrifuge and minimap2 reference alignment over the first few hours of sequencing on the MinION. Each marker on the plots represents a newly classified sequence. Times are on the day of sequencing, are taken from the read timestamps and do not include bioinformatic processing time. Three samples are shown, representing the best and worst performers. a Sample 354a containing three different species. b Sample 249a containing Cutibacterium acnes. c Sample 352a containing two different Bacillus species.

Fig. 3

Percentage of mapped bases (minimap2) relative to total Centrifuge-classified bacterial bases over the first two hours of sequencing. As in Fig. 2, each marker on the plots represents a newly classified sequence. Times are on the day of sequencing. Three samples are shown, representing the best and worst performers. a Sample 354a containing three different species. b Sample 249a containing Cutibacterium acnes. c Sample 352a containing two different Bacillus species.

Discussion

Here we demonstrate proof-of-principle that long-read sequencing using the ONT MinION can detect bacterial infections from DNA extracted directly from sonication fluid samples, potentially within minutes of starting sequencing. If DNA extraction techniques can be similarly optimised, these technologies have the potential to make intra-operative diagnosis of the causes of specific infections possible. This would allow both local and systemic antibiotics to be targeted to the causative organisms in prosthetic joint infections, starting at the time of surgery. Analysis of the MinION data indicates concordance with laboratory culture, the current gold standard, and with Illumina short-read sequencing. In addition, we present a new analytical tool, CRuMPIT, which automates analysis of MinION data in this setting and could be applied by other researchers and clinicians. By using negative controls, we were able to determine signatures of background contamination, a known challenge in interpreting diagnostic metagenomic data [11, 29]. The thresholds and scores used within our bioinformatics workflow were determined after sequencing two negative controls, which allowed us to create heuristic thresholds that remove background sequences from kit contamination and false positives without masking the infecting species. It will be important to determine the limits of detection for bacterial DNA in samples with high host contamination. Future studies will involve sequencing more samples and spiked-in references so that refined threshold scores can be determined. As before, this can be done using the Youden index (J statistic) [11]. The sensitivity and specificity of MinION sequencing cannot be determined from this study, so further, more extensive studies are required before its use in a routine diagnostic microbiology laboratory can be recommended.
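
Youden's J statistic, J = sensitivity + specificity - 1, selects the cut-off that best separates positives from negatives. A minimal sketch follows, assuming each sample is summarised by a single hypothetical score (for example, a candidate species' share of bacterial bases) and is called positive when its score is at or above the threshold; the scores below are invented for illustration and are not study data.

```python
def youden_threshold(positive_scores, negative_scores):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold; ties keep the lowest threshold.
    """
    best_threshold, best_j = None, -1.0
    for t in sorted(set(positive_scores) | set(negative_scores)):
        sensitivity = sum(s >= t for s in positive_scores) / len(positive_scores)
        specificity = sum(s < t for s in negative_scores) / len(negative_scores)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical scores for spiked-in positives and negative controls
spiked_positive = [0.4, 0.6, 0.8, 0.9]
negative_controls = [0.05, 0.1, 0.3, 0.5]
print(youden_threshold(spiked_positive, negative_controls))
```

Candidate thresholds are taken from the observed scores, so the search is exhaustive for this decision rule; with more samples, the same function would simply be run on the larger score lists.
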
Although we were able to predict each species present within the sequenced samples, the vast majority of DNA sequenced was human, from host contamination, despite efforts to reduce this during laboratory preparation. Depletion of host DNA will facilitate greater pathogen genome coverage, but this continues to present challenges because the number of bacterial cells in joint infections is low [7] relative to human cells. Previous studies applying ONT MinION sequencing directly to clinical samples have used samples with relatively high concentrations of bacteria, such as urine [30] (compared with PJI samples), or moderate to high viral titres in blood [13]. The MinION has also been used for metagenomics of environmental samples [31]. Reducing human DNA could allow better genotyping, transmission analysis and antimicrobial resistance gene prediction as the proportion of bacterial DNA increases. Currently, this depends on laboratory methods that reduce the number of human cells in samples rather than on downstream bioinformatic analysis. The sequencing yields here were low compared with other ONT MinION runs sequenced within the same lab (data not shown). Read lengths in this project were also relatively short, averaging under 1 kilobase across all runs, whereas mean read lengths over 10 kilobases can be expected with this method. This is likely due to the DNA extraction methods used, which were optimised for MiSeq sequencing. However, the four samples processed by PCR because of low DNA concentration varied widely in read length and depth, spanning much of the observed range. There are known GC-content-associated biases against particular organisms when using PCR-based methods for sample preparation [32] and in Illumina metagenomic data [33]. We found some evidence that MinION sequencing may better reflect the relative abundance of pathogen DNA in polymicrobial infections, as it appeared less prone to GC bias than Illumina MiSeq short-read sequencing.
Species detection was possible within minutes of the sequencing run starting, including the time required to process the sequencing data, with basecalling the biggest bottleneck. The fast5 file batch size affects turnaround time, and smaller batches are preferable for longer reads, which take more time to basecall. We have tested the pipeline on a single PC and on a SLURM cluster on the same network as the computers running the MinION sequencers; the cluster allowed us to keep pace with the rate of sequencing, basecall with greater throughput than a single machine, and analyse multiple sequencing runs in parallel. A limitation of this study was seen in runs where reads were live-basecalled with the MinKNOW basecaller: the runs produced data too quickly for the system to keep up. Retrospective basecalling was not possible at the time, and the skipped reads have since been discarded. In future studies using Albacore, as with the two most recent sequencing runs (506a and 509a), we therefore expect average DNA yields to increase, which will aid species classification and potential genome completion. The ONT MinION sequencing process has undergone continual development, with substantial improvements since this project began. As a result, we used three different basecallers, Metrichor, MinKNOW and Albacore, to convert raw signal or event data to DNA sequences. Some of these data could be re-basecalled, but because we no longer have access to some of the raw data files, we cannot re-basecall all samples. Moreover, because re-basecalling would not reflect the real-time analysis carried out, we have not re-basecalled all samples with the same software version. Future studies should continue to use the most accurate, current and efficient basecaller for real-time analysis.
Furthermore, as ONT routinely updates protocols and computational tools, the impact of these updates on clinical diagnostics would need to be continually evaluated and tested to achieve and maintain accreditation. Although analysis of the sequencing data is close to real time, DNA extraction and library preparation take several hours: 1D ligation library preparation currently takes approximately 70 min, and PCR amplification takes 150 min. Rapid library preparation kits are available; however, we consider the sequencing yield currently too low for these to be a viable route to detecting pathogens directly from samples, particularly samples with high host contamination. In addition, future studies will need to include replicate samples to show that this process is reproducible. This project was a proof of concept; to be cost effective in the future, multiplexing of samples, smaller and cheaper flowcells, or reusable/washable flowcells may need to be employed.

Conclusions

This study shows reliable detection of the species composition of prosthetic joint infections using ONT MinION sequencing. It represents a proof of concept for using real-time ONT MinION sequencing for PJI diagnostics. The speed of detection indicates that this technology has the potential to deliver results to the clinician sooner than traditional microbiological methods. Reducing diagnostic time could have a significant positive influence on patient outcome by allowing prompt, targeted antimicrobial therapy. The reproducible workflow developed in this study could be used for metagenomic ONT MinION sequencing of any clinical sample, not just sonication fluids. The analysis software is provided [19] and can be installed and run locally or on a distributed cluster to scale with throughput. Workflow control and analysis are implemented in a Python 3 wrapper that relies on open-source tools including pysam [34], Biopython [35], Pandas [36], Matplotlib [37], ETE3 toolkit [38] and NumPy [39].

Figure S1. Bases classified (total or target) against Centrifuge score. Each sample has two lines of the same colour. The upper line is the total number of bacterial bases identified by Centrifuge above the score threshold used. The lower line is the validated detected species/infection for the sample (Target). As the score threshold increases, the number of total classified bases falls at a greater rate than the number of target bases, until a plateau is reached and returns diminish at a score of approximately 150. (PDF 15 kb)

Figure S2. Each species identified by Centrifuge, showing total bases against number of reads as proportions of total bacterial bases and total bacterial reads, respectively. Species detections below the 0.1 proportion (i.e. less than 10%) of bases threshold are dots, and species detections above the 0.1 proportion threshold are crosses. Culture negative controls are red and culture positive samples are blue. (PDF 25 kb)

Figure S3. Indiscriminate (indis) and discriminate (dis) read mapping qualities. Quality scores were obtained by mapping all reads to a reference with minimap2. Discriminate scores are from reads that passed the pipeline filtering thresholds and were determined to be specific to the reference. Indiscriminate scores are from the remaining reads, which were likely to be host and/or contamination. (PDF 12 kb)

Figure S4. Each species identified by minimap2 mapping, showing total bases against number of reads as proportions of total bacterial bases (Centrifuge) and total bacterial reads (Centrifuge), respectively. Species detections below the 0.01 proportion (i.e. less than 1%) of bases threshold are dots, and species detections above the 0.01 proportion threshold are crosses. Culture negative controls are red and culture positive samples are blue. The axis is shortened to show below-threshold hits. (PDF 13 kb)

Figure S5. Batch job durations in minutes, from the Nextflow report, using sample 354a as a representative of the bioinformatic analysis. (A) Batches run on a heterogeneous SLURM cluster with variable node CPU speeds, which affected Albacore performance. (B) Batches run on a single machine with an Nvidia GTX 1050 Ti graphics card, using guppy v0.3.0 for basecalling. (PDF 280 kb)
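Figures S1 and S2 describe two filters: a Centrifuge score threshold (with diminishing returns at around 150) and a per-species threshold of 0.1 of total bacterial bases. The minimal sketch below shows how such filters could be combined; it is a hypothetical illustration, not the published pipeline. It assumes one simplified record per Centrifuge hit holding a read ID, species name, score and read length (real Centrifuge output reports taxonomy IDs rather than species names, so a mapping step would be needed), keeps only the highest-scoring hit per read, treats every non-excluded species as bacterial, and uses 150 and 0.1 from the figure captions as illustrative defaults.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified record: (read_id, species, centrifuge_score, read_length_bp).
Hit = Tuple[str, str, float, int]


def best_hit_per_read(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-scoring hit for each read (ties keep the first seen)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        read_id, _, score, _ = hit
        if read_id not in best or score > best[read_id][2]:
            best[read_id] = hit
    return best


def species_proportions(
    hits: Iterable[Hit],
    min_score: float = 150.0,                      # illustrative, from Figure S1
    exclude: Tuple[str, ...] = ("Homo sapiens",),  # host reads are discarded
) -> Dict[str, Tuple[float, float]]:
    """Return {species: (fraction of bacterial bases, fraction of bacterial reads)}."""
    bases: Dict[str, int] = defaultdict(int)
    reads: Dict[str, int] = defaultdict(int)
    for _, species, score, length in best_hit_per_read(hits).values():
        if score < min_score or species in exclude:
            continue
        bases[species] += length
        reads[species] += 1
    total_bases = sum(bases.values())
    total_reads = sum(reads.values())
    return {
        sp: (bases[sp] / total_bases if total_bases else 0.0, reads[sp] / total_reads)
        for sp in bases
    }


def detected_species(
    proportions: Dict[str, Tuple[float, float]],
    min_base_fraction: float = 0.1,  # illustrative, from Figure S2
) -> List[str]:
    """Species at or above the base-fraction threshold, sorted by name."""
    return sorted(
        sp for sp, (base_fraction, _) in proportions.items()
        if base_fraction >= min_base_fraction
    )
```

Filtering on bases rather than read counts weights each detection by sequence yield, which matches the x-axis used for the thresholds in Figures S2 and S4; the thresholds themselves are parameters, so values derived from negative controls could be passed in instead of the defaults.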

References (28 in total)

1. Role of universal 16S rRNA gene PCR and sequencing in diagnosis of prosthetic joint infection.

Authors: M Marín; J M Garcia-Lechuz; P Alonso; M Villanueva; L Alcalá; M Gimeno; E Cercenado; M Sánchez-Somolinos; C Radice; E Bouza
Journal: J Clin Microbiol. Date: 2011-12-14.

2. Nextflow enables reproducible computational workflows.

Authors: Paolo Di Tommaso; Maria Chatzou; Evan W Floden; Pablo Prieto Barja; Emilio Palumbo; Cedric Notredame
Journal: Nat Biotechnol. Date: 2017-04-11.

3. Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing.

Authors: K Schmidt; S Mwaigwisya; L C Crossman; M Doumith; D Munroe; C Pires; A M Khan; N Woodford; N J Saunders; J Wain; J O'Grady; D M Livermore
Journal: J Antimicrob Chemother. Date: 2016-09-25.

4. Prospective evaluation of criteria for microbiological diagnosis of prosthetic-joint infection at revision arthroplasty. The OSIRIS Collaborative Study Group.

Authors: B L Atkins; N Athanasou; J J Deeks; D W Crook; H Simpson; T E Peto; P McLardy-Smith; A R Berendt
Journal: J Clin Microbiol. Date: 1998-10.

5. Influence of material on the development of device-associated infections.

Authors: E T J Rochford; R G Richards; T F Moriarty
Journal: Clin Microbiol Infect. Date: 2012-08-27.

6. Phenotypic and genotypic characterization of Arcanobacterium haemolyticum isolates from infections of horses.

Authors: Abdulwahed Ahmed Hassan; Hivda Ulbegi-Mohyla; Talah Kanbar; Jörg Alber; Christoph Lämmler; Amir Abdulmawjood; Michael Zschöck; Reinhard Weiss
Journal: J Clin Microbiol. Date: 2008-11-19.

7. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data.

Authors: Melanie Schirmer; Rosalinda D'Amore; Umer Z Ijaz; Neil Hall; Christopher Quince
Journal: BMC Bioinformatics. Date: 2016-03-11.

8. Centrifuge: rapid and sensitive classification of metagenomic sequences.

Authors: Daehwan Kim; Li Song; Florian P Breitwieser; Steven L Salzberg
Journal: Genome Res. Date: 2016-10-17.

9. Singularity: Scientific containers for mobility of compute.

Authors: Gregory M Kurtzer; Vanessa Sochat; Michael W Bauer
Journal: PLoS One. Date: 2017-05-11.

10. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis.

Authors: Alexander L Greninger; Samia N Naccache; Scot Federman; Guixia Yu; Placide Mbala; Vanessa Bres; Doug Stryke; Jerome Bouquet; Sneha Somasekar; Jeffrey M Linnen; Roger Dodd; Prime Mulembakani; Bradley S Schneider; Jean-Jacques Muyembe-Tamfum; Susan L Stramer; Charles Y Chiu
Journal: Genome Med. Date: 2015-09-29.
