Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing.

Edward T Mee¹, Mark D Preston², Philip D Minor³, Silke Schepelmann³.

Abstract

BACKGROUND: Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity.
METHODS: A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay.
RESULTS: Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4-14 laboratories. Six non-target viruses were detected by three or more laboratories.
CONCLUSION: The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories.

Keywords: Adventitious virus; Collaborative study; Deep sequencing; Reference material; Vaccine

Year: 2015 PMID: 26709640 PMCID: PMC4823300 DOI: 10.1016/j.vaccine.2015.12.020

Source DB: PubMed Journal: Vaccine ISSN: 0264-410X Impact factor: 3.641

Introduction

Production of live viral vaccines on animal cell or egg substrates carries the risk of adventitious virus contamination of the final product [1], [2]. Testing for adventitious viruses is therefore an essential quality control step in the manufacture of vaccines and other biological medicines. Non-specific screening for adventitious viruses is partly based on animal tests which have served well for decades, but there are legal and ethical imperatives to replace such tests. Cell culture tests largely solve the ethical issues and are cheaper to perform, and recent efforts to compare sensitivity and specificity with animal tests have been promising [3]. Nevertheless, cell and animal tests are limited by the restricted tropism of some viruses and may not detect non-cytopathic, non-pathogenic or non-haemadsorbing viruses. For example, porcine circovirus (PCV) DNA was detected in two rotavirus vaccines [1], [4], [5] despite these routine adventitious virus tests showing no evidence of contamination. PCR-based tests offer sensitive and specific detection of their target pathogens, however screening for all potential viruses by PCR is impractical, and non-target viruses would remain undetected. Deep sequencing (DS, also referred to as massively parallel or high throughput sequencing) offers the potential for identification of extraneous nucleic acid in samples without a priori knowledge of the likely contaminant and without the requirement for propagation of the virus. Such methods have already been successfully applied to detection of adventitious agents in vaccines [1], cell lines [6], [7], serum [8], [9] and bioreactors [10] and multiple laboratory and informatics methods for viral metagenomics have been developed for clinical and other biological specimens [11], [12], [13], [14], [15], [16], [17]. 
There is substantial interest among vaccine manufacturers, contract research organisations, regulators and medicines control laboratories in evaluating the method for routine safety testing, and potentially replacing some or all of the existing in vitro and in vivo tests. A major challenge to the realisation of this potential is the identification of a robust, sensitive and specific assay design. A wide range of methods exist for viral metagenomics, many of which are early in their development: multiple options exist for generation of sequencing libraries; several commercial sequencing platforms exist, based on fundamentally different chemistry, with more in development; numerous bioinformatics pipelines are used for sequence classification, both academically and commercially developed; and the databases against which the reads are searched are constantly evolving. Given these parameters, it is important to have suitable reference materials to ensure that different methods generate comparable results. In addition to reagents for comparison of methods and determination of run performance, well-characterised materials of defined virus concentration will be required in order to determine limits of detection for particular viruses or virus types. We describe here a candidate material for qualitative comparison of methods and run performance and its evaluation in an international collaborative study encompassing 15 laboratories. The study highlighted that a broad range of laboratory and informatics techniques are in use, and no consensus exists on the most appropriate combination of methods to achieve maximum sensitivity. We discuss the major challenges for the incorporation of deep sequencing into adventitious agent testing workflows, highlight areas requiring particular attention and describe the requirements of future reference materials to enable validation and comparison of methods.

Methods

Aim and scope

The primary aim of the study was to evaluate the suitability of reagent 11/242-001 as a reference material for deep sequencing-based adventitious virus detection by comparing the results obtained from 15 independent laboratories using a variety of sample preparation, sequencing and informatics methods. Identifying the optimal processing parameters for each step of the process was not feasible given the large number of variables. This study did not aim to assess sensitivity of any particular method, nor the proficiency of the individual laboratories. An outline of this project was presented to the World Health Organisation (WHO) Expert Committee for Biological Standardisation (ECBS) at the 2013 meeting and the committee felt that the project could provide useful information on the value of the reference material and the merits of currently used methods [18].

Participants

Participants were identified through existing networks of contacts and via the Parenteral Drug Association (PDA)/Food and Drug Administration (FDA) Advanced Virus Detection Technologies Interest Group. Participants included vaccine manufacturers, contract research organisations, academic laboratories, regulatory agencies and medicines control laboratories with an interest in virus detection in biological medicines. A full list of participating laboratories is shown in Collaboration Group.

Material

An existing multiplex quantitative polymerase chain reaction (qPCR) run control reagent, 11/242-001, was available for the study. This reagent contains 25 viruses representing a range of common hazard group 2 human viruses (United Kingdom Advisory Committee on Dangerous Pathogens classification) with a variety of genome and envelope types (Table 1).

Table 1

Virus composition of multiplex reagent 11/242-001.

Group	Family	Envelope	Species/serotype	Genome size (kb)	PCR Ct value	Sample origin
dsDNA	Adenoviridae	No	Adenovirus 2	35.9	29.71	293 cell culture
			Adenovirus 41	34.2	ND	Clinical specimen
	Herpesviridae	Yes	Human herpesvirus 1	151.2	30.59	MRC5 cell culture
			Human herpesvirus 2	154.7	32.48	MRC5 cell culture
			Human herpesvirus 3 (VZV)	124.8	29.02	MeWo cell culture
			Human herpesvirus 4 (EBV)	171.7	31.27	B95-8 cell culture
			Human herpesvirus 5 (CMV)	233.7	28.95	MRC5 cell culture

dsRNA	Reoviridae	No	Rotavirus A	18.5	24.49	Clinical specimen

ssRNA (+)	Astroviridae	No	Astrovirus	6.8	30.53	Clinical specimen
	Caliciviridae	No	Norovirus GI	7.6	ND	Clinical specimen
			Norovirus GII	7.5	ND	Clinical specimen
			Sapovirus C12	7.5	33.37	Clinical specimen
	Coronaviridae	Yes	Coronavirus 229E	27.2	ND	MRC5 cell culture
	Picornaviridae	No	Coxsackievirus B4	7.4	30.72	Hep-2 cell culture
			Rhinovirus A39	7.1	31.16	MRC5 cell culture
			Parechovirus 3	7.2	29.35	LLC-MK2 cell culture

ssRNA (−)	Orthomyxoviridae	Yes	Influenza A virus H1N1	13.2	32.02	Egg passage
			Influenza A virus H3N2	13.6	ND	Egg passage
			Influenza B virus	14.2	ND	Egg passage
	Paramyxoviridae	Yes	Metapneumovirus A	13.3	31.86	LLC-MK2 cell culture
			Parainfluenzavirus 1	15.5	34.43	PRF5 cell culture
			Parainfluenzavirus 2	15.7	33.87	PRF5 cell culture
			Parainfluenzavirus 3	15.4	ND	PRF5 cell culture
			Parainfluenzavirus 4	17.4	31.83	PRF5 cell culture
			Respiratory syncytial virus A2	15.2	34.33	Hep-2 cell culture

ds double-stranded, ss single-stranded, VZV Varicella Zoster Virus, EBV Epstein Barr Virus, CMV Cytomegalovirus, ND not detectable. Ct values provide a crude estimate of viral genome abundance; quantitative PCR data are not available.

Individual viruses were propagated in cell culture or by egg passage, and non-cultivable viruses were isolated from clinical specimens. The origin of each virus is described in Table 1. Real-time PCR (RT-PCR) Cycle Threshold (Ct) values were determined for individual virus stocks, and the viruses were then pooled such that the predicted Ct value of each would be approximately 30. Pooled virus was formulated in 10 mM Tris, pH 7.4, supplemented with 2% foetal calf serum. One millilitre of reagent was filled into each of 2856 screw-cap Sarstedt vials (2 ml capacity) and frozen at −70 °C.

Samples of pooled material were assessed by in-house RT-PCR (see Supplemental Table 1 for PCR conditions) to determine the presence of the 25 viruses (Table 1). Not all viruses were detected following formulation, so development of the reagent as a qPCR control ceased and the material was deemed an ideal candidate for the current study. Infectivity of pooled viruses was not confirmed as the intended use was in nucleic-acid based detection methods. The precise concentrations of individual viruses are not known; however, RT-PCR data suggest the viruses are present at a range of nucleic acid concentrations (Ct values range from ∼24 to not detectable, Table 1). A previous study found that up to 22 of the 25 viruses were detectable by sequencing at modest read depth (∼2,000,000 reads) [14]. The presence of additional viruses in the reagent was considered a possibility due to the isolation of several of the target viruses from human clinical specimens, the propagation of others in cell culture and the addition of foetal calf serum to the reagent. The presence of such viruses was not known in advance.
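The pooling step described above, in which each stock was adjusted so that its predicted Ct would be approximately 30, can be illustrated with a short calculation. The source does not state how the dilutions were computed; the sketch below assumes the common approximation that template amount scales as (1 + efficiency) raised to the power of −Ct, and the function name `dilution_factor` and its parameters are hypothetical.

```python
def dilution_factor(stock_ct: float, target_ct: float = 30.0,
                    efficiency: float = 1.0) -> float:
    """Fold-dilution expected to move a stock from stock_ct to target_ct.

    Assumption (not from the source): template amount is proportional to
    (1 + efficiency) ** -Ct, i.e. perfect doubling per cycle when
    efficiency == 1.0. A result below 1 means the stock is already weaker
    than the target and cannot reach it by dilution alone.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (target_ct - stock_ct)


# Hypothetical stock with Ct 25: a 32-fold dilution is predicted to give Ct 30.
print(dilution_factor(25.0))
```

Under this approximation a stock with a Ct above 30 yields a factor below 1, which is one way some viruses could end up at or below the detection limit after formulation; the source does not confirm that this explains the ND values in Table 1.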

Study design

Two vials of reagent were shipped to each laboratory on dry ice. The list of target viruses was known to the laboratories to facilitate import and appropriate biocontainment. Laboratories processed the reagent according to their preferred method. Technical replicates were requested, but not mandatory due to the high costs involved. Laboratories were asked to analyse the entire data set, plus a random subset of 2 million reads.
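A random subset of fixed size, such as the 2 million reads requested here, can be drawn in a single pass over a read file with reservoir sampling. This is a minimal sketch rather than the procedure any participating laboratory reported; the function name `subsample_reads` and the fixed seed are illustrative assumptions, and each read is treated as an opaque record.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def subsample_reads(reads: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k reads chosen uniformly at random (reservoir sampling, Algorithm R).

    If the input holds fewer than k reads, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)
        else:
            # Keep the new read with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = read
    return reservoir


# Hypothetical usage: integers stand in for sequencing reads.
subset = subsample_reads(range(10_000), k=100, seed=1)
print(len(subset))
```

Because the reservoir never grows beyond `k` entries, the full read set does not need to be held in memory.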

Analysis

The parameters reported in the primary analysis were (a) total number of target viruses detected; (b) rankings of target viruses; and (c) correlation between read depth and number of target viruses detected. The majority of laboratories ran two technical replicates, and the replicate detecting the most target viruses was selected for primary analysis. If both replicates detected equal numbers of target viruses, the replicate containing fewer total sequencing reads was selected. Secondary analysis included (a) reporting of non-target viruses detected by three or more labs; (b) ranking of all target and non-target viruses detected by three or more labs; and (c) consistency of virus detection between replicates. Rankings were determined based on the absolute numbers or proportions of reads matching the indicated viruses, to account for the fact that total numbers of reads and the proportion of reads identified as viral differed greatly between laboratories. One lab reported difficulty in distinguishing Norovirus strains due to the large number of closely related sequences available in public databases. This lab provided an explanation and reported hits to ‘Human norovirus’ rather than to serotypes GI and GII. For result plotting, hits to ‘Human norovirus’ (for this laboratory only) were assigned to Norovirus GII, which had the higher overall rank. Hits to Ad2 and AdC were merged as the reference sequences are identical. Similarly, hits to Mastadenovirus F were merged with Ad41.
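The replicate-selection rule for the primary analysis (most target viruses detected, with ties broken in favour of fewer total reads) can be expressed compactly. The sketch below is illustrative only: the `Replicate` class, its field names and the example values are hypothetical and do not come from the study data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Replicate:
    name: str
    total_reads: int
    targets_detected: FrozenSet[str]


def select_primary(replicates: Sequence[Replicate]) -> Replicate:
    """Pick the replicate with the most targets; break ties by fewer total reads."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    # Sort key: more targets first (negated count), then fewer reads.
    return min(replicates,
               key=lambda r: (-len(r.targets_detected), r.total_reads))


# Hypothetical replicates: both detect two targets, so the smaller run wins.
rep1 = Replicate("rep1", 3_000_000, frozenset({"Ad2", "HHV-3"}))
rep2 = Replicate("rep2", 2_500_000, frozenset({"Ad2", "HHV-5"}))
print(select_primary([rep1, rep2]).name)
```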

Reagent availability

The reagent is available via the NIBSC catalogue (nibsc.org/products), reference 11/242-001.

Results

Return of data

Data were returned from 15 of 16 laboratories. One laboratory was unable to complete analysis within the study time frame. A wide range of sample preparation and informatics methods was used by the laboratories, with no two laboratories using identical methods (Table 2, Supplemental Tables 2 and 3). Four laboratories returned data generated using two different methods. The majority of laboratories employed informatics methodology that identified viruses in a blind manner rather than specifically targeting the 25 known viruses (Supplemental Table 3).

Table 2

Summary methods by laboratory.

Lab	Nuclease	Extraction	Primary lib	Seq library	Platform	Database(s)	Primary identification	Blind/targeted
L01-A	Yes	Column/silica	Ion Torrent	Ion Torrent	Ion Proton	In-house	Smith-Waterman/BLAST	Blind
L01-B	Yes	Column/silica	Ion Torrent	Ion Torrent	Ion Proton	In-house	Mapping to targets	Targeted
L02	Yes	Column/silica	Fragmentation-ligation	Ion Torrent	Ion Proton	Virus RefSeq/in-house	Proprietary	Blind
L03	DNAse	EZ1	cDNA amplification; nucleic acid extract	Nextera XT	MiSeq	nt (Jun 2014)	SURPI	Blind
L04	No	Column/silica	cDNA synthesis	Nextera XT	MiSeq	NCBI	Align to NCBI	Blind
L05	No	Column/silica	Adaptor ligation	TruSeq	HiSeq2500	NCBI Viral Genome Neighbor	Alignment to references	Blind
L06	No	Beads	Adaptor ligation	Custom	HiSeq2000	In-house	BWA alignment	Blind
L07	No	Phenol/chloroform	Confidential	Nextera	MiSeq	Virus RefSeq/NCBI	Mapping to all viruses	Both
L08	No	Maxwell	None	ScriptSeq	MiSeq	Virus RefSeq/nr	BLAST	Blind
L09	No	Column/silica	Proprietary	Custom	454	In-house (Ref/Seq GenBank derived)	Proprietary	Not specified
L10-A	No	Column/silica	Fragmentation-ligation	TruSeq	HiSeq2500	GenBank 2013, clustered viral partition	BLASTn	Blind
L10-B	No	Column/silica	Fragmentation-ligation	TruSeq	HiSeq2500	GenBank 2013, clustered viral partition	BLASTn	Blind
L11	No	Column/silica	cDNA	Nextera XT	HiSeq1500	In-house	BLAST	Blind
L12-A	No	Column/silica	Random RT-PCR	Nextera XT	MiSeq	Virus RefSeq/nt	BLASTn	Blind
L12-B	No	Column/silica	MDA/SPIA	Nextera XT	MiSeq	Virus RefSeq/nt	BLASTn	Blind
L13	No	Magnetic beads	TruSeq	TruSeq	MiSeq	Virus RefSeq	BLAST/CENSUSCOPE	Blind
L14	DNAse	Column/silica	Fragmentation-ligation	Illumina PCR	MiSeq	GenBank	SLIM	Both
L15-A*	Yes	Column/silica	MDA/SPIA	Nextera XT	MiSeq	In-house	BWA then BLAST	Targeted
L15-B*	Yes	Column/silica	MDA/SPIA	Nextera XT	MiSeq	In-house	BWA then BLAST	Targeted

Individual laboratories are represented by coded identifiers unrelated to the order in Supplemental Table 2. Separate methods performed by the same laboratory have the suffix -A/-B. Blind indicates methods where viruses were identified without reference to the 25 target viruses. Targeted indicates methods where these 25 viruses were specifically targeted. *L15-A and L15-B represent similar methodology, but performed using variable amounts of starting material (Supplemental Table 2).

Number of target viruses detected and effect of read number

Participating laboratories generated differing numbers of reads depending on the library preparation and sequencing platform used. Using a subset of 2,000,000 reads, a single lab detected all 25 target viruses (range 6–25). Using all reads, two labs detected all 25 viruses (range 6–25, Fig. 1). The majority of methods detected at least 20 target viruses using 2,000,000 reads (median 20.5) and at least 21 viruses using all reads (median 22) (Fig. 2). The number of target viruses detected did not correlate with total read numbers (Fig. 2), though it is expected that the underlying variation between methods masked any effect. For a given method, increasing read depth would be expected to increase the probability of detecting a given virus, though this analysis was beyond the scope of the current study.

Fig. 1

Number of target viruses detected by individual laboratories and methods. Horizontal hatched bars, target viruses detected in best replicate using 2 million reads. Solid black bar, target viruses detected in best replicate using all reads. Grey bar, target viruses detected in second replicate using all reads. White bar, target viruses detected in both replicates using all reads. Laboratories L01-B, L11, L13 and L15-B performed analysis only on the total read set.

Fig. 2

Frequency of target virus detection in all methods. Left panel, correlation between number of target viruses detected in 2,000,000 reads (median 20.5) and entire read set (median 22). Four laboratories performed analysis only on the total read set and these are shown left of the dotted line. Right panel, correlation between number of target viruses detected and read depth. Graph shows best fit and 95% confidence bands for regression line.

Consistency of replicates

Ten laboratories (13 methods) performed technical replicates. The vast majority of viruses detected were present in both replicates (Fig. 1). The most common inconsistency was the presence of Norovirus GI, GII or Influenza B in one replicate but its absence in the other (Supplemental Table 4). The nine viruses where discrepancies were observed were predominantly the bottom ranked viruses in the positive replicate, i.e. the replicate in which the virus was detected (Supplemental Table 4 and Fig. 4).

Fig. 4

Rank order of viruses using all methods. Top panel, ranking of target viruses. Bottom panel, ranking of all viruses, excluding results where no additional viruses were reported. Horizontal bars indicate median rank of viruses; open circles indicate individual data points. Solid circles and prefix NT indicate non-target viruses reported by three or more laboratories. Data points below dotted line indicate that virus was not detected – such viruses were assigned a rank order of 26 (for target viruses) or 31 (for all viruses) for plotting and calculation of median.

Consistency of virus detection across methods and laboratories

A detailed breakdown of viruses detected by each laboratory is shown in Supplemental Table 5. Ad2, Human Herpesvirus (HHV)-3 and HHV-5 were detected by all methods (Fig. 3, left panel). Ad2, Ad41, HHV-3, HHV-5, Parechovirus, and Rotavirus A were detected by all laboratories (Fig. 3, right panel). Norovirus GI and Influenza B were detected by fewer than 50% of methods while Norovirus GI was detected by fewer than 50% of laboratories. The current study did not aim to directly compare PCR and deep sequencing for detection of these viruses, however with the exception of Ad41, the viruses detected by the fewest laboratories and methods were those that were not detected by real-time PCR.

Fig. 3

Consistency of virus detection across different methods and laboratories. Left panel, proportion of all methods detecting target viruses. Right panel, proportion of all laboratories detecting target viruses using best method. Grey shading indicates viruses not detected by real-time PCR.

Additional viruses detected

Ten of 15 laboratories reported the detection of additional viruses; those detected by three or more laboratories are described in Table 3. At least 20 additional viruses were reported by single labs; these are not reported here as their presence was not corroborated by a second lab.

Table 3

Non-target viruses detected by three or more laboratories.

Virus	Number of laboratories (methods)	Laboratory/method
Bovine viral diarrhoea virus	10 (13)	L01-A, L02, L03, L06, L07, L09, L10-A, L10-B, L11, L12-A, L12-B, L15-A, L15-B
Human bocavirus	7 (8)	L01-A, L02, L03, L06, L10-A, L10-B, L11, L12-B
Human enterovirus (multiple)ᵃ	6 (8)	L02, L07, L09, L10-A, L10-B, L11, L12-A, L12-B
Aichi virus	4 (5)	L03, L10-A, L12-A, L15-A, L15-B
Bovine parvovirus	4 (5)	L01-A, L02, L10-A, L10-B, L11
Porcine/other circovirus	3 (3)	L03, L06, L15-B

ᵃ Multiple similar results are consolidated to ‘Human enterovirus’. Some laboratories did not report non-target viruses.

Rank order of target viruses

Of the 25 targets, Parechovirus had the highest rank based on proportion of reads returning a hit (median rank 1.5, Fig. 4), while Norovirus GI had the lowest rank (median 26, i.e. not detected). The ranking varied significantly between laboratories, most notably for Rotavirus A, whose rank ranged from 1 to 26 (median 10). This may suggest that different methods are differentially likely to detect a given virus or family of viruses; however, the low sample numbers and wide range of methods used precluded statistical analysis of the major factors influencing virus detection. When rankings were calculated with the inclusion of the additional viruses detected by three or more laboratories, the non-target Bovine Viral Diarrhoea Virus had a median rank of 8 and several other non-target viruses had rankings higher than target viruses (Fig. 4, bottom panel).
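The ranking scheme behind Fig. 4 (rank by reads within each method, assign rank 26 to undetected target viruses, then take the median across methods) can be sketched as follows. The function names and the example read counts are hypothetical, and ties are broken by input order, a simplification the source does not address.

```python
from statistics import median
from typing import Dict, List

NOT_DETECTED_RANK = 26  # rank the study assigned to undetected target viruses


def rank_targets(read_counts: Dict[str, int]) -> Dict[str, int]:
    """Rank viruses within one method by read count (1 = most reads).

    Viruses with zero reads receive NOT_DETECTED_RANK.
    """
    detected = sorted((v for v, n in read_counts.items() if n > 0),
                      key=lambda v: -read_counts[v])
    ranks = {v: i + 1 for i, v in enumerate(detected)}
    for virus, n in read_counts.items():
        if n <= 0:
            ranks[virus] = NOT_DETECTED_RANK
    return ranks


def median_ranks(per_method: List[Dict[str, int]]) -> Dict[str, float]:
    """Median rank of each virus across methods (same virus keys in each)."""
    ranked = [rank_targets(m) for m in per_method]
    return {v: median(r[v] for r in ranked) for v in ranked[0]}


# Hypothetical read counts for three methods.
methods = [
    {"Parechovirus 3": 900, "Rotavirus A": 500, "Norovirus GI": 0},
    {"Parechovirus 3": 400, "Rotavirus A": 800, "Norovirus GI": 0},
    {"Parechovirus 3": 700, "Rotavirus A": 0, "Norovirus GI": 3},
]
print(median_ranks(methods))
```

Ranking within a method gives the same order whether absolute read counts or proportions of reads are used, since both are divided by the same total.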

Summary results

The current study was not a proficiency test, and it is recognised that multiple experimental design considerations affect the ability to detect particular viruses. The results obtained in the current study may be used for reference when using this material. All laboratories detected: Ad2, Ad41, HHV-3, HHV-5, Parechovirus 3, and Rotavirus A. Greater than 90% of laboratories detected: Astrovirus, Coxsackievirus B4, HHV-1, HHV-2, HHV-4, Metapneumovirus A, Parainfluenzavirus 1, Parainfluenzavirus 4, Rhinovirus A39, and Sapovirus C12. Greater than 50% of laboratories detected: Coronavirus 229E, Influenza A H1N1, Influenza A H3N2, Influenza B, Norovirus GII, Parainfluenzavirus 2, Parainfluenzavirus 3, Respiratory Syncytial Virus A2, and the non-target Bovine Viral Diarrhoea Virus. Fewer than 50% of laboratories detected: Norovirus GI, as well as the non-target Human Bocavirus, Human Enterovirus, Aichi Virus, Bovine Parvovirus, and Porcine/other Circoviruses.

Discussion

The detection of infectious PCV-1 in a human vaccine [19] demonstrated that existing safety tests may not detect some virus contaminants, and highlighted the potential for deep sequencing methods to form part of an improved testing scheme [1]. This study employed a reagent containing diverse virus families, genome and coat types, for comparison of the different strategies. The large diversity in laboratory and informatics methods used and the variability in detection of target viruses underscore the need for such materials to facilitate assay development and method comparison. The study highlighted a number of issues facing the successful implementation of deep sequencing within a manufacturing/regulatory environment and these are discussed below.

Issues in assay design and sample preparation

The nature of the material being tested will be determined by the product and production stage and different materials may have distinct upstream processing requirements (e.g. filtration, concentration, centrifugation). The discrepancies in detection of target viruses may in part be attributable to the assay design. For example, assays targeting only particle-protected nucleic acid and using nuclease may discriminate against certain signals, relative to those targeting total nucleic acid, though notably one laboratory that identified all 25 viruses (L02) reported using nuclease. Extraction methods may also affect the ability to detect different viruses [14] and variable efficiency of reverse transcriptase steps may bias against detection of RNA viruses. Sequencing library preparation typically requires nanogram to microgram amounts of DNA that may be challenging to obtain from certain starting materials. Lower amounts of DNA may yield adequate sequencing libraries in some cases, but deviating significantly from recommended inputs may be problematic in a Good Laboratory Practice (GLP) or Good Manufacturing Practice (GMP) environment. Amplification by PCR or Phi29-based systems was employed by two laboratories, however the risk of bias due to different template preferences should be considered. As with any molecular technique, the inclusion of no template controls is essential to avoid or identify false positives due to trace contamination of the sample extraction columns [7], [20], molecular biology reagents [7], [21] or sample-to-sample contamination during library preparation or sequencing [22].

Matrix effects

Matrix effects may be broadly defined as any change in the sensitivity and specificity of a detection assay due to substances in a sample which inhibit extraction of the target and/or co-purify and interfere with downstream processing. Adventitious virus detection is likely to be performed on diverse samples including raw materials, culture supernatants and bulk harvests, some of which may interfere with nucleic acid detection methods [23]. Competition by non-viral nucleic acids, e.g. from host cells, will negatively impact limits of detection, and the concentration of such may vary significantly between sample types. In such cases, nuclease treatment may increase sensitivity for particle-protected viruses, but should be used with caution for the reasons described in the previous section. A detailed investigation of matrix effects was beyond the scope of the study, however future reference materials should be compatible with a variety of matrices, and methods should be validated using a matrix similar to that of the test article.

Issues in bioinformatics and databases

It is important that bioinformatics algorithms strike an appropriate balance of speed, sensitivity and specificity. The size of sequencing datasets and reference databases continues to grow, offering obvious advantages in terms of sensitivity; however, this has necessitated the development of new sequence classification algorithms to enable data analysis within a reasonable time, at a cost of potentially increased false negative rate [24], [25], [26], [27]. A range of methods were used in the current study and thresholds for assigning hits varied, with some laboratories reporting detection of a virus on the basis of a single read but others requiring that additional criteria were met. Viral identification stringency should be determined by the context. A low stringency will maximise the chance of detecting novel viruses which may be highly divergent from database references, at the risk of increased false positives, and may be appropriate e.g. for screening of master cell and virus banks where follow-up testing can be performed. In a routine testing scheme, or where a defined set of contaminants is being screened for, higher stringency may be appropriate to minimise false positives, e.g. due to matches with host sequences similar to viral genes. While the current study focussed on the detection of known viruses, many of the pipelines employed are entirely compatible with virus discovery investigations simply by altering the processing and search parameters. We also observed variation in the databases used for sequence classification. Publicly available databases such as those hosted by NCBI [28] are valuable but incomplete and not curated, and consideration is needed of what action should be taken if sequencing reads that are currently non-classified are later identified as being of viral origin.
The size, breadth and complexity of databases can pose a challenge to sequence identification pipelines, especially when databases contain entries whose taxonomy is erroneously assigned. It may therefore be advisable for adventitious virus screens to be performed using curated databases, even though their breadth of coverage may be more limited.
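The trade-off between low and high stringency in hit calling, discussed above, can be made concrete with a configurable threshold. The source says only that some laboratories reported a virus on a single read while others required additional criteria; the specific criterion used here (fraction of the reference genome covered), the class and function names, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Stringency:
    min_reads: int = 1                 # single-read reporting by default
    min_genome_fraction: float = 0.0   # hypothetical additional criterion


@dataclass(frozen=True)
class Hit:
    virus: str
    reads: int
    genome_fraction: float  # fraction of the reference covered by reads


def call_viruses(hits: Iterable[Hit], rule: Stringency) -> List[str]:
    """Return the sorted names of viruses whose hits satisfy the rule."""
    return sorted(h.virus for h in hits
                  if h.reads >= rule.min_reads
                  and h.genome_fraction >= rule.min_genome_fraction)


# Illustrative presets only; the values are arbitrary, not recommendations.
SCREENING = Stringency(min_reads=1, min_genome_fraction=0.0)
ROUTINE = Stringency(min_reads=10, min_genome_fraction=0.05)

hits = [Hit("Virus X", 1, 0.01), Hit("Virus Y", 250, 0.60)]
print(call_viruses(hits, SCREENING))
print(call_viruses(hits, ROUTINE))
```

Keeping the thresholds in a single rule object makes it explicit which stringency was applied to a given run, which matters when results from different contexts are compared.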

General issues

Once hits are identified by deep sequencing, it will be important to confirm the presence of the contaminant by a second molecular method. Infectivity should be assessed if an appropriate assay exists. The presence of certain viral nucleic acid, e.g. that remaining after viral inactivation or reduction processes, may be considered acceptable if there is no evidence of infectious virus, though the fact that some viruses produce infectious nucleic acids [29], [30], [31] necessitates a cautious approach. Investigation procedures both within a Good Manufacturing Practice and regulatory setting will likely follow existing guidelines.

Intended use and limitations of the reference material

The reagent is intended to be used as a control to assess assay performance relative to the results presented herein, or to results from historical runs of the reagent. While not a proficiency testing material (since some assays have different design parameters), the reagent will enable users to perform inter-assay, intra-laboratory and inter-laboratory comparisons. No single-stranded DNA (ssDNA) virus was deliberately included in the reagent, although this class of virus is of particular interest given its detection in two vaccine products [1]. However, circovirus and parvovirus sequences were detected in the sample by three and four laboratories respectively (Table 3), providing evidence for the presence of these ssDNA viruses in the reagent, albeit at a low level. The most commonly reported non-target virus, Bovine Viral Diarrhoea Virus, likely originates from the foetal bovine serum used in the reagent, as does the Bovine Parvovirus. The detection of Human Bocavirus and Aichi Virus most likely reflects the use of faecal samples as a source for several of the target viruses (Table 1), while the ‘Human Enterovirus’ category likely reflects a combination of authentic enteroviruses from the faecal samples and possible misclassification of reads originating from Coxsackievirus and Rhinovirus. Only short sequence contigs were obtained for the circoviruses and definitive identification was not achieved, hence the origin of these viral reads cannot be determined.

Absolute quantification of the components, in terms of infectious units, particles or genomic equivalents, is not available. The reagent is therefore not suitable for determination of limits of detection or quantification. A number of reports have begun addressing limits of detection of particular viruses by deep sequencing [32], [33], [34], [35], and the potential replacement of existing in vitro and in vivo tests [10], [23]. Empirical definition of limits of detection for all possible viruses is impractical, but defining them for a set of viruses representing all major genome and coat types is a minimal starting point, with potential extension to include the full set of viruses for which screening is mandatory. Future reference materials will contain purified virus particles representing the diverse size, genome structure, GC-content and particle structure (enveloped or not) of the virus kingdom. The materials should be subject to comprehensive characterisation including precise quantification, be compatible with a variety of sample matrices commonly encountered in biologicals manufacturing and not be restricted to particular molecular biology techniques or sequencing platforms, since these are likely to change over time.
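
The intended use described in this section, namely comparing the viruses reported in a run against the expected target panel and against other runs, could be scripted along the following lines. This is a minimal sketch and not a method used in the study: the function names `summarise_run` and `consistent_across_runs`, the name normalisation (trimming and lower-casing) and the placeholder virus labels are all assumptions made for illustration. In line with the limitations above, it reports presence or absence only and makes no attempt at quantification.

```python
from typing import Dict, Iterable, Set


def _normalise(names: Iterable[str]) -> Set[str]:
    """Trim and lower-case names so that trivial spelling differences match.
    (Assumption: real pipelines would need a proper taxonomy mapping.)"""
    return {name.strip().lower() for name in names}


def summarise_run(targets: Iterable[str], detected: Iterable[str]) -> Dict[str, object]:
    """Compare the viruses reported in one run against the expected target panel."""
    target_set = _normalise(targets)
    detected_set = _normalise(detected)
    found = target_set & detected_set
    return {
        "targets_detected": sorted(found),
        "targets_missed": sorted(target_set - detected_set),
        "non_target_hits": sorted(detected_set - target_set),
        "fraction_detected": len(found) / len(target_set) if target_set else 0.0,
    }


def consistent_across_runs(runs: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the viruses reported in every run (e.g. every laboratory
    or every historical run of the reagent)."""
    sets = [_normalise(hits) for hits in runs.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical placeholder labels, not the actual panel from Table 1.
    panel = ["Target A", "Target B", "Target C", "Target D"]
    run = ["target a", "Target C", "Contaminant X"]
    print(summarise_run(panel, run))
```

Keeping non-target hits separate from missed targets mirrors the distinction drawn above between target viruses and viruses likely introduced by serum or faecal source material.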

Summary

The collaborative study highlighted the diversity of methods currently employed for adventitious virus screening by deep sequencing, and the variability in target virus detection underscored the need for a suitable reference material to enable assay comparison. Reagent 11/242-001 will serve as a useful first-generation reference material for evaluating and improving such methods, monitoring intra-laboratory consistency and enabling inter-laboratory comparisons to support this promising but nascent application of deep sequencing.

Author contributions

ETM, PDM and SS conceived the study. ETM, PDM, SS and study participants designed the study. ETM co-ordinated the study and data collection. ETM and MDP analysed the data. ETM drafted the manuscript. All authors revised and approved the manuscript.

34 in total

1. Systematic evaluation of in vitro and in vivo adventitious virus assays for the detection of viral contamination of cell banks and biological products.

Authors: James Gombold; Stephen Karakasidis; Paula Niksa; John Podczasy; Kitti Neumann; James Richardson; Nandini Sane; Renita Johnson-Leva; Valerie Randolph; Jerald Sadoff; Phillip Minor; Alexander Schmidt; Paul Duncan; Rebecca L Sheets
Journal: Vaccine Date: 2014-03-25 Impact factor: 3.641

2. Investigation of a regulatory agency enquiry into potential porcine circovirus type 1 contamination of the human rotavirus vaccine, Rotarix: approach and outcome.

Authors: Gary Dubin; Jean-François Toussaint; Jean-Pol Cassart; Barbara Howe; Donna Boyce; Leonard Friedland; Remon Abu-Elyazeed; Sylviane Poncelet; Htay Htay Han; Serge Debrus
Journal: Hum Vaccin Immunother Date: 2013-08-28 Impact factor: 3.452

3. Unbiased analysis by high throughput sequencing of the viral diversity in fetal bovine serum and trypsin used in cell culture.

Authors: Léa Gagnieur; Justine Cheval; Marlène Gratigny; Charles Hébert; Erika Muth; Marine Dumarest; Marc Eloit
Journal: Biologicals Date: 2014-03-22 Impact factor: 1.856

4. Detection of adventitious agents using next-generation sequencing.

Authors: Brenda Richards; Sherry Cao; Mark Plavsic; Robert Pomponio; Claire Davies; Robert Mattaliano; Stephen Madden; Katherine Klinger; Adam Palermo
Journal: PDA J Pharm Sci Technol Date: 2014 Nov-Dec

5. The sensitivity of massively parallel sequencing for detecting candidate infectious agents associated with human tissue.

Authors: Richard A Moore; René L Warren; J Douglas Freeman; Julia A Gustavsen; Caroline Chénard; Jan M Friedman; Curtis A Suttle; Yongjun Zhao; Robert A Holt
Journal: PLoS One Date: 2011-05-13 Impact factor: 3.240

6. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples.

Authors: Christian B Matranga; Kristian G Andersen; Sarah Winnicki; Michele Busby; Adrianne D Gladden; Ryan Tewhey; Matthew Stremlau; Aaron Berlin; Stephen K Gire; Eleina England; Lina M Moses; Tarjei S Mikkelsen; Ikponmwonsa Odia; Philomena E Ehiane; Onikepe Folarin; Augustine Goba; S Humarr Kahn; Donald S Grant; Anna Honko; Lisa Hensley; Christian Happi; Robert F Garry; Christine M Malboeuf; Bruce W Birren; Andreas Gnirke; Joshua Z Levin; Pardis C Sabeti
Journal: Genome Biol Date: 2014 Impact factor: 13.583

7. Infectivity of ribonucleic acid from poliovirus in human cell monolayers.

Authors: H E ALEXANDER; G KOCH; I M MOUNTAIN; O VAN DAMME
Journal: J Exp Med Date: 1958-10-01 Impact factor: 14.307

8. Kraken: ultrafast metagenomic sequence classification using exact alignments.

Authors: Derrick E Wood; Steven L Salzberg
Journal: Genome Biol Date: 2014-03-03 Impact factor: 13.583

9. Comparison of three next-generation sequencing platforms for metagenomic sequencing and identification of pathogens in blood.

Authors: Kenneth G Frey; Jesus Enrique Herrera-Galeano; Cassie L Redden; Truong V Luu; Stephanie L Servetas; Alfred J Mateczun; Vishwesh P Mokashi; Kimberly A Bishop-Lilly
Journal: BMC Genomics Date: 2014-02-04 Impact factor: 3.969

10. Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent.

Authors: Linlin Li; Xutao Deng; Edward T Mee; Sophie Collot-Teixeira; Rob Anderson; Silke Schepelmann; Philip D Minor; Eric Delwart
Journal: J Virol Methods Date: 2014-12-11 Impact factor: 2.014
