
Rengasvirus, a Circular Replication-Associated Protein-Encoding Single-Stranded DNA Virus-Related Genome That Is a Common Contaminant in Metagenomic Data.

Emma L Keeler¹, Louis J Taylor¹, Arwa Abbas¹, Ronald G Collman¹,², Frederic D Bushman³.

Abstract

We report the genome of a circular replication-associated protein (Rep)-encoding segmented or satellite virus, which we have provisionally named rengasvirus. In metagenomic studies of virus-enriched fractions, rengasvirus was detected widely, including in reagent-negative controls. We thus report this genome to help others recognize a probable contaminating sequence.
Copyright © 2021 Keeler et al.

Year:  2021        PMID: 33958399      PMCID: PMC8103869          DOI: 10.1128/MRA.00273-21

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

Here, we describe a circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA virus-related genome that we detected in metagenomic data from human subjects and also in negative controls, suggesting that it originates from laboratory reagents. The rengasviral (rengas = Finnish for “ring”) sequence was first detected in metagenomic sequence data from bronchoalveolar lavage (BAL) fluid samples from lung transplant recipients (1). Default parameters were used for all software except where otherwise indicated. Contig building, open reading frame (ORF) prediction, and mapping of reads to contigs were performed using the Sunbeam pipeline version 2.1 (2); candidate CRESS viruses were identified by screening against vFam models for CRESS viral Reps using HMMER version 3 (3, 4).

We confirmed the circularity of these sequences by PCR amplification around the DNA circles. For this, we used divergently oriented “back-to-back” primer pairs, which can yield a genome-length product from a circular template but not from a linear copy of the genome, and recovered product bands of genome length. This was repeated with two primer sets binding to different locations on the circular DNA (set A forward [Fwd], GGCAGATCTAGATCACTACTCTGGAC; set A reverse [Rev], GCCAATGCGGGAGTAAATAGCTTG; set B Fwd, CCCTATCACTCTATAACATAACAAATGTCATTAGG; set B Rev, GGGTAATACTGATCCTATCACTCCTTTATAAC).

We observed 105× coverage of the PCR-confirmed rengasvirus full-genome sequence in the sample in which we initially identified it (SRA run accession number SRR5826708). The rengasvirus genome is a 1,045-bp DNA sequence with a GC content of 49.7% that contains a single ORF, encoding the Rep (GenBank accession number MW559600). Based on BLASTp searches (online search, February 2021), the closest reported Rep amino acid sequences were from a circular DNA molecule from a glacial ice core (QGF19362.1), a CRESS virus helicase (AWW06123.1), and a dragonfly larva-associated circular virus (ALE29688.1), with sequence identities of 51.49%, 42.91%, and 41.11%, respectively.
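The vFam screening step above amounts to searching profile HMMs against predicted ORFs and keeping the significant hits. As a minimal sketch, assuming the search was run with HMMER3 `hmmsearch --tblout` (the exact command, the 1e-5 E-value cutoff, and the contig and model names below are hypothetical and not taken from the authors' workflow), the per-sequence hit table could be filtered as follows:

```python
from typing import Iterable, List, Tuple

# Hypothetical example rows in the HMMER3 --tblout layout (19 whitespace-delimited fields).
EXAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "contig_12_orf1 - vFam_Rep_A - 3.2e-40 140.1 0.5 4.1e-40 139.8 0.5 1.0 1 0 0 1 1 1 1 -",
    "contig_7_orf3 - vFam_Rep_A - 0.2 8.3 0.1 0.3 7.9 0.1 1.1 1 0 0 1 1 1 0 -",
]


def filter_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, str, float, float]]:
    """Return (target, query, full-sequence E-value, bit score) for hits at or below max_evalue.

    Field positions follow the HMMER3 per-sequence table: 1 = target name,
    3 = query (profile) name, 5 = full-sequence E-value, 6 = full-sequence score.
    The default cutoff is an illustrative assumption.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, header, and comment lines
        # The last field (target description) may contain spaces, so split at most 18 times.
        fields = line.split(maxsplit=18)
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return sorted(hits, key=lambda hit: hit[2])  # most significant hit first


if __name__ == "__main__":
    for hit in filter_tblout(EXAMPLE_TBLOUT):
        print(hit)  # ('contig_12_orf1', 'vFam_Rep_A', 3.2e-40, 140.1)
```

The same filter would apply unchanged to a search with capsid-derived models; only the HMM file passed to the search differs.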
A maximum-likelihood phylogenetic tree of Rep placed rengasvirus as a member of the CRESSV2 viral cluster (Fig. 1A) (5).
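The GC contents reported here, 49.7% for the rengasvirus genome and 41.3% for the capsid-encoding molecule described below, are simple base ratios. A minimal sketch (the function name `gc_content` is illustrative, and excluding ambiguous bases such as N from the denominator is an assumption):

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Set A forward primer from the text: 13 of its 26 bases are G or C.
    print(gc_content("GGCAGATCTAGATCACTACTCTGGAC"))  # 50.0
    # Back-calculating the reported genome-wide values from whole-number base counts.
    print(round(100 * 519 / 1045, 1))  # 49.7
    print(round(100 * 385 / 933, 1))   # 41.3
```

As a consistency check, 519 G+C positions is the only whole-number count that gives 49.7% for a 1,045-nt genome (519/1,045 ≈ 49.67%), and 385 of 933 bases (≈ 41.26%) is likewise the only count that gives the 41.3% reported for the capsid-encoding molecule.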
FIG 1

(A) Phylogenetic tree of CRESS virus Rep amino acid sequences. The rengasvirus Rep is shown in green. Rep amino acid sequences were aligned using MUSCLE version 3.8; trees were constructed using RAxML version 8.2 and visualized using iTOL version 6. (B) Comparison of the rengasvirus DNA stem-loop with that of a capsid-encoding circular DNA found in one rengasvirus-positive sample.

To investigate the prevalence of rengasvirus sequences, we searched publicly available metagenomic data sets, generated by our laboratory and by other groups, for reads homologous to the rengasvirus genome. Alignments were performed using the hisss pipeline (https://github.com/louiejtaylor/hisss), described in reference 6, which uses grabseqs and sra-tools to retrieve public metagenomic data, Bowtie 2 (option --very-sensitive-local) to align reads to target genomes, and ggplot2 (R version 3.2.3) for visualization (7–11). A sample was scored as rengasvirus positive when reads aligned to ≥25% of the viral genome; we discussed the rationale for this cutoff for CRESS virus genomes in a previous publication (6). Of the 40 data sets (3,568 samples) queried, 6 data sets contained positive samples, at frequencies ranging from 0.70% to 10.9% of the samples in each data set (Table 1).

We identified hits to the rengasvirus genome in control samples from two different in-house studies, including two buffer-negative controls processed with the AllPrep extraction kit (SRA numbers SRR6316280 and SRR6316219) (1) and one water extraction blank processed with the UltraSens virus kit (SRR7430813) (both kits from Qiagen, Valencia, CA) (12). Few public data sets include sequenced negative controls, precluding a detailed analysis of the origin of this putative genome or genome segment. However, circular DNAs have previously been identified as contaminants in nucleic acid extraction kit spin columns (13), a potential source of the rengasvirus DNA in negative-control samples.
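The ≥25% rule above is a breadth-of-coverage threshold: the fraction of genome positions covered by at least one aligned read. The sketch below shows one way to compute it, assuming 0-based, half-open read coordinates have already been extracted from the Bowtie 2 alignments and that reads spanning the origin of the circular genome have been split beforehand. It is not the hisss implementation, and the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

GENOME_LENGTH = 1045  # rengasvirus genome length reported in the text


def breadth_of_coverage(alignments: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one aligned read.

    alignments: 0-based, half-open (start, end) reference intervals, one per read.
    Simplification: no wrap-around handling for reads crossing the circle's origin.
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(alignments):
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue  # interval lies entirely outside the reference
        if cur_end is None or start > cur_end:
            # Gap before this read: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent read: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def is_positive(alignments, genome_length: int = GENOME_LENGTH, min_breadth: float = 0.25) -> bool:
    """Positive call: reads cover at least min_breadth of the genome (25% in the text)."""
    return breadth_of_coverage(alignments, genome_length) >= min_breadth


def percent_positive(calls: List[bool]) -> float:
    """Percentage of samples in one data set called positive."""
    return 100.0 * sum(calls) / len(calls)
```

For example, three overlapping 150-nt reads starting at positions 0, 100, and 200 cover 350 of 1,045 positions (about 33%), so that sample would be called positive, whereas a single 150-nt read (about 14%) would not. Merging intervals before summing avoids counting overlapping reads twice, which is why breadth differs from mean depth such as the 105× reported above.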
TABLE 1

Metadata associated with rengasvirus-positive metagenome samples (≥25% coverage)

| BioProject no. | Organism | Body site | Sample | Disease state | Location | Positive samples (%) |
|---|---|---|---|---|---|---|
| PRJNA390659 | Homo sapiens; controls | Lung | BAL fluid; buffer | Lung transplant | United States | 10.9 |
| PRJNA327423 | Homo sapiens | Gut; oral | Stool; saliva | None | United States | 1.04 |
| PRJNA385126 | Homo sapiens | Gut | Stool | None | Ireland | 2.50 |
| PRJNA275568 | Homo sapiens | Gut | Stool | Islet autoimmunity | Finland | 5.21 |
| PRJEB9524 | Homo sapiens | Gut | Stool | HIV | Uganda | 1.52 |
| PRJNA407341 | Homo sapiens | Gut | Stool | IBDᵃ | Ireland | 0.70 |

ᵃ IBD, inflammatory bowel disease.

The rengasvirus genome encodes only a Rep, raising the question of how it becomes encapsidated in viral particles. In one BAL fluid sample containing rengasvirus (SRA number SRR5826708), we also identified a second circular DNA, 933 bases long with a GC content of 41.3%, that encodes a capsid protein (GenBank accession number MW559599). We identified this sequence among the metagenomic contigs using the same approach as for the initial rengasvirus detection (described above), except that the vFam hidden Markov models (HMMs) used were built from viral capsid proteins rather than Reps (3). We also confirmed the circularity of this molecule by whole-genome PCR with two sets of back-to-back primers as described above (set A Fwd, GCCTCACTTAAATAGATGTTAAGGTATGCAATG; set A Rev, GGCAAGTACTGGTACTGCACC; set B Fwd, GCCATAAGCATTCCGCGTG; set B Rev, GGCGAAGAGGAAGAGGAAGATG). This sequence also contains a DNA stem-loop that resembles that of the rengasvirus Rep-encoding DNA (Fig. 1B). The two molecules might therefore together make up a bipartite genome. Alternatively, rengasvirus may be a satellite virus that relies on a capsid and other functions supplied by an unknown helper virus.

In summary, our results indicate that rengasvirus sequences are a common laboratory contaminant, and the genome provides an alignment target that can be used for quality control in future metagenome studies.
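The back-to-back primer test used for both molecules can be checked in silico. On a circular template, divergent primers whose binding sites abut span the whole circle, so the expected product equals the genome length, whereas a linear copy of the same sequence gives no product. The sketch below assumes exact primer matches, a single binding site per primer, and a toy 20-nt template (the real genome sequences are in GenBank and are not reproduced here); all function names and sequences in it are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_site(template: str, motif: str, circular: bool) -> Optional[int]:
    """0-based start of the first exact match of motif on the + strand, or None.

    For a circular template the sequence is extended by len(motif) - 1 bases
    so that sites spanning the arbitrary start coordinate are still found.
    """
    search = template + template[: len(motif) - 1] if circular else template
    idx = search.find(motif)
    return None if idx == -1 else idx


def amplicon_length(template: str, fwd: str, rev: str, circular: bool = True) -> Optional[int]:
    """Expected PCR product length, or None if no product is expected.

    fwd matches the + strand; rev matches the - strand, so its binding site on
    the + strand is revcomp(rev). The product runs along the + strand from the
    forward primer's 5' end to the end of the reverse primer's site.
    """
    n = len(template)
    f = find_site(template, fwd, circular)
    s = find_site(template, revcomp(rev), circular)
    if f is None or s is None:
        return None
    end = s + len(rev)  # exclusive end of the product on the + strand
    if circular:
        # Walk clockwise from f to end - 1, wrapping around the circle if needed.
        return (end - 1 - f) % n + 1
    # Linear template: the reverse-primer site must lie downstream of the forward primer.
    return end - f if end >= f + len(fwd) else None


TOY = "ATGCGTACCTTAGGCATCAG"  # hypothetical 20-nt circular template

if __name__ == "__main__":
    # Back-to-back pair: the reverse primer's site ends where the forward primer starts.
    print(amplicon_length(TOY, "TAGGCA", "AGGTAC"))                  # 20 (whole circle)
    print(amplicon_length(TOY, "TAGGCA", "AGGTAC", circular=False))  # None (linear copy)
```

Doubling the end of the template is a simple way to handle binding sites that straddle the arbitrary coordinate where a circular sequence is cut for storage; real primers would also tolerate mismatches, which this exact-match sketch ignores.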

Data availability.

The sequences described above have been deposited in GenBank under accession numbers MW559599 and MW559600. The sequence data set in which both sequences were originally identified is available under BioProject number PRJNA390659.
References

1.  The perils of pathogen discovery: origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns.

Authors:  Samia N Naccache; Alexander L Greninger; Deanna Lee; Lark L Coffey; Tung Phan; Annie Rein-Weston; Andrew Aronsohn; John Hackett; Eric L Delwart; Charles Y Chiu
Journal: J Virol; Date: 2013-09-11

2.  The Perioperative Lung Transplant Virome: Torque Teno Viruses Are Elevated in Donor Lungs and Show Divergent Dynamics in Primary Graft Dysfunction.

Authors:  A A Abbas; J M Diamond; C Chehoud; B Chang; J J Kotzin; J C Young; I Imai; A R Haas; E Cantu; D J Lederer; K C Meyer; R K Milewski; K M Olthoff; A Shaked; J D Christie; F D Bushman; R G Collman
Journal: Am J Transplant; Date: 2016-11-04

3.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal: Nat Methods; Date: 2012-03-04

4.  Cressdnaviricota: a Virus Phylum Unifying Seven Families of Rep-Encoding Viruses with Single-Stranded, Circular DNA Genomes.

Authors:  Mart Krupovic; Arvind Varsani; Darius Kazlauskas; Mya Breitbart; Eric Delwart; Karyna Rosario; Natalya Yutin; Yuri I Wolf; Balázs Harrach; F Murilo Zerbini; Valerian V Dolja; Jens H Kuhn; Eugene V Koonin
Journal: J Virol; Date: 2020-06-01

5.  grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories.

Authors:  Louis J Taylor; Arwa Abbas; Frederic D Bushman
Journal: Bioinformatics; Date: 2020-06-01

6.  Redondoviridae, a Family of Small, Circular DNA Viruses of the Human Oro-Respiratory Tract Associated with Periodontitis and Critical Illness.

Authors:  Arwa A Abbas; Louis J Taylor; Marisol I Dothard; Jacob S Leiby; Ayannah S Fitzgerald; Layla A Khatib; Ronald G Collman; Frederic D Bushman
Journal: Cell Host Microbe; Date: 2019-05-08

7.  HMMER web server: interactive sequence similarity searching.

Authors:  Robert D Finn; Jody Clements; Sean R Eddy
Journal: Nucleic Acids Res; Date: 2011-05-18

8.  The sequence read archive.

Authors:  Rasko Leinonen; Hideaki Sugawara; Martin Shumway
Journal: Nucleic Acids Res; Date: 2010-11-09

9.  Profile hidden Markov models for the detection of viruses within metagenomic sequence data.

Authors:  Peter Skewes-Cox; Thomas J Sharpton; Katherine S Pollard; Joseph L DeRisi
Journal: PLoS One; Date: 2014-08-20

10.  Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments.

Authors:  Erik L Clarke; Louis J Taylor; Chunyu Zhao; Andrew Connell; Jung-Jin Lee; Bryton Fett; Frederic D Bushman; Kyle Bittinger
Journal: Microbiome; Date: 2019-03-22
