Austin R Manny, Carrie A Hetzel, Arshan Mizani, Max L Nibert.
Abstract
Trichomonas vaginalis is the most common non-viral cause of sexually transmitted infections globally. Infection by this protozoan parasite results in the clinical syndrome trichomoniasis, which manifests as an inflammatory disease with acute and chronic consequences. Half or more isolates of this parasite are themselves infected with one or more dsRNA viruses that can exacerbate the inflammatory syndrome. At least four distinct viruses have been identified in T. vaginalis to date, constituting species Trichomonas vaginalis virus 1 through Trichomonas vaginalis virus 4 in genus Trichomonasvirus. Despite the global prevalence of these viruses, few complete coding sequences have been reported. We conducted viral sequence mining in publicly available transcriptomes across 60 RNA-Seq accessions representing at least 13 distinct T. vaginalis isolates. The results led to sequence assemblies for 27 novel trichomonasvirus strains across all four recognized species. Using a strategy of de novo sequence assembly followed by taxonomic classification, we additionally discovered six strains of a newly identified fifth species, for which we propose the name Trichomonas vaginalis virus 5, also in genus Trichomonasvirus. These additional strains exhibit high sequence identity to each other, but low sequence identity to strains of the other four species. Phylogenetic analyses corroborate the species-level designations. These results substantially increase the number of trichomonasvirus genome sequences and demonstrate the utility of mining publicly available transcriptomes for virus discovery in a critical human pathogen.
Keywords: Totiviridae; dsRNA virus; protozoan virus; transcriptome mining; trichomonasvirus; virus discovery
Year: 2022 PMID: 35336955 PMCID: PMC8953718 DOI: 10.3390/v14030548
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Screens of SRA datasets at NCBI for TVV-matching sequence reads.
| | | | Sequence read counts from screen for: | | | | |
|---|---|---|---|---|---|---|---|
| BioProject | Inst. 1 | T. vaginalis isolate 2 | TVV1 | TVV2 | TVV3 | TVV4 | TVV5 |
| PRJNA176299 | HHUD | T016 | 0 | | 0 | 0 | 0 |
| PRJNA236636 | HHUD | T016 | 0 | | 0 | 0 | 0 |
| PRJNA280779 | NYU | BRIS/92/STDL/B7268 4 | | 4 | 0 | 0 | 17 |
| PRJNA280779 | NYU | GOR/03/PNGIMR/69 | | | 2 | 0 | 2 |
| PRJNA280779 | NYU | G3 | 2 | | | 1 | 4 |
| PRJNA280779 | NYU | NYCA04 | | 6 | | 2 | |
| PRJNA280779 | NYU | NYCB20 | 4 | 6 | 1 | 0 | 4 |
| PRJNA280779 | NYU | NYCC37 | | | | 3 | |
| PRJNA280779 | NYU | NYCD15 | | | | | |
| PRJNA280779 | NYU | NYCE32 | | 3 | | | |
| PRJNA280779 | NYU | NYCF20 | | 2 | | 0 | 6 |
| PRJNA280779 | NYU | NYCG31 | | 5 | | 0 | |
| PRJNA280779 | NYU | SD2 11591* | | 0 | | 0 | |
| PRJNA345042 | UU | B7RC2 | 0 | | | 0 | 0 |
| PRJNA345042 | UU | G3 | | 4 | 10 | 0 | 0 |
| PRJNA352855 | YU | T016 | 0 | 0 | 0 | 0 | 0 |
| Current study | HMS | G3 | 0 | | | 0 | 0 |
1 Institution: HHUD, Heinrich Heine University Düsseldorf; YU, Yonsei University; UU, University of Utah; NYU, New York University; HMS, Harvard Medical School.
2 As indicated in the metadata for the respective SRA accessions, including the asterisk in SD2 11591*.
3 Numbers in bold reflect new TVV strains.
4 SRA reads from this T. vaginalis isolate and a metronidazole-resistant mutant derived from it were combined for this analysis.
Figure 1. Maximum-likelihood phylogenetic tree of TVV1 through TVV5 strains. CP/RdRp aa sequences were deduced from the new TVV assemblies presented in this study (labeled in black) as well as from reference TVV genomes retrieved from NCBI GenBank (labeled in gray). Support values for the main branches are shown as percentages; above the branch is the value from standard bootstrapping with/without subsequent transfer analysis, and below the branch is the value from ultrafast bootstrapping with/without subsequent transfer analysis. The tree is rooted at the midpoint. Bars on the right highlight the five trichomonasvirus species. See Table 1 and Supplementary File S2 for explanations of TVV strain names.
Figure 2. Gap plots showing all indels across the aligned nt sequences of TVV1 through TVV5 strains. Indels are concentrated in the 5′ and 3′ UTRs, although gapless regions also found within the UTRs suggest conserved functional elements. For each trichomonasvirus species, new assemblies were combined with all coding-complete and partial sequences from NCBI GenBank and aligned using MAFFT L-INS-i. The unsequenced ends of partial sequences in the multiple sequence alignment were masked to prevent bias from missing residues. The alignment was analyzed with a custom R script. Gray boxes denote the CDS for the CP/RdRp of each species. Gap positions are indicated by red bars. Red numbers indicate the gap position nearest each CDS boundary; black numbers indicate the CDS boundary positions.
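The masking and gap-detection steps described in this caption can be sketched in a few lines. This is a minimal illustration in Python, not the authors' custom R script: the function names `mask_terminal_gaps` and `gap_columns`, the `?` mask character, and the rule that a column counts as a gap position when any sequence carries an internal gap there are all assumptions made for the example.

```python
from typing import Dict, List, Set


def mask_terminal_gaps(seq: str, gap: str = "-", mask: str = "?") -> str:
    """Replace leading and trailing gap runs with a mask character.

    Hypothetical helper: models the masking of unsequenced ends of
    partial sequences so they are not counted as indels.
    """
    without_lead = seq.lstrip(gap)
    lead = len(seq) - len(without_lead)
    core = without_lead.rstrip(gap)
    trail = len(without_lead) - len(core)
    return mask * lead + core + mask * trail


def gap_columns(alignment: Dict[str, str], partial: Set[str]) -> List[int]:
    """Return 1-based alignment columns holding at least one internal gap.

    `alignment` maps sequence names to aligned strings of equal length;
    `partial` names the sequences whose terminal gaps should be masked.
    """
    rows = [
        mask_terminal_gaps(seq) if name in partial else seq
        for name, seq in alignment.items()
    ]
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    return [i + 1 for i in range(length) if any(row[i] == "-" for row in rows)]


# Toy alignment (invented data, for illustration only).
toy = {"full": "ACGTACGT", "partial": "--GT-CGT", "other": "ACG-ACGT"}
masked_result = gap_columns(toy, partial={"partial"})
unmasked_result = gap_columns(toy, partial=set())
```

With masking, the two leading gaps of the partial sequence drop out and only the internal gap columns remain, which is the bias the caption says the masking is meant to prevent.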
Figure 3. Conservation plots for nt sequences of TVV1 through TVV5 strains. For each trichomonasvirus species, complete and partial coding sequences were retrieved from NCBI GenBank and combined with new assemblies that were coding-complete or nearly so. A sliding window of 15 nt was chosen for smoothing. The EDNAFULL substitution matrix was used, in which a score of 5 denotes perfect identity at a given position. Gray boxes denote the CDS for the CP/RdRp of each species. Red triangles denote positions of any indels within the CDS (also see Figure 2).
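To make the scoring and smoothing concrete, the following Python sketch computes a per-column conservation score and then smooths it with a centered 15-nt window. It is an approximation under stated assumptions, not the tool used for the figure: it scores only unambiguous uppercase A/C/G/T residues, uses 5 for a match (as the caption states) and -4 for a mismatch (the EDNAFULL value for unambiguous bases, assumed here), averages over all residue pairs in a column, ignores gaps, and shortens the window at the alignment ends. The names `column_scores` and `smooth` are hypothetical.

```python
from itertools import combinations
from typing import List

MATCH = 5      # identity score stated in the caption
MISMATCH = -4  # assumed mismatch score for unambiguous bases


def column_scores(rows: List[str]) -> List[float]:
    """Mean pairwise score per alignment column (gaps and ambiguity codes skipped)."""
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c in "ACGT"]
        pairs = list(combinations(residues, 2))
        if not pairs:
            # Fewer than two scorable residues: assign a neutral score.
            scores.append(0.0)
            continue
        total = sum(MATCH if a == b else MISMATCH for a, b in pairs)
        scores.append(total / len(pairs))
    return scores


def smooth(scores: List[float], window: int = 15) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy alignment (invented data, for illustration only).
toy_rows = ["ACGT", "ACGA", "ACGT"]
raw = column_scores(toy_rows)
smoothed = smooth(raw, window=3)
```

A fully conserved column scores 5.0, so plateaus at 5 in the smoothed curve correspond to stretches of perfect identity, while dips mark variable regions.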
Figure 4. Double stem–loop structures near the coding-strand 3′ termini of TVV2 through TVV5 strains. Secondary-structure predictions identified this conserved feature, shown here for representative strains TVV2-OC3, TVV3-OC3, TVV4-OC3, and TVV5-NYCE32. This feature extends to within 21 nt of the coding-strand 3′ terminus of each virus and consists of two adjoining stem–loops with no intervening nt residues.