Frits F J Franssen¹, Ingmar Janse¹, Dennis Janssen¹, Simone M Caccio², Paolo Vatta², Joke W B van der Giessen¹, Mark W J van Passel¹.
Abstract
Parasites often have complex developmental cycles that account for their presence in a variety of difficult-to-analyze matrices, including feces, water, soil, and food. Detecting parasites in these matrices still relies on laborious methods. Untargeted sequencing of nucleic acids extracted from these matrices in metagenomic projects may be an attractive alternative for unbiased detection of these pathogens. Here, we show how publicly available metagenomic datasets can be mined to detect parasite-specific sequences and to generate data useful for environmental surveillance. Using the protozoan parasite Cryptosporidium parvum as a test organism, we show that detection is influenced by the choice of reference sequence: using the whole genome yields high sensitivity but low specificity, whereas using signature sequences improves specificity. In conclusion, querying metagenomic datasets for parasites is feasible and relevant, but requires optimization and validation. Nevertheless, this approach provides access to the large and rapidly growing number of datasets from metagenomic and metatranscriptomic studies, unlocking hitherto unexploited signals of parasites in our environments.
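The trade-off between a whole-genome reference and signature sequences can be illustrated with a toy example. This is not the authors' signature-sequence pipeline; it assumes a simple subtraction approach in which signature k-mers are those present in the target sequence and absent from related (background) sequences. The sequences, `k`, and all function names are hypothetical.

```python
"""Toy sketch: whole-genome vs. signature k-mer references.

Illustrative only: a simple k-mer subtraction, not the authors'
signature-sequence pipeline. All sequences are made up.
"""


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(target: str, background: list[str], k: int) -> set[str]:
    """k-mers present in the target but absent from every background sequence."""
    sig = kmers(target, k)
    for other in background:
        sig -= kmers(other, k)
    return sig


def read_hits(read: str, reference: set[str], k: int) -> int:
    """Count how many of the read's distinct k-mers occur in the reference set."""
    return len(kmers(read, k) & reference)


if __name__ == "__main__":
    k = 5
    target = "ACGTACGGTTCAGT"    # hypothetical target-species fragment
    relative = "ACGTACGGTACCAT"  # hypothetical related species, shared prefix
    whole = kmers(target, k)
    sig = signature_kmers(target, [relative], k)

    read_from_relative = "ACGTACGGTA"
    # Whole-target reference: the relative's read matches (false positive).
    print(read_hits(read_from_relative, whole, k))  # prints 5
    # Signature reference: shared k-mers were removed, so nothing matches.
    print(read_hits(read_from_relative, sig, k))    # prints 0
```

The cost is sensitivity: the signature set is smaller than the full set of target k-mers, so reads from the target itself also hit fewer reference k-mers, which loosely mirrors the sensitivity/specificity trade-off described above.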
Keywords: Cryptosporidium parvum; environmental metagenomes; metagenome analyses; parasite detection; signature sequences
Year: 2021; PMID: 34276576; PMCID: PMC8278238; DOI: 10.3389/fmicb.2021.622356
Source DB: PubMed; Journal: Front Microbiol; ISSN: 1664-302X; Impact factor: 5.640
FIGURE 1. Conceptual layout of the metagenome pipeline for detecting parasites. The pipeline consists of three Python scripts (blue boxes) that receive input from databases (green cylinders) and two input files (ID_list and Query file), and that produce the output files mgsam, mglog, and an xlsx file. The Python script mg_downloader.py checks metagenome database sequence reads for completeness of sequence data and metadata. mg_aligner.py classifies the sequence reads in the database to species level. This can be done with Kraken-2, which gives an overview of the species that may be present in an environment, or, alternatively, reference sequences (orange box) may be used to mine the database for specific species, using, e.g., the K-mer Aligner (Kma) or the Burrows-Wheeler Aligner (BWA-MEM). Optionally, an independent pipeline may be used to identify specific (signature) sequences (yellow box), which can then serve as the reference sequence (Query file, orange box). During post-processing, retrieved reads may be confirmed (e.g., by BLASTn). All results are merged into two reports, mgsam and mglog, which mg_visualizer.py processes into an xlsx file.
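A post-processing step that tallies retrieved reads per reference could look roughly like the sketch below. It assumes the alignments are available as standard SAM text (the format BWA-MEM writes); the layout of the pipeline's own mgsam report is not described here, so `count_mapped` and the reference name `chr6` are hypothetical. Unmapped, secondary, and supplementary records are skipped using the SAM FLAG bits 0x4, 0x100, and 0x800, so each mapped read is counted once.

```python
"""Sketch: count primary mapped reads per reference name in SAM text.

Hypothetical helper; it assumes standard SAM records (e.g. BWA-MEM output)
rather than the pipeline's own mgsam report layout.
"""
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM format specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped(sam_lines: Iterable[str]) -> Counter:
    """Return a Counter of primary mapped reads per reference (RNAME)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each mapped read once, via its primary record
        counts[rname] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr6\tLN:1000",
        "r1\t0\tchr6\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",        # unmapped
        "r3\t16\tchr6\t50\t60\t5M\t*\t0\t0\tACGTA\tIIIII",  # reverse strand
        "r3\t256\tchr6\t80\t0\t5M\t*\t0\t0\tACGTA\tIIIII",  # secondary
    ]
    print(dict(count_mapped(example)))  # prints {'chr6': 2}
```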
Querying the MG-RAST database with the chromosome 6 sequence of the Cryptosporidium parvum Iowa II strain yielded two environments and five project numbers.
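| MG-RAST ID | Continent | Location | Environment | Total reads | Reads (n) | % of total | Reads (n) | % of total |
|---|---|---|---|---|---|---|---|---|
| 4622705.3 | South America | Brazil, Sao Paulo, Brazil | Fresh water | 26,668,719 | 13,612 | 0.05 | 7,055 | 0.03 |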
| 4537110.3 | North America | Canada, Edmonton | Calf mid jejunum | 167,401 | 402 | 0.24 | 267 | 0.16 |
| 4536848.3 | North America | Canada, Edmonton | Calf distal jejunum | 199,721 | 1,399 | 0.70 | 977 | 0.49 |
| 4536849.3 | North America | Canada, Edmonton | Calf ileum | 104,749 | 1,912 | 1.83 | 1,398 | 1.34 |
| 4537108.3 | North America | Canada, Edmonton | Calf distal jejunum | 147,855 | 4,382 | 2.92 | 3,199 | 2.16 |
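The percentage columns appear to express read counts as a share of each metagenome's total reads. A minimal sketch of that calculation, under this assumption (`percent_of_total` is a hypothetical helper):

```python
def percent_of_total(reads: int, total_reads: int, ndigits: int = 2) -> float:
    """Reads as a percentage of a metagenome's total reads, rounded for display.

    Hypothetical helper; assumes the table's % columns are computed this way.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100 * reads / total_reads, ndigits)


if __name__ == "__main__":
    print(percent_of_total(402, 167401))   # prints 0.24
    print(percent_of_total(1912, 104749))  # prints 1.83
```

For example, 402 of 167,401 reads gives 0.24 % and 1,912 of 104,749 reads gives 1.83 %, matching the corresponding entries for samples 4537110.3 and 4536849.3.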
| MG-RAST ID | Reads, taxid 5807/353152 | E-value (range) | Reads, taxid 353151 | E-value (range) | Reads, taxid 857276 | E-value (range) | Total reads |
|---|---|---|---|---|---|---|---|
| 4622705.3 | 0 | – | 1 | 5.40E-08 | 0 | – | 1 |
| 4537110.3 | 255 | 6.04E-81 (9.62E-92–4.36E-81) | 9 | 6.48E-75 (8.37E-86–2.20E-53) | 2 | – (5.58E-63–2.24E-48) | 266 |
| 4536848.3 | 944 | 6.04E-81 (9.62E-92–5.47E-40) | 24 | 3.92E-73 (3.33E-91–1.73E-40) | 7 | 9.16E-47 (2.48E-67–9.12E-20) | 975 |
| 4536849.3 | 1,351 | 1.45E-82 (9.62E-92–7.75E-42) | 32 | 1.18E-73 (1.16E-90–3.97E-50) | 15 | 7.20E-42 (5.40E-63–2.89E-20) | 1,398 |
| 4537108.3 | 3,095 | 4.19E-83 (9.62E-92–4.12E-39) | 84 | 2.42E-77 (9.62E-92–3.79E-42) | 19 | 3.52E-46 (4.23E-77–1.35E-23) | 3,198 |
Reads mapped to different Cryptosporidium species of MG project number 4537108.3 showed variable specificity.
| Taxid | E-value | E-value range | Reads (n) |
|---|---|---|---|
| 5807/353152 | 4.19E-83 | 9.62E-92–4.34E-48 | 153 |
| 353151 | 3.63E-81 | 9.62E-92–2.63E-65 | 28 |
| 5807/353152 | 4.22E-83 | 9.62E-92–4.12E-39 | 2608 |
| 353151 | 3.52E-75 | 2.05E-87–3.79E-42 | 32 |
| 857276 | 4.13E-45 | 8.09E-67–1.35E-23 | 14 |
| 5807/353152 | 1.01E-84 | 9.62E-92–2.52E-44 | 236 |
| 353151 | 1.89E-77 | 1.17E-90–1.05E-58 | 20 |
| 857276 | – | 4.23E-77–2.02E-74 | 3 |
| 5807/353152 | 4.19E-83 | 9.62E-92–2.13E-45 | 97 |
| 353151 | 9.19E-68 | 2.96E-85–1.34E-62 | 4 |
| 857276 | – | 1.69E-49–1.88E-35 | 2 |
| 5807/353152 | 4.19E-83 | – | 1 |
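Summing the per-taxid read counts in this table reproduces the row for sample 4537108.3 in the E-value table above (3,095, 84, and 19 reads; 3,198 in total). The check below assumes the final single-read row belongs to taxid 5807/353152; the labels that distinguish the row groups are not available, so the rows are simply listed in order. `ROWS` and `reads_per_taxid` are hypothetical names.

```python
from collections import defaultdict

# (taxid, reads) transcribed in row order from the table for 4537108.3.
# Assumption: the last single-read row belongs to taxid 5807/353152.
ROWS = [
    ("5807/353152", 153), ("353151", 28),
    ("5807/353152", 2608), ("353151", 32), ("857276", 14),
    ("5807/353152", 236), ("353151", 20), ("857276", 3),
    ("5807/353152", 97), ("353151", 4), ("857276", 2),
    ("5807/353152", 1),
]


def reads_per_taxid(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Sum read counts per taxid across all row groups."""
    totals = defaultdict(int)
    for taxid, n in rows:
        totals[taxid] += n
    return dict(totals)


if __name__ == "__main__":
    totals = reads_per_taxid(ROWS)
    print(totals)                # prints {'5807/353152': 3095, '353151': 84, '857276': 19}
    print(sum(totals.values()))  # prints 3198
```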
FIGURE 2. Query results with Cryptosporidium parvum Iowa II strain in MG-RAST sample ID 4537108.3 (project calf study digesta, mgp6020). (A) Query with the whole chromosome 6 sequence. At C. parvum strain level, 3095 reads were confirmed (see Table 2 for E-values). (B) Query with chromosome 6 signature sequences. Fewer reads were retrieved, because the reference sequences are much shorter than in (A), but the retrieved reads were more specific than in (A). Retrieved reads were verified by BLASTn. E-values and ranges: ID1, 9E-86 (4E-94–2E-49); ID2, 2E-76 (4E-94–2E-06); ID3, 3E-70 (4E-94–9E-16). Note that, for clarity, the two graphs are not drawn to scale.
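The BLASTn confirmation step could be approximated as below. This is a sketch under stated assumptions, not the authors' procedure: it assumes BLAST+ tabular output (`-outfmt 6` with default columns, where the 11th column is the E-value), keeps the lowest-E-value hit per read, and applies a hypothetical E-value cut-off of 1e-20. Read and subject identifiers are made up.

```python
"""Sketch: keep each read's best BLASTn hit and count confirmed reads per subject.

Assumes BLAST+ tabular output (-outfmt 6, default columns), where the
11th column (index 10) is the E-value. The cut-off is a hypothetical choice.
"""
from collections import Counter
from typing import Iterable

EVALUE_COL = 10  # 0-based index of 'evalue' in default -outfmt 6 output


def best_hits(lines: Iterable[str], max_evalue: float = 1e-20) -> dict[str, tuple[str, float]]:
    """Map each query read to its lowest-E-value subject at or below max_evalue."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[EVALUE_COL])
        if evalue > max_evalue:
            continue  # hit too weak to count as confirmation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def confirmed_counts(best: dict[str, tuple[str, float]]) -> Counter:
    """Number of confirmed reads per subject sequence."""
    return Counter(subject for subject, _ in best.values())


if __name__ == "__main__":
    hits = [  # hypothetical reads and subject IDs
        "read1\tCpar_chr6\t99.0\t150\t1\t0\t1\t150\t100\t249\t4e-70\t270",
        "read1\tChom_chr6\t95.0\t150\t7\t0\t1\t150\t100\t249\t2e-60\t240",
        "read2\tChom_chr6\t98.0\t150\t3\t0\t1\t150\t500\t649\t1e-65\t260",
        "read3\tCpar_chr6\t90.0\t60\t6\t0\t1\t60\t10\t69\t1e-5\t50",
    ]
    print(dict(confirmed_counts(best_hits(hits))))  # prints {'Cpar_chr6': 1, 'Chom_chr6': 1}
```

Keeping only the best hit per read avoids counting one read twice when it aligns to both a C. parvum and a related reference, which is the situation the specificity comparison above is concerned with.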