Allyson L Byrd, Joseph F Perez-Rogers, Solaiappan Manimaran, Eduardo Castro-Nallar, Ian Toma, Tim McCaffrey, Marc Siegel, Gary Benson, Keith A Crandall, William Evan Johnson.
Abstract
BACKGROUND: The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens.
Year: 2014 PMID: 25091138 PMCID: PMC4131054 DOI: 10.1186/1471-2105-15-262
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation study alignment statistics using optimal model parameters
| Aligner | Human: Time (min) | Human: Sensitivity | Human: Specificity | Virus: Time (min) | Virus: Sensitivity | Virus: Specificity | Bacteria: Time (min) | Bacteria: Sensitivity | Bacteria: Specificity |
|---|---|---|---|---|---|---|---|---|---|
| Bowtie2 | 8.2 ± 0.0 | 90.2 ± 0.0 | 100.0 ± 0.0 | 3.3 ± 0.6 | 98.1 ± 0.6 | 99.8 ± 0.2 | 15.8 ± 1.6 | 79.8 ± 0.1 | 100.0 ± 0.0 |
| BWA | 22.8 ± 3.2 | 89.9 ± 0.0 | 100.0 ± 0.0 | 6.5 ± 1.4 | 76.8 ± 5.4 | 99.8 ± 0.2 | - | - | - |
| SOAP2 | 5.7 ± 1.6 | 76.7 ± 0.0 | 100.0 ± 0.0 | 3.9 ± 0.8 | 50.3 ± 5.4 | 99.9 ± 0.1 | 23.3 ± 2.2 | 27.7 ± 0.0 | 100 ± 0.0 |
| PBLAT | 61.2 ± 6.8 | 78.2 ± 0.0 | 100.0 ± 0.0 | 16.7 ± 1.3 | 99.8 ± 0.1 | 99.6 ± 0.2 | 306.3 ± 23.3 | 98.9 ± 0.0 | 52.7 ± 0.0 |
Each aligner was used to align the first set of five simulated sequencing samples (10 million 100 base-pair reads) against each of the three genome libraries using optimal parameters. The average run time, sensitivity, and specificity as well as confidence intervals for each alignment are reported. BWA failed to run to completion with the bacterial library.
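The record does not spell out how sensitivity and specificity were computed for each alignment. One common reading, assumed here, is that sensitivity is the fraction of reads simulated from a library that the aligner maps to it, and specificity is the fraction of reads from other sources that the aligner leaves unmapped. The sketch below illustrates that reading only; the function name, its arguments, and the toy reads are hypothetical and not taken from the paper.

```python
def sensitivity_specificity(true_source, aligned_ids, library):
    """Compute (sensitivity, specificity) for one aligner/library pair.

    true_source: dict mapping read id -> label of the genome set it was simulated from
    aligned_ids: set of read ids the aligner reported as aligned to `library`
    library:     label of the library being evaluated (e.g. "virus")
    """
    tp = fn = tn = fp = 0
    for read_id, source in true_source.items():
        from_library = (source == library)
        was_aligned = read_id in aligned_ids
        if from_library and was_aligned:
            tp += 1      # library read that was aligned
        elif from_library:
            fn += 1      # library read that was missed
        elif was_aligned:
            fp += 1      # foreign read wrongly aligned
        else:
            tn += 1      # foreign read correctly left unaligned
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (made-up reads): four viral and four human reads.
toy_truth = {
    "v1": "virus", "v2": "virus", "v3": "virus", "v4": "virus",
    "h1": "human", "h2": "human", "h3": "human", "h4": "human",
}
toy_aligned = {"v1", "v2", "v3"}  # reads reported as viral hits
print(sensitivity_specificity(toy_truth, toy_aligned, "virus"))
```

Under this reading, an aligner can reach high sensitivity while losing specificity if it also maps reads from other sources to the library.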
Figure 1. Clinical PathoScope pipeline. A computational subtraction method using varying sequence read lengths and ambiguous read reassignment. Unassembled sequencing reads are aligned against a target library containing reference sequences of the intended target(s) of identification (e.g. viruses). Reads that align to the target library are then aligned to a host library, and any reads that align to host sequences are removed from further analysis. Next, the remaining reads are aligned against a library of known non-target sequences. Reads that remain unaligned are then mapped back to the target library, allowing up to k alignments per read (e.g. k = 10). These alignments are passed to an expectation maximization algorithm, which reassigns ambiguous alignments to their most probable genome of origin. After reassignment, a report detailing the pathogens identified and their relative abundances is produced.
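The steps in this caption can be sketched in a few lines. The code below is a simplified illustration, not the Clinical PathoScope implementation: the alignment steps are replaced by precomputed sets of read ids, the per-alignment weights are assumed inputs, and the reassignment step is a generic mixture-proportion EM rather than the tool's actual statistical model. All function names and data are hypothetical.

```python
def subtract_reads(reads, target_hits, host_hits, nontarget_hits):
    """Keep reads that hit the target library but neither host nor non-target."""
    candidates = [r for r in reads if r in target_hits]           # step 1: target screen
    candidates = [r for r in candidates if r not in host_hits]    # step 2: host subtraction
    return [r for r in candidates if r not in nontarget_hits]     # step 3: non-target subtraction


def keep_top_k(hits, k=10):
    """Retain at most k alignments per read, preferring higher weights."""
    best = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(best)


def em_reassign(alignments, n_iter=50):
    """Estimate genome proportions and assign each read to its most probable genome.

    alignments: dict read id -> dict genome -> positive alignment weight (assumed given).
    """
    genomes = sorted({g for hits in alignments.values() for g in hits})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # start from uniform proportions
    n_reads = len(alignments)
    for _ in range(n_iter):
        counts = dict.fromkeys(genomes, 0.0)
        # E-step: split each read across its candidate genomes.
        for hits in alignments.values():
            total = sum(pi[g] * w for g, w in hits.items())
            for g, w in hits.items():
                counts[g] += pi[g] * w / total
        # M-step: proportions are the expected share of reads per genome.
        pi = {g: counts[g] / n_reads for g in genomes}
    assignment = {
        read: max(hits, key=lambda g: pi[g] * hits[g])
        for read, hits in alignments.items()
    }
    return pi, assignment


# Toy run: r2 is host, r3 is non-target, r5 never hits the target library.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
kept = subtract_reads(reads, {"r1", "r2", "r3", "r4", "r6"}, {"r2"}, {"r3"})
# r6 aligns equally well to genomes A and B; r1 and r4 align only to A.
toy_alignments = {"r1": {"A": 1.0}, "r4": {"A": 1.0}, "r6": {"A": 1.0, "B": 1.0}}
toy_alignments = {r: keep_top_k(toy_alignments[r], k=10) for r in kept}
proportions, assignment = em_reassign(toy_alignments)
print(kept, assignment)
```

In the toy run, the ambiguous read is pulled toward the genome that already has unique support, which is the intuition behind reassigning ambiguous alignments to their most probable genome of origin.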
Run time comparisons of Clinical PathoScope and existing technologies
| Dataset | Target | Clinical PathoScope (min) | RINS (min) | READSCAN (min) |
|---|---|---|---|---|
| Simulation | Virus | 4.5 | 84.1 | 193.58 |
| Simulation | Bacteria | 13.1 | 1108.2 | |
| PCCL | Virus | 6.0 | 89.1 | 52.8 |
| TMAdv | Virus | 4.4 | 144.0 | 78.6 |
| Mummy | Bacteria | 25.0 | 1099 | 882 |
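To put these run times on a common scale, one option is to express each competing tool's time as a multiple of the Clinical PathoScope time for the same dataset. The short sketch below does that arithmetic with the values from the table; the variable and function names are illustrative, and the missing READSCAN entry is kept as `None` rather than filled in.

```python
# (dataset, target, Clinical PathoScope, RINS, READSCAN), average minutes from the table.
run_times = [
    ("Simulation", "Virus", 4.5, 84.1, 193.58),
    ("Simulation", "Bacteria", 13.1, 1108.2, None),  # READSCAN value not reported
    ("PCCL", "Virus", 6.0, 89.1, 52.8),
    ("TMAdv", "Virus", 4.4, 144.0, 78.6),
    ("Mummy", "Bacteria", 25.0, 1099, 882),
]


def fold_slower(baseline_minutes, other_minutes):
    """How many times longer the other tool ran; None when no time was reported."""
    if other_minutes is None:
        return None
    return other_minutes / baseline_minutes


for dataset, target, pathoscope, rins, readscan in run_times:
    rins_fold = fold_slower(pathoscope, rins)
    readscan_fold = fold_slower(pathoscope, readscan)
    readscan_text = "n/a" if readscan_fold is None else f"{readscan_fold:.1f}x"
    print(f"{dataset:<10} {target:<8} RINS {rins_fold:.1f}x  READSCAN {readscan_text}")
```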
Figure 2. Alignment variations with and without TMAdv in the target library. A) Without the TMAdv in the target library, Clinical PathoScope assigned reads to several adenovirus genomes. The identified genomes are displayed according to the proportion of total reads aligned to all adenovirus genomes. The pairwise nucleotide identities of several adenovirus subtypes to the TMAdv genome according to Chen et al. are given in parentheses. Simian adenovirus 3 had the most aligned reads of all adenoviral genomes, which is consistent with its sequence similarity to the TMAdv. Additionally, Human adenovirus D had the most aligned reads of all human adenoviruses, which is consistent with the analysis of Chen et al. B) Inclusion of the Titi Monkey Adenovirus (TMAdv) in the target library resulted in the assignment of 12,568 reads to the TMAdv reference genome.
Clinical PathoScope performance on the 16S amplimer dataset
| Accession | Sample type | Species identified by Clinical PathoScope | Reads assigned (%) |
|---|---|---|---|
| SRR949994 | | | 3,479 (98.0) |
| | | | 36 (1.0) |
| SRR949995 | | | 2,351 (89.8) |
| | | | 139 (5.3) |
| | | | 44 (1.7) |
| | | | 42 (1.6) |
| SRR949996 | | | 5,661 (82.3) |
| | | | 1,021 (14.9) |
| SRR949997 | | | 4,169 (94.7) |
| | | | 66 (1.6) |
| SRR949998 | Mixture of | | 14,280 (31.9) |
| | | | 14,306 (31.9) |
| | | | 8,771 (19.6) |
| | | | 6,594 (14.8) |
| SRR950015 | Clinical Sample (F1) | | 4,889 (59.4) |
| | | | 3,177 (38.7) |
| SRR950024 | Clinical Sample (G1) | | 1,131 (94.5) |
| | | | 45 (3.8) |
| SRR950025 | Clinical Sample (H1) | | 587 (85.9) |
| | | | 18 (2.6) |
| | | | 19 (2.8) |
| | | | 18 (2.6) |
| | | | 9 (1.3) |
| | | | 10 (1.5) |
| | | | 8 (1.2) |