Martin Laurence, Christos Hatzis, Douglas E Brash.
Abstract
Unbiased high-throughput sequencing of whole metagenome shotgun DNA libraries is a promising new approach to identifying microbes in clinical specimens, which, unlike other techniques, is not limited to known sequences. Unlike most sequencing applications, it is highly sensitive to laboratory contaminants as these will appear to originate from the clinical specimens. To assess the extent and diversity of sequence contaminants, we aligned 57 "1000 Genomes Project" sequencing runs from six centers against the four largest NCBI BLAST databases, detecting reads of diverse contaminant species in all runs and identifying the most common of these contaminant genera (Bradyrhizobium) in assembled genomes from the NCBI Genome database. Many of these microorganisms have been reported as contaminants of ultrapure water systems. Studies aiming to identify novel microbes in clinical specimens will greatly benefit from not only preventive measures such as extensive UV irradiation of water and cross-validation using independent techniques, but also a concerted effort to sequence the complete genomes of common contaminants so that they may be subtracted computationally.
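The computational subtraction mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes that reads have already been aligned against a database of contaminant genomes and that the hits are available in BLAST tabular format (`-outfmt 6`, where column 1 is the query ID, column 3 the percent identity, and column 4 the alignment length). The function names, thresholds, and toy data are hypothetical and are not taken from the paper.

```python
import csv
import io


def contaminant_read_ids(blast_tsv, min_identity=95.0, min_length=50):
    """Return IDs of reads with at least one strong hit to a contaminant.

    blast_tsv is any iterable of lines in BLAST tabular format (outfmt 6).
    The identity and length thresholds are illustrative assumptions.
    """
    flagged = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident, length = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            flagged.add(qseqid)
    return flagged


def subtract_contaminants(reads, flagged):
    """Keep only the reads whose IDs were not flagged as contaminants."""
    return {rid: seq for rid, seq in reads.items() if rid not in flagged}


# Toy data (hypothetical): read1 has a strong contaminant hit, read2 a weak one.
hits = io.StringIO(
    "read1\tcontamA\t99.0\t100\t1\t0\t1\t100\t5\t104\t1e-50\t180\n"
    "read2\tcontamA\t80.0\t100\t20\t0\t1\t100\t5\t104\t1e-5\t60\n"
)
reads = {"read1": "ACGT", "read2": "GGCC", "read3": "TTAA"}
flagged = contaminant_read_ids(hits)
clean = subtract_contaminants(reads, flagged)
```

A subtraction of this kind is only as complete as the contaminant database behind it, which is why the abstract calls for sequencing the complete genomes of common contaminants.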
Year: 2014 PMID: 24837716 PMCID: PMC4023998 DOI: 10.1371/journal.pone.0097876
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The contents of non-aligning reads from 57 human whole genome sequencing runs.
Categories are defined in Methods. Sequencing center acronyms in this table are: Baylor College of Medicine (BCM), the Broad Institute (BI), Illumina (ILLUM), the Max Planck Institute for Molecular Genetics (MPIMG), the Sanger Center (SC), and Washington University Genome Sequencing Center (WUGSC). Runs are sorted alphabetically by center, then by SRA number; SRA numbers are assigned sequentially over time. Units are read pairs.
GenBank entries that appear to be contaminated.
| Pair | Length | Sequencing center | Sequencing technology |
|---|---|---|---|
| 1 | 18,674 | Beijing Genomics Institute | Illumina Genome Analyzer |
| 1 | 329,545 | Unknown | Illumina MiSeq |
| 2 | Unknown | Canada’s Michael Smith Genome Sciences Centre | Illumina HiSeq 2000 |
| 2 | Complete | Unknown | 454 GS FLX Titanium |
| 3 | 84,429 | J. Craig Venter Institute | 454 GS FLX Titanium and Illumina |
| 3 | 141,525 | Unknown | Illumina HiSeq 2000 |
| 4 | Unknown | Celera | PE Biosystems ABI Prism |
| 4 | 141,525 | Unknown | Illumina HiSeq 2000 |
| 5 | 685 | Unknown | 454 GS FLX |
| 5 | 141,525 | Unknown | Illumina HiSeq 2000 |
| 6 | 35,272 | J. Craig Venter Institute | 454 GS FLX Titanium and Illumina |
| 6 | 141,525 | Unknown | Illumina HiSeq 2000 |
| 7 | 9,773 | Broad Institute | 454; ABI |
| 7 | 141,525 | Unknown | Illumina HiSeq 2000 |
Seven sequences that specifically match both the Bradyrhizobium genus and a GenBank entry of a eukaryote. Two hundred thousand randomly selected Bradyrhizobium sequences were aligned against eukaryotes in the NCBI BLAST databases, so this search is indicative rather than exhaustive. The double match appears to result from Bradyrhizobium contamination introduced before de novo assembly of the eukaryote genome, since it is unlikely that different parts of the Bradyrhizobium genome would be conserved in select eukaryotes.
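The double-match idea can be sketched in a few lines: a sequence is of interest when it has a strong hit in both a Bradyrhizobium hit list and a eukaryote hit list. The sketch below assumes each hit has already been reduced to a `(query ID, subject ID, bit score)` tuple. The function names, the bit-score threshold, and all identifiers in the toy data are hypothetical and do not come from the paper.

```python
def best_hits(rows):
    """Map each query ID to its highest-scoring (subject ID, bit score) hit."""
    best = {}
    for qseqid, sseqid, bitscore in rows:
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best


def double_matches(brady_rows, euk_rows, min_bitscore=200.0):
    """Return queries whose best hit is strong in BOTH hit lists.

    The result maps query ID -> (Bradyrhizobium subject, eukaryote subject).
    min_bitscore is an illustrative threshold, not a value from the paper.
    """
    brady = best_hits(brady_rows)
    euk = best_hits(euk_rows)
    matches = {}
    for qseqid in brady.keys() & euk.keys():
        b_subject, b_score = brady[qseqid]
        e_subject, e_score = euk[qseqid]
        if b_score >= min_bitscore and e_score >= min_bitscore:
            matches[qseqid] = (b_subject, e_subject)
    return matches


# Toy data (hypothetical identifiers and scores).
brady_rows = [
    ("seq1", "brady_contig_A", 350.0),
    ("seq1", "brady_contig_B", 120.0),
    ("seq2", "brady_contig_A", 400.0),
]
euk_rows = [
    ("seq1", "euk_scaffold_X", 310.0),
    ("seq3", "euk_scaffold_Y", 500.0),
]
result = double_matches(brady_rows, euk_rows)
```

Only `seq1` appears in both lists with a strong best hit, so it is the only candidate flagged. A real analysis would also need to check that the match is specific to the genus, which this sketch does not attempt.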