Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples.

Nezar Noor Al-Hebshi¹, Akram Thabet Nasher², Ali Mohamed Idris³, Tsute Chen⁴.

Abstract

BACKGROUND: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level.
OBJECTIVE: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues.
METHODS: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT and finally GGG. Unmatched reads were subject to operational taxonomic unit analysis.
RESULTS: Of the reads, 92.8% were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodonticum were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples.
CONCLUSION: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the human, oral reference set. Applying the algorithm to OSCC samples revealed high diversity. In addition to oral taxa, a number of human, non-oral taxa were also identified, some of which are rarely detected in the oral cavity.

Keywords: OSCC; bacteria; cancer; next-generation sequencing; pyrosequencing; taxonomy

Year: 2015 PMID: 26426306 PMCID: PMC4590409 DOI: 10.3402/jom.v7.28934

Source DB: PubMed Journal: J Oral Microbiol ISSN: 2000-2297 Impact factor: 5.474

Interest in the potential role of bacteria in the development of oral cancer has increased recently (1). This trend is driven by existing evidence of an association between certain bacterial species and some types of cancer. The etiological role of Helicobacter pylori in gastric adenocarcinomas and lymphomas is a classic example (2). Other examples include the association of Chlamydia trachomatis with cervical cancer (3), Salmonella typhi with gallbladder cancer (4), and Bacteroides fragilis and Fusobacteria with colon cancer (5, 6). Mechanisms by which bacteria are thought to contribute to the development of cancer include induction of chronic inflammation, interference with the eukaryotic cell cycle, and production of carcinogenic substances (7). Indeed, the oral microbiota has been demonstrated to produce carcinogenic levels of acetaldehyde (8). Bacteria associated with oral squamous cell carcinoma (OSCC) have been assessed in several studies using various methods with different types of specimens. Culture techniques were first used to characterize bacteria on the surface of OSCC lesions (9), and later to document the presence of viable bacteria within OSCC tissues (10). Molecular techniques such as checkerboard DNA–DNA hybridization and clonal analysis of 16S rRNA have been employed to profile and compare bacterial species in tissue or saliva samples from OSCC and control subjects (11–14). While these studies identified several bacterial taxa in association with OSCC lesions or as potential markers in saliva, there seems to be no consensus among them on particular species to link to oral cancer. One possible reason is that cultivation and clonal analysis are limited by the number of strains/clones that can feasibly be tested, which makes reproducible detection of potentially relevant taxa, particularly low-abundance ones, unlikely.
The advent of high-throughput, next-generation sequencing (NGS) techniques, such as pyrosequencing, has enabled analysis of microbial communities at significantly higher depth and coverage than classical Sanger sequencing (15). Indeed, two recent studies have employed NGS to assess the bacteriome associated with OSCC (16, 17). However, these studies have used either saliva or surface swab samples but not cancerous tissue for testing. In addition, both studies employed the typical analysis approach that involves clustering of reads into operational taxonomic units (OTUs), using a Bayesian classifier or BLAST to assign taxonomies to representative OTU sequences and describing/comparing microbial composition at the phylum and genus levels, without the capability to accurately classify individual reads to the species level, which is probably more relevant to addressing the link between bacteria and oral cancer (or any other disease). In fact, while OTU analysis and taxonomic classification to the genus level may be justified for less characterized microbial communities such as that of the soil, it is probably not for well-characterized ones like those associated with humans (18), for which well-curated databases of reference 16S rRNA gene sequences such as the Human Oral Microbiome Database (HOMD; www.homd.org) (19) and the Greengene databases (www.greengenes.lbl.gov) (20) are available. These databases do not seem to have been adequately exploited for improving the resolution of taxonomic assignment of microbial metagenomic 16S rRNA reads despite the increase in reads length obtained with NGS technologies. The objective of this work therefore was to develop a robust, multi-stage, BLASTN-based search algorithm for classification of NGS reads from oral microbiological samples to the species level and to pilot test it for characterizing bacterial species/phylotypes within OSCC tissues. 
The algorithm takes advantage of three 16S rRNA reference sequence databases which were further curated in this study by removing potentially chimeric and redundant sequences and refining the associated taxonomy annotations.

Methods

OSCC DNA samples

Three samples were randomly selected from among 60 archived DNA extracts obtained from fresh OSCC biopsies in a previous study (21). All extracts had tested HPV-negative by q-PCR and had been stored at −80°C. The clinical features of the three cases selected retrospectively for the study are presented in Table 1.

Table 1

Clinical characteristics of the OSCC cases included in the study

Case no.	Site affected	Gender	Age (years)	Snuff dipping	Smoking
1	Floor of the mouth	Female	54	Yes	No
2	Gum	Male	45	Yes	Yes
3	Other and unspecified parts of the mouth	Female	55	No	No

Amplicon library preparation and sequencing

Library preparation and sequencing were done at GATC Biotech (Konstanz, Germany). In a first PCR reaction, the V1-V3 region of the 16S rRNA gene was amplified with the degenerate primers 27FYM (22) and 519R (23) using the reaction setup and cycling program described by Kistler et al. (24), with the following modification: a second PCR reaction with few cycles was used to incorporate the GS FLX titanium adaptors FLX-A and FLX-B along with a 22-base spacer and 5-base barcodes into the amplicons. The final forward primer construct used in the second reaction was [FLX-A]-[22-base spacer]-[5-base tag]-27FYM, while the reverse construct was [FLX-B]-[22-base spacer]-519R. The three tagged amplicon libraries were pooled with another 16 libraries (from another study) in equimolar amounts and sequenced unidirectionally (side A) on a quarter plate using 454 GS FLX chemistry (Roche, Germany).

Preprocessing of sequencing data

The raw data were submitted to the Sequence Read Archive (SRA) under project accession number SRA204252. Data preprocessing was performed using the mothur software package version 1.33 (25). To minimize sequencing error rates, reads with any mismatch in the spacer–tag–primer sequence, any base ambiguity, or homopolymers longer than eight bases were excluded, and the remaining reads were trimmed so as to maintain a 50-nucleotide sliding window with an average quality score of ≥30 (26). Subsequently, the spacer–tag–primer sequence was trimmed off, and the reads were filtered to include only those with a minimum read length of 350 bases, since a read length of 350–500 bases is required for identification (27). These were aligned to the SILVA reference alignment (28), and those with poor alignment were removed. The rest were stringently screened for chimeras with Uchime (29) and Chimera Slayer (30) sequentially, using both SILVA gold (30) (downloaded from www.mothur.org/wiki/Silva_reference_files) and, for the first time, updated-HOMD 13.2 (see the following sections) reference sequences combined as the reference set.
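The read-level filters described above can be illustrated with a minimal Python sketch. It is not the mothur implementation used in the study: the function names are hypothetical, the window rule (keep bases up to the end of the last window whose mean quality is still ≥30) is only one simple interpretation of sliding-window trimming, and the order of steps is simplified.

```python
import re

AMBIGUOUS = re.compile(r"[^ACGT]")           # any base other than A, C, G, T
LONG_HOMOPOLYMER = re.compile(r"(.)\1{8,}")  # 9 or more identical bases in a row


def trim_by_quality_window(seq, quals, window=50, min_avg=30):
    """Truncate a read once a sliding window's mean quality drops below min_avg.

    Assumption: bases are kept up to the end of the last passing window;
    mothur's own trimming rule may differ in detail.
    """
    if len(seq) <= window:
        ok = bool(quals) and sum(quals) / len(quals) >= min_avg
        return (seq, quals) if ok else ("", [])
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            end = start + window - 1 if start > 0 else 0
            return seq[:end], quals[:end]
    return seq, quals


def passes_filters(seq, min_len=350):
    """Keep reads of at least min_len bases with no ambiguity or long homopolymer."""
    seq = seq.upper()
    return (len(seq) >= min_len
            and not AMBIGUOUS.search(seq)
            and not LONG_HOMOPOLYMER.search(seq))


# Toy read: 60 high-quality bases followed by 40 low-quality bases.
trimmed_seq, trimmed_quals = trim_by_quality_window("A" * 100, [40] * 60 + [10] * 40)
print(len(trimmed_seq))
print(passes_filters("ACGT" * 100), passes_filters("ACGT" * 50))
```

In this toy case the read is cut inside the low-quality tail, and a 200-base read fails the 350-base length filter while a 400-base read passes.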

Optimization of reference databases

For taxonomic assignment of reads (see the following sections), three sets of 16S rRNA gene reference sequences were used: HOMD version 13.2 (downloaded from www.homd.org/index.php?name=seqDownload&file&type=R), HOMD extended version 1.1 (downloaded from www.homd.org/index.php?name=seqDownload&file&type=R), and Greengene Gold (GGG; downloaded from www.greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/). Due to concerns about the reliability of some reference sequences in HOMD 13.2 and HOMD extended, both sets were checked for and cleared of potential chimeric sequences with Uchime and Chimera Slayer, which resulted in removal of 27 and 172 sequences from HOMD 13.2 and HOMD extended, respectively. In addition, sequences in HOMD extended with better representatives in HOMD 13.2 (match at ≥98%) were removed (Floyd Dewhirst, personal communication), resulting in a final set of 495 trusted sequences (trusted-HOMDEXT). Finally, full 16S rRNA sequences of 21 novel oral taxa recently described by Camanocha and Dewhirst (31) were added to HOMD 13.2 (referred to hereafter as updated-HOMD 13.2). GGG was also modified to include only aligned and non-redundant sequences (3,940 out of 5,441). To obtain full taxonomy annotations for these, sequences were first classified with the Wang method (32) using the 2013 greengene reference taxonomy (~202,000 taxa); the resultant classifications were then combined with the binary names (Genus species/strain no.) provided with the GGG set, which yielded full taxonomy annotations without conflict for the majority of the sequences. For a subset, an additional search in Greengene and NCBI was necessary to arrive at the right taxonomy; many of the strains initially unnamed in the GGG set have recently been named and reclassified, so the taxonomy was updated accordingly.
The fasta and taxonomy files for the three sets (updated-HOMD 13.2, trusted-HOMDEXT, and modified-GGG), as well as lists of potential chimeras in HOMD and a detailed description of how they were identified can be obtained from ftp://www.homd.org/publication_data/20150120/.

Taxonomy assignment algorithm

The high-quality, non-chimeric reads were classified using a prioritized, multi-stage, BLASTN-based search against updated-HOMD 13.2, trusted-HOMDEXT, and modified-GGG as shown in Fig. 1. The ‘blastn’ command in the Nucleotide-Nucleotide BLAST 2.2.30+ was used with the following parameters: max. target seqs., 1000; penalty, -5; reward, +4; opengap, -5; gapextend, -5; outfmt, 6. For each read, hits to the reference sequences with both alignment coverage (BLASTN alignment length/read length) and identity (matches/alignment length) ≥98% were collected and ordered first by bit score then by percentage identity. Reads with a single best hit or multiple best hits (with equal bit score and identity) representing the same species/phylotype were assigned to the unique species-level taxonomy. Unmatched reads (i.e. reads with no hits at ≥98% alignment coverage and identity) and reads with best hits to multiple species were forwarded to the next stage of BLASTN search that included an additional set of reference sequences.
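The per-read decision rule described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the `Hit` record, the function names, and the example hits are hypothetical, and it assumes that the hits for each stage have already been parsed from BLASTN tabular output together with the read length. How the reference sets are combined at each later stage is also an assumption here; the sketch simply walks the stages in priority order.

```python
from collections import namedtuple

# Hypothetical record for one BLASTN hit of a read against a reference sequence.
Hit = namedtuple("Hit", "species pident aln_len bitscore")


def best_species(hits, read_len, min_cov=98.0, min_id=98.0):
    """Return the species represented among the top qualifying hits.

    Coverage is alignment length / read length; identity is the percentage
    of matches over the alignment, both required to be >= 98%.
    """
    kept = [h for h in hits
            if 100.0 * h.aln_len / read_len >= min_cov and h.pident >= min_id]
    if not kept:
        return set()
    # Order by bit score first, then by percentage identity.
    top = max((h.bitscore, h.pident) for h in kept)
    return {h.species for h in kept if (h.bitscore, h.pident) == top}


def classify(read_len, hits_by_stage):
    """Walk the stages in priority order; stop at the first unique species."""
    tied = set()
    for db_name, hits in hits_by_stage:
        species = best_species(hits, read_len)
        if len(species) == 1:
            return db_name, next(iter(species))
        if species:
            tied = species  # best hits to multiple species: forward the read
    return None, tied       # unmatched (empty set) or still ambiguous


# Made-up hits for a 416-base read.
stage1 = [Hit("Streptococcus mitis", 99.5, 410, 700.0),
          Hit("Streptococcus oralis", 99.5, 410, 700.0)]
stage2 = [Hit("Streptococcus mitis", 99.8, 412, 720.0),
          Hit("Veillonella parvula", 99.0, 300, 500.0)]
result = classify(416, [("updated-HOMD 13.2", stage1),
                        ("trusted-HOMDEXT", stage2)])
print(result)
```

In this toy example the first stage returns a tie between two species, so the read is forwarded; the second stage yields a single best hit, and the low-coverage hit (300 of 416 bases) is ignored.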

Fig. 1

Prioritized, multi-stage, BLASTN search algorithm used for taxonomic assignment of the reads. Refer to the text for a description. Updated-HOMD 13.2, Human Oral Microbiome Database version 13.2 updated by removal of potential chimeric sequences and addition of new taxa; trusted-HOMDEXT, HOMD extended after clearing chimeric and redundant sequences; modified-GGG, Greengene Gold collection after removing unaligned and redundant sequences.

In the third stage, reads that hit multiple species were classified into either the ‘species-group’ level (for consistent species combinations, for example, Neisseria flavescens/subflava) or genus level (for inconsistent species combinations). Reads returning no hits from searching against all three reference sets were subjected to OTU analysis (stage 4). Reads were clustered to OTUs at 98% identity using average neighborhood. OTUs with ≤3 sequences were removed (rare OTUs), while the sequence with the smallest maximum distance to other sequences in each of the remaining OTUs was selected as a representative. The representative sequences were then BLASTN searched using the same coverage and identity criteria described above against NCBI's bacterial 16S rRNA sequences and nucleotide collection, and if any returned a hit, the corresponding OTU was assigned the species-level taxonomy of the returned hit. Finally, unmatched OTUs were labeled as potentially novel taxa, and classified to a higher rank (genus or family) using the Wang method and SILVA sequences as the reference.
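The rare-OTU filter and the choice of a representative sequence in stage 4 can be illustrated with a short Python sketch. The function name, the toy distance function, and the read identifiers are hypothetical; in the study these steps were carried out on real pairwise sequence distances.

```python
def pick_representatives(otus, distance, max_rare_size=3):
    """Drop rare OTUs and pick one representative read per remaining OTU.

    otus: dict mapping an OTU id to a list of read ids.
    distance: function (read_a, read_b) -> pairwise distance.
    The representative is the read whose largest distance to any other
    member of the OTU is smallest.
    """
    reps = {}
    for otu_id, members in otus.items():
        if len(members) <= max_rare_size:  # OTUs with <=3 sequences are rare
            continue

        def max_distance(candidate):
            return max(distance(candidate, other)
                       for other in members if other != candidate)

        reps[otu_id] = min(members, key=max_distance)
    return reps


# Toy data: reads placed on a line so that distances are easy to verify.
positions = {"r1": 0.00, "r2": 0.01, "r3": 0.02, "r4": 0.10, "r5": 0.50}


def toy_distance(a, b):
    return abs(positions[a] - positions[b])


toy_otus = {"otu_A": ["r1", "r2", "r3", "r4"], "otu_B": ["r5"]}
reps = pick_representatives(toy_otus, toy_distance)
print(reps)
```

Here `otu_B` is discarded as rare, and `r3` represents `otu_A` because its farthest neighbor is closer than that of any other member.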

Results

Pyrosequencing information

A total of 33,810 raw reads were obtained (~11,000 reads per sample). Filtering by read quality and length as described above removed 24,178 reads (71.5%). An additional 1,881 reads with poor alignment were excluded. One thousand chimeras were identified with Uchime and another 46 with Chimera Slayer, leaving a final set of 6,705 non-chimeric reads (mean of 2,235±336 reads/sample) with an average length of 416 bases.
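The read accounting above can be retraced with a few lines of Python; the variable names are illustrative and the figures are those reported in the text.

```python
raw_reads = 33810
removed_quality_length = 24178
removed_poor_alignment = 1881
chimeras = 1000 + 46  # Uchime + Chimera Slayer

final_reads = raw_reads - removed_quality_length - removed_poor_alignment - chimeras
pct_removed_by_filter = round(100 * removed_quality_length / raw_reads, 1)
mean_per_sample = final_reads / 3

print(final_reads)            # 6705
print(pct_removed_by_filter)  # 71.5
print(mean_per_sample)        # 2235.0
```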

Taxonomic assignment of reads

Using the novel multi-stage assignment algorithm, about 96% of the reads were matched to the reference sequences as follows: 92.8% to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. The majority of these reads (91.7%) were assigned to single species/phylotypes, 7.9% consistently matched two or a few species/phylotypes and so were assigned to the ‘species-group’ level, and only 0.4% were classified to the genus level in the final stage. That is, 99.6% were classified to the species level. OTU analysis of the remaining unmatched reads (4%) generated 17 non-rare, species-level OTUs, of which 12 matched reference sequences in NCBI, including seven oral clones described in recent studies (33–35) but not included in HOMD. One OTU could not be classified even to the phylum level.

Bacteria within OSCC – phylum level

Eleven bacterial phyla were identified (Fig. 2), of which eight were present in all samples. The most abundant in order were: Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria, accounting for 98.8% of the reads. Phyla Tenericutes, Synergistetes, and GN02 were represented by few reads in single samples. Cases 1 and 2 had comparable phylum distribution, while case 3 had higher proportion of Proteobacteria at the expense of Fusobacteria and Actinobacteria (Fig. 3).

Fig. 2

Relative abundance (%) of 11 phyla detected in the OSCC samples. GN02, Synergistetes, and Tenericutes were found in single samples.

Fig. 3

Distribution of the detected phyla in each of the study OSCC samples.

Bacteria within OSCC – genus level

The reads were classified into 78 genera, of which 29 were detected in all samples. Overall, Haemophilus, Neisseria, Prevotella, Fusobacterium, Streptococcus, Porphyromonas, Leptotrichia, and Aggregatibacter were the most abundant, in that order (Fig. 4). Sixteen genera accounted for >80% of the reads. The distribution of these in each of the three samples is shown in Fig. 5. Case 3 had an exceptionally high relative abundance of the genus Haemophilus, resulting in a markedly different profile from that of cases 1 and 2.

Fig. 4

Relative abundance (%) of 29 genera detected in all OSCC samples. Abundance of Haemophilus was inflated by the presence of high level of H. influenzae in the sample from case 3 (see Fig. 5).

Fig. 5

Distribution of 16 genera accounting for >80% of the reads in each of the OSCC samples. Profiles of cases 1 and 2 are comparable, while that of case 3 deviates significantly due to high levels of Haemophilus.

Bacteria within OSCC – species level

Excluding 102 rare OTUs, a total of 228 species-level taxa (mean of 118±18 taxa/subject) were identified in the samples as follows: 222 species/phylotypes, 2 species groups, and 4 potentially novel OTUs. The vast majority of these were human, oral taxa representing 191 sequences in updated-HOMD 13.2, 16 in trusted-HOMDEXT, 10 in modified-GGG, and 7 in the NCBI's nucleotide collection. A list of detected taxa sorted by number of positive samples and relative abundance is presented in Supplementary Table 1. Thirty-five of the species-level taxa were detected in all the three samples (Fig. 6), while 54 were found in two of them, that is, 89 taxa were identified at least twice. The rest were identified in single samples at very low abundance, the only exception being Haemophilus influenzae, which was detected at 40.5% in the sample from case 3. Excluding the latter, the most abundant taxa, on average, were Prevotella oris, Neisseria flava, N. flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, Fusobacterium periodonticum, Neisseria elongata, Porphyromonas sp. oral taxon 279, and Alloprevotella tannerae. The distribution of these and other taxa, however, varied considerably among the three samples.

Fig. 6

Subject-level distribution of 35 species-level taxa identified in all the three OSCC samples. *Veillonella parvula group: V. parvula, V. dispar and V. rogosae.

A number of human, non-oral taxa were also identified, including B. fragilis, Leptotrichia trevisanii, Fusobacterium varium, Haemophilus pittmaniae, Propionibacterium granulosum, and Wolinella succinogenes. B. fragilis was detected in two samples, while the rest were each found in only one sample. In addition, two environmental species, Sphingopyxis alaskensis and Cupriavidus metallidurans, were identified in case 2.

Discussion

This report describes the use of a novel prioritized, multi-stage algorithm for species-level taxonomy assignment of bacterial 16S rRNA NGS reads from oral samples in a pilot study involving three OSCC samples. To the best of our knowledge, this is the first attempt to profile bacteria within OSCC tissues to the species level using NGS. Sequencing was achieved at ~11,000 reads per sample. However, filtering by length and quality parameters removed about 70% of the reads, which is high compared to other oral microbiome studies (16, 24). In fact, similar processing of reads from another 16, primarily bacterial DNA samples run in parallel with the three OSCC samples resulted in removal of only 30% of the reads. It seems, therefore, that the presence of a high human DNA background in OSCC samples adversely affects read quality and length. Optimization of extracts from OSCC tissues for library preparation, for example, by enriching bacterial DNA, should be considered in future work. The high-quality reads were then subjected to an exceptionally stringent chimera check by using two effective chimera detection programs and combining HOMD 13.2 and SILVA gold sequences to make the reference set, an approach not previously reported. Using updated-HOMD 13.2 as a reference for chimera detection may be viewed as a more reliable alternative to one established method in which high-abundance sequences in the dataset itself are used as a reference (26). Indeed, including HOMD 13.2 sequences resulted in detection of significantly more chimeras compared to when only SILVA gold was used (data not shown). Matching of non-chimeric reads to reference databases was performed using a sequence identity cutoff of 98% as previously described (27, 36) but with a more stringent alignment coverage cutoff (98% vs. 95% in previous studies).
Priority was given to the reference sequences in updated-HOMD 13.2 based on the rationale that 1) they represent the highest quality, best curated full 16S rRNA gene sequences from oral bacteria, and 2) reads from a sample are most likely derived from species belonging to the same environment from which the sample was taken. Indeed, the vast majority of reads in this study (~93%) matched reference sequences in updated-HOMD 13.2. Surprisingly, neither of the two previous studies (16, 17) used HOMD as a reference in their analysis, which probably explains why one of them reported classification of only 16.7% of the reads (16). Next in priority was the extended set of the HOMD 16S rRNA sequences, which contains an additional collection of oral taxa with less reliable, partial sequences that have, nevertheless, proved valuable and have previously been used for classification of short clone sequences (27). The purpose of including this set was to increase the assignment rate of reads to known human, oral taxa. However, it was first cleared of potential chimeras as well as sequences already represented in HOMD 13.2, resulting in a more reliable reference sequence set (trusted-HOMDEXT). The third reference set exploited by the algorithm, modified-GGG, comprises reference sequences of both human oral and non-oral taxa as well as a number of well-characterized, environmental species/phylotypes, thus allowing classification of reads from non-oral taxa possibly present in the samples. Including trusted-HOMDEXT and modified-GGG increased the proportion of reads that could be successfully classified to 96%, which is very close to that reported for classification of short clone sequences using a comparable algorithm (27). An additional search of non-rare OTU representatives against NCBI also returned top hits, mostly sequences of oral clones not included in HOMD.
These shorter clone oral sequences in NCBI thus served as good attractor sequences for relevant taxa, which reflects the need to regularly update HOMD to include such sequences and, in turn, minimize the need for the slow, system-demanding NCBI search. In addition to the use of multiple reference databases that are prioritized by relevance and sequence quality, a major strength of the algorithm described here is that reads are individually matched to the reference sequences, which ensures the most accurate assignment for each read. This is not the case with OTU-based algorithms that assign taxonomy to representative OTU sequences, since reads clustered into an OTU using a certain sequence identity cutoff may not belong to a single species. This is especially true for those species with sequence similarities greater than the cutoff used. For example, in the genus Streptococcus, many species have greater than 99% sequence similarity to each other even based on the full-length 16S rRNA. Thus, a cutoff of 98% identity will lump multiple species of this genus together in an OTU. One common option is to use a Bayesian classifier to obtain consensus OTU taxonomy labels, but this usually reduces the taxonomic resolution to the genus level, if not higher. Our algorithm sacrifices speed for higher accuracy by searching individual reads against the most relevant reference sequences using the BLASTN program, which accounts for both sequence percentage identity and alignment length. The slower speed, however, is increasingly offset by the availability of lower-cost, higher-performance, multi-core computing resources, such as cloud computing platforms. Thus, the read-by-read, reference-based approach should be more commonly adopted for studying microbial communities for which good-quality reference sources exist.
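The point about OTU cutoffs can be made concrete with a toy Python example. The sequences below are synthetic, not real 16S rRNA genes, and the identity function assumes two pre-aligned sequences of equal length.

```python
def identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    assert len(a) == len(b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two synthetic 500-base "species" references differing at four positions.
species_a = "ACGT" * 125
bases = list(species_a)
for pos in (10, 120, 250, 400):
    bases[pos] = "A" if bases[pos] != "A" else "C"
species_b = "".join(bases)

pairwise = identity(species_a, species_b)
print(pairwise)           # 99.2
print(pairwise >= 98.0)   # True: both fall into one OTU at a 98% cutoff

# An error-free read from species B still has a unique best reference.
read = species_b
print(identity(read, species_b) > identity(read, species_a))  # True
```

At a 98% clustering cutoff the two synthetic species would share an OTU, whereas matching each read individually to the references keeps them apart.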
Five bacterial phyla, namely Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria, have been consistently detected in tissue OSCC samples, presumably because they are the most abundant. In this study, six additional phyla were identified, with GN02 reported for the first time within OSCC tissues. Firmicutes was not the most abundant phylum, while Bacteroidetes and Fusobacteria formed substantial proportions, which seems to be characteristic of both cancerous and normal tissues of patients with OSCC (14, 17). Seventy-nine genera were identified, which is double the highest number previously reported (13). Of the genera detected at high abundance in this report, Prevotella, Fusobacterium, and Leptotrichia have previously been shown to be characteristically more abundant in samples from OSCC patients compared to those from healthy controls (17). While Streptococcus was among the most abundant genera, it accounted for less than 10% of the reads on average, which is comparable to findings reported by Schmidt et al. (17) and Bebek et al. (14). In contrast, Pushalkar et al. (13) found it to represent around 50% of the sequences in both tumor and non-tumor tissues. Such considerable variation may be explained by PCR and cloning biases, or may be due to targeting different regions of the 16S rRNA gene. A total of 228 species-level taxa were identified in this study, which is the highest diversity reported for bacteria within OSCC tissues. Using culture methods, Hooper et al. (10) detected 80 viable species within OSCC tissues. Later, employing classical 16S rRNA gene clonal analysis, they identified an additional 28 species in the same samples (11), bringing diversity to 108 species, of which 38 species were identified in this study. The results described by Pushalkar et al. (13), however, are probably the most comparable to the findings in this report, since they used HOMD for identification of their sequences.
Indeed, 60 out of 80 species/phylotypes identified in that study were also detected in this study. Many more species/phylotypes were identified here because of the higher sequencing throughput offered by pyrosequencing compared to classical sequencing. The vast majority (~95%) of the species/phylotypes identified in the OSCC samples represented oral taxa. While these are probably commensals adapting to the tumor tissue environment, the possibility that a few of them may contribute to the development of OSCC or modify its clinical course cannot be excluded. The role of highly abundant species, particularly those with pathogenic potential such as P. oris, A. segnis, and Fusobacterium spp., should be explored further, possibly by testing them against oral epithelium in vitro. In addition to oral taxa, a number of human, non-oral taxa were also identified, some of which are rarely detected in the oral cavity, such as B. fragilis. The latter species may, in fact, be of relevance to the development of OSCC since it is linked in the literature to colon cancer. Indeed, one proteomic study identified six proteins from this species in the saliva of OSCC patients (37). This, therefore, warrants further investigation. In conclusion, we describe a robust algorithm that assigns individual NGS reads to species level by searching against multiple sets of high-quality, 16S rRNA reference sequences. The assignment is based on the best hits of single reads to the reference based on both sequence identity and alignment length. For biologically sensible taxonomy assignment, the algorithm gives priority to the taxonomy information provided by the highest quality, most relevant reference set which, in the case of oral samples, is the HOMD set. Applying the algorithm to a dataset from three OSCC samples resulted in unambiguous classification of the majority of the reads to the species level.
The number of bacterial species-level taxa detected is the highest reported so far for OSCC tissues, with a number of species being reported for the first time in oral samples. However, the biological significance of specific taxa in OSCC was not the focus of this report and it remains to be evaluated in large-scale, controlled studies.

37 in total

1. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Authors: T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen
Journal: Appl Environ Microbiol Date: 2006-07 Impact factor: 4.792

2. Critical evaluation of two primers commonly used for amplification of bacterial 16S rRNA genes.

Authors: Jeremy A Frank; Claudia I Reich; Shobha Sharma; Jon S Weisbaum; Brenda A Wilson; Gary J Olsen
Journal: Appl Environ Microbiol Date: 2008-02-22 Impact factor: 4.792

3. Microbial diversity in saliva of oral squamous cell carcinoma.

Authors: Smruti Pushalkar; Shrinivasrao P Mane; Xiaojie Ji; Yihong Li; Clive Evans; Oswald R Crasta; Douglas Morse; Robert Meagher; Anup Singh; Deepak Saxena
Journal: FEMS Immunol Med Microbiol Date: 2011-02-01

4. The human oral microbiome.

Authors: Floyd E Dewhirst; Tuste Chen; Jacques Izard; Bruce J Paster; Anne C R Tanner; Wen-Han Yu; Abirami Lakshmanan; William G Wade
Journal: J Bacteriol Date: 2010-07-23 Impact factor: 3.490

5. Viable bacteria present within oral squamous cell carcinoma tissue.

Authors: Samuel J Hooper; St John Crean; Michael A O Lewis; David A Spratt; William G Wade; Melanie J Wilson
Journal: J Clin Microbiol Date: 2006-05 Impact factor: 5.948

6. Acetaldehyde production and microbial colonization in oral squamous cell carcinoma and oral lichenoid disease.

Authors: Emilia Marttila; Johanna Uittamo; Peter Rusanen; Christian Lindqvist; Mikko Salaspuro; Riina Rautemaa
Journal: Oral Surg Oral Med Oral Pathol Oral Radiol Date: 2013-04-23

7. A molecular analysis of the bacteria present within oral squamous cell carcinoma.

Authors: Samuel J Hooper; St-John Crean; Michael J Fardy; Michael A O Lewis; David A Spratt; William G Wade; Melanie J Wilson
Journal: J Med Microbiol Date: 2007-12 Impact factor: 2.472

8. The microflora associated with human oral carcinomas.

Authors: K N Nagy; I Sonkodi; I Szöke; E Nagy; H N Newman
Journal: Oral Oncol Date: 1998-07 Impact factor: 5.337

Review 9. Helicobacter pylori and gastric cancer: the causal relationship.

Authors: Shajan Peter; Christoph Beglinger
Journal: Digestion Date: 2007 Impact factor: 3.216

10. The salivary microbiota as a diagnostic indicator of oral cancer: a descriptive, non-randomized study of cancer-free and oral squamous cell carcinoma subjects.

Authors: D L Mager; A D Haffajee; P M Devlin; C M Norris; M R Posner; J M Goodson
Journal: J Transl Med Date: 2005-07-07 Impact factor: 5.531
