Man Guo, Er Xu, Dongmei Ai.
Abstract
Colorectal cancer is the third most common cancer worldwide and has poor survival, so novel therapeutic strategies are needed. Numerous studies have observed infiltrating bacteria within primary tumor tissues derived from patients and have implicated the relative abundance of these bacteria as a contributing factor in tumor progression. Infiltrating bacteria are believed to be among the major drivers of tumorigenesis, progression, and metastasis and are hence promising targets for new treatments. However, measuring their abundance directly remains challenging. One potential approach is to use the unmapped reads of host whole genome sequencing (hWGS) data, which previous studies have treated as contaminants and discarded. Here, we developed rigorous bioinformatics and statistical procedures to identify tumor-infiltrating bacteria associated with colorectal cancer from such whole genome sequencing data. Our approach used the reads from whole genome sequencing of colon adenocarcinoma tissues that did not map to the human reference genome, including read pairs in which both ends were unmapped and single unmapped reads whose mates were mapped. We assembled the unmapped read pairs, remapped all of these reads to the human microbiome reference collection, and then computed the relative abundance of microbes by maximum likelihood (ML) estimation. We analyzed and compared the relative abundance and diversity of infiltrating bacteria between primary tumor tissues and associated normal blood samples. Our results showed that primary tumor tissues contained far more diverse infiltrating bacteria than normal blood samples. The relative abundance of Bacteroides fragilis, Bacteroides dorei, and Fusobacterium nucleatum was significantly higher in primary colorectal tumors. These three bacteria were among the top ten microbes in the primary tumor tissues, yet were rarely found in normal blood samples.
As validation, most of these bacteria have also been closely associated with colorectal cancer in previous studies that used alternative approaches. In summary, our approach provides a new analytic technique for investigating the infiltrating bacterial community within tumor tissues. Our novel cloud-based bioinformatics and statistical pipelines for analyzing infiltrating bacteria in colorectal tumors from the unmapped reads of whole genome sequences are freely available on GitHub at https://github.com/gutmicrobes/UMIB.git.
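The abstract does not spell out how the ML estimate of relative abundance is computed. A minimal sketch, assuming a simple mixture model in which each read originates from genome g with probability π_g and may align equally well to several reference genomes, is an expectation-maximization (EM) loop. The function name `estimate_abundance`, the input layout, and the toy reads are illustrative assumptions and are not taken from the UMIB pipeline; genome-length normalization is deliberately ignored.

```python
from collections import defaultdict


def estimate_abundance(read_hits, n_iter=500, tol=1e-9):
    """EM estimate of mixture proportions (relative abundances).

    read_hits: iterable of collections; each holds the names of the
    reference genomes one read aligned to (hypothetical input layout).
    """
    # Reads with no hit carry no information about the mixture.
    hits_per_read = [set(h) for h in read_hits if h]
    genomes = sorted({g for hits in hits_per_read for g in hits})
    if not genomes:
        return {}
    n = len(hits_per_read)
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform start

    for _ in range(n_iter):
        # E-step: split each read across its candidate genomes in
        # proportion to the current abundance estimates.
        expected = defaultdict(float)
        for hits in hits_per_read:
            total = sum(pi[g] for g in hits)
            for g in hits:
                expected[g] += pi[g] / total
        # M-step: the new abundance is the expected share of reads.
        new_pi = {g: expected[g] / n for g in genomes}
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    return pi


# Toy input: 6 reads unique to genome A, 2 unique to B, 2 ambiguous.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 2
abundance = estimate_abundance(reads)
print({g: round(p, 3) for g, p in abundance.items()})
```

In this toy case the ambiguous reads are split according to the evidence from the uniquely mapped reads, so the estimate settles at 0.75 for A and 0.25 for B.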
Keywords: colorectal cancer; infiltrating bacteria; maximum likelihood estimation; tumor tissue; unmapped reads
Year: 2019 PMID: 30930939 PMCID: PMC6428740 DOI: 10.3389/fgene.2019.00213
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Flow chart showing the differential analysis of bacterial relative abundance using whole genome sequencing data. The whole genome sequencing BAM files are the output of mapping to the human reference genome.
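The two classes of unmapped reads used as input (pairs with both ends unmapped, and unmapped reads whose mates were mapped) can be told apart from the FLAG field of each alignment record. The sketch below works on SAM text lines using only the FLAG bits defined in the SAM specification; the function names, category labels, and toy records are illustrative assumptions, not the authors' code.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_flag(flag):
    """Assign one alignment record to a category (labels are hypothetical)."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return "skip"  # not a primary record
    if not flag & READ_UNMAPPED:
        return "mapped"  # the read itself mapped to the host genome
    if flag & PAIRED and flag & MATE_UNMAPPED:
        return "unmapped_pair"  # both ends unmapped
    if flag & PAIRED:
        return "unmapped_mate_mapped"  # only this end is unmapped
    return "unmapped_unpaired"


def bucket_sam_lines(lines):
    """Group read names by category, skipping header lines."""
    buckets = {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        buckets.setdefault(classify_flag(flag), []).append(qname)
    return buckets


# Toy records: only QNAME and FLAG matter for this sketch.
sam = [
    "@HD\tVN:1.6",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF",
    "r2\t69\tchr1\t100\t0\t*\t=\t100\t0\tGGCA\tFFFF",
    "r2\t137\tchr1\t100\t60\t4M\t=\t100\t0\tCCTA\tFFFF",
]
print(bucket_sam_lines(sam))
```

With samtools, the same two selections correspond to `samtools view -f 12` (read and mate both unmapped) and `samtools view -f 4 -F 8` (read unmapped, mate mapped).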
FIGURE 2. Alpha diversity of infiltrating bacteria in the normal blood samples and primary tumor tissue samples, shown as violin plots. Green represents the normal blood samples, and red represents the primary tumor tissue samples. The “∗∗∗” symbol represents P-value < 0.001. Differential analysis was performed by Student’s t-test (P = 1.27E–06).
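The caption does not say which alpha diversity index was used. As a hedged illustration, the sketch below assumes the Shannon index and computes the pooled-variance (Student) t statistic for two groups of per-sample diversity values. The function names and the toy numbers are hypothetical; turning the statistic into a P-value requires the t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_ind`) would normally be used.

```python
import math
from statistics import mean, variance


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over non-zero taxa."""
    total = sum(abundances)
    return -sum(
        (a / total) * math.log(a / total) for a in abundances if a > 0
    )


def student_t(x, y):
    """Two-sample Student t statistic (equal variances assumed).

    Returns the statistic and its degrees of freedom.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Toy per-sample abundance profiles (made-up numbers, not study data).
tumor_profiles = [[30, 25, 20, 15, 10], [40, 30, 20, 10], [25, 25, 25, 25]]
blood_profiles = [[90, 10], [95, 5], [80, 15, 5]]

tumor_h = [shannon_index(p) for p in tumor_profiles]
blood_h = [shannon_index(p) for p in blood_profiles]
t_stat, dof = student_t(tumor_h, blood_h)
print(f"t = {t_stat:.3f}, df = {dof}")
```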
The most differentially abundant genera between tumor and normal samples (Q-value < 0.05).
| Genus | P-value | Q-value |
|---|---|---|
| Bacteroides | 4.50E-09 | 4.81E-07 |
| Clostridium | 0.000377316 | 0.005046606 |
| Fusobacterium | 7.63E-05 | 0.002040633 |
| Streptococcus | 0.00095632 | 0.007309021 |
FIGURE 3. Differential analysis of the relative abundance of bacteria in the normal blood and primary tumor tissue samples. (A) Differential analysis of bacteria at the genus level in normal blood and primary tumor tissue samples. P-values were calculated with the non-parametric Mann–Whitney–Wilcoxon test and corrected with the Benjamini–Hochberg false discovery rate (FDR) procedure. The box plots show the genera that differ significantly. The “∗” symbol represents Q-value < 0.05; the “∗∗” symbol represents Q-value < 0.01; and the “∗∗∗” symbol represents Q-value < 0.001. (B) Differential analysis of bacterial abundance at the species level in the normal blood and primary tumor tissue samples, using the same Benjamini–Hochberg FDR-corrected Mann–Whitney–Wilcoxon test. The letters B, F, and P on the x-axis represent Bacteroides, Fusobacterium, and Parabacteroides, respectively. (C) Stacked bar charts of the top ten bacterial species enriched in the normal blood samples and their relative abundance in the primary tumor tissue samples. (D) Stacked bar charts of the top ten bacterial species enriched in the primary tumor tissue samples and their relative abundance in the normal blood samples.
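The testing scheme named in this caption, a Mann–Whitney–Wilcoxon test per taxon followed by Benjamini–Hochberg correction, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the two-sided P-value uses the normal approximation without tie or continuity correction, and the function names and toy abundances are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided normal-approximation P-value."""
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy relative abundances per taxon: (tumor samples, blood samples).
taxa = {
    "taxon_1": ([0.30, 0.25, 0.40, 0.35, 0.28], [0.01, 0.02, 0.00, 0.03, 0.01]),
    "taxon_2": ([0.10, 0.12, 0.09, 0.11, 0.10], [0.10, 0.11, 0.09, 0.12, 0.08]),
}
pvals = [mann_whitney_u(t, b)[1] for t, b in taxa.values()]
for name, q in zip(taxa, benjamini_hochberg(pvals)):
    print(name, f"Q = {q:.4f}")
```

Because the adjusted value is the raw P-value scaled by the number of tests over its rank, a Q-value is never smaller than the corresponding P-value.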
The most differentially abundant species between tumor and normal samples (Q-value < 0.05).
| Species | P-value | Q-value |
|---|---|---|
|  | 1.36E-05 | 0.000715315 |
|  | 1.31E-05 | 0.000715315 |
|  | 8.84E-06 | 0.000715315 |
|  | 1.88E-05 | 0.000850917 |
|  | 0.000112939 | 0.002974056 |
|  | 0.000150549 | 0.00317157 |
|  | 0.000147964 | 0.00317157 |
|  | 0.000190032 | 0.003336119 |
|  | 0.005728611 | 0.043100975 |
FIGURE 4. Heat map and biclustering analysis of different colorectal cancer tissue samples based on the phylogeny of the bacterial species. Forty-three different bacterial species across the 46 selected primary tumor tissue samples and 36 normal blood samples were used to prepare the heat map. In the tree diagram on the left-hand side, red represents the primary tumor tissue samples and green represents the normal blood samples.