Steven J Schiff, Julius Kiwanuka, Gina Riggio, Lan Nguyen, Kevin Mu, Emily Sproul, Joel Bazira, Juliet Mwanga-Amumpaire, Dickson Tumusiime, Eunice Nyesigire, Nkangi Lwanga, Kaleb T Bogale, Vivek Kapur, James R Broach, Sarah U Morton, Benjamin C Warf, Mary Poss.
Abstract
Neonatal sepsis (NS) is responsible for over 1 million yearly deaths worldwide. In the developing world, NS is often treated without an identified microbial pathogen. Amplicon sequencing of the bacterial 16S rRNA gene can be used to identify organisms that are difficult to detect by routine microbiological methods. However, contaminating bacteria are ubiquitous in both hospital settings and research reagents and must be accounted for to make effective use of these data. In this study, we sequenced the bacterial 16S rRNA gene obtained from blood and cerebrospinal fluid (CSF) of 80 neonates presenting with NS to the Mbarara Regional Hospital in Uganda. Assuming that patterns of background contamination would be independent of pathogenic microorganism DNA, we applied a novel quantitative approach using principal orthogonal decomposition to separate background contamination from potential pathogens in sequencing data. We designed our quantitative approach contrasting blood, CSF, and control specimens and employed a variety of statistical random matrix bootstrap hypotheses to estimate statistical significance. These analyses demonstrate that Leptospira appears present in some infants presenting within 48 h of birth, indicative of infection in utero, and up to 28 days of age, suggesting environmental exposure. This organism cannot be cultured in routine bacteriological settings and is enzootic in the cattle that often live in close proximity to the rural peoples of western Uganda. Our findings demonstrate that statistical approaches to remove background organisms common in 16S sequence data can reveal putative pathogens in small volume biological samples from newborns. This computational analysis thus reveals an important medical finding that has the potential to alter therapy and prevention efforts in a critically ill population.
Keywords: 16S rRNA; Leptospira; bacteria; neonatal sepsis; principal orthogonal decomposition; singular value decomposition
Year: 2016 PMID: 27379237 PMCID: PMC4904006 DOI: 10.3389/fmed.2016.00022
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. The characterization of the dataset and modes. (A) The graphical representation of read counts, sorted by columns of total reads for each taxon from left to right in descending order for 131 genus identifications in 95 samples. Color map is scaled to amplify the lowest 1% of read counts, and the color bar maximum dark red color is the same for counts from 320 to 32,000 in order to aid visualization of the dataset. This image is a visualization of the data in Table S1 in Supplementary Material, and the taxa for each column, from left to right, are given in the table in the same columnar order. (B) Fisher’s canonical linear discrimination demonstrates the optimal linear combinations of the read counts (Z1 and Z2) that separate samples from blood, CSF, and controls. These Fisher’s discriminants are optimal combinations of the read counts that maximally separate the different groups. Two of the three control samples overlap in the plot. Group means are large symbols. (C) First 10 eigenmodes from principal orthogonal decomposition and total energy [cumulative energy fraction, E] accounted for by summing modes progressively from left to right. Only the first 10 columns are plotted in each mode. The sum of all modes, which are weighted by their eigenvalues, would equal the original dataset [a full discussion of this geometry can be found in Chapter 7.3 of Schiff (16)]. (D) The weighting of each mode (log of eigenvalue amplitudes) is shown, as well as the tolerance for insignificance (dashed line) below which eigenvalues are not resolvable. There are 95 eigenvalues, one for each patient sample and control. (E) Composition of the first three modes in terms of their representative genera sorted in descending order as blue, green, and red.
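The mode decomposition described in panels (C) and (D) can be illustrated with a minimal sketch. It assumes that principal orthogonal decomposition is computed as a singular value decomposition of a samples-by-taxa count matrix and that the energy of a mode is its squared singular value; the record does not state the authors' exact normalization. The function name `pod_modes` and the synthetic Poisson counts are hypothetical and stand in for the real data in Table S1.

```python
import numpy as np


def pod_modes(counts):
    """Decompose a samples-by-taxa matrix into weighted rank-1 modes.

    Assumption: POD is taken here as the SVD of the raw count matrix,
    and the energy of mode k as its squared singular value.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    energy = s ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # Each mode is an outer product of a sample pattern and a taxon
    # pattern, weighted by its singular value.
    modes = [s[k] * np.outer(u[:, k], vt[k]) for k in range(len(s))]
    return s, cumulative, modes


# Hypothetical synthetic data: 6 samples by 4 taxa.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(6, 4)).astype(float)

s, cumulative, modes = pod_modes(counts)

# Summing every weighted mode recovers the original matrix.
reconstructed = sum(modes)
```

With a 95-by-131 matrix such as the one in panel (A), the same call would return 95 weights, consistent with the one-per-sample count noted in panel (D).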
Figure 2. Hypothesis testing for modes using random matrices. (A) Random matrix bootstrap ensemble distribution for all samples showing the mean (black solid line) and ±1 SD (blue dotted lines) for 1000 randomizations of all matrix eigenvalue amplitudes, and original dataset eigenvalues (red asterisks). (B) Graphical representation of a randomization of Figure 1A using the same color map scale. (C) All samples with mode 1 removed, and comparable mode composition in (D). (E) shows eigenvalue distribution for blood samples only, with mode composition in (F). (G) shows eigenvalues for blood with mode 1 removed, and in (H) the mode composition. (I) illustrates the probabilities of obtaining the first mode eigenvalues for all eigenvalues, and the bootstrap histograms that underlie the probabilities of the first three modal eigenvalues from (G,H) illustrating the significance of the dominant Leptospira mode from (H) (similar results randomizing only by bacterial type not shown). Note that by removing the mode dominated by Ralstonia in the blood sample, the Leptospira dominant mode has an eigenvalue far larger than any eigenvalue generated from the randomized dataset. In contrast, the next two modes generate relatively small eigenvalues compared with the bootstrapped values. These results demonstrate that with the removal of the contaminating mode, it is highly statistically unlikely that random contamination was responsible for the pattern of Leptospira reads observed.
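The logic of removing a dominant background mode and then testing the remaining modes against randomized matrices can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the randomization here permutes every entry of the matrix, the p-value is a simple empirical tail fraction, and the function names, the synthetic counts, and the choice of 200 randomizations (the figure uses 1000) are all hypothetical.

```python
import numpy as np


def remove_mode(counts, k=0):
    """Subtract the k-th weighted rank-1 mode from the matrix."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return counts - s[k] * np.outer(u[:, k], vt[k])


def bootstrap_singular_values(counts, n_rand=1000, seed=0):
    """Singular values of matrices built by shuffling all entries.

    Assumption: the null ensemble permutes every matrix entry.
    """
    rng = np.random.default_rng(seed)
    out = np.empty((n_rand, min(counts.shape)))
    for i in range(n_rand):
        shuffled = rng.permutation(counts.ravel()).reshape(counts.shape)
        out[i] = np.linalg.svd(shuffled, compute_uv=False)
    return out


def leading_mode_p_value(counts, n_rand=1000, seed=0):
    """Empirical probability that a shuffled matrix matches or exceeds
    the observed leading singular value."""
    observed = np.linalg.svd(counts, compute_uv=False)[0]
    null = bootstrap_singular_values(counts, n_rand, seed)[:, 0]
    return (1 + np.sum(null >= observed)) / (1 + n_rand)


# Hypothetical data: 20 samples by 10 taxa, a background taxon present
# in every sample plus a second taxon present in only a few samples.
rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(20, 10)).astype(float)
counts[:, 0] += 100.0   # background contaminant in all samples
counts[:4, 3] += 40.0   # candidate signal in four samples

original_sv = np.linalg.svd(counts, compute_uv=False)
residual = remove_mode(counts, k=0)
residual_sv = np.linalg.svd(residual, compute_uv=False)
p_value = leading_mode_p_value(residual, n_rand=200, seed=2)
```

After the first mode is subtracted, the leading singular value of the residual equals the second singular value of the original matrix, so the test in this sketch asks whether that second mode stands out from the shuffled ensemble.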