Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults.

Charles Langelier1,2, Katrina L Kalantar3, Farzad Moazed4, Michael R Wilson5,6, Emily D Crawford2,3, Thomas Deiss4, Annika Belzer4, Samaneh Bolourchi4, Saharai Caldera1,2, Monica Fung1, Alejandra Jauregui4, Katherine Malcolm7, Amy Lyden2, Lillian Khan3, Kathryn Vessel4, Jenai Quan2,3, Matt Zinter8, Charles Y Chiu1,9, Eric D Chow3, Jenny Wilson10, Steve Miller9, Michael A Matthay4,11,12, Katherine S Pollard2,13,14,15,16,17, Stephanie Christenson4, Carolyn S Calfee4,8, Joseph L DeRisi18,3.   

Abstract

Lower respiratory tract infections (LRTIs) lead to more deaths each year than any other infectious disease category. Despite this, etiologic LRTI pathogens are infrequently identified due to limitations of existing microbiologic tests. In critically ill patients, noninfectious inflammatory syndromes resembling LRTIs further complicate diagnosis. To address the need for improved LRTI diagnostics, we performed metagenomic next-generation sequencing (mNGS) on tracheal aspirates from 92 adults with acute respiratory failure and simultaneously assessed pathogens, the airway microbiome, and the host transcriptome. To differentiate pathogens from respiratory commensals, we developed a rules-based model (RBM) and logistic regression model (LRM) in a derivation cohort of 20 patients with LRTIs or noninfectious acute respiratory illnesses. When tested in an independent validation cohort of 24 patients, both models achieved accuracies of 95.5%. We next developed pathogen, microbiome diversity, and host gene expression metrics to identify LRTI-positive patients and differentiate them from critically ill controls with noninfectious acute respiratory illnesses. When tested in the validation cohort, the pathogen metric performed with an area under the receiver-operating curve (AUC) of 0.96 (95% CI, 0.86-1.00), the diversity metric with an AUC of 0.80 (95% CI, 0.63-0.98), and the host transcriptional classifier with an AUC of 0.88 (95% CI, 0.75-1.00). Combining these achieved a negative predictive value of 100%. This study suggests that a single streamlined protocol offering an integrated genomic portrait of pathogen, microbiome, and host transcriptome may hold promise as a tool for LRTI diagnosis.
Copyright © 2018 the Author(s). Published by PNAS.

Keywords:  lower respiratory tract infection; mechanical ventilation; next-generation sequencing; pneumonia; transcriptome

Year:  2018        PMID: 30482864      PMCID: PMC6310811          DOI: 10.1073/pnas.1809700115

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide (1–3). Early and accurate determination of acute respiratory disease etiology is crucial for implementing effective pathogen-targeted therapies but is often not possible due to the limitations of current microbiologic tests in terms of sensitivity, speed, and spectrum of available assay targets (4). For instance, even with the best available clinical diagnostics, a contributory pathogen can be detected in only 38% of adults with community acquired pneumonia, due to the low sensitivity and time requirements of culture, and the limited number of microbes detectable by serologic and PCR assays (4, 5). In the absence of a definitive microbiologic diagnosis, clinicians may presume symptoms are due to a noninfectious inflammatory condition and initiate empiric corticosteroids, which can exacerbate an occult infection (6). Furthermore, even with negative microbiologic testing, providers often continue empiric antibiotics due to concerns of falsely negative results, a practice that drives emergence of antibiotic resistance and increases risk of Clostridium difficile infection (7). In the intensive care unit (ICU), LRTI diagnosis is particularly complex due to a high prevalence of noninfectious inflammatory conditions with overlapping clinical features (8) and a patient demographic that includes severely immunocompromised individuals who may exhibit atypical presentations of pulmonary infections. Advancements in genome sequencing hold promise for overcoming these diagnostic challenges by affording culture-independent assessment of microbial genomes from microliter volumes of clinical samples (9, 10). Recent work has highlighted the utility of metagenomic next-generation sequencing (mNGS) for rapid and actionable diagnosis of complicated infections (6, 11–13). 
While these results are encouraging, most mNGS computational pipelines have been developed for analysis of sterile fluids or cultured bacterial isolates and have limited capacity to identify pathogens amid the complex background of commensal microbiota present in respiratory specimens (13–15). Host transcriptional profiling from peripheral blood has emerged as a promising alternative to pathogen-based diagnostics that can distinguish viral from bacterial LRTIs as well as differentiate between patients with acute respiratory infections versus those with noninfectious illnesses (5, 16, 17). This approach, while highly promising, has not been well studied in ICU patients with respiratory failure or in severely immunocompromised subjects. Furthermore, host transcriptional profiling has not yet been coupled with simultaneous detection of pulmonary pathogens (5, 18), which could improve diagnostic accuracy and more precisely inform optimal antimicrobial treatment. mNGS can extend both host gene expression assays and current microbe-based diagnostics by simultaneously detecting pathogens, the airway microbiome, and transcriptional biomarkers of the host’s immune response. Here, we address the need for better LRTI diagnostics by developing an mNGS-based method that integrates host response and unbiased microbe detection. We then evaluate the performance of this approach in a prospective cohort of critically ill patients with acute respiratory failure.

Results

We prospectively enrolled 92 adults admitted to the ICU with acute respiratory failure and collected tracheal aspirate (TA) samples within 72 h of intubation (Table 1). Patients underwent testing with clinician-ordered standard of care microbiologic diagnostics at the University of California, San Francisco, Moffitt–Long Hospital, a tertiary-care referral center. Subjects with LRTI were identified by two-physician adjudication using US Centers for Disease Control/National Healthcare Safety Network (CDC/NHSN) surveillance case definitions and retrospective electronic medical record review, with blinding to mNGS results (Dataset S2A) (19). Using this approach, patients were assigned to one of four groups: (i) LRTI defined by both clinical and microbiologic criteria (LRTI+C+M, n = 26); (ii) no evidence of LRTI and a clear alternative explanation for acute respiratory failure (no-LRTI, n = 18); (iii) LRTI defined by clinical criteria alone with negative conventional microbiologic testing (LRTI+C, n = 34); and (iv) respiratory failure due to unclear cause, infectious or noninfectious (unk-LRTI, n = 14).
Table 1.

Demographics and clinical characteristics of study cohort

Cohort characteristics                Cohort overall  LRTI+C+M  no-LRTI  P*
Patient characteristics
 Total enrolled                       92              26        18
 Age, y                               62              61        63       0.80
 Female gender                        31 (34%)        6         9        0.13
 Race
  African American                    5 (5%)          2         1        0.82
  Asian                               26 (28%)        7         5        0.82
  Caucasian                           50 (46%)        15        9        0.82
  Other                               11 (12%)        2         3        0.82
  Hispanic ethnicity                  8 (9%)          3         1        0.88
Comorbidities and outcomes
 Bacteremia                           21 (23%)        6         3        0.90
 Nonpulmonary infections              29 (32%)        9         4        0.58
 COPD                                 12 (13%)        3         0        0.37
 Diabetes mellitus                    6 (7%)          1         3        0.36
 Congestive heart failure             7 (8%)          1         1        1.00
 Current smoker                       12 (13%)        5         1        0.39
 Immune suppression                   41 (45%)        10        9        0.65
 Solid-organ transplantation          13 (14%)        1         5        0.07
 Prior antibiotic use                 84 (91%)        22        18       0.23
 Community acquired pneumonia         42 (46%)        18
 Hospital acquired pneumonia          13 (14%)        5
 Ventilator associated pneumonia      3 (3%)          3
 30-d mortality                       18 (20%)        6         1        0.25
Clinical metrics
 Max temperature, °C                  37.8            38.1      38.0     0.33
 Max WBC count, 10^6 cells/μL         14.3            13.8      12.8     0.58
 Max heart rate, bpm                  110             111       107      0.50
 Max respiratory rate, breaths/min    36              35        35       0.74
 SIRS criteria, mean                  3               3         3        0.54
 APACHE III score, mean               97              101       94       0.62
 Pneumonia severity index, mean       151             148       137      0.65

COPD, chronic obstructive pulmonary disease; LRTI+C+M, subjects who met both clinical and microbiologic criteria for LRTI; no-LRTI, subjects with a noninfectious etiology of acute respiratory failure; SIRS, systemic inflammatory response syndrome, defined as two or more abnormalities in white blood cell count (>12,000 or <4,000 cells per µL), temperature (>38 or <36 °C), heart rate (>90 beats per min), or respiratory rate (>20 breaths per min). The APACHE III score predicts mortality and disease severity for critically ill patients. The pneumonia severity index score estimates mortality for adult patients with community-acquired pneumonia (65). Percentage of total cohort is shown in parentheses.

*P values for Patient characteristics and Comorbidities and outcomes, χ² test; P values for Clinical metrics, Wilcoxon rank sum test.

The age range of the cohort was 21–85+.

From extracted nucleic acid samples, we performed both metagenomic shotgun DNA sequencing (DNA-seq) as well as RNA sequencing (RNA-seq). We first developed computational algorithms to sift respiratory pathogens from background commensal flora in an effort to enhance detection of LRTI etiology. To differentiate patients with LRTI from those with noninfectious critical respiratory illnesses, we next developed metrics of LRTI probability based on pathogen, airway microbiome diversity, and host gene expression (Fig. 1). To assess assay performance, we focused on the most unambiguously LRTI-positive and -negative subjects (LRTI+C+M and no-LRTI) by randomly dividing them into independent derivation (n = 20, used for model training) and validation cohorts (n = 24, used for model testing). Each metric (pathogen, microbiome, and host) was evaluated independently and then in combination.
Fig. 1.

Study overview and analysis workflow. Patients with acute respiratory failure were enrolled within 72 h of ICU admission, and TA samples were collected and underwent both RNA sequencing (RNA-seq) and shotgun DNA sequencing (DNA-seq). Post hoc clinical adjudication blinded to mNGS results identified patients with LRTI defined by clinical and microbiologic criteria (LRTI+C+M); LRTI defined by clinical criteria only (LRTI+C); patients with noninfectious reasons for acute respiratory failure (no-LRTI); and respiratory failure due to unknown cause (unk-LRTI). The LRTI+C+M and no-LRTI groups were divided into derivation and validation cohorts. To detect pathogens and differentiate them from a background of commensal microbiota, we developed two models: a rules-based model (RBM) and a logistic regression model (LRM). LRTI probability was next evaluated with (i) a pathogen metric, (ii) a lung microbiome diversity metric, and (iii) a 12-gene host transcriptional classifier. Models were then combined and optimized for LRTI rule out.


Pathogen Detection.

While many NGS platforms utilize only one nucleic acid type, we combined both RNA-seq and DNA-seq. This approach allowed for simultaneous host transcriptional profiling, permitted detection of RNA viruses, and enriched for actively transcribing microbes (versus latent or nonviable taxa). In addition, requiring concordant detection of microbes across both nucleic acid types reduced spurious alignments derived from reagent contaminants intrinsic to the library preparations of each nucleic acid type (20). From each TA sample, we generated a mean of 19.6 and 32.6 million paired-end sequencing reads, from DNA-seq and RNA-seq, respectively, of which the median fraction of microbial reads was 0.04% (interquartile range, 0.01–0.16%). Raw reads were analyzed using a rapid computational pipeline that aligns and classifies microbial taxa by nucleotide and peptide translation using the National Center for Biotechnology Information (NCBI) NT and NR databases, respectively (13, 20). RNA-seq yielded a greater abundance of sequences compared with DNA-seq for 78% of identified microbes, with a median of 2.2 times more reads per microbe. We and others have previously developed NGS methodologies for “sterile site” clinical fluids such as cerebrospinal fluid (13, 14, 21). The lung, however, is not a sterile environment and in fact harbors microbial communities during states of both health and disease (22–25). Asymptomatic carriage of potentially pathogenic organisms is common (26, 27), and only in a subset of cases do these microbes overtake airway microbial communities and precipitate LRTI (28). As such, distinguishing legitimate pathogens from commensal or colonizing microbiota is a central challenge for LRTI diagnostics and adds complexity to the interpretation of metagenomic sequencing data. 
To this point, while we detected all 38 pathogens identified from clinician-ordered microbiologic tests in the 26 LRTI+C+M patients using mNGS (Dataset S3A), a 10-fold greater number of airway commensals were also identified. The most prevalent microbes in the no-LRTI patient group included well-known commensal taxa (Dataset S4). Thus, to distinguish probable pathogens from airway commensals, we developed two complementary algorithms: (i) a rules-based model (RBM) optimized for detecting well-established respiratory pathogens, and (ii) a more flexible logistic regression model (LRM) that also permitted novel pathogen detection (Fig. 1). The goal of both models was to correctly identify pathogens amid abundant and heterogeneous populations of commensals. Microbes identified by clinician-ordered diagnostics plus all viruses with established respiratory pathogenicity in the LRTI+C+M group were categorized as pathogens (n = 12 in derivation cohort and n = 26 in validation cohort; Dataset S1). Any additional microbes identified by mNGS were considered commensals (n = 155 in derivation cohort; n = 174 in validation cohort). We accepted that this “practical” gold standard would provide an attenuated estimate of performance due to the sensitivity limitations of microbial culture in the setting of antibiotic preadministration (4).
In the RBM, respiratory microbes from each patient were assigned an abundance score based on the sum of log(RNA-seq) and log(DNA-seq) genus reads per million reads mapped (rpm) (Dataset S3A). After ranking microbes by this abundance score, the greatest score difference between sequentially ranked microbes was identified and used to distinguish the group of highest-scoring microbes within each patient (Fig. 2). These high-scoring microbes plus all RNA viruses detected at a conservative threshold of >0.1 rpm were indexed against an a priori developed table of established lower respiratory pathogens derived from landmark surveillance studies and clinical guidelines (Dataset S2B) and, if present, were identified as putative pathogens by the RBM (4, 29–31).
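The RBM steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's pipeline: the base-10 logarithm, the handling of a patient with a single scored microbe, the exclusion of microbes lacking reads in either nucleic acid type from scoring, the function name `rbm_putative_pathogens`, and the three-entry reference set are all hypothetical choices made for the example (the actual reference table is Dataset S2B).

```python
import math

# Hypothetical stand-in for the a priori table of established respiratory
# pathogens (the study's real table is Dataset S2B and is not reproduced here).
REFERENCE_PATHOGENS = {"Streptococcus", "Haemophilus", "Influenza A virus"}


def rbm_putative_pathogens(microbes, reference=REFERENCE_PATHOGENS,
                           virus_rpm_threshold=0.1):
    """Return reference-listed microbes flagged by a rules-based sketch.

    microbes: list of dicts with keys name, rna_rpm, dna_rpm, is_rna_virus.
    """
    # Abundance score = log(RNA-seq rpm) + log(DNA-seq rpm).
    # Assumption: log base 10, and only microbes seen in both data types are scored.
    scored = sorted(
        ((math.log10(m["rna_rpm"]) + math.log10(m["dna_rpm"]), m["name"])
         for m in microbes if m["rna_rpm"] > 0 and m["dna_rpm"] > 0),
        reverse=True,
    )

    top = set()
    if len(scored) == 1:
        # Assumption: a lone microbe is treated as the high-scoring group.
        top = {scored[0][1]}
    elif len(scored) > 1:
        # Largest score drop between consecutively ranked microbes.
        gaps = [scored[i][0] - scored[i + 1][0] for i in range(len(scored) - 1)]
        cut = gaps.index(max(gaps))
        top = {name for _, name in scored[:cut + 1]}

    # RNA viruses are added separately at the >0.1 rpm threshold from the text.
    viruses = {m["name"] for m in microbes
               if m["is_rna_virus"] and m["rna_rpm"] > virus_rpm_threshold}

    # Only candidates present in the reference table are reported.
    return sorted((top | viruses) & reference)
```

In this sketch, a commensal that happens to be the most abundant taxon is still not reported unless it appears in the reference table, which mirrors the RBM's focus on well-established pathogens.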
Fig. 2.

Workflow for distinguishing LRTI pathogens from commensal respiratory microbiota using an algorithmic approach. (A) Projection of microbial relative abundance in log reads per million reads sequenced (rpm) by RNA sequencing (RNA-seq) (x axis) versus DNA sequencing (DNA-seq) (y axis) for representative cases. In the LRTI+C+M group, pathogens identified by standard clinical microbiology (filled shapes) had higher overall relative abundance compared with other taxa detected by sequencing (open shapes). The largest score differential between ranked microbes (max Δrpm) was used as a threshold to identify high-scoring taxa, distinct from the other microbes based on abundance (line with arrows). Red indicates taxa represented in the reference list of established LRTI pathogens. (B) Receiver operating characteristic (ROC) curve demonstrating logistic regression model (LRM) performance for detecting pathogens versus commensal microbiota in both the derivation and validation cohorts. The gray ROC curve and shaded region indicate results from 1,000 rounds of training and testing on randomized sets of the derivation cohort. The blue and green lines indicate predictions using leave-one-patient-out cross-validation (LOPO-CV) on the derivation cohort and validation on the validation cohort, respectively. (C) Microbes predicted by the LRM to represent putative pathogens. The x axis represents combined RNA-seq and DNA-seq relative abundance, and the y axis indicates pathogen probability. The dashed line reflects the optimized probability threshold for pathogen assignment. Red filled circles: microbes predicted by LRM to represent putative LRTI pathogens that were also identified by conventional microbiologic tests. Blue filled circles: microbes predicted to represent putative LRTI pathogens by LRM only. Blue open circles: microbes identified by NGS but not predicted by the LRM to represent putative pathogens. Red open circles: microbes identified using NGS and by standard microbiologic testing but not predicted to be putative pathogens. Dark red outlined circles: microbes detected as part of a polymicrobial culture.

The RBM achieved an accuracy for pathogen detection of 98.8% and 95.5% in the derivation and validation cohorts, respectively (Dataset S3A). In subjects whose respiratory cultures grew three or more different bacteria, mNGS was able to detect each of the species. In most cases, however, their abundance differed by several hundredfold, which confounded detection of the lower-abundance taxa (Dataset S3A). Given the unclear significance of single species in such polymicrobial cases with respect to pathogenicity (32), we performed a secondary analysis in which only the most abundant microbe was considered a pathogen, and this approach yielded an accuracy of 98.4%.
While the RBM performed well for identifying microbes with established pulmonary pathogenicity, we recognized the need to also detect novel or atypical species. We thus employed machine learning to distinguish respiratory pathogens from commensals using an LRM trained on microbes detected in the derivation cohort patients (n = 20) using the predictor variables of RNA-seq rpm, DNA-seq rpm, rank by RNA-seq rpm, established LRTI pathogen (yes/no), and virus (yes/no). These features were selected to preferentially favor highly abundant organisms with established pathogenicity in the lung, but still permit detection of uncommon taxa that could represent putative pathogens. To evaluate LRM performance in the derivation cohort, we performed leave-one-patient-out cross-validation, in which all microbes from a single patient were held out in each round of cross-validation. This yielded an AUC of 0.90 (95% CI, 0.76–0.99). A final model was trained on all microbes from derivation cohort patients, and this achieved an AUC of 0.91 (95% CI, 0.83–0.97) for pathogen identification in the validation cohort (Fig. 2 and Dataset S3 A and B). At an optimized probability threshold of 0.36, this translated to an accuracy of 96.4% and 95.5% in the derivation and validation cohorts, respectively. As with the RBM, LRM performance suffered in polymicrobial culture cases with species that differed by several orders of magnitude in abundance when assessed by mNGS. As such, when only the most abundant microbe identified by clinical microbiologic diagnostics per LRTI+C+M patient was considered as the etiologic pathogen, the AUC increased to 0.997 (95% CI, 0.99–1.00) in the validation cohort.
Combining the RBM and LRM identified more putative pathogens than either model alone and revealed a potential LRTI etiology in 62% (n = 21) of the LRTI+C patients with clinically adjudicated LRTI but negative microbiologic testing (Fig. 3 and Dataset S3A). Compared with clinician-ordered diagnostics, this permitted a microbiologic diagnosis in a greater number of LRTI-positive subjects (78% vs. 43%; P < 1.00 × 10^−4 by McNemar’s test; Fig. 3). Putative new pathogens in a representative subset of the LRTI+C group patients (n = 11; 32%) were orthogonally confirmed by clinical multiplex respiratory virus PCR, influenza C PCR (33), or by 16S bacterial rRNA gene sequencing (Dataset S3A).
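The leave-one-patient-out scheme used to evaluate the LRM can be illustrated with a small fold generator. This is a sketch under assumptions rather than the study's code: the record layout (one dict per detected microbe with a `patient` key), the function names, and the use of ">=" when applying the 0.36 probability threshold are hypothetical; fitting the logistic regression itself is left out.

```python
from collections import defaultdict


def leave_one_patient_out(records):
    """Yield (held_out_patient, train_rows, test_rows).

    Every microbe from one patient is held out together, so no patient
    contributes rows to both the training and the test side of a fold.
    """
    by_patient = defaultdict(list)
    for row in records:
        by_patient[row["patient"]].append(row)

    for patient in sorted(by_patient):
        test_rows = by_patient[patient]
        train_rows = [r for p, rows in by_patient.items()
                      if p != patient for r in rows]
        yield patient, train_rows, test_rows


def call_pathogens(probabilities, threshold=0.36):
    """Label each microbe as a putative pathogen at a probability threshold.

    Assumption: a probability equal to the threshold counts as a pathogen call.
    """
    return [p >= threshold for p in probabilities]
```

Grouping the folds by patient rather than by microbe matters because microbes from the same sample share sequencing depth and rank information; splitting them across training and test sets would tend to inflate the apparent performance.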
Fig. 3.

Distribution of respiratory pathogens identified in patients using clinician-ordered diagnostics versus mNGS. Number of subjects in whom each respiratory microbe was detected. All microbes detected by clinician-ordered diagnostics were detected by mNGS; however, pink bars indicate microbes misclassified as negative by either the RBM or LRM. Notably, all microbes identified by clinician-ordered diagnostics and misclassified by either the RBM or LRM (pink bars) were found in polymicrobial cultures, highlighting the presence of dominant pathogens by NGS that are not captured in the polymicrobial culture results. Red bars indicate microbes detected by clinician-ordered diagnostics and also predicted as pathogens by either the RBM or LRM. More detail on which model identified each microbe is provided in the supplementary material. Dark red bars (LRTI+C+M and LRTI+C subjects) and gray bars (no-LRTI subjects) indicate number of cases with microbes detected only by mNGS.

Putative pathogens identified in the unk-LRTI group (n = 6, 42%) may have represented atypically presenting respiratory infections or incidental carriage in the respiratory tract (Dataset S3A). Microbes identified in the no-LRTI group (n = 3; 17%) were present at lower abundance compared with microbes in LRTI+C+M subjects (P < 0.01 by Wilcoxon rank sum), LRTI+C subjects (P < 0.01), and unk-LRTI subjects (P = 0.02), and included contextual pathogens such as Streptococcus pneumoniae and Haemophilus influenzae that colonize the airways of 20–50% of healthy individuals (32, 34, 35). Together, these findings highlighted the reality of asymptomatic carriage of potentially pathogenic species, emphasizing the need to contextualize microbial detection with respect to other key elements of an airway infection, in particular the airway microbiome and the host’s immune response (26, 36). We thus undertook further analytical development to predict LRTI status by calculating combined metrics based on pathogen, microbiome, and host transcriptional response.

LRTI Prediction Based on Pathogen.

We recognized that the highest per-patient LRM pathogen versus commensal probability value differed significantly between LRTI+C+M and no-LRTI subjects (P = 3.8 × 10^−4 by Wilcoxon rank sum). As such, we hypothesized that this value might have utility not only for pathogen versus commensal prediction, but also for LRTI prediction in general. Testing this idea, we found that the maximum per-patient LRM probability value predicted LRTI status with an AUC of 0.97 (95% CI, 0.90–1.00) in the derivation cohort and 0.96 (95% CI, 0.86–1.00) in the validation cohort.
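The pathogen metric amounts to collapsing microbe-level probabilities into one value per patient and then asking how well that value separates the two groups. A minimal sketch follows; the input layout and function names are hypothetical, and the AUC is computed with the generic pairwise (Mann–Whitney) formulation, which is not necessarily how the study computed it.

```python
def max_probability_per_patient(rows):
    """Collapse (patient, LRM probability) pairs to the per-patient maximum."""
    best = {}
    for patient, prob in rows:
        best[patient] = max(prob, best.get(patient, 0.0))
    return best


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive patient scores
    higher than a randomly chosen negative patient (ties count one half).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Here `scores_pos` would hold the per-patient maxima for LRTI+C+M subjects and `scores_neg` those for no-LRTI subjects.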

LRTI Prediction Based on Lung Microbiome Diversity.

Several studies have demonstrated reduced diversity of the airway microbiome in the setting of LRTI (20, 37–39). We measured intrapatient (α) diversity of airway genera using the Shannon diversity index (SDI) and found that LRTI+C+M subjects had significantly lower SDI compared with no-LRTI subjects when assessed by both RNA-seq (Fig. 4; P = 1.3 × 10^−4) and DNA-seq (P = 8.9 × 10^−3) (Dataset S5). We next examined interpatient (β) diversity (40) using the Bray–Curtis Index (41) and found that this also differed between LRTI+C+M and no-LRTI subjects, with assessment by RNA-seq again yielding a more significant difference versus DNA-seq [P = 5 × 10^−3 versus P = 9 × 10^−3 by permutation analysis of variance (PERMANOVA), respectively; Fig. 4]. We then tested whether diversity alone might predict LRTI and found that RNA-seq SDI differentiated LRTI+C+M from no-LRTI subjects with an AUC of 0.96 (95% CI, 0.89–1.00) in the derivation cohort and an AUC of 0.80 (95% CI, 0.63–0.96) in the validation cohort (Fig. 4). DNA-seq SDI did not perform as well, with AUCs of 0.84 (95% CI, 0.66–1.00) and 0.53 (95% CI, 0.25–0.80) in the derivation and validation cohorts, respectively. These findings suggested that genus diversity assessed by RNA-seq was a useful, albeit imperfect, biomarker of LRTI.
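For reference, the two diversity measures named above have simple closed forms: the Shannon index is H = −Σ p_i ln p_i over genus proportions p_i, and the Bray–Curtis dissimilarity between two samples is Σ|a_i − b_i| / Σ(a_i + b_i). The sketch below implements both on genus-level abundances; the natural logarithm and the function names are assumptions, since the text does not state the log base or software used.

```python
import math


def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p_i * ln p_i) over nonzero genera.

    Assumption: natural logarithm; zero-abundance genera contribute nothing.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two genus -> abundance dicts.

    0 means identical composition; 1 means no shared genera.
    """
    genera = set(a) | set(b)
    denominator = sum(a.values()) + sum(b.values())
    if denominator == 0:
        return 0.0
    numerator = sum(abs(a.get(g, 0) - b.get(g, 0)) for g in genera)
    return numerator / denominator
```

A sample dominated by a single genus yields an SDI near zero, which is the pattern the text associates with LRTI+C+M subjects, whereas an evenly distributed community of k genera yields ln k.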
Fig. 4.

Diversity of the transcriptionally active lung microbiome in patients with LRTI (LRTI+C+M) versus noninfectious respiratory illnesses (no-LRTI). (A) Box plots of Shannon diversity index (SDI) of the lung microbiome assessed by RNA-seq at the genus level (in the derivation cohort), which differed between the LRTI+C+M and no-LRTI groups. (B) The β diversity assessed by PERMANOVA on Bray–Curtis dissimilarity values in the derivation cohort differed between LRTI+C+M and no-LRTI groups. (C) ROC curve demonstrating performance of SDI to distinguish LRTI+C+M from no-LRTI groups.


LRTI Prediction Based on Host Response.

In the setting of critical illness, systemic inflammatory responses due to diverse physiologic processes can make true LRTI clinically indistinguishable from noninfectious respiratory failure or severe extrapulmonary infection. Consistent with this, we found that the systemic inflammatory response syndrome (SIRS) criteria (temperature, white blood cell count, heart rate, respiratory rate) had limited utility for LRTI detection despite being widely used for infection assessment (Dataset S1). We thus hypothesized that transcriptional profiling, which has emerged as a promising and accurate host-based approach for assessing infection, might provide diagnostic insight in settings when clinical rules are uninformative (5, 16, 42). As such, we examined differential gene expression between LRTI+C+M and no-LRTI subjects in the derivation cohort to define a host transcriptional signature of LRTI in patients with critical illness. Using a false-discovery rate (FDR) of <0.05, we identified a total of 882 differentially expressed genes, 414 of which were up-regulated in LRTI+C+M subjects ( and Dataset S6A). Gene set enrichment analysis (43) identified up-regulation of pathways related to innate immune responses, NF-κβ signaling, cytokine production, and the type I IFN response in LRTI+C+M subjects. In comparison, gene expression pathways in the no-LRTI group were enriched for oxidative stress responses and MHC class II receptor signaling (Dataset S6B). A subanalysis () evaluating differences between viral and bacterial infections in known LRTI+C+M patients identified four differentially expressed genes (RSAD2, OAS3, CXCL2, DUSP2). 
Genes up-regulated in viral cases (RSAD2, OAS3) were related to the type I IFN and antiviral responses, reflecting biologically relevant differences in host response indicative of pathogen type, despite a relatively limited sample size within a heterogeneous cohort and a high proportion of immune-compromising conditions in the majority of patients with detected viruses. We next sought to construct an airway-specific host transcriptional classifier that could differentiate LRTI+C+M patients from no-LRTI subjects by employing machine learning. Elastic net regularized regression in the derivation cohort identified a 12-gene classifier that was then used to score patients based on a weighted sum of scaled expression values (Fig. 5 and Dataset S7A). We found that predictive classifier genes up-regulated in LRTI+C+M patients compared with no-LRTI patients included NFAT5, which plays a role in T-cell function and inducible gene transcription during immune responses (44); ZC3H11A, which encodes a zinc-finger protein involved in the regulation of cytokine production and immune cell activation (45); and PRRC2C, which functions in RNA binding and may play a role in hematopoietic progenitor cell differentiation in response to infection (46). Genes up-regulated in no-LRTI patients compared with LRTI+C+M patients included the following: CD36, which encodes a macrophage phagocytic receptor involved in scavenging dying/dead cells and oxidized lipids (47, 48); BLVRB, which is involved in oxidative stress responses (49); EDF1, which contributes to the regulation of nitric oxide release in endothelial cells (50); and ENG, an integral membrane glycoprotein receptor that may modulate inflammation and angiogenesis (51).
Fig. 5.

Host transcriptional profiling distinguishes patients with acute LRTI (LRTI+C+M) from those with noninfectious acute respiratory illness (no-LRTI). (A) Host classifier scores for all patients in the derivation and validation cohorts; each bar indicates a patient score and is colored as follows: LRTI+C+M, red; no-LRTI, blue. Orange dotted line indicates the host classifier threshold (score, −4) that achieved 100% sensitivity in the training set and was used to classify the test set samples. (B) Normalized expression levels, arranged by unsupervised hierarchical clustering, reflect overexpression (blue) or underexpression (turquoise) of classifier genes (rows) for each patient (columns). Twelve genes were identified as predictive in the derivation cohort and subsequently applied to predict LRTI status in the validation cohort. Column colors above the heatmap indicate whether a patient belonged to the derivation cohort (dark gray) or validation cohort (light gray) and whether they were adjudicated to have LRTI+C+M (red) or no-LRTI (blue). (C) ROC curves demonstrating host classifier performance for derivation (blue) and validation (green) cohorts.

Classifier performance assessed by leave-one-out cross-validation demonstrated an AUC of 0.90 (95% CI, 0.75–1.00) in the derivation cohort and an AUC of 0.88 (95% CI, 0.75–1.00) in the validation cohort (Fig. 5). Covariates for immune suppression, concurrent nonpulmonary infection, antibiotic use, age, and gender were iteratively incorporated into the regression model, but none was significant enough to be maintained when sparsity was added by elastic net (Dataset S7B). We tested whether differences in host gene expression could be attributed to enrichment of specific cell types using CIBERSORT (52) (Dataset S7C) and found that only M2 macrophages were enriched in the no-LRTI group (P = 0.03 by Wilcoxon rank sum).
Finally, given our modest sample size, we tested the statistical power of our host classifier by computing learning curves. We observed that even with subsampling, the 12 classifier genes were continually represented. While the derivation cohort sample size approached the limit required for robust performance assessment, the analysis suggested that additional patients might lead to further improvement. A similar analysis for the pathogen versus commensal LRM indicated that performance metrics had converged with the given microbial sample size, indicating robust performance assessment and sufficient training data.

Evaluation of a Combined LRTI Metric.

Given the relative success of each independent metric (pathogen, microbiome, and host) for discerning the presence of infection, we asked whether combining them could enhance LRTI detection. We recognized the potential of mNGS to empower a data-driven assessment of a patient’s LRTI status during the critical time frame following ICU admission. As such, we developed a readily interpretable compilation of host and pathogen mNGS metrics in a rule-out model designed to maximize LRTI diagnostic sensitivity. This process, which involved optimizing intrametric LRTI positivity thresholds in the derivation cohort and calling positivity based on either the host or pathogen scores, achieved a sensitivity and specificity of 100% and 87.5%, respectively, in the validation cohort, equating to a negative predictive value of 100% (Fig. 6). Despite the limitations of a small cohort, we investigated the potential utility of the rule-out model for curbing broad-spectrum antibiotic overuse in the ICU by performing a theoretical calculation in the no-LRTI group to estimate the potential impact of mNGS result availability at 48 h postenrollment. This estimate suggested that a significant reduction in unnecessary empiric antibiotic use could have been possible (78 versus 50 d of therapy; P = 0.03).
Fig. 6.

Combined LRTI prediction metric integrating pathogen detection and host gene expression. (A) Scores per patient for each of the two components of this LRTI rule-out model are projected into a scatterplot (x axis represents the host metric; y axis represents the microbe score). The thresholds optimized for sensitivity in the derivation cohort are indicated in gray dashed line. Each point represents one patient—those that were in the derivation cohort have no fill, and those that were in the validation cohort are filled. Red indicates LRTI+C+M, and blue indicates no-LRTI subjects. (B) LRTI rule-out model results for each patient are shown for both the derivation and validation cohorts, with study subjects shown in rows and metrics in columns. Dark gray indicates a metric exceeded the optimized LRTI threshold; light gray indicates it did not. Dark red indicates the subject was positive for both pathogen-plus-host metrics, and thus was classified as having LRTI. White indicates missing data.

Discussion

Of all infectious disease categories, LRTIs impart the greatest mortality both worldwide and in the United States (1). Contributing to this is the rising rate of treatment failure due to antibiotic resistance (53) and the limited performance of existing diagnostics for identifying respiratory pathogens (4, 54). In this prospective cohort study, we describe the use of unbiased mNGS for respiratory infectious disease diagnosis in the ICU. We develop methods that advance pathogen-based genomic diagnostics as well as existing host transcriptional classifier platforms by simultaneously assessing respiratory pathogens, the airway microbiome, and the host transcriptome in a single test to predict LRTI and identify disease etiology. We find that host/pathogen mNGS accurately detects LRTI in patients with acute respiratory failure and can provide a microbiologic diagnosis in cases due to unknown etiology. Host transcriptional profiling has gained attention as a promising approach to LRTI diagnosis (5, 16) but is understudied in critically ill and immunocompromised patients, who may be the most likely to benefit from this technology. We addressed this gap by interrogating airway gene expression in a critically ill cohort with 45% immunocompromised patients to develop an accurate host transcriptional classifier. Unlike existing classifiers, host–microbe mNGS offers the advantage of simultaneous species-level microbial identification. The role of commensal lung microbiota in health and disease is an area of active investigation. We corroborated prior findings demonstrating microbiome differences between subjects with respiratory infections and those with noninfectious airway disease (20, 37). More specifically, we found that LRTI was associated with reduced intrapatient α diversity of the airway microbiome and that, collectively, patients with LRTI differed significantly from those without in terms of β diversity and microbial sequence abundance. 
This diversity difference was more pronounced when assessed by RNA-seq, potentially due to inclusion of RNA viruses and transcripts from actively replicating pathogens in infected patients. As a biomarker, RNA-seq SDI had moderate utility for predicting LRTI; however, it did not enhance performance in combination with the other metrics, perhaps due to negative correlation with microbe score (r = −0.84 in the derivation cohort). Discriminating respiratory pathogens from background commensal microbiota is a key challenge for LRTI diagnostics and is particularly relevant for sensitive molecular assays (55). We directly addressed this by developing two complementary algorithms (RBM and LRM) that parsed putative pathogens from airway commensals. When combined, these models enabled a microbiologic diagnosis in significantly more patients with LRTI compared with clinician-ordered diagnostics. The fact that the a priori selected model features successfully differentiated pathogens from commensals validated the underlying model assumptions related to pathogen dominance resulting in disruption of α diversity. Notably, both models also proved useful despite widespread antibiotic use before airway sampling (90% of subjects), a practice that occurs commonly and that can sterilize microbial cultures (56). The capacity for mNGS to detect pathogens unidentifiable by standard clinical diagnostics was highlighted in several cases, including that of subject 254, who developed rapidly worsening respiratory failure and fever during a prolonged postsurgical admission. He was treated empirically for hospital acquired pneumonia with linezolid, aztreonam, and metronidazole. Lower respiratory cultures returned negative, but mNGS identified influenza C, which is not available on most clinical multiplex viral PCR assays. 
Notably, 12% of subjects were found to have undetected and potentially transmissible respiratory viruses despite strict precautionary respiratory contact policies at the study site, a finding that suggests the potential value of mNGS for hospital infection control. Several cases also highlighted the potential for mNGS to enhance antibiotic stewardship, and we estimated that theoretical implementation of the rule-out model within 48 h could have reduced antibiotic days of therapy by 36% in the no-LRTI validation cohort patients. Since at the time of ICU admission it is often difficult to distinguish infectious from noninfectious acute respiratory disease, a theoretical workflow for host/microbe mNGS could involve first employing the rule-out model to assess LRTI probability and complement clinical decision making regarding discontinuation of empiric antimicrobials. In cases where LRTI was ultimately suspected, a microbiologic diagnosis could then be obtained using a combination of the RBM and LRM to accurately screen for both well-established and uncommon respiratory pathogens. A principal advantage of mNGS is that all potential infectious agents can be simultaneously assessed, which avoids the need for ordering multiple individual tests for each different pathogen of concern. Future studies in a larger validation cohort can help optimize host and microbe LRTI rule-out thresholds and further assess test performance before deployment in a clinical setting. Some limitations of host/microbe mNGS were apparent and included false-positive detection of pathobionts such as H. influenzae and S. pneumoniae in the no-LRTI group, and false positivity of the host-response metric in subjects including patient 349, who was diagnosed with α-1 antitrypsin deficiency-associated pulmonary disease. The relatively small sample size of our derivation and validation cohorts increased the potential for data overfitting and was a limitation of our study. 
Learning curve estimates, however, indicated that the sample size was optimal for pathogen versus commensal prediction, and adequate for the host classifier, consistent with the estimate from an established sample size prediction tool for high-dimensional classifiers (57). Nonetheless, a larger cohort will be necessary to improve the robustness of model performance estimates and better assess synergy resulting from combining host and microbial metrics. Strengths of this study include an innovative bioinformatics approach, detailed patient phenotyping, and a study population reflective of the true heterogeneity of ICU patients, including severely immunocompromised subjects and patients receiving broad-spectrum antibiotics. Future studies in a larger cohort can further validate these findings, strengthen the utility of these models, and assess the impact of mNGS on clinical outcomes. In summary, we report a multifaceted approach to LRTI diagnosis that integrates three central elements of airway infections: the pathogen, airway microbiome, and host’s response.

Methods

Study Design and Subjects.

This prospective observational study evaluated adults with acute respiratory failure requiring mechanical ventilation who were admitted to the University of California, San Francisco (UCSF) Moffitt–Long Hospital ICUs. Subjects were enrolled sequentially between July 25, 2013, and October 17, 2017, within the first 72 h of intubation for respiratory failure. The UCSF Institutional Review Board approved an initial waiver of consent for obtaining excess respiratory fluid, blood, and urine samples, and informed consent was subsequently obtained from patients or their surrogates for continued study participation according to CHR protocol 10-02701. For patients whose surrogates provided informed consent, follow-up consent was then obtained if patients survived their acute illness and regained the ability to consent. For subjects who died before consent was obtained, a full waiver of consent was approved. For all surviving subjects, if consent was not eventually obtained from either patient or surrogate, all specimens were discarded.

Clinical Microbiologic Testing.

During the period of study enrollment, subjects received standard of care microbiologic testing ordered by the treating clinicians. Respiratory testing from TA, bronchoalveolar lavage (BAL), or mini-BAL included the following: bacterial and fungal stains and semiquantitative cultures (n = 90); AFB stains and cultures (n = 8); 12-target clinical multiplex PCR (Luminex) for influenza A/B, respiratory syncytial virus (RSV), human metapneumovirus (HMPV), human rhinovirus (HRV), adenovirus (ADV), and parainfluenza viruses (PIV) 1–4 (n = 23); Legionella culture (n = 1); Legionella pneumophila PCR (n = 4); cytomegalovirus (CMV) culture (n = 4); and cytology for Pneumocystis jirovecii (n = 4). Other microbiologic testing included blood culture (n = 89); urine culture (n = 87); serum cryptococcal antigen (n = 4); serum galactomannan (n = 1); and serum β-d-glucan (n = 1).

Definitions and Clinical Adjudication of LRTI.

Because admission diagnoses made by treating clinicians at the time of study enrollment were by necessity based on incomplete clinical, microbiologic, and treatment outcome information, a post hoc adjudication approach was carried out to enhance accuracy of LRTI diagnosis. For this, two attending physicians [one from infectious disease (C.L.) and one from pulmonary medicine (F.M.)], blinded to mNGS results, retrospectively reviewed each patient’s medical record following hospital discharge or death to determine whether they met the CDC/NHSN surveillance definition of pneumonia, with respect to clinical and/or microbiologic criteria (Dataset S1) (19). Chart review consisted of in-depth analysis of complete patient histories, including laboratory and radiographic results, inpatient notes, and postdischarge clinic notes. Using this approach, subjects were assigned to one of four groups, consistent with a recently described approach (16): (i) LRTI defined by both clinical and laboratory criteria (LRTI+C+M); (ii) no evidence of respiratory infection and with a clear alternative explanation for respiratory failure (no-LRTI); (iii) LRTI defined by clinical criteria only (LRTI+C); and (iv) unknown, LRTI possible (unk-LRTI). A determination of noninfectious etiology was made only if an alternative diagnosis could be established and results of standard clinical microbiological testing for LRTI were negative.

Host/Microbe mNGS.

Excess TA was collected on ice, mixed 1:1 with DNA/RNA Shield (Zymo), and frozen at −80 °C. RNA and DNA were extracted from 300 µL of patient TA using bead-based lysis and the Allprep DNA/RNA kit (Qiagen). RNA was reverse transcribed to generate cDNA and used to construct sequencing libraries using the NEBNext Ultra II Library Prep Kit (New England Biolabs). DNA underwent adapter addition and barcoding using the Nextera library preparation kit (Illumina) as previously described (20). Depletion of abundant sequences by hybridization (DASH) was employed to selectively deplete human mitochondrial cDNA, thus enriching for both microbial and human protein coding transcripts (58). The final RNA-seq and DNA-seq libraries underwent 125-nt paired-end Illumina sequencing on a HiSeq 4000.

Pathogen Detection Bioinformatics.

Detection of host transcripts and airway microbes leveraged a custom bioinformatics pipeline (20) that incorporated quality filtering using PRICESeqfilter (23) and alignment against the human genome (NCBI GRCh38) using the STAR (59) aligner to extract gene counts. To capture respiratory pathogens, additional filtering to remove Pan troglodytes (UCSC PanTro4) was performed using STAR, and removal of nonfungal eukaryotes, cloning vectors, and phiX phage was performed using Bowtie2 (60). The identities of the remaining microbial reads were determined by querying the NCBI nucleotide (NT) and nonredundant protein (NR) databases using GSNAP-L and RAPSEARCH2, respectively. Microbial alignments detected by RNA-seq and DNA-seq were aggregated to the genus level and independently evaluated to determine genus α diversity as described below. The sequencing reads comprising each genus were then evaluated for taxonomic assignment at the species level based on species relative abundance as previously described (20). For each patient, the top 15 most abundant taxa by RNA rpm were identified and evaluated under the requirement that all bacteria, fungi, and DNA viruses had concordant detection of their genomes by DNA-seq and concordant alignments in NR and NT. RNA viruses did not require concordant DNA-seq reads (Fig. 2 and Dataset S3A). To differentiate putative pathogens from commensal microbiota, we developed RBM and LRM methods and benchmarked each on sequencing data from LRTI+C+M and no-LRTI subjects.

Statistical Analysis.

Statistical significance was defined as P less than 0.05, using two-tailed tests of hypotheses. Categorical data were analyzed by χ2 test and nonparametric continuous variables were analyzed by Wilcoxon rank sum. For statistical validation in the pathogen versus commensal and LRTI prediction metrics, 10 LRTI+C+M and 10 no-LRTI cases were randomly assigned to create a derivation cohort. Model performance was assessed in an independent validation cohort consisting of 16 LRTI+C+M and 8 no-LRTI cases.

Pathogen Versus Commensal Models.

We found that all clinically confirmed LRTI pathogens were present within the top 15 most abundant microbes by RNA-seq rpm, which on average represented 99% of reads across all samples. We thus limited analysis to the 15 most abundant NGS-detected genera in each sample. For both models, microbes identified using clinician-ordered diagnostics and all viruses with established respiratory pathogenicity in the derivation cohort subjects were considered “pathogens.” Any additional microbes identified by mNGS in these subjects were considered “commensals”. This equated to 12 “pathogens” and 155 “commensals” in the 20 derivation cohort patients, and 26 “pathogens” and 174 “commensals” in the 24 validation cohort patients.

RBM.

This model leveraged previous findings demonstrating that microbial communities in patients with LRTI are characterized by one or more dominant pathogens present in high abundance (20, 39). Using either RNA-seq rpm alone (RNA-viruses) or the combination of RNA-seq and DNA-seq rpm (all others), this model identified the subset of microbes with the greatest relative abundance in each sample, which consisted of single microbes in cases of a dominant pathogen and also identified coinfections where several microbes were present within a similar range. All viruses detected by RNA-seq at >0.1 rpm and present within the a priori-developed reference index of established respiratory pathogens were considered putative pathogens in the model. The remaining taxa (bacteria, fungi, and DNA viruses) were then aggregated at the genus level, assigned an abundance score based on [log(RNA-seq rpm) + log(DNA-seq rpm)], and sorted in descending order by this score. The greatest change in abundance score between sequentially ranked microbes was identified, and all genera with an abundance score greater than this threshold were then evaluated at the species level, by identifying the most abundant species within each genus. If the species was present within the a priori-developed reference index of established respiratory pathogens, it was selected as a putative pathogen by the model (Fig. 2).
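The ranking step of the RBM for bacteria, fungi, and DNA viruses can be illustrated with a minimal sketch. It is not the authors' implementation: the logarithm base (log10 here), the function name, and the toy rpm values are assumptions made for illustration, and the species-level lookup against the reference index is omitted.

```python
import math

def rbm_candidate_genera(genera):
    """Return the genera ranked above the largest drop in abundance score.

    genera: dict mapping genus name -> (rna_rpm, dna_rpm), both > 0.
    The score is log(RNA-seq rpm) + log(DNA-seq rpm); base 10 is an
    assumption, since the text does not state the base.
    """
    scored = sorted(
        ((math.log10(rna) + math.log10(dna), name)
         for name, (rna, dna) in genera.items()),
        reverse=True,
    )
    if len(scored) < 2:
        return [name for _, name in scored]
    # Difference in score between each pair of sequentially ranked genera.
    gaps = [scored[i][0] - scored[i + 1][0] for i in range(len(scored) - 1)]
    cut = gaps.index(max(gaps))  # position of the greatest change
    return [name for _, name in scored[: cut + 1]]

# Toy samples (made-up values): one dominant genus, then a coinfection-like case.
dominant = {"Streptococcus": (5000.0, 800.0), "Prevotella": (40.0, 10.0),
            "Veillonella": (30.0, 8.0), "Rothia": (12.0, 5.0)}
coinfection = {"GenusA": (1000.0, 1000.0), "GenusB": (900.0, 900.0),
               "GenusC": (10.0, 10.0)}
```

In the first toy sample only the dominant genus sits above the largest gap, whereas in the second, two genera of similar abundance are both retained, mirroring the coinfection behavior described above.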

LRM.

This model employed the Python (version 3.6.1) sklearn (version 0.18.1) package to train on distinguishing between “pathogens” and “commensals” using the following five input features: log(RNA-seq rpm), log(DNA-seq rpm), per-patient RNA-seq abundance rank, and two binary variables indicating whether the microbe could be identified in the established index of respiratory pathogens or was a virus. These features were selected in alignment with the observation that the pathogens identified in the LRTI+C+M group were more abundant and within the top-ranked microbes. Moreover, the individual features were significantly different between the pathogens and commensals (RNA-seq rpm, P = 2.44 × 10⁻⁴; DNA-seq rpm, P = 3.55 × 10⁻³; scoring rank, P = 3.51 × 10⁻⁶). Model performance was estimated in the derivation and validation cohorts and learning curves were computed. For identification of etiologic pathogens reported (Fig. 3 and Datasets S2 and S3A), the threshold of 0.36 was used for consistency between the LRM for pathogen identification and LRTI detection. Outside of identifying putative LRTI pathogens, we evaluated whether LRM microbial score alone could be used to classify subjects as LRTI positive or LRTI negative. To do so, we used the top LRM-derived pathogen probability score per patient and evaluated the performance of this value alone to predict likelihood of infection in the LRTI+C+M versus no-LRTI subjects.
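How the five features feed a per-microbe probability and a per-patient microbe score can be sketched as follows. This is an illustration only and does not reproduce the fitted sklearn model: the coefficients and intercept are hypothetical placeholders, the log10(x + 1) transform is an assumption made to handle microbes with zero DNA-seq reads, and only the 0.36 threshold comes from the text.

```python
import math

# Hypothetical coefficients for illustration; the fitted values are not given in the text.
WEIGHTS = {"log_rna": 0.8, "log_dna": 0.4, "rank": -0.3, "in_index": 2.0, "is_virus": 1.0}
INTERCEPT = -4.0
THRESHOLD = 0.36  # pathogen-probability threshold quoted in the text

def features(rna_rpm, dna_rpm, rank, in_index, is_virus):
    """Build the five-feature vector for one microbe in one patient."""
    return {
        "log_rna": math.log10(rna_rpm + 1.0),  # +1 pseudocount is an assumption
        "log_dna": math.log10(dna_rpm + 1.0),
        "rank": float(rank),                   # per-patient RNA-seq abundance rank
        "in_index": 1.0 if in_index else 0.0,  # listed in the reference pathogen index
        "is_virus": 1.0 if is_virus else 0.0,
    }

def pathogen_probability(f):
    """Logistic function applied to the linear combination of features."""
    z = INTERCEPT + sum(WEIGHTS[k] * f[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def patient_microbe_score(microbes):
    """Top pathogen probability among a patient's microbes."""
    return max(pathogen_probability(features(*m)) for m in microbes)

# Toy patient (made-up values): (rna_rpm, dna_rpm, rank, in_index, is_virus)
toy_patient = [(5000.0, 800.0, 1, True, False), (40.0, 10.0, 2, False, False)]
```

Taking the maximum probability per patient matches the description of using the top LRM-derived score as the patient-level microbe metric.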

Lung Microbiome Diversity Analysis.

The α diversity of the respiratory microbiome for each subject was assessed by SDI and Simpson diversity index at the genus level using NT rpm and the Vegan (version 2.4.4) (61) package in R (version 3.4.0) (62). Richness (total number of genera) and genus-specific library sequence abundance (total number of microbial reads normalized per million reads sequenced) were also evaluated. Viral, bacterial, and fungal microbes were included in all diversity analyses, computed independently for RNA- and DNA-seq samples without requiring that taxa be concordant on both nucleic acids. Diversity values were then compared between patients with clinically adjudicated LRTI (LRTI+C+M) and those with respiratory failure due to noninfectious causes (no-LRTI) using the nonparametric Wilcoxon rank sum test. Evaluation of α diversity for prediction of LRTI status was performed using the SDI value. The β diversity was evaluated using the Bray–Curtis dissimilarity metric calculated at the genus level using NT rpm and the Vegan package in R. Statistical significance of the β diversity between LRTI+C+M and no-LRTI patients was assessed using PERMANOVA (999 permutations), and the results were visualized using nonmetric multidimensional scaling.
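For readers unfamiliar with the two diversity measures, the following sketch computes them from genus-level rpm values. The study used the Vegan package in R; this Python version only illustrates the standard formulas (Shannon index with natural logarithm, and Bray–Curtis dissimilarity), and the function names and toy profiles are assumptions.

```python
import math

def shannon_index(rpm_by_genus):
    """Shannon diversity H = -sum(p_i * ln p_i) over genera with nonzero rpm."""
    total = sum(rpm_by_genus.values())
    props = [v / total for v in rpm_by_genus.values() if v > 0]
    return -sum(p * math.log(p) for p in props)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i) over all genera."""
    genera = set(a) | set(b)
    num = sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in genera)
    den = sum(a.get(g, 0.0) + b.get(g, 0.0) for g in genera)
    return num / den

# Toy genus-level profiles (made-up rpm values).
even_profile = {"Prevotella": 25.0, "Veillonella": 25.0, "Rothia": 25.0, "Neisseria": 25.0}
dominated_profile = {"Streptococcus": 970.0, "Prevotella": 10.0, "Veillonella": 10.0, "Rothia": 10.0}
```

A profile dominated by a single genus yields a lower Shannon index than an even profile with the same number of genera, which is the pattern the α diversity comparison relies on.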

Host Gene Expression Analysis.

Following quality filtration with PRICESeqfilter (63), RNA transcripts were aligned to the ENSEMBL GRCh38 human genome build using STAR. Subsequently, genes were filtered to include only protein-coding genes that were expressed in at least 50% of patients. All samples used for host transcriptome analysis (both derivation and validation sets) ultimately included more than 95,000 protein-coding genes with an average of 734,844 transcripts per patient.

Differential Expression Analysis.

Gene count data were analyzed using the Bioconductor package DESeq2 (version 1.16.1) (64) in R statistical programming environment. To avoid batch-related confounding and class imbalance, we limited our differential expression analysis to the derivation cohort of 10 LRTI+C+M and 10 no-LRTI samples, sequenced in the same batch. Differentially expressed genes with FDR <0.05 were used as input to ToppGene (43) to evaluate for functional pathway enrichment.

Host Gene Expression Classifier for LRTI Prediction.

The derivation cohort was independently normalized using DESeq2 and log-transformed. The values for each gene in the derivation cohort were then scaled and centered by z score. A classifier was built using the elastic net regularized regression model implementation from the glmnet package (version 2.0.13) in the R Statistical Programming Language (version 3.4.0). Regularization parameter α = 0.5 was selected using leave-one-out cross-validation and optimizing for AUC. To account for heterogeneity in the cohort, the model included covariates of concurrent bloodstream infection, immunosuppression, and gender. No significant difference was seen in these parameters between LRTI+C+M and no-LRTI (Dataset S7B). These covariates were reduced to zero in the model-fitting stage. Genes with nonzero weights were used for classification. To obtain a single-value score for each patient, genes selected by the elastic net were evaluated for their correlation with each of the two groups. Genes for which the mean expression was greater in the LRTI+C+M group were assigned a weight of 1, and those with mean expression greater in the no-LRTI group were assigned a weight of −1. The normalized, scaled expression values for each patient were multiplied by the weight vector and summed across all genes. The total sum was used as a representative score, and the AUC was calculated. Given the importance of sensitivity in the context of diagnostics, the threshold selected for analysis of the test cohort and combined metrics (score, −4) was chosen as the threshold that provided 100% sensitivity in the derivation cohort. The host gene expression classifier was then validated on the validation set, and learning curves were used to estimate the reliability of the performance metrics.
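The scoring step described above (z-scaling, signed unit weights, and summation) can be sketched in a few lines. This is not the published code: gene selection by elastic net is assumed to have already happened, the sample standard deviation is an assumption about how scaling was done, and the gene names and expression values are made up.

```python
from statistics import mean, stdev

def zscore_genes(expr):
    """Center and scale each gene across patients.

    expr: dict mapping gene -> list of log-normalized values, one per patient.
    Uses the sample standard deviation (an assumption).
    """
    scaled = {}
    for gene, values in expr.items():
        m, s = mean(values), stdev(values)
        scaled[gene] = [(v - m) / s for v in values]
    return scaled

def sign_weights(scaled, is_lrti):
    """+1 if mean expression is higher in LRTI+C+M, otherwise -1."""
    weights = {}
    for gene, values in scaled.items():
        pos = mean([v for v, y in zip(values, is_lrti) if y])
        neg = mean([v for v, y in zip(values, is_lrti) if not y])
        weights[gene] = 1 if pos > neg else -1
    return weights

def host_scores(scaled, weights):
    """Per-patient score: weighted sum of scaled expression over classifier genes."""
    n_patients = len(next(iter(scaled.values())))
    return [sum(weights[g] * scaled[g][i] for g in scaled) for i in range(n_patients)]

# Toy derivation data: two hypothetical genes, four patients (two LRTI, two no-LRTI).
toy_expr = {"GENE_UP": [5.0, 6.0, 2.0, 3.0], "GENE_DOWN": [1.0, 2.0, 6.0, 5.0]}
toy_labels = [True, True, False, False]
toy_scaled = zscore_genes(toy_expr)
toy_weights = sign_weights(toy_scaled, toy_labels)
toy_scores = host_scores(toy_scaled, toy_weights)
```

Because every selected gene contributes with weight +1 or −1, the score is easy to interpret: higher values indicate an expression pattern closer to the LRTI+C+M group.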

Classifier Combination.

To generate a readily interpretable compilation of host and microbial mNGS metrics that could enable a data-driven assessment of LRTI, the rule-out model was developed. In the rule-out model, we identified score thresholds from the pathogen and host metrics required to achieve 100% sensitivity in the derivation cohort (pathogen > 0.36, and host > −4) and applied these to the validation cohort to predict LRTI using the following combinatorial rule: LRTI = (Host) positive OR (Microbe) positive.
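The combinatorial rule and the resulting performance arithmetic can be written out as a short sketch. The two thresholds come from the text; the function names are hypothetical, and the validation counts (16 true positives, 0 false negatives, 7 true negatives, 1 false positive) are inferred here from the reported cohort sizes (16 LRTI+C+M, 8 no-LRTI) and the reported 100% sensitivity and 87.5% specificity, not taken from a published table.

```python
PATHOGEN_THRESHOLD = 0.36  # from the text
HOST_THRESHOLD = -4.0      # from the text

def rule_out_positive(host_score, microbe_score):
    """LRTI = (Host) positive OR (Microbe) positive."""
    return host_score > HOST_THRESHOLD or microbe_score > PATHOGEN_THRESHOLD

def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and negative predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }

# Counts inferred from the reported validation-cohort results (see lead-in).
validation = diagnostic_metrics(tp=16, fn=0, tn=7, fp=1)
```

Using OR rather than AND means a patient is ruled out only when both metrics fall below their thresholds, which favors sensitivity and negative predictive value at some cost in specificity.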

Identification and Mitigation of Environmental Contaminants.

To minimize inaccurate taxonomic assignments due to environmental contaminants, we processed negative water controls with each group of samples that underwent nucleic acid extraction, and included these, as well as positive control clinical samples, with each sequencing run. We directly subtracted alignments to those taxa in water control samples detected by both RNA-seq and DNA-seq analyses (Dataset S8) from the raw rpm values in all samples. To account for selective amplification bias of contaminants in water controls resulting from PCR amplification of metagenomic libraries to a fixed standard concentration across all samples, before direct subtraction we scaled taxa rpms in the water controls to the median percent microbial reads present across all samples (0.04%). In addition, we confirmed reproducibility of results by sequencing 10% of samples in triplicate and evaluated discrepancies between mNGS and standard diagnostics in a random subset of LRTI+C patients using clinically validated 16S bacterial rRNA gene sequencing and/or viral PCR testing, as described above.

Data Availability.

Raw microbial sequences are available via SRA BioProject accession ID SRP139967. Host transcript counts are tabulated in Dataset S9. Scripts for the classification algorithms are available on GitHub at: https://github.com/DeRisi-Lab/Host-MicrobeLRTI.
