Mylarappa Ningappa1, Syed A Rahman2, Brandon W Higgs1, Chethan S Ashokkumar1, Nidhi Sahni3, Rakesh Sindhi4, Jishnu Das5. 1. Department of Surgery and Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA. 2. Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. 3. Department of Epigenetics, The University of Texas MD Anderson Cancer Center, Smithville, TX, USA; Department of Molecular Carcinogenesis and Bioinformatics, The University of Texas MD Anderson Cancer Center, Smithville, TX, USA; Department of Computational Biology, The University of Texas MD Anderson Cancer Center, Smithville, TX, USA. 4. Department of Surgery and Children's Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA. Electronic address: rakesh.sindhi@chp.edu. 5. Center for Systems Immunology, Departments of Immunology and Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. Electronic address: jishnu@pitt.edu.
Abstract
Selecting the right immunosuppressant to ensure rejection-free outcomes poses unique challenges in pediatric liver transplant (LT) recipients. A molecular predictor can comprehensively address these challenges. Currently, there are no well-validated blood-based biomarkers for pediatric LT recipients before or after LT. Here, we discover and validate separate pre- and post-LT transcriptomic signatures of rejection. Using an integrative machine learning approach, we combine transcriptomics data with the reference high-quality human protein interactome to identify network module signatures, which underlie rejection. Unlike gene signatures, our approach is inherently multivariate and more robust to replication and captures the structure of the underlying network, encapsulating additive effects. We also identify, in an individual-specific manner, signatures that can be targeted by current anti-rejection drugs and other drugs that can be repurposed. Our approach can enable personalized adjustment of drug regimens for the dominant targetable pathways before and after LT in children.
Selecting the right immunosuppressant to ensure rejection-free outcomes poses unique challenges in pediatric liver transplant (LT) recipients. A molecular predictor can comprehensively address these challenges. Currently, there are no well-validated blood-based biomarkers for pediatric LT recipients before or after LT. Here, we discover and validate separate pre- and post-LT transcriptomic signatures of rejection. Using an integrative machine learning approach, we combine transcriptomics data with the reference high-quality human protein interactome to identify network module signatures, which underlie rejection. Unlike gene signatures, our approach is inherently multivariate and more robust to replication and captures the structure of the underlying network, encapsulating additive effects. We also identify, in an individual-specific manner, signatures that can be targeted by current anti-rejection drugs and other drugs that can be repurposed. Our approach can enable personalized adjustment of drug regimens for the dominant targetable pathways before and after LT in children.
Ensuring lifelong graft survival is challenging for pediatric liver transplant (LT) recipients., Selection of anti-rejection drugs from a limited list of options is an empiric process. For example, these pediatric transplant (Tx) recipients receive tacrolimus, a calcineurin inhibitor that predominantly suppresses cytokine production by T cells but can cause hypertension, renal failure, diabetes, and post-Tx lymphoma. Steroids can cause weight gain, diabetes, and hypertension. Acute cellular rejection associated with cytotoxic T cells occurs most often in the first 90 days after LT and requires higher-dose steroids. Steroid-resistant rejection is treated with lymphocyte-depleting agents. Mycophenolate mofetil, a purine antimetabolite drug that acts as an antiproliferative agent, may be added subsequently as a “third” agent. If this drug is not tolerated, another purine antimetabolite, azathioprine, or a mammalian target of rapamycin (mTOR) inhibitor, rapamycin, can be added to tacrolimus and steroids. Some regimens also include interleukin-2 (IL-2) receptor-blocking or lymphocyte-depleting antibodies from the time of LT., Time-dependent protocols are used to reduce tacrolimus and steroid doses, based on the likelihood of rejection at various times after Tx. Such reductions can precipitate rejection in some individuals. Growth and changing metabolism also affect immunosuppressant doses in children. A high incidence of non-compliance during adolescence adds to these challenges. As a result, surveillance biopsies performed after the first year after Tx can reveal fibrosis, antibody-mediated chronic allograft injury; this can be with or without cellular infiltrates and circulating donor-specific antibodies.12, 13, 14These findings suggest that, if rejection is unchecked, then graft loss is the inevitable outcome. Current non-invasive liver function testing establishes whether graft function is normal. Biopsies confirm whether rejection is the cause of graft dysfunction. Multi-gene blood tests have predicted acute cellular rejection (ACR) and tolerance in adult LT recipients but have only been used in research settings in some children.16, 17, 18, 19, 20, 21 The cell-based Pleximmune test predicts rejection with a calculated index of donor-reactive T-cytotoxic memory cells. Although useful for surveillance, this test cannot aid drug selection. These needs can be addressed by a non-invasive blood-based diagnostic that can predict, before LT or early within 90 days after LT, the occurrence of LT rejection in children. An ideal diagnostic would categorize risk, be reproducible across cohorts, and identify druggable immune mechanisms underlying rejection. Current biomarkers are primarily focused on the adult population, often non-blood-based, and primarily correlative and do not identify underlying mechanisms. Peripheral and tissue transcripts associated with rejection of a given organ can vary, as can signatures associated with rejection of different organs or rejection that occurs at different time periods within the same type of organ Tx.Here we address these key challenges by developing molecular signatures for ACR after pediatric LT using blood-based transcriptomics data. Two distinct approaches were used. First, we identified traditional signatures using differentially expressed genes (DEGs) in rejectors compared with non-rejectors. Next we used an integrative approach with a modern machine learning pipeline to combine the blood-based transcriptomics data with the high-quality reference human protein interactome network to identify network module signatures for pre- and post-Tx samples. These modules make up the protein interactome network and also incorporate additive effects. We also identified specific network modules that could be targeted by current anti-rejection drugs or other repurposed drugs. Our integrative framework is broadly appropriate for contexts where DEGs in the peripheral blood transcriptome have small effect sizes. In this setting, a multivariate network module-based signature is more likely to be robust to cross-validation and informative of underlying molecular phenotypes that drive differences across the clinical outcomes.
Results
Biomarkers and molecular signatures of LT outcomes
To predict rejection outcomes after Tx, several molecular signatures have been developed. These signatures consist of blood-based genes with significant differential expression (DEGs) in rejection compared with non-rejection. In these signatures, genes are considered to be independent entities. However, genes are not independent. Proteins encoded by these genes are part of a complex underlying molecular network: the protein-protein interaction network. The structure of this network has been implicated in many disease contexts and can contribute to the rejection process, an attribute not considered in the design of prevailing blood-based multi-gene assays. We and others, have recently developed a new conceptual paradigm where we overlay expression data onto the underlying reference protein network and identify expression modules that can discriminate between clinical outcomes (Figures 1A and 1B). The underlying network used in the framework is an unbiased context-independent reference map of all high-quality protein-protein interactions that have been validated by multiple independent assays. Overlaying context-specific expression data onto this reference map helps us identify functionally connected subnetworks or modules that discriminate between clinical outcomes (Figure 1B). Key advantages of the network-based approach include encapsulating additive effects of genes in modules. This biologically motivated dimensionality reduction avoids overfitting and captures underlying molecular signatures that drive the bifurcation of clinical outcomes beyond simple correlates of outcome (Figure 1B).
Figure 1
Uncovering biomarkers and molecular signatures of LT outcomes
(A) Conceptual overview of gene- and network-centric approaches to identify biomarkers and molecular signatures, respectively, of pediatric LT outcomes (generated using BioRender).
(B) Schematic summarizing keys steps of the network-based approach: construction of expression modules followed by LASSO-based feature selection to identify network modules driving LT outcomes (generated using BioRender).
Uncovering biomarkers and molecular signatures of LT outcomes(A) Conceptual overview of gene- and network-centric approaches to identify biomarkers and molecular signatures, respectively, of pediatric LT outcomes (generated using BioRender).(B) Schematic summarizing keys steps of the network-based approach: construction of expression modules followed by LASSO-based feature selection to identify network modules driving LT outcomes (generated using BioRender).Here we apply both approaches, gene-centric and network-centric, to identify signatures of LT rejection. Three separate signatures are developed; the first uses single pre-Tx samples from 75 children with LT, the second uses early post-Tx samples from 55 children with LT, and the third uses late post-Tx samples from 55 children with LT (STAR Methods). Children experiencing rejection within 90 days after LT were termed early rejectors (Rs). Early non-rejectors (NRs) did not experience rejection during this period. For the late post-Tx period, samples were collected between years 2 and 5. Outcomes were recorded for 90 days after sampling. All rejection events were identified based on for-cause biopsies, so all R subjects are BPAR (biopsy-proven ACR). There were no significant differences in distribution of gender or time to rejection between Rs and NRs (STAR Methods).
Gene-based signatures of LT outcomes
Using a cohort of 75 pre-Tx individuals, we first sought to develop a standard gene-based signature to discriminate between Rs and NRs. Raw RNA sequencing (RNA-seq) data were processed using a standard analytic pipeline comprising quality control, alignment, and quantification (STAR Methods). Using a linear model adjusting for gender, age, and drug use, we identified significantly DEGs (p < 0.05) between R and NR individuals. We converged on a 125-gene signature, of which 55 had a univariate area under the ROC curve (AUC) of 0.6 or greater (Figure 2A). In a cross-platform validation of these 55 genes on the Fluidigm platform, only 12 achieved significant differential expression in Rs compared with NRs with an AUC of 0.65 or greater (Figure 2B). This low technical replication rate on a different platform is expected and may result from low effect sizes, large between-sample variation in expression, and considering these genes independent.
Figure 2
Gene-based biomarkers of LT outcomes identified using pre- and post-LT transcriptomics datasets
(A) Heatmap illustrating the 55 DEGs with AUC greater than 0.6 identified using pre-LT transcriptomics data.
(B) Heatmap with a subset of the DEGs in (A) that pass Fluidigm validation.
(C) Heatmap illustrating the 50 DEGs with AUC greater than 0.6 identified using early post-LT transcriptomics data.
(D) Heatmap with a subset of the DEGs in (C) that pass Fluidigm validation.
(E) Heatmap illustrating the 36 DEGs with AUC greater than 0.6 identified using late post-LT transcriptomics data.
(F) Heatmap with a subset of the DEGs in (E) that pass Fluidigm validation.
Gene-based biomarkers of LT outcomes identified using pre- and post-LT transcriptomics datasets(A) Heatmap illustrating the 55 DEGs with AUC greater than 0.6 identified using pre-LT transcriptomics data.(B) Heatmap with a subset of the DEGs in (A) that pass Fluidigm validation.(C) Heatmap illustrating the 50 DEGs with AUC greater than 0.6 identified using early post-LT transcriptomics data.(D) Heatmap with a subset of the DEGs in (C) that pass Fluidigm validation.(E) Heatmap illustrating the 36 DEGs with AUC greater than 0.6 identified using late post-LT transcriptomics data.(F) Heatmap with a subset of the DEGs in (E) that pass Fluidigm validation.We used a similar approach on a cohort of 55 early post-Tx individuals. A linear model adjusted for gender, age, and drug use (STAR Methods) identified significant DEGs (p < 0.05) between R and NR individuals. We converged on a 65-gene signature, of which 50 had a univariate AUC of 0.6 or greater (Figure 2C). In a cross-platform validation of these 50 genes on the Fluidigm platform, only 6 achieved significant differential expression in Rs compared with NRs with an AUC of 0.65 or greater (Figure 2D), likely because of the abovementioned factors.We also used a similar approach on a cohort of 55 late post-Tx individuals. A linear model adjusted for gender, age, and drug use (STAR Methods) identified significant DEGs (p < 0.05) between R and NR individuals. We honed in on a 55-gene signature, of which 36 had a univariate AUC of 0.6 or greater (Figure 2E). In a cross-platform validation of these 36 genes on the Fluidigm platform, only 9 achieved significant differential expression in Rs compared with NRs with an AUC of 0.65 or greater (Figure 2F), likely for similar reasons.
A network-based signature of pre-LT outcomes
The expression dataset is high dimensional; the number of transcripts far exceeds the number of samples. So, a traditional multivariate model that does not account for this is likely to suffer from overfitting. We accounted for this in two ways. First, using a network-based approach, we used modules as features instead of genes; this takes into account the topology of the underlying network. Second, we used an L1 regularization-based machine learning approach (a statistical approach that helps avoid overfitting) on the module-centric features.The high-quality human reference protein interactome network was clustered into modules using a well-established topological clustering approach, ClusterOne. First, we overlaid expression data from the 75 pre-Tx individuals on these subnetworks to generate expression module-based features. Using least absolute shrinkage and selection operator (LASSO)-based feature selection, we then identified a minimal subset of modules that met an effect size threshold and were discriminative of R versus NR outcome in a cross-validation framework. This approach achieves dimensionality reduction biologically by clustering genes into modules and statistically using L1 regularization (LASSO).To evaluate the performance of our approach, we used rigorous k-fold cross-validation and permutation testing as described previously., The network module signature we identified had a median AUC of ∼0.7 across 100 replicates of 10-fold cross-validation and was significantly better at predicting rejection than a model built on permuted data (negative control) in a matched 10-fold cross-validation framework (Figure 3E; STAR Methods). Interestingly, a model built on the pre-Tx signature comprising 65 DEGs was not significantly better than the negative control across 100 replicates of 10-fold cross-validation (Figure 3A; STAR Methods). A model built with 12 DEGs down-selected by cross-platform (Fluidigm) validation was also not significantly better than a model built with permuted data (Figure 3B; STAR Methods). As an additional control, we also built models using pathway signatures. Specifically, we identified Hallmark pathways in which the DEGs were over-represented and constructed corresponding pathway-centric features using a strategy analogous to the module-based feature construction strategy. Interestingly, this model was not significantly better than the negative control across 100 replicates of 10-fold cross-validation (Figure 3C). A similar result was observed with Kyoto Encyclopedia of Genes and Genomes (KEGG)-based pathway signatures (Figure 3D). The network approach is likely more successful than the pathway approach because it allows identification of specific subnetworks from the entire reference human network in a data-dependent fashion, allowing discovery of non-intuitive associations. On the other hand, the pathway approach captures only known gene sets and is often overbroad. Thus, the network approach provides a way to leverage the combinatorial effect sizes of genes in functionally related processes or modules to come up with a set of modules as biomarkers of outcome. The combined use of biologically inspired dimensionality reduction (genes > modules) coupled to statistically driven dimensionality reduction (use of L1 regularization) in the network approach allows simultaneous prevention of overfitting and identification of interpretable signatures of pediatric LT outcomes.
Figure 3
Network-based molecular signatures of outcomes identified by combining pre-LT transcriptomics data with the reference human protein interactome network
(A) Evaluation of the predictive power of the gene signature from Figure 2A, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(B) Evaluation of the predictive power of the gene signature from Figure 2B, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining pre-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.
(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.
(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.
Network-based molecular signatures of outcomes identified by combining pre-LT transcriptomics data with the reference human protein interactome network(A) Evaluation of the predictive power of the gene signature from Figure 2A, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(B) Evaluation of the predictive power of the gene signature from Figure 2B, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining pre-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.To test how sensitive the approach is to the specific summarization approach used to capture combinatorial effects, we used three different strategies to capture the additive effects of genes within modules (STAR Methods). Irrespective of the summarization method, the network approach remained significantly predictive (Figure S1). Thus, the concept of network-based integration, rather than a specific implementation strategy, holds the key for building a molecular signature that can distinguish between R and NR individuals.In addition to being predictive, the identified network module biomarkers (Figures 3F and 3H) also encapsulate details of molecular mechanisms not easily captured by signatures incorporating individual DEGs. Unlike the pre-LT gene-based signature, the network signature (Figures 3F and 3H) identified several known as well as putative novel molecular mechanisms underlying predisposition to rejection. Several type I interferon (IFN) and complement genes as well as those in mTOR modules were upregulated in Rs, suggesting primed innate immune signaling in these individuals. In contrast, modules involved in basic cellular processes, including DNA replication, cell proliferation, and adhesion, had lower expression in NRs compared with Rs, suggesting a dysregulation of replication and cellular maintenance mechanisms in those likely to experience rejection. Our results suggest that higher inflammatory signaling coupled to a disruption in basic cellular processes underlies predisposition to rejection in pediatric LT recipients.Finally, we visualized the down-selected network modules (Figure 3F) using partial least-squares discriminant analyses (PLS-DA). We observed that the overall set of down-selected modules has different profiles in R and NR subjects (Figure 3G). The earlier cross-validation-based analyses demonstrate the predictive power of the identified network modules with data held out. This analysis helps us visualize a signature from the whole dataset that can be utilized in downstream applications and could be extrapolated to other cohorts. Interestingly, there was lower variance across the profiles for R subjects than for NR subjects, suggesting that there are many ways to stay rejection free, and a series of specific dysregulation events that pre-dispose LT recipients to experience rejection (Figure 3G). Our network approach identifies robust signatures that are predictive and provide molecular insights into the mechanisms underlying rejection before Tx.
Identifying a network signature of early post-LT outcomes
Next we utilized the same network approach on the expression data from 55 early post-LT individuals. Here, too, the network module signature we identified had a median AUC of ∼0.65 across 100 replicates of 10-fold cross-validation and was significantly better at predicting rejection than a model built on permuted data (negative control) in a matched 10-fold cross-validation framework (Figure 4E; STAR Methods). A model built on the early post-Tx gene signature was somewhat more accurate than the negative control across 100 replicates of 10-fold cross-validation but did not achieve significance (Figure 4A; STAR Methods). A gene-based model was also not significantly better than a model built with permuted data in a cross-validation framework (Figure 4B; STAR Methods). Here, too, as an additional control, we built models using pathway signatures. Specifically, we identified Hallmark pathways in which the DEGs were over-represented and constructed corresponding pathway-centric features using a strategy analogous to the module-based feature construction strategy. Interestingly, this model was not significantly better than the negative control across 100 replicates of 10-fold cross-validation (Figure 4C). A similar result was observed with KEGG-based pathway signatures (Figure 4D).
Figure 4
Network-based molecular signatures of outcomes identified by combining early post-LT transcriptomics data with the reference human protein interactome network
(A) Evaluation of the predictive power of the gene signature from Figure 2C, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(B) Evaluation of the predictive power of the gene signature from Figure 2D, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining post-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.
(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.
(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.
Network-based molecular signatures of outcomes identified by combining early post-LT transcriptomics data with the reference human protein interactome network(A) Evaluation of the predictive power of the gene signature from Figure 2C, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(B) Evaluation of the predictive power of the gene signature from Figure 2D, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining post-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.As with the pre-Tx network-based signature, the early post-Tx signature is also predictive (Figures 4F and 4H), and captures mechanistic details. As with the pre-LT signature, irrespective of the summarization method, the network approach remained significantly predictive (Figure S2). However, the predictive power is slightly lower at the post-Tx time point compared with the pre-Tx time point. This is likely due to the immunosuppressive effect of drugs. However, this reflects an overall immunosuppressive effect and not any specific drug-specific effect. As further confirmation of this, the transcriptomics signatures could not stratify those that did/did not receive thymoglobulin (Figure S3). Thus, the dampening of the predictive effect reflects successful immunosuppression rather than a drug-specific bias.The early post-Tx signature (Figures 4F and 4H) also provides insights into the “temporal shift” in host defenses. Although specific components of innate immune signaling pre-dispose recipients to rejection before LT, a combination of innate and adaptive mechanisms underlies rejection after LT. The early post-LT signature comprises a DNA damage repair module induced by activation-induced cytidine deaminase (AID). A key component of the AID module, IGLL5, is overexpressed in NRs. IGLL5 negatively regulates AID, which likely impairs B cell maturation into antibody-producing cells. This may explain the absence of rejection, which is mediated by cellular and humoral mechanisms. DNA damage indicated by the second module is a component of B cell maturation and rejection-associated cell death and injury. The module with histone deacetylase genes mediates epigenetic and transcriptional regulation.Here, too, we visualized the down-selected modules using PLS-DA and observed that R and NR subjects have different profiles (Figure 4G). Again, there was lower variance across the profiles for R subjects than for NR subjects (Figure 4G), providing support to our hypothesis that there are many ways to remain rejection free and a series of specific molecular events that contribute to rejection.
Paired and unpaired pre- and early post-LT samples
Within the 75 pre- and 55 early post-LT subjects, we have 32 pairs; i.e., 32 pre-Tx matched to 32 early post-Tx. We carried out analyses only using this subset of paired samples to uncover how signatures evolve longitudinally from pre-to early post-Tx. We constructed features using differentials between the early post-LT and pre-LT time points. The corresponding network module signature we identified was significantly more predictive than a model built on permuted data across 100 replicates of 10-fold cross-validation (Figure S4A). The identified network modules (Figure S4B) include signatures identified earlier, such as DNA repair, as well as novel functions, such as MYC and TCF signaling. These results suggest that, although specific modules at the pre- and early post-LT time points as well as module-based features capturing differentials are independently predictive of LT outcome, the functional signatures are unique, reflecting an evolution of the molecular basis of risk of rejection across time.Next, to systematically evaluate whether there are differences in predictive performance between paired and unpaired samples, we trained a model using only the unpaired samples and then tested on the paired samples. For the pre-LT samples, this cross-prediction had an AUC of ∼0.8, and for early post-LT samples, the cross-prediction AUC was ∼0.6 (Figures S4C and S4D). These analyses demonstrate that there is no bias in prediction performance across paired and unpaired samples and trajectories from before LT to after LT are also independently predictive (in addition to the before and after time points) of LT outcomes.
Identifying a network signature of late post-LT outcomes
Finally, we utilized the same network approach on the expression data from 55 late post-LT individuals. Here, too, the network module signature we identified had a median AUC of ∼0.65 across 100 replicates of 10-fold cross-validation and was significantly better at predicting rejection than a model built on permuted data (negative control) in a matched 10-fold cross-validation framework (Figure 5E; STAR Methods). A model built on the late post-Tx gene signature was somewhat more accurate than the negative control across 100 replicates of 10-fold cross-validation but did not achieve significance (Figure 5A; STAR Methods). A model built with genes down-selected by cross-platform validation was also not significantly better than a model built with permuted data (Figure 5B; STAR Methods). As for pre- and early post-Tx, models built using Hallmark and KEGG pathway signatures were not significantly better than the negative control across 100 replicates of 10-fold cross-validation (Figures 5C and 5D).
Figure 5
Network-based molecular signatures of outcomes identified by combining late post-LT transcriptomics data with the reference human protein interactome network
(A) Evaluation of the predictive power of the gene signature from Figure 2E, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(B) Evaluation of the predictive power of the gene signature from Figure 2F, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining late post-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).
(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.
(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.
(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.
Network-based molecular signatures of outcomes identified by combining late post-LT transcriptomics data with the reference human protein interactome network(A) Evaluation of the predictive power of the gene signature from Figure 2E, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(B) Evaluation of the predictive power of the gene signature from Figure 2F, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(C) Evaluation of the predictive power of a signature consisting of Hallmark pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(D) Evaluation of the predictive power of a signature consisting of KEGG pathways in which the DEGs are over-represented, evaluated in a cross-validation framework with permutation testing. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(E) Evaluation of the predictive power of the network-based molecular signatures in a cross-validation framework with permutation testing. The network-based signatures are identified by LASSO-based feature selection on expression modules constructed by combining late post-LT transcriptomics data with the reference human protein interactome network. “Actual” denotes the performance of the model, built on real data, across replicates of k-fold cross-validation. “Permuted” denotes performance of the model on shuffled data in a matched cross-validation framework (negative control).(F) Visualization of the network module signatures from the model in (E). Red denotes higher median expression in R subjects, and blue denotes median higher expression in NR subjects.(G) PLS-DA using only the network module signatures from the model in (E) to discriminate between R and NR subjects.(H) Heatmap visualizing the variation of the module-specific features across R and NR subjects.As with the early post-Tx network-based signature, the late post-Tx signature is also predictive (Figures 5F and 5H) and captures mechanistic details. Here, too, we visualized the down-selected modules using PLS-DA and observed that R and NR subjects have different profiles (Figure 5G). Again, there was lower variance across the profiles for R subjects than for NR subjects (Figure 5G), providing support to our hypothesis that there are many ways to remain rejection free and a series of specific molecular events that contribute to rejection. Our results with the pre-LT, early post-LT, and late post-LT data provide key insights into why network module signatures may have advantages over gene-centric signatures. Our two-step approach achieves inherent dimensionality reduction from genes to modules via incorporation of the topological structure of the underlying network and objective feature selection on this dimensionality-reduced feature set using the LASSO.
Druggability of the identified network modules
The set of network modules also enables rigorous druggability analyses taking into account the underlying molecular mechanisms of rejection. Such module-centric analyses have been carried out previously from tissue-based biomarkers. For example, a common rejection module consisting of 11 overexpressed genes was identified in meta-analyses of eight studies conducted on kidney, lung, heart, and LT biopsies. Anchored by STAT1 and NFKB1, this module was associated with ACR in all organs. Genes in this module included known targets of bortezomib, a proteasomal inhibitor used for antibody-mediated rejection (PSMB9), the anti-inflammatory drug sunlindac (CXCL9), the anti-rejection drug mycophenolate mofetil (INPP5D), the antileukemia receptor tyrosine kinase inhibitor dasatinib (LCL), the antibiotic doxycycline (BASP1), and the commonly used atorvastatin (CXCL10). Drugs identified by such network module analyses have been shown to be effective; in a mouse model of allogeneic heart Tx, atorvastatin reduced graft infiltration by T cells and lowered rejection rates. In a study of 2,515 kidney Tx recipients, 1,566 of whom received a statin, statin use was significantly associated with reduced graft loss within 180 days after Tx. Such analyses to look for drug targets have not been conducted previously on blood-based gene expression signatures of rejection. We extrapolated this conceptual framework to the modules identified by our network approach.Using DrugBank, we identified several pre-LT modules that can be targeted by currently approved drugs, such as monoclonal antibodies, mTOR/KICSTOR inhibitors, and calcineurin inhibitors (Figure 6), a few of which are already part of current immunosuppression strategies. The IFN type 1 and complement pathways suggest primed innate immune signaling among Rs that are druggable by known complement inhibitors or drugs that suppress IFN receptor signaling. Tacrolimus and cyclosporine target calcineurin signaling (Figure 6). Rapamycin and everolimus could be used to target mTOR signaling (Figure 6). The complement signaling modules can be targeted by monoclonal antibodies like eculizimab, which is used in refractory antibody-mediated rejection, or other anti-inflammatory molecules (Figure 6). Alfacon 1 targets IFN alpha and beta receptors 1 and 2, which mediate type 1 IFN signaling, and the recently approved anifrolumab is a type I IFNR blocker, making the corresponding module also druggable (Figure 6). Lcn2 can be targeted by carboxymycobactin and associated esters (Figure 6). In addition to these direct drug-gene associations from DrugBank, purine antimetabolites, azathioprine, or mycophenolate mofetil can target other discriminatory modules in the pre-Tx signature, including pathways involved in cell proliferation, such as cell cycle signaling, cell proliferation, metabolic reprogramming, cell adhesion/proliferation, and mitochondrial synthase signaling. Overall, our approach can enable selection of a combination of immunosuppressants most suited for that recipient.
Figure 6
Druggable network modules before LT
Network module signatures of LT outcomes, identified using pre-LT transcriptomics data, that are the targets of known drugs. Red rhombuses denote genes, in that module, directly targeted by the corresponding drug.
Druggable network modules before LTNetwork module signatures of LT outcomes, identified using pre-LT transcriptomics data, that are the targets of known drugs. Red rhombuses denote genes, in that module, directly targeted by the corresponding drug.In the post-LT signature, the module enriched for epigenetic and transcriptional regulation can be targeted by a range of histone deacetylase modulators, such as valproic acid (Figure S5). The oxidative DNA demethylase ALKBH3 can be targeted by molecules such as ascorbic acid (Figure S5). In addition to these direct drug-gene associations from DrugBank, AID-induced DNA damage repair is essential for maturation of B cells into antibody-producing cells. B cell-associated mechanisms, including humoral mechanisms, can be targeted by several drugs. These druggability analyses demonstrate the value of the network-based discriminatory signatures identified in this study.Beyond identifying overall group-specific profiles, these analyses also provide insights into specific dysregulated molecular profiles on a per-individual basis. For example, at the pre-Tx time point, within the identified predictive module biomarkers, subject 56 only had higher module-specific scores for complement, IFN, and mTOR/KICSTOR signaling at the pre-Tx time point. This suggests that addition of a third agent, such as rapamycin, may have prevented rejection for this subject. Subject 60 only had higher module-specific scores for complement and IFN signaling as well as cell cycle/DNA replication. This suggests that titrating the dosage of an existing immunomodulatory drug (e.g., CellCept) may reduce the risk of rejection for this subject. On the other hand, subject 72 had higher scores across many different modules, suggesting that the subject may have been prone to rejection irrespective of the choice of small-molecule immunosuppressive regimen. Such a signature may indicate a need to deplete all lymphocytes with a polyclonal agent like rabbit anti-human thymocyte globulin. Thus, our analyses identify clinically actionable intervention strategies on a per-individual basis at the pre-Tx time point. Implementation of these strategies can dramatically improve current clinical practice because ACR (especially in a pediatric context), unlike antibody-mediated rejection, cannot be screened for using human leukocyte antigen (HLA) cross-matches or other related markers. Importantly, these suggested intervention strategies are based on immunosuppressive drugs currently in use; hence, they can be implemented readily.
Discussion
Avoiding rejection with predictive diagnostics can have long-term benefits for graft health and quality of life. Early acute rejection predicts infection, graft loss, and death in adults and late T cell-mediated rejection. In this study, we demonstrate the advantages of using a network-based signature to distinguish between pediatric LT recipients who are predisposed to rejection or rejection-free outcomes. Our approach captures the additive effect of genes within a module, which is greater than that of individual genes. This overcomes the small effects of individual genes in the blood transcriptome, which limit replication of blood-based gene signatures. Genes within a module can be differentially expressed to different extents, and individual genes do not need to pass univariate significance thresholds. Rather, the significance of the multivariate signature is evaluated in a rigorous cross-validation framework with permutation testing. Finally, the predictive modules identified in the network-based analyses capture disease phenotypes that are inherently tied to molecular phenotypes. Consistent with these advantages, the network-based signatures, but not the gene-centric signatures for the pre-LT and post-LT time points, are significantly predictive of LT outcome. Interestingly, for the network-based signatures, although we tried a range of feature construction strategies, different strategies performed almost equivalently. This is likely due to each having specific advantages. This suggests that a higher sample size (i.e., better power), rather than alternate feature construction strategies, is important to come up with a more discriminative signature. These signatures also provide insights into the molecular mechanisms underlying rejection. These advantages can be leveraged in an actual clinical setting to predict rejection severity and identify drugs that can be repurposed to aid prevention and treatment of rejection.In our study, network-based signatures also capture a mechanistic shift in mechanisms from the pre- to the post-LT periods. Modules seen in the pre-LT signature are not seen in the post-LT one, suggesting that currently used anti-rejection drugs may have suppressed these mechanisms. Modules in the pre-LT signature captured altered innate immune signaling and disrupted cellular processes as underlying predisposition to rejection. However, in the post-LT signature, the modules included differentially engaged humoral mechanisms, a hallmark of adaptive immune signaling. They also included rewired epigenetic and transcriptional regulation, likely brought about by Tx of an organ from another individual. Together, these modules support a shift to adaptive immune signaling after LT. This shift is not apparent in gene signatures that characterize pre- and post-LT time points. Thus, the network signatures are more predictive than the gene signatures and encapsulate functional processes underlying rejection.The modules also capture, at a per-individual level, dominant druggable mechanisms. Some of these mechanisms, particularly at the pre-LT time point, are targets of current immunomodulatory drugs, including addition of a third agent. Thus, our analyses identify clinically actionable intervention strategies on a per-individual basis. Implementation of these strategies can dramatically improve current clinical practice because ACR (especially in a pediatric context), unlike antibody-mediated rejection, cannot be screened for using HLA cross-matches or other related markers. Importantly, all of our suggested intervention strategies are based on currently used immunosuppressive drugs and can be implemented directly.The network-based approach may also support a change in maintenance regimens to better address mechanistic shifts in rejection-prone individuals, from a primed innate immune system before LT to one enriched for adaptive immune signaling after LT. There is additional evolution of the adaptive immune signature in late Rs to one that is accompanied by donor-specific antibodies. Overall, our novel framework provides a proof of concept for an approach that can simultaneously identify blood-based module biomarkers of pediatric LT outcomes at different time points and home in on corresponding druggable mechanisms. A number of these can be relatively easily implemented by modulating existing immunosuppression strategies.
Limitations of the study
An important limitation of our cross-sectional study is the variety of mostly congenital diseases that cause liver failure and require LT. These diseases could have variable effects on the peripheral blood transcriptome. Because these diseases are rare, it would be difficult to assemble a large enough cohort of a single liver disease to develop a rejection predictor. A follow-up study is being planned to evaluate pre- and post-LT signatures in paired samples from pediatric LT recipients. Another limitation of our current study is identification of the dominant druggable mechanism well after the rejection event has occurred. Future studies may want to apply the pre-LT network modules in a prospective fashion. The model predictions can be used to target immunosuppression strategies in a personalized fashion.
STAR★Methods
Key resources table
Resource availability
Lead contact
Requests for data and code used for the study should be directed to and will be fulfilled by the Lead Contact Jishnu Das (jishnu@pitt.edu).
Materials availability
This study did not generate new unique reagents.
Experimental model and subject details
Cohort details
Blood samples were obtained from 98 children with LT at the UPMC-Children’s Hospital of Pittsburgh (CHP) after informed consent approved by the University of Pittsburgh Institutional Review Board (NCT #01163578). The mean age of LT recipients was 6.8 years (with a SEM of 0.7 years) and recipients were approximately equally (46% male, 54% female) distributed by biological sex. 56 of 98 children were rejection-free during the 90-day period of transplantation and were termed as non-rejectors (NR). 42 of 98 children experienced acute cellular rejection within the 90-day period of transplantation and were termed as rejectors (R). From this cohort, 75 blood samples were obtained before LT (pre-LT), of which 45 corresponded to NR and 30 to R. 55 blood samples were obtained during the first 90 days after LT (post-LT) of which 32 corresponded to NR and 23 to R. For the 23 rejectors in the post-LT cohort, sample collection occurred on average 10 days (with a SEM of 2.2 days) before the rejection event. We also obtained 55 samples at a late-post-Tx timepoint (2–5 years). These samples consist of recipients who experienced late rejection within the 90-day period after sampling and non-rejectors that did not experience rejection at the late timepoint. Several patients contribute samples to both the early-post-LT and late-post-LT timepoints; however, not all the early-post-LT and late-post-LT samples are paired. Further among the 55 post-late-LT subjects (24 R, 31 NR), 17/24 R subjects had a history of prior rejection. However, 0/31 NR subjects had a history of prior rejection. Details of the primary indication for LT are included in Table S1.
Method details
Sample collection and RNA-seq
Blood samples were collected in PAxgene tubes (BD Biosciences, SanJose, CA), before LT (pre-LT, n = 75), within 90 days after LT (early post Tx, n = 55), and within 2–5 years LT (late post Tx, n = 55). RNA was extracted using PAXgene Blood RNA Kit (Qiagen, Valencia, CA). RNA libraries generated using TruSeq Stranded Total RNA with Ribo-Zero Globin (Illumina, San Diego, CA) according to the manufacturer’s protocol. The libraries were validated using Fragment Analyzer (Agilent, Santa Clara, CA) and sequenced using Nextseq 500 platform (Illumina, San Diego, CA) to generate 75 bp paired end reads, up to 40 million reads per sample.
RNA-seq analyses
The generated reads were pre-processed using FastQC (v0.10.1) and aligned to the human reference gtf annotation file (GRCh37.68) using STAR. Transcript abundances were quantified and normalized using RSEM. The DESeq2 negative binomial distribution was used to identify DE genes using an effect size i.e., fold change threshold of 1.2 and a significance i.e., FDR (P value adjusted for multiple comparisons using Benjamini-Hochberg multiple testing correction) threshold of ≤0.05. For the pre-Tx (30R vs 45NR), early-post-Tx (23R vs 32NR), and late-post-Tx (24R vs 31NR) samples, sets of 125, 65, and 36 DE genes were identified respectively.
Validation of DE genes by Taqman qRT-PCR using Fluidigm platform
Of the 125, 65, and 55 DEG genes identified above, a subset of target genes, 55, 50, and 36 respectively, were down-selected based on an AUC cutoff of 0.6. TaqMan gene expression assays (Thermo Fisher, Grand Island, NY) for these 55, 50, and 36 target genes (for the pre-LT and post-early-LT, post late-LT cohorts respectively) and 11 housekeeping genes were performed at the Q2 solutions laboratory (Morrisville, NC) using Fluidigm Biomark instrument (South San Francisco, CA). 12 of 55 DEGs, and 6 of 50 DEGs, and 10 of 36 DEGs in the pre-LT cohort, post early-post-LT, and late-post-LT cohorts were validated by TaqMan qRT-PCR on the Fluidigm platform where cross-platform reproducibility was defined based on stringent criteria – AUC > 0.65 and consistent fold change directionality when compared to RNA seq.
Quantification and statistical analyses
DEG-based machine learning models
We analyzed the pre-LT, early-post-LT, and late-post-LT cohorts separately. Within each cohort, we also generated separate models to discriminate between R and NR, using the DE genes before and after Fluidigm validation. In each case, the features were the normalized expression values. The classification models comprised two steps – feature selection using LASSO, followed by classification using the down-selected features using a support vector machine (SVM) model.
Network-based machine learning models
We clustered a reference high-quality human interactome into overlapping modules using ClusterOne. Generating overlapping modules incorporates pleiotropy because proteins can be involved in multiple functions/modules. For each module, using available transcript abundances (retaining modules where at least 3 proteins had corresponding transcript abundance values available), we constructed gene- and module-level summary statistics.First, for each patient, we computed gene-specific scores:where is the expression values of the gene i. are the mean expression levels of that gene within R and NR subjects, respectively. We then summarized the gene-specific scores into a module-specific score for each patient based on the following summarization strategy:where |M| is the number of genes in that specific module. It is important to note that for all model performance evaluations using cross-validation, the relevant means were calculated only using the fold-specific training data. The use of ln (1+x) ensures that all module-specific scores are positive. The feature construction encapsulates gene-level deviations from average R and NR profiles as well as the additive effects of genes within a module. However, to ensure that the results are robust to the choice of a specific heuristic, we also explored 2 alternate summarization strategies.In a first variation of the proposed method, we calculated the module specific score asThis was then multiplied by the “dominant” direction of genes within the module calculated based on the majority vote of the sign of . This. variation takes into account potentially different directions of genes within the same module. The second variation uses, a construction of z-score like summary statistics as described earlier. Here, there are 2 features per module – “z-score based summary statistic quantifying deviation from a typical R profile” and “z-score based summary statistic quantifying deviation from a typical NR profile”.These module-specific scores served as input features for the corresponding machine learning models. We built a LASSO/SVM model, as previously described,41, 42, 43 on the module-specific scores. The use of LASSO (L1 regularization) is especially important here as it helps prevent over-fitting on high dimensional data. The performance of all models were evaluated in a rigorous 10-fold cross validation framework, and the significance of the models was quantified using permutation testing. The overall framework is analogous to what has been previously described.41, 42, 43 Briefly, the dataset was split into 10 subsets – 9 subsets are used for training while the 10th subset is used for testing. Each subset served as the test set once, therefore each individual was in the test fold exactly once for each cross-validation run. For each test fold, LASSO-based feature selection was performed using the nine training folds. The coefficient for the LASSO penalty term (i.e., lambda for regularization) was determined via a second internal cross-validation using only the fold-specific training dataset.A fold-specific SVM model was built using the LASSO-selected modules that passed an effect size threshold of 10% (calculated using only the training data for that fold). This fold-specific classifier was subsequently used to predict the labels for the individuals in the test set for that fold. This process was repeated for each of the ten folds to generate a set of predicted clinical outcomes for each individual. This was then compared to the true set of clinical outcome labels to calculate a classification accuracy for that cross-validation replicate. We performed 100 independent ten-fold cross-validation replicates, to account for different ways in which the training and test folds can be split. This is a stringent and appropriate way of performing cross-validation, as both steps involved in the model (feature selection and subsequent classification using the selected features) are performed in a cross-validation setting with data held out. The significance of model performance was evaluated using permutation testing, by randomly shuffling the data with respect to the arm labels, within the cross-validation framework described above (i.e., a cross-validation framework matched to the actual model). We defined a model to be significant if the performance on the real data was significantly better than the performance on the permuted data at a P < 0.01 threshold, and there was an AUC difference of at least 0.15 between the median performances of the actual and permuted models across 100 replicates of 10-fold cross-validation. The use of both a statistical significance and an effect size threshold is the most stringent way to evaluate the performance of these models in a cross-validation framework.To visualize the modules selected by the LASSO model, we used PLS-DA. We carried out separate PLS-DA analyses using down-selected modules for the pre-LT and post-LT signatures.
Pathway-based machine learning models
For the pathway-based analyses, we identified which KEGG or Hallmark pathways the DEGs were enriched in using representation analysis (ORA). We then constructed pathway-centric features analogous to how module-centric features were constructed. Predictive performance of these pathway-based models was computed in a k-fold CV framework with permutation testing as described above.
Druggability of the identified network modules
We examined if there are known FDA-approved drugs or experimental molecules from DrugBank that are known to target one or more components of the network module signatures that discriminate between R and NR subjects. Specifically, we examined if one or more genes in the discriminatory modules are known to be targets of approved or experimental drug molecules cataloged in DrugBank.
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Biological samples
Whole blood from pediatric liver transplant recipients
University of Pittsburgh
N/A
Critical commercial assays
TaqMan gene expression assays
TaqMan gene expression assays
N/A
PAxgene tubes
BD Biosciences
N/A
PAXgene Blood RNA Kit
Qiagen
N/A
TruSeq Stranded Total RNA with Ribo-Zero Globin
Illumina
N/A
Deposited data
Bulk RNA-seq data
This study
GEO accession number GSE200340, https://github.com/jishnu-lab/LT_Mdx
Authors: Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras Journal: Bioinformatics Date: 2012-10-25 Impact factor: 6.937
Authors: Katelynn Madill-Thomsen; Marwan Abouljoud; Chandra Bhati; Michał Ciszek; Magdalena Durlik; Sandy Feng; Bartosz Foroncewicz; Iman Francis; Michał Grąt; Krzysztof Jurczyk; Goran Klintmalm; Maciej Krasnodębski; Geoff McCaughan; Rosa Miquel; Aldo Montano-Loza; Dilip Moonka; Krzysztof Mucha; Marek Myślak; Leszek Pączek; Agnieszka Perkowska-Ptasińska; Grzegorz Piecha; Trevor Reichman; Alberto Sanchez-Fueyo; Olga Tronina; Marta Wawrzynowicz-Syczewska; Andrzej Więcek; Krzysztof Zieniewicz; Philip F Halloran Journal: Am J Transplant Date: 2020-04-09 Impact factor: 8.086
Authors: Ran Tao; Edwin F de Zoeten; Engin Ozkaynak; Liqing Wang; Bin Li; Mark I Greene; Andrew D Wells; Wayne W Hancock Journal: Curr Opin Immunol Date: 2007-08-24 Impact factor: 7.486
Authors: Jorge R Ferraris; Pablo Duca; Norma Prigoshin; Monica L Tambutti; Gustavo Boldrini; Rita L Cardoni; Daniel D'Agostino Journal: Pediatr Transplant Date: 2004-10
Authors: Jishnu Das; Srivamshi Pittala; Margaret E Ackerman; Thomas Broge; Caitlyn Linde; Todd J Suscovich; Eric P Brown; Todd Bradley; Harini Natarajan; Shu Lin; Jessica K Sassic; Sean O'Keefe; Nickita Mehta; Derrick Goodman; Magdalena Sips; Joshua A Weiner; Georgia D Tomaras; Barton F Haynes; Douglas A Lauffenburger; Chris Bailey-Kellogg; Mario Roederer; Galit Alter Journal: Nat Med Date: 2018-09-03 Impact factor: 53.440
Authors: Purvesh Khatri; Silke Roedder; Naoyuki Kimura; Katrien De Vusser; Alexander A Morgan; Yongquan Gong; Michael P Fischbein; Robert C Robbins; Maarten Naesens; Atul J Butte; Minnie M Sarwal Journal: J Exp Med Date: 2013-10-14 Impact factor: 14.307
Authors: Lenette L Lu; Jishnu Das; Patricia S Grace; Sarah M Fortune; Blanca I Restrepo; Galit Alter Journal: J Infect Dis Date: 2020-11-13 Impact factor: 5.226