Francesca Vitali1, Qike Li1, A Grant Schissler2, Joanne Berghout1, Colleen Kenost1, Yves A Lussier1.
Abstract
The development of computational methods capable of analyzing -omics data at the individual level is critical for the success of precision medicine. Although unprecedented opportunities now exist to gather data on an individual's -omics profile ('personalome'), methods for interpreting and extracting meaningful information from single-subject -omics remain underdeveloped, particularly for quantitative non-sequence measurements, including complete transcriptome or proteome expression and metabolite abundance. Conventional bioinformatics approaches have largely been designed for making population-level inferences about 'average' disease processes; thus, they may not adequately capture and describe individual variability. Novel approaches intended to exploit a variety of -omics data are required for identifying individualized signals for meaningful interpretation. In this review, intended for biomedical researchers, computational biologists and bioinformaticians, we survey emerging computational and translational informatics methods capable of constructing a single subject's 'personalome' for predicting clinical outcomes or therapeutic responses, with an emphasis on methods that provide interpretable readouts. Key points: (i) the single-subject analytics of the transcriptome shows the greatest development to date, and (ii) the methods were all validated in simulations, cross-validations or independent retrospective data sets. This survey uncovers a growing field that offers numerous opportunities for the development of novel validation methods and opens the door for future studies focusing on the interpretation of comprehensive 'personalomes' through the integration of multiple -omics, providing valuable insights into individual patient outcomes and treatments.
Keywords: n-of-1; personalome; precision medicine; single-subject studies
Year: 2019 PMID: 29272327 PMCID: PMC6585155 DOI: 10.1093/bib/bbx149
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 13.994
Figure 1. Flow chart of methods designed for clinical interpretation of single-subject -omics. This review addresses a knowledge gap by comparing and contrasting single-subject methods designed to reduce the dimension of raw -omics data (left) and to provide a biomolecular interpretation of signals (gray rectangle). For DNA sequencing, variant and mutation calls as well as all functional annotations in single subjects (e.g. missense mutation) already bridge this gap. However, this intermediate step is often omitted for other molecules of life, such as mRNAs, miRNAs, proteins, methylated DNA regions and metabolites (carbohydrates and lipids). This review focuses on single-subject methods that analyze transcriptome data. The ‘Clinical applications’ section provides emerging evidence that the newly available, unbiased SSA of the transcriptome enable innovative types of studies to investigate their clinical utility by addressing the gap of biomolecular interpretation of raw -omics signals. Among possible studies, we demonstrate that -omics clinical prediction classifiers that operate directly at the -omics scale may be redesigned for the parsimonious transformed signal of single-subject studies for improved clinical utility.
Figure 2. SSA studies included in this review. (A) Each numbered point represents a publication plotted by year of publication and the relative number of citations (in log2 scale). Numbers correspond to the publication in this article’s reference list. Colors indicate the type of input required: one single-subject sample (1 ss SAMPLE, green), two paired single-subject samples (2 ss SAMPLES, purple) or multiple samples collected from the same subject (multiple ss SAMPLES, orange). Shapes represent the type of output provided by the selected studies: DEGs (circle) or DEPs (X). Finally, blue squares indicate methods based on the integration of transcriptome data with other -omics. (B) Number of citations over time, starting from the publication year, for the single-subject studies analyzing transcriptome data. Color and shape coding are the same as in (A).
Table of contents of the review
| Section |
|---|
| Transcriptome |
| Cross-subject transcriptome analyses |
| Single-subject transcriptome analyses |
| DEGs identification in single-subjects |
| DEPs identification in single-subjects |
| Longitudinal time series analyses of transcriptome |
| Single-subject transcriptome integrated with other -omics |
| Validation of single-subject methods |
| Clinical applications |
| Perspective and conclusion |
Figure 5. Summary of single-subject methods that analyze transcriptome data to identify DEPs.
Figure 3. Current strategies to analyze single-subject transcriptomes. Analyses of single-subject transcriptomes can usually be divided into three categories based on the required number of samples: (i) single-sample analyses, (ii) paired-sample analyses, or (iii) analyses of more than two samples (not shown). They can also be categorized according to their outputs: (i) Differentially Expressed Genes (DEGs), (ii) Differentially Expressed Pathways (DEPs), or (iii) Disease Scores (DSs). Note: DEP* = not a true DEP, but rather a relative expression level of the pathway, because there is no reference or baseline against which to compare the pathway expression of a single sample.
Figure 4. Summary of single-subject methods that analyze transcriptome data to identify DEGs. Note: Additional details are available in Table 2.
Additional details on single-subject transcriptome analyses of DEGs
| Publication | Name | Description |
|---|---|---|
| Wang | RankComp | RankComp requires two inputs: (i) a disease sample and (ii) a set of accumulated normal samples, which can be accrued during the same experiment or a priori from various external resources. RankComp begins by ranking genes within each sample (both the case and the normals) according to increasing expression values. Next, pairwise rank comparisons are performed to identify (a) stable gene pairs and (b) reversal gene pairs. Stable gene pairs are defined as those with the same ordering in 99% of the accumulated normal samples [expression_geneA > expression_geneB], while reversal gene pairs are identified by disruption of that ordering in the disease sample [expression_geneA < expression_geneB]. For each gene, Fisher’s exact test is conducted to test the null hypothesis that the numbers of reversal gene pairs supporting its upregulation and its downregulation are equal. This procedure enables extraction of a list of DEGs for a single subject, and interpretable results can be obtained through manual examination or by performing gene set enrichment analyses. |
| Liu | DNB | Computational approach based on DNB theory to detect pre-disease states |
| Wang | DEGseq | DEGseq identifies DEGs using RNA-Seq data collected from a single subject. When replicates are not available, the authors suggest an MA-plot-based method with a random sampling model, which assumes the expression counts follow a binomial distribution. Given the average of log2-transformed expression levels, it approximates the log2 expression fold change by a normal distribution, and then calculates a Z-score based on this distribution. |
| Tarazona | NOISeq | NOISeq is a data-adaptive and nonparametric approach, which has a variant, NOISeq-sim, that works without replicates. NOISeq-sim uses simulated replicates when real replicates do not exist. It simulates replicates under the assumption that gene expression counts follow a multinomial distribution in which the probability of each gene corresponds to the probability of a read mapping to that gene. The probability of each gene is estimated by the proportion of its read counts relative to the total number of mapped reads from the only sample under the corresponding experimental condition. With the simulated replicates, NOISeq-sim generates a joint null distribution of fold-changes (M) and absolute differences (D) of the expression counts from the replicates within the same condition. This joint null distribution is then used to assess differential expression using each gene's (M, D) pair computed between conditions. |
| Feng | GFOLD | This method assumes a Poisson distribution ( |
| Anders | DESeq | When neither condition (i.e. affected and control sample) has replicate transcriptomes, DESeq assumes the majority of the genes as non-DEGs and estimates a mean–variance relationship from treating the two samples as if they were replicates [ |
| Robinson | edgeR | edgeR assumes that RNA-Seq data follow a negative binomial distribution for which, given the mean, the variance is determined by a dispersion parameter. When working without replicates, edgeR assigns the same value of the dispersion parameter to all genes and conducts a negative binomial exact test to compute
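
The pair-based logic described for RankComp in the table above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy expression values and the dictionary layout are assumptions made for illustration, within-sample expression values are compared directly (which is equivalent to comparing within-sample ranks), and the per-gene Fisher's exact test is deliberately left out.

```python
from itertools import combinations


def stable_pairs(normals, threshold=0.99):
    """Return ordered pairs (hi, lo) with expr[hi] > expr[lo] in at least
    `threshold` of the normal samples (hypothetical helper)."""
    genes = sorted(normals[0])
    n = len(normals)
    pairs = []
    for a, b in combinations(genes, 2):
        a_higher = sum(1 for s in normals if s[a] > s[b])
        b_higher = sum(1 for s in normals if s[b] > s[a])
        if a_higher / n >= threshold:
            pairs.append((a, b))
        elif b_higher / n >= threshold:
            pairs.append((b, a))
    return pairs


def reversal_counts(stable, disease):
    """For each gene, count reversal pairs in the disease sample that are
    consistent with its upregulation or its downregulation."""
    counts = {}
    for hi, lo in stable:
        if disease[hi] < disease[lo]:
            # The ordering flipped: compatible with `hi` going down
            # or with `lo` going up.
            counts.setdefault(hi, {"up": 0, "down": 0})["down"] += 1
            counts.setdefault(lo, {"up": 0, "down": 0})["up"] += 1
    return counts


# Toy data (assumed values): A > B > C in every normal sample.
normals = [
    {"A": 9.0, "B": 6.0, "C": 2.0},
    {"A": 8.5, "B": 5.5, "C": 1.0},
    {"A": 9.5, "B": 6.5, "C": 3.0},
]
disease = {"A": 5.0, "B": 4.0, "C": 9.0}  # C now exceeds both A and B

stable = stable_pairs(normals)
counts = reversal_counts(stable, disease)
print(counts)
```

In a fuller version, the up and down counts of each gene would feed the significance test mentioned in the table; here they only show which genes attract reversals.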
Additional details on single-subject transcriptome analyses of DEPs
| Publication | Name | Description |
|---|---|---|
| Wang | IndividPath | IndividPath computes REOs from a pathway point of view reducing the dimension of the sample representation. Patient-specific DEPs of a sample are obtained by applying a similar procedure to RankComp [ |
| Drier | Pathifier | Pathifier has been developed to compute PDSs for cancer tumor samples by aggregating gene-level information into pathway-level information, providing meaningful dimension reduction. Pathifier analyzes one pathway at a time and assigns a PDS to each sample by using the expression levels of the genes belonging to the pathway. To calculate PDSs, a PCA is performed to reduce the dimensions and capture the variation of the data. Next, the method identifies the best principal curve using both cohort samples (normal and disease). Then, the PDS of a sample is obtained by computing the distance of a single sample from the median of the normal samples on the principal curve. The output of this approach is therefore a list of DEPs for each sample representing the level of deregulation of each pathway |
| Ahn | iPAS | iPAS provides gene-level statistics (i.e. Z-scores) by standardizing the gene expression level of the disease sample with the mean and the standard deviation of the normal samples. Z-scores are used as inputs to calculate iPAS for the disease sample, for example, using the average of the Z-scores in a pathway. iPAS is then computed for every normal sample to construct a null distribution, which is used to assess the significance of the deviation of the disease iPAS from the normal reference. |
| Yang | FAIME | FAIME transforms a vector of mRNA quantification into pathway-level metrics derived from a single biological sample. Each mRNA is annotated to a gene, and genes are annotated to gene sets via knowledge base integration. Every pathway receives a score that quantifies the ‘average’ over-expression of genes within the pathway, when compared with genes in the background (not in the pathway). This process provides mechanism-level interpretation to a single transcriptome. |
| Barbie | ssGSEA | ssGSEA uses the difference in empirical cumulative distribution functions of gene expression ranks inside and outside a gene set (i.e. pathway) to calculate an enrichment statistic per sample, akin to the FAIME methodology described above. The procedure adopted is similar to GSEA [ |
| Gardeux | N-of-1 pathways Wilcoxon | This method aggregates gene expression values from two paired samples into gene sets provided by external knowledge sources (e.g. GO, KEGG). Each externally defined gene set is assessed for differential expression using the nonparametric analog of a paired
| Schissler | N-of-1 pathways MD | N-of-1-pathways MD seeks to improve the differential expression testing component of the framework introduced by Gardeux |
| Schissler | ClusterT | Cluster-T is yet another improvement to the differential test procedure of N-of-1-pathways. It was shown that under nontrivial inter-gene correlation, the bootstrapping procedure of the MD failed to produce adequate estimates of the standard error of the average log2 fold-change of expression. This problem proved to be challenging without bringing in external knowledge of context-specific gene–gene correlation. With this external knowledge, genes are clustered within pathways and, under certain assumptions, the test statistic was shown to follow a
| Li | N-of-1-pathways MixEnrich | N-of-1 pathways MixEnrich improves both N-of-1 pathways Wilcoxon and MD by detecting DEPs when they are bidirectionally dysregulated and/or background noise is present. Neither Wilcoxon nor MD is designed to detect dysregulated pathways containing both upregulated and downregulated genes (bidirectional dysregulation), which are ubiquitous in biological systems. MixEnrich identifies bidirectional dysregulation by first clustering genes into upregulated, downregulated and unaltered genes. Subsequently, MixEnrich identifies pathways enriched with upregulated and/or downregulated transcripts. The enrichment test performed by MixEnrich detects only pathways with a significantly higher proportion of dysregulated genes with respect to the background. It is therefore more robust in the presence of background noise (i.e. a large number of dysregulated genes unrelated to the phenotype). |
| Li | N-of-1-pathways kMEn | N-of-1 pathways kMEn further improves the N-of-1 pathways MixEnrich method by using a nonparametric model (i.e. |
REOs = Relative expression orderings.
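
As an illustration of the single-sample pathway scoring described for iPAS in the table above, the following is a minimal sketch using only the Python standard library. It is not the published implementation: the helper names, toy values and the add-one empirical p-value are assumptions, each normal sample is standardized against the full normal cohort (a simplification), and the pathway score is the plain average of gene Z-scores, which the table mentions as one possible choice.

```python
from statistics import mean, stdev


def pathway_score(sample, normals, pathway):
    """Average gene-level Z-score of `sample` over the genes in `pathway`,
    standardized with the mean and SD of the normal samples."""
    zs = []
    for gene in pathway:
        ref = [n[gene] for n in normals]
        zs.append((sample[gene] - mean(ref)) / stdev(ref))
    return mean(zs)


def empirical_p(score, null_scores):
    """Two-sided empirical p-value with an add-one correction (assumed)."""
    extreme = sum(1 for s in null_scores if abs(s) >= abs(score))
    return (extreme + 1) / (len(null_scores) + 1)


# Toy data (assumed values): four normal samples, one two-gene pathway.
normals = [
    {"G1": 1.0, "G2": 5.0},
    {"G1": 2.0, "G2": 6.0},
    {"G1": 3.0, "G2": 5.0},
    {"G1": 2.0, "G2": 6.0},
]
disease = {"G1": 6.0, "G2": 9.0}
pathway = ["G1", "G2"]

disease_score = pathway_score(disease, normals, pathway)
# Null distribution: the same score computed for every normal sample.
null_scores = [pathway_score(n, normals, pathway) for n in normals]
p_value = empirical_p(disease_score, null_scores)
print(disease_score, p_value)
```

With so few normal samples the empirical p-value is necessarily coarse; the sketch is meant only to show how a disease-sample score is positioned against a null built from the normal reference.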
Figure 6. Summary of single-subject methods that analyze transcriptome data combined with other -omics.
Summary of the method validation in single subjects
| Publication | Method | In silico validation | Real dataset validation | Independent dataset validation | | Clinical trial validation | |
|---|---|---|---|---|---|---|---|
| Transcriptome | |||||||
| Gardeux | N-of-1 pathways W | • | • | ⊘ | • | ⊘ | • |
| Wang | DEGseq | • | • | • | ⊘ | ⊘ | ⊘ |
| Anders | DESeq | • | • | • | ⊘ | ⊘ | ⊘ |
| Feng | GFOLD | • | • | • | ⊘ | ⊘ | ⊘ |
| Wang | RankComp | • | • | • | ⊘ | ⊘ | ⊘ |
| Yang | FAIME | • | • | • | ⊘ | ⊘ | ⊘ |
| Drier | Pathifier | ⊘ | • | • | ⊘ | ⊘ | ⊘ |
| Li | N-of-1 pathways MixEnrich | • | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Li | N-of-1- | • | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Schissler | N-of-1 pathways MD | • | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Liu | DNB | • | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Wang | IndividPath | ⊘ | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Ahn | iPAS | ⊘ | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Schissler | ClusterT | • | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Tarazona | NOISeq | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Robinson | edgeR | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Wu | FPCA | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Barbie | ssGSEA | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Martini | timeClip | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ | ⊘ |
| Multi-omics | |||||||
| Vaske | PARADIGM | • | • | ⊘ | ⊘ | ⊘ | ⊘ |
| Chen | iPOP | ⊘ | • | ⊘ | ⊘ | ⊘ | ⊘ |