Thanh M Nguyen, Samuel Bharti, Zongliang Yue, Christopher D Willey, Jake Y Chen.
Abstract
Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples based on high-dimensional biomedical data. However, extracting the clinical data or sample metadata shared among the samples of a given biological condition remains a major challenge. Here, we describe a powerful analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power by focusing on sample sets, i.e., groups of biological samples constructed for various purposes, e.g., by manual curation of samples sharing specific characteristics or by automated clustering of sample omic profiles embedded from multi-dimensional omics space. The samples in a sample set share common clinical measurements, which we refer to as "clinotypes," such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing combined data from The Cancer Genome Atlas (TCGA) and patient-derived xenografts (PDX), SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, a problem that has not been solved by other tools. The results suggest that SEAS may support clinical decision-making. The SEAS tool is freely available as a software package at https://aimed-lab.shinyapps.io/SEAS/.
Keywords: SEAS; clinotype; glioblastoma multiforme; patient-derived xenograft; sample enrichment analysis
Year: 2021 PMID: 34604741 PMCID: PMC8481385 DOI: 10.3389/fdata.2021.725276
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
FIGURE 1. Overview of data processing and analysis.
FIGURE 2. Screenshot showing that SEAS visualizes the TCGA-GBM patients using embedding, and the user manually selects the subcohort.
FIGURE 3. SEAS identifies a subcohort by clustering the TCGA-GBM patients (green dots on the top-right of the embedding scatterplot).
FIGURE 4. SEAS identifies a subcohort by a circle region around PDX JX14P_A datapoint.
FIGURE 5. SEAS identifies enriched clinical features for the subcohort in Figure 4.
FIGURE 6. SEAS identifies a subcohort by a circle region around PDX JX14P_RT_A datapoint.
FIGURE 7. SEAS identifies enriched clinical features for the subcohort in Figure 6.
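
The workflow described in the abstract and in Figures 4 to 7 (select a subcohort as a circular region around a datapoint in the embedding, then look for clinical features enriched in that subcohort) can be illustrated with a minimal sketch. The record does not state which statistical test SEAS uses; the sketch assumes a one-sided hypergeometric test, a common choice for enrichment analysis. All function names, sample IDs, coordinates, and labels below are hypothetical and are not taken from the SEAS software or from the TCGA/PDX data.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n samples are drawn without replacement from N,
    of which K carry the clinotype value of interest."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def circle_subcohort(coords, center_id, radius):
    """Return IDs of samples whose 2-D embedding lies within `radius`
    of the sample `center_id` (the center itself is included)."""
    cx, cy = coords[center_id]
    return {
        sid
        for sid, (x, y) in coords.items()
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    }


def clinotype_enrichment(subcohort, labels, value):
    """Count how often `value` occurs in the subcohort and return
    (count, p-value) under the assumed hypergeometric null.
    Only samples that have a label are considered."""
    labeled = [sid for sid in subcohort if sid in labels]
    N = len(labels)                                   # all labeled samples
    K = sum(1 for v in labels.values() if v == value)  # carriers overall
    n = len(labeled)                                   # subcohort size
    k = sum(1 for sid in labeled if labels[sid] == value)  # carriers in subcohort
    return k, hypergeom_upper_tail(k, K, n, N)


# Hypothetical toy embedding: one query point plus eight labeled samples.
coords = {
    "query": (0.0, 0.0),
    "s1": (0.2, 0.1), "s2": (-0.3, 0.2), "s3": (0.1, -0.4),
    "s4": (3.0, 3.0), "s5": (3.2, 2.8), "s6": (-3.0, 2.5),
    "s7": (2.5, -3.0), "s8": (0.4, 0.3),
}
# Hypothetical clinotype: survival group per labeled sample.
labels = {
    "s1": "short", "s2": "short", "s3": "short", "s8": "long",
    "s4": "long", "s5": "long", "s6": "long", "s7": "short",
}

subcohort = circle_subcohort(coords, "query", radius=1.0)
count, p_value = clinotype_enrichment(subcohort, labels, "short")
print(sorted(subcohort), count, p_value)
```

A small p-value here would indicate that the chosen clinotype value occurs in the neighborhood of the query point more often than expected by chance; with many clinotypes tested at once, a multiple-testing correction would also be needed.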