Sara Aibar, Maria Abaigar, Francisco Jose Campos-Laborie, Jose Manuel Sánchez-Santos, Jesus M Hernandez-Rivas, Javier De Las Rivas.
Abstract
BACKGROUND: In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty of identifying them often comes from patient heterogeneity and sample variability, which can hide the biomedically relevant changes that characterize each stage and make standard differential analysis inadequate or inefficient.
Keywords: Bioinformatics; Cancer; Data integration; Disease progression; Disease stage; Disease subtype; Expression pattern; Expression profiling; Gene expression; Gene signature; Leukemia; Pattern recognition; Transcriptomics
Year: 2016 PMID: 28185568 PMCID: PMC5133487 DOI: 10.1186/s12859-016-1290-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of samples in each dataset of the studied diseases (myelodysplastic syndrome MDS, Alzheimer's disease AD and colorectal cancer CRC) divided in stages and ordered according to the progression of each disease
| Disease subtypes and number of patient samples | ||||||
|---|---|---|---|---|---|---|
| Myelodysplastic Synd. (MDS) | Control | RCUD | RCMD | RAEB1 | RAEB2 | AML |
| Patient samples (N) | 11 | 6 | 17 | 4 | 5 | 10 |
| Alzheimer’s disease (AD) | Control | Incipient | Moderate | Severe | ||
| Patient samples (N) | 8 | 7 | 8 | 7 | ||
| Colorectal cancer (CRC) | Control | Stage 1 | Stage 2 | Stage 3 | Stage 4 | |
| Patient samples (N) | 25 | 13 | 37 | 34 | 20 | |
The MDS dataset includes 53 samples in 6 stages. The AD dataset includes 30 samples in 4 stages. The CRC dataset includes 129 samples in 5 stages. In all cases the controls are samples from individuals without the disease. The stages are ordered according to the progression of each disease, from the controls (no disease) to the most acute or severe pathological states
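The totals quoted in this note can be checked against the per-stage counts in the table. The short sketch below simply transcribes those counts and sums them; the dictionary layout and the helper name `summarize` are illustrative choices, not part of the original work.

```python
# Per-stage sample counts transcribed from the table above.
# The nesting (disease -> stage -> N) is an illustrative layout.
samples = {
    "MDS": {"Control": 11, "RCUD": 6, "RCMD": 17, "RAEB1": 4, "RAEB2": 5, "AML": 10},
    "AD": {"Control": 8, "Incipient": 7, "Moderate": 8, "Severe": 7},
    "CRC": {"Control": 25, "Stage 1": 13, "Stage 2": 37, "Stage 3": 34, "Stage 4": 20},
}


def summarize(counts):
    """Return {disease: (total samples, number of stages)}."""
    return {
        disease: (sum(stages.values()), len(stages))
        for disease, stages in counts.items()
    }


summary = summarize(samples)
for disease, (n_samples, n_stages) in summary.items():
    print(f"{disease}: {n_samples} samples in {n_stages} stages")
```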
Fig. 1 Example of disease stages. Scheme showing two ways of setting up the stages of a disease, taking the MDS case as an example. In both cases the stages must be placed in progressive order, with one initial stage, usually taken as the control or normal stage, and one terminal stage that usually corresponds to the most severe or acute stage of the disease (Acute Disease). The stages are considered discrete (i.e. not continuous variables) and independent (since they correspond to the evolution measured in different individuals). a An example with 6 stages taken from the MDS case, considering different disease subtypes. b An example with 4 stages taken from the MDS case, considering only low-risk and high-risk subtypes
Fig. 2 Workflow overview of the results provided by the proposed methodology. a Expression patterns (clusters) found using SOM on the correlations obtained for each gene along the stages with the Gamma rank correlation. Highlighted in blue are the 4 patterns selected (for the MDS dataset) as the most representative of the 9 profiles explored, which included most of the features and the largest changes: 2 increasing (p1 and p2) and 2 decreasing (p3 and p4). b Standardized and sorted expression of the genes included in each pattern. Blue: samples in control or initial stages; red: samples in late or acute stages; grey: intermediate stages. c Boxplots of the expression signals of four example genes, one following each of the 4 patterns found. These genes also correspond to the MDS dataset, and the plots include the 6 stages of the disease
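To make the "Gamma rank correlation" step in panel a more concrete, the sketch below computes the textbook Goodman-Kruskal gamma between a gene's expression values and the ordinal stage of each sample. It is a minimal illustration only: the function name, the toy data, and the assumption that this is the exact variant used by the authors are not taken from the source, and the SOM clustering step is not shown.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D).

    C and D count concordant and discordant sample pairs.
    Pairs tied on x or on y are ignored, which suits discrete
    stages where many samples share the same stage index.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


# Toy data (hypothetical): 6 samples in 3 ordered stages.
stages = [0, 0, 1, 1, 2, 2]
gene_up = [1.0, 1.2, 2.0, 2.1, 3.5, 3.0]    # rises with stage
gene_down = [3.4, 3.1, 2.2, 2.0, 1.1, 0.9]  # falls with stage

gamma_up = goodman_kruskal_gamma(stages, gene_up)
gamma_down = goodman_kruskal_gamma(stages, gene_down)
print(gamma_up, gamma_down)
```

Because within-stage pairs are tied on the stage index and therefore skipped, only between-stage comparisons contribute, which matches the idea of treating stages as discrete, ordered groups.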
Fig. 3 Patterns found in the analyses of the gene-stage expression profiles of 3 disease datasets. a Myelodysplastic Syndrome, MDS. b Alzheimer’s Disease, AD. c Colorectal Cancer, CRC. The results correspond to the outputs of the SOM analyses, done in all cases with a maximum of 9 (3x3) possible distinct profiles. In all cases 4 significant patterns were found. The number of genes included in each pattern is indicated in each case
Fig. 4 Patterns found in the analysis of the expression patterns of a simulated RNA-seq dataset. The dataset includes RPKM signals for 1000 genes and 18 samples divided into 6 stages. a Outputs of the SOM analysis, which identifies 4 main patterns: 50 genes up-early, 51 up-late, 50 down-early and 50 down-late. Of the 201 genes found, only one was a false positive, and there were no false negatives. b Plots of expression distributions of the genes found in each pattern. The plots represent the expression signal distributions (as log2 of the RPKM values +1) of the genes in each of the 4 patterns
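The counts in panel a imply simple detection metrics for the simulated dataset. The sketch below derives precision and recall from the numbers stated in the caption; these two metrics are computed here for illustration and are not values reported in the source, and the variable names are arbitrary.

```python
# Genes assigned to each pattern, as stated in the caption.
found_per_pattern = {
    "up-early": 50,
    "up-late": 51,
    "down-early": 50,
    "down-late": 50,
}

found = sum(found_per_pattern.values())  # genes reported in total
false_positives = 1                      # stated in the caption
false_negatives = 0                      # stated in the caption

# Derived quantities (not reported in the source).
true_positives = found - false_positives
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"found={found}, TP={true_positives}")
print(f"precision={precision:.4f}, recall={recall:.4f}")
```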