Olga Fajarda, Sara Duarte-Pereira, Raquel M Silva, José Luís Oliveira.
Abstract
BACKGROUND: Heart disease is the leading cause of death worldwide. Knowing the gene expression signature of heart disease can lead to more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain only a small number of samples, so several datasets have to be merged to obtain more reliable results, which is a challenging task. Differentially expressed genes are commonly identified using statistical methods. These methods, however, rely on an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used.
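To illustrate why the choice of threshold matters, the following minimal sketch selects genes by an absolute fold change cutoff. It assumes log2-scale expression values, and the gene names, values and helper functions are hypothetical; it is not the pipeline used in the study.

```python
from statistics import mean


def fold_change(disease, control):
    """Signed fold change, assuming both inputs are log2 expression values.

    Positive values mean higher expression in disease, negative values
    mean lower expression in disease.
    """
    diff = mean(disease) - mean(control)  # log2 ratio of the two groups
    ratio = 2 ** abs(diff)
    return ratio if diff >= 0 else -ratio


def select_genes(expr, cutoff):
    """Return the genes whose absolute fold change is at least `cutoff`."""
    selected = {}
    for gene, (disease, control) in expr.items():
        fc = fold_change(disease, control)
        if abs(fc) >= cutoff:
            selected[gene] = fc
    return selected


# Toy data (hypothetical): gene -> (disease samples, control samples)
toy_expr = {
    "GENE_A": ([9.0, 9.2, 8.8], [8.0, 8.1, 7.9]),  # about 2-fold up
    "GENE_B": ([7.0, 7.1, 6.9], [7.5, 7.6, 7.4]),  # about 1.4-fold down
    "GENE_C": ([6.0, 6.1, 5.9], [6.0, 5.9, 6.1]),  # essentially unchanged
}

loose = select_genes(toy_expr, cutoff=1.2)
strict = select_genes(toy_expr, cutoff=1.6)
```

With these toy values, the looser cutoff keeps two genes and the stricter one keeps only one, which shows how the resulting gene list depends directly on an arbitrary threshold.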
Keywords: Gene expression signature; Heart disease; Microarray data; Random forest
Year: 2020 PMID: 32670412 PMCID: PMC7346458 DOI: 10.1186/s13040-020-00217-8
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Summary of the nine datasets used in this study
Number of GenBank identifiers remaining after the removal of repeated GenBank identifiers
| Platform | GPL96 | GPL570 | GPL6244 | GPL11532 |
|---|---|---|---|---|
| No. of GenBank identifiers | 20079 | 48100 | 121142 | 57714 |
Fig. 1 Venn diagram of common GenBank identifiers. The Venn diagram shows the overlap of GenBank identifiers across the four microarray platforms: GPL96, GPL570, GPL6244 and GPL11532.
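The removal of repeated identifiers and the overlap across platforms can be sketched with plain Python sets. The identifiers below are hypothetical placeholders, not real GenBank accessions, and the helper names are assumptions made for illustration.

```python
def unique_ids(ids):
    """Drop repeated identifiers while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def common_ids(platforms):
    """Return the identifiers that are present on every platform."""
    sets = [set(ids) for ids in platforms.values()]
    return set.intersection(*sets)


# Toy annotation lists (hypothetical identifiers), one per platform
toy_platforms = {
    "GPL96": ["ID1", "ID2", "ID2", "ID3"],
    "GPL570": ["ID2", "ID3", "ID4"],
    "GPL6244": ["ID3", "ID2", "ID5", "ID5"],
    "GPL11532": ["ID2", "ID3", "ID6"],
}

deduplicated = {name: unique_ids(ids) for name, ids in toy_platforms.items()}
shared = common_ids(deduplicated)  # the centre region of a four-set Venn diagram
```

Only identifiers in `shared` could be measured in every dataset, which is why the overlap determines the feature space of a merged dataset.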
Fig. 2 MDS plot before and after batch-adjustment. Multidimensional scaling (MDS) was performed using the 689 samples of the merged dataset. Before batch-adjustment, four clusters of samples driven by the four platforms can be observed. After batch-adjustment, no clusters can be observed.
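As a rough intuition for why platform-driven clusters can disappear, the sketch below removes a per-batch offset by mean-centering one gene within each batch. This is a deliberately simple stand-in: the batch-adjustment method is not named here, and the values and batch labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its own samples (for a single gene)."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in grouped.items()}
    return [value - batch_mean[batch] for value, batch in zip(values, batches)]


# Toy example: batch "A" reads systematically higher than batch "B"
values = [10.0, 12.0, 4.0, 6.0]
batches = ["A", "A", "B", "B"]
adjusted = center_by_batch(values, batches)
```

After centering, both batches share the same scale, so samples no longer separate by batch alone. Note that plain mean-centering can also remove biological signal when disease and control samples are unevenly distributed across batches.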
For every fold change, the number of features, the number of genes, the accuracy and the balanced accuracy of the classifier
For every fold change, the mean and the 95% confidence interval of the specificity, precision, and recall of the classifier
For every fold change, the mean and the 95% confidence interval of the F1 score and MCC of the classifier
For every fold change, the mean and the 95% confidence interval of the AUC and the AUCPR of the classifier
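For reference, the threshold-based metrics named in these tables can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions with hypothetical counts; it assumes no denominator is zero and omits AUC and AUCPR, which need ranked scores rather than counts.

```python
from math import sqrt


def classifier_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    both classes are predicted at least once).
    """
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)       # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, not results from the study
metrics = classifier_metrics(tp=40, fp=10, tn=30, fn=20)
```

Balanced accuracy and MCC are less sensitive to class imbalance than plain accuracy, which is one reason to report them alongside it.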
Common up-regulated genes in heart disease obtained when using a fold change cutoff of 1.6
Common down-regulated genes in heart disease obtained when using a fold change cutoff of
Fig. 3 Gene Ontology significant results. Significant Gene Ontology results for the up-regulated genes from the gene set with a fold change of 1.6, showing biological process (blue) and molecular function (orange).
Fig. 4 Protein-protein interactions of up-regulated protein-coding genes. (a) Protein-protein interactions of the up-regulated protein-coding genes from the feature set with a fold change cutoff of 1.6 were retrieved from the STRING database, resulting in a network of 25 edges between 17 nodes. Each node represents one protein and each edge represents an interaction. The line thickness indicates the strength of data support (text mining, experiments, databases and co-expression were selected from the options), with a default medium level of confidence. (b) Number of interactions for each node with more than one interaction.
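The per-node interaction count shown in panel b corresponds to the degree of each node in the network. A minimal sketch of that count follows, using a hypothetical edge list with placeholder protein names rather than the actual STRING network.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many interactions (edges) each protein (node) takes part in."""
    degrees = Counter()
    for protein_a, protein_b in edges:
        degrees[protein_a] += 1
        degrees[protein_b] += 1
    return degrees


# Toy undirected network (hypothetical proteins P1..P4)
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

degrees = node_degrees(toy_edges)
# Keep only nodes with more than one interaction
multi = {node: deg for node, deg in degrees.items() if deg > 1}
```

Because each edge contributes to two nodes, the degrees always sum to twice the number of edges, which is a quick sanity check on any such count.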