Joseph Finkelstein, Irena Parvanova, Frederick Zhang.
Abstract
As biomedical data integration and analytics play an increasing role in the field of stem cell research, it becomes important to develop ways to standardize, aggregate, and share data among researchers. For this reason, many databases have been developed in recent years in an attempt to systematically warehouse data from different stem cell projects and experiments at the same time. However, these databases vary widely in their implementation and structure. The aim of this scoping review is to characterize the main features of available stem cell databases in order to identify specifications useful for implementation in future stem cell databases. We conducted a scoping review of peer-reviewed literature and online resources to identify and review available stem cell databases. To identify the relevant databases, we performed a PubMed search using relevant MeSH terms, followed by a web search for databases that may not have an associated journal article. In total, we identified 16 databases to include in this review. The data elements reported in these databases represented a broad spectrum of parameters, from basic socio-demographic variables to various cell characteristics, cell surface marker expression, and clinical trial results. Three broad sets of functional features that provide utility for future stem cell research and facilitate bioinformatics workflows were identified. These features consisted of the following: common data elements, data visualization and analysis tools, and biomedical ontologies for data integration. Stem cell bioinformatics is a quickly evolving field that generates a growing number of heterogeneous data sets. Further progress in stem cell research may be greatly facilitated by the development of applications for intelligent stem cell data aggregation, sharing, and collaboration.
Keywords: data integration; databases; stem cells
Year: 2020 PMID: 32099411 PMCID: PMC6996484 DOI: 10.2147/SCCAA.S237361
Source DB: PubMed Journal: Stem Cells Cloning ISSN: 1178-6957
Figure 1. A diagram of major utility features in the reviewed databases.
Data Elements in Stem Cell Databases
| Database Title | Author | Data Elements | Link |
|---|---|---|---|
| SyStemCell | Yu et al | Stem cell type and species; Gene annotations; DNA CpG 5hmC/5mC; Histone modification; Karyotype; Treatments performed on cell; miRNA-based regulation; Protein abundance; Protein phosphorylation; Transcription factor regulation | |
| CODEX | Sanchez-Castillo et al | Next Generation Sequencing data; Cell type, subtype, and species; Tissue ontology; Cell culture conditions; Cell drug treatments; Human chromosomal abnormalities | |
| ESTOOLS Data@Hand | Kong et al | Microarray data; Sequencing platform; Species, sex, and disease status of donor; Tissue of origin; Cell type of parent and current cell line; Differentiation status | |
| Stem Cell Discovery Engine | Sui et al | Microarray data; Species, strain, developmental stage, disease state of donor; Cell type and drug treatment; Tissue type and histology; Transcription profiling; Histone modification profiling;Transcription factor binding site identification; Immunoprecipitation antibody; Phenotype quality; Cell surface marker | |
| StemFormatics | Wells et al | Gene expression data; Sequencing platform; Cell type and subtype; Species, sex, age, and disease state of donor; Tissue of origin | |
| LINCS | Koleti et al | Species, disease state, mutation status, and genetic modification status of donor; Tissue of origin; Cell culture conditions; Protein treatment information; Antibody treatment; Small molecule treatment information; siRNA/shRNA treatment information | |
| LifeMap Discovery | Edgar et al | Gene expression data; Anatomic compartment and tissue of origin; Cell type and subtype; Cell culture conditions and treatments; Related clinical trials for cell therapies | |
| StemMapper | Pinto et al | Microarray data; Species, age, and mutation status of donor; Cell type and subtype | |
| ESCAPE | Xu et al | Cell surface markers; mRNA expression; Protein-protein interactions; ChIP-seq interactions; miRNA target interactions; Histone modification; Gene annotations; Cell type | |
| SKiP Stem Cell Knowledge and Information Portal | | Species, age, sex, disease state, and ethnicity of donor; Tissue of origin; Cell type; Cell morphology; Culture medium, feeders; Karyotype | |
| hPSCreg | | Sex and disease state of donor; Cell type; Cell derivation process; Cell culture conditions; Cell surface markers | |
| CellNet | Cahan et al | Stem cell subtype; Stem cell species; Sex of the donor; Transcription factors; Microarray data | |
| StemCellNet | Pinto et al | Stem cell type; Stem cell lineage; Stem cell species; Cell surface markers; Transcription factors; Immunoprecipitation antibody; Microarray data; Sequencing platform | |
| HSC-explorer | Montrone et al | Stem cell lineage; Stem cell species; Differentiation status; Cell surface markers; Gene expression data; mRNA expression; Transcription profiling; Sequencing platform | |
| CORTECON | van de Leemput et al | Stem cell type; Stem cell lineage; Stem cell species; Disease state of the donor; Cell surface markers; Gene expression data | |
| ESCD | Jung et al | Stem cell type; Stem cell lineage; Stem cell species; Cell surface markers; Transcription factors; Immunoprecipitation antibody; Gene expression data; Sequencing platform | |
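
The data elements that recur across the rows above (cell type, species, tissue of origin, donor disease state, cell surface markers) can be read as a candidate set of common data elements. The following is a minimal sketch of how such a set might be captured in a single record type. It is not the schema of any reviewed database: the class name `StemCellRecord`, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StemCellRecord:
    """Hypothetical record; every field name here is an illustrative assumption."""
    source_database: str
    cell_type: str
    species: str
    tissue_of_origin: Optional[str] = None
    donor_disease_state: Optional[str] = None
    cell_surface_markers: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Return the names of optional elements this record leaves empty."""
        optional = {
            "tissue_of_origin": self.tissue_of_origin,
            "donor_disease_state": self.donor_disease_state,
            "cell_surface_markers": self.cell_surface_markers,
        }
        return [name for name, value in optional.items() if not value]


# Placeholder values only; not taken from any of the reviewed databases.
record = StemCellRecord(
    source_database="ExampleDB",
    cell_type="iPSC",
    species="human",
    tissue_of_origin="blood",
)
print(record.missing_elements())
```

Reporting which elements are empty, rather than rejecting the record, reflects how unevenly the reviewed databases populate these fields.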
Operational Characteristics of Stem Cell Databases
| Database | Search Features | Storage Platform and Interface | Privileges Needed for Access | Update Methodology | Current Status |
|---|---|---|---|---|---|
| SyStemCell | Browse by organism, level of regulation, stem cell type, or control sample; Search by Gene Entrez ID, gene symbol, or gene alias | Stored in MySQL relational database and configured on RedHat Linux Server; Web interface implemented on an Apache server; Online analysis tools developed in R | Free access | Not specified | No longer available |
| CODEX | Browse by cell type, repository, transcription factor, histones and associated proteins, or sequencing platform; Search by transcription factor, gene, or cell type | Not specified | Free access | Self-developed web crawler which constantly searches GEO database | Still available, unclear if still being updated |
| ESTOOLS Data@Hand | Can search by any annotation parameter: organism, cell type, disease state, etc. | Stored in MySQL relational database; Web interface implemented on Apache server | Free access with registration | Manually updated four times per year | No longer available |
| Stem Cell Discovery Engine | Search by free text, organism, measurement, technology, or platform | Implemented on Harvard’s custom Galaxy server system | Free with registration | Researchers submit experiments which are then manually curated | Still available but has been merged into Harvard’s Stem Cell Commons data repository |
| StemFormatics | Search by gene, species, author, platform, or cell type | Not specified | Free access but registration needed to access certain features | Manually updated | Still active and available |
| LINCS | Search by datasets, small molecules, or cells via any annotation; Free text search of all data with Boolean logic integration; Search by chemical structure by uploading mol files | Stored in PostgreSQL database; Web interface implemented using Apache Solr | Free access | Data is submitted and uploaded via the LINCS Data Registry protocol | Still active, but not all features are available yet |
| LifeMap Discovery | Search by organ/tissue, cell type, related cell therapies, or gene expression | Not specified | Free access but registration needed for access to all features | Not specified | Still active and available |
| StemMapper | Search by tissue, cell type, and gene | Stored in MySQL relational database; Web interface implemented using JavaScript and JavaServer Faces 2.1 | Free access | Not specified | Still active and available |
| ESCAPE | Browse by data type; Search by gene name, gene ID, cell type, PubMed ID, or submission date | Stored in MySQL relational database; Displayed via Cytoscape web plugin; Web interface implemented on Apache server and Java server | Free access | Not specified | Still available |
| SKiP Stem Cell Knowledge and Information Portal | Search by associated disease, cell type, tissue type, disease status, donor characteristics, vector, database, PubMed ID, or text search | Not specified | Free access | Not specified | Still available but last updated in 2017 |
| hPSCreg | Search by country, date, PubMed ID, disease, cell type, or free text search | Not specified | Free access but registration needed for access to all features | Not specified | Still active and available |
| CellNet | Browse by organism; Search by cell derivation, transcriptional targets, gene regulatory; Review targets, gene regulatory network of starting cell type, gene expression | Online analysis tool developed in R; Affy packages by Bioconductor need to be installed; Web interface implemented using JavaScript and JavaServer Faces 2.1 | Free access | Not specified | Computational platform is available |
| StemCellNet | Browse by organism; Search by proteins, transcription factors, genes | Not specified | Free access | Not specified | Still available; versatile platform |
| HSC-explorer | Browse by gene, protein, SNP, biological process, cellular component, tissue, cell line, organism, chemical compound; Search by entry ID, PubMed ID, comment, author, etc. | Not specified | Free access | Manually curated | The platform is expanded to cover all hematopoietic progenitors |
| CORTECON | View by gene, disease, KEGG pathway, GO ontology; Provides tools for browsing genes expressed during cortical development | Not specified | Free access | Not specified | Available for download |
| ESCD | Browse by Ensembl gene IDs and gene names; Browse by organism, transcription factors, GO terminology | Not specified | Free access | Not specified | Datasets originate from genome version platforms |
Data Analysis Tools Used in Stem Cell Databases
| Database | Data Analysis Tools |
|---|---|
| 1. SyStemCell | 1.1 Co-localization analysis; 1.2 Venn Diagram plotting; 1.3 DAVID enrichment analysis |
| 2. CODEX | 2.1 Motif discovery analysis and peak profile correlation analysis |
| 3. ESTOOLS Data@Hand | 3.1 Differential expression analysis; 3.2 Co-expression analysis; 3.3 Clustering and heatmap analysis; 3.4 Gene expression profiling; 3.5 Sample-wise clustering and dendrogram generation; 3.6 GO and KEGG enrichment |
| 4. Stem Cell Discovery Engine | 4.1 Gene list comparison by gene signatures, molecular signatures, and pathways |
| 5. StemFormatics | 5.1 Gene clustering and heatmap analysis through Hamlet; 5.2 Hierarchical clustering and comparative marker selection through GenePattern module |
| 6. LINCS | 6.1 Drug pathway browser; 6.2 LINCS analytics; 6.3 Drug/Cell line Browser; 6.4 Repurposing App; etc. |
| 7. LifeMap Discovery | 7.1 Partnered with custom GeneAnalytics gene analysis suite |
| 8. StemMapper | 8.1 Heatmap, Principal Components Analysis, and Pearson correlation analysis |
| 9. ESCAPE | 9.1 Enrichment analysis and lineage prediction |
| 10. SKiP Stem Cell Knowledge and Information Portal | - None |
| 11. hPSCreg | - None |
| 12. CellNet | 12.1 R software analysis of protein and transcription regulatory interactions |
| 13. StemCellNet | 13.1 Web-based analysis of molecular networks; 13.2 Statistical analysis examining physical protein interaction, transcriptional regulatory interactions; 13.3 Online statistical analysis platform allowing up to 500 genes input |
| 14. HSC-explorer | 14.1 Bioinformatics resource allowing the search of pathways correlated to hematopoiesis; 14.2 Web-based analysis platform allowing visualization of data. Diagrams can be downloaded as SBML, graphML, or jpg files; 14.3 Database is linked to EntrezGene, KEGG, miRBase, Gene Ontology, or CORUM |
| 15. CORTECON | 15.1 R-analysis using developing cerebral cortex transcriptome in humans; 15.2 Online resource allowing multiple views of developing cortex data |
| 16. ESCD | 16.1 Online repository which contains perturbation and ChIP experiments in ESCs |
Ontology-Based Applications in Stem Cell Databases
| Databases | Ontology Interactions |
|---|---|
| 1. SyStemCell | 1.1 Gene Ontology database; 1.2 Biocarta Pathway; 1.3 Biosystems Pathway, and 1.4 dbDEPC database (correlation analysis). |
| 2. CODEX | 2.1 Gene Quest (TF’s targets and gene associations for human samples; ChIPpeakAnno and peak-to-gene associations). |
| 3. ESTOOLS Data@Hand | 3.1 Differential expression analysis; 3.2 Co-expression analysis; 3.3 Clustering and heatmap analysis; 3.4 Gene expression profiling; 3.5 Sample-wise clustering and dendrogram generation; 3.6 GO and KEGG enrichment. |
| 4. Stem Cell Discovery Engine | 4.1 Galaxy analysis software to enhance ontology searches. |
| 5. StemFormatics | 5.1 Gene clustering; 5.2 Hierarchical clustering and comparative marker selection through GenePattern module |
| 6. LINCS | 6.1 Drug-pathway browser, TieDIE (HMS LINCS); 6.2 Drug/Cell line Browser, Enrichr, Drug Response Browser (BD2K-LINCS DCIC); 6.3 Omics Integrator (Neuro LINCS); 6.4 ICV App (LINCS Transcriptomics), etc. |
| 7. LifeMap Discovery | 7.1 A detailed description of the developmental ontology of organ/tissues, anatomical compartments and cells; 7.2 Gene expression profiling in adult mammalian organs, tissues, anatomical compartments and cells, cultured stem, progenitor and primary cells, or cells derived via differentiation protocols to allow characterization of cells by gene expression patterns. |
| 8. StemMapper | 8.1 Heatmaps; 8.2 Correlation analysis. |
| 9. ESCAPE | 9.1 GOI enrichment with ESCAPE-listed genes RNAi screens, protein lists from IP-MS pull-downs, genes differentially expressed after knock-down or over-expression, and target genes for transcription factors and histone modifications as determined by ChIP-seq. |
| 10. SKiP Stem Cell Knowledge and Information Portal | n/a |
| 11. hPSCreg | n/a |
| 12. CellNet | 12.1 Analysis of protein and transcription regulatory interactions. |
| 13. StemCellNet | 13.1 Statistical analysis examining physical protein interaction, transcriptional regulatory interactions; 13.2 Online statistical analysis platform allowing up to 500 genes input. |
| 14. HSC-explorer | 14.3 Database is linked to EntrezGene, KEGG, miRBase, Gene Ontology, or CORUM. |
| 15. CORTECON | 15.1 Analysis using developing cerebral cortex transcriptome in humans. |
| 16. ESCD | 16.1 Query by Gene ID; 16.2 Query by GO term. |
Stem Cell Databases Are Characterized by Different Numbers of Rubrics
| Stem Cell Databases | # of Rubrics per Database |
|---|---|
| SCDE | 21 |
| CODEX | 16 |
| ESTOOLS | 16 |
| SyStemCell | 15 |
| StemFormatics | 14 |
| StemMapper | 14 |
| SKiP Stemcell | 13 |
| LifeMap Discovery | 12 |
| ESCAPE | 12 |
| hPSCreg | 12 |
| StemCellNet | 12 |
| HSC-explorer | 12 |
| ESCD | 12 |
| LINCS | 10 |
| CORTECON | 10 |
| CellNet | 9 |
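
The counts above are easier to compare once they are expressed as a share of the 35 rubrics enumerated in the characterization table. The snippet below is a small sketch of that arithmetic. The counts are copied from the table, while the names `RUBRIC_COUNTS`, `TOTAL_RUBRICS`, and `coverage` are illustrative assumptions rather than anything defined by the review.

```python
# Rubric counts per database, copied from the table above.
RUBRIC_COUNTS = {
    "SCDE": 21, "CODEX": 16, "ESTOOLS": 16, "SyStemCell": 15,
    "StemFormatics": 14, "StemMapper": 14, "SKiP Stemcell": 13,
    "LifeMap Discovery": 12, "ESCAPE": 12, "hPSCreg": 12,
    "StemCellNet": 12, "HSC-explorer": 12, "ESCD": 12,
    "LINCS": 10, "CORTECON": 10, "CellNet": 9,
}

# Assumption: the denominator is the 35 numbered rubrics of the characterization table.
TOTAL_RUBRICS = 35


def coverage(count: int, total: int = TOTAL_RUBRICS) -> float:
    """Fraction of all rubrics that a database reports."""
    return count / total


# Rank databases from most to fewest rubrics (ties keep table order).
ranked = sorted(RUBRIC_COUNTS.items(), key=lambda item: item[1], reverse=True)
mean_count = sum(RUBRIC_COUNTS.values()) / len(RUBRIC_COUNTS)

for name, count in ranked:
    print(f"{name:<18} {count:>2}  {coverage(count):.0%}")
print(f"mean rubrics per database: {mean_count:.3f}")
```

On these numbers, even the most complete database covers well under all of the rubrics, which is consistent with the review's point that the databases vary widely in structure.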
Representation of Characterization Rubrics in Stem Cell Databases (Plus Sign Indicates Presence in a Database)
| MIACARM Module | Rubric | SyStemCell | CODEX | ESTOOLS | SCDE | StemFormatics | LINCS | LifeMap Discovery | StemMapper | ESCAPE | SKiP | hPSCreg | CellNet | StemCellNet | HSC-explorer | CORTECON | ESCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Project | 1. Summary | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + |
| 2. Organization | + | + | + | + | + | + | + | + | + | + | + | + | |||||
| 3. PI/Corresponding researcher | + | + | + | + | + | + | |||||||||||
| 4. Publication | + | + | + | + | + | + | + | + | + | + | + | + | + | + | |||
| 2. Assay | 5. Clinical Trials for Cell Therapies | + | |||||||||||||||
| 3. Source Cell | 6. Stem Cell Type/Name | + | + | + | + | + | + | + | + | + | + | + | + | + | + | ||
| 7. Stem Cell Subtype | + | + | + | + | |||||||||||||
| 8. Stem Cell Lineage | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + | |
| 9. Stem Cell Species | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + | + | |
| 10. Sex of the Donor | + | + | + | ||||||||||||||
| 11. Age of Donor | + | + | + | ||||||||||||||
| 12. Ethnicity of the Donor | + | ||||||||||||||||
| 13. Disease State of Donor | + | + | + | + | + | + | + | + | |||||||||
| 14. Developmental Stage | + | ||||||||||||||||
| 15. Cell Derivation Process | + | ||||||||||||||||
| 16. Differentiation Status | + | + | + | ||||||||||||||
| 17. Cell Morphology | + | ||||||||||||||||
| 18. Phenotype Quality | + | ||||||||||||||||
| 19. Tissue Type | + | + | + | + | + | + | + | ||||||||||
| 20. Tissue Histology | + | ||||||||||||||||
| 21. Cell Surface Markers | + | + | + | + | + | + | + | + | + | ||||||||
| 22. Transcription Factors Expression | + | + | + | + | + | + | |||||||||||
| 23. Epigenetic Modification | + | + | + | + | |||||||||||||
| 24. Gene Annotation | + | + | |||||||||||||||
| 25. miRNA-based Regulation/Interactions | + | + | |||||||||||||||
| 26. Protein Phosphorylation | + | ||||||||||||||||
| 27. Karyotype | + | + | + | ||||||||||||||
| 28. Mutation Status of a Donor | + | + | + | ||||||||||||||
| 29. Cell Culture Conditions | + | + | + | + | + | + | + | ||||||||||
| 4. Experimental Technology | 30. Immunoprecipitation Antibody | + | + | + | |||||||||||||
| 31. Gene Expression Data | + | + | + | + | + | ||||||||||||
| 32. Microarray Data | + | + | + | + | + | + | |||||||||||
| 33. miRNA Expression/Transcription | + | + | + | + | + | ||||||||||||
| 5. Data | 34. Sequencing Platform | + | + | + | + | + | + | + | + | + | + | + | + | ||||
| 35. Data analysis tools | + | + | + | + | + | + | + | + | + | + | + | + | + |
Complexity of Rubrics Characterizing Stem Cell Research Databases
| Stem Cell Database Rubrics | Complexity of Each Rubric | Complexity Breakdown | % of Databases with the Rubric |
|---|---|---|---|
| Summary | N/A | N/A | 100% |
| Stem Cell Lineage | 8 | Adult stem/progenitor cell; embryonic progenitor cell; ESC; fetal stem/progenitor cell; iPSC; cell lines; primary cells; and tissue/cell composite | 100% |
| Stem Cell Species | 5 | Human, mouse, rat, pig, macaque | 100% |
| Publications | N/A | N/A | 88% |
| Stem Cell Type/Name | Thousands of stem cell types | | 88% |
| Data analysis tools | 28 | Co-localization analysis; Venn Diagram plotting; DAVID enrichment analysis; peak profile correlation analysis; gene set control analysis; motif discovery analysis; differential expression analysis; clustering and heatmap analysis; gene expression; GO and KEGG enrichment; gene list comparison, and 15 more | 88% |
| Organization | N/A | N/A | 75% |
| Sequencing Platform | Next Generation Sequencing | Illumina, others | 75% |
| Cell Surface Markers | Large number of cell surface proteins | | 56% |
| Disease State of Donor | Large number of diseases | X-linked juvenile retinoschisis; Pathological myopia; Short rib-thoracic dysplasia syndrome; DFNB4, Pendred syndrome; Xp22.2 Exon; Childhood acute B-lymphoblastic leukemia; Parkinson’s disease; Alzheimer’s disease; Fragile X syndrome, and others | 50% |
| Tissue Type | Large number of tissue types | Blood; Adipose; Mesenchymal Stem Cells; Endothelium; Bone; Heart; Cartilage; Liver; Amnion; Tooth; Placenta; Uterus; Epithelial Cells; Umbilical Cord, and others | 44% |
| Cell Culture Conditions and Treatment | Variety of conditions | | 44% |
| PI/Corresponding Researcher | N/A | N/A | 38% |
| Transcription Factors Expression/Regulation | 210 (CODEX); +9 other databases | RUNX1; CBFB; CEBPA; CTDP1; EP300; ERG; FUSELF1; ERG; FLI1; GATA2; HDAC1; MYH11; RUNX3; CDK7; CDK9; POLR2A; BCL6; BCOR; SMRT; ZNF143; SPI1; BACH2, and others | 38% |
| Microarray Data | 420 microarray datasets (StemFormatics); +5 other databases | | 38% |
| Gene Expression Data | 10,401 (LifeMap Discovery); 5813 (ESCAPE); +4 other databases | | 31% |
| miRNA Expression/Transcription Profiling | N/A | N/A | 31% |
| Stem Cell Subtype | 132 (CODEX); +3 other databases | [CL]CMK; [PC]CD34+ acute myeloid leukemia blast cells; [CL]Kasumi-1+ scrambled siRNA; [CL]ME-1; [CL]NB4; [CL]Lymphoblastoid; [PC] H1 ES derived angiogenic hematopoietic progenitor; [PC] H7 ES derived anterior foregut, and others | 25% |
| Epigenetic modifications | 72 (CODEX); +11 other databases (Histone Modifications) | H3K4me1; H2A.Zac; H3K9Ac; H3K9K14ac; H2AFZ/H2A.Z; H3K4me3; H4K20me3; H3K27ac; H3K27me3; H3K4me2; Ab4729; H3K79me2; H3Y41ph, and others | 25% |
| Sex of the donor | 2 | Female, male | 19% |
| Age of the Donor | N/A | N/A | 19% |
| Differentiation Status | N/A | N/A | 19% |
| Karyotype | N/A | N/A | 19% |
| Mutation Status of a Donor | Large number of abnormalities listed (CODEX); +2 other databases | Trisomy 21; translocation t(16;21); Acute Monocytic Leukemia; cervical adenocarcinoma; p53 mutation; K562 erythrocytic leukaemia cells; erythrocytic leukemia cells; and others | 19% |
| Immunoprecipitation Antibody | N/A | N/A | 19% |
| Gene Annotation | N/A | N/A | 13% |
| miRNA-based Regulation/Interactions | N/A | N/A | 13% |
| Clinical Trials for Cell Therapies | 568 (LifeMap Discovery) | | 6% |
| Ethnicity of the Donor | N/A | N/A | 6% |
| Developmental Stage | N/A | N/A | 6% |
| Cell Derivation Process | N/A | N/A | 6% |
| Cell Morphology | N/A | N/A | 6% |
| Phenotype Quality | N/A | N/A | 6% |
| Tissue Histology | N/A | N/A | 6% |
| Protein Phosphorylation | N/A | N/A | 6% |
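
The percentages in the last column are consistent with dividing the number of databases that report a rubric by the 16 databases reviewed and rounding to a whole number. The sketch below reproduces that arithmetic under the assumption that halves are rounded up, which is what a value such as 13% for 2 of 16 databases (12.5%) implies. The function name `percent_with_rubric` is hypothetical.

```python
import math

# The review covers 16 databases in total.
TOTAL_DATABASES = 16


def percent_with_rubric(n_databases: int, total: int = TOTAL_DATABASES) -> int:
    """Whole-number percentage of databases, rounding halves up (12.5 -> 13)."""
    return math.floor(100 * n_databases / total + 0.5)


# A few counts and the percentages they yield.
for n in (16, 14, 12, 9, 6, 2, 1):
    print(f"{n:>2} of {TOTAL_DATABASES} databases -> {percent_with_rubric(n)}%")
```

Python's built-in `round` uses round-half-to-even, so `round(12.5)` returns 12 and would not reproduce the 13% entries. That is why the sketch rounds with `math.floor(x + 0.5)` instead.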