Shuguang Han, Ning Wang, Yuxin Guo, Furong Tang, Lei Xu, Ying Ju, Lei Shi.
Abstract
Inspired by L1-norm minimization methods such as basis pursuit, compressed sensing, and Lasso feature selection, sparse representation has emerged in recent years as a novel and powerful data processing method. Researchers have not only extended the sparse representation of signals to image representation, but also extended the sparsity of vectors to that of matrices. Moreover, sparse representation has been applied to pattern recognition with good results. Because of its multiple advantages, such as insensitivity to noise, strong robustness, low sensitivity to the selected features, and no "overfitting" phenomenon, the application of sparse representation in bioinformatics deserves further study. This article reviews the development of sparse representation and explains its applications in bioinformatics, namely the use of low-rank representation matrices to identify and study cancer molecules and the use of low-rank sparse representations to analyze and process gene expression profiles. It also introduces related cancer and gene expression profile databases.
Keywords: cancer; gene expression profile; low-rank representation; machine learning; sparse representation
Year: 2021 PMID: 34976030 PMCID: PMC8715914 DOI: 10.3389/fgene.2021.810875
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
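The L1-norm minimization that the abstract cites as the starting point for sparse representation can be illustrated with a small sketch. The code below is not taken from the article; it assumes a random dictionary `D`, a synthetic 3-sparse coefficient vector, and a hypothetical penalty weight `lam`, and solves the Lasso problem with iterative soft thresholding (ISTA), one of several possible solvers.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(D, y, x, lam):
    """0.5 * ||y - D x||_2^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))

def ista_lasso(D, y, lam, n_iter=500):
    """Minimize the Lasso objective by iterative soft thresholding (ISTA)."""
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)          # gradient of the least-squares term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 100))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [1.5, -2.0, 1.0]    # 3-sparse ground truth
y = D @ x_true

x_hat = ista_lasso(D, y, lam=0.05, n_iter=2000)
print("non-zero coefficients:", np.count_nonzero(x_hat))
print("largest entries at indices:", np.argsort(-np.abs(x_hat))[:3])
```

The L1 penalty is what drives most coefficients to exactly zero; a larger `lam` gives a sparser but more strongly shrunk solution.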
Common cancer databases.
| Database name | Database introduction |
|---|---|
| GEO | The GEO database stores the records (series, samples, and platforms) provided by the original submitters, together with curated data sets. Not all submitter-provided records have been assembled into curated data sets. The curated data sets form the basis of GEO’s advanced data display and analysis functions |
| TCGA | The Cancer Genome Atlas (TCGA) is a publicly funded project aimed at cataloging and discovering major oncogenic genome changes in order to create a comprehensive “atlas” of cancer genome maps. So far, TCGA researchers have analyzed large cohorts of more than 30 human tumors through large-scale genome sequencing and integrated multidimensional analysis |
| KEGG | The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for analyzing gene function based on genetic and molecular network systems. KEGG maintains the GENES database and the LIGAND database |
| COSMIC | COSMIC provides comprehensive information about somatic mutations in human cancers; it aims to collect, manage, organize, and present cancer somatic mutations from around the world. Version v48 (July 2010) describes more than 136,000 coding mutations in nearly 542,000 tumor samples. The information is provided free of charge in a variety of useful ways and can be accessed at |
| UCSC Cancer Genomics Browser | UCSC Cancer Genomics Browser is a set of web-based tools designed to integrate, visualize and analyze genomic and clinical data. It consists of three main components: hgHeatmap, hgFeatureSorter and hgPathSorter, which can be browsed at |
| ArrayMapCancer | ArrayMap provides preprocessed tumor genome array data and copy number aberration (CNA) maps. Users can search the database for samples of interest and then analyze the CNAs on a gene or genome segment of interest |
Commonly used gene expression profile databases.
| Database name | Data source | Database introduction |
|---|---|---|
| RNA-Seq Atlas | Network-based RNA-Seq gene expression profile and query tool library | This is the first open-access database that provides data mining tools and large-scale RNA-Seq expression profiling. Its applications are multifaceted: it helps to identify tissue-specific genes and expression profiles, to compare gene expression profiles between different tissues, and to support systems biology methods that link tissue function to changes in gene expression |
| GEO | Established by the National Center for Biotechnology Information (NCBI) | The initial goal was to serve as a public repository for high-throughput gene expression data, mainly generated by microarray technology. The database also includes comparative genome analysis, chromatin immunoprecipitation analysis describing genome-protein interactions, non-coding RNA analysis, SNP genotyping, and genome methylation status analysis |
| ArrayExpress | Alvis Brazma et al. (EBI) | A functional genomics database of the European Bioinformatics Institute (EMBL-EBI), which collects and organizes data from microarray- and sequencing-based genomics experiments to support reproducible research. It is also one of the main repositories for functional genomics experiments based on microarrays and high-throughput sequencing. All data are provided in MAGE-TAB format |
FIGURE 1. Method for mining cancer molecular features.
FIGURE 2. Method for mining cancer molecular features using a low-rank representation matrix.
FIGURE 3. Schematic diagram of the robustness feature acquisition of medical images based on perceptual hashing and a neural network.
FIGURE 4. Research procedure for gene database analysis based on low-rank sparse representation.
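This record does not include the algorithms behind the low-rank procedures named in Figures 2 and 4. As a rough illustration of the low-rank idea only, the sketch below shows singular value thresholding (SVT), the proximal operator of the nuclear norm and a common building block in low-rank solvers. The synthetic matrix, the noise level, and the threshold `tau` are assumptions made for this example and do not reproduce the authors' method.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink every singular value of M by tau.

    This is the proximal operator of tau * ||.||_* (the nuclear norm), the
    convex surrogate for matrix rank used in many low-rank methods.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # small singular values become zero
    return (U * s_shrunk) @ Vt            # scale columns of U, then recombine

# Synthetic "expression matrix": rank-2 structure plus small dense noise
# (sizes and noise level are illustrative assumptions).
rng = np.random.default_rng(0)
L_true = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
X = L_true + 0.01 * rng.standard_normal((40, 30))

X_low = svt(X, tau=1.0)
print("rank before:", np.linalg.matrix_rank(X))
print("rank after: ", np.linalg.matrix_rank(X_low))
```

Shrinking singular values plays the same role for matrices that soft thresholding plays for vectors: it suppresses weak, noise-dominated components and keeps the dominant structure.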