Yanzhen Xu, Xiaohan Zhao, Shuai Liu, Wen Zhang.
Abstract
BACKGROUND: Advances in sequencing technologies have generated large numbers of transcripts, and long non-coding RNA (lncRNA) is an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental methods for identifying lncRNAs are time-consuming and labor-intensive, so efficient computational methods for lncRNA prediction are in demand.
Keywords: Attention mechanism; Feature ensemble learning; lncRNA prediction
Year: 2020 PMID: 33334320 PMCID: PMC7745355 DOI: 10.1186/s12864-020-07237-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Details of six types of features
| Category | Feature type | Feature | Dimensionality | Parameter |
|---|---|---|---|---|
| Transcript-specified | | ORF length | 1 | No parameter |
| | | ORF integrity | 1 | No parameter |
| | | ORF coverage | 1 | No parameter |
| | | Fickett score | 1 | No parameter |
| | | Hexamer score | 1 | No parameter |
| | | pI | 1 | No parameter |
| | | Gravy | 1 | No parameter |
| | | Instability index | 1 | No parameter |
| | | CTD | 30 | No parameter |
| General sequence-derived | Spectrum profile | 1-mer | 4 | No parameter |
| | | 2-mer | 16 | No parameter |
| | | 3-mer | 64 | No parameter |
| | | 4-mer | 256 | No parameter |
| | | 5-mer | 1024 | No parameter |
| | Mismatch profile | (3, m)-mismatch profile | 64 | m: the maximum mismatch |
| | | (4, m)-mismatch profile | 256 | m: the maximum mismatch |
| | | (5, m)-mismatch profile | 1024 | m: the maximum mismatch |
| | Reverse complement k-mer profile | 1-RevcKmer | 2 | No parameter |
| | | 2-RevcKmer | 10 | No parameter |
| | | 3-RevcKmer | 32 | No parameter |
| | | 4-RevcKmer | 136 | No parameter |
| | | 5-RevcKmer | 528 | No parameter |
| | Pseudo nucleotide composition | PC-PseDNC-General | 16 + λ | λ: the highest counted rank |
| | | PC-PseTNC-General | 64 + λ | λ: the highest counted rank |
| | | SC-PseDNC-General | 16 + 6 × λ | λ: the highest counted rank |
| | | SC-PseTNC-General | 64 + 12 × λ | λ: the highest counted rank |
| | | PseDNC | 16 + λ | λ: the highest counted rank |
| | Auto-cross covariance | DACC | 36 × lag | lag: the distance between residues |
| | | TACC | 4 × lag | lag: the distance between residues |
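The dimensionalities listed for the spectrum and reverse complement k-mer profiles follow from counting k-mers over the four-letter nucleotide alphabet. The sketch below is illustrative only and is not the authors' implementation: the function names `spectrum_profile` and `revc_kmers` are hypothetical, and normalizing counts to frequencies is an assumption, since the record does not state how the profiles are scaled. A k-mer spectrum has 4^k entries, while the reverse complement variant keeps one representative per k-mer and reverse-complement pair.

```python
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spectrum_profile(seq, k):
    """Return the k-mer frequency vector of seq (length 4**k).

    Assumption: counts are normalized to frequencies; the source
    record does not say which scaling the paper uses.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows containing N etc. are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def revc_kmers(k):
    """List one representative k-mer per reverse-complement pair."""
    canonical = set()
    for p in product(ALPHABET, repeat=k):
        kmer = "".join(p)
        rc = kmer.translate(COMPLEMENT)[::-1]  # reverse complement
        canonical.add(min(kmer, rc))  # keep the lexicographically smaller one
    return sorted(canonical)
```

Under these assumptions, `len(spectrum_profile(seq, k))` is 4, 16, 64, 256, and 1024 for k = 1 to 5, matching the table. `len(revc_kmers(k))` gives 2, 10, 32, and 136 for k = 1 to 4, also matching the table; for k = 5 this enumeration gives 512, which differs from the 528 listed, and the record does not explain the difference.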
Fig. 1 Training processes of LncPred-IEL and LncPred-ANEL. (a) AUC scores of LncPred-IEL models at each iteration. (b) Loss values of LncPred-ANEL models at each training epoch
Fig. 2 LargeVis visualization of feature vectors. (a) and (b) show the feature vectors before and after the feature ensemble of LncPred-IEL, respectively. (c) and (d) show the feature vectors before and after the feature ensemble of LncPred-ANEL, respectively
Fig. 3 Performance of lncRNA prediction models on balanced and imbalanced Human and Mouse datasets
Summary of the datasets
| Description | Species | # Positive | # Negative |
|---|---|---|---|
| Main datasets | Human | 24,162 | 24,162 |
| | Mouse | 27,595 | 27,595 |
| CPPred datasets | Human | 23,384 | 23,384 |
| | Mouse | 15,345 | 15,345 |
| | Fruit Fly | 2,775 | 17,399 |
| | Zebrafish | 6,840 | 15,534 |
Fig. 4 Cross-species prediction results of LncPred-IEL (a) and LncPred-ANEL (b)
Fig. 5 The workflow of LncPred-IEL (a) and LncPred-ANEL (b)