Yongheng Wang1,2, Jincheng Zhai1, Xianglu Wu3, Enoch Appiah Adu-Gyamfi2, Lingping Yang3, Taihang Liu1,2, Meijiao Wang2, Yubin Ding2,3, Feng Zhu4, Yingxiong Wang2, Jing Tang1,2.
Abstract
Long non-coding RNAs (lncRNAs) play critical roles in various biological processes and are associated with many diseases. Functional annotation of lncRNAs in diseases has attracted great attention because it helps to elucidate disease etiology. However, traditional co-expression-based analysis usually produces a significant number of false-positive function assignments. It is thus crucial to develop a new approach that achieves a lower false discovery rate for the functional annotation of lncRNAs. Here, a novel strategy termed DAnet, which combines disease associations with the cis-regulatory network between lncRNAs and neighboring protein-coding genes, was developed, and its performance was systematically compared with that of the traditional differential expression-based approach. In a gold-standard analysis of experimentally validated lncRNAs, the proposed strategy identified the experimentally validated lncRNAs better than the differential expression-based approach did. Moreover, the majority of biological pathways (40%∼100%) identified by DAnet were reported to be associated with the studied diseases. In sum, DAnet is expected to be useful for identifying the function of specific lncRNAs in a particular disease or in multiple diseases.
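DAnet's cis-regulatory network links each lncRNA to neighboring protein-coding genes. The sketch below illustrates only that neighbor search and is not the authors' implementation. It assumes each gene is reduced to a chromosome plus start and end coordinates, and that two genes count as neighbors when the gap between their gene bodies is at most a chosen window; `Gene`, `gap`, and `cis_pairs` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record: name, chromosome, start/end coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def gap(a: Gene, b: Gene) -> int:
    """Distance from the end of the upstream gene to the start of the downstream one.

    Overlapping genes get a gap of 0. Both genes are assumed to be on the same chromosome.
    """
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def cis_pairs(lncrnas, mrnas, window):
    """Return (lncRNA, mRNA) name pairs on the same chromosome within `window` bp."""
    pairs = []
    for lnc in lncrnas:
        for m in mrnas:
            if lnc.chrom == m.chrom and gap(lnc, m) <= window:
                pairs.append((lnc.name, m.name))
    return pairs


if __name__ == "__main__":
    lncs = [Gene("LNC1", "chr1", 10_000, 12_000)]
    mrnas = [
        Gene("GENE_A", "chr1", 15_000, 20_000),    # 3 kb away
        Gene("GENE_B", "chr1", 100_000, 110_000),  # 88 kb away
        Gene("GENE_C", "chr2", 10_000, 12_000),    # different chromosome
    ]
    print(cis_pairs(lncs, mrnas, window=5_000))    # [('LNC1', 'GENE_A')]
    print(cis_pairs(lncs, mrnas, window=100_000))  # [('LNC1', 'GENE_A'), ('LNC1', 'GENE_B')]
```

The nested loop is quadratic in the number of genes; for genome-scale annotations, a per-chromosome sorted sweep or an interval tree would be the usual choice.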
Keywords: Coefficient of variation; Disease-associated SNPs; Functional prediction; Long non‐coding RNA; WGCNA
Year: 2021 PMID: 35035785 PMCID: PMC8724965 DOI: 10.1016/j.csbj.2021.12.016
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Table 1. Twenty-four datasets covering eight disease types were collected for the functional analysis of lncRNAs. The first 22 datasets were obtained from GEO and the last two from TCGA. MDD: major depressive disorder; VHD: valvular heart disease; AF-VHD: valvular heart disease with atrial fibrillation; SLE: systemic lupus erythematosus; ALL: acute lymphoblastic leukemia; TPM: Transcripts Per Million; Normalized: DESeq-normalized; nRPKM: normalized Reads Per Kilobase of transcript per Million mapped reads; FPKM: Fragments Per Kilobase of exon per Million mapped fragments; RPKM: Reads Per Kilobase of transcript per Million mapped reads; Normalized signal intensity: quantile normalization using the GeneSpring software.
| ICD-11 code | Dataset | Samples | Data type | Genes |
|---|---|---|---|---|
| 8A20 | GSE113524 | 19 Alzheimer disease; 20 Healthy controls | TPM (RNA-Seq) | 12,937 lncRNAs & 18,969 mRNAs |
| 8A20 | GSE104704 | 12 Alzheimer disease; 10 Healthy controls | Normalized (RNA-Seq) | 2,199 lncRNAs & 17,965 mRNAs |
| 8A20 | GSE125583 | 219 Alzheimer disease; 70 Healthy controls | nRPKM (RNA-Seq) | 2,803 lncRNAs & 18,852 mRNAs |
| 6A70 | GSE101521 | 30 MDD; 29 Healthy controls | Normalized (RNA-Seq) | 11,109 lncRNAs & 18,754 mRNAs |
| 6A70 | GSE102556 | 26 MDD; 22 Healthy controls | FPKM (RNA-Seq) | 12,718 lncRNAs & 18,793 mRNAs |
| 6A20 | GSE112523 | 29 Schizophrenia; 28 Healthy controls | Read counts (RNA-Seq) | 12,179 lncRNAs & 18,437 mRNAs |
| BA41 | GSE65705 | 32 Myocardial infarction; 2 Healthy controls | RPKM (RNA-Seq) | 1,351 lncRNAs & 17,801 mRNAs |
| BA41 | GSE127853 | 3 Myocardial infarction; 3 Healthy controls | FPKM (RNA-Seq) | 503 lncRNAs & 10,216 mRNAs |
| BD40 | GSE97210 | 3 Atherosclerosis; 3 Healthy controls | Normalized signal intensity (Microarray) | 10,347 lncRNAs & 18,604 mRNAs |
| BD40 | GSE120521 | 4 Unstable atherosclerosis; 4 Stable atherosclerosis | FPKM (RNA-Seq) | 10,343 lncRNAs & 18,381 mRNAs |
| BC81 | GSE113013 | 5 AF-VHD; 5 VHD | Normalized signal intensity (Microarray) | 10,347 lncRNAs & 18,604 mRNAs |
| BC81 | GSE108660 | 5 Atrial fibrillation; 5 Non-atrial fibrillation | Normalized signal intensity (Microarray) | 8,090 lncRNAs & 18,807 mRNAs |
| CA23 | GSE106388 | 15 Mild asthma; 4 Healthy controls | Read counts (RNA-Seq) | 8,036 lncRNAs & 17,244 mRNAs |
| CA23 | GSE96783 | 21 Asthma; 30 Healthy controls | Read counts (RNA-Seq) | 10,451 lncRNAs & 18,324 mRNAs |
| DD71 | GSE128682 | 14 Ulcerative colitis; 16 Healthy controls | Read counts (RNA-Seq) | 1,756 lncRNAs & 17,355 mRNAs |
| 4A40 | GSE131525 | 3 SLE; 3 Healthy controls | Read counts (RNA-Seq) | 6,031 lncRNAs & 16,972 mRNAs |
| 5A10 | GSE131526 | 12 Type-1 diabetes; 3 Healthy controls | Read counts (RNA-Seq) | 6,798 lncRNAs & 16,458 mRNAs |
| 5B81 | GSE129398 | 12 Obesity; 10 Controls | Read counts (RNA-Seq) | 822 lncRNAs & 14,300 mRNAs |
| 5B81 | GSE145412 | 8 Obesity; 8 Controls | TPM (RNA-Seq) | 6,896 lncRNAs & 16,595 mRNAs |
| 5A11 | GSE133099 | 6 Type-2 diabetes; 6 Lean controls | Read counts (RNA-Seq) | 8,843 lncRNAs & 17,480 mRNAs |
| 2B33 | GSE141140 | 13 ALL; 4 Healthy controls | Read counts (RNA-Seq) | 867 lncRNAs & 16,297 mRNAs |
| 2B91 | GSE144259 | 6 Colorectal cancer; 3 Healthy controls | FPKM (RNA-Seq) | 3,249 lncRNAs & 18,604 mRNAs |
| 2C6Z | TCGA_BC | 115 Breast cancer; 113 Healthy controls | FPKM (RNA-Seq) | 14,097 lncRNAs & 19,631 mRNAs |
| 2D10 | TCGA_TC | 510 Thyroid cancer; 58 Healthy controls | Read counts (RNA-Seq) | 13,618 lncRNAs & 19,493 mRNAs |
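The datasets above come in several expression units, so a short sketch of how the length-normalized units relate to raw read counts may help when comparing them. It uses the standard RPKM and TPM definitions; the function names are illustrative, and by default the library size is taken as the sum of the supplied counts, whereas real pipelines normally use all mapped reads. DESeq normalization and quantile normalization of microarray intensities are not covered here.

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """RPKM (FPKM for paired-end fragments): count * 1e9 / (length_bp * total_mapped).

    If total_mapped is not given, the sum of the supplied counts is used as the
    library size (an assumption made for this toy example).
    """
    total = sum(counts) if total_mapped is None else total_mapped
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM: divide counts by gene length in kb, then rescale so the sample sums to 1e6."""
    per_kb = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    scale = sum(per_kb)
    return [r / scale * 1e6 for r in per_kb]


if __name__ == "__main__":
    counts = [100, 200, 700]
    lengths = [1_000, 2_000, 1_000]
    print(rpkm(counts, lengths))  # [100000.0, 100000.0, 700000.0]
    print([round(x, 1) for x in tpm(counts, lengths)])
```

The practical difference is that TPM values always sum to one million within a sample, while RPKM/FPKM totals vary between samples, which is one reason values in different units are not directly comparable across datasets.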
Optimization of KCV and CD across datasets. For each dataset, the lowest KCV or CD value at which Nexp reached its maximum was selected as the optimal value. Nsnp: the number of lncRNAs identified by disease-associated SNPs; Nexp: the number of experimentally verified lncRNAs identified; KCV: the number of top-ranked lncRNAs with the highest variability; NA: not available.
| Disease | Dataset | Total lncRNAs | Nsnp | Nexp | KCV | CD |
|---|---|---|---|---|---|---|
| Alzheimer disease | GSE113524 | 12,937 | 1,680 | 5 | 400 | 400 kb |
| Alzheimer disease | GSE104704 | 2,199 | 407 | 5 | 200 | 5 kb |
| Alzheimer disease | GSE125583 | 2,803 | 537 | 5 | 400 | 50 kb |
| Major depressive disorder | GSE101521 | 11,109 | 1,043 | 2 | 600 | 5 kb |
| Major depressive disorder | GSE102556 | 12,718 | 1,098 | 2 | 1,000 | 5 kb |
| Schizophrenia | GSE112523 | 12,179 | 917 | 3 | 300 | 5 kb |
| Myocardial infarction | GSE65705 | 1,351 | 35 | 2 | 35 | 100 kb |
| Myocardial infarction | GSE127853 | 503 | 16 | 2 | 16 | NA |
| Atherosclerosis | GSE97210 | 10,347 | 163 | 1 | 100 | NA |
| Atherosclerosis | GSE120521 | 10,343 | 120 | 1 | 100 | 5 kb |
| Atrial fibrillation | GSE113013 | 10,347 | 38 | 1 | 38 | NA |
| Atrial fibrillation | GSE108660 | 8,090 | 33 | 1 | 33 | NA |
| Asthma | GSE106388 | 8,036 | 291 | 2 | 200 | 5 kb |
| Asthma | GSE96783 | 10,451 | 352 | 2 | 100 | 5 kb |
| Lupus erythematosus | GSE131525 | 6,031 | 64 | 1 | 64 | 5 kb |
| Ulcerative colitis | GSE128682 | 1,756 | 20 | 1 | 20 | 70 kb |
| Type-1 diabetes mellitus | GSE131526 | 6,798 | 283 | 3 | 200 | 5 kb |
| Obesity | GSE129398 | 822 | 46 | 1 | 46 | 5 kb |
| Obesity | GSE145412 | 6,896 | 197 | 1 | 100 | 5 kb |
| Type-2 diabetes mellitus | GSE133099 | 8,843 | 1,075 | 5 | 600 | 5 kb |
| Acute lymphoblastic leukemia | GSE141140 | 867 | 12 | 1 | 12 | NA |
| Colorectal cancer | GSE144259 | 3,249 | 43 | 6 | 43 | 300 kb |
| Breast cancer | TCGA_BC | 14,097 | 528 | 12 | 500 | 5 kb |
| Thyroid cancer | TCGA_TC | 13,618 | 8 | 1 | 8 | NA |
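The selection rule described for KCV and CD (among the settings that recover the most experimentally verified lncRNAs, keep the lowest value) fits in a few lines. This is a sketch, not the authors' code: `select_optimal` is a hypothetical name, breaking ties toward the smaller value is our reading of "the lower KCV/CD", and the scan values in the example are made up rather than taken from the table.

```python
def select_optimal(nexp_by_value):
    """Return the smallest candidate value whose Nexp equals the maximum Nexp.

    nexp_by_value maps a candidate KCV (or CD) value to the number of
    experimentally verified lncRNAs recovered with that setting.
    """
    best = max(nexp_by_value.values())
    return min(value for value, nexp in nexp_by_value.items() if nexp == best)


if __name__ == "__main__":
    # Hypothetical scan results, not values from the table.
    scan = {100: 3, 200: 5, 300: 5, 400: 4}
    print(select_optimal(scan))  # 200
```

Preferring the smallest value among ties keeps the candidate list as short as possible without losing any verified lncRNA, which matches the goal of lowering false-positive assignments.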
Fig. 1. Performance comparison between DAnet and DEA (differential expression analysis) across the 24 benchmark datasets (shown in Table 1), measured by the percentage of successful predictions (Rate, %), i.e., the percentage of experimentally verified disease-associated lncRNAs that each method identified.
Fig. 2. Performance comparison between DAnet and DEA across the 24 benchmark datasets (shown in Table 1), measured by the enrichment factor (EF). EF is the ratio of the proportion of experimentally verified lncRNAs among the lncRNAs identified by DAnet or DEA to their proportion among all expressed lncRNAs.
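Both evaluation metrics can be computed from three sets: the lncRNAs a method reports, the experimentally verified lncRNAs, and all expressed lncRNAs in a dataset. The sketch below follows the caption wording; the function names are hypothetical, and counting only verified lncRNAs that are actually expressed toward the EF background is an assumption of this illustration.

```python
def rate(predicted, verified):
    """Rate (%): share of experimentally verified lncRNAs that a method recovered."""
    verified = set(verified)
    return 100.0 * len(verified & set(predicted)) / len(verified)


def enrichment_factor(predicted, verified, universe):
    """EF = (hits / n_predicted) / (n_verified / n_universe).

    hits is the number of predicted lncRNAs that are experimentally verified;
    only verified lncRNAs present in the expressed universe count toward the background.
    """
    predicted, universe = set(predicted), set(universe)
    verified = set(verified) & universe
    hits = len(predicted & verified)
    return (hits / len(predicted)) / (len(verified) / len(universe))


if __name__ == "__main__":
    universe = range(1000)                               # 1,000 expressed lncRNAs (toy IDs)
    verified = range(10)                                 # 10 of them experimentally verified
    predicted = list(range(5)) + list(range(100, 145))  # 50 predictions, 5 verified hits
    print(rate(predicted, verified))                                    # 50.0
    print(round(enrichment_factor(predicted, verified, universe), 6))  # 10.0
```

An EF of 10 in this toy case means verified lncRNAs are ten times more concentrated in the method's output than in the full expression profile; an EF of 1 would indicate no enrichment over random selection.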
Fig. 3. Optimization of KCV across the benchmark datasets. X axis: KCV, the number of top-ranked lncRNAs with the highest variability; Y axis: the number of experimentally verified lncRNAs identified (Nexp). When the number of lncRNAs identified by SNPs (Nsnp) was less than 100, KCV was set equal to Nsnp; otherwise, KCV was varied from 100 to Nsnp in steps of 100.
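The KCV scan described in the Fig. 3 caption can be sketched as a candidate grid plus a variability ranking. Two points are assumptions rather than statements from the source: that variability is measured by the coefficient of variation (suggested by the keyword list and the "CV" in KCV), and that Nsnp itself is not added to the grid when it is not a multiple of 100. `k_grid` and `top_k_by_cv` are hypothetical names.

```python
import statistics


def k_grid(n_snp, step=100):
    """Candidate KCV values: [n_snp] if n_snp < step, else step, 2*step, ... up to n_snp."""
    if n_snp < step:
        return [n_snp]
    return list(range(step, n_snp + 1, step))


def top_k_by_cv(expr, k):
    """Rank lncRNAs by coefficient of variation (sample sd / mean) and keep the top k.

    expr maps an lncRNA name to its (non-negative) expression values across at
    least two samples. lncRNAs with a zero mean are skipped because CV is undefined.
    """
    cv = {}
    for name, values in expr.items():
        mean = statistics.mean(values)
        if mean != 0:
            cv[name] = statistics.stdev(values) / mean
    return sorted(cv, key=cv.get, reverse=True)[:k]


if __name__ == "__main__":
    print(k_grid(35))          # [35]
    print(len(k_grid(1680)))   # 16 candidates: 100, 200, ..., 1600
    expr = {"a": [1, 1, 1], "b": [1, 3, 5], "c": [2, 4, 6]}
    print(top_k_by_cv(expr, 2))  # ['b', 'c']
```

Each grid value would then be scored by the number of experimentally verified lncRNAs recovered (Nexp), giving one point per K on the curves in Fig. 3.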
Fig. 4. The function of lncRNAs in disease characterized by DAnet. A-D: co-expression network of the module containing the most genes with significant correlations, constructed by WGCNA for each dataset (A: GSE113524; B: GSE65705; C: GSE131525; D: GSE131526). Green squares: lncRNAs; blue dots: mRNAs. E: chord diagram of the pathways enriched (p < 0.05) in 15 benchmark datasets. F: statistics of the disease-associated pathways.
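WGCNA builds its co-expression networks from a soft-thresholded correlation matrix. The sketch below covers only that first step, the unsigned adjacency a_ij = |cor(x_i, x_j)|^β and the resulting connectivity, and leaves out the topological overlap and module detection that WGCNA (an R package) performs afterwards. β = 6 and the helper names are illustrative assumptions, not settings reported for DAnet.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned soft-thresholded adjacency a_ij = |cor(x_i, x_j)| ** beta.

    The diagonal is set to 0 here so that row sums give connectivity directly.
    """
    names = list(expr)
    return {
        i: {j: 0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta for j in names}
        for i in names
    }


def connectivity(adj):
    """Connectivity k_i = sum over j of a_ij; highly connected genes are hub candidates."""
    return {i: sum(row.values()) for i, row in adj.items()}


if __name__ == "__main__":
    expr = {
        "LNC1": [1.0, 2.0, 3.0, 4.0],
        "GENE_A": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with LNC1
        "GENE_B": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with LNC1
    }
    print(connectivity(soft_threshold_adjacency(expr)))
```

Raising correlations to a power shrinks weak links much faster than strong ones (0.8 becomes about 0.26, while 0.3 becomes below 0.001), which is why soft thresholding emphasizes the tightly co-expressed lncRNA-mRNA groups shown in panels A-D.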
Fig. 5. Associations between lncRNAs identified by DAnet and specific diseases. Blue lines indicate reported associations between lncRNAs and diseases. Squares represent disease types, and dots represent lncRNAs identified by DAnet. Orange squares: diseases of the nervous system; brown squares: mental, behavioural or neurodevelopmental disorders; blue squares: diseases of the circulatory system; pink squares: diseases of the respiratory system; purple squares: diseases of the immune system; turquoise squares: diseases of the digestive system; yellow squares: endocrine, nutritional or metabolic diseases; green squares: neoplasms. Grey dots: lncRNAs not reported in the studied disease; green dots: lncRNAs associated with a single disease; red dots: lncRNAs associated with multiple diseases.