MLSeq: Machine learning interface for RNA-sequencing data.

Dincer Goksuluk¹, Gokmen Zararsiz², Selcuk Korkmaz³, Vahap Eldem⁴, Gozde Erturk Zararsiz⁵, Erdener Ozcetin⁶, Ahmet Ozturk⁷, Ahmet Ergun Karaagaoglu⁸.

Abstract

BACKGROUND AND OBJECTIVE: In the last decade, RNA-sequencing technology has become the method of choice and is preferred to microarray technology for gene-expression-based classification and differential expression analysis, since it produces less noisy data. Although many algorithms have been proposed for microarray data, the number of algorithms and programs available for classification of RNA-sequencing data is limited. For this reason, we developed MLSeq to bring together both frequently used classification algorithms and novel approaches and make them available for classification of RNA-sequencing data. The package is developed in the R language environment and distributed through the Bioconductor network.
METHODS: Classification of RNA-sequencing data is not straightforward, since raw data must be preprocessed before downstream analysis. With the MLSeq package, researchers can easily preprocess (normalization, filtering, transformation, etc.) and classify raw RNA-sequencing data using two strategies: (i) applying algorithms proposed directly for the RNA-sequencing data structure, or (ii) transforming RNA-sequencing data to bring it distributionally closer to the microarray data structure and then applying algorithms developed for microarray data. Moreover, we proposed novel algorithms through MLSeq, such as voom (an acronym for variance modelling at the observational level) based nearest shrunken centroids (voomNSC) and voom-based diagonal linear discriminant analysis (voomDLDA).
MATERIALS: Three real RNA-sequencing datasets (i.e., cervical cancer, lung cancer and aging datasets) were used to evaluate model performance. Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA) were selected as algorithms based on discrete distributions, and voomNSC, nearest shrunken centroids (NSC) and support vector machines (SVM) were selected as algorithms based on continuous distributions for model comparisons. The algorithms were compared using classification accuracy and sparsity on an independent test set.
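Strategy (ii) in METHODS, transforming counts toward a microarray-like scale, can be illustrated with the log-counts-per-million step that voom applies before modelling the mean-variance trend. The sketch below is in Python with NumPy rather than R, and the function name `log_cpm` is hypothetical; it is not MLSeq code and it omits voom's precision weights.

```python
import numpy as np

def log_cpm(counts, prior=0.5):
    """Log2 counts per million for a genes x samples matrix of raw counts.

    Uses the offsets from the voom transformation:
    log2((count + 0.5) / (library_size + 1) * 1e6).
    This is only the transformation step; voom's precision weights
    are not estimated here.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total reads per sample (column sums)
    return np.log2((counts + prior) / (lib_sizes + 1.0) * 1e6)

# Toy data (illustrative only): 3 genes, 2 samples
counts = np.array([[10, 20],
                   [0, 5],
                   [990, 1975]])
transformed = log_cpm(counts)
print(transformed.shape)
```

The small offsets keep zero counts finite after the logarithm, which is what makes the transformed values usable by classifiers designed for continuous data.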
RESULTS: The algorithms based on discrete distributions performed better in the cervical cancer and aging data, with accuracies above 0.92. In the lung cancer data, most algorithms performed similarly, with accuracies of 0.88, except that SVM achieved an accuracy of 0.94. Our voomNSC algorithm was the sparsest algorithm, selecting 2.2% and 6.6% of all features for the cervical cancer and lung cancer datasets, respectively. However, in the aging data, sparse classifiers were not able to select an optimal subset of all features.
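The two comparison criteria reported above can be computed as in the following sketch, which assumes that sparsity is the fraction of all features a classifier retains (lower means sparser), consistent with the percentages quoted in RESULTS. The helper names `accuracy` and `sparsity` and the toy inputs are hypothetical and are not taken from the study.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def sparsity(selected_mask):
    """Fraction of features kept by a classifier (assumed definition)."""
    return float(np.mean(np.asarray(selected_mask, dtype=bool)))

# Toy example (made-up labels, not data from the paper)
y_true = ["T", "T", "N", "N", "T"]
y_pred = ["T", "N", "N", "N", "T"]
acc = accuracy(y_true, y_pred)  # 4 of 5 predictions are correct

# A classifier that keeps 22 of 1000 features
mask = np.zeros(1000, dtype=bool)
mask[:22] = True
frac = sparsity(mask)
print(acc, frac)
```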
CONCLUSION: MLSeq is a comprehensive and easy-to-use interface for classification of gene expression data. It allows researchers to perform both preprocessing and classification tasks through a single platform. With this property, MLSeq can be considered a pipeline for the classification of RNA-sequencing data.

Entities: Disease

Keywords: Classification; Linear discriminant analysis; Negative Binomial; Poisson; RNA-Sequencing

Substances: RNA

Year: 2019 PMID: 31104710 DOI: 10.1016/j.cmpb.2019.04.007

Source DB: PubMed Journal: Comput Methods Programs Biomed ISSN: 0169-2607 Impact factor: 5.428

Cited

8 in total

1. Bioinformatics and machine learning approach identifies potential drug targets and pathways in COVID-19.

Authors: Md Rabiul Auwul; Md Rezanur Rahman; Esra Gov; Md Shahjaman; Mohammad Ali Moni
Journal: Brief Bioinform Date: 2021-04-12 Impact factor: 11.622

2. Automated Classification of Osteosarcoma and Benign Tumors using RNA-seq and Plain X-ray.

Authors: Olivia Alge; Lu Lu; Zhi Li; Yingqi Hua; Jonathan Gryak; Kayvan Najarian
Journal: Annu Int Conf IEEE Eng Med Biol Soc Date: 2020-07

3. Diagnosis of Cervical Cancer based on Ensemble Deep Learning Network using Colposcopy Images.

Authors: Venkatesan Chandran; M G Sumithra; Alagar Karthick; Tony George; M Deivakani; Balan Elakkiya; Umashankar Subramaniam; S Manoharan
Journal: Biomed Res Int Date: 2021-05-04 Impact factor: 3.411

4. Genes and regulatory mechanisms associated with experimentally-induced bovine respiratory disease identified using supervised machine learning methodology.

Authors: Matthew A Scott; Amelia R Woolums; Cyprianna E Swiderski; Andy D Perkins; Bindu Nanduri
Journal: Sci Rep Date: 2021-11-25 Impact factor: 4.379

5. Blood and brain gene expression signatures of chronic intermittent ethanol consumption in mice.

Authors: Laura B Ferguson; Amanda J Roberts; R Dayne Mayfield; Robert O Messing
Journal: PLoS Comput Biol Date: 2022-02-17 Impact factor: 4.779

6. Primary cicatricial alopecias are characterized by dysregulation of shared gene expression pathways.

Authors: Eddy H C Wang; Isha Monga; Brigitte N Sallee; James C Chen; Alexa R Abdelaziz; Rolando Perez-Lorenzo; Lindsey A Bordone; Angela M Christiano
Journal: PNAS Nexus Date: 2022-07-11

7. Dual-Organ Transcriptomic Analysis of Rainbow Trout Infected With Ichthyophthirius multifiliis Through Co-Expression and Machine Learning.

Authors: HyeongJin Roh; Nameun Kim; Yoonhang Lee; Jiyeon Park; Bo Seong Kim; Mu Kun Lee; Chan-Il Park; Do-Hyung Kim
Journal: Front Immunol Date: 2021-07-08 Impact factor: 7.561

8. Applications and Trends of Machine Learning in Genomics and Phenomics for Next-Generation Breeding. (Review)

Authors: Salvatore Esposito; Domenico Carputo; Teodoro Cardi; Pasquale Tripodi
Journal: Plants (Basel) Date: 2019-12-25
