Multiple Criteria Optimization (MCO): A gene selection deterministic tool in RStudio.

Isis Narváez-Bandera¹, Deiver Suárez-Gómez¹, Clara E Isaza^1,2,3,4, Mauricio Cabrera-Ríos^1,5.

Abstract

Identifying genes with the largest expression changes (gene selection) to characterize a given condition is a popular first step to drive exploration into molecular mechanisms and is, therefore, paramount for therapeutic development. Reproducibility in the sciences makes it necessary to emphasize objectivity and systematic repeatability in biological and informatics analyses, including gene selection. With these two characteristics in mind, our research team has previously proposed using multiple criteria optimization (MCO) in gene selection to analyze microarray datasets. The result of this effort is the MCO algorithm, which selects genes with the largest expression changes without user manipulation of either informatics or statistical parameters. Furthermore, the user is not required to choose a priori either a preference structure among multiple measures or a predetermined quantity of genes to be deemed significant. This implies that, using the same datasets and performance measures (PMs), the method will converge to the same set of selected differentially expressed genes (repeatability) regardless of who carries out the analysis (objectivity). The present work describes the development of an open-source tool in RStudio to enable both (1) individual analysis of single datasets with two or three PMs and (2) meta-analysis with up to five microarray datasets, using one PM from each dataset. The capabilities afforded by the code include license-free portability and the possibility of carrying out analyses on modest computer hardware, such as personal laptops. The code provides affordable, repeatable, and objective detection of differentially expressed genes from microarrays. It can be used to analyze other experiments with similar comparative layouts, such as microRNA arrays and protein arrays, among others.
As a demonstration of the capabilities of the code, the analysis of four publicly available microarray datasets related to Parkinson’s Disease (PD) is presented here, treating each dataset individually or as a four-way meta-analysis. These MCO-supported analyses made it possible to identify MMP9 and TUBB2A as potential PD genetic biomarkers based on their persistent appearance across the case studies. A literature search confirmed the importance of these genes in PD and indeed as PD biomarkers, which evidences the code’s potential.

Year: 2022 PMID: 35085348 PMCID: PMC8794188 DOI: 10.1371/journal.pone.0262890

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The analysis of gene expression leads to insight into biological processes and identification of biomarkers, as well as characterization of differing responses to therapy by individuals. Some of the first high-throughput experiments used to analyze gene expression were microarray experiments. These were, and still are, typically used in comparative experiments between case and control groups to identify differentially expressed genes. The data generated by such experiments populate large public repositories such as Gene Expression Omnibus. Ideally, it would be possible to pool several of these experiments to carry out a meta-analysis and arrive at statistically more robust conclusions about potential biomarkers. However, the fact that many of these experiments are measured in different units and scales has often made simultaneous analysis difficult. Nevertheless, several techniques are used to discover biomarkers using microarrays [1, 2]; these range from traditional statistical models to more computationally complex machine learning methods. For instance, in [3-6], methods based on genetic algorithms, Spearman correlation, Relief-F, and joint mutual information, among others, are used to analyze microarray data for that purpose. However, these methods require the configuration of a varying number of parameters that significantly affect their results. This hampers both analysis objectivity and repeatability. To address this, our research group proposed in Camacho et al. [7] the MCO approach as it appears in this manuscript, which can analyze microarrays and other -omics datasets relying on Pareto-optimality conditions. The MCO-based analysis is carried out without assuming underlying statistical distributions a priori, selecting a threshold value, or requiring the user to adjust parameters. Moreover, MCO presents a ranking based on the simultaneous consideration of the performance measures included in the analysis.
Our subsequent works, presented in Cruz-Rivera et al. [8], Lorenzo et al. [9], and Isaza et al. [10], successfully developed and applied MCO for gene selection in Alzheimer’s Disease, cervical cancer, and lung cancer, respectively. In the present work, MCO is fully automated using R. The resulting code maintains the nonparametric qualities of MCO and minimizes possible errors due to manual handling of data. The lead time to carry out an analysis is also significantly decreased, making MCO a convenient and powerful tool to support the search for potential biomarkers. The MCO R tool can be accessed at https://server-deiver.shinyapps.io/MCO_TURBO/.

Design and implementation

MCO algorithm

As discussed in this work, the MCO algorithm requires at least one replicated, high-throughput, comparative experiment of treatment vs. control. Several microarray datasets follow this organization in the characterization of relative gene expression. The comparative layout allows obtaining PMs for every gene in the experiment to measure differences in relative expression. For example, one PM can be the absolute value of the difference of means between the two groups (treatment, control), another can be the absolute value of the difference of medians between them, and so on. Selecting genes using one PM at a time is unlikely to yield exactly the same selection for every PM, which shows that conflicts exist between different PMs. In MCO, each gene is represented through several PMs, very much like coordinates in a coordinate system, and the final selection is made up of the genes showing the best possible compromises between all PMs. In mathematical terms, this refers to identifying the Pareto-efficient solutions of the associated multiple criteria optimization problem. The Pareto-efficient solutions, in turn, form the Pareto-efficient frontier of the problem, from here on also referred to simply as the frontier. Pareto-optimality is a sufficient condition: for the genes that meet the Pareto-efficient conditions (i.e., those lying on the frontier), no other gene can be found in the experiment offering a better compromise between the PMs at hand. This confers certainty to the results.
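The Pareto-efficiency condition described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R code; the gene labels and PM values are hypothetical, and the PMs are assumed to be already expressed so that smaller values are better.

```python
def dominates(a, b):
    """Return True if point a dominates point b when every PM is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_frontier(points):
    """Return the labels whose PM vectors are not dominated by any other label."""
    return sorted(
        gene for gene, pm in points.items()
        if not any(dominates(other, pm)
                   for name, other in points.items() if name != gene)
    )


# Hypothetical genes, each described by two PMs (smaller is better).
toy_points = {
    "geneA": (0.0, 2.0),
    "geneB": (1.0, 1.0),
    "geneC": (2.0, 0.0),
    "geneD": (2.0, 2.0),  # worse than geneB on both PMs, so it is dominated
}
toy_frontier = pareto_frontier(toy_points)
print(toy_frontier)
```

In this toy case geneA, geneB, and geneC each offer a different compromise between the two PMs, so none can be discarded in favor of another, while geneD is excluded because geneB is better on both measures.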

MCO tool

The present work describes the development of open-source code in RStudio that implements the MCO algorithm. The tool identifies, in a single run, the first F frontiers in an analysis. The user can specify F to create a hierarchy of genes organized in successive frontiers. The MCO tool was designed to support (1) individual analysis of microarray datasets using two or three PMs, and (2) meta-analysis using two to five different datasets with one PM from each dataset (see Fig 1). It should be noted that the MCO algorithm could handle more datasets in a meta-analysis; however, to keep the computational cost low, the MCO tool is limited to a maximum of five datasets. The application of MCO results in sets of genes organized in F frontiers with decreasing levels of significance. In both options, the MCO algorithm follows three steps (Algorithm 1).

In the first step, the PMs are selected from a predefined list that includes the median, the mean, the mode, the third quartile, or a quantile of interest to the analyst. When the analysis is performed for an individual dataset, the user can choose two or three PMs. In the case of a meta-analysis, the default PM is the median from each dataset. The difference of the selected PMs between cases and controls is calculated for each gene. The absolute value of each difference is then taken and subjected to a linear transformation that sets up a minimization MCO problem.

In the second step, the MCO tool allows the user to divide the dataset into S groups (parallelism) to address RAM constraints when using hardware with modest capabilities. The MCO tool proceeds to find the local frontier of each group. The genes in the S local frontiers are then analyzed together to find the global Pareto-efficient frontier. This second step returns the genes with the best possible balances among the PMs to be minimized; these are the genes identified as potential biomarkers.

The third step entails applying the MCO algorithm F times, each time removing the previous frontier. This approach returns genes organized hierarchically in F frontiers.
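The three steps can be sketched end to end as follows. This is an illustrative Python sketch, not the authors' R implementation. The expression values and gene labels are hypothetical, and the linear transformation is an assumption: the text does not specify it, so each PM is simply subtracted from its column maximum, which places the largest expression changes nearest the origin.

```python
from statistics import mean, median


def performance_measures(case, control):
    """Step 1: absolute case-vs-control differences of the mean and the median."""
    return (abs(mean(case) - mean(control)),
            abs(median(case) - median(control)))


def to_minimization(pms):
    """Assumed linear transformation: column maximum minus the PM value."""
    maxima = [max(column) for column in zip(*pms.values())]
    return {g: tuple(m - v for m, v in zip(maxima, vals))
            for g, vals in pms.items()}


def dominates(a, b):
    """True if a is no worse than b on every PM and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def frontier(points):
    """Non-dominated genes among the given points (minimization)."""
    return {g for g, p in points.items()
            if not any(dominates(q, p) for h, q in points.items() if h != g)}


def frontier_in_groups(points, n_groups):
    """Step 2: local frontiers of S groups, then the frontier of their union."""
    genes = sorted(points)
    local = set()
    for s in range(n_groups):
        local |= frontier({g: points[g] for g in genes[s::n_groups]})
    return frontier({g: points[g] for g in local})


def successive_frontiers(points, n_frontiers, n_groups=1):
    """Step 3: extract F frontiers, removing each one before the next pass."""
    remaining = dict(points)
    ranked = []
    for _ in range(n_frontiers):
        if not remaining:
            break
        front = frontier_in_groups(remaining, n_groups)
        ranked.append(sorted(front))
        for g in front:
            del remaining[g]
    return ranked


# Hypothetical expression values: gene -> (case samples, control samples).
expression = {
    "g1": ([5, 6, 7], [1, 2, 3]),
    "g2": ([2, 2, 8], [1, 1, 1]),
    "g3": ([1, 5, 5], [1, 1, 4]),
    "g4": ([1, 1, 1], [1, 1, 2]),
}
pms = {g: performance_measures(case, ctrl) for g, (case, ctrl) in expression.items()}
points = to_minimization(pms)
ranked = successive_frontiers(points, n_frontiers=3)
ranked_split = successive_frontiers(points, n_frontiers=3, n_groups=2)
print(ranked)
```

Splitting into groups does not change the result, because every globally non-dominated gene is also non-dominated within its own group; the split only limits how many genes are compared at once.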

Fig 1

Overview of the MCO tool.

Gene expression datasets can be analyzed individually or by combining several datasets. The PMs can be generated for individual analysis or meta-analysis of up to five datasets simultaneously. The MCO results can then be visualized by the function plotMCO.

Algorithm 1: Pseudocode of the MCO tool, the procedure to carry out the selection of the first F Pareto-efficient frontiers.

D represents the K datasets to be analyzed; in this implementation, K falls between 1 and 5. PMc is the number of performance measures used to quantify relative expression changes between treatment and control. F is the number of successive frontiers to be presented in the analysis. The output is presented as a list of genes, hierarchically organized by frontier number.

Input:
  D ← Number of datasets | D ← (X, Y) | k ∈ 1, 2, …, K; X ∈ samples, m = 1, 2, …, M; Y ∈ gene sets, n = 1, 2, …, N;
  PM ← Number of PMs | c = 1, 2, …, C (c = PMs in conflict);
  S ← Number of groups in which the genes are split | s = 1, 2, …, S;
  F ← Number of frontiers | f = 1, 2, …, F;
for D ← 1 to K do
  for PM ← 1 to c do
    for s ← 1 to S do
      for f ← 1 to F do
        MCO function (PM)
      end
    end
  end
end
Output: Gene sets for each frontier F

Results

MCO tool—Application

To demonstrate how the MCO tool works, four microarray datasets relating to Parkinson’s Disease (PD) were selected from the Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo). These are GSE99039 [11], GSE18838 [12], GSE19587 [13], and GSE57475 [14]. In aggregate, the four datasets contain 282 control samples (108 male/174 female) and 313 PD samples (182 male/131 female). Table 1 lists their characteristics. These datasets were selected using the following query filters: (1) ‘Parkinson’s’ was used as a keyword, (2) the type of dataset was defined as ‘expression profiling by array’, (3) the organism was ‘Homo sapiens’, and (4) the gender information was set to ‘available’. The last filter was included because sex has been considered relevant to differentiate PD profiles [15]. Following these criteria, each dataset contained four distinct groups arising from the intersection of sex and type of sample: MaleControl, MalePD, FemaleControl, and FemalePD.

Table 1

List of PD studies from GEO.

GEO accession	Year	Platform	Probe sets	Genes	Control (Male/Fem)	Condition (Male/Fem)
GSE99039	2017	GPL570	54675	23520	212 (70/142)	191 (101/90)
GSE18838	2010	GPL5175	316919	17326	11 (6/5)	17 (13/4)
GSE19587	2010	GPL571	22277	13515	10 (6/4)	12 (6/6)
GSE57475	2015	GPL6947	49576	25146	49 (26/23)	93 (62/31)
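As a quick consistency check, the aggregate sample counts quoted in the text can be recomputed from the rows of Table 1. The sketch below is illustrative Python; the dictionary layout and variable names are assumptions introduced here, while the numbers are copied from the table.

```python
# (male, female) sample counts per dataset, copied from Table 1.
table1 = {
    "GSE99039": {"control": (70, 142), "pd": (101, 90)},
    "GSE18838": {"control": (6, 5), "pd": (13, 4)},
    "GSE19587": {"control": (6, 4), "pd": (6, 6)},
    "GSE57475": {"control": (26, 23), "pd": (62, 31)},
}


def totals(group):
    """Sum male and female counts for one group across all datasets."""
    male = sum(row[group][0] for row in table1.values())
    female = sum(row[group][1] for row in table1.values())
    return male, female, male + female


control_totals = totals("control")
pd_totals = totals("pd")
print(control_totals, pd_totals)
```

The sums agree with the 282 control samples (108 male/174 female) and 313 PD samples (182 male/131 female) reported for the four datasets combined.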

Two types of analysis are presented: individual analysis of datasets and meta-analysis of multiple datasets. For the first type, we treated each of the four datasets individually. For the second type, we meta-analyzed three datasets simultaneously and four datasets simultaneously. This yields six analysis instances: four individual analyses and two meta-analyses. In each instance, the goal was to detect genes with significant relative expression changes through MCO to characterize and infer their biological meaning in PD.

MCO requires at least two PMs. In each of the four individual analysis instances, two PMs were used: the absolute value of the difference of the means and the absolute value of the difference of the medians between the pair of groups under comparison. The MCO analyses in these instances were therefore all 2-dimensional (2D). In the meta-analysis involving three datasets, there were three PMs: the absolute value of the difference of medians between the pair of groups under comparison from each dataset. The MCO analyses involved here are 3D. Finally, in the meta-analysis involving four datasets, there were four PMs: the absolute value of the difference of medians between the pair of groups under comparison from each dataset, resulting in an MCO analysis that is 4D.

In each of the six analysis instances, MCO was applied four times (Fig 2A): MCO 1 compares the pair of groups MaleControl vs. MalePD; MCO 2 compares the groups FemaleControl vs. FemalePD; MCO 3 compares the groups MaleControl vs. FemaleControl; and MCO 4, the groups MalePD vs. FemalePD. The genes of interest were those with biomarking potential solely for PD. Following set theory (see the Venn diagram in Fig 2B), this results in selecting the genes in the intersection of MCO 1 and MCO 2 that are not in MCO 3 or MCO 4.
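The set-theory rule above can be expressed directly with set operations. The following is a minimal Python sketch with hypothetical gene labels; the function name is introduced here for illustration and is not part of the MCO tool.

```python
def pd_specific_genes(mco1, mco2, mco3, mco4):
    """Genes in both PD-vs-control comparisons but in neither sex comparison."""
    return (set(mco1) & set(mco2)) - (set(mco3) | set(mco4))


# Hypothetical frontier memberships for the four comparisons.
mco1 = {"g1", "g2", "g3", "g5"}   # MaleControl vs. MalePD
mco2 = {"g1", "g2", "g3", "g4", "g5"}  # FemaleControl vs. FemalePD
mco3 = {"g2", "g6"}               # MaleControl vs. FemaleControl
mco4 = {"g3", "g7"}               # MalePD vs. FemalePD

genes_of_interest = pd_specific_genes(mco1, mco2, mco3, mco4)
print(sorted(genes_of_interest))
```

Here g2 and g3 are dropped because they also separate the sexes, which leaves g1 and g5 as the genes whose changes are attributable to PD in both males and females.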

Fig 2

The general scheme for all six analysis instances.

(A) MCO 1 compares the pair of groups MaleControl vs. MalePD; MCO 2 compares the groups FemaleControl vs. FemalePD; MCO 3 compares the groups MaleControl vs. FemaleControl; and MCO 4, the groups MalePD vs. FemalePD. (B) The genes of interest were those with biomarking potential solely for PD. Following set theory, this selects the genes in the intersection of MCO 1 and MCO 2 that are not in MCO 3 or MCO 4.

One last difference is that, in the four individual analysis instances, the number of frontiers, F, was set to 10. In the two meta-analyses, it was set to 1 to keep results manageable. Each of the six analysis instances is identified by the number of datasets in the analysis, the number of PMs (dimensions), and the last four digits of the identifiers of the datasets involved. Explicitly, the instances are: 1–2D-9039, 1–2D-8838, 1–2D-9587, 1–2D-7475, 3–3D-9039/8838/9587, and 4–4D-9039/8838/9587/7475. The genes of interest for the four individual analysis instances can be consulted in Table 2. For illustration purposes, Fig 3 shows the graphical results for MCO 1 on each dataset, while Fig 4 does the same for MCO 2. The complete lists identifying ten frontiers on each dataset can be found in the S2-S5 Tables in S1 File.

Table 2

Genes of interest from individual analysis instances 1–2D-9039, 1–2D-8838, 1–2D-9587, and 1–2D-7475 and the references supporting their roles in Parkinson’s Disease (PD) or neurodegenerative diseases (ND).

Individual Analysis Instance	Gene symbol	Gene name	Reference related to PD	Reference related to ND
1–2D-9039	TUBB2A	Tubulin Beta 2A Class IIa	[16–18]	[19]
	CFD	Complement Factor D	[20]	[21]
	PTGDS	Prostaglandin D2 Synthase	[16]
	LRRN3	Leucine Rich Repeat Neuronal 3	[16, 22]
	ANXA3	Annexin A3		[23, 24]
	GPR97	G Protein-Coupled Receptor 97	[25]
	PRKCD	Protein Kinase C Delta	[26, 27]
	MMP9	Matrix Metallopeptidase 9	[28–30]
	PGLYRP1	Peptidoglycan Recognition Protein 1	[31]
	SPI1	Spi-1 Proto-Oncogene	[22]
1–2D-8838	ND6	NADH Dehydrogenase Subunit 6	[32]
	GTF2B	General Transcription Factor IIB
	RPL18	Ribosomal Protein L18
	TAGLN2	Transgelin 2
	TMEM14B	Transmembrane Protein 14B
	GABARAP	GABA(A) Receptor-Associated Protein	[18]
	HIST1H1E	Histone Cluster 1
1–2D-9587	OPHN1	Oligophrenin 1
1–2D-7475	OAZ1	Ornithine Decarboxylase Antizyme 1
	EEF1A1	Elongation Factor 1-Alpha 1	[33, 34]
	ARHGDIB	Rho GDP Dissociation Inhibitor Beta	[35]
	HBD	Hemoglobin Subunit Delta	[36]
	CFD	Complement Factor D	[20]	[21]
	UBA52	Ubiquitin Carboxyl Extension Protein 52	[18]

Fig 3

MCO 1: MalePD vs. MaleControl.

Graphical representations for MCO 1 associated with individual analysis instances 1–2D-9039, 1–2D-8838, 1–2D-9587, and 1–2D-7475. Solutions toward the origin (0,0) are more significant.

Fig 4

MCO 2: FemalePD vs. FemaleControl.

Graphical representations for MCO 2 associated with individual analysis instances 1–2D-9039, 1–2D-8838, 1–2D-9587, and 1–2D-7475. Solutions toward the origin (0,0) are more significant.

Table 3 contains the genes of interest for the two meta-analysis instances. In 3–3D-9039/8838/9587, four genes were identified as potential biomarkers: MMP9, RPS11, TUBA1B, and TUBB2A. Fig 5 shows the results for MCO 1 and MCO 2 for illustration purposes for this instance. The complete lists for MCO 1 through MCO 3 can be found in the S6 Table in S1 File.

Table 3

Genes of interest for meta-analysis instances 3–3D-9039/8838/9587 and 4–4D-9039/8838/9587/7475 and the references supporting their roles in Parkinson’s Disease (PD) or neurodegenerative diseases (ND).

Meta-Analysis Instance	Gene symbol	Gene name	Reference related to PD	Reference related to ND
3–3D-9039/8838/9587	MMP9	Matrix Metallopeptidase 9	[28–30]
	RPS11	Ribosomal Protein S11	[37]
	TUBA1B	Tubulin Alpha 1b, K-ALPHA-1	[35]
	TUBB2A	Tubulin Beta 2A Class IIa	[16–18]	[19]
4–4D-9039/8838/9587/7475	EEF2	Eukaryotic Translation Elongation Factor 2	[38, 39]
	MMP9	Matrix Metallopeptidase 9	[28–30]
	CFD	Complement Factor D	[20]	[21]
	DAZAP2	DAZ Associated Protein 2
	MYL6	Myosin Light Chain 6	[40]
	ARHGDIB	Rho GDP Dissociation Inhibitor Beta	[35]
	RPL18	Ribosomal Protein L18
	RPS11	Ribosomal Protein S11	[37]
	CD81	CD81 Molecule	[41]
	TUBB2A	Tubulin Beta 2A Class IIa	[16–18]	[19]

Fig 5

Meta-analysis instance 3–3D-9039/8838/9587.

Graphical representations for MCO 1 and MCO 2 are associated with this instance. Solutions toward the origin (0,0,0) are more significant.

In 4–4D-9039/8838/9587/7475, 10 genes of interest were identified: EEF2, MMP9, CFD, DAZAP2, MYL6, ARHGDIB, RPL18, RPS11, CD81, and TUBB2A, as shown in Table 3. The complete lists for MCO 1 through MCO 4 can be found in the S7 Table in S1 File. The Venn diagrams in Fig 6 show the overlap within the meta-analysis instances 3–3D-9039/8838/9587 in (A) and 4–4D-9039/8838/9587/7475 in (B). Notably, three genes (MMP9, RPS11, and TUBB2A) were identified in both instances. In addition, two out of these three, MMP9 and TUBB2A, were also identified in instance 1–2D-9039.
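The recurrence of genes across instances can be checked from the gene lists in Table 2 and Table 3. The short Python sketch below copies those lists into sets; the variable names are introduced here for illustration.

```python
# Genes of interest copied from Table 3 (meta-analysis instances).
instance_3d = {"MMP9", "RPS11", "TUBA1B", "TUBB2A"}
instance_4d = {"EEF2", "MMP9", "CFD", "DAZAP2", "MYL6",
               "ARHGDIB", "RPL18", "RPS11", "CD81", "TUBB2A"}

# Genes of interest copied from Table 2 for instance 1-2D-9039.
instance_9039 = {"TUBB2A", "CFD", "PTGDS", "LRRN3", "ANXA3",
                 "GPR97", "PRKCD", "MMP9", "PGLYRP1", "SPI1"}

# Genes found in both meta-analysis instances.
shared_meta = instance_3d & instance_4d

# Of those, the genes that also appear in the individual analysis.
shared_all = shared_meta & instance_9039

print(sorted(shared_meta), sorted(shared_all))
```

The intersections reproduce the three genes shared by both meta-analysis instances and the two of them, MMP9 and TUBB2A, that also appear in instance 1–2D-9039.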

Fig 6

Venn diagrams of meta-analysis instances 3–3D-9039/8838/9587 in (A) and 4–4D-9039/8838/9587/7475 in (B).

The accompanying table lists the genes in each intersection.


Discussion

Our team performed a series of literature queries combining the name of each of the ten genes with either ‘Parkinson’s Disease’ or ‘Neurodegenerative’ to look for related biological evidence. Of the ten genes identified in the meta-analysis instances (MMP9, RPS11, TUBB2A, EEF2, CFD, DAZAP2, MYL6, ARHGDIB, RPL18, and CD81), 9 have appeared in the literature as related to PD or neurodegenerative conditions. For instance, MMP9, a protein-coding gene, appears in 17 articles describing a direct association with PD. In [30], MMP9 was identified as a potential marker for Lewy body disorders, which include Parkinson’s Disease. In [16], using the LASSO algorithm with 10-fold cross-validation, TUBB2A was one of the 25 genes selected as differentially expressed mRNAs for PD. Furthermore, in [17], it is argued that TUBB2A is a molecular biomarker for PD in the blood, supporting similar assertions in [18]. The authors in [17] demonstrated that reduced expression of TUBB2A reasonably predicted PD as a blood biomarker via a meta-analysis of 11 datasets from GEO (8 from substantia nigra and 3 from blood samples), with further validation analyzing mRNA expression levels in the blood of 50 sporadic PD patients and 50 control subjects. In agreement with known biology, TUBB2A was one of the top identified genes from both meta-analysis instances (Table 3). TUBB2A is a strong candidate for more in-depth explorations at the experimental level for PD. The fact that MMP9 and TUBB2A appeared in the results of instances 3–3D-9039/8838/9587 and 4–4D-9039/8838/9587/7475, as well as in 1–2D-9039, supports the sensitivity of our method to detect genes that play a crucial role in the disease under study.

Besides this supporting biological evidence on the efficacy of the MCO tool, the code is also computationally efficient, as it can meta-analyze up to five datasets simultaneously. The largest instance presented here, 4–4D-9039/8838/9587/7475, took around 10 minutes to process for any of the MCO 1 through MCO 4 analyses on a personal laptop with a 2.90 GHz Intel Core i7 CPU and 16 GB of RAM. The ten genes of interest identified in meta-analysis instance 4–4D-9039/8838/9587/7475 were the subject of gene ontology (GO) enrichment analysis using the ReactomePA R package [42] and the Enrichr tool [43]. Table 4 lists the results from enrichPathway, and Table 5 lists the results from Enrichr.

Table 4

Pathways enriched for the genes of interest identified in meta-analysis.

ID	Description	GeneRatio	BgRatio	pvalue	p.adjusted	qvalue	Gene	Count
R-HSA-156902	Peptide chain elongation	3/9	89/10856	4.32E-05	0.003275522	0.001607	EEF2/RPL18/RPS11	3
R-HSA-156842	Eukaryotic Translation Elongation	3/9	93/10856	4.93E-05	0.003275522	0.001607	EEF2/RPL18/RPS11	3
R-HSA-166658	Complement cascade	2/9	58/10856	9.86E-04	0.028553514	0.014011	CFD/CD81	2
R-HSA-2766	Translation	3/9	291/10856	1.42E-03	0.028553514	0.014011	EEF2/RPL18/RPS11	3
R-HSA-192823	Viral mRNA Translation	2/9	89/10856	2.30E-03	0.028553514	0.014011	RPL18/RPS11	2
R-HSA-2682334	EPH-Ephrin signaling	2/9	92/10856	2.46E-03	0.028553514	0.014011	MMP9/MYL6	2
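The relationship between the GeneRatio, BgRatio, and pvalue columns of Table 4 can be illustrated with a one-sided hypergeometric over-representation test. The sketch below is a Python recomputation under the assumption that the p-values come from such a test, which is our understanding of how enrichment tools of this kind typically work; it is not code from ReactomePA, and the function name is introduced here.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) when n genes are drawn from N background genes, K of them in the pathway."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Row R-HSA-156902: GeneRatio 3/9, BgRatio 89/10856.
p_peptide_chain = enrichment_p_value(k=3, n=9, K=89, N=10856)

# Row R-HSA-156842: GeneRatio 3/9, BgRatio 93/10856.
p_translation_elongation = enrichment_p_value(k=3, n=9, K=93, N=10856)

print(f"{p_peptide_chain:.2e}", f"{p_translation_elongation:.2e}")
```

Under this assumption, the two values should land close to the 4.32E-05 and 4.93E-05 reported in the first two rows of the table. The denominator of 9 in GeneRatio reflects the number of input genes counted in the test, as given in the table.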

Table 5

GO biological process.

Term	P-value	Overlap_genes
Translation (GO:0006412)	0.000137136	[EEF2, RPL18, RPS11]
Cellular macromolecule biosynthetic process (GO:0034645)	0.000423823	[EEF2, RPL18, RPS11]
SRP-dependent cotranslational protein targeting to membrane (GO:0006614)	0.000880239	[RPL18, RPS11]
Cytoplasmic translation (GO:0002181)	0.000939488	[RPL18, RPS11]
Cotranslational protein targeting to membrane (GO:0006613)	0.000959656	[RPL18, RPS11]
Protein targeting to ER (GO:0045047)	0.001150535	[RPL18, RPS11]
Nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)	0.001382294	[RPL18, RPS11]
neutrophil degranulation (GO:0043312)	0.001462415	[CFD, EEF2, MMP9]
neutrophil activation involved in immune response (GO:0002283)	0.001497690	[CFD, EEF2, MMP9]
neutrophil mediated immunity (GO:0002446)	0.001524499	[CFD, EEF2, MMP9]

Having a short solution list made it possible to perform an in-depth literature search for each gene. This information complemented the ontology analyses and indicated the pathways in which the unlisted genes could be included. From the solution list, EEF2, RPL18, and RPS11 code for proteins involved in protein synthesis (translation). Various studies have reported that protein synthesis is affected in PD and that some ribosomal proteins have expression changes in the condition [38, 44, 45], and eEF2 (the protein product of EEF2) has been reported to be expressed less in PD [45]. The DAZAP2 gene product, not included in the ontology analysis, could also be associated with translation. The association is through the role of the DAZAP2 gene product in stress granules (SGs), which enclose different translation system components when the cell is under stress [46]. The DAZAP2 gene product also participates in translation through interactions with eukaryotic initiation factor 4G (https://www.genecards.org/cgi-bin/carddisp.pl?gene=DAZAP2). The ARHGDIB and TUBB2A gene products have roles in cytoskeletal organization and dynamics. The expression of genes for proteins involved in cytoskeleton dynamics, such as tubulin, changes in PD [44]. The MMP9 and CD81 gene products are involved in cell motility and extracellular matrix dynamics; MMP9 expression changes in PD and in amyotrophic lateral sclerosis [47].

Contains a comparative study of MCO vs. CFS, IG, and eBayes gene selection methods and all the supporting tables and figures. (ZIP)

8 Jul 2021

PONE-D-20-38908
Multiple Criteria Optimization (MCO): a gene selection deterministic tool in RStudio
PLOS ONE

Dear Dr. Cabrera-Rios,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

According to referees' suggestions (see detailed comments below), some methodological approaches have to be better explained, some new results could be considered to be included (as gene enrichment analyses) and, in general, the format (typos and pictures) also improved.

Please submit your revised manuscript by Aug 22 2021 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,
Francisco J. Esteban, Ph.D., M.Sc.
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Partly
Reviewer #2: Yes
Reviewer #3: Partly

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: N/A

3. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

5. Review Comments to the Author

Reviewer #1: Gene selection is of great importance for the analysis of biological data. In this study, the authors proposed a method that select differential genes according to the Pareto criterion. Overall, the technical novelty is limited. Also, the authors claimed that they have used the proposed method “Our previous works, Cruz-Rivera et al. [4], Lorenzo et al. [5], Isaza et al. [6], have successfully developed and applied MCO for gene expression changes' selection in conditions like Alzheimer's disease, cervix cancer, and lung cancer, respectively.” There are points that help improve the manuscript.
1. How many datasets are used in the experiments and how are they used in this study? In Algorithm 1, the authors mentioned there are five datasets, while there are four Parkinson’s disease related datasets.
2. It is not clear how we get F? Also, rather than use 1, 2, 3, 4, and 5, it would be better use K to denote the number of datasets. Do rewrite Algorithm 1 to better illustrate it.
3. The pictures are of poor quality.
4. What are the differences between the MCO method with the commonly used filter, wrapper, embedded, and hybrid gene selection methods?
5. The discussion is kind of superficial, which needs writing.

Reviewer #2:
1. This work reported multiple criteria optimization (MCO) in gene selection for the analysis of microarray datasets. MCO selects genes with the largest expression changes without user manipulation of neither informatics nor statistical parameters. Furthermore, the user did not have to choose neither a preference structure among multiple measures of differential expression nor a predetermined quantity of genes to be deemed significant a priori. This implies that using the same datasets and performance measures (PMs), the method can converge to the same set of selected differentially expressed genes (repeatability) in spite of who the analyst is (objectivity). The reported work described the development of an open-source tool in RStudio to enable both: 1) individual analysis of single datasets with two or three PMs and 2) meta-analysis with up to five microarrays datasets, using one PM from each dataset.
2. In classification research, now I consider this contribution as pretty good to make a paper valuable for the field of interest. I am a convincing advocate of introducing the rigor of gene classification research in high dimensional datasets. However, I read this manuscript and I suggest to author please apply to show the competent with others recent statistical comparison method.
3. Quite good concept of paper and require complete language check should be performed to handle all typos and language issues.
4. There are many similar paper published on this topic, how your paper is different from existing ones? Explain.
5. Include some of the latest and relevant references for the benefit of the readers/authors of Cancer classification/ Parkinson disease based journal. The following citations will be very useful for the current, future and young research scholars in this research field from all over the globe.
a. Hybrid approach for gene selection and classification using filter and genetic algorithm.
b. Detecting biomarkers from microarray data using distributed correlation based gene selection
c. Dna gene expression analysis on diffuse large b-cell lymphoma (dlbcl) based on filter selection method with supervised classification method
d. An efficient stacking model with label selection for multi-label classification
e. Medical diagnosis of Parkinson disease driven by multiple preprocessing technique with Scarce Lee Silverman voice treatment data
f. Knowledge discovery in medical and biological datasets by integration of Relief-F and correlation feature selection techniques
The evaluation section is also not clear. Did authors use cross-validation or an independent test set? Did they train their model for one benchmark and used the trained model on the rest of the benchmarks?

Reviewer #3: The authors develop an open-source tool in RStudio for implementing their previous works about multiple criteria optimization (MCO) in gene selection for the analysis of microarray datasets. The novelty of this paper is not enough, and it needs further experiments. My suggestions are shown below.
1. In this paper and their previous works, they only used median or mean value to obtain differential genes. However, in many similar works about the selection of differential genes in two-class problems, researchers usually used more complicated rank-based measures, such as t-test, wilcoxon rank sum test, signal-to-noise ratio, etc. Authors should further demonstrate their MCO efficacy using such measures.
2. Please show their MCO can be further applied to multi-class problems.
3. Please further perform classification processes, i.e., selected features in conjunction with classifiers, to demonstrate their selected features are better than the selected features by other feature selection approaches for obtaining better classification results.
4. It needs to perform gene set enrichment analysis, such as DAVID. It can be used to show that the selected genes can be enriched in some biological processes or pathways.

6. Do you want your identity to be public for this peer review?
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

17 Nov 2021

Three reviewers provided feedback on our manuscript. Reviewer 1 raised six issues, Reviewer 2 raised five issues, and Reviewer 3 raised four issues. We provide our follow-up to each of them in the following sections and express our gratitude to the reviewers for their insight. The complete document is provided under the name "Response to reviewers.pdf". Some information might have been lost in the transcription below.

Reviewer 1, Issue 1

Gene selection is of great importance for the analysis of biological data. In this study, the authors proposed a method that select differential genes according to the Pareto criterion. Overall, the technical novelty is limited. Also, the authors claimed that they have used the proposed method “Our [22] previous works, Cruz-Rivera et al. [4], Lorenzo et al. [5], Isaza et al. [6], have successfully developed and applied MCO for gene expression changes' selection in conditions like Alzheimer's disease, cervix cancer, and lung cancer, respectively.” There are points that help improve the manuscript:

Follow up to Reviewer 1, Issue 1

Thank you for your comment. The methods outlined in this paper have indeed been successfully tested in previous instances, and their competitive performance has been demonstrated in previous works, as properly referenced in our manuscript.
The resulting code presented here is open-source scientific software, which fits the scope of the PLOS Computational Biology Software Section (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002799). The code makes the analysis of single datasets efficient, reliable, and objective when pursuing gene selection. Furthermore, it makes the simultaneous meta-analysis of up to five datasets possible, which is a novel capability. This is especially true when considering that the datasets can be measured in different units, an issue that has greatly hampered the automation of meta-analysis. It is our belief that this code, as a computational artifact, brings strong analysis capabilities to gene selection that have not been completely available in the past, hence its novelty.

Reviewer 1, Issue 2

How many datasets are used in the experiments and how are they used in this study? In Algorithm 1, the authors mentioned there are five datasets, while there are four Parkinson’s disease related datasets.

Follow up to Reviewer 1, Issue 2

We consider that this comment stems from mixing up the capabilities of the MCO method with those of the R code presented here, and from not differentiating either of them from the number of datasets used in the illustrative example related to PD. We apologize for this confusion and proceed to disambiguate. MCO is not mathematically or theoretically bounded in the number of datasets (and their corresponding performance measures) that it can analyze simultaneously. However, practical limits will arise related to the available computing power as well as the number of independent data points. The focus of the manuscript is the presentation of the automated R code for MCO, which can support meta-analysis of up to five databases simultaneously. So, even if MCO is mathematically unbounded in the number of datasets that it can accommodate, the R code has been designed for up to five datasets.
Finally, in the cases outlined for the illustrative example, four databases are available as described in Table 1, selected as explained in the Results section. These four datasets are used to support the analysis and meta-analysis cases detailed in Tables 2 and 3. To make sure that the distinction is clear, the material was organized in subsections: MCO algorithm, MCO tool, and MCO tool – Application, where the particular bounds are discussed.

Reviewer 1, Issue 3

It is not clear how we get F? Also, rather than use 1, 2, 3, 4, and 5, it would be better use K to denote the number of datasets. Do rewrite Algorithm 1 to better illustrate it.

Follow up to Reviewer 1, Issue 3

Thanks for the advice. Besides the disambiguation provided previously, we have adopted the letter K to denote the number of databases that the user wishes to analyze through MCO (see Algorithm 1). For the R code, K falls between 1 and 5, and for the particular cases outlined in Tables 2 and 3, K is 1, 3, or 4 depending on the case. On the other hand, F is the only algorithm-related parameter that is user-defined. If F=1, the Pareto Frontier is identified with all data points involved in the analysis. If F=2, the first frontier is calculated with all data points involved, and then the solutions on this first frontier are removed to proceed to a second run of the algorithm. This procedure is necessary to accommodate experimental error in the datasets. Obtaining more than one efficient frontier results in a hierarchy of solutions. In our experience, MCO converges to tens of solutions when analyzing tens of thousands of initial genes. The number of frontiers (F) used in each analysis case in our manuscript is duly specified for single datasets (10 frontiers) and for meta-analysis (1 frontier).

Reviewer 1, Issue 4

The pictures are of poor quality.

Follow up to Reviewer 1, Issue 4

We have tested our figures using the software provided by PLOS. All of them passed the quality inspection.
In response to the reviewer's comment, we have increased the quality of our pictures even beyond these tests. The improved compliant set is now included in our resubmission. In particular, we adjusted Figures 1, 3, 4, 5, and 6. We also improved the following: the content in Fig 1 has been reorganized for clarity, the letter size was made larger for all graphs in the manuscript, and the size and letter type of the table in Fig 6 were modified to make it more readable. All figures passed the testing provided by the PACE program, as requested by the Journal instructions.

Reviewer 1, Issue 5

What are the differences between the MCO method with the commonly used filter, wrapper, embedded, and hybrid gene selection methods?

Follow up to Reviewer 1, Issue 5

MCO is based on mathematical optimization. Mathematical optimization is applied when searching for the demonstrably best possible solution(s) to a particular problem. The demonstrably best possible solution to an optimization problem is called a global optimum, and it might or might not be accessible or recognizable in general. In our paper and our previous work, gene selection is cast as a multiple criteria optimization problem. In addition, our algorithm guarantees the global optimality of the solutions (that is, the resulting selected set of genes) thanks to the Pareto efficiency conditions. So, MCO guarantees global optimality, which, to the best of our knowledge, is a characteristic not afforded by the other methods. In addition, MCO arrives at the globally optimal solutions to the gene selection problem without asking the user to define a priori a final number of solutions, a preference structure between the different performance measures, the value of a threshold, or the use of a particular normalization procedure. The vast majority of gene selection methods would have at least one of these requirements. The fact that MCO does not require any of them makes our method objective and repeatable.
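The frontier procedure described in the follow-up to Reviewer 1, Issue 3 (F=1 uses all data points; F=2 removes the first frontier and runs again) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the gene labels, and the numeric values are hypothetical, and the sketch assumes standard Pareto dominance with every performance measure to be maximized.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as large as b in every measure and strictly larger in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(points: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the points that no other point dominates (all measures maximized)."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


def peel_frontiers(points: Dict[str, Sequence[float]], n_frontiers: int) -> List[List[str]]:
    """Find a frontier, remove its points, and repeat up to n_frontiers times (the role of F)."""
    remaining = dict(points)
    frontiers: List[List[str]] = []
    for _ in range(n_frontiers):
        if not remaining:
            break
        front = pareto_frontier(remaining)
        frontiers.append(front)
        for name in front:
            del remaining[name]
    return frontiers


# Made-up values for illustration only: two performance measures per gene.
genes = {
    "g1": (5.0, 1.0),
    "g2": (4.0, 4.0),
    "g3": (1.0, 5.0),
    "g4": (3.0, 3.0),
    "g5": (0.5, 0.5),
}
frontiers = peel_frontiers(genes, 2)
print(frontiers)
```

Because dominance only compares values within each measure, multiplying one measure by a positive constant leaves the frontiers unchanged in this sketch, which is consistent with the remark that the measures need not share units.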
Finally, as demonstrated by the cases outlined in Table 3, MCO can select important genes based on the simultaneous consideration of multiple datasets, in spite of the underlying experiments being incommensurable (i.e., measured in dissimilar units). This is, indeed, a capability unique to MCO.

Reviewer 1, Issue 6

The discussion is kind of superficial, which needs writing

Follow up to Reviewer 1, Issue 6

We are deepening biological interpretation based on Gene Ontology analysis in the Discussion Section to further support the biological significance of the results that are possible through the capabilities afforded by our method.

Reviewer 2, Issue 1

In classification research, now I consider this contribution as pretty good to make a paper valuable for the field of interest. I am a convincing advocate of introducing the rigor of gene classification research in high dimensional datasets. However, I read this manuscript and I suggest to author please apply to show the competent with others recent statistical comparison method.

Follow up to Reviewer 2, Issue 1

When it comes to approaching a general class of optimization problems, such as those falling within Gene Selection, it is necessary to keep in mind the no-free-lunch theorem (NFLT), which states “that a general-purpose, universal optimization strategy is impossible. The only way one strategy can outperform another is if it is specialized to the structure of the specific problem under consideration”. In the development of MCO as presented in our manuscript, two important characteristics were considered: (1) the strategy should not require the manipulation of parameters for it to result in a competitive solution, and (2) the resulting solution should be globally optimal in the presence of conflicting performance measures. The first characteristic would bring repeatability and the second one, objectivity.
In our follow-ups to Reviewer 1's Issues 1 and 5, we outlined and explained the distinctive advantages of our Gene Selection method, namely certainty (optimality), objectivity, consistent convergence, non-parametric nature, and the possibility of truly carrying out simultaneous meta-analysis in spite of incommensurability. These characteristics have also been discussed in our preceding publications.

A comparative study is now available as supplementary material in response to Reviewer 2's Issues 1 and 3. This comparison includes four methods of gene selection: MCO, CFS, IG, and eBayes. The four methods were applied to the four datasets utilized in this manuscript. MCO was applied in its standard form with 10 frontiers as advised here, while the other methods were varied in their adjustment parameters. In addition, the following classifiers were used to assess classification accuracy: support vector machine, KNN, Treebag, Linear Discriminant Analysis, and RF. The comparison involves classification accuracy, classification accuracy variance, and number of selected genes for the four gene selection methods across the datasets, the variation created by their parameters, and the classifiers. The results are as follows.

In terms of classification accuracy (Figure A1), an ANOVA deemed that at least one of the methods had a different classification accuracy mean (p-value = 0.003). Furthermore, using Tukey's multiple comparison scheme, it was determined that the CFS method provided the largest classification accuracy, while the other three (MCO, IG, and eBayes) showed no significant difference between them at an alpha value of 0.05. This provides evidence of the competitiveness of MCO, with the distinctive characteristic that no parameters need to be adjusted by the user.
Regarding classification accuracy variance (dispersion), there was evidence that at least one of the methods had a different variance than the rest (p-values of 0.012 and 0.090 for the multiple comparisons and Levene's methods, respectively). Furthermore, through the analysis of the confidence intervals, it was determined that MCO's interval did not overlap with the intervals of the other three methods and that MCO's interval contained lower values than those of the rest of the methods. This is evidence of MCO's robust classification performance across different classifiers and datasets.

Finally, regarding the number of selected genes (parsimony), an ANOVA showed that at least one of the methods had a different mean number of selected genes (p-value = 0.011). Furthermore, using Tukey's multiple comparison scheme, two methods were deemed statistically equivalent in terms of their number of selected genes: MCO and CFS, both of which showed a significantly lower mean number of selected genes than eBayes. The IG method did not show a significant difference from CFS and eBayes due to its large dispersion. This result is evidence of the parsimonious gene selection of MCO.

So, in conclusion, this comparison shows the robustness of the solutions of MCO, the results of which tend to be parsimonious, providing a competitive classification accuracy performance without requiring the adjustment of statistical or computational parameters and with the unique capability to support simultaneous meta-analysis of multiple datasets.

Figure A1. ANOVA and multiple comparison results for classification accuracy of the four gene selection methods.

Figure A2. Test for equal variances of classification accuracy for the four gene selection methods.

Figure A3. ANOVA and multiple comparison results for number of selected genes for the four gene selection methods.

Reviewer 2, Issue 2

Quite good concept of paper and require complete language check should be performed to handle all typos and language issues.
Follow up to Reviewer 2, Issue 2

We have thoroughly revised our use of language. We used the online tool “Grammarly.com” to improve our exposition in this new version.

Reviewer 2, Issue 3

There are many similar papers published on this topic, how your paper is different from existing ones? Explain.

Follow up to Reviewer 2, Issue 3

The reviewer is kindly referred to our follow-up to Reviewer 1's Issues 1 and 5, as well as Reviewer 2's Issue 1.

Reviewer 2, Issue 4

Include some of the latest and relevant references for the benefit of the readers/authors of Cancer classification/ Parkinson disease based journal. The following citations will be very useful for the current, future and young research scholars in this research field from all over the globe.

a. Hybrid approach for gene selection and classification using filter and genetic algorithm.
b. Detecting biomarkers from microarray data using distributed correlation based gene selection
c. Dna gene expression analysis on diffuse large b-cell lymphoma (dlbcl) based on filter selection method with supervised classification method
d. An efficient stacking model with label selection for multi-label classification
e. Medical diagnosis of Parkinson disease driven by multiple preprocessing technique with Scarce Lee Silverman voice treatment data
f. Knowledge discovery in medical and biological datasets by integration of Relief-F and correlation feature selection techniques

Follow up to Reviewer 2, Issue 4

Thank you for your suggestions. We have added the most relevant ones in our literature review and references.

Reviewer 2, Issue 5

The evaluation section is also not clear. Did authors use cross-validation or an independent test set? Did they train their model for one benchmark and used the trained model on the rest of the benchmarks?
Follow up to Reviewer 2, Issue 5

The aim of this work is biological relevance; thus, our validation method follows the biological evidence that emerges from our selection of genes. To this end, the discussion section was enhanced in this revised version. Although the classification problem was not the goal of this work, a complete comparison of classification performance is now added as an appendix following Reviewer 2's Issue 1. This includes a comparison of four gene selection methods across multiple datasets, classifiers, and parameter variation. The cross-validation specifics have been added to this section, as these were varied for all methods for comparison purposes.

Reviewer 3, Issue 1

In this paper and their previous works, they only used median or mean value to obtain differential genes. However, in many similar works about the selection of differential genes in two-class problems, researchers usually used more complicated rank-based measures, such as t-test, wilcoxon rank sum test, signal-to-noise ratio, etc. Authors should further demonstrate their MCO efficacy using such measures.

Follow up to Reviewer 3, Issue 1

The existence of several types of PMs, and the practice of using a single one of them after a normalization procedure to select genes, have contributed to the problem of reproducibility in this task. This was one of the reasons that motivated the generation of MCO. The rationale for using the (absolute) difference of medians or the difference of means has had more to do with transparency and with keeping assumptions to a minimum in our analyses. In the R code, it is possible to use the (absolute) difference of means, medians, third quartiles, or kth percentiles to give analysts further possibilities.
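As an illustration of the performance measures named above, the following minimal Python sketch computes the absolute difference of medians and the absolute difference of means between case and control samples for each gene. It is not the authors' R code: the function name, the gene labels, and the expression values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def performance_measures(case: List[float], control: List[float]) -> Tuple[float, float]:
    """Return (|median difference|, |mean difference|) between case and control values."""
    return abs(median(case) - median(control)), abs(mean(case) - mean(control))


# Made-up expression values for illustration only: gene -> (case samples, control samples).
expression: Dict[str, Tuple[List[float], List[float]]] = {
    "g1": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "g2": ([5.0, 5.0, 8.0], [5.0, 4.0, 6.0]),
}

pms = {gene: performance_measures(case, ctrl) for gene, (case, ctrl) in expression.items()}
for gene, (d_median, d_mean) in pms.items():
    print(gene, d_median, d_mean)
```

In this toy example the two measures disagree for the second gene (the medians coincide while the means differ), which is the kind of conflict between measures that a multiple criteria approach is meant to handle without asking the analyst to pick one.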
A revision of our code will include adding a column to the dataset with values from performance measures calculated by the analyst, together with an indication of whether these values are to be maximized or minimized. This will swiftly accommodate the instances described by the reviewer. As long as a performance measure's values are numerical and subject to either maximization or minimization, it can be analyzed using MCO, regardless of its measuring units and its complexity. Also, the possibility of carrying out meta-analysis (of up to five datasets simultaneously) remains unchanged in our code.

Reviewer 3, Issue 2

Please show their MCO can be further applied to multi-class problems.

Follow up to Reviewer 3, Issue 2

MCO, as coded in R, can be applied to comparative experiments involving case (treatment) vs. control subjects. This is established in the Introduction section of the paper. MCO can meta-analyze multiple datasets concurrently provided that these datasets have a case-control layout as described previously. An application to multi-class problems is left for future work.

Reviewer 3, Issue 3

Please further perform classification processes, i.e., selected features in conjunction with classifiers, to demonstrate their selected features are better than the selected features by other feature selection approaches for obtaining better classification results.

Follow up to Reviewer 3, Issue 3

The reviewer is kindly referred to our follow-up to Reviewer 2, Issue 1, which addresses this comment and provides a comparative study as suggested.

Reviewer 3, Issue 4

It needs to perform gene set enrichment analysis, such as DAVID. It can be used to show that the selected genes can be enriched in some biological processes or pathways.

Follow up to Reviewer 3, Issue 4

This analysis is now provided in the Discussion section of the paper. Thank you for the suggestion.

Submitted filename: Response to reviewers.pdf
10 Jan 2022 Multiple Criteria Optimization (MCO): a gene selection deterministic tool in RStudio PONE-D-20-38908R1 Dear Dr. Cabrera-Rios, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Francisco J. Esteban, Ph.D., M.Sc. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. 
Reviewer #1: All comments have been addressed Reviewer #2: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed my concerns and the manuscript has been improved a lot. Please make sure that the references are closely related to the paper.

Reviewer #2: Good revised paper and well-written. Multiple Criteria Optimization (MCO): a gene selection deterministic tool in RStudio can be Accept.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

14 Jan 2022

PONE-D-20-38908R1

Multiple Criteria Optimization (MCO): a gene selection deterministic tool in RStudio

Dear Dr. Cabrera-Ríos:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Francisco J. Esteban
Academic Editor
PLOS ONE

42 in total

1. A novel peptide inhibitor targeted to caspase-3 cleavage site of a proapoptotic kinase protein kinase C delta (PKCdelta) protects against dopaminergic neuronal degeneration in Parkinson's disease models.

Authors: Anumantha G Kanthasamy; Vellareddy Anantharam; Danhui Zhang; Calivarathan Latchoumycandane; Huajun Jin; Siddharth Kaul; Arthi Kanthasamy
Journal: Free Radic Biol Med Date: 2006-08-25 Impact factor: 7.376

2. Proline-rich transcript in brain protein induces stress granule formation.

Authors: Jung-Eun Kim; Incheol Ryu; Woo Jae Kim; Ok-Kyu Song; Jeongeun Ryu; Mi Yi Kwon; Joon Hyun Kim; Sung Key Jang
Journal: Mol Cell Biol Date: 2007-11-05 Impact factor: 4.272

3. Gene expression profiling of substantia nigra dopamine neurons: further insights into Parkinson's disease pathology.

Authors: Filip Simunovic; Ming Yi; Yulei Wang; Laurel Macey; Lauren T Brown; Anna M Krichevsky; Susan L Andersen; Robert M Stephens; Francine M Benes; Kai C Sonntag
Journal: Brain Date: 2008-12-03 Impact factor: 13.501

4. Peptidoglycan recognition protein genes and risk of Parkinson's disease.

Authors: Samuel M Goldman; Freya Kamel; G Webster Ross; Sarah A Jewell; Connie Marras; Jane A Hoppin; David M Umbach; Grace S Bhudhikanok; Cheryl Meng; Monica Korell; Kathleen Comyns; Robert A Hauser; Joseph Jankovic; Stewart A Factor; Susan Bressman; Kelly E Lyons; Dale P Sandler; J William Langston; Caroline M Tanner
Journal: Mov Disord Date: 2014-05-17 Impact factor: 10.338

5. Multiple criteria optimization joint analyses of microarray experiments in lung cancer: from existing microarray data to new knowledge.

Authors: Katia I Camacho-Cáceres; Juan C Acevedo-Díaz; Lynn M Pérez-Marty; Michael Ortiz; Juan Irizarry; Mauricio Cabrera-Ríos; Clara E Isaza
Journal: Cancer Med Date: 2015-10-16 Impact factor: 4.452

6. Blood Transcriptomic Meta-analysis Identifies Dysregulation of Hemoglobin and Iron Metabolism in Parkinson' Disease.

Authors: Jose A Santiago; Judith A Potashkin
Journal: Front Aging Neurosci Date: 2017-03-29 Impact factor: 5.750

7. A meta-analysis of public microarray data identifies biological regulatory networks in Parkinson's disease.

Authors: Lining Su; Chunjie Wang; Chenqing Zheng; Huiping Wei; Xiaoqing Song
Journal: BMC Med Genomics Date: 2018-04-13 Impact factor: 3.063

8. Identification of potential diagnostic biomarkers for Parkinson's disease.

Authors: Fenghua Jiang; Qianqian Wu; Shuqian Sun; Guanghui Bi; Ling Guo
Journal: FEBS Open Bio Date: 2019-07-03 Impact factor: 2.693

9. Biological signaling pathways and potential mathematical network representations: biological discovery through optimization.

Authors: Clara Isaza; Juan F Rosas; Enery Lorenzo; Arlette Marrero; Cristina Ortiz; Michael R Ortiz; Lynn Perez; Mauricio Cabrera-Ríos
Journal: Cancer Med Date: 2018-04-10 Impact factor: 4.452

10. A Selection of Important Genes and Their Correlated Behavior in Alzheimer's Disease.

Authors: Yazeli E Cruz-Rivera; Jaileene Perez-Morales; Yaritza M Santiago; Valerie M Gonzalez; Luisa Morales; Mauricio Cabrera-Rios; Clara E Isaza
Journal: J Alzheimers Dis Date: 2018 Impact factor: 4.472

1 in total

1. Identification of hub biomarkers of myocardial infarction by single-cell sequencing, bioinformatics, and machine learning.

Authors: Qunhui Zhang; Yang Guo; Benyin Zhang; Hairui Liu; Yanfeng Peng; Di Wang; Dejun Zhang
Journal: Front Cardiovasc Med Date: 2022-07-25

1 in total