
A decision support system for multi-target disease diagnosis: A bioinformatics approach.

Femi Emmanuel Ayo, Joseph Bamidele Awotunde, Roseline Oluwaseun Ogundokun, Sakinat Oluwabukonla Folorunso, Adebola Olayinka Adekunle.

Abstract

Malaria and typhoid fever are known for their ability to cause high mortality, individually or jointly. The two diseases have similar symptoms and frequently co-exist in the human body, which leads to under-diagnosis when doctors try to determine which of the two a patient has. This paper proposes a Bioinformatics Based Decision Support System (BBDSS) for the diagnosis of malaria, typhoid and malaria typhoid. The system is a hybrid of an expert system and global alignment with constant penalty. The architecture of the proposed system takes the input diagnosis sequence and the benchmark diagnosis sequences through the browser, stores these sequences in the knowledge base and sets up the IF-THEN rules guiding the diagnosis decisions for malaria, typhoid and malaria typhoid respectively. The matching engine component of the system receives the input sequence and applies the global alignment technique with constant penalty to match it against each of the three benchmark sequences in turn. The technique generates an optimal alignment and determines the disease condition of the patient by comparing the alignment scores for the three benchmark diagnosis sequences. To evaluate the proposed system, ANOVA was used to compare the means of the three independent groups (malaria, typhoid and malaria typhoid) and determine whether there is statistical evidence that the mean values of the associated diagnosis variables are significantly different. The ANOVA results indicated that the mean of the values on the diagnosis variables is significantly different for at least one of the disease status groups. Multiple comparisons tests were then used to identify explicitly which means differed from one another. The multiple comparisons results showed a statistically significant difference in the values on the diagnosis variables between the malaria and malaria typhoid groups. Conversely, there were no significant differences between the malaria and typhoid fever groups or between the typhoid fever and malaria typhoid groups. A t-test was used to show the mean difference in diagnosis scores between the orthodox and the proposed diagnosis systems. The t-test results indicate that the mean diagnosis values from the orthodox system differ from those of the proposed system. Finally, the evaluation showed that the proposed diagnosis system is most efficient at diagnosing malaria and malaria typhoid, at 97% accuracy.
© 2020 The Author(s).


Keywords:  Bioinformatics; Computer science; Expert system; Malaria; Sequence alignment; Typhoid fever

Year:  2020        PMID: 32258494      PMCID: PMC7113440          DOI: 10.1016/j.heliyon.2020.e03657

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Malaria is a life-threatening disease common in tropical and subtropical regions, including Sub-Saharan Africa, Asia and the Americas. Female Anopheles mosquitoes carrying the Plasmodium parasite in their salivary glands transmit malaria (Poolphol et al., 2017). The severity of malaria depends on the species of the Plasmodium parasite. Malaria can be contracted from several sources, such as insect bites, contaminated needles or transfusion of unscreened blood (Abduah and Karunamoorthi, 2016). When a person is infected, the Plasmodium parasites enter the blood and travel to the liver, where they go through part of their life cycle. After this stage in the liver, the parasites travel through the circulatory system and attack red blood cells (Jan et al., 2018; Sajjad et al., 2016). Symptoms of malaria include high fever, sweating, vomiting, shaking, headache, and muscle and joint pain, usually noticeable within a few weeks after infection. Typhoid, on the other hand, is a bacterial illness caused by Salmonella enterica serotype Typhi and transmitted from human carriers through contaminated food and water (Qamar et al., 2018; Abatcha et al., 2019). The bacteria attack the intestine and temporarily stay in the bloodstream. They are then transported by white blood cells to the liver and bone marrow, where they multiply and re-enter the bloodstream. The incubation period of typhoid is typically up to two weeks, and the illness can last several weeks. The symptoms include headaches, diarrhea, high fever, poor appetite, and body pains. Malaria and typhoid fever have similar symptoms and are known to co-exist in the human body, i.e. they can occur together as malaria typhoid, which severely complicates diagnosis. Specifically, the joint infection causes under-diagnosis when doctors try to determine which of the two diseases a patient has.
Scholars have identified malaria and typhoid fever as killer diseases that account for the deaths of several millions of people worldwide. This high mortality rate can be traced to causes such as poor medical diagnosis methods and a lack of competent medical personnel. Bioinformatics is an interdisciplinary field of science that uses information technology to solve problems inherent in biology and computer science (Edwards et al., 2009). Research in bioinformatics includes algorithms designed for data storage, retrieval and analysis. It is a fast-developing field combining biology, information engineering, computer science, mathematics and statistics to examine and understand biological phenomena, with practical applications in areas such as molecular biology and medical disease diagnosis. Sequence alignment is a bioinformatics technique that uses various algorithms to locate functional subsets in biological sequences, whether DNA or protein (Rosenberg, 2009). Sequence alignment can also be applied to non-biological data, such as natural language text, clustering problems and financial data. An expert system is an Artificial Intelligence (AI) technique designed to capture the skills of a human expert coded in the form of rules (Yadav and Pandey, 2015). Expert systems have proved to be effective tools for identifying various diseases, such as skin diseases (melanoma, impetigo and eczema), kidney diseases, meningitis, cerebral palsy, migraine, cluster headache, stroke, epilepsy, multiple sclerosis, Parkinson's, Alzheimer's and Huntington's disease (Amarathunga et al., 2015; Singla et al., 2014). Recently, much research has been directed towards the use of expert systems for medical disease diagnosis, which has led to the emergence of technology-driven medical consultation.
An expert system is therefore regarded as a decision support system that, in combination with other AI techniques, diagnoses diseases based on known symptoms (Horvitz et al., 1988). The main objectives of this paper are (1) to improve on existing systems that can only diagnose one disease at a time, (2) to design benchmark sequences of symptoms for malaria, typhoid fever and malaria typhoid, and (3) to design a bioinformatics approach for the simultaneous identification and prediction of malaria, typhoid fever and malaria typhoid. The rest of this paper is organized as follows: Section 2 presents related work. Section 3 presents the materials and methods. Section 4 presents the implementation procedure and discussion. Section 5 presents the system evaluation of the proposed approach. Section 6 presents the conclusion and future work.

Related work

Computer-inspired tools such as sequence alignment algorithms can be deployed in medical diagnosis systems to reduce the death rate and the stress of waiting to see a medical doctor. A medical diagnosis system is an emerging AI technology used to help health care experts make efficient and appropriate clinical decisions (Shortliffe, 1987). Medical diagnosis systems combined with bioinformatics-inspired techniques can extract useful information from medical data under the supervision of a human expert. This information can assist medical experts in identifying disease categories in patients and in providing timely intervention in the form of treatment advice (Wan and Fadzilah, 2006). Researchers have developed several intelligent approaches for medical diagnosis systems in an attempt to identify disease categories, reduce patient waiting time, reduce health care service costs and increase the service rate of medical experts. As seen in most studies (Bourlas et al., 1999; Alexopoulos et al., 1999; Ruseckaite, 1999; Manickam and Abidi, 1999; Zelic et al., 1999), the intelligent approaches developed to assist experts in the timely detection and prevention of diseases can only deal with one disease condition. Hence, it is important to develop a multi-target disease diagnosis system that identifies two or more disease conditions in patients. Oguntimilehin et al. (2013) presented a machine learning approach for the clinical diagnosis of typhoid fever. The authors collected a labelled dataset with severity levels of typhoid fever from medical experts. The dataset comprises diagnosis variables and five severity classes (very low, low, moderate, high and very high), and was used to create guidelines for the diagnosis of typhoid fever with 96% detection accuracy. The authors asserted that the system could reduce both the mortality rate and patient waiting time.
One limitation of their work was the problem of rule extraction, which, if overcome, could lead to better diagnosis accuracy. Samuel and Omisore (2013) proposed a hybrid of fuzzy logic and a neural network for the efficient diagnosis of typhoid fever. In the hybrid model, the neural network module automatically optimizes the diagnosis of typhoid fever by generating the diagnosis rules for the fuzzy inference system. The model was reported to offer reliable diagnosis that is time-efficient and less expensive. However, it could incur computational overhead due to the unproven concept of weight adjustment in neural networks. Fatumo et al. (2013) designed a computer-simulated medical expert system in which input diagnosis variables are stored as rules in the inference engine to identify different types of malaria and typhoid problems. The system is effective and accessible, although insufficient rules and symptoms in the knowledge base can reduce its effectiveness. Djam et al. (2011) designed a fuzzy expert system for the diagnosis and treatment of malaria based on the degree of participation of each diagnosis variable, using the root sum square for reasoning and the centre of gravity for the diagnosis decision. The system provided reasonable diagnoses of malaria with some degree of confidence. The authors considered it user-friendly and a means of easing medical consultations. Its disadvantage is the problem of knowledge representation inherent in most rule-based systems. Adehor and Burrell (2008) designed a simple differential diagnostic model for detecting malaria, typhoid and unknown fever in the subregions of Africa based on signs and symptoms obtained through interaction with users.
The model simplifies the entry of signs and symptoms by taking responses from both a supervising user and the patient. This way of entering information reduces errors and enhances diagnosis accuracy. However, the model could lead to delays, risks and expensive, inefficient diagnosis because it can produce multiple similar alternative solutions. Aminu et al. (2016) proposed a predictive symptoms-based system built on binary classification with Support Vector Machines (SVM) to enhance the joint classification of malaria and typhoid fever. The authors reported that the system is a reliable substitute for disease diagnosis, although the evaluation results indicate low classification accuracy. Samuel et al. (2013) proposed a Web-Based Decision Support System (WBDSS) built on Fuzzy Logic (FL) for the diagnosis of typhoid fever. The FL system is composed of a fuzzifier, a fuzzy inference engine and a defuzzifier for rule formulation, reasoning and diagnosis decision respectively. The results showed that the proposed system is suitable for diagnosis problems. However, the fuzzy sets of fuzzy logic models cannot automatically adjust their linguistic variables to suit unseen conditions. Boruah and Kakoty (2019) provided a comparative analysis of different data mining techniques for the prediction and diagnosis of malaria. The study inferred that ensemble data mining techniques could be more efficient in the prediction and diagnosis of malaria than a single predictive model. The authors recognized that most of the literature on disease diagnosis systems failed to test the systems for detection accuracy, simplicity and accessibility. Uzoka et al. (2011) proposed a combination of fuzzy logic and the Analytical Hierarchy Process (AHP) for the medical diagnosis of malaria.
The fuzzy logic provides the rules needed to combine the multiple diagnosis decision variables, while AHP determines the relative importance of each variable in the diagnostic decision-making process. The approach proved effective for non-expert medical practitioners in the diagnosis of malaria. The limitation of the system is the problem of knowledge representation associated with fuzzy logic systems. Mutawa and Alzuwawi (2019) presented a multi-layered rule-based expert system for detecting uveitis. The number of rule combinations on the diagnosis variables decreases as the network propagates from the input layers to the output layer. The network design assists in deciding which primary signs and symptoms are needed to evaluate the probability of a disease, instead of integrating all the disease diagnosis variables. The system serves as an intelligent guideline for young medical doctors in providing accurate treatment advice to patients, and its multilayer design adapts easily to unseen conditions. However, the system has no technique for mitigating input errors in the signs and symptoms entered by users.

Bioinformatics

Bioinformatics is an interdisciplinary field of science consisting of tools developed to gain knowledge about biological phenomena (Edwards et al., 2009). It was initially designed for the practical purpose of finding patterns in the big data generated by modern developments in molecular biology. The field started with the idea of developing computer-inspired tools for locating functional patterns in biological sequences, e.g. the locations of functional structures in Deoxyribonucleic Acid (DNA). It is a fast-developing field combining biology, information engineering, computer science, mathematics, chemistry and statistics to derive useful knowledge from biological phenomena (see Figure 1). One common area of application is medical disease diagnosis.
Figure 1

Bioinformatics disciplines (Source: Diniz and Canduri, 2017).


Sequence alignment methods

Sequence alignment is a bioinformatics tool designed to compare two or more sequences in order to derive important biological knowledge (Behbahani et al., 2016). It is used to discover both patterns and functional connections between sequences. Alignment measures the degree of similarity between text sequences and pattern sequences. Most sequence alignment methods use dynamic programming to find optimal alignment scores. The functional behaviour of an unknown pattern can be predicted by aligning it with a database of known text sequences: the function of the best-matching known sequence is normally assumed to be the function of the pattern. There are two predominant techniques of sequence alignment: global alignment and local alignment.

Global alignment

In global alignment, the comparison runs from the start to the end of the sequences to locate the optimal alignment. This kind of alignment follows the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970). The algorithm is widely used in computer science applications. Global alignment is recommended for sequences that are identical or similar in length.

Local alignment

Sequences of different lengths can be matched with the local alignment technique. It compares subsequences of all possible lengths to find the best-matching regions. This kind of alignment follows the Smith-Waterman algorithm (Smith and Waterman, 1981). Both basic alignment techniques are based on dynamic programming.

Gap penalty

A gap penalty is a technique that allows more characters to be matched between closely related sequences. When comparing sequences, introducing gaps allows an alignment algorithm to match more characters than is possible in a gap-free alignment. However, to arrive at a suitable alignment it is essential to control the length and number of gaps. The three basic types of gap penalty are constant, linear and affine (Manikandan and Ramyachitra, 2017).

Constant

This is the most basic type of gap penalty: a fixed negative score is assigned to every gap, irrespective of its length. Consider the alignment of the two sequences in Figure 2, where '-' marks a single gap. Assuming a score of 1 for every match and −1 for every gap, the total score is 7 − 1 = 6, as computed by (1).
Figure 2

1-gap alignment.


Linear

In contrast to the constant gap penalty, the linear gap penalty considers the length (L) of each insertion/deletion in the gap. If the penalty per gap position is Y and the length of the gap is L, the resulting gap penalty is the product YL. This technique discourages lengthy gaps, since the total score decreases with each additional gap position. For example, the total score for Figure 2 using the linear gap penalty is 7 − 3 = 4, as computed by (2).

Affine

This is a blend of the constant and linear gap penalties, and it is the most common of the gap penalty types. The affine gap penalty is defined in terms of three quantities: X, the gap opening penalty; Y, the gap extension penalty; and L, the length of the gap. Gap opening denotes the cost of opening a gap of any length, and gap extension is the cost of each additional position in an existing gap. The values of X and Y vary according to purpose, so no fixed values can be prescribed. If the purpose is to find closely related matches, a larger gap penalty is needed to discourage gap openings. Conversely, if the purpose is to find more distantly related matches, a reduced gap penalty is recommended.
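
The three penalty types can be compared on the worked example used above: 7 matches and one gap of length 3 (the length is inferred from the linear score 7 − 3 = 4), with a match score of 1 and a penalty of 1 per gap or per gap position. The following is a minimal sketch, not the authors' implementation. The function names are hypothetical, and the affine form X + Y·L is one common convention assumed here, since the formula itself is not shown in the text.

```python
def constant_penalty(gap_lengths, penalty=1):
    """Fixed penalty per gap, regardless of its length."""
    return penalty * len(gap_lengths)


def linear_penalty(gap_lengths, per_position=1):
    """Penalty Y * L for each gap of length L."""
    return sum(per_position * length for length in gap_lengths)


def affine_penalty(gap_lengths, opening=1, extension=1):
    """Assumed form X + Y * L for each gap (one common convention)."""
    return sum(opening + extension * length for length in gap_lengths)


def total_score(matches, gap_lengths, penalty_fn, match_score=1):
    """Score = match contribution minus the total gap penalty."""
    return match_score * matches - penalty_fn(gap_lengths)


# Worked example from the text: 7 matches, one gap of length 3.
gaps = [3]
constant_total = total_score(7, gaps, constant_penalty)  # 7 - 1 = 6
linear_total = total_score(7, gaps, linear_penalty)      # 7 - 3 = 4
affine_total = total_score(7, gaps, affine_penalty)      # 7 - (1 + 3) = 3
print(constant_total, linear_total, affine_total)
```

With these assumed parameters the affine score is the lowest of the three, because the gap pays both an opening cost and a per-position cost.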

Expert systems

The reluctance of people to visit hospitals for regular check-ups, coupled with their busy schedules, has driven the emergence of medical diagnosis systems as an alternative to human experts. The wide acceptance of these systems has led to a shift from human consultation to system consultation. A medical diagnosis system is an expert system containing the coded knowledge of domain experts, which can categorize diseases based on selected symptoms (Lingiardi et al., 2015; Weiss et al., 1978; Moses, 2015; Mutawa and Alzuwawi, 2019). The system applies this coded knowledge through inference mechanisms to make smart decisions. Moreover, deploying these systems can greatly assist medical staff in delivering quality health care services. AI, an umbrella term in intelligent computing, is the design and implementation of machines that behave at the level of a human expert. Expert systems have carved out a niche for themselves in AI compared with other machine learning techniques. Their most recognised area of application is the health care domain, for the detection and prevention of diseases (Pai et al., 2006). The friendly user interfaces and explanation facilities of expert systems have made them among the most popular tools for problem solving.

Materials and methods

The architecture of the proposed Bioinformatics Based Decision Support System (BBDSS) for multi-target disease diagnosis is presented in Figure 3.
Figure 3

Architecture of BBDSS for multi-target disease diagnosis.

Figure 3 shows a fusion of expert system and sequence alignment techniques for the diagnosis of Malaria Fever (MF), Typhoid Fever (TF) and Malaria Typhoid Fever (MTF). The architecture takes its input through the browser: the benchmark diagnosis sequences for the three disease conditions, patient signs and symptoms, and domain expert knowledge. The browser is the interface through which the users (health professional and domain expert) interact with the system and through which diagnosis results are presented. The knowledge base comprises the database and the Rule Base (RB). The database stores the benchmark diagnosis sequences, patient signs and symptoms, and domain expert knowledge, while the RB is made up of a set of IF-THEN rules describing the benchmark diagnosis variables for each of the disease conditions MF, TF and MTF. The sequence alignment component receives the patient signs and symptoms (the input sequence) and applies the global alignment technique with constant penalty to match the input sequence against each of the three benchmark sequences in turn. The technique generates an optimal alignment and determines the disease condition of the patient by comparing the alignment scores for the three benchmark diagnosis sequences. Finally, the benchmark with the best alignment score is returned as the diagnosis result for the patient. The process flow of the sequence alignment component is presented in detail in Figure 4.
Figure 4

Flowchart of the proposed system.


Diagnosis variables

The system is given an input sequence of patient signs and symptoms, and benchmark sequences of domain expert rules for the diagnosis of MF, TF and MTF respectively. Let Eqs. (4) and (5) represent the input and benchmark diagnosis sequences respectively, such that their elements represent the joint disease diagnosis variables for the possible diagnosis of MF, TF and MTF listed in Table 1. Table 1 shows each joint disease diagnosis variable with its value and abbreviation code. Depending on how the input sequence for a patient aligns with the benchmark diagnosis sequences, MF, TF or MTF can be diagnosed.
Table 1

MF, TF, MTF diagnosis variables.

| SN | MF, TF, MTF diagnosis variable | Value | Code |
|----|--------------------------------|-------|------|
| 1 | Generalized Body Pain | Y/N | GBP |
| 2 | Generalized Body Discomfort (Malaise) | Y/N | GBD |
| 3 | Generalized Body Weakness | Y/N | GBW |
| 4 | Sweating Profusely | Y/N | SP |
| 5 | Vomiting | Y/N | VMT |
| 6 | Loss of Appetite | Y/N | LOA |
| 7 | Bitter taste in your Throat | Y/N | BT |
| 8 | Diarrhea (discharging faeces from the bowels frequently in liquid form) | Y/N | DHE |
| 9 | Type of Fever (Intermittent or Remittent) | I/R | TOF |
| 10 | Abdominal Distension (Swelling Stomach) | Y/N | AD |
| 11 | Abdominal Pain | Y/N | AP |
| 12 | Constipation | Y/N | CON |
| 13 | Loss of Weight | Y/N | LOW |
| 14 | Extreme Muscle Weakness | Y/N | EMW |
| 15 | Confusion | Y/N | CF |
| 16 | Irrational Talking | Y/N | IRR |
| 17 | Epistaxis (Bleeding nose) | Y/N | EPI |

MF = Malaria Fever, TF = Typhoid Fever, MTF = Malaria Typhoid Fever, Y = Yes, N = No, I = Intermittent, R = Remittent.

Database

The database is a repository that stores the benchmark diagnosis sequences, patient signs and symptoms, and domain expert knowledge, including the symptom details that patients provide when interacting with the system. It receives the input diagnosis variables and the domain expert knowledge through the browser interface. Because the database holds patient data, the authors sought and obtained ethical approval from the Landmark University Research Ethical Board. The approval was given by Landmark University in collaboration with its medical center.

Rule base

The rule base for MF, TF and MTF is composed of the benchmark diagnosis variables combined in a set of IF-THEN rules, in which the IF-parts consist of the diagnosis variables joined by the AND operator and the THEN-parts give the diagnosis decisions. The rules that constitute the benchmark sequences were formulated with the knowledge of domain experts. Table 2 presents the rule base of benchmark sequences for the three disease conditions. The three disease conditions share the same diagnosis variables but have different diagnosis values. When an input sequence is aligned with the benchmark sequences, the rule for the disease condition with the maximum alignment score is fired; if all the alignment scores are below a specified percentage threshold, the diagnosis result returns no decision.
Table 2

MF, TF, MTF benchmark sequences.

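| Disease | GBP | GBD | GBW | SP | VMT | LOA | BT | DHE | TOF | AD | AP | CON | LOW | EMW | CF | IRR | EPI |
|---------|-----|-----|-----|----|-----|-----|----|-----|-----|----|----|-----|-----|-----|----|-----|-----|
| MF | Y | Y | Y | Y | Y | Y | Y | Y | I | N | N | N | N | N | N | N | N |
| TF | N | N | Y | N | Y | Y | N | Y | R | Y | Y | Y | Y | Y | Y | Y | Y |
| MTF | Y | Y | Y | Y | Y | Y | Y | Y | R | Y | Y | Y | Y | Y | Y | Y | Y |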
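
The rule-firing behaviour described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' Java implementation: the names `BENCHMARKS`, `alignment_score` and `diagnose` are hypothetical, the benchmark strings are transcribed from Table 2, and the score is assumed to be the fraction of matching positions between equal-length sequences with no gaps, which reproduces the scores reported for patient 001 in Table 3. The 0.70 threshold is the 70% cut-off stated in the implementation section.

```python
# Benchmark diagnosis sequences transcribed from Table 2
# (order: GBP GBD GBW SP VMT LOA BT DHE TOF AD AP CON LOW EMW CF IRR EPI).
BENCHMARKS = {
    "malaria": "YYYYYYYYINNNNNNNN",
    "typhoid": "NNYNYYNYRYYYYYYYY",
    "malaria typhoid": "YYYYYYYYRYYYYYYYY",
}


def alignment_score(input_seq, benchmark):
    """Fraction of positions where the two equal-length sequences match."""
    if len(input_seq) != len(benchmark):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(input_seq, benchmark))
    return matches / len(benchmark)


def diagnose(input_seq, threshold=0.70):
    """Return (decision, scores); decision is 'neither' below the threshold."""
    scores = {d: alignment_score(input_seq, b) for d, b in BENCHMARKS.items()}
    best = max(scores, key=scores.get)
    decision = best if scores[best] >= threshold else "neither"
    return decision, scores


# Patient 001 from Table 3, with 1/0 rewritten as Y/N (assumed encoding).
patient_001 = "YYYNNYYYINNNYYNNN"
decision, scores = diagnose(patient_001)
print(decision, {d: round(s, 4) for d, s in scores.items()})
```

A tie between two benchmarks above the threshold is not addressed in the text; in this sketch `max` simply returns the first of the tied entries, so a real system would need an explicit tie-breaking rule.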

Dot plot matrix

A dot plot matrix, or similarity matrix, is constructed to enumerate all the possible alignments between a given input diagnosis sequence and each of the benchmark diagnosis sequences. The similarity matrix characterises all the possible combinations of variables and their resulting scores. It is defined by an alignment matrix and a scoring matrix.
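
As an illustration, a dot plot for two short sequences can be built by marking every pair of positions whose characters are equal. The sketch below is illustrative only: `dot_plot` is a hypothetical helper, and the two sequences are made-up examples rather than patient data.

```python
def dot_plot(seq_a, seq_b):
    """Return a len(seq_a) x len(seq_b) matrix with 1 where characters match."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


# Hypothetical short sequences over the alphabet {Y, N}.
matrix = dot_plot("YNY", "YYN")
for row in matrix:
    print(" ".join(str(cell) for cell in row))
```

Runs of 1s along a diagonal correspond to stretches where the two sequences agree position by position, which is what an alignment algorithm then scores.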

Alignment

Alignment operation

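Given two sequences $a$ and $b$ defined over an alphabet $\Sigma$, an alignment operation is a pair $(x, y) \in (\Sigma \cup \{-\}) \times (\Sigma \cup \{-\})$. Note that $- \notin \Sigma$, but $(x, y) \neq (-, -)$. We call $(x, y)$ a substitution iff $x \neq -$ and $y \neq -$, a deletion (del) iff $y = -$, and an insertion (in) iff $x = -$.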

Alignment matrix

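Let $a = a_1 \ldots a_n$ and $b = b_1 \ldots b_m$. The alignment matrix of $a$ and $b$ is the $(n+1) \times (m+1)$ matrix $D = (D_{ij})$ defined by (6): $D_{ij} = \max \{\, w(a', b') \mid (a', b') \text{ is an alignment of } a_1 \ldots a_i \text{ and } b_1 \ldots b_j \,\}$, where $w$ is the cost function.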

Scoring matrix

We use a scoring scheme, or cost function, that simply gives a value of 1 for each match and 0 for each mismatch, using the constant penalty as in (1). A simple scoring scheme (7) is sufficient for the alignment between an input diagnosis sequence and the benchmark diagnosis sequences, because all sequences are of equal length and no gap allowance is needed in the system.

Trace-back

Once the alignment matrix has been computed with the cost function, the entry $D_{nm}$ gives the maximum score among all possible alignments. The optimal alignment is recovered by tracing back from the bottom-right cell. Start in (n, m). For every cell (i, j), determine the optimal case by comparing its value with the three possible sources (match, insert and delete). The sequence of trace arrows with the maximum score gives the optimal alignment. The number of possible global alignments between an input diagnosis sequence and a benchmark diagnosis sequence of length N can be represented as in (8).
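
The fill and trace-back steps can be sketched as follows. This is a generic Needleman-Wunsch sketch and not the authors' code. The function name `global_align` is hypothetical, the match score of 1 and mismatch score of 0 follow the scoring scheme described above, and the sketch uses a simple per-position gap penalty of 1 (an assumption, swapped in for the constant penalty named in the paper so that the recurrence stays short).

```python
def global_align(a, b, match=1, mismatch=0, gap=1):
    """Needleman-Wunsch: return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # D[i][j] = best score for aligning a[:i] with b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = -gap * i
    for j in range(1, m + 1):
        D[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,  # match or substitution
                          D[i - 1][j] - gap,    # deletion (gap in b)
                          D[i][j - 1] - gap)    # insertion (gap in a)
    # Trace back from the bottom-right cell (n, m).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("YNY", "YY"))  # (1, 'YNY', 'Y-Y')
```

When several alignments share the maximum score, this trace-back prefers the diagonal move, so it returns one optimal alignment rather than all of them.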

Research hypotheses

To investigate whether the mean score of the proposed system is statistically equal to that of the orthodox system, the following research hypotheses were formulated, where H0 denotes the null hypothesis and H1 denotes the alternative hypothesis.
H0: μ1 = μ2: The paired mean scores of the orthodox system and the proposed system are equal, that is, they differ on average by a small margin at most.
H1: μ1 ≠ μ2: The paired mean scores of the orthodox system and the proposed system are not equal, that is, they differ on average by a large margin.

Implementation results and discussion

The proposed diagnosis system was implemented in the Java programming language using the NetBeans IDE 8.0.2 environment, with MySQL as the database management system. Figure 5 presents the interface through which a health professional enters the input diagnosis values, in the form of signs and symptoms, for a diagnosis decision. The health professional moves to the next input diagnosis value by pressing the next button. The diagnosis information was gathered from experts on the symptoms of malaria and typhoid. These symptoms led to a total of 17 questions being asked on the GUI of the program, each with a “Yes” or “No” option. Each selected option generates a character (i.e. ‘Y’ for “Yes” and ‘N’ for “No”). These characters form a string sequence at the end of the questions. Three benchmark string sequences, representing malaria, typhoid and malaria typhoid respectively, are stored in the rule base of the program.
Figure 5

Patient diagnosis pane.

The generated string is then aligned with the benchmark sequences in the rule base for comparison. The percentage of matches between the generated string and each of the three benchmark sequences determines which disease the patient is most likely suffering from, as in Figure 6. For example, if the generated string has a higher percentage of matches with the malaria sequence than with the other two sequences, then the patient is most likely suffering from malaria. The proposed system is designed so that if the percentage of matches between the generated sequence and every one of the three benchmark sequences is below 70%, the patient is most likely not suffering from any of the three diseases. In that case, the patient is advised to visit a medical doctor (see Figure 4).
Figure 6

Diagnosis decision.

When the ‘‘View Sequence Details’’ button in Figure 6 is pressed, the sequence alignment details for a given patient are displayed as in Figure 7. These details are the results that lead to the diagnosis decision for the patient with ID ‘‘001’’.
Figure 7

Sequence alignment result for patient with ID ‘‘001’’.

Table 3 presents the input diagnosis sequences of fifteen patients for MF, TF and MTF. Column 1 contains the patient identification numbers. Columns 2–18 contain the diagnosis variables. Columns 19, 20 and 21 give the diagnosis scores for MF, TF and MTF respectively. The last column gives the diagnosis decision for each of the fifteen patients.
Table 3

Input diagnosis sequences for MF, TF and MTF.

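| ID | GBP | GBD | GBW | SP | VMT | LOA | BT | DHE | TOF | AD | AP | CON | LOW | EMW | CF | IRR | EPI | Score MF | Score TF | Score MTF | Diagnosis |
|----|-----|-----|-----|----|-----|-----|----|-----|-----|----|----|-----|-----|-----|----|-----|-----|----------|----------|-----------|-----------|
| 001 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | I | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0.7647 | 0.3529 | 0.4706 | malaria |
| 002 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | I | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.7059 | 0.4118 | 0.4118 | malaria |
| 003 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | R | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0.3529 | 0.7647 | 0.5294 | typhoid |
| 004 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | I | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0.4706 | 0.6471 | 0.6471 | neither |
| 005 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | R | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.2353 | 0.8824 | 0.7647 | typhoid |
| 006 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | R | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0.4118 | 0.3529 | 0.3529 | neither |
| 007 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | I | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0.7059 | 0.2941 | 0.5294 | malaria |
| 008 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | R | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0.5294 | 0.7059 | 0.9412 | malaria typhoid |
| 009 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | R | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0.2941 | 0.8235 | 0.5882 | typhoid |
| 010 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | R | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0.4706 | 0.6471 | 0.8824 | malaria typhoid |
| 011 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | R | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0.6471 | 0.4706 | 0.7059 | malaria typhoid |
| 012 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | I | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.8235 | 0.1765 | 0.2941 | malaria |
| 013 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | I | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0.5294 | 0.5882 | 0.7059 | malaria typhoid |
| 014 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | R | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0.3529 | 0.8824 | 0.7647 | typhoid |
| 015 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | R | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0.1765 | 0.7059 | 0.4706 | typhoid |

1 = Yes, 0 = No, I = Intermittent, R = Remittent.

Figure 8 presents the diagnosis category against the number of patients for the fifteen patients.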
Figure 8

Diagnosis category Vs Number of patients.


System evaluation

Since malaria, typhoid and malaria typhoid are diagnosed independently of one another but share diagnosis variables, a one-way ANOVA was used to compare the means of the three independent groups (malaria, typhoid and malaria typhoid) and determine whether there is statistical evidence that the mean values of the diagnosis variables differ significantly. In the proposed system, the diagnosis variables are the health professional's inputs (Y/N) used to diagnose a given disease condition, and disease status indicates which condition the patient has (0 = malaria, 1 = typhoid, 2 = malaria typhoid). ANOVA tests whether there is a statistically significant difference in the diagnosis variables with respect to disease status. The diagnosis variables serve as the dependent variable, and disease status acts as the independent variable. From Table 4, we conclude that the mean of the values on the diagnosis variables is significantly different for at least one of the disease status groups (F(2, 351) = 9.194, p < 0.005). Since ANOVA alone cannot tell us explicitly which means differ from one another, multiple comparisons tests were further used.
Table 4

ANOVA.

                 Sum of Squares   Df    Mean Square   F       Sig.
Between Groups   0.267            2     0.133         9.194   .000
Within Groups    5.093            351   0.015
Total            5.359            353
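To make the F statistic in Table 4 concrete, the following is a minimal pure-Python sketch of a one-way ANOVA. The function name `one_way_anova` and the toy data are hypothetical and are not taken from the paper; the last lines simply recompute F from the rounded sums of squares reported in Table 4.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (hypothetical, not from the paper).
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)

# Recomputing F from the rounded sums of squares reported in Table 4.
f_table = (0.267 / 2) / (5.093 / 351)
print(round(f_table, 2))
```

Because the table entries are rounded to three decimals, the recomputed value is close to, but not exactly, the reported F of 9.194.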
From the ANOVA results, we ascertained that there are statistically significant differences between the groups as a whole. The multiple comparisons in Table 5 show which groups differed from each other. The Tukey post hoc test was used because of its simplicity and because it is a commonly preferred test for post hoc comparisons following a one-way ANOVA. Table 5 shows that there is a statistically significant difference in the values on the diagnosis variables between the malaria and malaria typhoid groups (p < 0.001). However, there were no significant differences between the malaria and typhoid fever groups (p = 0.141), or between the typhoid fever and malaria typhoid groups (p = 0.520).
Table 5

Multiple Comparisons (measure mean difference based on the benchmark sequences).

(I) Category   (J) Category   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
0              1              -.0422215               .0222496     .141   -.094590             .010147
0              2              -.0707797               .0173578     .000   -.111635             -.029925
1              0              .0422215                .0222496     .141   -.010147             .094590
1              2              -.0285582               .0261836     .520   -.090186             .033070
2              0              .0707797                .0173578     .000   .029925              .111635
2              1              .0285582                .0261836     .520   -.033070             .090186

The mean difference is significant at the 0.05 level, 0 = malaria, 1 = typhoid, 2 = malaria typhoid.
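One way to read Table 5 is that a pairwise difference is significant at the 0.05 level exactly when its 95% confidence interval does not contain zero. The sketch below applies that reading to the three distinct pairs; the values are transcribed from Table 5, and the helper name `excludes_zero` is hypothetical.

```python
# (I, J): (mean difference, lower bound, upper bound), taken from Table 5.
comparisons = {
    ("malaria", "typhoid"): (-0.0422215, -0.094590, 0.010147),
    ("malaria", "malaria typhoid"): (-0.0707797, -0.111635, -0.029925),
    ("typhoid", "malaria typhoid"): (-0.0285582, -0.090186, 0.033070),
}


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the whole confidence interval lies on one side of zero."""
    return lower > 0 or upper < 0


significant = {pair: excludes_zero(lo, hi) for pair, (_, lo, hi) in comparisons.items()}
for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```

Only the malaria versus malaria typhoid interval lies entirely below zero, which agrees with the Sig. column of Table 5.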

In order to evaluate the performance of the proposed diagnosis system in terms of accuracy, we performed a comparative analysis of one hundred and twenty random system diagnostic findings against those obtained from the equivalent orthodox method, as shown in Tables 6a, 6b and 6c. Tables 6a, 6b and 6c give total diagnosis accuracies for the proposed system of 38.1565, 38.703 and 39.5497 respectively, each summed over forty patients. The mean accuracy is computed as the sum of the three totals divided by the 120 data points.
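The mean accuracy calculation described above amounts to the following arithmetic. This is an illustrative sketch using the three totals quoted in the text; the variable names are hypothetical.

```python
# Total accuracies reported for Tables 6a, 6b and 6c (40 patients each).
table_totals = [38.1565, 38.703, 39.5497]
n_patients = 120

mean_accuracy = sum(table_totals) / n_patients
print(f"{mean_accuracy:.4f}")         # mean accuracy as a fraction
print(f"{mean_accuracy * 100:.0f}%")  # expressed as a percentage
```

The three totals sum to 116.4092, so the mean accuracy is about 0.9701, that is, roughly 97%.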
Table 6a

Comparative analysis.

Patient ID   DROM     DRPS     e_i      (1 - e_i)   Disease status
1            0.8128   0.7647   0.0481   0.9519      malaria
2            0.7638   0.7059   0.0579   0.9421      malaria
3            0.7925   0.7647   0.0278   0.9722      typhoid
4            0.8996   0.8824   0.0172   0.9828      malaria typhoid
5            0.9824   0.9412   0.0412   0.9588      typhoid
6            0.9412   0.8745   0.0667   0.9333      malaria
7            0.8235   0.7864   0.0371   0.9629      malaria
8            0.8824   0.8231   0.0593   0.9407      malaria typhoid
9            0.7059   0.6942   0.0117   0.9883      neither
10           0.8235   0.7864   0.0371   0.9629      malaria typhoid
11           0.7059   0.6942   0.0117   0.9883      neither
12           0.8824   0.8231   0.0593   0.9407      malaria
13           0.7059   0.6942   0.0117   0.9883      neither
14           0.8150   0.7833   0.0317   0.9683      typhoid
15           0.8525   0.7814   0.0711   0.9289      typhoid
16           0.8688   0.8134   0.0554   0.9446      malaria typhoid
17           0.8868   0.8113   0.0755   0.9245      malaria typhoid
18           0.7730   0.7025   0.0705   0.9295      malaria
19           0.8753   0.8227   0.0526   0.9474      typhoid
20           0.6731   0.6221   0.0510   0.9490      neither
21           0.7654   0.7243   0.0411   0.9589      malaria
22           0.9388   0.8267   0.1121   0.8879      typhoid
23           0.9134   0.8544   0.0590   0.9410      malaria
24           0.7674   0.7215   0.0459   0.9541      malaria
25           0.7445   0.7087   0.0358   0.9642      typhoid
26           0.8653   0.8262   0.0391   0.9609      malaria
27           0.8444   0.8045   0.0399   0.9601      malaria typhoid
28           0.8529   0.8153   0.0376   0.9624      typhoid
29           0.7806   0.7234   0.0572   0.9428      malaria
30           0.7441   0.7167   0.0274   0.9726      malaria
31           0.7402   0.7014   0.0388   0.9612      malaria
32           0.7961   0.7392   0.0569   0.9431      malaria typhoid
33           0.7964   0.7554   0.0410   0.9590      malaria
34           0.9581   0.9221   0.0360   0.9640      typhoid
35           0.8298   0.7552   0.0746   0.9254      malaria typhoid
36           0.7515   0.7261   0.0254   0.9746      malaria
37           0.8704   0.8278   0.0426   0.9574      typhoid
38           0.7041   0.6447   0.0594   0.9406      neither
39           0.6762   0.6255   0.0507   0.9493      neither
40           0.7719   0.7435   0.0284   0.9716      malaria
Total                          1.8435   38.1565

DROM = Diagnosis Results of the Orthodox Method, DRPS = Diagnosis Results of the Proposed System, e_i = error in diagnosis, and (1 - e_i) = accuracy of the proposed system.
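The error and accuracy columns can be reproduced from the DROM and DRPS columns. The sketch below assumes that e_i is the difference DROM - DRPS, which is consistent with the rows of Table 6a but is not stated explicitly in the legend; the variable names are hypothetical.

```python
# (patient ID, DROM, DRPS) for the first three rows of Table 6a.
rows = [(1, 0.8128, 0.7647), (2, 0.7638, 0.7059), (3, 0.7925, 0.7647)]

results = []
for patient_id, drom, drps in rows:
    error = round(drom - drps, 4)   # e_i, assumed to be DROM - DRPS
    accuracy = round(1 - error, 4)  # (1 - e_i)
    results.append((patient_id, error, accuracy))
    print(patient_id, error, accuracy)
```

Summing the accuracy column over all forty rows of a table gives the total reported in its last line.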

Table 6b

Comparative analysis.

Patient ID   DROM     DRPS     e_i      (1 - e_i)   Disease status
41           0.7560   0.7157   0.0403   0.9597      typhoid
42           0.8692   0.7841   0.0851   0.9149      malaria
43           0.8110   0.7981   0.0129   0.9871      malaria typhoid
44           0.8419   0.7898   0.0521   0.9479      malaria typhoid
45           0.8154   0.7628   0.0526   0.9474      malaria
46           0.8641   0.8124   0.0517   0.9483      typhoid
47           0.7765   0.7283   0.0482   0.9518      malaria
48           0.8477   0.7768   0.0709   0.9291      malaria
49           0.8585   0.8161   0.0424   0.9576      typhoid
50           0.8395   0.7827   0.0568   0.9432      malaria
51           0.7357   0.6742   0.0615   0.9385      neither
52           0.8660   0.8098   0.0562   0.9438      malaria
53           0.7165   0.6731   0.0434   0.9566      neither
54           0.8125   0.7815   0.0310   0.9690      malaria typhoid
55           0.8614   0.8198   0.0416   0.9584      malaria typhoid
56           0.8524   0.8354   0.0170   0.9830      malaria
57           0.7407   0.7297   0.0110   0.9890      malaria
58           0.8498   0.8192   0.0306   0.9694      malaria
59           0.7322   0.7157   0.0165   0.9835      typhoid
60           0.8082   0.7977   0.0105   0.9895      malaria typhoid
61           0.8734   0.8671   0.0063   0.9937      typhoid
62           0.7345   0.6924   0.0421   0.9579      neither
63           0.8548   0.8486   0.0062   0.9938      malaria typhoid
64           0.8101   0.7948   0.0153   0.9847      malaria
65           0.8277   0.8042   0.0235   0.9765      malaria
66           0.8415   0.8171   0.0244   0.9756      typhoid
67           0.8613   0.8537   0.0076   0.9924      malaria
68           0.7109   0.6898   0.0211   0.9789      neither
69           0.7645   0.7571   0.0074   0.9926      malaria
70           0.8051   0.7732   0.0319   0.9681      malaria
71           0.9189   0.8848   0.0341   0.9659      typhoid
72           0.9334   0.8928   0.0406   0.9594      typhoid
73           0.7635   0.7423   0.0212   0.9788      malaria
74           0.8903   0.8644   0.0259   0.9741      malaria
75           0.8384   0.8192   0.0192   0.9808      typhoid
76           0.7217   0.6911   0.0306   0.9694      neither
77           0.7979   0.7533   0.0446   0.9554      malaria
78           0.8142   0.7864   0.0278   0.9722      malaria
79           0.9125   0.8881   0.0244   0.9756      typhoid
80           0.7058   0.6953   0.0105   0.9895      neither
Total                          1.2970   38.7030

DROM = Diagnosis Results of the Orthodox Method, DRPS = Diagnosis Results of the Proposed System, e_i = error in diagnosis, and (1 - e_i) = accuracy of the proposed system.

Table 6c

Comparative analysis.

Patient ID   DROM     DRPS     e_i      (1 - e_i)   Disease status
81           0.8201   0.8084   0.0117   0.9883      malaria
82           0.7872   0.7752   0.0120   0.9880      malaria
83           0.7922   0.7835   0.0087   0.9913      malaria typhoid
84           0.7672   0.7557   0.0115   0.9885      typhoid
85           0.9092   0.8961   0.0131   0.9869      typhoid
86           0.7978   0.7874   0.0104   0.9896      malaria
87           0.8770   0.8661   0.0109   0.9891      typhoid
88           0.8549   0.8472   0.0077   0.9923      malaria
89           0.7406   0.7391   0.0015   0.9985      malaria
90           0.8974   0.8891   0.0083   0.9917      malaria typhoid
91           0.8808   0.8781   0.0027   0.9973      malaria typhoid
92           0.7914   0.7847   0.0067   0.9933      malaria
93           0.8748   0.8531   0.0217   0.9783      typhoid
94           0.7087   0.6859   0.0228   0.9772      neither
95           0.7847   0.7781   0.0066   0.9934      malaria
96           0.8816   0.8791   0.0025   0.9975      typhoid
97           0.8988   0.8869   0.0119   0.9881      typhoid
98           0.8903   0.8731   0.0172   0.9828      malaria
99           0.8767   0.8634   0.0133   0.9867      typhoid
100          0.7775   0.7676   0.0099   0.9901      malaria
101          0.8770   0.8696   0.0074   0.9926      malaria
102          0.8704   0.8583   0.0121   0.9879      typhoid
103          0.8435   0.8247   0.0188   0.9812      malaria typhoid
104          0.7963   0.7819   0.0144   0.9856      malaria
105          0.8764   0.8665   0.0099   0.9901      malaria typhoid
106          0.7327   0.7172   0.0155   0.9845      malaria
107          0.7619   0.7583   0.0036   0.9964      typhoid
108          0.7312   0.7181   0.0131   0.9869      typhoid
109          0.7471   0.7393   0.0078   0.9922      malaria
110          0.8396   0.8175   0.0221   0.9779      malaria
111          0.8138   0.8005   0.0133   0.9867      malaria typhoid
112          0.7819   0.7783   0.0036   0.9964      malaria
113          0.7908   0.7877   0.0031   0.9969      malaria
114          0.9330   0.9275   0.0055   0.9945      typhoid
115          0.8079   0.7819   0.0260   0.9740      malaria typhoid
116          0.7819   0.7783   0.0036   0.9964      malaria
117          0.8056   0.7883   0.0173   0.9827      malaria
118          0.7755   0.7505   0.0250   0.9750      typhoid
119          0.7838   0.7701   0.0137   0.9863      malaria
120          0.9010   0.8976   0.0034   0.9966      malaria
Total                          0.4503   39.5497

DROM = Diagnosis Results of the Orthodox Method, DRPS = Diagnosis Results of the Proposed System, e_i = error in diagnosis, and (1 - e_i) = accuracy of the proposed system.

Therefore, the mean accuracy (MA) of the proposed diagnosis system and its efficiency (E) are computed from the totals of Tables 6a, 6b and 6c, giving MA = (38.1565 + 38.703 + 39.5497) / 120 ≈ 0.9701. From the evaluation result, it can be concluded that the proposed diagnosis system is most efficient at diagnosing malaria and malaria typhoid, at 97% accuracy. The calculated MA and E values suggest that the proposed system is 97% accurate. To test whether the diagnosis scores of DROM and DRPS differ significantly, we use a paired t-test, which tests for differences between the two diagnosis methods based on the measured scores. Table 7 presents the t-test statistics for the two diagnosis methods under comparison. The two-tailed p-value of 2.24734E-29 in Table 7 means that the averages for DROM and DRPS are significantly different. In other words, the mean values of diagnosis from the orthodox system differ from those of the proposed system.
Table 7

t-test statistics.

                               DROM          DRPS
Mean                           0.81728       0.788273333
Variance                       0.004397582   0.004419468
Observations                   120           120
Pearson Correlation            0.949619763
Hypothesized Mean Difference   0
df                             119
t Stat                         15.07592871
P(T<=t) one-tail               1.12367E-29
t Critical one-tail            1.657759285
P(T<=t) two-tail               2.24734E-29
t Critical two-tail            1.980099876
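The statistic in Table 7 is that of a paired test (df = 119 for 120 pairs). A minimal sketch of how such a paired t statistic is formed is given below; the function name `paired_t` is hypothetical, and only the first five DROM and DRPS pairs of Table 6a are used, so the output illustrates the calculation and does not reproduce the value in Table 7.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = fmean(diffs)
    std_err = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / std_err, n - 1


# First five DROM and DRPS scores from Table 6a (a small subset, for illustration).
drom = [0.8128, 0.7638, 0.7925, 0.8996, 0.9824]
drps = [0.7647, 0.7059, 0.7647, 0.8824, 0.9412]
mean_diff, t_stat, df = paired_t(drom, drps)
print(round(mean_diff, 4), round(t_stat, 2), df)
```

Applying the same function to all 120 pairs would be the way to check the t statistic reported in Table 7.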
In order to examine whether the mean value of the proposed system is statistically the same as that of the orthodox system, an equivalence test was conducted. In other words, the equivalence test investigates whether the accuracy of the proposed system is as good as that of the orthodox system. Table 8 shows the results of the equivalence test between the orthodox and the proposed diagnosis systems. From the results, it can be concluded that:
Table 8

Equivalence test.

Paired samples statistics
              Mean       N     Std. Deviation   Std. Error Mean
DROM          0.817280   120   0.0663143        0.0060536
DRPS          0.787357   120   0.0668423        0.0061018

Paired correlation
              N     Correlation   Sig.
DROM & DRPS   120   0.947         0.000

Paired differences
              Mean        t        df    Sig. (2-tailed)   95% CI Lower   95% CI Upper
DROM-DRPS     0.0299233   15.091   119   0.000             0.0259971      0.0338496

∗Lower = Lower equivalence bound, ∗Upper = Upper equivalence bound.

DROM and DRPS scores were strongly and positively correlated (r = 0.947, p < 0.001). There was a significant average difference between DROM and DRPS scores (t(119) = 15.091, p < 0.001). On average, DROM scores were 0.0299233 points higher than DRPS scores (95% CI [0.0259971, 0.0338496]). Since the orthodox system and the proposed system differ on average by a small paired mean difference of 0.0299233, and the probability level for the equivalence test is less than the recommended alpha value (0.05), the null hypothesis can be accepted and the accuracy of the proposed system can be considered valid.
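The 95% confidence interval quoted above can be reconstructed from the other reported figures. The sketch below assumes the interval is computed as mean difference ± t_critical × standard error, with the standard error implied by t = mean difference / standard error; the inputs are taken from Tables 7 and 8, and the variable names are hypothetical.

```python
# Values reported in Tables 7 and 8.
mean_diff = 0.0299233     # mean of DROM - DRPS
t_stat = 15.091           # paired t statistic, df = 119
t_critical = 1.980099876  # two-tailed critical value at alpha = 0.05

std_err = mean_diff / t_stat  # standard error implied by t = mean / SE
margin = t_critical * std_err
ci_lower, ci_upper = mean_diff - margin, mean_diff + margin
print(round(ci_lower, 5), round(ci_upper, 5))
```

The reconstructed bounds agree with the reported interval to within rounding of the inputs.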

Conclusion and future work

This paper developed a decision support system for malaria, typhoid fever and malaria typhoid diagnosis using a bioinformatics approach. The system is a hybrid of an expert system and global alignment with constant penalty. Malaria and typhoid fever have similar symptoms and are known to co-exist in the human body, hence the need for an efficient method of detecting these disease conditions. The architecture of the proposed system takes, through the browser, input diagnosis variables representing the benchmark diagnosis sequences for the three disease conditions, patient signs and symptoms supplied by a health professional, and domain expert knowledge. The browser is the interface through which a health professional interacts with the system and provides diagnosis results to the outside world. The knowledge base comprises the database and the rule base. The database stores benchmark diagnosis sequences, patient signs and symptoms, and domain expert knowledge, while the rule base is made up of a set of IF-THEN rules depicting the benchmark diagnosis variables for each of the disease conditions of malaria, typhoid fever and malaria typhoid. The matching engine component receives the input sequence and applies the global alignment technique with constant penalty to match the input sequence against each of the three benchmark sequences in turn. The global alignment technique with constant penalty generates the optimal alignment and determines the disease condition of the patient by comparing the alignment scores for the three benchmark diagnosis sequences.

We used ANOVA to compare the means of the three independent groups (malaria, typhoid and malaria typhoid) to determine whether there is statistical evidence that the mean values of the diagnosis variables differ significantly. The ANOVA results indicate that the mean of the values on the diagnosis variables is significantly different for at least one of the disease status groups. Multiple comparisons tests were then used to identify explicitly which means differed from one another. The multiple comparisons results showed a statistically significant difference in the values on the diagnosis variables between the malaria and malaria typhoid groups. Conversely, there were no significant differences between the malaria and typhoid fever groups, or between the typhoid fever and malaria typhoid groups. A t-test was used to test for differences in the diagnosis scores between DROM and DRPS. Its results indicate that the mean values of diagnosis from the orthodox system differ from those of the proposed system. An equivalence test was also conducted to investigate whether the accuracy of the proposed system is as good as that of the orthodox system, and its result validates the accuracy of the proposed system. Finally, the evaluation of the proposed system showed a high efficiency for malaria and malaria typhoid diagnosis.

One limitation of the proposed system is related to the age of the sick person. Children are unlikely to give accurate details of their symptoms unless a health professional carries out a physical investigation, which is a clear limitation. In the future, the system will include a physical examination category for sick children. Secondly, it is recommended that reinforcement learning be adapted to improve the optimality of the sequence alignment method.

Declarations

Author contribution statement

F. E. Ayo: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Wrote the paper. J. B. Awotunde: Performed the experiments. R. O. Ogundokun: Analyzed and interpreted the data. S. O. Folorunso, A. O. Adekunle: Contributed reagents, materials, analysis tools or data.

Funding statement

This research is fully sponsored by Landmark University Centre for Research and Development, Landmark University, Omu-Aran, Kwara State, Nigeria.

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.