Literature DB >> 21346864

Early diagnosis of systemic lupus erythmatosus using ANN models of dsDNA binding antibody sequence data.

Mohamad Hasan Bahari, Mahmoud Mahmoudi, Asad Azemi, Mir Mojtaba Mirsalehi, Morteza Khademi.

Abstract

In this paper a new method based on artificial neural networks (ANN), is introduced for identifying pathogenic antibodies in Systemic Lupus Erythmatosus (SLE). dsDNA binding antibodies have been implicated in the pathogenesis of this autoimmune disease. In order to identify these dsDNA binding antibodies, the protein sequences of 42 dsDNA binding and 608 non-dsDNA binding antibodies were extracted from Kabat database and encoded using a physicochemical property of their amino acids namely Hydrophilicity. Encoded antibodies were used as the training patterns of a general regression neural network (GRNN). Simulation results show that the accuracy of proposed method in recognizing dsDNA binding antibodies is 83.2%. We have also investigated the roles of the light and heavy chains of anti-dsDNA antibodies in binding to DNA. Simulation results concur with the published experimental findings that in binding to DNA, the heavy chain of anti-dsDNA is more important than their light chain.

Entities: Disease Species

Keywords: Anti-dsDNA; Antibody; General Regression Neural Network (GRNN); Systemic Lupus Erythematosus

Year: 2010 PMID： 21346864 PMCID： PMC3039990 DOI： 10.6026/97320630005058

Source DB: PubMed Journal: Bioinformation ISSN： 0973-2063

Background

Systemic Lupus Erythematosus (SLE or ‘lupus’) is a major autoimmune rheumatic disease where autoantibodies frequently target against intracellular antigens of the cell nucleus (double and single stranded DNA). According to the result of an experiment on more than five million U.S. Armed Forces personnel, in 115 of the 130 patients with SLE (88 percent), at least one SLE autoantibody tested was present 3.3 years before the diagnosis [1]. This fact suggests that SLE can be predicted several years before the diagnosis. In patients with SLE, a wide variety of antibodies against nuclear antigens can be found, including antibodies to nucleic acids, histones and nonhistone nuclear proteins. However, anti-dsDNA antibodies are considered the most important autoantiboies in SLE. These antibodies were identified, for the first time, nearly forty years ago in the serum of patients with SLE [2]. Based on similar findings, many researchers have concluded that there is a close relationship between disease activity and levels of anti-dsDNA antibodies. In other words, dsDNA binding antibodies have been implicated in the pathogenesis of this autoimmune disease [3]. Anti-dsDNA antibodies seem highly likely to cause tissue damage in patients with SLE. They are present in the serum of patients several years before its diagnosis. Therefore, their identification and analysis may be useful in the identification of patients who would benefit from early diagnosis, as well as patients who do not require further evaluation. In addition, a powerful method of their analysis may lead to design of new drugs that interfere with antibody―DNA interactions, which might have therapeutic applications. Recently, bioinformatics approaches, including statistical techniques and intelligent systems, have tried to obtain a better grasp of the DNA-binding structures and utilize them in an early prediction of autoimmune diseases such as SLE. Although these kinds of works have been able to explain many biological phenomena, lead to the development of mathematical models for prediction and lead to reduction in the cost of experimental research, there is still no accurate mathematical method for identifying dsDNA binding antibodies [4]. We can trace the roots of this difficulty in the long amino acid sequence of antibodies, lack of suitable mathematical tools, and more importantly the vast diversity of antibodies. In this paper, we extended the computer-assisted autoantibody analysis methods by proposing a new approach with the capability of identifying dsDNA binding antibodies. The proposed identification system is constructed using a General Regression Neural Network (GRNN). In the proposed method, we encode the amino acid sequence of antibodies using Hydrophilicity values of amino acids. These signals are fed to a GRNN. In order to better interpret the role of different parts of the antidsDNA antibodies, we have investigated the roles of light and heavy chains and their bindings to DNA. In general, the proposed method provides an efficient and accurate method to identify the dsDNA.

Methodology

Dataset

In order to use ANN to identify dsDNA binding antibodies the training set needs to include a set of anti-dsDNA, as the positive set, and a set of other non dsDNA binding antibodies, as the negative set. Our training data set was constructed using the Kabat database. Kabat antibody sequence database contains primary structure and sequence information on antibodies and other proteins of immunological interest [5]. There are 42 sequences of anti-dsDNA (both heavy and light chains) available in the Kabat database, which we used as positive patterns, and we took 608 of the other non dsDNA binding antibody sequences (both heavy and light chains) as negative patterns.

Encoding

As we know, Immunoglobulins are heavy plasma proteins, composed of 20 different amino acids. From a one-dimensional point of view, a protein sequence contains characters of these 20 amino acids. In other words, function of a protein depends on its specific amino acid sequence called the primary sequence structure [6][8]. An important issue in applying computer-based systems to identify dsDNA binding antibodies is how to encode protein sequences; i.e., how to represent the protein sequences as the input of a system. This is crucial to the success of neural network learning process. Several different methods have been used for encoding proteins. For example, Wu [9] has utilized the 2-gram encoding method, which extracts various patterns of two consecutive amino acid residues in a protein sequence and counts the number of occurrences of the extracted residue pairs. Huang [10] has employed another encoding technique using six physiochemical properties (attributes) of amino acids, namely, composition, predicted secondary structure, hydrophobicity, normalized Van Der Waals volume, polarity, and polarizability. In fact, amino acids have many physiochemical properties such as isoelectric pH, surface area, hydrophilicity, polarity and the ability of reaction between the ion and the electron [11][14]. In this work, we have used just one physiochemical property of amino acids, which called Hydrophilicity, for encoding the protein sequence. In other words, the normalized number of the Hydrophilicity of the amino acid is used instead of each amino acid in the protein sequence. Consequently, each antibody was transformed into an array, which is used as the input of the GRNN.

Evaluation Criterion

As mentioned, new method based on ANNs, is introduced for identifying pathogenic antibodies in SLE. The efficacy of the proposed method is evaluated by calculating the following three parameters: positive accuracy, negative accuracy, and general accuracy. ( supplementary material)

Anti-dsDNA Identifier

As mentioned, for the simulation a GRNN was used. For training and testing of the proposed system, we have formed a database, from the KABAT database, consisting of 42 anti-dsDNAs, as the positive data set, and 608 other antibodies, as the negative data set.

A. Training Phase

To train the GRNN, a training set is formed by randomly choosing 33 positive and 99 negative samples from our database. Figure 1a illustrates the general architecture of the system in training phase. In the first layer, each antibody is encoded into an array, based on normalized quantities of the amino acids, Hydrophilicity. In the second layer, a GRNN is trained by arrays obtained from the previous layer so that +1 is assigned as the output for positive samples and -1 is assigned as the output for negative samples.

Figure 1

(a) Architecture of the of dsDNA binding antibodies identification system in training phase. (b) Architecture of dsDNA binding antibodies identification system in testing phase.

B. Testing Phase

To test the proposed scheme, a testing set is formed by randomly choosing 9 positive and 27 negative samples from the part of our database that was not used in the training set. Architecture of dsDNA binding antibodies identification system in testing phase is illustrated in Figure 1b. In the first layer, like the training phase, each antibody is encoded into a discrete signal, based on normalized quantities of the amino acids Hydrophilicity property. In the second layer, signals from the previous step are fed to the neural network. The output of each network will be a number between 1 and +1. By considering a threshold number between 1 and +1, we can separate the anti-dsDNAs from other antibodies. This threshold was selected to be 0.9.

Discussion

It is important to note that no mathematical method has been introduced for identifying anti-dsDNA antibodies yet. Therefore, we could not compare the effectiveness of proposed method with any other introduced scheme. In order to assure the validity of the results, the training, and testing phases of the proposed system were repeated 100 times and the final results were calculated by averaging over all 100 runs. Positive Accuracy, Negative Accuracy and General Accuracy obtained using the proposed methods are 78.55, 82.92 and 81.83, respectively. For investigating the roles of light and heavy chains of anti-dsDNA antibodies in binding to DNA, an experiment was also designed. Heavy chain and light chain of antibodies were extracted from the Kabat database. Next, we repeated the last simulation using heavy and light chains. Table 1 contains these results ( supplementary material). As illustrated in Table 1 ( supplementary material), the heavy chain of anti-dsDNA is more important than the light chain in binding to DNA. Our simulation results confirm the experimental studies [16,17]. Results indicated in Table 1 show that the proposed method, using heavy and light chains, provides a more accurate identification of anti-dsDNA than the heavy chain or light chain alone. The reason is that by considering a more inclusive model, we are including more information and therefore, we should expect better results. Results of Table I also indicate higher numbers for negative accuracies. This is because we have more negative (non anti-dsDNA) samples. Deleted We should mention that in this work negative accuracy (non-dsDNA binding antibodies) is more important than positive accuracy. Furthermore, the available data corresponding to the protein sequence for non-dsDNA binding antibodies are much more available than their counterparts. Using unequal number of data points for non-dsDNA and dsDNA during training will introduce some bias, but as it was mentioned since the negative accuracy is more important this action is justified. Simulations using equal number of data points for non-dsDNA and dsDNA produced much lower negative and general accuracy.

Conclusion

In this paper, we have introduced a new method for identifying pathogenic antibodies in SLE based on GRNN. For identification of dsDNA binding antibodies, the protein sequence of 42 dsDNA binding and 608 nondsDNA binding antibodies were extracted from the KABAT database. Next, they were encoded using Hydrophilicity values of their amino acids. Coded antibodies were used to train a GRNN. The simulation results indicate that the proposed method is very accurate in recognizing antidsDNA antibodies. We have also investigated the roles of light and heavy chains of anti-dsDNA antibodies in binding to DNA. Our simulation results confirmed the experimental findings that the heavy chain is more important than the light chain in regard to binding to DNA.

15 in total

1. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

Authors: Chuen-Der Huang; Chin-Teng Lin; Nikhil Ranjan Pal
Journal: IEEE Trans Nanobioscience Date: 2003-12 Impact factor: 2.935

2. A general regression neural network.

Authors: D F Specht
Journal: IEEE Trans Neural Netw Date: 1991

3. Protein classification artificial neural system.

Authors: C Wu; G Whitson; J McLarty; A Ermongkonchai; T C Chang
Journal: Protein Sci Date: 1992-05 Impact factor: 6.725

4. The arrangement of amino acids in proteins.

Authors: F SANGER
Journal: Adv Protein Chem Date: 1952

5. Accessing the Kabat antibody sequence database by computer.

Authors: A C Martin
Journal: Proteins Date: 1996-05

Review 6. The role of antibodies to DNA in systemic lupus erythematosus--a review and introduction to an international workshop on DNA antibodies held in London, May 1996.

Authors: D A Isenberg; C T Ravirajan; A Rahman; J Kalsi
Journal: Lupus Date: 1997 Impact factor: 2.911

7. A simple method for displaying the hydropathic character of a protein.

Authors: J Kyte; R F Doolittle
Journal: J Mol Biol Date: 1982-05-05 Impact factor: 5.469

8. Human autoantibody recognition of DNA.

Authors: S M Barbas; H J Ditzel; E M Salonen; W P Yang; G J Silverman; D R Burton
Journal: Proc Natl Acad Sci U S A Date: 1995-03-28 Impact factor: 11.205

9. Development of autoantibodies before the clinical onset of systemic lupus erythematosus.

Authors: Melissa R Arbuckle; Micah T McClain; Mark V Rubertone; R Hal Scofield; Gregory J Dennis; Judith A James; John B Harley
Journal: N Engl J Med Date: 2003-10-16 Impact factor: 91.245

Review 10. Serological markers of disease activity in systemic lupus erythematosus.

Authors: P E Spronk; P C Limburg; C G Kallenberg
Journal: Lupus Date: 1995-04 Impact factor: 2.911

1 in total

1. Application of Various Statistical Models to Explore Gene-Gene Interactions in Folate, Xenobiotic, Toll-Like Receptor and STAT4 Pathways that Modulate Susceptibility to Systemic Lupus Erythematosus.

Authors: Yedluri Rupasree; Shaik Mohammad Naushad; Ravi Varshaa; Govindaraj Swathika Mahalakshmi; Konda Kumaraswami; Liza Rajasekhar; Vijay Kumar Kutala
Journal: Mol Diagn Ther Date: 2016-02 Impact factor: 4.074

1 in total