Hao Wan¹, Jina Zhang², Yijie Ding³, Hetian Wang⁴, Geng Tian².
Abstract
Immunoglobulins play a pivotal role in disease regulation, so identifying them accurately is important for developing new drugs and for research on related diseases. Rather than relying on high-dimensional features, this study examined a method that distinguishes immunoglobulins from non-immunoglobulins using only two features, FC* and GC*. On 228 samples (109 immunoglobulins and 119 non-immunoglobulins), the J48 classifier implemented in Weka achieved an overall accuracy of 80.7% in 10-fold cross-validation. The FC* feature identified in this study was located in the immunoglobulin subtype domain, indicating that the extracted feature can represent functional and structural properties of immunoglobulins for prediction.
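For context, 80.7% of 228 samples corresponds to 184 correctly classified sequences, compared with a majority-class baseline of 119/228 ≈ 52.2% (always predicting "non-immunoglobulin"). The minimal Python sketch below shows how a pooled, stratified 10-fold cross-validation accuracy of this kind is computed on two features. It is not the authors' pipeline: Weka's J48 (a C4.5 decision tree) is swapped for a one-split decision stump to keep the code short, the two feature columns merely stand in for FC* and GC* (whose extraction the abstract does not describe), and the functions `stratified_folds`, `fit_stump`, `predict` and `cross_val_accuracy` are hypothetical names. The data in the usage lines is synthetic.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    slot = 0  # running counter so fold sizes stay balanced across classes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)
            slot += 1
    return folds


def fit_stump(X, y):
    """Pick the (feature, threshold, polarity) with the fewest training errors.

    Stand-in for J48: a depth-1 tree, not C4.5's entropy-based growing and pruning.
    """
    best, best_errors = None, None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            for positive_above in (True, False):
                errors = sum(
                    int((row[f] > t) == positive_above) != label
                    for row, label in zip(X, y)
                )
                if best_errors is None or errors < best_errors:
                    best, best_errors = (f, t, positive_above), errors
    return best


def predict(model, row):
    """Return 1 (immunoglobulin) or 0 (non-immunoglobulin) for one sample."""
    f, t, positive_above = model
    return int((row[f] > t) == positive_above)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Pooled k-fold accuracy: correct test predictions over all folds / n."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_stump([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    # Synthetic, cleanly separable toy data; NOT the paper's 228 samples.
    X = [(float(v), 0.5) for v in range(10)] + [(float(v), 0.5) for v in range(20, 30)]
    y = [0] * 10 + [1] * 10
    print(cross_val_accuracy(X, y))  # 1.0
```

Weka's cross-validation summary likewise pools correctly classified instances across all folds. A faithful replication would replace `fit_stump` with a full decision-tree learner and use the real FC* and GC* feature values.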
Keywords: MRMD; autoprop; immunoglobulin classification; key feature extraction; machine learning
Year: 2022 PMID: 35140745 PMCID: PMC8819591 DOI: 10.3389/fgene.2021.827161
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Flowchart of identifying immunoglobulins.
FIGURE 2 Classification accuracy comparison between models with different feature selection methods.
FIGURE 3 Scatter plot of GC* and FC* features.
FIGURE 4 Motif discovered among immunoglobulin sequences using the MEME tool; the height of each letter indicates its relative frequency at the given position within the motif.
FIGURE 5 Shared motif and its secondary structure (from PDB entry 3wyr), identified using InterProScan.
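Figure 4's caption describes letter heights as relative frequencies at each motif position. Once aligned motif occurrences are known, that per-position frequency table can be computed as in the minimal Python sketch below. MEME itself discovers motifs with a statistical model fitted by expectation maximization, which is not reproduced here; `position_frequencies` is a hypothetical helper, and the example sites are made-up strings, not the motif reported in the paper.

```python
from collections import Counter


def position_frequencies(sites):
    """For each motif column, map residue -> relative frequency across the sites."""
    if not sites:
        raise ValueError("need at least one motif site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all motif sites must have the same length")
    table = []
    for pos in range(width):
        # Count residues in this column, then divide by the number of sites.
        counts = Counter(site[pos] for site in sites)
        table.append({res: n / len(sites) for res, n in sorted(counts.items())})
    return table


if __name__ == "__main__":
    # Hypothetical aligned sites for illustration only.
    sites = ["ACDE", "AKDE", "ACDE", "AMDE"]
    for pos, column in enumerate(position_frequencies(sites), start=1):
        print(pos, column)
```

Each column's frequencies sum to 1, so a logo drawn from this table would stack letters in proportion to how often each residue occurs at that position.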