Dataset on significant risk factors for Type 1 Diabetes: A Bangladeshi perspective.

Sayed Asaduzzaman, Fuyad Al Masud, Touhid Bhuiyan, Kawsar Ahmed, Bikash Kumar Paul, S A M Matiur Rahman.

Abstract

This article presents a dataset on Type-1 Diabetes together with detailed results of its analysis. Type-1 Diabetes is now a serious disease burden in Bangladesh. Data on 306 persons (case group: 152; control group: 154) were collected in Dhaka using a purpose-built questionnaire. The questionnaire covers 22 factors drawn from earlier research studies. The association and significance level of the factors were determined using data mining and statistical approaches and are shown in the tables of this article. In addition, parametric probabilities and a decision tree are provided to show the effectiveness of the data. The data can be used in future work such as risk prediction and related analyses of Type-1 Diabetes.

Keywords:  Analysis of data; Bangladesh perspective; Data of significant factors; Dataset on Type-1 Diabetes

Year:  2018        PMID: 30666315      PMCID: PMC6205358          DOI: 10.1016/j.dib.2018.10.018

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

- The data can be used for research on Type-1 Diabetes from a Bangladeshi perspective.
- The size of the dataset can be extended using the same factors on which the data were collected.
- The data can be used not only for significance analysis but also for risk prediction.
- The data introduce a new approach to risk factor prediction and to finding the significance level among factors as well as sub-factors.
- The dataset was analyzed with both data mining and statistical approaches, which allows the two sets of results to be compared and gives a realistic outcome of the research.

Data

The data provided in this article cover different factors related to Type-1 Diabetes. Tables 1 to 4 show the significance level of the factors according to Info Gain, Gain Ratio, Gini Index and the chi-square (χ2) test. Table 1 covers the main factors, whereas Tables 2, 3 and 4 cover the sub-factors (family history of Type-1 Diabetes, family history of Type-2 Diabetes, and symptoms). Table 5 lists the key factors selected by different algorithms. Table 6 shows the correlations among the significant factors, which describe the dependency among them. P values and 95% confidence intervals (C.I.) are given in Table 7. Factors whose P value is < 0.05 are significant and are shown in the table. Table 8 gives the probability of Type-1 Diabetes for each factor and sub-factor, which indicates the effectiveness of those sub-factors in Type-1 Diabetes.
Table 1

Data table on significance of factors according to Info Gain, Gain Ratio, Gini Index and χ2-test.

| Rank | Factors | Info. gain | Gain ratio | Gini | χ2-Test |
|---|---|---|---|---|---|
| 1 | HbA1c | 0.520 | 0.522 | 0.284 | 111.447 |
| 2 | Hypoglycemia | 0.464 | 0.506 | 0.253 | 103.342 |
| 3 | Age | 0.286 | 0.154 | 0.179 | 92.146 |
| 4 | Pancreatic disease affected in child | 0.321 | 0.386 | 0.167 | 77.000 |
| 5 | Area of Residence | 0.210 | 0.136 | 0.136 | 45.003 |
| 6 | Education of Mother | 0.123 | 0.129 | 0.082 | 18.491 |
| 7 | Adequate Nutrition | 0.157 | 0.187 | 0.100 | 16.361 |
| 8 | Autoantibodies | 0.243 | 0.334 | 0.129 | 15.961 |
| 9 | Sex | 0.061 | 0.061 | 0.041 | 11.843 |
| 10 | Family History affected in Type-1 Diabetes | 0.031 | 0.035 | 0.021 | 9.081 |
| 11 | Family History affected in Type-2 Diabetes | 0.019 | 0.019 | 0.013 | 4.434 |
| 12 | Standardized growth rate infancy | 0.054 | 0.074 | 0.033 | 2.741 |
| 13 | Standardized birth weight | 0.096 | 0.122 | 0.052 | 0.517 |
| 14 | Impaired glucose metabolism | 0.001 | 0.001 | 0.000 | 0.226 |

Table 2

Data table on significance of factors according to Info Gain, Gain Ratio, Gini Index and χ2-test (family history in Type-1 Diabetes).

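| Family History in Type-1 Diabetes | Info. gain | Gain ratio | Gini | χ2-Test |
|---|---|---|---|---|
| Mother | 0.026 | 0.058 | 0.017 | 9.354 |
| Father's Heredity | 0.022 | 0.047 | 0.015 | 8.211 |
| Mother's Heredity | 0.006 | 0.012 | 0.004 | 2.309 |
| Father | 0.001 | 0.004 | 0.001 | 0.514 |
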
Table 3

Data table on significance of factors according to Info Gain, Gain Ratio, Gini Index and χ2-Test (family history in Type-2 Diabetes).

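| Family History in Type-2 Diabetes | Info. gain | Gain ratio | Gini | χ2-Test |
|---|---|---|---|---|
| Mother | 0.033 | 0.089 | 0.021 | 11.847 |
| Father's Heredity | 0.007 | 0.009 | 0.005 | 2.217 |
| Father | 0.003 | 0.005 | 0.002 | 1.027 |
| Mother's Heredity | 0.001 | 0.001 | 0.001 | 0.290 |
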
Table 4

Data table on significance of factors according to Info Gain, Gain Ratio, Gini Index and χ2-Test (different symptoms).

| Symptoms | Info. gain | Gain ratio | Gini | χ2-Test |
|---|---|---|---|---|
| Frequent Urination | 0.668 | 0.681 | 0.364 | 129.684 |
| Increased thirst | 0.668 | 0.681 | 0.364 | 129.684 |
| Fatigue and Weakness | 0.573 | 0.597 | 0.314 | 118.539 |
| Unintended weight loss | 0.505 | 0.540 | 0.276 | 109.421 |
| Extreme Hunger | 0.445 | 0.490 | 0.242 | 100.303 |

Table 5

Comparative result dataset of factors using different algorithms.

| Ranker Algorithm | BestFirst / Greedy Stepwise Algorithm |
|---|---|
| HbA1c | Age |
| Hypoglycemia | Sex |
| Pancreatic disease affected in child | Area of Residence |
| Age | HbA1c |
| Autoantibodies | Adequate Nutrition |
| Area of Residence | Standardized growth-rate in infancy |
| Adequate Nutrition | Autoantibodies |
| Education of Mother | Family History affected in Type 1 Diabetes |
| Standardized birth weight | Hypoglycemia |
| Sex | Pancreatic disease affected in child |
| Standardized growth-rate in infancy | N/A |
| Family History affected in Type 1 Diabetes | N/A |
| Family History affected in Type 2 Diabetes | N/A |
| Impaired glucose metabolism | N/A |

Table 6

Correlation data among factors using Apriori Algorithm.

| No | Correlation |
|---|---|
| 1 | Standardized growth-rate in infancy (Middle quartiles pancreatic disease affected in child) ==> Standardized birth weight Middle quartiles |
| 2 | Autoantibodies pancreatic disease affected in child ==> Standardized birth weight Middle quartile |
| 3 | Adequate Nutrition (Yes)- Standardized growth-rate in infancy (Middle quartiles) ==> Standardized birth weight (Middle quartiles) |
| 4 | pancreatic disease affected in child =No 230 ==> Standardized birth weight=Middle quartiles 217 <conf:(0.94)> lift:(1.09) lev:(0.06) [18] conv:(2.25) |
| 5 | Adequate Nutrition (Yes) ==> Standardized birth weight (Middle quartiles) |
| 6 | Hypoglycemia (No) ==> Standardized birth weight (Middle quartiles) |
| 7 | Hypoglycemia (No) ==> pancreatic disease affected in child (No) |
| 8 | Standardized growth-rate in infancy (Middle quartiles) Autoantibodies (Yes) ==> Standardized birth weight (Middle quartiles) |
| 9 | Hypoglycemia ==> Autoantibodies |
| 10 | Standardized growth-rate in infancy (Middle quartiles) Impaired glucose metabolism ==> Standardized birth weight (Middle quartiles) |

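
The figures attached to rule 4 in Table 6 (confidence, lift, leverage) follow from simple counts. The sketch below recomputes them in Python. It takes the premise count (230) and the rule count (217) from rule 4 and assumes the 306 records of the dataset as the total. The consequent count of 264 is not reported in the article; it is a hypothetical value chosen here only because it is consistent with the reported lift and leverage. The function name `rule_metrics` is illustrative and not part of WEKA.

```python
def rule_metrics(n_total, n_premise, n_consequent, n_both):
    """Standard association-rule metrics for a rule 'premise ==> consequent'.

    n_total      : number of records
    n_premise    : records matching the premise
    n_consequent : records matching the consequent
    n_both       : records matching premise and consequent together
    """
    support = n_both / n_total
    confidence = n_both / n_premise
    # Lift compares the rule's confidence with the base rate of the consequent.
    lift = confidence / (n_consequent / n_total)
    # Leverage is the excess joint frequency over what independence would give.
    leverage = support - (n_premise / n_total) * (n_consequent / n_total)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }


# 306, 230 and 217 come from the article; 264 is an ASSUMED consequent count.
metrics = rule_metrics(n_total=306, n_premise=230, n_consequent=264, n_both=217)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A lift above 1 and a positive leverage both indicate that the premise and the consequent occur together more often than they would if they were independent.
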
Table 7

P value and confidence interval of risk factors in Type-1 Diabetes dataset.

| Factors | Subfactors | P-value | 95% C.I. for odds ratio (Lower) | 95% C.I. for odds ratio (Upper) |
|---|---|---|---|---|
| Age | Less than 5; Less than 11; Less than 15; Greater than 15 | 0.000* | 0.2633 | 0.4884 |
| Sex | Male; Female | 0.000* | 0.1111 | 0.2235 |
| Area of Residence | Rural; Urban; Suburban | 0.000* | 0.1489 | 0.3162 |
| Height | | 0.665 | 0.245 | 0.0384 |
| Weight | | 0.996 | 1.88 | 0.1.89 |
| BMI | | 0.996 | 0.70 | 0.70 |
| Adequate Nutrition | Yes; No | 0.008 | 0.0173 | 0.1163 |
| Education of Mother | Yes; No | 0.999 | 0.0544 | 0.0544 |
| Standardized growth-rate infancy | Lowest quartile; Middle quartile; Highest quartile | 0.999 | 0.251 | 0.251 |
| Family History in Type-1 Diabetes | Father; Mother; Father's Heredity; Mother's Heredity | 0.000* | 0.4522 | 0.5550 |
| Family History in Type-2 Diabetes | Father; Mother; Father's Heredity; Mother's Heredity | 0.000* | 0.1864 | 0.2986 |

* Significant factors

Table 8

Data for probabilities and effectiveness of factors in Type-1 Diabetes.

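| No | Factors | Subfactors | Probabilities | Effectiveness |
|---|---|---|---|---|
| 1 | Age | Greater than 15 | 0.88 | High |
| | | Less than 15 | 0.42 | Moderate |
| | | Less than 11 | 0.2 | Low |
| | | Less than 5 | 0.18 | Very Low |
| 2 | HbA1c | Less than 7.5 | 0.21 | Low |
| | | Greater than 7.5 | 0.72 | High |
| 3 | Hypoglycemia | Yes | 0.69 | High |
| | | No | 0.27 | Low |
| 4 | Pancreatic disease diagnosed in affected children | Yes | 0.5 | Moderate |
| | | No | 0.31 | Low |
| 5 | Area of Residence | Rural | 0.82 | High |
| | | Suburban | 0.65 | Moderate |
| | | Urban | 0.22 | Low |
| 6 | Adequate Nutrition | No | 0.86 | High |
| | | Yes | 0.36 | Low |
| 7 | Autoantibodies | No | 0.4 | Moderate |
| | | Yes | 0.38 | Moderate |
| 8 | Sex | Female | 0.65 | High |
| | | Male | 0.36 | Low |
| 9 | Family History type 1 Diabetes | Yes | 0.68 | High |
| | | No | 0.41 | Low |
| 10 | Family History type 2 Diabetes | Yes | 0.59 | High |
| | | No | 0.44 | Low |
| 11 | Standard Growth Rate | Lowest | 0.96 | High |
| | | Highest | 0.72 | Moderate |
| | | Middle | 0.45 | Low |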
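
The article does not state how the probabilities in Table 8 were derived. One plausible reading is that each value is the share of affected (case) records among all records with a given sub-factor. The Python sketch below illustrates that reading only. The records, the field names `area` and `group`, and the function `case_probability` are hypothetical and do not come from the dataset.

```python
from collections import defaultdict


def case_probability(records, factor, outcome="group", positive="case"):
    """Estimate P(case | sub-factor) as cases / all records, per sub-factor level."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for record in records:
        level = record[factor]
        totals[level] += 1
        if record[outcome] == positive:
            cases[level] += 1
    return {level: cases[level] / totals[level] for level in totals}


# HYPOTHETICAL toy records, not taken from the published dataset.
records = (
    [{"area": "Rural", "group": "case"}] * 4
    + [{"area": "Rural", "group": "control"}] * 1
    + [{"area": "Urban", "group": "case"}] * 1
    + [{"area": "Urban", "group": "control"}] * 4
)

for level, prob in case_probability(records, "area").items():
    print(level, round(prob, 2))
```
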

Methodology of data analysis

Type-1 Diabetes is a growing concern and is increasing at an alarming rate in low-income countries such as Bangladesh. An increase in blood glucose level (hyperglycemia, recorded in this dataset under the factor name Hypoglycemia) causes Type-1 Diabetes in childhood [1]. Datasets on Type-1 Diabetes [2] from different regions of the world have been studied in recent years [3]. This paper provides a dataset on Type-1 Diabetes for a low-income country, Bangladesh.

Data collection and preprocessing

Data on Type-1 Diabetes were collected from different hospitals and diagnostic centers in Dhaka, Bangladesh. Data collection followed a questionnaire, which was designed on the basis of previous research studies and discussions with medical professionals. Data were collected for both the case (affected) and control (unaffected) groups and for both males and females. The dataset contains 306 records, of which 152 are affected (case) and 154 are unaffected (control). A total of 22 factors (such as Age, Sex, Area of Residence, Education of Mother, HbA1c and BMI) were taken into account. Collected data may contain inconsistent, missing or uncategorized values, so data preprocessing (data cleaning) was carried out using the preprocessing features of WEKA, a data mining tool. Data were preprocessed in the same way in previous studies [4].

Data mining approach

Two data mining tools, Orange and WEKA, were used to find the significant factors. Orange was used to compute the probability of sub-factors, the χ2 test, Info Gain and related scores. WEKA was used for algorithm-based analysis and to find correlations among the factors using the Apriori algorithm. These procedures give the significance level of each factor in the dataset.
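
The scores reported in Tables 1 to 4 can be illustrated with a small Python sketch that computes information gain, gain ratio and the decrease in Gini impurity for one categorical factor against the case/control label. These are the textbook definitions; whether Orange applies exactly the same definitions and normalization is an assumption, and the toy data and function names below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def score_factor(values, labels):
    """Return (info gain, gain ratio, Gini decrease) for one categorical factor."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Impurity remaining after splitting the records by the factor's levels.
    weighted_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    weighted_gini = sum(len(g) / n * gini(g) for g in groups.values())
    info_gain = entropy(labels) - weighted_entropy
    # Gain ratio normalizes info gain by the entropy of the factor itself.
    split_info = entropy(values)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_decrease = gini(labels) - weighted_gini
    return info_gain, gain_ratio, gini_decrease


# HYPOTHETICAL toy data: a factor that separates cases from controls perfectly.
labels = ["case"] * 4 + ["control"] * 4
separating_factor = ["yes"] * 4 + ["no"] * 4
print(score_factor(separating_factor, labels))
```

A factor that says nothing about the label (for example, one that alternates between "yes" and "no" across the toy records) scores zero on all three measures, which matches the low values at the bottom of Table 1.
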

Statistical approach

A statistical approach was used to find significance and correlation in [5]. We used SPSS V20.0 to compute the P value and confidence interval of each factor. The P value makes it straightforward to identify the significant factors in the dataset.
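
For a binary factor, the quantities named above can be sketched in Python from a 2x2 table of exposure against case/control status. The sketch uses the Pearson χ2 statistic without continuity correction, the exact upper-tail P value for one degree of freedom, and a Woolf (log-based) 95% confidence interval for the odds ratio. The cell counts are hypothetical (only the group totals of 152 and 154 come from the article), and SPSS may use a different procedure, such as logistic regression, so this is an illustration rather than a reproduction of Table 7.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval; z=1.96 gives 95%."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls (152 cases, 154 controls).
a, b, c, d = 90, 50, 62, 104
chi2 = chi_square_2x2(a, b, c, d)
p = p_value_df1(chi2)
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
print(f"chi2={chi2:.3f} p={p:.6f} OR={odds_ratio:.2f} CI=({lower:.2f}, {upper:.2f})")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```
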

Significance formulation

Factors such as Hypoglycemia (defined in this dataset as an increased glucose level) and insulin are key factors for Type-1 Diabetes [6], [7]. The final decision tree can be built from the data and tables of the dataset. The decision tree can then be used to classify whether a person is affected or not. Disease risk prediction and the analysis of datasets for other diseases were carried out earlier by Ahmed et al. [8]. Figs. 1 to 4 show the detailed analysis results of the data. The analysis was done using WEKA and Orange, two algorithm-based data mining packages. The results show the risk factors and their significance for detecting Type-1 Diabetes.
Fig. 1

2-D view of the probability distribution of age with respect to the affected group.

Fig. 2

3-D visualization of the analyzed dataset and data distribution for BMI, height and weight.

Fig. 3

Visualization of the dataset parameters and their outcomes.

Fig. 4

Decision tree among the factors of Type-1 Diabetes.

Financial support

There is no financial support for this research.

Specifications table

Subject area: Biology
More specific subject area: Significant risk factor analysis from data on Type-1 Diabetes using statistical and data mining approaches.
Type of data: Table, figure, raw dataset
How data was acquired: Survey, questionnaire
Data format: Raw, analyzed
Data source location: Different hospitals and diagnostic centers in Dhaka, Bangladesh.
Data accessibility: Data is within this article