Deep Neural Network Approach for Early-Stage Diabetes Risk Prediction using Hybrid SMOTE-ENN and GAN with SHAP-Based Feature Explanations
DOI:
https://doi.org/10.63682/jns.v14i31S.7307
Keywords:
SMOTE-ENN, Generative Adversarial Networks, SHAP, Feature selection, Symptom-based screening, Regional health data, Interpretability, Neural networks
Abstract
Type 2 diabetes mellitus (T2DM) is prevalent in India and remains a major global health concern. Early detection of T2DM is essential for successful management, especially in resource-constrained environments where access to standard laboratory testing may be limited. This research introduces a deep learning framework for predicting early-stage diabetes from non-invasive, symptom-based inputs, with an emphasis on interpretability and practical use in clinical environments. The study uses the Southern India Diabetes Dataset (SIDD), a regional collection of 1,680 ethically sourced patient records with 17 clinical and demographic variables. To address class imbalance, a hybrid augmentation strategy was employed that combines SMOTE-ENN with Generative Adversarial Networks (GANs). Furthermore, SHAP (SHapley Additive exPlanations) values were used to identify the most important predictive features, enhancing model interpretability. We implemented and evaluated two neural architectures: the Radial Basis Function Neural Network (RBFNN) and the Deep Neural Network (DNN). The proposed DNN model achieved a test accuracy of 98%, surpassing the performance of models trained on standard datasets such as PIMA. By combining interpretable artificial intelligence, strong augmentation, and pertinent clinical data, the proposed framework shows significant potential for application in essential healthcare settings, where it could facilitate prompt intervention and improve patient outcomes.
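As a rough illustration of the SMOTE-ENN stage mentioned in the abstract, the following pure-Python sketch oversamples the minority class by interpolating between minority neighbours (SMOTE) and then removes samples whose label disagrees with their nearest neighbours (Edited Nearest Neighbours). It is not the authors' implementation: the helper names (`smote`, `enn`, `smote_enn`), the choice of k = 3, the majority-vote ENN rule, and the toy two-feature data are all assumptions made for illustration, and the GAN stage is omitted.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(idx, points, k):
    """Indices of the (up to) k nearest neighbours of points[idx], excluding idx."""
    others = (j for j in range(len(points)) if j != idx)
    return sorted(others, key=lambda j: dist2(points[idx], points[j]))[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority samples until the minority matches the largest class.

    Assumes the minority class has at least two samples.
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_pts)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority_pts[i], minority_pts[j])]
        )
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Keep only samples whose label matches the majority vote of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning (illustrative sketch)."""
    X_os, y_os = smote(X, y, minority, k, random.Random(seed))
    return enn(X_os, y_os, k)


# Toy, made-up data: six majority samples (label 0) and three minority samples (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.1, 2.2], [2.2, 1.9]]
y = [0] * 6 + [1] * 3

X_res, y_res = smote_enn(X, y, minority=1)
print(sorted(Counter(y_res).items()))
```

In this toy case the two classes are well separated, so ENN removes nothing and the result is simply the balanced set produced by SMOTE. On noisier data, ENN would also discard samples that sit inside the other class's neighbourhood. Note that interpolation produces fractional values, so binary symptom features would need separate handling in practice.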
License

This work is licensed under a Creative Commons Attribution 4.0 International License.