Integrating RF-GBDT: Optimizing Machine Learning Techniques for Diabetic Prediction Model
DOI:
https://doi.org/10.52783/jns.v14.2903
Abstract
Diabetes is a chronic medical condition whose incidence has risen significantly in recent years. Early prognosis depends on precise datasets, which complicates prediction. Big data plays a significant role in predicting diabetes by making it possible to examine enormous volumes of health-related information. Through advanced analytics, algorithms can identify patterns, risk factors, and correlations in data such as patient demographics, medical history, genetic markers, lifestyle factors, and biomarkers. This research introduces a diabetes prediction model, the RF-GBDT classifier, tailored to identify peptides that are potentially effective against diabetes. By combining sequence data with a Random Forest-Optimized Gradient Boosting Decision Trees (GBDT) framework, RF-GBDT aims to improve the accuracy of antidiabetic peptide prediction. Results show that the model reaches an accuracy of 99.8% and an AUC of 95.2%. Furthermore, feature selection techniques shorten prediction times without compromising classifier accuracy. Compared with existing studies, these findings affirm the efficacy of the proposed method, positioning it as a valuable adjunctive tool in diabetes diagnosis.
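The abstract does not spell out how the random forest and the GBDT stage are coupled, so the following is only a minimal sketch of one plausible reading: a random forest ranks features, the lower-ranked half is discarded, and a gradient boosting classifier is trained on what remains. It uses scikit-learn and synthetic data; the function name `build_rf_gbdt`, the median threshold, and all hyperparameters are assumptions for illustration and are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def build_rf_gbdt(random_state=0):
    """Hypothetical RF-then-GBDT pipeline (an assumption, not the paper's design)."""
    # Stage 1: a random forest scores features; keep those at or above
    # the median importance.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=random_state),
        threshold="median",
    )
    # Stage 2: gradient boosted trees classify using the retained features.
    booster = GradientBoostingClassifier(random_state=random_state)
    return Pipeline([("select", selector), ("gbdt", booster)])


# Synthetic stand-in data; this is NOT the dataset used in the paper.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_rf_gbdt()
model.fit(X_train, y_train)

# Accuracy uses hard labels; AUC uses the predicted probability of class 1.
proba = model.predict_proba(X_test)[:, 1]
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, proba)
n_selected = int(model.named_steps["select"].get_support().sum())

print(f"features kept: {n_selected}/20, accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
```

Because accuracy and AUC are computed from different outputs (thresholded labels versus ranked probabilities), the two figures can diverge, which is worth keeping in mind when reading the reported 99.8% accuracy alongside the 95.2% AUC.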
License
Copyright (c) 2025 N. P. Jayasri, R. Aruna, S. Ravikumar, T. Thilagam

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.