Diabetes Prediction Using Logistic Regression Machine Learning Algorithm

Author:
Ramesh Prasad Bhatta, Dipendra Kumar Air
Assistant Professor, Central Department of CSIT, Far Western University, Nepal

DOI: doi.org/10.58924/rjmet.v4.iss6.p1

Published Date: 31-Dec, 2025

Keywords: Diabetes, Machine Learning, Prediction, Regression, Accuracy

Abstract:
Diabetes is a serious global health issue and a growing concern in Nepal, given its high risk of death and other complications. This study develops an early prediction model using logistic regression, a classification technique widely applied in clinical research. The model was implemented in Python using the Pima Indians Diabetes Database, which contains 768 patient records with eight independent features and one outcome variable. Exploratory data analysis was performed to extract insights and visualize trends in the dataset. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to generate synthetic samples for the minority class. Evaluation with a confusion matrix showed satisfactory results: an accuracy of 77%, precision of 75%, recall of 77%, and an F1-score of 76%. To further improve performance, hyperparameter tuning was conducted using grid search, after which the tuned model reached an accuracy of 82%. These findings suggest that logistic regression, supported by data preprocessing, resampling, and hyperparameter optimization, can serve as an effective tool for early detection of diabetes, thereby supporting timely intervention and improved healthcare outcomes.
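
The oversampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which libraries or settings the authors used, so this example relies only on the Python standard library and is not their implementation. The function names `smote_like_oversample` and `classification_metrics`, the toy feature vectors, and the confusion-matrix counts are all hypothetical and chosen purely for illustration. The first function shows the core idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours); the second derives accuracy, precision, recall, and F1-score from confusion-matrix counts.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Sketch of SMOTE's core idea (not a full implementation).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy minority-class samples (two features each).
toy_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
new_points = smote_like_oversample(toy_minority, n_new=4, k=2, seed=42)
print(len(new_points), "synthetic samples generated")

# Hypothetical confusion-matrix counts, not the paper's results.
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

In practice, a library implementation of SMOTE together with a cross-validated grid search over the logistic regression hyperparameters would normally replace these hand-written helpers. Oversampling is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data.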

Journal: Research Journal of Multidisciplinary Engineering Technologies
ISSN(Online): 2945-4158
Publisher: Embar Publishers
Frequency: Bi-Monthly
Chief Editor: Dr. Osamah Ibrahim Khalaf
Language: English