Comparison of machine learning techniques used in type II diabetes risk prediction

dc.contributor.author Kaluarachchi, K.N.
dc.contributor.author Premachandra, K.P.
dc.contributor.author Dissanayake, R.B.N.
dc.date.accessioned 2023-02-10T03:35:35Z
dc.date.available 2023-02-10T03:35:35Z
dc.date.issued 2023-01-18
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10994
dc.description.abstract The PIMA, a Native American population living in Arizona, experience several medical problems, including diabetes. Diabetes prediction for females in this population has been carried out using various machine learning techniques. The objective of this study is to compare eight supervised machine learning models and identify the algorithm with low bias and low variance (a favorable bias-variance trade-off) for diagnosing type II diabetes in the female PIMA population. Eight prediction models, namely logistic regression, decision tree, random forest, naïve Bayes, k-nearest neighbor, support vector machine, gradient boosting, and artificial neural network (ANN), were developed for type II diabetes using the data published by the National Institute of Diabetes and Digestive and Kidney Diseases in the USA (PIMA Indian Diabetes Dataset). Of the 768 patient records, 430 (50% with diabetes and 50% without) were used to train the models in order to reduce data bias, and the remaining 338 records were used for testing. The performance of each model was evaluated and compared using testing accuracy, mean squared error (MSE), sensitivity, precision, and F1-score. The results showed that the random forest model had the highest testing accuracy (83.12%) and the lowest MSE. The results also indicated that the most significant predictor variables were the number of pregnancies, insulin level, BMI, and age. The ANN model had the highest MSE, which is attributed to the limited amount of training data. Therefore, the random forest model with 50 trees was the most accurate of the eight machine learning models compared for diagnosing type II diabetes in the PIMA Indian Diabetes Dataset. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Machine Learning en_US
dc.subject Testing Accuracy en_US
dc.subject Type II Diabetes Prediction en_US
dc.subject PIMA Indian en_US
dc.title Comparison of machine learning techniques used in type II diabetes risk prediction en_US
dc.type Article en_US
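
The abstract describes a class-balanced training split (430 of 768 records, half with diabetes) and five evaluation metrics. The following is a minimal sketch of those two steps in plain Python, not the authors' code. The function names `balanced_split` and `binary_metrics` are hypothetical, and the class counts used in the usage lines (268 positive, 500 negative) are the commonly reported totals for the PIMA Indian Diabetes Dataset rather than figures stated in the abstract. The abstract also does not say whether MSE was computed on predicted labels or on predicted probabilities; the sketch assumes hard 0/1 labels, in which case MSE is simply the misclassification rate.

```python
import random


def balanced_split(labels, n_train, seed=0):
    """Pick n_train indices for training, half from each class.

    Returns (train_idx, test_idx); every index not chosen for
    training goes to the test set.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    half = n_train // 2
    if half > len(pos) or half > len(neg):
        raise ValueError("not enough records in one class for a balanced split")
    train = set(rng.sample(pos, half) + rng.sample(neg, half))
    test = [i for i in range(len(labels)) if i not in train]
    return sorted(train), test


def binary_metrics(y_true, y_pred):
    """Accuracy, MSE, sensitivity, precision and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / n,
        # With hard 0/1 predictions this equals 1 - accuracy.
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Usage with placeholder labels (assumed class counts, not real records).
labels = [1] * 268 + [0] * 500
train_idx, test_idx = balanced_split(labels, n_train=430)
print(len(train_idx), len(test_idx))  # 430 338
```

Under these assumed class counts, balancing the training set leaves a test set that is mostly non-diabetic, which is one reason to report sensitivity, precision, and F1-score alongside accuracy, as the study does.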

