Comparison of machine learning techniques used in type II diabetes risk prediction

Kaluarachchi, K.N.; Premachandra, K.P.; Dissanayake, R.B.N.

dc.contributor.author	Kaluarachchi, K.N.
dc.contributor.author	Premachandra, K.P.
dc.contributor.author	Dissanayake, R.B.N.
dc.date.accessioned	2023-02-10T03:35:35Z
dc.date.available	2023-02-10T03:35:35Z
dc.date.issued	2023-01-18
dc.identifier.issn	1391-8796
dc.identifier.uri	http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10994
dc.description.abstract	The PIMA, a Native American people living in Arizona, experience several medical problems, including diabetes. Diabetes among females in this population has been predicted using various machine learning techniques. The objective of this study is to compare eight supervised machine learning models and identify the algorithm that best balances bias and variance for the diagnosis of type II diabetes in the female PIMA population. Eight prediction models, namely logistic regression, decision tree, random forest, naïve Bayes, k-nearest neighbors, support vector machine, gradient boosting, and artificial neural network (ANN), were developed for type II diabetes using the data published by the National Institute of Diabetes and Digestive and Kidney Diseases in the USA (PIMA Indian Diabetes Dataset). Of the 768 patient records, 430 (50% with diabetes and 50% without) were used to train the models, a balanced sample chosen to reduce class bias, and the remaining 338 records were used for testing. The performance of each model was evaluated and compared using testing accuracy, mean squared error (MSE), sensitivity, precision, and F1-score. The random forest model had the highest testing accuracy (83.12%) and the lowest MSE. The results indicate that the most significant predictor variables are the number of pregnancies, insulin level, BMI, and age. The ANN model had the highest MSE, owing to the limited amount of training data. Therefore, the random forest model with 50 trees was the most accurate of the compared machine learning models for diagnosing type II diabetes in the PIMA Indian Diabetes Dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Science, University of Ruhuna, Matara, Sri Lanka	en_US
dc.subject	Machine Learning	en_US
dc.subject	Testing Accuracy	en_US
dc.subject	Type II Diabetes Prediction	en_US
dc.subject	PIMA Indian	en_US
dc.title	Comparison of machine learning techniques used in type II diabetes risk prediction	en_US
dc.type	Article	en_US
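
The abstract describes a class-balanced training sample (430 of 768 records, half with diabetes) and an evaluation based on testing accuracy, MSE, sensitivity, precision, and F1-score. The following is a minimal sketch of those two steps in plain Python, not the authors' code. The function names `balanced_split` and `binary_metrics`, the random seed, and the placeholder label vector (268 positive and 500 negative records, which are not stated in the abstract) are assumptions made for illustration. The abstract also does not say whether MSE was computed on hard class labels or on predicted probabilities; this sketch assumes hard 0/1 labels.

```python
import random


def balanced_split(labels, n_per_class, seed=0):
    """Return (train_idx, test_idx) with n_per_class records of each class in train.

    labels is a sequence of 0/1 values; all records not drawn go to the test set.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if min(len(pos), len(neg)) < n_per_class:
        raise ValueError("not enough records in one class for a balanced sample")
    train = set(rng.sample(pos, n_per_class)) | set(rng.sample(neg, n_per_class))
    train_idx = sorted(train)
    test_idx = [i for i in range(len(labels)) if i not in train]
    return train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Compute the five metrics named in the abstract from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / n,
        # With hard 0/1 labels, MSE is simply the misclassification rate.
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Placeholder labels standing in for the dataset's outcome column (assumed counts).
labels = [1] * 268 + [0] * 500
train_idx, test_idx = balanced_split(labels, n_per_class=215)
print(len(train_idx), len(test_idx))  # 430 338

# Toy predictions, only to show the metric calculation.
print(binary_metrics([1, 0, 1, 0], [1, 0, 0, 0]))
```

Under the hard-label assumption, MSE equals one minus accuracy, so the model with the highest testing accuracy necessarily has the lowest MSE, which is consistent with the random forest result reported in the abstract. Note also that a balanced training sample leaves a test set whose class proportions differ from the training set, so sensitivity and precision are worth reading alongside accuracy.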