Comparative Analysis of Machine Learning Algorithms to Predict Type II Diabetes
ABSTRACT: Machine learning (ML) models are becoming more robust and accurate as the amount and quality of training data rapidly increase. Researchers propose increasingly complex models for real-life problems to achieve higher accuracy, at the cost of substantial computing and other resources. In healthcare, disease diagnosis, detection, and prediction remain a challenge. Early diagnosis of a disease or ailment helps in timely recovery. Moreover, because health is central to every individual, considerable work is under way in this field to improve outcomes using all available information. This paper reports experiments on the Pima Indian Diabetes Dataset (PIDDS) in two stages, A and B. The main objective of the study is to review the accuracy of the applied machine learning algorithms and analyze their efficiency in prediction. Another essential objective is to show the efficacy of simpler models. Fields such as computer vision and natural language processing (NLP) have given rise to deep learning, with complex, computationally expensive models, setting a trend of applying such models in almost every field. While these models help where data are abundant and relationships are complex, simpler models can still perform very well and can at times rival them. We also applied preprocessing methods (imputation, feature selection, scaling, and discretization) to improve classification accuracy. The algorithms selected for this problem are Logistic Regression (LR), Artificial Neural Networks (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT). LR provided the best accuracy, and the remaining models performed very close to one another.
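
The abstract names its preprocessing steps but not the specific variants used. As an illustration only, the sketch below shows one common form of three of them (median imputation, min-max scaling, and equal-width discretization) on a single numeric column. The function names, the use of None as the missing-value marker, the choice of three bins, and the sample readings are all assumptions made for this example and are not taken from the paper; feature selection is not covered here.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, indexed from 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical glucose-like readings; None marks a missing measurement.
glucose = [148, 85, None, 89, 137, None, 78, 197]

imputed = impute_median(glucose)
scaled = min_max_scale(imputed)
binned = discretize_equal_width(imputed, n_bins=3)
```

In a full experiment, such steps would typically be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into the preprocessing.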