The impact of oversampling with SMOTE on the performance of 3 classifiers in prediction of type 2 diabetes

Ramezankhani, A. and Pournik, O. and Shahrabi, J. and Azizi, F. and Hadaegh, F. and Khalili, D. (2016) The impact of oversampling with SMOTE on the performance of 3 classifiers in prediction of type 2 diabetes. Medical Decision Making, 36 (1). pp. 137-144.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Objective. To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS). Methods. Data of the 6647 nondiabetic participants, aged 20 years or older with more than 10 years of follow-up, were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled using the SMOTE technique, at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. The original and the oversampled training datasets were used to establish the classification models. Accuracy, sensitivity, specificity, precision, F-measure, and Youden's index were used to evaluate the performance of classifiers in the test dataset. To compare the performance of the 3 classification models, we used the ROC convex hull (ROCCH). Results. Oversampling the minority class at 700% (completely balanced) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of the 3 classification methods. NB had the best Youden's index before and after oversampling. The ROCCH showed that PNN is suboptimal for any class and cost conditions. Conclusions. To determine a classifier with a machine learning algorithm like the PNN and DT, class skew in data should be considered. The NB and DT were optimal classifiers in a prediction task in an imbalanced medical database. © 2014 Society for Medical Decision Making.
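
Illustrative sketch (not part of the published abstract, and not the authors' implementation). The Python below shows, under stated assumptions, the two ideas the abstract relies on: SMOTE-style interpolation between a minority-class sample and one of its nearest minority-class neighbours, and the evaluation measures computed from a confusion matrix. The function names `smote` and `metrics`, the toy data, the neighbour count `k`, and the reading of "100%" as one synthetic sample per original minority sample are all assumptions made for illustration.

```python
import math
import random


def smote(minority, percent, k=5, seed=0):
    """Return synthetic minority samples (hypothetical helper).

    Assumes `percent` is a multiple of 100 and that 100 means one
    synthetic sample per original minority sample. Needs at least
    two minority samples.
    """
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x (Euclidean), excluding x.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def metrics(tp, fp, tn, fn):
    """Evaluation measures named in the abstract, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "youden": sensitivity + specificity - 1,
    }


# Toy data, invented for illustration only (not TLGS data).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
synthetic = smote(minority, percent=200, k=2)
scores = metrics(tp=8, fp=10, tn=80, fn=2)
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling changes the class ratio seen during training without copying records verbatim. Youden's index (sensitivity + specificity - 1) does not depend on class prevalence, which is why it is a useful companion to accuracy on imbalanced data.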

Item Type: Article
Additional Information: Cited by 6
Depositing User: eprints admin
Date Deposited: 02 Jul 2018 09:05
Last Modified: 02 Jul 2018 09:05
URI: http://eprints.iums.ac.ir/id/eprint/4465
