Data mining using a support vector machine, decision tree, logistic regression and random forest for pneumonia prediction and classification


Bahtiar Imran
Sriasih Sriasih
Surni Erniwati
Salman Salman


This study applies data mining with four classification models to pneumonia data. The proposed models are Support Vector Machine (SVM), Decision Tree, Logistic Regression and Random Forest. The models were tested using cross-validation sampling and stratified sampling with 3, 10 and 20 folds. Logistic Regression achieved the highest and most consistent accuracy compared with SVM, Decision Tree and Random Forest. With 3 folds, it obtained an AUC of 0.990 and an accuracy, F1, precision and recall of 0.962 each. With 10 folds, it obtained an AUC of 0.991 and an accuracy, F1, precision and recall of 0.961 each. With 20 folds, it obtained an AUC of 0.991 and an accuracy, F1, precision and recall of 0.965 each. These results indicate that Logistic Regression performs well for predicting and classifying pneumonia.
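The evaluation protocol described in the abstract (stratified splitting into 3, 10 and 20 folds, scored by accuracy, precision, recall and F1) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `stratified_folds` and `binary_metrics`, the toy label vector, and the fixed seed are assumptions made here, and the abstract does not state which tooling or averaging scheme the study used. AUC is omitted because it needs predicted scores rather than hard labels.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds. The offset
        # continues where the previous class stopped so fold sizes stay even.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % n_folds].append(idx)
        offset += len(indices)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 60 pneumonia cases (1) and 40 normal cases (0).
toy_labels = [1] * 60 + [0] * 40
for k in (3, 10, 20):
    sizes = [len(fold) for fold in stratified_folds(toy_labels, k)]
    print(f"{k} folds -> fold sizes {sizes}")
```

In a full cross-validation loop, each fold would serve once as the test set while the remaining folds train the classifier, and the per-fold metrics would then be averaged.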


How to Cite
Imran, B., Zaeniah, Sriasih, S., Surni Erniwati, & Salman, S. (2022). Data mining using a support vector machine, decision tree, logistic regression and random forest for pneumonia prediction and classification. INFOKUM, 10(02), 792-802. Retrieved from

