Data Mining: Building The Automatic Pipeline System for Clustering the Compliance Level of PBB-P2 Taxpayers
Keywords:
Data Mining, Pipeline, K-Means, Clustering, PBB-P2
Abstract
Analyzing taxpayer compliance is crucial to regional revenue management, especially for the Rural and Urban Land and Building Tax (PBB-P2). The large volume of taxpayer data makes manual processing impractical, so an automated and efficient approach is needed. This research develops a data mining pipeline that clusters taxpayers by compliance level at the Badan Pengelola Keuangan dan Pendapatan Daerah (BPKPD) of Tebing Tinggi City. The pipeline is implemented in Java with the SMILE library and consists of three main procedures: Extraction, which retrieves receivables and payment data from the SISMIOP database; Transformation, which processes the extracted data with K-Means clustering to generate new insights; and Load, which stores the transformation results in a MySQL database for further analysis and reporting. The pipeline runs every hour so that the processed data stays close to real time. With this automated system, the study aims to improve understanding of taxpayer compliance patterns and to help local governments design more effective policies for increasing tax revenue and taxpayer compliance.
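The Transformation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the study uses the SMILE library, whereas this sketch is a plain-Java K-Means (Lloyd's algorithm) so that it runs without external dependencies. The class name, the two features (fraction of the billed amount paid, fraction of payments made late), the sample values, and the choice of two clusters are all assumptions made for illustration. Extraction from SISMIOP and loading into MySQL are omitted.

```java
import java.util.Arrays;

// Hypothetical sketch of the Transformation step; names and features are assumptions.
class ComplianceClusteringSketch {

    /** Runs Lloyd's K-Means and returns the cluster label of each row. */
    static int[] kMeans(double[][] data, int k, int maxIter) {
        int n = data.length, d = data[0].length;
        // Deterministic initialization for illustration: the first k rows seed the centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone();

        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: attach each row to its nearest centroid (squared Euclidean distance).
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0;
                    for (int j = 0; j < d; j++) {
                        double diff = data[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no row changed cluster
            // Update step: move each centroid to the mean of its assigned rows.
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < d; j++) sums[labels[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
                for (int j = 0; j < d; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Assumed features per taxpayer: {fraction of bill paid, fraction of payments made late}.
        double[][] features = {
            {1.00, 0.0}, {0.10, 0.9}, {0.95, 0.1},
            {0.05, 1.0}, {0.90, 0.0}, {0.15, 0.8}
        };
        int[] labels = kMeans(features, 2, 100);
        System.out.println(Arrays.toString(labels)); // prints [0, 1, 0, 1, 0, 1]
    }
}
```

In a pipeline like the one described, a step of this kind would sit between a query against the source database and a write of the labels to the reporting database, and the whole sequence could be triggered hourly by a scheduler. Seeding the centroids from the first rows keeps the sketch reproducible; a production run would more likely use randomized or K-Means++ seeding.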
License
Copyright (c) 2025 Sugeng Pranoto, Sri Wahyuni, Muhammad Syahputra Novelan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.