USING SYNTHETIC DATA TO IMPROVE DATA PROCESSING ALGORITHMS IN BUSINESS INTELLIGENCE

Authors

Aizat Dildabek, Zukhra Abdiakhmetova

DOI:

https://doi.org/10.26577/jpcsit2024-v2-i4-a5

Keywords:

Synthetic data, data processing, CTGAN, TVAE, Linear Regression, RandomForestRegressor, GradientBoostingRegressor

Abstract

Growing data volumes call for effective processing methods that solve practical problems. This study examines how synthetic data can improve data processing algorithms in business analysis tasks. Synthetic data offers several benefits, including enlarging the amount of data available for training models and preserving privacy when working with sensitive financial and medical data. The paper assesses synthetic data generated by the CTGAN and TVAE methods for regression problems. Two datasets, Health Insurance and Boston Housing, are used to evaluate machine learning models: linear regression, random forest, and gradient boosting. The results suggest that synthetic data can significantly improve algorithm performance, especially for small or unbalanced datasets, although achieving quality comparable to real-world data remains a challenge. The study highlights the practical value of synthetic data for optimizing business processes and opens opportunities for further research into data generation methods and their applications.
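The comparison described in the abstract can be illustrated with a small sketch. The abstract does not spell out the evaluation protocol, so the "train on synthetic, test on real" setup below is an assumption, as are the toy dataset and all function names. The generator is a simple bivariate Gaussian stand-in, not CTGAN or TVAE, and the model is a one-feature least-squares line rather than the paper's random forest or gradient boosting models.

```python
import math
import random


def make_real_data(n, rng):
    """Toy 'real' dataset (hypothetical): one feature x, noisy linear target y."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def moments(xs, ys):
    """Return means and centered sums of squares / cross-products."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def sample_synthetic(xs, ys, n, rng):
    """Stand-in generator: fit a bivariate Gaussian to (x, y) and sample from it.

    This is NOT CTGAN or TVAE; it only plays the role of a fitted generator.
    """
    mx, my, sxx, syy, sxy = moments(xs, ys)
    sd_x = math.sqrt(sxx / (len(xs) - 1))
    sd_y = math.sqrt(syy / (len(ys) - 1))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation of the real data
    syn_x, syn_y = [], []
    for _ in range(n):
        zx = rng.gauss(0.0, 1.0)
        zy = r * zx + math.sqrt(max(0.0, 1.0 - r * r)) * rng.gauss(0.0, 1.0)
        syn_x.append(mx + sd_x * zx)
        syn_y.append(my + sd_y * zy)
    return syn_x, syn_y


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my, sxx, _, sxy = moments(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model, xs, ys):
    """Root mean squared error of a (slope, intercept) model on given data."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
real_x, real_y = make_real_data(200, rng)
train_x, train_y = real_x[:150], real_y[:150]   # real training split
test_x, test_y = real_x[150:], real_y[150:]     # held-out real test split
syn_x, syn_y = sample_synthetic(train_x, train_y, 150, rng)

# Both models are scored on the same held-out real rows.
rmse_real = rmse(fit_line(train_x, train_y), test_x, test_y)
rmse_synth = rmse(fit_line(syn_x, syn_y), test_x, test_y)
print(f"RMSE, trained on real data:      {rmse_real:.3f}")
print(f"RMSE, trained on synthetic data: {rmse_synth:.3f}")
```

Scoring both models on held-out real rows keeps the comparison fair: a generator that only reproduces marginal distributions but loses the relationship between features and target would show up as a larger error for the synthetic-trained model.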

Author Biographies

Aizat Dildabek, Al-Farabi Kazakh National University, Almaty, Kazakhstan

Master of the Faculty of Artificial Intelligence and Big Data at Al-Farabi Kazakh National University

Zukhra Abdiakhmetova, Al-Farabi Kazakh National University, Almaty, Kazakhstan

Deputy Dean for Academic, Methodological and Educational Work, Senior Lecturer, PhD

How to Cite

Dildabek, A., & Abdiakhmetova, Z. (2024). USING SYNTHETIC DATA TO IMPROVE DATA PROCESSING ALGORITHMS IN BUSINESS INTELLIGENCE. Journal of Problems in Computer Science and Information Technologies, 2(4), 44–49. https://doi.org/10.26577/jpcsit2024-v2-i4-a5