USING SYNTHETIC DATA TO IMPROVE DATA PROCESSING ALGORITHMS IN BUSINESS INTELLIGENCE
DOI: https://doi.org/10.26577/jpcsit2024-v2-i4-a5
Keywords: Synthetic data, data processing, CTGAN, TVAE, Linear Regression, RandomForestRegressor, GradientBoostingRegressor
Abstract
The growing volume of data calls for effective processing methods to solve practical problems. This study examines the use of synthetic data to improve data processing algorithms in business analysis tasks. Synthetic data offers several benefits, including a larger pool of data for training models and privacy protection when working with sensitive financial and medical data. The paper evaluates the potential of synthetic data generated by the CTGAN and TVAE methods for regression problems. Two datasets, Health Insurance and Boston Housing, are used to evaluate the performance of machine learning models such as linear regression, random forest, and gradient boosting. The results suggest that synthetic data can significantly improve algorithm performance, especially for small or imbalanced datasets, although achieving quality comparable to real-world data remains a challenge. The study highlights the practical importance of synthetic data for optimizing business processes and opens new directions for further research on data generation methods and their applications.
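The evaluation protocol the abstract describes (generate synthetic rows, train linear regression, random forest, and gradient boosting, then compare against training on real data alone) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: CTGAN and TVAE are replaced by a simple multivariate-Gaussian sampler (`gaussian_synthesizer`, a hypothetical stand-in), the data come from `make_regression` rather than Health Insurance or Boston Housing, and R² on a held-out set of real rows is assumed as the metric.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def gaussian_synthesizer(X, y, n_samples, rng):
    """Hypothetical stand-in for CTGAN/TVAE: fit a multivariate normal to the
    joint (features, target) table and sample new rows from it."""
    table = np.column_stack([X, y])
    mean = table.mean(axis=0)
    cov = np.cov(table, rowvar=False)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return samples[:, :-1], samples[:, -1]


def evaluate(X_train, y_train, X_test, y_test, seed=0):
    """Fit each regressor and return its R^2 on the held-out real test set."""
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=seed),
        "GradientBoostingRegressor": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


def compare(n_real=60, n_synthetic=500, seed=0):
    """Return (real-only scores, real+synthetic scores) for the three models."""
    rng = np.random.default_rng(seed)
    # Assumed toy data: a deliberately small "real" training set plus 200 test rows.
    X, y = make_regression(n_samples=n_real + 200, n_features=5, noise=10.0, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=n_real, random_state=seed
    )
    # Synthetic rows are drawn from the training split only and never enter the test set.
    X_syn, y_syn = gaussian_synthesizer(X_train, y_train, n_synthetic, rng)
    real_only = evaluate(X_train, y_train, X_test, y_test, seed)
    augmented = evaluate(
        np.vstack([X_train, X_syn]), np.concatenate([y_train, y_syn]), X_test, y_test, seed
    )
    return real_only, augmented
```

Calling `compare()` returns two dictionaries of R² scores keyed by model name, one for training on real rows only and one for real plus synthetic rows. Keeping the test set purely real is the key design choice: scoring on synthetic rows would measure how well a model fits the generator rather than the real data distribution.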