COMPARATIVE STUDY OF PARALLEL ALGORITHMS FOR MACHINE LEARNING METHODS
DOI: https://doi.org/10.26577/jpcsit2023v1i4a2

Keywords: Machine learning, Linear regression, Random forest, Parallel computing

Abstract
In the modern world, the amount of data used in machine learning is constantly growing, so accelerating the training of models on large datasets has become a relevant task. Parallel data processing methods are used to address this problem. This paper examines parallel data processing methods for machine learning, with linear regression and random forest considered as the learning methods. A parallel algorithm based on the MPI interface was developed for each method. The experiments showed that both parallel algorithms provide speedup over the sequential algorithm; however, the speedup for random forest was significantly higher than for linear regression. This is because random forest is more computationally intensive and its trees can be trained independently, so a larger share of the work can be distributed across processes. It can therefore be concluded that, of the methods considered, random forest is the more effective approach for parallel data processing, a conclusion confirmed by the experiments conducted in this work. Overall, the experimental results show that the use of parallel algorithms in machine learning can significantly speed up model training on large datasets, with random forest being the more efficient method for parallel processing owing to its heavier computational load per process and higher scalability.
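To make the two parallelization patterns described above concrete, the sketch below shows a minimal, hedged illustration of how MPI-based data parallelism can be applied to each method. It is not the authors' implementation: the use of mpi4py and scikit-learn, the data shapes, and the tree counts are all assumptions introduced here for illustration. For linear regression, each process computes partial normal-equation terms that are summed with Allreduce; for random forest, each process trains an independent subset of trees, which is the property that makes its speedup closer to linear in the number of processes.

```python
# A minimal sketch (not the paper's implementation) of the two MPI data-parallel
# patterns discussed in the abstract. mpi4py and scikit-learn are assumptions;
# the paper only states that the MPI interface was used.
import numpy as np
from mpi4py import MPI
from sklearn.ensemble import RandomForestRegressor

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank holds its own shard of the training data (synthetic here).
rng = np.random.default_rng(seed=rank)
X_local = rng.normal(size=(10_000, 8))
y_local = X_local @ np.arange(1, 9, dtype=float) + rng.normal(scale=0.1, size=10_000)

# --- Linear regression: distributed normal equations ------------------------
# Each rank computes partial X^T X and X^T y; Allreduce sums them so every
# rank can solve the same small 8x8 system.
XtX_local = X_local.T @ X_local
Xty_local = X_local.T @ y_local
XtX = np.empty_like(XtX_local)
Xty = np.empty_like(Xty_local)
comm.Allreduce(XtX_local, XtX, op=MPI.SUM)
comm.Allreduce(Xty_local, Xty, op=MPI.SUM)
beta = np.linalg.solve(XtX, Xty)

# --- Random forest: each rank trains an independent subset of trees ---------
# Tree training requires no inter-process communication, so most of the
# runtime parallelizes cleanly.
total_trees = 128
trees_per_rank = total_trees // size
forest_local = RandomForestRegressor(n_estimators=trees_per_rank, random_state=rank)
forest_local.fit(X_local, y_local)

# Gather the fitted trees on rank 0 to assemble the full ensemble.
all_trees = comm.gather(forest_local.estimators_, root=0)
if rank == 0:
    ensemble = [tree for part in all_trees for tree in part]
    print("linear regression coefficients:", np.round(beta, 2))
    print("total trees in assembled forest:", len(ensemble))
```

Run, for example, with `mpiexec -n 4 python sketch.py` (the file name is hypothetical). The contrast between the two halves mirrors the experimental finding: linear regression needs collective reductions relative to a small amount of local arithmetic, while random forest keeps each process busy with independent tree construction and communicates only once at the end.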