Citation: Alberto Amato, Rita Dario, Marina Popolizio, Vincenzo Di Lecce, "Data Preprocessing For Machine Learning Algorithms", International Conference On Software Engineering And Information Technology (ICSEIT), 25th-26th July 2023, Athens, Greece
Abstract: Machine learning is increasingly popular and is producing impressive results in various research fields. Terms such as big data, cloud computing, machine learning, and artificial intelligence are being incorporated into many aspects of daily life. However, having a large amount of data available does not always guarantee better performance for these techniques. For this reason, several papers on data preprocessing for information discovery have been published in the literature. This study focuses on the effects of three preprocessing techniques, Singular Value Decomposition (SVD), Principal Component Analysis (PCA), and Semi-Pivoted QR approximation (SPQR), on the performance of an unsupervised clustering algorithm. To the best of our knowledge, SPQR has never been used for data preprocessing in machine learning algorithms. The results indicate that data preprocessing techniques can have a significant impact on information discovery tasks, emphasizing the importance of avoiding "push button" solutions in data mining.
Keywords: data analysis; PCA; SPQR; SVD; FCM; Clustering Silhouette