Abstract:
The generation of massive data is increasing in big data industries due to the evolution of modern technologies. Big data industries draw on data sources such as sensors, the Internet of Things, and digital and social media. These big data systems comprise data extraction, preprocessing, integration, analysis, and visualization mechanisms. The data obtained from these sources are redundant, incomplete, and conflicting. Moreover, in real-time applications, interpreting all the data from different sources is a tedious process. In this paper, the gathered data are preprocessed to handle redundancy, incompleteness, and conflict. To this end, a generalized dimensionality reduction technique called Shrinkage Linear Discriminant Analysis (SLDA) is proposed, which improves both the performance and the generalization of the classifier. Although dimensionality reduction improves classifier performance, irrelevant features further degrade the performance of the system. Hence, the most relevant and important features are selected using a Pearson correlation-based feature selection technique, which selects a subset of correlated features to improve the performance of the classification system. The selected features are classified using the proposed Quadratic-Gaussian Discriminant Analysis (QGDA) classifier. The proposed techniques are tested on the localization and cover datasets from the University of California Irvine (UCI) Machine Learning Repository. In addition, the proposed techniques are evaluated on these datasets with evaluation metrics and compared with other similar methods; the proposed classification system achieves better performance, which demonstrates its efficiency. The accuracy obtained is over 91% in all experiments on these datasets. 
Based on the results evaluated in terms of training percentage and number of mappers, it is reasonable to conclude that the proposed method can be used for big data classification.
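The abstract names three building blocks: Pearson correlation-based feature selection, shrinkage LDA for dimensionality reduction, and a quadratic Gaussian discriminant classifier. The NumPy sketch below shows how such a pipeline might fit together. It rests on several assumptions that are not taken from the paper. Features are ranked by absolute Pearson correlation with the integer-coded class label, which is one simple reading of "Pearson correlation-based feature selection". The shrinkage intensity `alpha` is a fixed, hypothetical parameter rather than an estimated one. Standard QDA stands in for the authors' QGDA, whose exact formulation is not given here. The order of the steps, the function names (`pearson_select`, `shrinkage_lda`, `QDA`) and the synthetic data are all illustrative; none of this is the UCI data or the authors' code.

```python
import numpy as np


def pearson_select(X, y, k):
    """Assumed selection rule: keep the k features with the largest |Pearson r| to the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get r = 0 instead of a division by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k]


def shrinkage_lda(X, y, alpha=0.1, n_components=None):
    """Shrinkage LDA projection (fixed, hypothetical alpha); returns a (d, k) matrix."""
    classes = np.unique(y)
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        diff = (mk - mu)[:, None]
        Sb += len(Xk) * diff @ diff.T
    Sw /= len(X)
    Sb /= len(X)
    # Shrink the within-class covariance toward a scaled identity.
    S = (1 - alpha) * Sw + alpha * (np.trace(Sw) / d) * np.eye(d)
    # Whiten with S^{-1/2}, then diagonalise the whitened between-class scatter;
    # this solves the generalized eigenproblem Sb v = lambda S v.
    lam, U = np.linalg.eigh(S)
    W = U / np.sqrt(lam)
    _, V = np.linalg.eigh(W.T @ Sb @ W)
    k = n_components or len(classes) - 1
    return W @ V[:, ::-1][:, :k]  # eigh sorts ascending, so reverse for the top k


class QDA:
    """Standard QDA: one Gaussian per class with its own covariance (stand-in for QGDA)."""

    def fit(self, X, y, reg=1e-6):
        d = X.shape[1]
        self.classes = np.unique(y)
        self.params = []
        for c in self.classes:
            Xk = X[y == c]
            cov = np.cov(Xk, rowvar=False).reshape(d, d) + reg * np.eye(d)
            self.params.append((Xk.mean(axis=0), cov, np.log(len(Xk) / len(X))))
        return self

    def predict(self, X):
        scores = []
        for mean, cov, log_prior in self.params:
            diff = X - mean
            _, logdet = np.linalg.slogdet(cov)
            maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
            scores.append(log_prior - 0.5 * (logdet + maha))  # log posterior up to a constant
        return self.classes[np.argmax(scores, axis=0)]


# Synthetic two-class data: only features 0 and 1 carry class information.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, 6))
X[y == 1, :2] += 3.0
idx = rng.permutation(2 * n)
train, test = idx[:300], idx[300:]

keep = pearson_select(X[train], y[train], k=2)
P = shrinkage_lda(X[train][:, keep], y[train], alpha=0.1)
clf = QDA().fit(X[train][:, keep] @ P, y[train])
pred = clf.predict(X[test][:, keep] @ P)
accuracy = np.mean(pred == y[test])
print(sorted(keep.tolist()), round(float(accuracy), 3))
```

On this toy data the correlation filter should pick the two informative features, and the projection reduces them to a single discriminant axis (at most C − 1 components for C classes) before the quadratic classifier is applied. Whiten-then-diagonalise is used instead of inverting `S` directly so that only symmetric eigendecompositions are needed.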