Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset

Roy, A.; Bhattacharjee, Debotosh; Krejcar, Ondřej

URI: http://hdl.handle.net/20.500.12603/2383

Date: 2025

Abstract:

The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi's usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance. © 2025 The Authors
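
The down-sampling and feature-selection steps described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the 10% sample was stratified by class, so per-class sampling is assumed here; the function names (downsample, select_features, preprocess) are hypothetical; only the four columns named in the abstract are kept, although the paper's full feature set may be larger. The synthesis-based balancing step is not shown because the abstract gives no details of it.

```python
import random
from collections import defaultdict

# Columns named in the abstract; the paper's full refined feature set is not listed there.
KEEP = ["rcvTime", "pos_0", "pos_1", "attack_type"]
RENAME = {"attack_type": "attacker_type"}


def downsample(rows, fraction=0.10, label="attack_type", seed=0):
    """Keep roughly `fraction` of the rows from each label group.

    Assumption: sampling is stratified by label so the class ratio survives.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # never drop a class entirely
        sample.extend(rng.sample(members, k))
    return sample


def select_features(rows, keep=KEEP, rename=RENAME):
    """Project each row onto `keep` and apply the column renames."""
    return [{rename.get(col, col): row[col] for col in keep} for row in rows]


def preprocess(rows, fraction=0.10, seed=0):
    """Down-sample first, then reduce each row to the selected features."""
    return select_features(downsample(rows, fraction=fraction, seed=seed))
```

Sampling within each label group, rather than over the whole file, is one way to keep rare attack types from disappearing when only a tenth of the messages is retained.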
