| dc.rights.license | CC BY | eng |
| dc.contributor.author | Roy, A. | cze |
| dc.contributor.author | Bhattacharjee, Debotosh | cze |
| dc.contributor.author | Krejcar, Ondřej | cze |
| dc.date.accessioned | 2025-12-05T15:41:07Z | |
| dc.date.available | 2025-12-05T15:41:07Z | |
| dc.date.issued | 2025 | eng |
| dc.identifier.issn | 2352-3409 | eng |
| dc.identifier.uri | http://hdl.handle.net/20.500.12603/2383 | |
| dc.description.abstract | The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi's usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance. © 2025 The Authors | eng |
| dc.format | p. "Article number: 111599" | eng |
| dc.language.iso | eng | eng |
| dc.publisher | Elsevier Inc. | eng |
| dc.relation.ispartof | Data in Brief, volume 60, issue: June | eng |
| dc.subject | Anomaly detection | eng |
| dc.subject | Cybersecurity | eng |
| dc.subject | Data preprocessing | eng |
| dc.subject | Dataset optimization | eng |
| dc.subject | Intelligent transportation systems | eng |
| dc.subject | Internet of vehicles | eng |
| dc.subject | Intrusion detection systems | eng |
| dc.subject | Machine learning | eng |
| dc.subject | Network security | eng |
| dc.subject | Vehicular reference misbehavior dataset | eng |
| dc.title | Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset | eng |
| dc.type | article | eng |
| dc.identifier.obd | 43882000 | eng |
| dc.identifier.doi | 10.1016/j.dib.2025.111599 | eng |
| dc.publicationstatus | postprint | eng |
| dc.peerreviewed | yes | eng |
| dc.source.url | https://www.sciencedirect.com/science/article/pii/S2352340925003312?pes=vor&utm_source=scopus&getft_integrator=scopus | cze |
| dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S2352340925003312?pes=vor&utm_source=scopus&getft_integrator=scopus | eng |
| dc.rights.access | Open Access | eng |