The Effect of Generating Synthetic Data in Smart City Network Systems

dc.rights.license CC BY eng
dc.contributor.author Čech, Pavel cze
dc.contributor.author Ponce, Daniela cze
dc.contributor.author Mikulecký, Peter cze
dc.contributor.author Žváčková, Andrea cze
dc.contributor.author Mls, Karel cze
dc.contributor.author Otčenášková, Tereza cze
dc.contributor.author Tučník, Petr cze
dc.date.accessioned 2025-12-05T15:38:03Z
dc.date.available 2025-12-05T15:38:03Z
dc.date.issued 2025 eng
dc.identifier.issn 2662-995X eng
dc.identifier.uri http://hdl.handle.net/20.500.12603/2362
dc.description.abstract This study examines the effect of synthetic data generation for balancing class distributions on the performance of classification algorithms in smart city network systems. Contrary to the assumption that data balancing improves classification performance, the analysis reveals a more complex impact. Using three publicly available network traffic benchmark datasets and four different balancing techniques, the study evaluates the performance of five classifiers on 65 classification tasks. The findings indicate that, for smaller datasets, classifiers that achieved the highest accuracy on unbalanced data did not benefit from synthetic data generation for minority classes. Although neural network-based classifiers showed improved performance with balanced data, these improvements came at the cost of lower overall classification scores. For larger datasets, balancing through random oversampling of minority classes and undersampling of majority classes helped improve classification. However, these improvements were limited to precision, with no significant gains in recall. The study offers valuable insights into using synthetic data for intrusion detection, emphasizing the challenges of intricate dependencies in network traffic data for generative models. The results align with previous research showing mixed effects of data balancing on classifier performance, contributing to a broader understanding of the limited efficacy of synthetic data in real-world network contexts. This experimental study highlights the need for a systematic benchmarking framework for synthetic data research, ensuring consistency in data balancing and classification processes. This work contributes to the ongoing discourse on the intersection of machine learning and cybersecurity, emphasizing the critical role of data in developing resilient intrusion detection systems. © The Author(s) 2025. eng
dc.format p. "Article number: 174" eng
dc.language.iso eng eng
dc.publisher Springer eng
dc.relation.ispartof SN Computer Science, volume 6, issue: 2 eng
dc.subject Attack classification eng
dc.subject Generative adversarial networks eng
dc.subject Imbalanced datasets eng
dc.subject Intrusion detection eng
dc.title The Effect of Generating Synthetic Data in Smart City Network Systems eng
dc.type article eng
dc.identifier.obd 43881928 eng
dc.identifier.doi 10.1007/s42979-025-03673-3 eng
dc.publicationstatus postprint eng
dc.peerreviewed yes eng
dc.source.url https://link.springer.com/article/10.1007/s42979-025-03673-3 cze
dc.relation.publisherversion https://link.springer.com/article/10.1007/s42979-025-03673-3 eng
dc.rights.access Open Access eng
dc.project.ID VJ02010016/Využití umělé inteligence pro zajištění kybernetické bezpečnosti Smart City [Use of Artificial Intelligence for Ensuring Smart City Cybersecurity] eng

