Evaluation of Generative AI Models in Python Code Generation: A Comparative Study

dc.rights.license CC BY eng
dc.contributor.author Palla, Dominik cze
dc.contributor.author Slabý, Antonín cze
dc.date.accessioned 2025-12-05T15:41:51Z
dc.date.available 2025-12-05T15:41:51Z
dc.date.issued 2025 eng
dc.identifier.issn 2169-3536 eng
dc.identifier.uri http://hdl.handle.net/20.500.12603/2388
dc.description.abstract This study evaluates leading generative AI models for Python code generation. Evaluation criteria include syntax accuracy, response time, completeness, reliability, and cost. The models tested comprise OpenAI's GPT series (GPT-4 Turbo, GPT-4o, GPT-4o Mini, GPT-3.5 Turbo), Google's Gemini (1.0 Pro, 1.5 Flash, 1.5 Pro), Meta's LLaMA (3.0 8B, 3.1 8B), and Anthropic's Claude models (3.5 Sonnet, 3 Opus, 3 Sonnet, 3 Haiku). Ten coding tasks of varying complexity were tested in three iterations per model to measure performance and consistency. Claude models, especially Claude 3.5 Sonnet, achieved the highest accuracy and reliability, outperforming all other models on both simple and complex tasks. Gemini models showed limitations in handling complex code. Lower-cost options such as Claude 3 Haiku and Gemini 1.5 Flash maintained good accuracy on simpler problems while remaining budget-friendly. Unlike earlier single-metric studies, this work introduces a multi-dimensional evaluation framework that considers accuracy, reliability, cost, and exception handling. Future work will explore other programming languages and include metrics such as code optimization and security robustness. eng
dc.format p. 65334-65347 eng
dc.language.iso eng eng
dc.publisher IEEE eng
dc.relation.ispartof IEEE Access, volume 13, issue: April eng
dc.subject Automatization eng
dc.subject generative AI eng
dc.subject LLM eng
dc.subject python eng
dc.subject software development eng
dc.title Evaluation of Generative AI Models in Python Code Generation: A Comparative Study eng
dc.type article eng
dc.identifier.obd 43882011 eng
dc.identifier.wos 001470367900023 eng
dc.identifier.doi 10.1109/ACCESS.2025.3560244 eng
dc.publicationstatus postprint eng
dc.peerreviewed yes eng
dc.source.url https://ieeexplore.ieee.org/document/10963975 cze
dc.relation.publisherversion https://ieeexplore.ieee.org/document/10963975 eng
dc.rights.access Open Access eng
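
The abstract describes the measurement scheme of the study: ten coding tasks, three iterations per model, and metrics including syntax accuracy and response time. The following is a minimal, hypothetical Python sketch of how such an iteration loop could be scripted. The generate_code wrapper, the task identifiers, and the aggregated metric names are illustrative assumptions, not the authors' actual harness, and only two of the paper's criteria (syntax accuracy and response time) are shown.

# Illustrative sketch only: a per-task, per-iteration evaluation loop in the
# spirit of the abstract. generate_code is a hypothetical stand-in for a
# model API call; the task list and metric names are placeholders.
import ast
import time
from statistics import mean

TASKS = ["task_01", "task_02"]  # placeholder task IDs (the study uses ten tasks)
ITERATIONS = 3                  # three runs per model, as described in the abstract


def generate_code(model: str, task: str) -> str:
    """Hypothetical wrapper around a model's API; replace with a real client call."""
    raise NotImplementedError


def syntactically_valid(source: str) -> bool:
    """Check Python syntax by attempting to parse the generated source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def evaluate(model: str) -> dict:
    """Run every task ITERATIONS times and aggregate simple per-model metrics."""
    valid_flags, latencies = [], []
    for task in TASKS:
        for _ in range(ITERATIONS):
            start = time.perf_counter()
            code = generate_code(model, task)
            latencies.append(time.perf_counter() - start)
            valid_flags.append(syntactically_valid(code))
    return {
        "model": model,
        "syntax_accuracy": mean(valid_flags),     # fraction of runs that parse
        "mean_response_time_s": mean(latencies),  # average wall-clock latency
    }

In the study's terminology, reliability could plausibly be read as consistency of these outcomes across the three iterations, and cost would be derived from token usage and provider pricing; those details are not reproduced here.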

