Can artificial intelligence create simulation scenarios on par with humans? A comparative analysis of scenarios written by humans and AI, using the case of «Fire in the operating room»
https://doi.org/10.46594/2687-0037_2025_4_2151
Abstract
A blinded comparative pilot study evaluated the quality of simulation scenarios produced by generative artificial intelligence (AI) models against human-written scenarios. The objectives were to test the feasibility of generating scenarios with AI models, to compare their quality with that of human-written scenarios, and to assess whether experts could determine authorship. Three models (Grok, ChatGPT and DeepSeek) generated scenarios from a standardized prompt, and the ROSOMED competition case served as the reference standard. Five independent experts evaluated the four scenarios using the original Scale of Assessment of a Simulation Scenario (SASS). The ChatGPT scenario (average score of 3.4) outperformed the human scenario (2.9). The AI evaluator generally ranked the scenarios similarly to the human experts, but it showed bias towards its own work when self-evaluating.
About the Authors
M. D. Gorshkov
Russia
Gorshkov Maxim
Moscow;
Stuttgart
A. A. Andreenko
Russia
Andreenko Alexander
St. Petersburg
R. L. Bulanov
Russia
Bulanov Roman
Arkhangelsk
I. I. Dolgina
Russia
Dolgina Irina
Kursk
O. V. Golubeva
Russia
Golubeva Olesya
Moscow
D. M. Gribkov
Russia
Gribkov Denis
Moscow
Z. A. Zaripova
Russia
Zaripova Zulfiia
St. Petersburg
N. G. Kostsova
Russia
Kostsova Nadezhda
Moscow
V. R. Nepershina
Russia
Nepershina Valia
Moscow
S. V. Khodus
Russia
Khodus Sergey
Blagoveshchensk
L. B. Shubina
Russia
Shubina Lyubov
Moscow
For citations:
Gorshkov M.D., Andreenko A.A., Bulanov R.L., Dolgina I.I., Golubeva O.V., Gribkov D.M., Zaripova Z.A., Kostsova N.G., Nepershina V.R., Khodus S.V., Shubina L.B. Can artificial intelligence create simulation scenarios on par with humans? A comparative analysis of scenarios written by humans and AI, using the case of «Fire in the operating room». Virtual Technologies in Medicine. 2025;(4):362-369. (In Russ.) https://doi.org/10.46594/2687-0037_2025_4_2151