Mathematical modeling of behavioral risk control under partial observability


Bibliographic Details
Date: 2026
Main Authors: Чабан, Олександр; Гладун, Володимир
Format: Article
Language: Ukrainian
Published: Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, National Academy of Sciences of Ukraine, 2026
Online Access: https://www.fmmit.lviv.ua/index.php/fmmit/article/view/432
Journal Title: Physico-mathematical modeling and informational technologies

Description
Summary: This paper addresses the mathematical modeling and prevention of transient loss of control in stochastic human-machine systems where errors are costly. It argues that classical control approaches based on Markov decision processes (MDPs) are fundamentally limited for this task: because the true psychological state of the controlled object is a latent variable, an MDP formulation inevitably suffers from perceptual aliasing, in which distinct hidden states yield identical observations and are therefore indistinguishable to a policy that conditions only on the current observation. To describe the hidden dynamics, a theoretical model is proposed that formalizes the control problem as a partially observable Markov decision process (POMDP), with recurrent reinforcement learning as the algorithmic basis. It is shown that integrating a long short-term memory (LSTM) architecture provides the mechanism needed to aggregate a sequence of noisy observations into a coherent behavioral trajectory, enabling the agent to infer the hidden risk level. Furthermore, departing from the standard maximization of expected return, a mathematical model of composite reward shaping is developed. By using the conditional value at risk (CVaR) metric, the proposed model optimizes the control policy while accounting for heavy-tailed risks and worst-case scenarios of behavioral escalation. The work thus lays a rigorous theoretical foundation for moving from static classification systems to algorithms for proactive, adaptive user support under uncertainty.
DOI: 10.15407/fmmit2026.42.050
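The summary rests on two computational ideas: inferring a hidden risk level from a history of noisy observations, and scoring outcomes by CVaR instead of by their mean. The Python sketch below illustrates both on a hypothetical two-state toy model. The transition matrix `T`, the observation matrix `O`, and the helper names `belief_update`, `filter_history` and `empirical_cvar` are illustrative assumptions, not the authors' model. The sketch uses the exact Bayes filter for a known model with action-independent dynamics (a simplification), whereas the paper delegates this aggregation to an LSTM trained from data; the CVaR estimator is one common empirical convention, the mean of the worst alpha-fraction of sampled returns.

```python
import numpy as np

# Hypothetical toy model (not from the paper): hidden risk level is binary,
# 0 = stable, 1 = escalating. The agent never observes it directly.
# T[s, s'] = P(next hidden state s' | current hidden state s)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# O[s, o] = P(observation o | hidden state s). Both states can emit either
# observation, so a single observation does not identify the hidden state.
O = np.array([[0.7, 0.3],
              [0.4, 0.6]])


def belief_update(b, obs):
    """One step of the exact Bayes filter (actions omitted for brevity)."""
    predicted = b @ T                     # prior over the next hidden state
    unnormalized = predicted * O[:, obs]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()


def filter_history(observations, b0=(0.5, 0.5)):
    """Aggregate a sequence of noisy observations into a belief over risk."""
    b = np.asarray(b0, dtype=float)
    for obs in observations:
        b = belief_update(b, obs)
    return b


def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled returns (lower tail)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    r = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(r))))
    return r[:k].mean()


if __name__ == "__main__":
    # Both histories end with observation 1, yet the beliefs differ:
    # approx. [0.26, 0.74] versus [0.60, 0.40] over [stable, escalating].
    print(filter_history([1, 1, 1]))
    print(filter_history([0, 0, 1]))

    # Two return samples with the same mean (1.0) but different lower tails.
    steady = [1.0] * 10
    risky = [2.0] * 8 + [-3.0] * 2
    print(np.mean(steady), empirical_cvar(steady, alpha=0.2))  # 1.0 1.0
    print(np.mean(risky), empirical_cvar(risky, alpha=0.2))    # 1.0 -3.0
```

A policy that sees only the last observation must act identically after both histories, which is the aliasing problem the summary describes; a memory of the sequence separates them. Likewise, an expected-return criterion cannot distinguish the `steady` and `risky` samples, while CVaR at alpha = 0.2 ranks the rare large losses of `risky` as clearly worse.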