Mathematical modeling of behavioral risk control under partial observability (Математичне моделювання керування поведінковими ризиками в умовах часткової спостережуваності)

Bibliographic details
Date:	2026
Main authors:	Чабан, Олександр; Гладун, Володимир
Format:	Article
Language:	Ukrainian
Published:	Інститут прикладних проблем механіки і математики ім. Я. С. Підстригача НАН України, 2026
Keywords:	mathematical modeling; loss of control; reinforcement learning; partially observable Markov process; latent state; risk-sensitive control; recurrent policy; conditional value at risk
Online access:	https://www.fmmit.lviv.ua/index.php/fmmit/article/view/432
Journal title:	Physico-mathematical modeling and informational technologies

Description
Abstract:	This paper addresses the mathematical modeling and prevention of transient control loss in stochastic human-machine systems with a high cost of error. It argues that classical control approaches based on Markov decision processes (MDP) are fundamentally limited for this task: because the true psychological state of the controlled object is a latent variable, applying an MDP inevitably leads to perceptual aliasing, in which distinct hidden states produce the same observation. To describe the hidden dynamics, a theoretical model is proposed that formalizes the control problem as a partially observable Markov decision process (POMDP). Recurrent reinforcement learning serves as the algorithmic basis. It is demonstrated that integrating a long short-term memory (LSTM) architecture provides the mechanism needed to aggregate a sequence of noisy observations into a coherent behavioral trajectory, enabling the agent to infer the hidden risk level. Furthermore, a mathematical model for composite reward shaping is developed that departs from the standard maximization of expected return. Using the conditional value at risk (CVaR) metric, the proposed model optimizes the control policy while accounting for heavy-tailed risks and worst-case scenarios of behavioral escalation. The work establishes a theoretical foundation for moving from static classification systems to algorithms for proactive and adaptive user support under uncertainty.
DOI:	10.15407/fmmit2026.42.050
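
The abstract names conditional value at risk as the risk metric, but this record does not give the article's formula. As a minimal sketch only, assuming an empirical lower-tail CVaR over sampled episode returns and a simple weighted blend with the mean return, the idea of departing from pure expected-return maximization might look as follows. The names cvar_lower_tail, risk_sensitive_score, alpha and risk_weight are illustrative and are not taken from the article.

```python
import math


def cvar_lower_tail(returns, alpha):
    """Empirical CVaR: the mean of the worst ceil(alpha * n) returns.

    alpha is the tail fraction in (0, 1]; alpha = 1.0 gives the plain mean.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    return sum(ordered[:k]) / k


def risk_sensitive_score(returns, alpha=0.25, risk_weight=0.5):
    """Blend of mean return and lower-tail CVaR (an assumed form of the
    objective, not the article's own composite reward)."""
    mean_return = sum(returns) / len(returns)
    tail_return = cvar_lower_tail(returns, alpha)
    return (1.0 - risk_weight) * mean_return + risk_weight * tail_return


# Two hypothetical policies with the same expected return (1.0).
steady_policy = [1.0, 1.0, 1.0, 1.0]
heavy_tail_policy = [4.0, 4.0, 4.0, -8.0]

steady_score = risk_sensitive_score(steady_policy)
heavy_tail_score = risk_sensitive_score(heavy_tail_policy)
```

Both hypothetical policies have the same mean return, so an expected-return objective cannot separate them; the CVaR term penalizes the one with the rare severe outcome, which is the behavior the abstract attributes to the risk-sensitive formulation.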
