A resource limited parallel program model

Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program...

Повний опис

Збережено в:

Бібліографічні деталі
Дата:	2019
Автор:	Rahozin, D.V.
Формат:	Стаття
Мова:	English
Опубліковано:	Інститут програмних систем НАН України 2019
Теми:	formal model parallel computing parallel computing on graphics processing units UDC 681.3
Онлайн доступ:	https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376
Теги:	Додати тег Немає тегів, Будьте першим, хто поставить тег для цього запису!
Назва журналу:	Problems in programming
Завантажити файл:

Репозитарії

Problems in programming

id	pp_isofts_kiev_ua-article-376
record_format	ojs
resource_txt_mv	ppisoftskievua/38/1af0509c5c44ef2ca4731d35343fb238.pdf
spelling	pp_isofts_kiev_ua-article-3762024-04-28T11:07:27Z A resource limited parallel program model Модель параллельных программ с ограничением ресурсов Модель паралельних програм з обмеженням ресурсів Rahozin, D.V. formal model; parallel computing; parallel computing on graphics processing units UDC 681.3 формальная модель; параллельные вычисления; параллельные вычисления на графических процессорах УДК 681.3 формальна модель; паралельні обчислення; паралельні обчислення на графічних процесорах УДК 681.3 Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program formal model was developed. The model expresses the resource and time constraints and is suitable both for fine grained and coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, bigger loop iterations, large computing blocks for graphics, video and neural network processing) the interpretation of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We are discussing several modern software frameworks that are able to integrate the functionality to interpret the model and check the feasibility of the set of parallel programs running on hardware simultaneously with resource and time limitations. Real world tasks – neural network inference, video processing, general purpose computing on GPU – which get benefits after enabling such models - are discussed.Problems in programming 2019; 4: 03-10 Современные параллельные программы исполняются в сложных и ресурсно-ограниченных средах, поэтому к ним предъявляются повышенные требования к потреблению ресурсов и стабильности выполнения, особенно для долгих по времени вычислений. Для помощи в определении ресурсных ограничений для параллельных программ разработана формальная модель параллельных программ с ресурсными ограничениями. Модель определяет временные и ресурсные ограничения и может быть использована для как мелкозернистого, так и для крупнозернистого параллелизма. Для высоких степеней параллелизма (независимые процедуры, большие итерации циклов, большие вычислительные блоки для графических, нейросетевых вычислений и обработки видео) интерпретация модели возможна во время исполнения программы для обеспечения невозможности блокировок при распределении ресурсов. Рассмотрены современные фреймворки, для которых возможна интеграция в свой состав базовых средств проверки соответствия множества параллельных программ наличным вычислительным ресурсам при исполнении программы. Рассмотрены задачи реального мира – вывод в нейросетях, обработки видео, вычисления на графических картах, для которых будет выгодно применение таких моделей.Problems in programming 2019; 4: 03-10 Сучасні паралельні програми виконуються в складних та ресурсно-обмежених середовищах, тому до них застосовуються підвищені вимоги щодо споживання ресурсів та стабільності виконання, особливо для довгих у часі обчислень. Для допомоги у визначенні ресурсних обмежень для паралельних програм розроблено формальну модель паралельних програм з ресурсними обмеженнями. Модель визначає часові та ресурсні обмеження і може бути використана як для дрібнозернистого, так і для крупнозернистого паралелізму. Для високих ступенів паралелізму (незалежні процедури, великі ітерації циклів, великі обчислювальні блоки для графічних, нейромережевих обчислень та обробки відео) інтерпретація моделі можлива під час виконання програми для унеможливлення блокувань при розподілі ресурсів. Розглянуто сучасні фреймворки, які можуть інтегрувати у свій склад базові засоби перевірки відповідності множини паралельних програм наявним обчислювальним ресурсам під час виконання програм. Розглянуто задачі реального світу – вивід у нейромережах, обробка відео, обчислення на графічних картах, які матимуть зиск з впровадження таких моделей.Problems in programming 2019; 4: 03-10 Інститут програмних систем НАН України 2019-12-05 Article Article application/pdf https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376 10.15407/pp2019.04.003 PROBLEMS IN PROGRAMMING; No 4 (2019); 03-10 ПРОБЛЕМЫ ПРОГРАММИРОВАНИЯ; No 4 (2019); 03-10 ПРОБЛЕМИ ПРОГРАМУВАННЯ; No 4 (2019); 03-10 1727-4907 10.15407/pp2019.04 en https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376/379 Copyright (c) 2019 PROBLEMS IN PROGRAMMING
institution	Problems in programming
baseUrl_str	https://pp.isofts.kiev.ua/index.php/ojs1/oai
datestamp_date	2024-04-28T11:07:27Z
collection	OJS
language	English
topic	formal model parallel computing parallel computing on graphics processing units UDC 681.3
spellingShingle	formal model parallel computing parallel computing on graphics processing units UDC 681.3 Rahozin, D.V. A resource limited parallel program model
topic_facet	formal model parallel computing parallel computing on graphics processing units UDC 681.3 формальная модель параллельные вычисления параллельные вычисления на графических процессорах УДК 681.3 формальна модель паралельні обчислення паралельні обчислення на графічних процесорах УДК 681.3
format	Article
author	Rahozin, D.V.
author_facet	Rahozin, D.V.
author_sort	Rahozin, D.V.
title	A resource limited parallel program model
title_short	A resource limited parallel program model
title_full	A resource limited parallel program model
title_fullStr	A resource limited parallel program model
title_full_unstemmed	A resource limited parallel program model
title_sort	resource limited parallel program model
title_alt	Модель параллельных программ с ограничением ресурсов Модель паралельних програм з обмеженням ресурсів
description	Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program formal model was developed. The model expresses the resource and time constraints and is suitable both for fine grained and coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, bigger loop iterations, large computing blocks for graphics, video and neural network processing) the interpretation of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We are discussing several modern software frameworks that are able to integrate the functionality to interpret the model and check the feasibility of the set of parallel programs running on hardware simultaneously with resource and time limitations. Real world tasks – neural network inference, video processing, general purpose computing on GPU – which get benefits after enabling such models - are discussed.Problems in programming 2019; 4: 03-10
publisher	Інститут програмних систем НАН України
publishDate	2019
url	https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376
work_keys_str_mv	AT rahozindv aresourcelimitedparallelprogrammodel AT rahozindv modelʹparallelʹnyhprogrammsograničeniemresursov AT rahozindv modelʹparalelʹnihprogramzobmežennâmresursív AT rahozindv resourcelimitedparallelprogrammodel
first_indexed	2024-09-16T04:07:56Z
last_indexed	2024-09-16T04:07:56Z
_version_	1818568285358129152
fulltext	Інструментальні засоби та середовища програмування © Dmytro V. Rahozin, 2019 ISSN 1727-4907. Проблеми програмування. 2019. № 4 3 UDC 681.3 https://doi.org/10.15407/pp2019.04.003 Dmytro V. Rahozin A RESOURCE LIMITED PARALLEL PROGRAM MODEL Modern parallel programs run in a complex, resource-limited environment, and this raises the new require- ments for resource consumption and execution stability of long running processes. In order to help with check- ing resource constraints for such parallel software a resource-limited parallel program formal model was de- veloped. The model expresses the resource and time constraints and is suitable both for fine grained and coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, big- ger loop iterations, large computing blocks for graphics, video and neural network processing) the interpreta- tion of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We are discussing several modern software frameworks that are able to integrate the functionality to interpret the model and check the feasibility of the set of parallel programs running on hardware simultaneously with re- source and time limitations. Real world tasks – neural network inference, video processing, general purpose computing on GPU – which get benefits after enabling such models - are discussed. Key words: formal model, parallel computing, parallel computing on graphics processing units. Introduction Nowadays parallel programming still is an activity, which cannot be planned well in terms of spent time and parallelization quality (i. e. writing a good code, which fits underly- ing architecture well in limited time and mon- ey). This can be easily explained by increas- ing complexity of underlying computing ar- chitecture of the modern hardware. The paral- lel architectures of 80ies had the minimal number of cache levels, simpler memory hier- archy and quite simple MESI cache protocols, but even in this case the performant imple- mentation of basic BLAS/LAPACK proce- dures was a tricky thing. Today the state-of- art parallel computing is based on Graphics Processing Units (GPUs) which has quite complex memory hierarchy, complicated da- ta exchange paths between central processors and GPUs and highly parallel SIMD-type computational units. The sophisticated tuning of GPU-based programs is mostly economi- cally ineffective due to high labor cost and longer time-to-market. This involves the use of programming tools, which enable more high-level programming structures than just basic CUDA or OpenCL code. The good examples of such tool are the neural network descriptions tools. Starting from Caffe tool [1], the inference and deep learning phases in neural networks are based on a high-level description of the network. The network therefore is defined as a pipeline of standard computing blocks (usually more than 50 types of blocks are defined) and data paths between them. This definition format practically enables the development of a neu- ral network from idea to optimized implemen- tation without moving down to hand pro- gramming the neural network behavior. The success of Caffe lead other development groups to implement alternate or competitive neural network definition languages (Darknet [2]) or integrate a set of packages into pro- gramming languages and frameworks (Ten- sorFlow [3]), sometimes incompatible. Of course, the use of high level definitions is not limited to neural network structure de- scriptions. 1. Practical expressions of high-level programming structures Usually the simplified description of a big computational pipeline lacks ability for semi-precise optimization of resource use, as there is no way to estimate possible changes in computation pipeline which can use GPU resources more effectively or estimate how efficiently the neural network pipeline will run on GPU simultaneously with other appli- cations. The optimizations of such compu- ting pipeline are hard and strictly depend on underlying hardware architecture, which is a trade secret for the main hardware market players. Other example is OpenCV framework [4] extensions for matrix/image operations on https://doi.org/10.15407/pp2019.04.003 Інструментальні засоби та середовища програмування 4 GPUs. Program can simply move matrices or image operations to GPU without critical changes in code, but the GPU memory man- agement is done by hand by moving the im- age data between central processor unit (CPU) and GPU memory. Such a helper (here helper mean the OpenCV functions set) does not deal with actual computational re- source allocation on GPU, so the user cannot reach the best performance on GPU side without additional and expensive platform- dependent optimizations. An interesting example is Gstreamer [5] media processing pipeline. Started as an eager media processing pipeline more than 30 years ago, now it is enriched with data processing on GPU side. A good example is Gstreamer-based DeepStream framework from Nvidia [6], which provides optimized neural network inference at GPU side. For this case the data processing pipeline may reach more than a hundred components, which are controlled by a dozen of threads. Despite of many efforts applied from Nvidia side, basic Gstreamer functionality still uses only rudimentary and hidden thread control and does not bother with resource allocation (either GPU memory or GPU computational power). Looking through these three cases the can lighten the following flaws: 1) too high- level model for neural network description, where resource allocation (computational, memory, communication) is out of scope of the model; 2) too low-level model where the user should deal with resources by hand but not by describing them in model descrip- tion; 3) even ignorance of resource alloca- tion. All these flaws greatly affect the paral- lel software system which goes from proof of concept stage to customer environment (i. e. productization process) experiencing all computational, memory and communi- cation resource constraints. Due to modern shift of heavy computations to GPU side, the productization process becomes a head- ache for engineers, full of bug hunting and code tricks to resolve various resource con- straints. The problem appears more complex in case if applications are long-running, so the memory and other resource leaks can be fatal. In order to overcome the difficulties of constraint management process we propose a model, which allows to introduce resource management process to frameworks such as OpenCV, TensorFlow or Gstreamer. It should be noted that these frameworks are different in levels of parallelism expression. OpenCV provides only basic parallelism, Gstreamer – thread-level and Caffe/Tensorflow – coarse- grain. Still we model will fit to all parallelism types. There are a lot of previous work intro- ducing various concepts for parallel software models, starting from Timed Finite Automa- tons [7]. This article introduces resource con- straints for parallel software models, which are necessary for executing parallel program in modern highly parallel and resource con- strained environment. 2. A Model of Resource-Constrained Parallel System The definition of the system is de- rived from [8], extending the previous defini- tions. The word “real-time” is not used inten- tionally, as the complex parallel system may have both hardware and resource constraints that prevent real-time behavior. We consider a discrete-time model where the time is represented as a set of non- negative integer values denoted by ℕ. The time progress is measured by clocks, which are non-negative integer variables increased constantly by one for some time. If com- pared to [8] the definition “synchronously” is dropped, as in real life system usually a common reference clock is used. For the set of clocks X, a valuation 𝑣: 𝑿 → ℕ is defined – it is a function associating with each clock its value v(x). For a clocks subset 𝑿′ ⊆ 𝑿 and a clock value 𝑙 ∈ ℕ we denote by 𝑣( 𝑿′, 𝑙, 𝑥) the valuation that coincides with v for all clocks 𝑥 ∈ 𝑿\𝑿′, and that associates l to all clocks 𝑥 ∈ 𝑿′. It is defined by: 𝑣(𝑿′, 𝑙, 𝑥) = { 𝑙 if 𝑥 ∈ 𝑿′ 𝑣(𝑥) otherwise. Guards are used to specify when ac- tions are enabled. Simple constrains over Інструментальні засоби та середовища програмування 5 clocks X are considered. The grammar allows to build general constrains over clocks: 𝑐 ≔ 𝑡𝑟𝑢𝑒 \| 𝑓𝑎𝑙𝑠𝑒 \|𝑥 ≤ 𝑘\|𝑥 < 𝑘\|𝑥 ≥ ≥ 𝑘\|𝑥 > 𝑘\|𝑐 ∧ 𝑐\|𝑐 ∨ 𝑐\|¬𝑐. The evaluation of a clock constraint c for a valuation v of clocks X denoted by c(v) is obtained by replacing each clock x by its value v(x). A guard g is a clock constraint c with an urgency type 𝜏 ∈ { 𝑛, 𝑑, 𝑢 }, denoted by 𝑔 = [𝑐𝜏]. Here urgency types are used to specify the need of the action in case if action execution is enabled (i. e. when the clocks constraint is true). Non-urgent actions are de- noted by n, delayable actions (which should be executed during their enable time interval) by d, urgent actions (should be executed as soon as they are enabled) are denoted by u. The predicate urg[g] that characterizes the valuations of clocks for which the guard 𝑔 = [𝑐𝜏] is urgent is defined by: 𝑢𝑟𝑔[𝑔](𝑣) ⇔ ⇔ { false if 𝑔 is non urgent (τ = n) 𝑐(𝑣) ∧ ¬𝑐(𝑣 + 𝐾) if 𝑔 is delayable (τ = d) 𝑐(𝑣) if 𝑔 is urgent (τ = u) The set of guards over the set of clocks X is denoted as G(X). For given guards 𝑔1 = [𝑐1] 𝜏1 𝑎𝑛𝑑 𝑔2 = [𝑐2] 𝜏2, the conjunction of g1 and g2 is denoted by 𝑔1 ∧ 𝑔2 and is defined by 𝑔1 ∧ 𝑔2 = = [𝑐1 ∧ 𝑐2] 𝑚𝑎𝑥 𝜏1,𝜏2, considering that urgency types are ordered as follows: 𝑛 < 𝑑 < 𝑢. So, for given guard g=[c] τ and a valuation v, we also write g(v) for the expression c(v). Additionally, the resource constraints are defined. Additionally to model in [8] let us de- fine the resource set P = { pi t }, where t is the resource type, I is the resource index for some number of identical resources in computation- al system. Also the resource pi t may be allo- cated in part – pi t (k), where k is the amount of resource pi t . Let us define availability function for resource: avl(pi t (k)), is it true is case if the re- source is available. 3. Abstract model with constraints Definition 1. Abstract model with constrains. An abstract model is a timed automa- ton M=(A,Q,X,P,→) such that: - A is a finite set of (observable) actions. In addition to actions A internal ac- tions β. The set of actions A  {β} is denoted as A β ; - Q is a finite set of control loca- tions; - P is the resource set; - X is finite set of clocks → Q×(A β ×G(X)×2 X )×Q is a finite set of la- beled transitions. A transition is a tuple (𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′) where a is an action executed by the transition, g is guard over X, r is a sub- set of clocks that are reset by the transition and p is the resource necessary to be allocated during the transaction. We write 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝑞′ for (𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′) ∈→. An abstract model describes the plat- form-independent behavior of the system. Definition 2. (Abstract model semantics). An abstract model 𝑀 = = (𝑨,𝑸, 𝑿, 𝑷 →) defines a transition system TS. States of TS are pairs (q,v), where q is a control location of M and v is a valuation of the clocks X. Actions: We have (𝑞, 𝑣) 𝑎 → 𝑎 → (𝑞′, 𝑣[𝑟 ⟼ 0])𝑖𝑓 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝑎,𝑔,𝑟,𝑝 → 𝑞′𝑖𝑛 𝑀 𝑎𝑛𝑑 𝑔(𝑣)𝑖𝑠 𝑡𝑟𝑢𝑒 and avl(p) is true. Time steps. For a waiting time 𝛿 ∈ ℕ, 𝛿 > 0, we have (𝑞, 𝑣) 𝛿 →(𝑞, 𝑣 + 𝛿) if for all transitions (𝑞 𝑎,𝑔,𝑟 → 𝑞′𝑜𝑓 𝑀 𝑓𝑜𝑟 𝑓𝑜𝑟 𝑎𝑙𝑙 𝛿′ ∈ ∈ [0, 𝛿[, ¬𝑢𝑟𝑔[𝑔](𝑣 + 𝛿′). Urgency corresponds to priorities in- duced by the timing constraints: urgent transi- tions have priority compared to other possible transitions. We denote by wait(q,v) the maxi- mal waiting time allowed at (q,v). Always wait(q,v+δ) = wait(q,v) – δ for all 𝛿 ∈ Інструментальні засоби та середовища програмування 6 ∈ [0, 𝑤𝑎𝑖𝑡(𝑞, 𝑣)], and is formally defined as follows: 𝑤𝑎𝑖𝑡(𝑞, 𝑣) = = 𝒎𝒊𝒏({𝛿 ≥ 0\|⋁ 𝑢𝑟𝑔[𝑔𝑖](𝑣 + 𝛿) 𝑞 𝑎𝑖,𝑔𝑖,𝑟𝑖,𝑝𝑖 → 𝑞𝑖 } ∪ ∪ {+∞}). For an abstract model M = =(A,Q,{x},P,→), a finite execution sequence of M from an initial state (q0,v0) is a maximal sequence of observable actions and time-steps (𝑞𝑖, 𝑣𝑖) ↝ 𝜎𝑖 (𝑞𝑖+1, 𝑣𝑖+1), 𝜎𝑖 ∈ 𝑨 ∪ 𝑁 and 𝑖 ∈ {0, 1, 2, … , 𝑛} (𝑖 ∈ ℕ), such that ↝ is the transitive closure of → for β-transitions, that is (𝑞𝑖, 𝑣𝑖) ↝ 𝜎𝑖 (𝑞𝑖+1, 𝑣𝑖+1) if (𝑞𝑖, 𝑣𝑖) 𝛽 → ∗ 𝛽 → ∗ (𝑞′ 𝑖 , 𝑣′𝑖) 𝜎𝑖 →(𝑞′′𝑖, 𝑣 ′′ 𝑖) 𝛽 → ∗ (𝑞𝑖+1, 𝑣𝑖+1). For example of this model you can re- fer to Model 1 in [8]. Definition 3. (Composition of abstract models). Let Mi = (Ai,Qi,Xi,Pi,→i), 1 ≤ 𝑖 ≤ ≤ 𝑛, be a set of abstract models. We assume that their sets of action and clocks are dis- joint, i.e. for all 𝑖 ≠ 𝑗we have Ai ∩Aj=∅ and Xi ∩Xj=∅. A set of interactions γ is a subset of 2 A , where 𝐴 = ⋃ 𝐴𝑖 𝑛 𝑖=1 , such that any in- teraction a ∈γ contains at most one action of each component Mi, that is, 𝑎 = {𝑎𝑖\|𝑖 ∈ 𝐼} where 𝑎𝑖∈𝐴𝑖and 𝐼 ⊆ {1,2, … , 𝑛}. The compo- sition of the abstract models Mi, 1 ≤ 𝑖 ≤ 𝑛, by using a set of interactions γ, denoted by γ(M1,….,Mn), is the composite abstract model M=(γ,Q,X,P,→) such that Q=Q1× ×Q2×… Qn, 𝑋 = ⋃ 𝑋𝑖 𝑛 𝑖=1 and →γ is defined by the rules: 𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈ 𝛾 𝑔 = ⋀ 𝑔𝑖𝑖∈𝐼 𝑟 =⋃𝑟𝑖 𝑖∈𝐼 ∀𝑖 ∈ 𝐼, 𝑞𝑖 𝑎𝑖,𝑔𝑖,𝑟𝑖,𝑝𝑖 → 𝑞𝑖 ′ ∀𝑖 ∉ 𝐼. 𝑞𝑖 ′ = 𝑞𝑖 (𝑞1, … , 𝑞𝑛) (𝑎,𝑔,𝑟,𝑝)𝛾 → (𝑞′ 1 , … , 𝑞′ 𝑛 ) аlso ∃𝑖 ∈ {1,… , 𝑛}. 𝑞𝑖 𝛽,𝑔𝑖,𝑟𝑖,𝑝𝑖 → 𝑖 𝑞 ′ 𝑖 ∀𝑖 ≠ 𝑗. 𝑞′ 𝑗 = 𝑞𝑗 (𝑞1, … , 𝑞𝑛) 𝛽,𝑔𝑖,𝑟𝑖,𝑝𝑖 → 𝛾 (𝑞 ′ 1 , … , 𝑞′ 𝑛 ). A composition M=γ(M1,…,Mn) of ab- stract models Mi, 1 ≤ 𝑖 ≤ 𝑛, can execute two types of transitions: interactions 𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈ 𝛾 which corresponds to synchronizations of actions ai of models Mi, 𝑖 ∈ 𝐼, and internal ac- tions β of the modeles Mi. An interaction 𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈ 𝛾is enabled from a state of M if all actions ai are enabled. In a composite model M=γ(M1,…,Mn) many interaction can be enabled to act simul- taneously (in the same time) introducing a de- gree of non-determinism in the behavior of M. In order to restrict non-determinism, priorities are introduced that specify which interaction should be executed among the en- abled ones. A priority on M=γ(M1,…,Mn) is a relation 𝜋 ⊆ 𝛾 × 𝑄 × 𝛾 such that for all q the relation 𝜋𝑞 = {(𝑎, 𝑎 ′)\|(𝑎, 𝑞, 𝑎′) ∈ 𝜋} is a par- tial order. We write aπqa' for (𝑎, 𝑞, 𝑎′) ∈ 𝜋 to express the fact that a has weaker priority than a' at state q. That is if both a and a' are enabled at state q, only the action a' can be executed. Thus, priority aπqa' is applied only when the conjunction of the guards and re- sources of a and a' is true. Let 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝛾 𝑞′ and 𝑞 𝑎′,𝑔′,𝑟′,𝑝′ → 𝛾 𝑞′′ be transitions of M such that g=[c] τ and g'=[c'] τ' . Applying priority aπqa' boils down to transforming the guard g of a into the guard gπ=[c˄¬𝑐′]τ and leaving the guard g' of a' unchanged. Furthermore we denote by enq(a) the predicate characterizing valuations of clocks for which an interaction a is enabled at state q. It is defined by: 𝑒𝑛𝑞(𝑎) = { 𝑓𝑎𝑙𝑠𝑒, 𝑖𝑓 ∄(𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑔′) ∈→𝛾 ⋁ 𝑐 (𝑞,𝑎,[𝑐]𝜏,𝑟,𝑝,𝑞′∈→𝛾 − 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒. Definition 4. Priority. Given a compo- site model M=(γ,Q,X,P,→γ) the application of priority π to M defines a new model πM=(γ,Q,X,P,→π) such that →π is defined by the rule: 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝛾 𝑞 ′, 𝑔 = [𝑐]𝜏, Інструментальні засоби та середовища програмування 7 𝑔𝜋 = [𝑐˄¬ ⋁ 𝑒𝑛𝑞(𝑎 ′) 𝑎𝜋𝑞𝑎′ ] 𝜏 𝑞 𝑎,𝑔𝜋,𝑟,𝑝 → 𝜋 𝑞′. Example of an abstract model with priorities is considered in [8]. Abstract models are platform- independent representations of programs with atomic and instantaneous actions execution. Real (“physical”) models represent the pro- gram behavior on a real platform. It accounts the fact that the action execution takes some non-zero time. So we need to break action ex- ecution atomicity and introduce execution times. The transition of an action a of an ab- stract model is replaced by a sequence of two consecutive transitions of the corresponding physical (real world) model – see figure 1. The first transition marks the beginning of the execution of action a, and the second transi- tion marks its completion. These transitions are separated by a partial state denoted by ⊥. The execution time of the action corresponds to the waiting time at state ⊥ 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝑞′ ⊥𝑡 → 𝐪 𝑎,𝑔,𝑟,𝑝 → ⊥𝑡 𝛽 →𝑞′ (Corresponding sequence of transitions in 𝑀⊥), where 𝑡 = (𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′)𝑖𝑛 𝑀. This denoted the transformation of transitions of the abstract model. Definiton 5. Physical model. Let M=(A,Q,X,P,→) be an abstract model. We define the associated as the timed automaton 𝑀⊥=(A,𝐐 ∪ 𝐐⊥,X,P,→⊥) such that: 𝐐⊥is the set of partial states such that there is one partial state for each transition of M, that is, 𝐐⊥={⊥𝑡 \|𝑡 ∈→} →⊥is defined by the rule: 𝑞 𝑎,𝑔,𝑟,𝑝 → 𝑞′ 𝑡=(𝑞,𝑎,𝑔,𝑟,𝑝,𝑞′) 𝑞 𝑎,𝑔,𝑟,𝑝 → ⊥⊥𝑡 ⊥𝑡 𝛽,[𝑡𝑟𝑢𝑒]𝐼,∅,𝑡𝑟𝑢𝑒 → ⊥𝑞′ . In the physical model 𝑀⊥we assume arbitrary execution times for actions, ranging from 0 to +∞, which is modeled by the guard [true] \| for β-transitions. Notice that 𝑀⊥can be further constrained if bounds of the execution times of actions are unknown. For instance, if an estimate WCET(a) is known for the worst- case execution timeof an action a, the associ- ated timing constraint is [xa≤WCET(a)] d in- stead of [true] \| , where xa is a clock that is reset whenever a is started. This allows us to stati- cally check the correctness of the application running on the platform but this is beyond the paper scope. In a physical model 𝑀⊥, the execution of the action a by a transition t=(q,a,g,r,p,q') is followed by a lapse of time δ(a)∈ ℕ at the partial state ⊥𝑡before a β-transition is execut- ed: (𝑞, 𝑣) ↝𝑎 (⊥𝑡, 𝑣[𝑟 ⟼ 0]) ↝𝛿(𝑎) ↝𝛿(𝑎) (𝑞′, 𝑣[𝑟 ⟼ 0] + 𝛿(𝑎)). (1) This corresponds to the following exe- cution sequence in the abstract model M, if such a sequence is feasible: (𝑞, 𝑣) ↝𝑎 (𝑞′, 𝑣[𝑟 ⟼ 0]) ↝𝛿(𝑎) ↝𝛿(𝑎) (𝑞′, 𝑣[𝑟 ⟼ 0] + 𝛿(𝑎)). (2) It should be noticed that the time stamp δ(a) of 𝑀⊥ in (1) may not be the time stamp of M in (2) if δ(a)>wait(q',v[𝑟 ⟼ 0]), meaning that the physical model violates tim- ing constraints defined in the corresponding abstract model. In this case we say that the considered execution sequence is not time- safe. (The execution times of abstract and physical models are compared in [9] – considering that if all execution sequences of 𝑀⊥ are time-safe than 𝑀⊥ is weakly simulated by M). Correct model implementation should execute only time-safe sequences, but time- safety violations occur in a physical model when the execution time of an action is larger than what is allowed by the timing and re- source constraints of the corresponding ab- stract model. Correct implementations are ob- tained for platforms that are sufficiently fast for executing the program without violating time-safety. Here the physical model pre- serves the semantic of the abstract model as shown in [10]. Otherwise the time-safety vio- lations should be checked in run time. Physical model composition consider- ations and correctness considerations can be checked in [8]. Інструментальні засоби та середовища програмування 8 So how we can deal with resource lim- itations? State-of-the-art software packages require simpler solutions, so model interpreta- tion in run-time (such as in [11]) looks com- plex and superfluous. In case if program is al- ready expressed as a graph of scheduled blocks, it is possible to evaluate model re- quirements and behavior at the moments be- tween the previous block finish and a the new block start – this also works for moments of spawning new parallel processes and joining parallel processes for one new serial process. This works well for cases enlisted before: LAPACK-based computations [12], neural network inference and media processing in Gstreamer-like pipelines. Even it is possible for interpreted code, such as Java or .Net, where code assemblies can be annotated with resource information. 4. Affected industry cases Why all these model descriptions are important for us although all they looks to be abstract for real software business? Before 2005 the parallel programming was the area of academy pundits and small groups of professionals in computer graphics area. After 2005 the integrated circuits mak- ing technology enabled so many transistors on single die for hardware engineers so that the single computing units (up to von Neumann definitions) can not effectively use the hard- ware. The most efficient way was to place several processing units on one silicone crys- tal, so that the silicone can run multiple pro- cesses simultaneously somehow. It looked like “as single CPU can not deal effectively, let’s do multiple CPUs and the programmer should do software in right way”. At practical side a lot of money was invested into teaching parallel programming in colleges and making specialized computer languages for highly parallel hardware – such as Nvidia CUDA [13] or anti-Nvidia industry standard OpenCL [14]. As the efficient parallel software re- quires more time to invest, more skilled re- sources and more money for test teams work the new tendency was appeared – to make frameworks which allow the programmer to make parallel programs with simpler (less or more simpler) descriptions. Such concept looks nice but works hard. Two well-known technology examples – CUDA and OpenCL shows that the main model of writing a paral- lel program for GPUs is a “producer- consumer” model when a CPU-side program controls GPU threads execution and none of GPU threads are self-sufficient. Practically all memory management employs CPU-GPU memory communications, and memory com- munications between neighbor GPU cards are exceptions. Under these conditions any software developer experiences a serious technology limitation, as any parallel program support only its exclusive execution on GPU resource and the resource pool subdivision between different processes are possible only in case if each parallel program allocates some number of GPUs, leaving the other GPUs to counter- part. However, all technologies similar to CUDA/OpenCL or any specialized parallel programming languages do not decrease the development time significantly. The good ways to decrease programmer effort are infra- structures, which shorten development time for 90 % in simpler cases and for 50 % in hard cases. The important issue is that the most of the infrastructures are upgradable to incorporate the described physical model. Let us check several infrastructure (frame- work) cases. 1) TensorFlow [3]. At least 60 % of neural network learning and inference market. Tensorflow interprets network model (using Python language) based on a sequence of big code blocks. 2) Caffee[1]. Near 30 % of neural networking segment, written in optimized C++. Caffee (conceptually similar to Tensor- flow) interprets an abstract sequence of blocks. 3) Gstreamer [5] and ffmpeg [15] media processing pipelines. Both construct a pipeline using already defined big code blocks (plugins), including spawning parallel processes and internal queue storage. Pipe- lines also works for GPU-based computa- tions. Both packages are widely used as a base for industry media processing and broadcasting. 4) OpenCV framework [4]. It is a good base for parallel matrix computations Інструментальні засоби та середовища програмування 9 (and 2D/3D processing) at GPU. Still the OpenCV is more low-level library tool than a framework, the packages can benefit from model interpretation. 5) Simulation tools, starting from the old good NS2 (good use example in [16]) and other network simulations. In case of big runs cluster-based modeling benefit from model interpretations. The proposed abstract and physical models are good universal tools for many frameworks. It can deal with both low level descriptions of CPU-handled subroutines (at 50-100 CPU instructions level) and high- level annotations for framework elements: all timing and resource constraints remain the same. The next step will be more practical: incorporation of the model interpreter into one of the frameworks and practical test of benefits got because of expressing the soft- ware in term of physical model. Conclusions In order to meet the modern require- ments for software development – less time, more quality, lower expenses – this article proposes the resource constrained model for parallel programs, which allows to run (or model the behavior) of multiple (and differ- ent) parallel software runs under resource constraints on real-world hardware systems. In addition, the list of popular frameworks - which can benefit from incorporating the el- ements of the resource-constrained models in- terpreter – is presented. The future work in- cludes the extension of one of framework with model interpreter for low-overhead re- source checker on the fly (at program run time) and real-world model examples. References 1. Yangqing J., Shelhamer E., Donahue J., Karayev S., Long J., Girshick R., Guadarrama S., Darrell T. (2014) Caffe: Convolutional Ar- chitecture for Fase Feature Embedding. ArXiv preprint: arXiv:1408.5039 2. Redmon J. (2013) [Online]. Darknet: Open Source Neural Networks in C. – Available from https://pjreddie.com/darknet/ 3. Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., Devin M., Ghemawat S., Irving G., Isard M. & others (2016). TensorFlow: A System for Large-Scale Machine Learning. OSDI. P. 265–283. 4. Bradsky G., Kaehler A. Learning OpenCV — O’Reilly, 2008. P. 1. 5. Taymans W., Baker S., Wingo A. (2018) GStreamer Application Development 1.10.1. P. 164. 12 th Media Services. 6. DeepStream [Online] – Nvidia DeepStream Software Development Kit – Available at https://developer.nvidia.com/deepstream-sdk 7. Peter Hui and Satish Chikkagoudar. (2012) A Formal Model for Real-time Parallel compu- tation. In Proc of FTSCS-2012. P. 39–53. 8. Ahlem Triki, Jacques Combaz. (2013) Model- Based implementation of Parallel Real- Time Systems. Verimag Research Report TR- 2013-11 9. Wilhelm R., Altmeyer S., Burguiere C., Grund D., Herter J., Reineke J., Wachter B., Wilhelm S. Static timing analysis for hard re- al-time systems. In Barthe G. and Herme- negildo M.V., eds., WMCAI. 2010. Vol. 5944 of LNCS. P. 3–22. Springer. 10. Abdellatif T., Combaz J., Sifakis J. Model- based implementation of real-time applica- tions. In Carloni L.P. and Stavros Tripakis, eds. EMSOFT. 2010. P. 229–238. 11. Basu A., Bogza M., Sifakis J. Modeling heterogeneous real-time components in BIP. In SEFM. 2006. P. 3–12. IEEE Computer Society. 12. Baboulin M., Demmel J., Dongarra J., Tomov S., and Volkov V. Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project) , Austin, TX, The In- ternational Conference for High Performance Computing, Networking, Storage, and Analy- sis (SC08), Nov. 2008. 13. CUDA [Online] – Available at https://developer.nvidia.com/cuda-zone 14. OpenCL [Online] – Available at Khronos Group: https://www.khronos.org/opencl/ 15. Xu Y.G. and Cao S.X. Real-Time Video Acquisition and Frame Compression Processing Technology Based on FFmpeg, Applied Mechanics and Materials. 2014. Vols. 631–632. P. 494–497. 16. Michael Welzl. Adaptive Multimedia Communication over Satellite Routed IP". In ICC 2000 (International Conference on Communications – IEEE Communications Society), New Orleans, Louisiana, USA, 18–22 June 2000. Інструментальні засоби та середовища програмування 10 Література 1. Yangqing J., Shelhamer E., Donahue J., Karayev S., Long J., Girshick R., Guadarrama S., Darrell T. (2014) Caffe: Convolutional Ar- chitecture for Fase Feature Embedding. ArXiv preprint: arXiv:1408.5039 2. Redmon J. (2013) [Online]. Darknet: Open Source Neural Networks in C. – Available from https://pjreddie.com/darknet/ 3. Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., Devin M., Ghemawat S., Irving G., Isard M. & others (2016). TensorFlow: A System for Large-Scale Machine Learning. OSDI. P. 265–283. 4. Bradsky G., Kaehler A. Learning OpenCV — O’Reilly, 2008. P. 1. 5. Taymans W., Baker S., Wingo A. (2018) GStreamer Application Development 1.10.1. P. 164. 12 th Media Services. 6. DeepStream [Online] – Nvidia DeepStream Software Development Kit – Available at https://developer.nvidia.com/deepstream-sdk 7. Peter Hui and Satish Chikkagoudar. (2012) A Formal Model for Real-time Parallel compu- tation. In Proc of FTSCS-2012. P. 39–53. 8. Ahlem Triki, Jacques Combaz. (2013) Model- Based implementation of Parallel Real- Time Systems. Verimag Research Report TR- 2013-11 9. Wilhelm R., Altmeyer S., Burguiere C., Grund D., Herter J., Reineke J., Wachter B., Wilhelm S. Static timing analysis for hard re- al-time systems. In Barthe G. and Herme- negildo M.V., eds., WMCAI. 2010. Vol. 5944 of LNCS. P. 3–22. Springer. 10. Abdellatif T., Combaz J., Sifakis J. Model- based implementation of real-time applica- tions. In Carloni L.P. and Stavros Tripakis, eds. EMSOFT. 2010. P. 229–238. 11. Basu A., Bogza M., Sifakis J. Modeling heterogeneous real-time components in BIP. In SEFM. 2006. P. 3–12. IEEE Computer Society. 12. Baboulin M., Demmel J., Dongarra J., Tomov S., and Volkov V. Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project) , Austin, TX, The In- ternational Conference for High Performance Computing, Networking, Storage, and Analy- sis (SC08), Nov. 2008. 13. CUDA [Online] – Available at https://developer.nvidia.com/cuda-zone 14. OpenCL [Online] – Available at Khronos Group: https://www.khronos.org/opencl/ 15. Xu Y.G. and Cao S.X. Real-Time Video Acquisition and Frame Compression Processing Technology Based on FFmpeg, Applied Mechanics and Materials. 2014. Vols. 631–632. P. 494–497. 16. Michael Welzl. Adaptive Multimedia Communication over Satellite Routed IP". In ICC 2000 (International Conference on Communications – IEEE Communications Society), New Orleans, Louisiana, USA, 18–22 June 2000. Received 01.10.2019 About the author: Dmytro V. Rahozin, candidate of tech. sciences (PhD) More than 10 publication in Ukrainian and foreign journals. https://orcid.org/0000-0002-8445-9921 Affiliation: Institute of Software Systems, NAS of Ukraine 03187, Kyiv-187, Acad. Hlushkov avenue, 40. Tel.: +38 068 575 91 25. E-mail: dmytro.rahozin@gmail.com

A resource limited parallel program model

Репозитарії

Схожі ресурси