A resource limited parallel program model
Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program...
Збережено в:
Дата: | 2019 |
---|---|
Автор: | |
Формат: | Стаття |
Мова: | English |
Опубліковано: |
Інститут програмних систем НАН України
2019
|
Теми: | |
Онлайн доступ: | https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376 |
Теги: |
Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
|
Назва журналу: | Problems in programming |
Завантажити файл: |
Репозитарії
Problems in programmingid |
pp_isofts_kiev_ua-article-376 |
---|---|
record_format |
ojs |
resource_txt_mv |
ppisoftskievua/38/1af0509c5c44ef2ca4731d35343fb238.pdf |
spelling |
pp_isofts_kiev_ua-article-3762024-04-28T11:07:27Z A resource limited parallel program model Модель параллельных программ с ограничением ресурсов Модель паралельних програм з обмеженням ресурсів Rahozin, D.V. formal model; parallel computing; parallel computing on graphics processing units UDC 681.3 формальная модель; параллельные вычисления; параллельные вычисления на графических процессорах УДК 681.3 формальна модель; паралельні обчислення; паралельні обчислення на графічних процесорах УДК 681.3 Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program formal model was developed. The model expresses the resource and time constraints and is suitable both for fine grained and coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, bigger loop iterations, large computing blocks for graphics, video and neural network processing) the interpretation of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We are discussing several modern software frameworks that are able to integrate the functionality to interpret the model and check the feasibility of the set of parallel programs running on hardware simultaneously with resource and time limitations. Real world tasks – neural network inference, video processing, general purpose computing on GPU – which get benefits after enabling such models - are discussed.Problems in programming 2019; 4: 03-10 Современные параллельные программы исполняются в сложных и ресурсно-ограниченных средах, поэтому к ним предъявляются повышенные требования к потреблению ресурсов и стабильности выполнения, особенно для долгих по времени вычислений. Для помощи в определении ресурсных ограничений для параллельных программ разработана формальная модель параллельных программ с ресурсными ограничениями. Модель определяет временные и ресурсные ограничения и может быть использована для как мелкозернистого, так и для крупнозернистого параллелизма. Для высоких степеней параллелизма (независимые процедуры, большие итерации циклов, большие вычислительные блоки для графических, нейросетевых вычислений и обработки видео) интерпретация модели возможна во время исполнения программы для обеспечения невозможности блокировок при распределении ресурсов. Рассмотрены современные фреймворки, для которых возможна интеграция в свой состав базовых средств проверки соответствия множества параллельных программ наличным вычислительным ресурсам при исполнении программы. Рассмотрены задачи реального мира – вывод в нейросетях, обработки видео, вычисления на графических картах, для которых будет выгодно применение таких моделей.Problems in programming 2019; 4: 03-10 Сучасні паралельні програми виконуються в складних та ресурсно-обмежених середовищах, тому до них застосовуються підвищені вимоги щодо споживання ресурсів та стабільності виконання, особливо для довгих у часі обчислень. Для допомоги у визначенні ресурсних обмежень для паралельних програм розроблено формальну модель паралельних програм з ресурсними обмеженнями. Модель визначає часові та ресурсні обмеження і може бути використана як для дрібнозернистого, так і для крупнозернистого паралелізму. Для високих ступенів паралелізму (незалежні процедури, великі ітерації циклів, великі обчислювальні блоки для графічних, нейромережевих обчислень та обробки відео) інтерпретація моделі можлива під час виконання програми для унеможливлення блокувань при розподілі ресурсів. Розглянуто сучасні фреймворки, які можуть інтегрувати у свій склад базові засоби перевірки відповідності множини паралельних програм наявним обчислювальним ресурсам під час виконання програм. Розглянуто задачі реального світу – вивід у нейромережах, обробка відео, обчислення на графічних картах, які матимуть зиск з впровадження таких моделей.Problems in programming 2019; 4: 03-10 Інститут програмних систем НАН України 2019-12-05 Article Article application/pdf https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376 10.15407/pp2019.04.003 PROBLEMS IN PROGRAMMING; No 4 (2019); 03-10 ПРОБЛЕМЫ ПРОГРАММИРОВАНИЯ; No 4 (2019); 03-10 ПРОБЛЕМИ ПРОГРАМУВАННЯ; No 4 (2019); 03-10 1727-4907 10.15407/pp2019.04 en https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376/379 Copyright (c) 2019 PROBLEMS IN PROGRAMMING |
institution |
Problems in programming |
baseUrl_str |
https://pp.isofts.kiev.ua/index.php/ojs1/oai |
datestamp_date |
2024-04-28T11:07:27Z |
collection |
OJS |
language |
English |
topic |
formal model parallel computing parallel computing on graphics processing units UDC 681.3 |
spellingShingle |
formal model parallel computing parallel computing on graphics processing units UDC 681.3 Rahozin, D.V. A resource limited parallel program model |
topic_facet |
formal model parallel computing parallel computing on graphics processing units UDC 681.3 формальная модель параллельные вычисления параллельные вычисления на графических процессорах УДК 681.3 формальна модель паралельні обчислення паралельні обчислення на графічних процесорах УДК 681.3 |
format |
Article |
author |
Rahozin, D.V. |
author_facet |
Rahozin, D.V. |
author_sort |
Rahozin, D.V. |
title |
A resource limited parallel program model |
title_short |
A resource limited parallel program model |
title_full |
A resource limited parallel program model |
title_fullStr |
A resource limited parallel program model |
title_full_unstemmed |
A resource limited parallel program model |
title_sort |
resource limited parallel program model |
title_alt |
Модель параллельных программ с ограничением ресурсов Модель паралельних програм з обмеженням ресурсів |
description |
Modern parallel programs run in a complex, resource-limited environment, and this raises the new requirements for resource consumption and execution stability of long running processes. In order to help with checking resource constraints for such parallel software a resource-limited parallel program formal model was developed. The model expresses the resource and time constraints and is suitable both for fine grained and coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, bigger loop iterations, large computing blocks for graphics, video and neural network processing) the interpretation of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We are discussing several modern software frameworks that are able to integrate the functionality to interpret the model and check the feasibility of the set of parallel programs running on hardware simultaneously with resource and time limitations. Real world tasks – neural network inference, video processing, general purpose computing on GPU – which get benefits after enabling such models - are discussed.Problems in programming 2019; 4: 03-10 |
publisher |
Інститут програмних систем НАН України |
publishDate |
2019 |
url |
https://pp.isofts.kiev.ua/index.php/ojs1/article/view/376 |
work_keys_str_mv |
AT rahozindv aresourcelimitedparallelprogrammodel AT rahozindv modelʹparallelʹnyhprogrammsograničeniemresursov AT rahozindv modelʹparalelʹnihprogramzobmežennâmresursív AT rahozindv resourcelimitedparallelprogrammodel |
first_indexed |
2024-09-16T04:07:56Z |
last_indexed |
2024-09-16T04:07:56Z |
_version_ |
1818568285358129152 |
fulltext |
Інструментальні засоби та середовища програмування
© Dmytro V. Rahozin, 2019
ISSN 1727-4907. Проблеми програмування. 2019. № 4 3
UDC 681.3 https://doi.org/10.15407/pp2019.04.003
Dmytro V. Rahozin
A RESOURCE LIMITED PARALLEL PROGRAM MODEL
Modern parallel programs run in a complex, resource-limited environment, and this raises the new require-
ments for resource consumption and execution stability of long running processes. In order to help with check-
ing resource constraints for such parallel software a resource-limited parallel program formal model was de-
veloped. The model expresses the resource and time constraints and is suitable both for fine grained and
coarse-grained parallelism in programs. For higher degrees of parallelism (at independent procedure level, big-
ger loop iterations, large computing blocks for graphics, video and neural network processing) the interpreta-
tion of formal model can be done in run-time and avoid dead locks and hangs during resource allocation. We
are discussing several modern software frameworks that are able to integrate the functionality to interpret the
model and check the feasibility of the set of parallel programs running on hardware simultaneously with re-
source and time limitations. Real world tasks – neural network inference, video processing, general purpose
computing on GPU – which get benefits after enabling such models - are discussed.
Key words: formal model, parallel computing, parallel computing on graphics processing units.
Introduction
Nowadays parallel programming still
is an activity, which cannot be planned well in
terms of spent time and parallelization quality
(i. e. writing a good code, which fits underly-
ing architecture well in limited time and mon-
ey). This can be easily explained by increas-
ing complexity of underlying computing ar-
chitecture of the modern hardware. The paral-
lel architectures of 80ies had the minimal
number of cache levels, simpler memory hier-
archy and quite simple MESI cache protocols,
but even in this case the performant imple-
mentation of basic BLAS/LAPACK proce-
dures was a tricky thing. Today the state-of-
art parallel computing is based on Graphics
Processing Units (GPUs) which has quite
complex memory hierarchy, complicated da-
ta exchange paths between central processors
and GPUs and highly parallel SIMD-type
computational units. The sophisticated tuning
of GPU-based programs is mostly economi-
cally ineffective due to high labor cost and
longer time-to-market. This involves the use
of programming tools, which enable more
high-level programming structures than just
basic CUDA or OpenCL code.
The good examples of such tool are
the neural network descriptions tools. Starting
from Caffe tool [1], the inference and deep
learning phases in neural networks are based
on a high-level description of the network.
The network therefore is defined as a pipeline
of standard computing blocks (usually more
than 50 types of blocks are defined) and data
paths between them. This definition format
practically enables the development of a neu-
ral network from idea to optimized implemen-
tation without moving down to hand pro-
gramming the neural network behavior. The
success of Caffe lead other development
groups to implement alternate or competitive
neural network definition languages (Darknet
[2]) or integrate a set of packages into pro-
gramming languages and frameworks (Ten-
sorFlow [3]), sometimes incompatible. Of
course, the use of high level definitions is
not limited to neural network structure de-
scriptions.
1. Practical expressions of high-level
programming structures
Usually the simplified description of
a big computational pipeline lacks ability for
semi-precise optimization of resource use, as
there is no way to estimate possible changes
in computation pipeline which can use GPU
resources more effectively or estimate how
efficiently the neural network pipeline will
run on GPU simultaneously with other appli-
cations. The optimizations of such compu-
ting pipeline are hard and strictly depend on
underlying hardware architecture, which is a
trade secret for the main hardware market
players.
Other example is OpenCV framework
[4] extensions for matrix/image operations on
https://doi.org/10.15407/pp2019.04.003
Інструментальні засоби та середовища програмування
4
GPUs. Program can simply move matrices or
image operations to GPU without critical
changes in code, but the GPU memory man-
agement is done by hand by moving the im-
age data between central processor unit
(CPU) and GPU memory. Such a helper
(here helper mean the OpenCV functions set)
does not deal with actual computational re-
source allocation on GPU, so the user cannot
reach the best performance on GPU side
without additional and expensive platform-
dependent optimizations.
An interesting example is Gstreamer
[5] media processing pipeline. Started as an
eager media processing pipeline more than
30 years ago, now it is enriched with data
processing on GPU side. A good example is
Gstreamer-based DeepStream framework
from Nvidia [6], which provides optimized
neural network inference at GPU side. For
this case the data processing pipeline may
reach more than a hundred components,
which are controlled by a dozen of threads.
Despite of many efforts applied from Nvidia
side, basic Gstreamer functionality still uses
only rudimentary and hidden thread control
and does not bother with resource allocation
(either GPU memory or GPU computational
power).
Looking through these three cases the
can lighten the following flaws: 1) too high-
level model for neural network description,
where resource allocation (computational,
memory, communication) is out of scope
of the model; 2) too low-level model where
the user should deal with resources by hand
but not by describing them in model descrip-
tion; 3) even ignorance of resource alloca-
tion. All these flaws greatly affect the paral-
lel software system which goes from proof
of concept stage to customer environment
(i. e. productization process) experiencing
all computational, memory and communi-
cation resource constraints. Due to modern
shift of heavy computations to GPU side,
the productization process becomes a head-
ache for engineers, full of bug hunting and
code tricks to resolve various resource con-
straints. The problem appears more complex
in case if applications are long-running, so
the memory and other resource leaks can
be fatal.
In order to overcome the difficulties of
constraint management process we propose a
model, which allows to introduce resource
management process to frameworks such as
OpenCV, TensorFlow or Gstreamer. It should
be noted that these frameworks are different
in levels of parallelism expression. OpenCV
provides only basic parallelism, Gstreamer –
thread-level and Caffe/Tensorflow – coarse-
grain. Still we model will fit to all parallelism
types.
There are a lot of previous work intro-
ducing various concepts for parallel software
models, starting from Timed Finite Automa-
tons [7]. This article introduces resource con-
straints for parallel software models, which
are necessary for executing parallel program
in modern highly parallel and resource con-
strained environment.
2. A Model of Resource-Constrained
Parallel System
The definition of the system is de-
rived from [8], extending the previous defini-
tions. The word “real-time” is not used inten-
tionally, as the complex parallel system may
have both hardware and resource constraints
that prevent real-time behavior.
We consider a discrete-time model
where the time is represented as a set of non-
negative integer values denoted by ℕ. The
time progress is measured by clocks, which
are non-negative integer variables increased
constantly by one for some time. If com-
pared to [8] the definition “synchronously” is
dropped, as in real life system usually a
common reference clock is used. For the set
of clocks X, a valuation 𝑣: 𝑿 → ℕ is defined
– it is a function associating with each clock
its value v(x). For a clocks subset 𝑿′ ⊆ 𝑿 and
a clock value 𝑙 ∈ ℕ we denote by 𝑣( 𝑿′, 𝑙, 𝑥)
the valuation that coincides with v for all
clocks 𝑥 ∈ 𝑿\𝑿′, and that associates l to all
clocks 𝑥 ∈ 𝑿′. It is defined by:
𝑣(𝑿′, 𝑙, 𝑥) = {
𝑙 if 𝑥 ∈ 𝑿′
𝑣(𝑥) otherwise.
Guards are used to specify when ac-
tions are enabled. Simple constrains over
Інструментальні засоби та середовища програмування
5
clocks X are considered. The grammar allows
to build general constrains over clocks:
𝑐 ≔ 𝑡𝑟𝑢𝑒 | 𝑓𝑎𝑙𝑠𝑒 |𝑥 ≤ 𝑘|𝑥 < 𝑘|𝑥 ≥
≥ 𝑘|𝑥 > 𝑘|𝑐 ∧ 𝑐|𝑐 ∨ 𝑐|¬𝑐.
The evaluation of a clock constraint c
for a valuation v of clocks X denoted by c(v)
is obtained by replacing each clock x by its
value v(x).
A guard g is a clock constraint c with
an urgency type 𝜏 ∈ { 𝑛, 𝑑, 𝑢 }, denoted by
𝑔 = [𝑐𝜏]. Here urgency types are used to
specify the need of the action in case if action
execution is enabled (i. e. when the clocks
constraint is true). Non-urgent actions are de-
noted by n, delayable actions (which should
be executed during their enable time interval)
by d, urgent actions (should be executed as
soon as they are enabled) are denoted by u.
The predicate urg[g] that characterizes the
valuations of clocks for which the guard
𝑔 = [𝑐𝜏] is urgent is defined by:
𝑢𝑟𝑔[𝑔](𝑣) ⇔
⇔ {
false if 𝑔 is non urgent (τ = n)
𝑐(𝑣) ∧ ¬𝑐(𝑣 + 𝐾) if 𝑔 is delayable (τ = d)
𝑐(𝑣) if 𝑔 is urgent (τ = u)
The set of guards over the set of
clocks X is denoted as G(X).
For given guards
𝑔1 = [𝑐1]
𝜏1 𝑎𝑛𝑑 𝑔2 = [𝑐2]
𝜏2,
the conjunction of g1 and g2 is denoted by
𝑔1 ∧ 𝑔2 and is defined by 𝑔1 ∧ 𝑔2 =
= [𝑐1 ∧ 𝑐2]
𝑚𝑎𝑥 𝜏1,𝜏2, considering that urgency
types are ordered as follows: 𝑛 < 𝑑 < 𝑢. So,
for given guard g=[c]
τ
and a valuation v, we
also write g(v) for the expression c(v).
Additionally, the resource constraints
are defined.
Additionally to model in [8] let us de-
fine the resource set P = { pi
t
}, where t is the
resource type, I is the resource index for some
number of identical resources in computation-
al system. Also the resource pi
t
may be allo-
cated in part – pi
t
(k), where k is the amount of
resource pi
t
.
Let us define availability function for
resource: avl(pi
t
(k)), is it true is case if the re-
source is available.
3. Abstract model with
constraints
Definition 1. Abstract model with
constrains.
An abstract model is a timed automa-
ton M=(A,Q,X,P,→) such that:
- A is a finite set of (observable)
actions. In addition to actions A internal ac-
tions β. The set of actions A {β} is denoted
as A
β
;
- Q is a finite set of control loca-
tions;
- P is the resource set;
- X is finite set of clocks
→ Q×(A
β
×G(X)×2
X
)×Q is a finite set of la-
beled transitions. A transition is a tuple
(𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′) where a is an action executed
by the transition, g is guard over X, r is a sub-
set of clocks that are reset by the transition
and p is the resource necessary to be allocated
during the transaction. We write 𝑞
𝑎,𝑔,𝑟,𝑝
→ 𝑞′ for
(𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′) ∈→.
An abstract model describes the plat-
form-independent behavior of the system.
Definition 2. (Abstract model
semantics). An abstract model 𝑀 =
= (𝑨,𝑸, 𝑿, 𝑷 →) defines a transition system
TS. States of TS are pairs (q,v), where q is a
control location of M and v is a valuation of
the clocks X.
Actions: We have (𝑞, 𝑣)
𝑎
→
𝑎
→ (𝑞′, 𝑣[𝑟 ⟼ 0])𝑖𝑓 𝑞
𝑎,𝑔,𝑟,𝑝
→
𝑎,𝑔,𝑟,𝑝
→ 𝑞′𝑖𝑛 𝑀 𝑎𝑛𝑑 𝑔(𝑣)𝑖𝑠 𝑡𝑟𝑢𝑒 and avl(p) is
true.
Time steps. For a waiting time
𝛿 ∈ ℕ, 𝛿 > 0, we have (𝑞, 𝑣)
𝛿
→(𝑞, 𝑣 + 𝛿) if
for all transitions
(𝑞
𝑎,𝑔,𝑟
→ 𝑞′𝑜𝑓 𝑀 𝑓𝑜𝑟 𝑓𝑜𝑟 𝑎𝑙𝑙 𝛿′ ∈
∈ [0, 𝛿[, ¬𝑢𝑟𝑔[𝑔](𝑣 + 𝛿′).
Urgency corresponds to priorities in-
duced by the timing constraints: urgent transi-
tions have priority compared to other possible
transitions. We denote by wait(q,v) the maxi-
mal waiting time allowed at (q,v). Always
wait(q,v+δ) = wait(q,v) – δ for all 𝛿 ∈
Інструментальні засоби та середовища програмування
6
∈ [0, 𝑤𝑎𝑖𝑡(𝑞, 𝑣)], and is formally defined as
follows:
𝑤𝑎𝑖𝑡(𝑞, 𝑣) =
= 𝒎𝒊𝒏({𝛿 ≥ 0|⋁ 𝑢𝑟𝑔[𝑔𝑖](𝑣 + 𝛿)
𝑞
𝑎𝑖,𝑔𝑖,𝑟𝑖,𝑝𝑖
→ 𝑞𝑖
} ∪
∪ {+∞}).
For an abstract model M =
=(A,Q,{x},P,→), a finite execution sequence
of M from an initial state (q0,v0) is a maximal
sequence of observable actions and time-steps
(𝑞𝑖, 𝑣𝑖) ↝
𝜎𝑖 (𝑞𝑖+1, 𝑣𝑖+1), 𝜎𝑖 ∈ 𝑨 ∪ 𝑁 and
𝑖 ∈ {0, 1, 2, … , 𝑛} (𝑖 ∈ ℕ), such that ↝ is the
transitive closure of → for β-transitions, that
is (𝑞𝑖, 𝑣𝑖) ↝
𝜎𝑖 (𝑞𝑖+1, 𝑣𝑖+1) if (𝑞𝑖, 𝑣𝑖)
𝛽
→
∗
𝛽
→
∗
(𝑞′
𝑖
, 𝑣′𝑖)
𝜎𝑖
→(𝑞′′𝑖, 𝑣
′′
𝑖)
𝛽
→
∗
(𝑞𝑖+1, 𝑣𝑖+1).
For example of this model you can re-
fer to Model 1 in [8].
Definition 3. (Composition of abstract
models). Let Mi = (Ai,Qi,Xi,Pi,→i), 1 ≤ 𝑖 ≤
≤ 𝑛, be a set of abstract models. We assume
that their sets of action and clocks are dis-
joint, i.e. for all 𝑖 ≠ 𝑗we have Ai ∩Aj=∅ and
Xi ∩Xj=∅. A set of interactions γ is a subset
of 2
A
, where 𝐴 = ⋃ 𝐴𝑖
𝑛
𝑖=1 , such that any in-
teraction a ∈γ contains at most one action of
each component Mi, that is, 𝑎 = {𝑎𝑖|𝑖 ∈ 𝐼}
where 𝑎𝑖∈𝐴𝑖and 𝐼 ⊆ {1,2, … , 𝑛}. The compo-
sition of the abstract models Mi, 1 ≤ 𝑖 ≤ 𝑛,
by using a set of interactions γ, denoted
by γ(M1,….,Mn), is the composite abstract
model M=(γ,Q,X,P,→) such that Q=Q1×
×Q2×… Qn, 𝑋 = ⋃ 𝑋𝑖
𝑛
𝑖=1 and →γ is defined
by the rules:
𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈ 𝛾
𝑔 = ⋀ 𝑔𝑖𝑖∈𝐼
𝑟 =⋃𝑟𝑖
𝑖∈𝐼
∀𝑖 ∈ 𝐼, 𝑞𝑖
𝑎𝑖,𝑔𝑖,𝑟𝑖,𝑝𝑖
→ 𝑞𝑖
′
∀𝑖 ∉ 𝐼. 𝑞𝑖
′ = 𝑞𝑖
(𝑞1, … , 𝑞𝑛)
(𝑎,𝑔,𝑟,𝑝)𝛾
→ (𝑞′
1
, … , 𝑞′
𝑛
)
аlso ∃𝑖 ∈ {1,… , 𝑛}. 𝑞𝑖
𝛽,𝑔𝑖,𝑟𝑖,𝑝𝑖
→ 𝑖 𝑞
′
𝑖
∀𝑖 ≠ 𝑗. 𝑞′
𝑗
= 𝑞𝑗
(𝑞1, … , 𝑞𝑛)
𝛽,𝑔𝑖,𝑟𝑖,𝑝𝑖
→ 𝛾 (𝑞
′
1
, … , 𝑞′
𝑛
).
A composition M=γ(M1,…,Mn) of ab-
stract models Mi, 1 ≤ 𝑖 ≤ 𝑛, can execute two
types of transitions: interactions 𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈
𝛾 which corresponds to synchronizations of
actions ai of models Mi, 𝑖 ∈ 𝐼, and internal ac-
tions β of the modeles Mi. An interaction
𝑎 = {𝑎𝑖}𝑖∈𝐼 ∈ 𝛾is enabled from a state of M if
all actions ai are enabled.
In a composite model M=γ(M1,…,Mn)
many interaction can be enabled to act simul-
taneously (in the same time) introducing a de-
gree of non-determinism in the behavior of M.
In order to restrict non-determinism,
priorities are introduced that specify which
interaction should be executed among the en-
abled ones. A priority on M=γ(M1,…,Mn) is a
relation 𝜋 ⊆ 𝛾 × 𝑄 × 𝛾 such that for all q the
relation 𝜋𝑞 = {(𝑎, 𝑎
′)|(𝑎, 𝑞, 𝑎′) ∈ 𝜋} is a par-
tial order. We write aπqa' for (𝑎, 𝑞, 𝑎′) ∈ 𝜋 to
express the fact that a has weaker priority
than a' at state q. That is if both a and a' are
enabled at state q, only the action a' can be
executed. Thus, priority aπqa' is applied only
when the conjunction of the guards and re-
sources of a and a' is true. Let 𝑞
𝑎,𝑔,𝑟,𝑝
→ 𝛾 𝑞′ and
𝑞
𝑎′,𝑔′,𝑟′,𝑝′
→ 𝛾 𝑞′′ be transitions of M such that
g=[c]
τ
and g'=[c']
τ'
. Applying priority aπqa'
boils down to transforming the guard g of a
into the guard gπ=[c˄¬𝑐′]τ
and leaving the
guard g' of a' unchanged.
Furthermore we denote by enq(a) the
predicate characterizing valuations of clocks
for which an interaction a is enabled at state
q. It is defined by:
𝑒𝑛𝑞(𝑎) = {
𝑓𝑎𝑙𝑠𝑒, 𝑖𝑓 ∄(𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑔′) ∈→𝛾
⋁ 𝑐
(𝑞,𝑎,[𝑐]𝜏,𝑟,𝑝,𝑞′∈→𝛾
− 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒.
Definition 4. Priority. Given a compo-
site model M=(γ,Q,X,P,→γ) the application of
priority π to M defines a new model
πM=(γ,Q,X,P,→π) such that →π is defined by
the rule:
𝑞
𝑎,𝑔,𝑟,𝑝
→ 𝛾 𝑞
′, 𝑔 = [𝑐]𝜏,
Інструментальні засоби та середовища програмування
7
𝑔𝜋 = [𝑐˄¬ ⋁ 𝑒𝑛𝑞(𝑎
′)
𝑎𝜋𝑞𝑎′
]
𝜏
𝑞
𝑎,𝑔𝜋,𝑟,𝑝
→ 𝜋 𝑞′.
Example of an abstract model with
priorities is considered in [8].
Abstract models are platform-
independent representations of programs with
atomic and instantaneous actions execution.
Real (“physical”) models represent the pro-
gram behavior on a real platform. It accounts
the fact that the action execution takes some
non-zero time. So we need to break action ex-
ecution atomicity and introduce execution
times. The transition of an action a of an ab-
stract model is replaced by a sequence of two
consecutive transitions of the corresponding
physical (real world) model – see figure 1.
The first transition marks the beginning of the
execution of action a, and the second transi-
tion marks its completion. These transitions
are separated by a partial state denoted by ⊥.
The execution time of the action corresponds
to the waiting time at state ⊥
𝑞
𝑎,𝑔,𝑟,𝑝
→ 𝑞′
⊥𝑡
→ 𝐪
𝑎,𝑔,𝑟,𝑝
→ ⊥𝑡
𝛽
→𝑞′
(Corresponding sequence of transitions in
𝑀⊥), where
𝑡 = (𝑞, 𝑎, 𝑔, 𝑟, 𝑝, 𝑞′)𝑖𝑛 𝑀.
This denoted the transformation of
transitions of the abstract model.
Definiton 5. Physical model. Let
M=(A,Q,X,P,→) be an abstract model. We
define the associated as the timed automaton
𝑀⊥=(A,𝐐 ∪ 𝐐⊥,X,P,→⊥) such that:
𝐐⊥is the set of partial states such that
there is one partial state for each transition of
M, that is, 𝐐⊥={⊥𝑡 |𝑡 ∈→}
→⊥is defined by the rule:
𝑞
𝑎,𝑔,𝑟,𝑝
→ 𝑞′ 𝑡=(𝑞,𝑎,𝑔,𝑟,𝑝,𝑞′)
𝑞
𝑎,𝑔,𝑟,𝑝
→ ⊥⊥𝑡 ⊥𝑡
𝛽,[𝑡𝑟𝑢𝑒]𝐼,∅,𝑡𝑟𝑢𝑒
→ ⊥𝑞′
.
In the physical model 𝑀⊥we assume
arbitrary execution times for actions, ranging
from 0 to +∞, which is modeled by the guard
[true]
|
for β-transitions. Notice that 𝑀⊥can be
further constrained if bounds of the execution
times of actions are unknown. For instance, if
an estimate WCET(a) is known for the worst-
case execution timeof an action a, the associ-
ated timing constraint is [xa≤WCET(a)]
d
in-
stead of [true]
|
, where xa is a clock that is reset
whenever a is started. This allows us to stati-
cally check the correctness of the application
running on the platform but this is beyond the
paper scope.
In a physical model 𝑀⊥, the execution
of the action a by a transition t=(q,a,g,r,p,q')
is followed by a lapse of time δ(a)∈ ℕ at the
partial state ⊥𝑡before a β-transition is execut-
ed:
(𝑞, 𝑣) ↝𝑎 (⊥𝑡, 𝑣[𝑟 ⟼ 0]) ↝𝛿(𝑎)
↝𝛿(𝑎) (𝑞′, 𝑣[𝑟 ⟼ 0] + 𝛿(𝑎)). (1)
This corresponds to the following exe-
cution sequence in the abstract model M, if
such a sequence is feasible:
(𝑞, 𝑣) ↝𝑎 (𝑞′, 𝑣[𝑟 ⟼ 0]) ↝𝛿(𝑎)
↝𝛿(𝑎) (𝑞′, 𝑣[𝑟 ⟼ 0] + 𝛿(𝑎)). (2)
It should be noticed that the time
stamp δ(a) of 𝑀⊥ in (1) may not be the time
stamp of M in (2) if δ(a)>wait(q',v[𝑟 ⟼ 0]),
meaning that the physical model violates tim-
ing constraints defined in the corresponding
abstract model. In this case we say that the
considered execution sequence is not time-
safe. (The execution times of abstract and
physical models are compared in [9] –
considering that if all execution sequences
of 𝑀⊥ are time-safe than 𝑀⊥ is weakly
simulated by M).
Correct model implementation should
execute only time-safe sequences, but time-
safety violations occur in a physical model
when the execution time of an action is larger
than what is allowed by the timing and re-
source constraints of the corresponding ab-
stract model. Correct implementations are ob-
tained for platforms that are sufficiently fast
for executing the program without violating
time-safety. Here the physical model pre-
serves the semantic of the abstract model as
shown in [10]. Otherwise the time-safety vio-
lations should be checked in run time.
Physical model composition consider-
ations and correctness considerations can be
checked in [8].
Інструментальні засоби та середовища програмування
8
So how we can deal with resource lim-
itations? State-of-the-art software packages
require simpler solutions, so model interpreta-
tion in run-time (such as in [11]) looks com-
plex and superfluous. In case if program is al-
ready expressed as a graph of scheduled
blocks, it is possible to evaluate model re-
quirements and behavior at the moments be-
tween the previous block finish and a the new
block start – this also works for moments of
spawning new parallel processes and joining
parallel processes for one new serial process.
This works well for cases enlisted before:
LAPACK-based computations [12], neural
network inference and media processing in
Gstreamer-like pipelines. Even it is possible
for interpreted code, such as Java or .Net,
where code assemblies can be annotated with
resource information.
4. Affected industry cases
Why all these model descriptions are
important for us although all they looks to be
abstract for real software business?
Before 2005 the parallel programming
was the area of academy pundits and small
groups of professionals in computer graphics
area. After 2005 the integrated circuits mak-
ing technology enabled so many transistors on
single die for hardware engineers so that the
single computing units (up to von Neumann
definitions) can not effectively use the hard-
ware. The most efficient way was to place
several processing units on one silicone crys-
tal, so that the silicone can run multiple pro-
cesses simultaneously somehow. It looked
like “as single CPU can not deal effectively,
let’s do multiple CPUs and the programmer
should do software in right way”. At practical
side a lot of money was invested into teaching
parallel programming in colleges and making
specialized computer languages for highly
parallel hardware – such as Nvidia CUDA
[13] or anti-Nvidia industry standard OpenCL
[14]. As the efficient parallel software re-
quires more time to invest, more skilled re-
sources and more money for test teams work
the new tendency was appeared – to make
frameworks which allow the programmer to
make parallel programs with simpler (less or
more simpler) descriptions. Such concept
looks nice but works hard. Two well-known
technology examples – CUDA and OpenCL
shows that the main model of writing a paral-
lel program for GPUs is a “producer-
consumer” model when a CPU-side program
controls GPU threads execution and none of
GPU threads are self-sufficient. Practically all
memory management employs CPU-GPU
memory communications, and memory com-
munications between neighbor GPU cards are
exceptions.
Under these conditions any software
developer experiences a serious technology
limitation, as any parallel program support
only its exclusive execution on GPU resource
and the resource pool subdivision between
different processes are possible only in case if
each parallel program allocates some number
of GPUs, leaving the other GPUs to counter-
part.
However, all technologies similar to
CUDA/OpenCL or any specialized parallel
programming languages do not decrease the
development time significantly. The good
ways to decrease programmer effort are infra-
structures, which shorten development time
for 90 % in simpler cases and for 50 % in
hard cases. The important issue is that the
most of the infrastructures are upgradable to
incorporate the described physical model.
Let us check several infrastructure (frame-
work) cases.
1) TensorFlow [3]. At least 60 % of
neural network learning and inference market.
Tensorflow interprets network model (using
Python language) based on a sequence of big
code blocks.
2) Caffee[1]. Near 30 % of neural
networking segment, written in optimized
C++. Caffee (conceptually similar to Tensor-
flow) interprets an abstract sequence of
blocks.
3) Gstreamer [5] and ffmpeg [15]
media processing pipelines. Both construct a
pipeline using already defined big code
blocks (plugins), including spawning parallel
processes and internal queue storage. Pipe-
lines also works for GPU-based computa-
tions. Both packages are widely used as a
base for industry media processing and
broadcasting.
4) OpenCV framework [4]. It is a
good base for parallel matrix computations
Інструментальні засоби та середовища програмування
9
(and 2D/3D processing) at GPU. Still the
OpenCV is more low-level library tool than a
framework, the packages can benefit from
model interpretation.
5) Simulation tools, starting from the
old good NS2 (good use example in [16]) and
other network simulations. In case of big runs
cluster-based modeling benefit from model
interpretations.
The proposed abstract and physical
models are good universal tools for many
frameworks. It can deal with both low level
descriptions of CPU-handled subroutines (at
50-100 CPU instructions level) and high-
level annotations for framework elements:
all timing and resource constraints remain
the same.
The next step will be more practical:
incorporation of the model interpreter into
one of the frameworks and practical test of
benefits got because of expressing the soft-
ware in term of physical model.
Conclusions
In order to meet the modern require-
ments for software development – less time,
more quality, lower expenses – this article
proposes the resource constrained model for
parallel programs, which allows to run (or
model the behavior) of multiple (and differ-
ent) parallel software runs under resource
constraints on real-world hardware systems.
In addition, the list of popular frameworks -
which can benefit from incorporating the el-
ements of the resource-constrained models in-
terpreter – is presented. The future work in-
cludes the extension of one of framework
with model interpreter for low-overhead re-
source checker on the fly (at program run
time) and real-world model examples.
References
1. Yangqing J., Shelhamer E., Donahue J.,
Karayev S., Long J., Girshick R., Guadarrama
S., Darrell T. (2014) Caffe: Convolutional Ar-
chitecture for Fase Feature Embedding. ArXiv
preprint: arXiv:1408.5039
2. Redmon J. (2013) [Online]. Darknet: Open
Source Neural Networks in C. – Available
from https://pjreddie.com/darknet/
3. Abadi M., Barham P., Chen J., Chen Z., Davis
A., Dean J., Devin M., Ghemawat S., Irving
G., Isard M. & others (2016). TensorFlow: A
System for Large-Scale Machine
Learning. OSDI. P. 265–283.
4. Bradsky G., Kaehler A. Learning OpenCV —
O’Reilly, 2008. P. 1.
5. Taymans W., Baker S., Wingo A. (2018)
GStreamer Application Development 1.10.1.
P. 164. 12
th
Media Services.
6. DeepStream [Online] – Nvidia DeepStream
Software Development Kit – Available at
https://developer.nvidia.com/deepstream-sdk
7. Peter Hui and Satish Chikkagoudar. (2012) A
Formal Model for Real-time Parallel compu-
tation. In Proc of FTSCS-2012. P. 39–53.
8. Ahlem Triki, Jacques Combaz. (2013) Model-
Based implementation of Parallel Real-
Time Systems. Verimag Research Report TR-
2013-11
9. Wilhelm R., Altmeyer S., Burguiere C.,
Grund D., Herter J., Reineke J., Wachter B.,
Wilhelm S. Static timing analysis for hard re-
al-time systems. In Barthe G. and Herme-
negildo M.V., eds., WMCAI. 2010. Vol. 5944
of LNCS. P. 3–22. Springer.
10. Abdellatif T., Combaz J., Sifakis J. Model-
based implementation of real-time applica-
tions. In Carloni L.P. and Stavros Tripakis,
eds. EMSOFT. 2010. P. 229–238.
11. Basu A., Bogza M., Sifakis J. Modeling
heterogeneous real-time components in BIP.
In SEFM. 2006. P. 3–12. IEEE Computer
Society.
12. Baboulin M., Demmel J., Dongarra J., Tomov
S., and Volkov V. Enhancing the Performance
of Dense Linear Algebra Solvers on GPUs (in
the MAGMA Project) , Austin, TX, The In-
ternational Conference for High Performance
Computing, Networking, Storage, and Analy-
sis (SC08), Nov. 2008.
13. CUDA [Online] – Available at
https://developer.nvidia.com/cuda-zone
14. OpenCL [Online] – Available at Khronos
Group: https://www.khronos.org/opencl/
15. Xu Y.G. and Cao S.X. Real-Time Video
Acquisition and Frame Compression
Processing Technology Based on FFmpeg,
Applied Mechanics and Materials. 2014.
Vols. 631–632. P. 494–497.
16. Michael Welzl. Adaptive Multimedia
Communication over Satellite Routed IP". In
ICC 2000 (International Conference on
Communications – IEEE Communications
Society), New Orleans, Louisiana, USA,
18–22 June 2000.
Інструментальні засоби та середовища програмування
10
Література
1. Yangqing J., Shelhamer E., Donahue J.,
Karayev S., Long J., Girshick R., Guadarrama
S., Darrell T. (2014) Caffe: Convolutional Ar-
chitecture for Fase Feature Embedding. ArXiv
preprint: arXiv:1408.5039
2. Redmon J. (2013) [Online]. Darknet: Open
Source Neural Networks in C. – Available
from https://pjreddie.com/darknet/
3. Abadi M., Barham P., Chen J., Chen Z., Davis
A., Dean J., Devin M., Ghemawat S., Irving
G., Isard M. & others (2016). TensorFlow: A
System for Large-Scale Machine
Learning. OSDI. P. 265–283.
4. Bradsky G., Kaehler A. Learning OpenCV —
O’Reilly, 2008. P. 1.
5. Taymans W., Baker S., Wingo A. (2018)
GStreamer Application Development 1.10.1.
P. 164. 12
th
Media Services.
6. DeepStream [Online] – Nvidia DeepStream
Software Development Kit – Available at
https://developer.nvidia.com/deepstream-sdk
7. Peter Hui and Satish Chikkagoudar. (2012) A
Formal Model for Real-time Parallel compu-
tation. In Proc of FTSCS-2012. P. 39–53.
8. Ahlem Triki, Jacques Combaz. (2013) Model-
Based implementation of Parallel Real-
Time Systems. Verimag Research Report TR-
2013-11
9. Wilhelm R., Altmeyer S., Burguiere C.,
Grund D., Herter J., Reineke J., Wachter B.,
Wilhelm S. Static timing analysis for hard re-
al-time systems. In Barthe G. and Herme-
negildo M.V., eds., WMCAI. 2010. Vol. 5944
of LNCS. P. 3–22. Springer.
10. Abdellatif T., Combaz J., Sifakis J. Model-
based implementation of real-time applica-
tions. In Carloni L.P. and Stavros Tripakis,
eds. EMSOFT. 2010. P. 229–238.
11. Basu A., Bogza M., Sifakis J. Modeling
heterogeneous real-time components in BIP.
In SEFM. 2006. P. 3–12. IEEE Computer
Society.
12. Baboulin M., Demmel J., Dongarra J., Tomov
S., and Volkov V. Enhancing the Performance
of Dense Linear Algebra Solvers on GPUs (in
the MAGMA Project) , Austin, TX, The In-
ternational Conference for High Performance
Computing, Networking, Storage, and Analy-
sis (SC08), Nov. 2008.
13. CUDA [Online] – Available at
https://developer.nvidia.com/cuda-zone
14. OpenCL [Online] – Available at Khronos
Group: https://www.khronos.org/opencl/
15. Xu Y.G. and Cao S.X. Real-Time Video
Acquisition and Frame Compression
Processing Technology Based on FFmpeg,
Applied Mechanics and Materials. 2014.
Vols. 631–632. P. 494–497.
16. Michael Welzl. Adaptive Multimedia
Communication over Satellite Routed IP". In
ICC 2000 (International Conference on
Communications – IEEE Communications
Society), New Orleans, Louisiana, USA,
18–22 June 2000.
Received 01.10.2019
About the author:
Dmytro V. Rahozin,
candidate of tech. sciences (PhD)
More than 10 publication in Ukrainian and
foreign journals.
https://orcid.org/0000-0002-8445-9921
Affiliation:
Institute of Software Systems,
NAS of Ukraine
03187, Kyiv-187,
Acad. Hlushkov avenue, 40.
Tel.: +38 068 575 91 25.
E-mail: dmytro.rahozin@gmail.com
|