Finding Sources of Synchronization-free Slices in Perfectly Nested Loops
| Published in: | Электронное моделирование |
|---|---|
| Date: | 2007 |
| Main Authors: | Bielecki, W.; Siedlecki, K. |
| Format: | Article |
| Language: | English |
| Published: | Інститут проблем моделювання в енергетиці ім. Г.Є. Пухова НАН України, 2007 |
| Subjects: | Вычислительные процессы и системы |
| ISSN: | 0204-3572 |
| Online Access: | https://nasplib.isofts.kiev.ua/handle/123456789/101769 |
| Journal Title: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
| Cite this: | Finding Sources of Synchronizationfree Slices in Perfectly Nested Loops / W. Bielecki, K. Siedlecki // Электронное моделирование. — 2007. — Т. 29, № 3. — С. 41-53. — Бібліогр.: 27 назв. — англ. |
W. Bielecki, K. Siedlecki
Technical University of Szczecin,
(Żołnierska 49 st., 71-210 Szczecin, Poland,
fax: (+4891) 4876439, E-mail: wbielecki@wi.ps.pl, ksiedlecki@wi.ps.pl)
Finding Sources of Synchronization-free
Slices in Perfectly Nested Loops
Algorithms, permitting us to find sources of synchronization-free slices of perfectly nested uniform and non-uniform loops, are presented. Sources extracted are to be used for creating synchronization-free slices that can be executed independently, preserving the lexicographic order of iterations in each slice. Our approach requires exact dependence analysis and is based on operations on relations and sets. To describe and implement the algorithms, the dependence analysis by Pugh and Wonnacott was chosen, where dependences are found in the form of tuple relations. The proposed algorithms have been implemented and verified by means of the Omega project software.
Keywords: loop transformations, perfectly nested loops, synchronization-free parallelism.
Introduction. Finding synchronization-free slices in loops is of great importance for parallel and distributed computing, enhancing code locality, and reducing memory requirements. Different techniques have been developed to extract synchronization-free parallelism available in loops, for example [1—13]. However, to our knowledge, none of the well-known techniques extracts the entire synchronization-free parallelism available in the general case of affine non-uniform loops.
The goal of this paper is to present an approach which permits us to extract synchronization-free slices available in loops when well-known techniques may fail to extract such slices. It is applicable to both non-parameterized and parameterized perfectly nested loops and allows synchronization-free slices to be extracted at compile time or run time.
The purpose of extracting synchronization-free slices is not only to get scalable performance on parallel computers and distributed systems but also to enhance performance on a uniprocessor by improving data locality, and to reduce memory requirements, which decreases cost and power consumption in embedded systems.
Our approach is based on exact data dependence analysis and on operations on sets and relations. We have implemented and verified our approach by means of the Omega project software [14].
Background. In this paper, we deal with affine loop nests where, for given loop indices, lower and upper bounds as well as array subscripts and conditionals are affine functions of surrounding loop indices and possibly of structure parameters, and the loop steps are known positive constants. The presented algorithms are to be used for extracting synchronization-free slices. The iterations belonging to slices requiring synchronization can be parallelized with well-known techniques, for example, [1, 15—19].
In a perfectly nested loop, all statements are comprised within the innermost nest. We refer to a particular sequential execution of all the statements of the loop body as an iteration. Each iteration of an n-level nested loop is represented with an iteration vector I of dimension n.
Two iterations I and J are dependent if both access the same memory location and if at least one access is a write. We refer to I and J as the source and destination of a dependence, respectively, provided that I is lexicographically less than J (I ≺ J). The vector d = J – I is referred to as the dependence vector. The loop nest is said to be uniform if the dependence vectors depend neither on I nor on J.
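For illustration only (this snippet is ours, not part of the paper), the following Python sketch computes the dependence vectors d = J − I for two dependence patterns that appear later in the paper, under the assumption of small explicit bounds, and shows why one pattern is uniform while the other is not.

```python
# Toy illustration: a loop nest is uniform if all dependence vectors d = J - I are constant.
# Assumption: a 2D n x n iteration space small enough to enumerate.

def dependence_vectors(dep, n):
    """Enumerate d = J - I for a dependence map 'dep' over a 2D n x n domain."""
    vectors = set()
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            J = dep(i, j)
            if all(1 <= c <= n for c in J):
                vectors.add((J[0] - i, J[1] - j))
    return vectors

uniform = dependence_vectors(lambda i, j: (i, j + 1), 8)          # pattern of Example 1
non_uniform = dependence_vectors(lambda i, j: (2 * i, 3 * j), 8)  # pattern of Example 2

print(uniform)      # {(0, 1)}: a single constant vector, so the dependence is uniform
print(non_uniform)  # many different vectors (i, 2j): d depends on I, so it is non-uniform
```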
Our approach requires an exact representation of loop-carried dependences and consequently an exact dependence analysis which detects a dependence if and only if it exists. To describe and implement our algorithms, we chose the dependence analysis proposed by Pugh and Wonnacott [20], where dependences are represented with dependence relations comprised of Presburger formulas, which can be built up out of linear constraints over integer variables, logical connectives, and universal and existential quantifiers [20]. We assume that the reader is familiar with that dependence analysis.
We refer to the source (destination) of a dependence as the ultimate dependence source (destination) if it is not the destination (source) of any other dependence. Program slicing, first introduced by Mark Weiser [21], is a viable method to restrict the focus of a task to specific sub-components of a program. According to the original definition [22], the notion of slice was based on the deletion of statements. A slice is an executable subset of program statements that preserves the original behavior of the program with respect to a subset of variables of interest and at a given program point [22].
In paper [23] the idea of iteration space slicing was introduced. Iteration space slicing takes dependence information as input to find all statement instances from a given loop nest which must be executed to produce the correct values for the specified array elements. We can think of the slice as following chains of transitive dependences to reach all statement instances which can affect the result.
In this paper we deal with specific slices, which can be synchronization-free or require synchronization.
Definition 1. A slice is a set of dependent iterations including an ultimate dependence source and all the dependence destinations such that each dependence destination, except for the lexicographically maximal destination (the ultimate dependence destination), is the source of the next dependence.
Definition 2. A slice is independent or synchronization-free if the intersection of the set of iterations representing this slice and the set representing the rest of the computation in a loop is empty.
Definition 3. The source of a slice is the ultimate dependence source that this slice comprises, i. e., the lexicographically minimal iteration among all the iterations belonging to this slice.
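To make Definitions 1—3 concrete, the sketch below (our own Python illustration, not part of the paper) builds slices for an explicit set of dependence edges by following dependence chains from each ultimate dependence source, and then checks Definition 2 by intersecting each slice with the remaining slices. The edges used are those of Example 1 after redundant dependences have been removed.

```python
# Toy illustration of Definitions 1-3 on explicit dependence edges (source -> destination).
# Assumption: the iteration space is small enough to enumerate; edges are given explicitly.

def build_slices(edges):
    sources = {s for s, _ in edges}
    dests = {d for _, d in edges}
    uds = sources - dests                     # ultimate dependence sources (Definition 3)
    slices = {}
    for u in sorted(uds):
        slice_set, frontier = {u}, {u}
        while frontier:                       # follow chains of dependences (Definition 1)
            frontier = {d for s, d in edges if s in frontier} - slice_set
            slice_set |= frontier
        slices[u] = slice_set
    return slices

# Dependences of Example 1 after removing R2 and R3 (relation R1: [i,j] -> [i,j+1], 1 <= j <= 9):
edges = {((i, j), (i, j + 1)) for i in range(1, 11) for j in range(1, 10)}
slices = build_slices(edges)

# Definition 2 (restricted to dependent iterations): each slice must be disjoint from the others.
for src, s in slices.items():
    rest = set().union(*(t for v, t in slices.items() if v != src))
    assert s.isdisjoint(rest)                 # here all slices are synchronization-free
print(len(slices))                            # 10 slices, one per row i
```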
Our algorithms are based on the operations on relations and sets presented in Table 1, where R and S denote relations and sets, respectively. These operations are described in detail in [14], and we assume that the reader is familiar with them. We would like to note only that there exist two relations related to transitive closure: the positive transitive closure, R+, and the transitive closure, R* = R+ ∪ I, where I is the identity relation; both of them are used in the algorithms presented in this paper.
In one of the algorithms presented in this paper, the loop interchange transformation is applied [1]. It consists in switching the nesting order of two loops in a perfect nest. The legality condition of this transformation can be found in [1].
Table 1. Operations on relations and sets

| Operations | Denotation in the Omega Calculator |
|---|---|
| Intersection: R1 ∩ R2, S1 ∩ S2 | Intersection |
| Difference: R1 − R2, S1 − S2 | −, Difference |
| Union: R1 ∪ R2, S1 ∪ S2 | Union |
| Inverse of R: R⁻¹ | Inverse |
| Positive transitive closure of R: R+ | Transitive_Closure |
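Since the algorithms below are expressed purely in terms of these operations, it may help to see them on explicit finite relations. The sketch below is a minimal Python model of our own (the paper's actual implementation relies on the Omega library [14] operating on symbolic Presburger relations): a relation is modelled as a set of (input, output) tuple pairs, and union, inverse, and positive transitive closure are implemented directly on such finite sets.

```python
# Minimal model of the Table 1 operations on finite, explicitly enumerated relations.
# Assumption: a relation is a set of (input_tuple, output_tuple) pairs; sets are Python sets.

def inverse(r):
    return {(y, x) for x, y in r}

def compose(r2, r1):                       # r2 after r1: x -> z if x -> y in r1 and y -> z in r2
    return {(x, z) for x, y in r1 for y2, z in r2 if y == y2}

def positive_transitive_closure(r):        # R+ = R union R.R union R.R.R union ...
    closure, frontier = set(r), set(r)
    while frontier:
        frontier = compose(r, frontier) - closure
        closure |= frontier
    return closure

# Relation R1 of Example 1 restricted to row i = 1: [1,j] -> [1,j+1], 1 <= j <= 9.
r1 = {((1, j), (1, j + 1)) for j in range(1, 10)}
r1_plus = positive_transitive_closure(r1)
print(((1, 1), (1, 10)) in r1_plus)        # True: [1,1] reaches [1,10] transitively
print(len(inverse(r1)))                    # 9 pairs, each reversed
```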
Given dependence relations calculated for the loop, our approach to extracting sources of synchronization-free slices of the loop iterations consists of the following steps. First, to increase the number of synchronization-free slices, we should remove redundant dependences. Second, we have to extract a set of all ultimate dependence sources and then split it into two sets including sources of slices requiring synchronization and sources of synchronization-free slices, respectively. When the sources cannot be extracted in the whole loop domain, we try to find subspaces in the loop domain where synchronization-free slices can be extracted.
The following sections describe each step in detail.
Removing redundant dependences. A redundant dependence is one that can be eliminated without losing any information about dependences required for extracting all synchronization-free slices available in the loop. A dependence is redundant if it is implied by the other dependences. For example, a redundant dependence is a direct dependence that is described by different dependence relations (one dependence has to be retained, the remaining ones can be removed), or it is a direct dependence whose source and destination are the source and destination of a transitive dependence. The elimination of redundant dependences may increase the number of synchronization-free slices that we are able to extract from the loop. Let us consider the following example.
Example 1.
for(i = 1; i <= 10; i++)
  for(j = 1; j <= 10; j++) {
(a) a[i][j] = a[i][j-1];
(b) c[i][j] = c[i][j-1];
(c) b[i][j] = b[i][j-2];
}
The loop above originates the following dependence relations found with Petit (Fig. 1, a):
(data dep. a->a) R1 := {[i,j] -> [i,j+1] : 1 <= i <= 10 && 1 <= j <= 9},
(data dep. c->c) R2 := {[i,j] -> [i,j+1] : 1 <= i <= 10 && 1 <= j <= 9},
(data dep. b->b) R3 := {[i,j] -> [i,j+2] : 1 <= i <= 10 && 1 <= j <= 8}.
If we take into account all dependence relations, the presented algorithm cannot extract synchronization-free slices because there exist common dependence destinations described by different dependence relations. But R2 is redundant because it represents the same dependences as R1 does, hence it can be removed (Fig. 1, b). Relation R3 is also redundant because for each direct dependence represented with R3 there exists a transitive dependence represented with relation R1, hence R3 can be removed. Removing R2 and R3 permits us to extract the synchronization-free slices shown in Fig. 1, c.
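The two redundancy arguments above can be checked mechanically on the enumerated iteration space. The sketch below is our own Python illustration using the explicit 10 x 10 bounds of Example 1 (the paper itself leaves redundancy elimination to known techniques such as [24]); it verifies that R2 is contained in R1 and that every pair of R3 is also a transitive dependence of R1.

```python
# Checking the redundancy claims of Example 1 on the explicit 10 x 10 iteration space.
# A relation is modelled as a set of ((i,j), (i',j')) pairs; bounds are those of Example 1.

R1 = {((i, j), (i, j + 1)) for i in range(1, 11) for j in range(1, 10)}
R2 = {((i, j), (i, j + 1)) for i in range(1, 11) for j in range(1, 10)}
R3 = {((i, j), (i, j + 2)) for i in range(1, 11) for j in range(1, 9)}

def positive_transitive_closure(r):
    closure, frontier = set(r), set(r)
    while frontier:
        frontier = {(x, z) for x, y in frontier for y2, z in r if y == y2} - closure
        closure |= frontier
    return closure

# R2 describes exactly the same direct dependences as R1, so it is redundant.
assert R2 <= R1

# Every direct dependence of R3 is also a transitive dependence of R1, so R3 is redundant too.
assert R3 <= positive_transitive_closure(R1)
print("R2 and R3 are redundant with respect to R1")
```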
The removal of redundant dependences is a well-known problem considered in many publications, for example, in [24], and it is outside the scope of this paper.
Finding sources of slices. In this section, we describe an algorithm that permits us to find the lexicographically minimal iteration among all the iterations belonging to a slice. Such an iteration is the source of the slice.
The following steps are to be fulfilled for extracting synchronization-free slices. First we form two sets containing independent and dependent iterations, respectively. Next, using the set containing dependent iterations, we extract all ultimate dependence sources. Then a set comprising sources of slices requiring synchronization, RSS, and a set including sources of synchronization-free slices, SFS, are built. The number of iterations in set SFS determines the number of synchronization-free slices.
In the algorithm that follows, we suppose that (i) each dependence relation does not represent two or more different dependence destinations corresponding to the same dependence source; if this is the case, the relation has to be normalized to satisfy the above condition (redundant dependences have to be removed, or the relation has to be split into several relations such that no two of them describe common iterations); (ii) any two relations, Ri and Rj, such that one of them describes ultimate dependence destinations that are ultimate dependence sources described with the other relation, are represented with a single relation R := Ri ∪ Rj.
Algorithm 1. Extracting sources of slices requiring synchronization and
sources of synchronization-free slices.
Input: set S_In including normalized relations representing loop-carried
dependences: R1, R2, …, Rm, where m is the number of relations.
Output: a set of sources of slices requiring synchronization, RSS; a set of
sources of synchronization-free slices, SFS.
Fig. 1. Dependences for Example 1: a — all dependences; b — dependences after removing R2; c — dependences after removing R2 and R3
1. Calculate the union of all dependence relations, R, the inverse of R, IR, and initialize a set of common iterations, CI:
R := R1 ∪ R2 ∪ … ∪ Rm ; IR := inverse R ; CI := EMPTY.
2. Find a set of all common sources and destinations, CI. If set S_In includes only one relation, then go to step 3. Otherwise, for each pair of relations Ri and Rj in set S_In, where i ≠ j, i, j ≤ m, do
CI := CI ∪ (domain Ri ∩ domain Rj) ∪ (range Ri ∩ range Rj).
3. Find dependence sources, I, as the domain of relation R; find dependence
destinations, J, as the range of relation R. Find all ultimate dependence sources,
UDS, as the difference between sets I and J, that is,
UDS := (domain R) – (range R).
4. If CI == EMPTY, then UDS contains all sources of synchronization-free slices, SFS := UDS, the end; otherwise go to step 5.
5. Calculate a set of dependence sources belonging to slices requiring synchronization, RSS1, using step 5a or 5b (Table 2).
6. To find a set comprising sources of dependent slices, RSS, calculate the intersection between RSS1 and UDS:
RSS := RSS1 ∩ UDS.
7. To find a set including sources of synchronization-free slices, SFS, calculate the difference between UDS and RSS:
SFS := UDS – RSS.
It is worth noting that when the set of common iterations, CI, is EMPTY, all ultimate dependence sources are the sources of synchronization-free slices. When the loop originates common iterations, steps 1 to 5 of Algorithm 1 find all the iterations, RSS1, belonging to slices requiring synchronization on the backward paths from common iterations, contained in set CI, to sources of slices requiring synchronization.
Table 2. Calculating a set of dependence sources

Step 5a (with applying transitive closure). Calculate:
IR+ := Transitive_Closure IR,
RSS1 := IR+(CI) /* applying relation IR+ to set CI */

Step 5b (without applying transitive closure; only for non-parameterized CI and IR):
TEMP := CI, RSS1 := CI,
L: find a set of dependence sources belonging to slices requiring synchronization, TEMP, as
TEMP := IR(TEMP) /* applying relation IR to set TEMP */
If TEMP == EMPTY, then go to step 6; otherwise RSS1 := RSS1 ∪ TEMP, go to L.
Step 5a of Algorithm 1 can be applied to loops with both parameterized and non-parameterized bounds at compile time or run time, whereas step 5b can be applied to loops with parameterized bounds only at run time, when the bounds become known.
It is well known that the exact transitive closure of an affine integer tuple relation may not be affine [24]. Exact transitive closure represented with affine forms is therefore not computable in the general case of the affine dependence relation. However, it is always computable for affine loops originating uniform dependences [24]. In the case when the Omega library computes inexact transitive closure, we may approximate this closure and try to extract slices.
To illustrate Algorithm 1, let us consider the following loop.
Example 2.
for(i = 1; i <= n; i++)
  for(j = 1; j <= n; j++) {
    a(2*i, 3*j) = b(i,j)
    b(i+1, j) = a(i, j)
}
The relations representing loop-carried dependences in this loop are as follows:
(data dep.) R1 : {[i,j] -> [2i,3j] : 1 <= j && 2i <= n && 1 <= i && 3j <= n},
(data dep.) R2 : {[i,j] -> [i+1,j] : 1 <= i < n && 1 <= j <= n}.
Fig. 2 presents dependences for the loop of Example 2. Following Algorithm 1, we get the following results.
Fig. 2. Dependences for the loop of Example 2, when n = 8 (sources of synchronization-free slices and sources of slices requiring synchronization are marked)
1. R := R1 ∪ R2 := {[i,j] -> [2i,3j] : 1 <= j && 2i <= n && 1 <= i && 3j <= n} union {[i,j] -> [i+1,j] : 1 <= i < n && 1 <= j <= n},
IR := inverse R := {[In_1,j'] -> [i,j] : j' = 3j && In_1 = 2i && 1 <= j && 2i <= n && 1 <= i && 3j <= n} union {[In_1,j] -> [In_1-1,j] : 2 <= In_1 <= n && 1 <= j <= n},
CI := EMPTY.
2. CI := {[i,j] : 1 <= j && 2i <= n && 1 <= i && 3j <= n} union {[i,j] : Exists (alpha,beta : j = 3alpha && 2beta = i && 2 <= i <= n && 3 <= j <= n)}.
3. I := domain R, J := range R,
UDS := I – J := {[1,j] : 1 <= j <= n && 2 <= n}.
4. Since CI is not EMPTY, go to step 5a (using transitive closure).
5. TIR := IR+ union {[i,j] -> [i,j]} := {[2,j] -> [1,j] : n = 2 && 1 <= j <= 2} union {[i,j] -> [i,j]} union {[i,j] -> [i',j'] : 1 <= j' <= j <= n && i'+2 <= i <= n && 1 <= i' && 3j <= 2n+3j' && 2i <= 4+n+2i' && UNKNOWN} union ...
Since relation TIR includes UNKNOWN in the constraints, we approximate TIR by calculating the upper and lower bounds for TIR. Calculating and using the upper bound (UNKNOWN = True) for forming set SFS with Algorithm 1 yields the following result:
SFS := {[1,j] : 2 <= n <= 3, 3j-1 && j <= 2} union {[1,j] : Exists (alpha : 4, j <= n <= 3alpha+2 && 3alpha < j)}.
Set SFS above does not include all sources of synchronization-free slices; we miss some sources.
Calculating the lower bound for TIR (UNKNOWN = False), LTIR,
LTIR := lower_bound TIR := {[2,j] -> [1,j] : n = 2 && 1 <= j <= 2} union {[i,j] -> [i,j]} union {[i,j] -> [i',j'] : j = 3j' && 2i' <= i <= n && 3j' <= n && 1 <= j' && 1 <= i'} union {[i,j] -> [i',j'] : j = 9j' && 4i' <= i <= n && 9j' <= n && 1 <= j' && 1 <= i'} union {[i,j] -> [i',j] : 1 <= i' < i <= n && 1 <= j <= n && 3 <= n}
permits us to extract all sources of synchronization-free slices, contained in the following set:
SFS := {[1,j] : Exists (alpha : j, 2 <= n <= 3j-1 && 3alpha+1 <= j <= 3alpha+2)}.
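Because all sets involved are affine and n is the only parameter, Algorithm 1 can also be cross-checked by brute force for a fixed n. The sketch below is our own Python illustration (not the paper's Omega-based implementation): it enumerates the dependences of Example 2 for n = 8, follows steps 1—7 of Algorithm 1 using the fixed-point variant of step 5b instead of transitive closure, and should report the four sources [1,4], [1,5], [1,7], [1,8], in agreement with the closed-form set SFS above and with Fig. 2.

```python
# Brute-force check of Algorithm 1 on Example 2 for a fixed n (assumption: n small enough
# to enumerate). Relations are sets of ((i,j), (i',j')) pairs, as in the paper's R1 and R2.

n = 8
R1 = {((i, j), (2 * i, 3 * j)) for i in range(1, n + 1) for j in range(1, n + 1)
      if 2 * i <= n and 3 * j <= n}
R2 = {((i, j), (i + 1, j)) for i in range(1, n) for j in range(1, n + 1)}

def domain(r):
    return {x for x, _ in r}

def rng(r):
    return {y for _, y in r}

# Step 1: union of the relations and its inverse IR (destination -> source).
R = R1 | R2
IR = {(y, x) for x, y in R}

# Step 2: common sources and destinations of the two relations.
CI = (domain(R1) & domain(R2)) | (rng(R1) & rng(R2))

# Step 3: ultimate dependence sources.
UDS = domain(R) - rng(R)

# Step 5b: walk backwards from CI through IR until a fixed point is reached.
RSS1, frontier = set(CI), set(CI)
while frontier:
    frontier = {x for y, x in IR if y in frontier} - RSS1
    RSS1 |= frontier

# Steps 6 and 7: split UDS into RSS and SFS.
RSS = RSS1 & UDS
SFS = UDS - RSS
print(sorted(SFS))   # expected for n = 8: [(1, 4), (1, 5), (1, 7), (1, 8)]
```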
Finding subspaces in the loop domain where synchronization-free slices can be extracted. When synchronization-free slices cannot be extracted in the whole loop domain, we can try to find subspaces in the loop domain where synchronization-free slices can be extracted. The following algorithm analyzes dependence vectors in a particular way to discover subspaces of interest.
Algorithm 2. Finding subspaces with synchronization-free slices in the loop domain.
Input: a source loop; set S_In comprised of relations representing loop-carried dependences R1, R2, ..., Rm and set D comprised of the corresponding dependence vectors D1, D2, ..., Dm, where m is the number of relations.
Output: loop indices defining subspaces where the algorithm extracts sources of synchronization-free slices; set S_Out including dependence relations to be used for extracting slices in subspaces; sources of synchronization-free slices described with relations contained in set S_Out.
1. Check whether in set D there exists a dependence vector with one or more zero coordinates. If not, then the end: the algorithm does not extract any subspace with synchronization-free slices; otherwise go to step 2.
2. Set S_Out := EMPTY.
2.1. Add to set S_Out a dependence relation Ri belonging to set S_In such that it has the minimal number of consecutive zero coordinates in the corresponding vector Di among all the vectors in set D (if there exist two or more such relations, choose any of them; if the chosen relation yields in its dependence vector two or more sequences with the same number of zeros, choose any of them).
2.2. Add to set S_Out all the dependence relations from S_In such that the corresponding dependence vectors have the same zero coordinates as those chosen in Di, and the remaining coordinates are arbitrary expressions (zero or nonzero).
2.3. Check whether set S_Out includes the same number of dependence relations as set S_In does. If so, the end: the algorithm does not extract any subspace with synchronization-free slices; otherwise go to step 3.
3. Apply Algorithm 1 to set S_Out. If it extracts sources of synchronization-free slices for S_Out, then check whether the source loop is semantically the same as the loop whose body is the same as that of the source loop and whose outer and inner loops are represented with the indices that form the chosen sequence of zero coordinates of Di and with the remaining indices, respectively (i.e., check the legality of the loop interchange transformation); any known technique can be used for this purpose, for example, those described in [1]. Else go to step 4.
If so, memorize the loop indices representing zeros in the chosen sequence of zero coordinates in vector Di; memorize the sources of synchronization-free slices for S_Out and set S_Out; the end. Else go to step 4.
4. Check whether in set D there exists a dependence vector, Di, with another non-analyzed sequence of zero coordinates. If so, set S_Out := EMPTY, add the corresponding dependence relation Ri to set S_Out and go to step 2.2; otherwise the end: the algorithm does not extract any subspace with synchronization-free slices.
It is worth noting that choosing in vector Di the minimal number of consecutive zero coordinates aims at extracting the maximal number of synchronization-free slices. Dependence relations not included in set S_Out do not describe any dependence within a subspace of interest because of the non-zero coordinates of the corresponding dependence vectors derived from these relations. Each dependence represented with relations not included in set S_Out has a source and a destination belonging to different subspaces (the source and destination of such a dependence do not belong to the same subspace). Hence, we do not take them into account when extracting sources of synchronization-free slices in the appropriate subspaces of the loop domain. When generating code, we have to guarantee that subspaces are enumerated serially in lexicographic order to preserve the dependences represented with relations not included in set S_Out.
The following example illustrates applying Algorithm 2.
Example 3.
for (i = 1; i <= n; i++)
  for (j = 1; j <= n; j++)
    for (k = 1; k <= n; k++){
      a(i,j,k) = a(i,j-1,k)
      b(i,j,k) = b(i,j,k-1)
}
Dependence relations and the corresponding dependence vectors for this loop are as follows:
R1 := {[i,j,k] -> [i,j+1,k] : 1 <= i <= n && 1 <= j < n && 1 <= k <= n},
D1 := {[0,1,0]},
R2 := {[i,j,k] -> [i,j,k+1] : 1 <= i <= n && 1 <= j <= n && 1 <= k < n},
D2 := {[0,0,1]}.
The input data for this loop are sets S_In = {R1, R2} and D = {D1, D2}. Applying Algorithm 2, we obtain the following.
1. In set D there exists a dependence vector with zero coordinates; we go to step 2.
2. S_Out := EMPTY.
2.1. We choose index k, representing the zero coordinate in vector D1, and form set S_Out := {R1}.
2.2. S_Out := {R1}.
2.3. Because S_Out does not include the same number of dependence relations as S_In does, we go to step 3.
3. Applying Algorithm 1 to set S_Out, we get the following sources of synchronization-free slices:
SFS := {[i,j,k] : j = 1 && 1 <= i <= n && 1 <= k <= n && 2 <= n}.
3.1. Because the corresponding loop interchange transformation is legal, we memorize index k, set SFS, and set S_Out.
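The selection part of Algorithm 2 can be sketched directly from the dependence vectors. The Python fragment below is our own illustration of steps 2.1—2.3 together with the retry of step 4 (the Algorithm 1 call and the legality check of step 3 are not modelled here): given uniform dependence vectors, it tries runs of consecutive zero coordinates, shortest first, until the selected positions filter out at least one relation, reproducing the choice of index k and S_Out = {R1} for Example 3.

```python
# Sketch of the selection steps of Algorithm 2 for uniform dependence vectors.
# Assumption: each relation is represented only by its constant dependence vector;
# step 3 (Algorithm 1 and the loop interchange legality check) is not modelled here.

def zero_runs(vec):
    """All maximal runs of consecutive zero coordinates, as tuples of positions."""
    runs, current = [], []
    for pos, c in enumerate(vec):
        if c == 0:
            current.append(pos)
        elif current:
            runs.append(tuple(current))
            current = []
    if current:
        runs.append(tuple(current))
    return runs

def select_subspace(vectors):
    """Try candidate zero runs, shortest first, until one separates the relations."""
    candidates = sorted(
        ((run, i) for i, v in enumerate(vectors) for run in zero_runs(v)),
        key=lambda c: len(c[0]))
    for run, _ in candidates:                          # steps 2.1 and 4: try each run
        s_out = [i for i, v in enumerate(vectors)      # step 2.2: same zero coordinates
                 if all(v[p] == 0 for p in run)]
        if len(s_out) < len(vectors):                  # step 2.3: some relation is filtered out
            return run, s_out
    return None                                        # step 1 / step 4 failure: no subspace

# Example 3: D1 = [0,1,0] (relation R1), D2 = [0,0,1] (relation R2), indices (i, j, k).
print(select_subspace([(0, 1, 0), (0, 0, 1)]))         # ((2,), [0]) -> index k, S_Out = {R1}
```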
Related work. The results of the paper are within the iteration space slicing framework introduced by Pugh and Rosser in paper [23]. This framework might have a number of uses. Pugh and Rosser examined in paper [23] how to apply this framework to optimization of interprocessor communication. In particular, they demonstrated how to use slicing to enable loop fusion, tolerate message latency and allow message coalescing. To form slices, they use the transitive closure operation to compute the transitive dependences among iterations and then compute the set of iterations that are reachable via the transitive closure in the forwards or backwards direction, depending on the application. But Pugh and Rosser do not describe in paper [23] how to construct synchronization-free slices. Our contribution to the iteration space slicing framework consists in presenting how to extract sources of both synchronization-free slices and slices requiring synchronization. We are unaware of any work describing techniques exposing sources of synchronization-free slices.
The affine partitioning framework, considered in many papers, for example, [10—12, 18, 19, 25—27], unifies a large number of previously proposed loop transformations. Today, it is one of the most powerful frameworks for loop transformations, allowing us to extract synchronization-free parallelism present in loops with both uniform and affine dependences. However, for the general case of non-uniform loops, this framework does not permit us to build non-affine schedules for extracting parallelism. We believe that the algorithms presented in this paper open a possibility to extract synchronization-free slices when the affine partitioning framework and other well-known techniques fail to extract such slices.
The idea of seeking parallelism in subspaces of the loop domain is not new; it is discussed in many papers. Our contribution consists in demonstrating how Algorithm 1, presented in this paper, can be applied to subspaces of the loop domain and how these subspaces can be extracted.
Conclusion. In this paper, we described algorithms permitting us to find sources of synchronization-free slices comprised of iterations of perfectly nested uniform and non-uniform loops. Extracting sources of synchronization-free slices will allow us to find the iterations belonging to each synchronization-free slice. Each slice can be executed without synchronization with the other slices, thus allowing us to enhance code locality and/or reduce memory requirements. The presented technique comprises the following steps: finding exact dependence relations, removing redundant dependence relations or removing redundant dependences from dependence relations, and finding sources of synchronization-free slices.
In our future work, using the results of this paper, we intend to present how to extract the iterations belonging to each synchronization-free slice and how to generate code enumerating sources of slices and iterations of each slice in lexicographic order.
1. Allen R., Kennedy K. Optimizing Compilers for Modern Architectures. — San Francisco:
Morgan Kaufmann Publishers Inc., 2001. — 790 p.
2. Amarasinghe S. P., Lam M. S. Communication optimization and code generation for distributed
memory machines//Proc. of the SIGPLAN’93. — New Mexico, June, 1993. — P. 126—138.
3. Ancourt C., Irigoin F. Scanning polyhedra with do loops// Proc. of the Third ACM/ SIGPLAN
Symposium on Principles and Practice of Parallel Programming. April 21—24. — Virginia:
ACM Press, 1991. — P. 39—50.
4. Banerjee U. Unimodular transformations of double loops// Proc. of the Third Workshop on Languages and Compilers for Parallel Computing. August, 1990. — Irvine, CA. — P. 192—219.
5. Beletskyy V. Finding Synchronization-Free Parallelism for Non-uniform Loops // Proc. of the Computational Science. — ICCS'2003. Lecture Notes in Computer Science. — Berlin/Heidelberg: Springer, 2003. — Vol. 2658. — P. 925—934.
6. Gavalda R., Ayguade E., Torres J. Obtaining Synchronization-Free Code with Maximum Parallelism// Technical Report LSI-96-23-R. — Barcelona: Universitat Politècnica de Catalunya, 1996. — 96 p.
7. Griebl M., Lengauer C. Classifying Loops for Space-Time Mapping//In Proc. of the
Euro-Par 1996. Lecture Notes in Computer Science. — Berlin: Springer-Verlag, 1996. —
P. 467— 474.
8. Huang C., Sadayappan P. Communication-free hyperplane partitioning of nested loops// J.
of Parallel and Distributed Computing. —1993.—No 19.—P. 90—102.
9. Kelly W., Pugh W. Minimizing communication while preserving parallelism//Proc. of the 1996
ACM International Conference on Supercomputing. — Philadelphia, USA, 1996. — P. 52—60.
10. Lim W., Lam M. S. Communication-free parallelization via affine transformations//Proc.
of the Seventh workshop on languages and compilers for parallel computing. — Ithaca,
NY, USA, 1994. — P. 92—106 .
11. Lim W., Cheong G. I., Lam M. S. An affine partitioning algorithm to maximize parallelism
and minimize communication//Proc. of the 13th ACM SIGARCH. International Conference
on Supercomputing. Rhodes, Greece, 1999. — P. 228—237.
12. Lim W., Lam M. S. Maximizing parallelism and minimizing synchronization with affine
transforms// Conf. Record of the 24th ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages. Paris, France, 1997. — P. 201—214.
13. Wolf M. E. Improving locality and parallelism in nested loops: Ph. D. Dissertation CSL-TR-92-538. — Stanford University, Dept. of Computer Science, 1992.
14. Kelly W., Maslov V., Pugh W. et al. The Omega library interface guide// Technical Report CS-TR-3445. — University of Maryland, College Park, USA, 1995. — 33 p.
15. Beletskyy V., Siedlecki K. Finding Free Schedules for Non-uniform Loops// Proc. of the
Euro-Par 2003. Lecture Notes in Computer Science.— Berlin/Heidelberg: Springer, 2003.
Vol. 2790. — P. 297—302.
16. Boulet P., Darte A., Silber G.A., Vivien F. Loop parallelization algorithms: from parallelism
extraction to code generation// Parallel Computing, 1998.—No 24.— P. 421—444.
17. Collard J. F., Feautrier P., Risset T. Construction of Do Loops from Systems of Affine Constraints//Parallel Processing Letters. — 1995. — P. 421—436.
18. Feautrier P. Some efficient solutions to the affine scheduling problem, part I, one-dimensional time// International J. of Parallel Programming. — 1992. — Vol. 21, No 5. — P. 313—348.
19. Feautrier P. Some efficient solutions to the affine scheduling problem, part II, multidimensional time// Ibid. — P. 389—420.
20. Pugh W., Wonnacott D. Constraint-based array dependence analysis//ACM Trans. on Programming Languages and Systems. — 1998. — Vol. 20, No 3. — P. 635—678.
21. Weiser M. Program slices: formal, psychological, and practical investigations of an automatic
program abstraction method// PhD thesis, University of Michigan, Ann Arbor, MI, 1979.
22. Weiser M. Program Slicing//IEEE Transactions on Software Engineering. — 1984. — V. SE-10, No 7. — P. 352—357.
23. Pugh W., Rosser E. Iteration Space Slicing and Its Application to Communication Optimization//Proc. of the International Conference on Supercomputing, July 7-11, Vienna, Austria, 1997. — P. 221—228.
24. Kelly W., Pugh W., Rosser E., Shpeisman T. Transitive Closure of Infinite Graphs and its Applications// Intern. J. of Parallel Programming. — 1996. — V. 24, No 6. — P. 579—598.
25. Darte A., Robert Y., Vivien F. Scheduling and Automatic Parallelization. — Boston: Birkhäuser, 2000. — 294 p.
26. Feautrier P. Toward automatic distribution//J. of Parallel Processing Letters. — 1994. —
Vol. 3, No4. — P. 233—244.
27. Quillere F., Rajopadhye S., Wilde D. Generation of efficient nested loops from polyhedra//
International J. of Parallel Programming. — 2000. — Vol. 28, No5. — P. 469—498.
Received 07.11.06; after revision 26.02.07