Метод семантичної верифікації застосувань у технології GPGPU

An application development and verification method for massively parallel systems using NVIDIA GPUs is proposed. The method allows creating models at different levels of abstraction using the apparatus of marked transition systems. The compositions (product) of such systems are transformed into a Pe...

Full description

Saved in:
Bibliographic Details
Date:2020
Main Authors: Kryvyi, Serhii L., Pogorilyy, Sergiy D., Slynko, Maksym S., Kramov, Artem A.
Format: Article
Language:English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2020
Subjects:
Online Access:https://journal.iasa.kpi.ua/article/view/213203
Tags: Add Tag
No Tags, Be the first to tag this record!
Journal Title:System research and information technologies
Download file: Pdf

Institution

System research and information technologies
_version_ 1867334407601258496
author Kryvyi, Serhii L.
Pogorilyy, Sergiy D.
Slynko, Maksym S.
Kramov, Artem A.
author_facet Kryvyi, Serhii L.
Pogorilyy, Sergiy D.
Slynko, Maksym S.
Kramov, Artem A.
author_institution_txt_mv [ { "author": "Serhii L. Kryvyi", "institution": "Київський Національний Університет імені Тараса Шевченка, Київ" }, { "author": "Sergiy D. Pogorilyy", "institution": "Київський Національний Університет імені Тараса Шевченка, Київ" }, { "author": "Maksym S. Slynko", "institution": "Київський Національний Університет імені Тараса Шевченка, Київ" }, { "author": "Artem A. Kramov", "institution": "Київський Національний Університет імені Тараса Шевченка, Київ" } ]
author_sort Kryvyi, Serhii L.
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2021-01-19T12:18:25Z
description An application development and verification method for massively parallel systems using NVIDIA GPUs is proposed. The method allows creating models at different levels of abstraction using the apparatus of marked transition systems. The compositions (product) of such systems are transformed into a Petri net, which are then analyzed by appropriate means. The proposed method allows specifying model properties by temporal logic formulas. This allows studying the properties of massively parallel systems which is almost impossible to analyze manually, since the number of execution threads in the latest NVIDIA video adapter architectures (Pascal, Volta, Turing, Ampere) is measured in hundreds of thousands or millions.
doi_str_mv 10.20535/SRIT.2308-8893.2020.3.01
first_indexed 2025-07-17T10:26:48Z
format Article
fulltext  S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov, 2020 Системні дослідження та інформаційні технології, 2020, № 3 7 TIДC ПРОГРЕСИВНІ ІНФОРМАЦІЙНІ ТЕХНОЛОГІЇ, ВИСОКОПРОДУКТИВНІ КОМП’ЮТЕРНІ СИСТЕМИ UDC 004.4 DOI: 10.20535/SRIT.2308-8893.2020.3.01 METHOD OF SEMANTIC APPLICATION VERIFICATION IN GPGPU TECHNOLOGY S.L. KRYVYI, S.D. POGORILYY, M.S. SLYNKO, A.A. KRAMOV Abstract. An application development and verification method for massively paral- lel systems using NVIDIA GPUs is proposed. The method allows creating models at different levels of abstraction using the apparatus of marked transition systems. The compositions (product) of such systems are transformed into a Petri net, which are then analyzed by appropriate means. The proposed method allows specifying model properties by temporal logic formulas. This allows studying the properties of mas- sively parallel systems which is almost impossible to analyze manually, since the number of execution threads in the latest NVIDIA video adapter architectures (Pas- cal, Volta, Turing, Ampere) is measured in hundreds of thousands or millions. Keywords: CUDA, graphical processing units (GPU), General Purpose Graphics Computing (GPGPU), transition system, Petri net, model design. INTRODUCTION The main feature of the latest graphical processing units (GPU) is the availability of a set of streaming multiprocessors (SM) that were used previously in image processing algorithms and tasks only. General Purpose Graphics Computing (GPGPU) technology is based on the use of a combination of GPUs working in parallel to process data using general-purpose algorithms (scientific or other, but not necessarily related to image processing). The latest graphics architectures from NVIDIA include Pascal, Volta, Amper, Thuring [1]. For this day, Volta is one of the most powerful GPU architectures, which is an indicator of achieve- ments in the field of high-performance artificial intelligence calculations (the GTX TITAN Z graphics adapter, built on the top of two powerful GK110 cores, can provide peak performance up to 8 teraflops, and each core can implement 2880 stream processors, which in total gives 5760 stream processors). NVIDIA, in addition, develops a series of video adapters focused on scien- tific applications and use in high-performance (cluster) computing. These GPUs lack some graphic-specific features and are widely used in the scientific field. This led to a significant increase in the number of supercomputers included in the TOP500 world most efficient computers [2] that utilize NVIDIA video adapters. Nowadays high-performance computing (HPC) trends get shifted from using clusters consisting of general-purpose modules to more specialized accelerated S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 8 components (in other words, from universal CPUs to other units — GPU, FPGA, etc.), that is, to less functional and less power consuming modules. Accelerators, unlike universal CPUs, can not run an operating system, and rely on external sys- tems for I/O operations or task scheduling. Their advantage in productivity lies solely in the fact that these elements are used in large groups simultaneously. The article proposes methods of modelling the properties of GPU accelerator architec- tures at different abstraction levels. In terms of programming, success of the most common GPGPU technologies (CUDA in particular) lies in the fact that they encapsulate the SIMD nature of GPU hardware. In most cases, the developer deals with individual streams that work with scalar data instead of warps [1] working with vectors. Development of the complicated, massively parallel systems that utilize video adapters requires new scientific methods to justify both the architecture of the system and applications for them. Since modern multi-threaded applications have hundreds of thousands and millions of threads, solving parallelization tasks for such systems makes impossible to use traditional engineering design ap- proaches and requires the use of a mathematical apparatus and formalization methods to substantiate the decisions made. Model justification is an effective way of algorithmic study of the parallel algorithms properties. One of the options for implementing such a justification is the use of the apparatus of algorithmic algebras [3], which allow formulating schemes of algorithms in the form of algebraic expressions that depend on various parameters, including software and hardware platforms, paradigms of parallel programming, etc. This paper focuses on using the transition systems (TS) [4] and their compositions as the main mathematical model, which allow creating models at different abstraction levels. Their properties can be investigated by translating in Petri networks (PN) [5] and can be specified by temporal logic formulas [6]. MODERN NVIDIA GPU ARCHITECTURES CUDA (Computational Unified Device Architecture) is a parallel computing ar- chitecture developed by NVIDIA to facilitate the GPGPU programming by using high-level APIs. Since the pilot CUDA platform had been introduced more than 10 years ago, each new generation of NVIDIA GPUs provided better application performance (for example, in floating point operations), increased energy effi- ciency, added important new computing capabilities, and streamlined graphics processor programming. Today, NVIDIA GPUs are leading computing devices that have, in some way, defined the artificial intelligence revolution (AI). Nowa- days GPGPU technology accelerates deep learning; high-precision text, voice, and other media data recognition systems; are used in the areas of molecular mo- deming, modelling of medical products, medical diagnosis, financial modelling, and others. Application instructions that are executed in a GPU-based heterogeneous environment can be logically decomposed into the following parts:  “host” instructions — blocks which are executed on the CPU;  “kernel” functions — instruction blocks which are executed on the GPUs. Host blocks define the context of the kernel functions execution, transferring data between the computer’s RAM and GPU memory. Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 9 All NVIDIA GPU architectures execute instructions in groups of 32 thread (known as warps) using the SIMT model (Single Instruction, Multiple Thread), that is, one instruction is executed by many threads simultaneously, although be- haviour of each individual thread is not limited by anything. However, architec- tures prior to Pascal include, among others, a software counter and a mask com- mon to all threads of the warp that determines which threads are active at any given moment. This means that in the case of a execution ow branching, each execution path uses only a subset of all threads, while the rest are deactivated. Once execution paths are converged, threads of a warp start being executed simul- taneously again. Such an implementation model gets rid of the necessity to track each indi- vidual thread state separately. However, tracing only a warp in general means re- ducing the level of parallelism if branching is present, as described above. In turn, this prevents the data exchange between the threads within a single warp, if those threads are at different execution stages, or if they execute the instructions in dif- ferent branches. That means that threads of different warps may execute instruc- tions in parallel, but threads of a single warp sometimes have to execute instruc- tions sequentially. Thus, algorithms that require data sharing at a high level of detail or that utilize synchronization tools (such as mutexes) can suffer from dead- locks. Therefore, developers have to rely on algorithms with minimal blocking support while using the NVIDIA GPUs of Pascal architecture or earlier (that is, more than 60% of devices, because Volta architecture was released only in 2017 and Turing — in 2018). More modern architectures, starting from Volta, include independent flow planning, which stores instruction counters and call stacks for each thread sepa- rately, which can be utilized for optimal resource usage or to allow one thread to wait for data from another. To increase the level of parallelism, Volta includes an optimizer that defines how to group active threads within the warp within the SIMT module. This maintains the high performance of the SIMT approach, as in previous NVIDIA GPUs, but with much greater flexibility: the threads can now perform various branching paths within a single warp. The verification methods proposed are illustrated on a simple system model analysis example. MODEL JUSTIFICATION AND VERIFICATION Testing process has always been the main method of increasing the reliability of programs, developed using traditional methods. Edsger Dijkstra once said: “Pro- gram testing can be used to show the presence of bugs, but never to show their absence!”. In addition, testing can not detect typical synchronization errors of parallel programs. Parallel programs may for years retain errors that manifest themselves after a long usage period as a reaction to a specific combination of numerous factors that have arisen (for example, due to the unpredictable rates of individual threads/processes execution in parallel programs). However, if any of the system properties can be expressed formally, for example, in the form of a mathematical logic formula, then analysis of this property can be performed by verification methods. Normally process of the system verification consists of the following parts: 1. Construction of a mathematical model of the system under analysis. S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 10 2. Definition of the properties to be checked in the form of a formal text (al- so known as specification). 3. Building a formal proof of the presence or absence of the property being verified. Usually, mathematical model of a system is a graph whose vertices are called states and represent situations (or situation classes), in which the system may be present at different times; whose edges can have labels depicting the ac- tions system can perform. The functioning of the system in this model is repre- sented by transitions along the edges of the graph from one state to another. If the passable edge has a label, then this label represents the action of the system, executed when passing from the state at the beginning of the edge to the state at its end. The choice of the abstraction level for the system modelling depends on many factors (algorithmic solvability, astronomical dimensions of the model, the absence of effective methods of formal analysis of properties etc.). In this regard, the informal rules of this choice are reduced to the following: system model should not be over-specified, because the excessive model complexity may cause significant computational problems during its formal analysis. On the other hand, system model should not be oversimplified: it should reflect those aspects of the system that are relevant to the properties being verified, and preserve all the properties of the simulated system which are of interest for analysis. Model checking (MC, [7]) approach is used to find a formal proof that the model does not meet its specification. In this paper we propose a new method of justifying that the model satisfies the specification. This paper focuses on the method of checking the conformity of a model an its specification, which uses the apparatus of TS and PN. Definition of a simple and labeled TS is given below, while definition of PN is well known and can be found in [7] if necessary. Definition 1. A simple transition system is ),,,(  RSA , where S — finite or infinite set of states; R — finite or infinite set of transitions; , — two mappings from S to R , which make a correspondence between a transition Rt and two states )(t and )(t , which are called respectively the beginning and end of the transition .t The transition t with beginning s and end s is written as follows: .ss  Sometimes transitions may have a common beginning or end or both. This means that the pair SR  :, is not necessarily an injective function. TS A is called finite if sets RS , are finite. If the set of states defines an initial state, such TS is labeled as ),,,,( 0sRSA  and is called initial TS. Such model as a simple TS may be enough to study properties of a model at a certain level of abstraction, but when it is necessary to carry out a more detailed analysis, labeled TS is more ap- propriate. Definition 2. Let X be an alphabet. The labeled transition system (LTS) is an ordered six ),,,,,( 0 hsTSA  where ),,,,( 0sTS  represent a TS, and h is a mapping from T to ,X which makes a correspondence between each transition t and its label .)( Xth  LTS is finite if sets ,S ,T X are finite. Transition label )(th may also be called an action, and the transition itself is written as follows: Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 11 )),(,( sths  or .)( ss th  Transition label set is often accompanied by a special label , which represents an internal action of the system that is not visible at a given level of modelling. Using LTS as a model of a real system allows analysing properties of actions associated with transitions, which is impossible by using a simple TS. Analysis and verification of the applications for graphic video adapters (NVIDIA in par- ticular) was chosen as a subject of the study, because this area perfectly illustrates the impossibility of manual verification, as the number of threads allocated for solving the problem is measured by hundreds of thousands (in Pascal/Volta archi- tectures). The use of LTS to construct a high-level model for substantiating the properties of a CUDA application was described in [8]. CUDA APPLICATION EXECUTION MODEL A generalized execution model in the NVIDIA CUDA architecture, based on LTS and Petri nets was presented in [9]. The main emphasis was put on obtaining a high-level model without a detailed examination of the labels semantics of each transition system. The following briefly recalls the main details: three LTS were emerged as a result of CUDA application decomposition: LTS ),,,,},,,,{( 1011132101 haRaaaaSA  — represents the warp that contains a set of instructions and sequentially executes them, where };,,,{ 43211 rrrrX  LTS ),,,,},,,,,,{( 202225432102 hbRbbbbbbSB  — represents a generalized information instruction, where 2 1 2 3 4 5 6{ , , , , , };X p p p p p p LTS ),,,,},,,,{( 3033332103 hcRccccSC  — represents a thread block execution process (warp scheduling) on the SM, where }.,,,{ 43213 qqqqX  Transition label functions 321 ,, hhh are shown below in the charts (Fig. 1, 2, 3). The main activities modelled at this level of abstractions include planning, choos- ing warp and instruction for it and providing exclusive access to computational resources for the execution time of each warp-instruction tuple. This paper shows creation of a model of the lower level of abstraction, which will allow us to ana- lyze the properties of actions associated with transitions, and the properties of the model in particular states. Fig. 1. LTS B of the information instruction b1 b2 b4 b2 b0 b5 p2 p5 p6 p3p1 p4 τ S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 12 Definition 1. General TS (GTS) is a TS ),,,,,,( 0 LAPsRXSA  where S — set of states; X — set of actions associated with transitions; SXSR  — transition relation; 0s — initial state from S ; AP — set of propositional formulas associated with states; : ( )L S B AP — state label function, where ( )B AP is the power set of .AP We introduce a set of Boolean variables: },,,,,{1 ninstrMemFiinstrArFinecinstrMemExcinstrArExeinstrMeminstrArV  , where ,0,0,0,0{)(1  instrMeminstrArFincinstrArExeinstrArh }0,0  ninstrMemFiecinstrMemEx ; }1{)( 11  instrArph — arithmetical instruction selection; }1{)( 41  instrMemph — global memory access selection; }1{)( 21  cinstrArExeph — arithmetical instruction execution; }1{)( 51  ecinstrMemExph — memory access execution; }1{)( 31  instrArFinph — retrieving results of an arithmetical operation; }1{)( 61  ninstrMemFiph — retrieving results of memory access operation. We define the state label function :1L };0{)( 01  instrMeminstrArbL 1 1( ) { 1};L b instrAr  1 2( ) { 1};L b instrArExec  };1{)( 31  instrMembL };1{)( 41  ecinstrMemExbL }.1{)( 51  instrArFinninstrMemFibL We introduce a set of Boolean variables ,,{2 warpBusywarpActiveV  }warpFin that are set by the following transition labels (Fig. 2): 2 1( ) { 1}h r warpActive  — warp activation — given warp gains access to SM resources to execute a single instruction; }1{)( 22  warpBusyrh — instruction is being executed by a warp; }1{)( 32  warpFinrh — instruction execution is finished; а1 а2 а3 а0 r1 r2 r3 r4 Fig. 2. LTS A of the instruction execution within a warp Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 13 }0,0,0{)( 42  warpFinwarpBusywarpActiverh — warp deactivation. State label function 2L is defined as following: };0{)( 02  warpFinwarpBusywarpActiveaL };1{)( 12  warpActiveaL };1{)( 22  warpBusyaL }.1{)( 32  warpFinaL We introduce a set of Boolean variables ,,{3 xecwarpInstrEelwarpInstrSV  }inwarpInstrF that are set by the following transition labels (Fig. 3): }1{)( 13  elwarpInstrSqh — warp and instruction selection; }1{)( 23  xecwarpInstrEqh — execution of the selected instruction by a selected warp; }1{)( 33  inwarpInstrFqh — confirmation of instruction execution finish; }1,1,0{)( 43  inwarpInstrFxecwarpInstrEelwarpInstrSqh — transi- tion to the next iteration. State label function 3L is defined as following: };0{)( 03  inwarpInstrFxecwarpInstrEelwarpInstrScL };1{)( 13  elwarpInstrScL };1{)( 23  xecwarpInstrEcL }.1{)( 33  inwarpInstrFcL Integration of multiple TS into a holistic system that orchestrates the joint work of all subsystems is performed depending on the requirements of the com- ponent interaction model (synchronous, asynchronous, parallel, sequential). These interaction methods are introduced using different TS composition types and the general notion of TS. To analyze the model properties, we consider the concept of parallel composition of the TS. There are several options of TS composition that model parallel functioning of multiple TS. The simplest one is the composition in which TSs work in paral- lel, but do not interact with each other. Such a composition is based on the con- cept of alternating actions (interleaving), which are performed by different sub- systems of the composition. In this case, the order of actions performed by each TS is preserved. This is a common way of modelling parallel interactions, which is based on the assumption that the result of a parallel execution of operations coincides with the result of their sequential execution. The formal definition of this composition is as follows. Let 0( , , , , , )i i i i i i iA S X R s AP L — transition systems, where .,...,2,1 ni  Parallel composition of TS nAAA ,...,, 21 with interleaving is TS а1 c2 c3 c0 q1 q2 q3 q4 Fig. 3. LTS C of the warp scheduler S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 14 1 2 0... ( , , , , , ),nA A A A S X R s AP L  where nSSSS  ...21 , 0s ),,...,,( 00 2 0 1 nsss ,...21 nXXXX  and transitions from R are defined as follows: transition Rssxss nn  )),...,(,),,...,(( 11 if and only if state 1( ,..., )ns s  differs from 1( ,..., )ns s by the value of not more than one component, that is there exists {1,2,..., }i n such that ,),,( iii Rsxs  where }.{ iXx Note that in this composition TS can be repeated, that is, a given iA may be included in TS A more than once. In addition, some of the TS iA may be compo- sitions of other TS: ....21 iniii AAAA  Therefore one can describe the whole sys- tem in a structured from by using a parallel composition. However, there may be achievable states in the resulting TS which are not desirable. Therefore a parallel composition with alternating actions does not always reflect the real situation when the TSs should interact with each other. Handshaking composition concept is a more adequate approach to describe parallel interaction. Such option describes a situation when different TS are synchronized by the actions in which multiple TSs participate simultaneously (data exchange etc.). Those actions are indicated by the same symbol in alphabets of all TS which take part in the interaction. Definition 2. Let 0( , , , , , )i i i i i i iA S X R s AP L — transition systems, where .2,1i Parallel composition of TS 21, AA with handshaking is a TS 1 2 0( , , , , , ),A A A S X R s AP L  where ,21 SSS  1 2 0 0 0( , ),s s s .21 XXX  Transitions that belong to T are defined as following: if 21 XXx  and 1 1 1( , , ) ,s x s R  2 2 2( , , ) ,s x s R  then 1 2 1 2(( , ), , ( , ))s s x s s R   ; if 21 \ XXx and 1 1 1( , , ) ,s x s R  then 1 2 1 2(( , ), , ( , ))s s x s s R  ; if 12 \ XXx and 2 2 2( , , ) ,s x s R  then 1 2 1 2(( , ), , ( , ))s s x s s R  . The following composition summarizes both previous concepts and is called a synchronized parallel composition or a synchronous product of TS (Fig. 4). b3,a2,c2 b0,a0,c0 b1,a2,c2 Instr b4,a2,c2 b0,a1,c1 b0,a3,c3 b2,a2,c2 b5,a2,c2 t0 t0 t2 t5 t3 t6 t4 t7 s0 s2 s3 s4 s7 s1 s5 s6 t9 t8 s8 Fig. 4. GTS A of synchronous product of CBA LTSLTSLTS  Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 15 Definition 3. nsssss AALAPsRXSA  ...),,,,,( 10 , where  1SSs nS , 1 0 0 0( ,..., )ns s s is called a synchronous product of TS nAAA ,...,, 21 where 0( , , , , , ).i i i i i i iA S A R s AP L Set of transitions is divided into two classes: asynchronous and synchronous. When synchronous transition from state nss ,...,1 occurs, some of its components change simultaneously while the rest remains un- changed. To describe this the symbol  is added to labels of each transition set iR (i.e. { }iX  ) along with a corresponding transition ( , , ).i is s Such a sym- bol shows that state does not change during a particular transition. Subset R of nRR  ...1 is called the synchronization constraints set. If TS is labeled, then the set of global transitions R corresponds to the set of transition labels .X In other words, ,...1 nXXX  where iX is an alphabet of LTS ,iA .,...,1 ni  An arbi- trary element 1( ,..., , )nA A A R of the set X is called the LTS synchronization vector. We build a model of application execution in the CUDA architecture in the form of synchronous product with the following global transitions (initial and fi- nal states of the transitions are omitted, since they are present in the model): ),,(),,,(),,,(),,,({ 34232212111  ptptqrptqrtR ; )}.,,(),,,(),,,(),,,(),,,( 44933867562245 qrtqrtptptqrpt  DETERMINATION OF SEMANTIC CORRECTNESS OF THE MODEL ACTIONS Definition function of iL allow to find the context of correct system functioning at each transition in each state. For example, if BLTS is in state ,0b values of both instrAr and instrMem should be .0 If system was moved to a state 0b after a number of allowed transitions and instrMem flag is set, this will mean that sys- tem is not semantically correct, even if the corresponding PN meets all reliability criteria. To analyze the semantic correctness of a model, it is necessary to determine the context of each global state and consider its conformity to the system under analysis. As shown above, transition system state context is specified by the state label function ,iL and the global state context can be retrieved by combining lo- cal state contexts of the components of the synchronous products. Each of the global states can be described as follows:  )()()()( 0302011 cLaLbLsL }0{}0{}0{  elwarpInstrSwarpActiveinstrMeminstrAr — instruction is absent, warp is not active, scheduler is ready for planning;  )()()()( 1312012 cLaLbLsL }1{}1{}0{  elwarpInstrSwarpActiveinstrMeminstrAr — warp and instruction are selected, but instruction type is not yet determined;  )()()()( 2322113 cLaLbLsL }1{}1{}1{  xecwarpInstrEwarpBusyinstrAr — S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 16 arithmetic instruction is selected for a given warp;  )()()()( 2322214 cLaLbLsL }1{}1{}1{  xecwarpInstrEwarpBusycinstrArExe — given warp executes arithmetic operation;  )()()()( 2322315 cLaLbLsL }1{}1{}1{  xecwarpInstrEwarpBusyinstrMem — memory access instruction is selected for a given warp;  )()()()( 2322416 cLaLbLsL }1{}1{}1{  xecwarpInstrEwarpBusyecinstrMemEx — given warp executes memory access instruction;  )()()()( 2322517 cLaLbLsL }1{}1{}1{  inwarpInstrFwarpBusyninstrMemFiinstrArFin — current instruction is executed regardless of its type;  )()()()( 3332018 cLaLbLsL }1{}1{}0{  inwarpInstrFwarpFininstrMeminstrAr — instruction is absent, warp and scheduler finished their work. All received contexts are semantically correct in accordance with the materials [1], thus, the set of global transitions of synchronous products is defined correctly. In addition to the above, the semantic correctness of the model is provided by the following properties:  mutual exclusion: a single instruction can be of one type only — either arithmetic or memory access;  fairness: if the warp is active, scheduler must provide an instruction to be executed;  liveliness: if the warp is active, one of the available types of instruction should be executed;  deactivation: warp scheduler does not activate the warp if there are no in- structions for execution. Sequence nnsss  ...2110 is called a finite execution in GTS, where 1( , , )i i is s T   for .1,...,1,0  ni The following finite execution paths exist in the received LTS: 1 1 1 2 2 3 3 4 4 7 8 8 9 1( )*Path s t s t s t s t s t s t s ; 2 1 1 2 5 5 6 6 7 7 8 8 9 1( )*,Path s t s t s t s t s t s t s where denotes the iterative operation of regular language. The language that cor- responds to the paths above is: 1 2 1 2 3 4 5 6 7 8 9( ) ( ( ) ( ))* ( ( ) )*.L Path L Path L Path t t t t t t t t t    And this language must correspond to the formula: 1 2 3 4 5 6 7 8( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ).L s L s L s L s L s L s L s L s Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 17 Let's check the semantic correctness properties of the model:  mutual exclusion: the following expression is always true: 3 5 2 2 3 2 1 1 1 3( ) ( ) ( ) ( )( ( ) ( ));L s L s L a L c L b L b    fairness: always 2 1 3 3( ) ( );L a L c  liveliness: always 2 1 1 1 2 2 3 2 1 3 2 2 3 2( ) ( ) ( ) ( ) ( ) ( ) ( );L a L b L a L c L b L a L c   deactivation: 3 0 1 0 2 0( ) ( ) ( ).L c L b L a FORMALIZATION OF THE GTS ANALYSIS PROCESS Analysis process presented in the previous section can be generalized and reused for any GTS. Definition 1. The subset of a set ( )*,B AP where ( )B AP is the power set of the set ,AP is called the linear-temporal property P over the set of atomic propositional formulas .AP Consequently, ( ( ))*.P B AP In our case set ( ( ))*B AP is a set of words of finite length constructed from concatenated formulas of ( ).B AP Assume there is ( , , , , , )A S X R I AP L and nsss ...10 is a sequence of states. Such sequence is called a path fragment if 1 ( )i is Post s  where ( ) ( , ),i iPost s Post s x  and :{),( SsxsPost i  },),,( Rsxsi  .Xx Sequence nsss ...10 for which )( nsPost is called the maximal path fragment. Definition 2. Word 0 1( ) ( )... ( )nL s L s L s is called a trace ))(( trace of a fi- nite sequence . Consequently, the set of traces is a set of finite words over the alphabet of the propositional formulas ( )B AP which are executed in states of this sequence. Denote },)({)(  tracetrace ( ) ( ( ))trace s trace Path s and ( ) ( )s ITraces A trace s  where )(sPath is maximal fragment of path  that begins in state .s In our case, the set of AP propositional formulas includes the following items: ,,,,,,{ ninstrMemFiinstrArFinecinstrMemExcinstrArExeinstrMeminstrArAP  }.,,,,, inwarpInstrFwarpFinxecwarpInstrEwarpBusyelwarpInstrSwarpActive The following words are traces of GTS A : 1 2 3 4 7 8 1( ) ( ) ( ) ( ) ( ) ( )p L s L s L s L s L s L s ; 2 2 5 6 7 8 1( ) ( ) ( ) ( ) ( ) ( )p L s L s L s L s L s L s . Let's define bad prefixes in these words as prefixes that violate the truth of ),( isL which mean the following words: )(...)()( 1321 sLsLsLp  ; )(...)()( 1322 sLsLsLp  ; )(...)()( 1523 sLsLsLp  ; ... ... ... ... ... ... ... ... ... ... )()( 12 sLsLpk  . S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 18 Therefore language )...(*)()( 121 kppppABadPref  is regular and is accepted by a finite automaton ),,,,( 0 FQfAPQB  where  FQ0 that is shown at Fig. 5. We will connect the GTS from Fig. 4 and automaton B with such a product BA that produces GTS A as result so that: ,QSS  R — the smallest relation defined by the rule , ),(),( )( qsqs qqss x sLx     },:),{( )( 000000 0 qqQqIsqsI sL   ,QPA  )(QBQLS  where ( , ) { }.L s q q  Then the correctness of GTS A functioning is expressed as a condition ,)()()()(  BLATracesABadPrefATraces where ( )L B is the language accepted by the automaton .B Consequently, if P is a property whose execution guarantees the correct functioning of GTS ,A then  )()( PBadPrefATraces . Thus, GTS A will look as shown on Fig. 6. There is a transition from each vertex, except for 0 0( , ),s q to the state 2( , *).s q Such transitions represent one of the bad prefix traces. For the sake of clarity only 4 examples of bad prefixes are left in Fig. 6. Fig. 5. Automaton B that accepts bad prefixes, ,00 Qq  ,0 Fq  Fq * p1 , p2 ** 1 ,, kpp  q0 q* s0, q0 s2, q0 s3, q0 s4, q0 s2, q * s1, q0 s8, q0 s5, q0 s6, q0 s7, q0 not(L(s2)) not(L(s1)) n ot(L (s 5 ) n ot (L (s 3 ) t9 t8 t6 t7 t0 t4 t2 t3 t1 t5 Fig. 6. GTS of BA product, where Fq * Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 19 DETERMINATION OF MODEL CORRECTNESS ON THE HIGH LEVEL OF ABSTRACTION In addition to semantic verification, it is important to check the redundancy of the system, deadlock/trap balance etc. To do this, we use verification at the highest level of abstraction, without need of transition label semantics analysis. The translation of the received synchronous product into the PN gives the network shown in Fig. 7. It is described in [10], [11] that the TS product seman- tics and the semantics of the corresponding PN are consistent in the sense that a sequence of global transitions represents the global history of the TS product A if and only if it is an admissible sequence of transitions in the PN. Accordingly, el- ements of the set R become transitions of the PN, and the global states of the TS product (the set of states of each TS involved in the synchronous product before or after the global transition) become the places of the received network. We build a PN by using the synchronization constraints set, and such a network simu- lates the interoperability of all subsystems. Recall that NVIDIA video adapters operate with multiple warps at the same time, so there is a situation of synchro- nous and asynchronous execution, since different instructions may have different execution times and will not reach synchronization location at the same time. Therefore using the Petri net apparatus is expedient. Let's analyze the presence and number of dead transitions in PN for one warp case. To do this, we form a state equation  kMMxA 0 b1,a2,c2 b0,a3,c3 b0,a1,c1 instr b5,a2,c2 b2,a2,c2 b4,a2,c2b3,a2,c2 b0,a0,c0 * 1t ** 1t t2 t9 t3 t5 t6 t4 t8 t7 t1 s2 s8 s3 s5 s4 s7 s6 s0 s1 … Fig. 7. PN that represents synchronous product CBA LTSLTSLTS  S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 20 0 dxA where 0 (1,1,0,0,0,0,0,0,0),M  (0,1,0,0,0,0,0,0,0),kM  )( 0MMd k  (see Table 1). T a b l e 1 . Petri net state equation matrix t s 1t 2t 3t 4t 5t 6t 7t 8t 9t 0( )kM M  0s -1 0 0 0 0 0 0 0 0 1 1s -1 0 0 0 0 0 0 0 1 0 2s 1 -1 0 0 -1 0 0 0 0 0 3s 0 1 -1 0 0 0 0 0 0 0 4s 0 0 1 -1 0 0 0 0 0 0 5s 0 0 0 0 1 -1 0 0 0 0 6s 0 0 0 0 0 1 -1 0 0 0 7s 0 0 0 1 0 0 1 -1 0 0 8s 0 0 0 0 0 0 0 1 -1 0 Applying the TSS-algorithm [9] to solve the state equation with the above matrix, we obtain the following solutions (Table 2). T a b l e 2 . Petri net state equation solutions 1t 2t 3t 4t 5t 6t 7t 8t 9t 1 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 As can be seen from the set of solutions, all transitions in the PN with the initial and final markings given above are alive (the value corresponding to each transition is positive at least in one of the solutions). In addition, the property of mutual exclusion has been verified: transitions that correspond only to one of the possible types of instructions are performed at each point of time. The analysis of the properties of the received PN also showed the absence of deadlocks, verified limitation and controllability [10]. As described above, this analysis did not take into account the properties of actions in transitions and properties associated with the states of the model. CONCLUSION This paper proposes the use of the TS mathematical apparatus to obtain a for- malized system specification in the GPGPU technology. The advantages of the existing model include the ability to reduce synchronous product to a PN, which allows for further verification by automated means. The ability to study the char- acteristics of the model created by the combination of LTS and PN apparatuses is shown. The model was analyzed to verify there are no dead transitions and places (without taking into account the semantics of the transition labels), and a separate analysis was performed to verify the semantic correctness of the model actions. Method of semantic application verification in GPGPU technology Системні дослідження та інформаційні технології, 2020, № 3 21 As a result of these actions, the proof of the correct construction of the model was obtained. The developed approach allows to simplify and reduce the processes of verification and testing of multi-threaded applications in computer systems that utilize video adapters. REFERENCES 1. Nvidia Data Center – Nvidia, 2018. [Online]. Available: https://www.nvidia.com/ en-us/data-center/. Accessed on: 2019, March 14. 2. TOP500 Lists - TOP500 Supercomputer Sites, 2018. [Online]. Available: https://www.top500.org/lists. Accessed on: 2019, March 14. 3. A.V. Anisimov, S.D. Pogorilyy, and D.Yu. Vitel, “About the Issue of Algorithms formalized Design for Parallel Computer Architectures”, Applied and Computational Mathematics, vol. 12, no. 2, pp.140–151, 2013. 4. A. Arnold, Finite Transition Systems: Semantics of Communicating Systems. Paris, France: Prentice Hall, 1994, 177 p. 5. T. Murata, “Petri nets: properties, analysis and applications”, in Proc. of the IEEE, 77:541.80, 1989. 6. M. Ben-Ari, Mathematical Logic for Computer Science. UK: Prentice Hall Interna- tional Ltd, 1993, 305 p. 7. E.M. Clarke, Jr., O. Grumberg, and D.A. Peled, Model Checking. USA: MIT Press, 1999. 8. S.D. Pogorilyy, S.L. Kryvyi, and M.S. Slynko, “Model justification of GPU-based applications”, Control Systems and Computers, vol. 4, pp. 46–56, 2018. 9. S.L. Kryvyi, Linear Diophantine constraints and their applications. Chernivtsi: Buk- rek Publishing House, 2015. 10. S.L. Kryvyi, S.D. Pogorilyy, and M.S. Slynko, “Transition systems as method of de- signing applications in GPGPU technology”, in Proc. 11-th international scientific and practical conference on programming UkrPROG’2018. 11. S.L. Kryvyi et al., “Design of Grid Structures on the Basis of Transition Systems with the Substantiation of the Correctness of Their Operation”, Cybernetics and Sys- tems Analysis, vol. 53, no. 1, pp.105–114, New York, USA:  Springer Science + Business Media, January 2017. Received 15.10.2020 INFORMATION ON THE ARTICLE Serhii L. Kryvyi, ORCID: 0000-0003-4231-0691, Taras Shevchenko National University of Kyiv, Ukraine, e-mail: sl.krivoi@gmail.com Sergiy D. Pogorilyy, ORCID: 0000-0002-6497-5056, Taras Shevchenko National Uni- versity of Kyiv, Ukraine, e-mail: sdp77@i.ua, sdp@univ.net.ua Maksym S. Slynko, ORCID: 0000-0001-9667-8729, Taras Shevchenko National Univer- sity of Kyiv, Ukraine, e-mail: maxim.slinko@gmail.com Artem A. Kramov, ORCID: 0000-0003-3631-1268, Taras Shevchenko National Univer- sity of Kyiv, Ukraine, e-mail: artemkramov@gmail.com МЕТОД СЕМАНТИЧНОЇ ВЕРИФІКАЦІЇ ЗАСТОСУВАНЬ У ТЕХНОЛОГІЇ GPGPU / С.Л. Кривий, С.Д. Погорілий, М.С. Слинько, А.А. Крамов Анотація. Запропоновано метод розроблення та верифікації застосувань для систем з масовим паралелізмом на основі відеоадаптерів від компанії NVIDIA, S.L. Kryvyi, S.D. Pogorilyy, M.S. Slynko, A.A. Kramov ISSN 1681–6048 System Research & Information Technologies, 2020, № 3 22 який дозволяє створювати абстракції різних рівнів за допомогою апарата роз- мічених транзиційних систем. Композиції таких систем трансформуються в мережі Петрі, які далі аналізуються відповідними засобами. Метод також дає змогу створювати моделі на різних рівнях абстракції, а їх властивості можуть специфікуватися формулами темпоральної логіки. Це дозволяє досліджувати властивості систем з масовим паралелізмом, які майже неможливо аналізувати вручну, оскільки кількість потоків у новітніх архітектурах відеоадаптерів (Pascal, Volta, Amper, Тюрінг), виділених для виконання коду, вимірюється сотнями тисяч або мільйонами. Ключові слова: CUDA, графічні процесори (GPU), графічні обчислення загального призначення (GPGPU), транзиційна система, мережа Петрі, побу- дова моделі. МЕТОД СЕМАНТИЧЕСКОЙ ВЕРИФИКАЦИИ ПРИЛОЖЕНИЙ В ТЕХНОЛОГИИ GPGPU / С.Л. Крывый, С.Д. Погорелый, М.С. Слинько, А.А. Крамов Аннотация. Предложен метод разработки и верификации приложений для систем с массовым параллелизмом на основе видеоадаптеров от компании NVIDIA, который позволяет создавать абстракции различных уровней с по- мощью аппарата размеченных транзиционных систем. Композиции таких сис- тем трансформируются в сети Петри, которые далее анализируются соответст- вующими средствами. Метод позволяет создавать модели на различных уровнях абстракции, а их свойства могут специфицироваться формулами тем- поральной логики. Это позволяет исследовать свойства систем с массовым па- раллелизмом, которые практически невозможно анализировать вручную, так как количество потоков в новейших архитектурах видеоадаптеров (Pascal, Volta, Amper, Тьюринг), выделенных для выполнения кода, измеряется сотня- ми тысяч или миллионами. Ключевые слова: CUDA, графические процессоры (GPU), графические вы- числения общего назначения (GPGPU), транзиционная система, сеть Петри, построение модели.
id journaliasakpiua-article-213203
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-07-17T10:26:48Z
publishDate 2020
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/73/2aa6314f6e34776de3c4f83090be9973.pdf
spelling journaliasakpiua-article-2132032021-01-19T12:18:25Z Method of semantic application verification in GPGPU technology Метод семантической верификации приложений в технологии GPGPU Метод семантичної верифікації застосувань у технології GPGPU Kryvyi, Serhii L. Pogorilyy, Sergiy D. Slynko, Maksym S. Kramov, Artem A. CUDA graphical processing units (GPU) General Purpose Graphics Computing (GPGPU) transition system Petri net model design CUDA графічні процесори (GPU) графічні обчислення загального призначення (GPGPU) транзиційна система мережа Петрі побудова моделі CUDA графические процессоры (GPU) графические вычисления общего назначения (GPGPU) транзиционная система сеть Петри построение модели An application development and verification method for massively parallel systems using NVIDIA GPUs is proposed. The method allows creating models at different levels of abstraction using the apparatus of marked transition systems. The compositions (product) of such systems are transformed into a Petri net, which are then analyzed by appropriate means. The proposed method allows specifying model properties by temporal logic formulas. This allows studying the properties of massively parallel systems which is almost impossible to analyze manually, since the number of execution threads in the latest NVIDIA video adapter architectures (Pascal, Volta, Turing, Ampere) is measured in hundreds of thousands or millions. Предложен метод разработки и верификации приложений для систем с массовым параллелизмом на основе видеоадаптеров от компании NVIDIA, который позволяет создавать абстракции различных уровней с помощью аппарата размеченных транзиционных систем. Композиции таких систем трансформируются в сети Петри, которые далее анализируются соответствующими средствами. Метод позволяет создавать модели на различных уровнях абстракции, а их свойства могут специфицироваться формулами темпоральной логики. Это позволяет исследовать свойства систем с массовым параллелизмом, которые практически невозможно анализировать вручную, так как количество потоков в новейших архитектурах видеоадаптеров (Pascal, Volta, Amper, Тьюринг), выделенных для выполнения кода, измеряется сотнями тысяч или миллионами. Запропоновано метод розроблення та верифікації застосувань для систем з масовим паралелізмом на основі відеоадаптерів від компанії NVIDIA, який дозволяє створювати абстракції різних рівнів за допомогою апарата розмічених транзиційних систем. Композиції таких систем трансформуються в мережі Петрі, які далі аналізуються відповідними засобами. Метод також дає змогу створювати моделі на різних рівнях абстракції, а їх властивості можуть специфікуватися формулами темпоральної логіки. Це дозволяє досліджувати властивості систем з масовим паралелізмом, які майже неможливо аналізувати вручну, оскільки кількість потоків у новітніх архітектурах відеоадаптерів (Pascal, Volta, Amper, Тюрінг), виділених для виконання коду, вимірюється сотнями тисяч або мільйонами. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2020-12-07 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/213203 10.20535/SRIT.2308-8893.2020.3.01 System research and information technologies; No. 3 (2020); 7-22 Системные исследования и информационные технологии; № 3 (2020); 7-22 Системні дослідження та інформаційні технології; № 3 (2020); 7-22 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/213203/223554 Copyright (c) 2021 System research and information technologies
spellingShingle CUDA
графічні процесори (GPU)
графічні обчислення загального призначення (GPGPU)
транзиційна система
мережа Петрі
побудова моделі
Kryvyi, Serhii L.
Pogorilyy, Sergiy D.
Slynko, Maksym S.
Kramov, Artem A.
Метод семантичної верифікації застосувань у технології GPGPU
title Метод семантичної верифікації застосувань у технології GPGPU
title_alt Method of semantic application verification in GPGPU technology
Метод семантической верификации приложений в технологии GPGPU
title_full Метод семантичної верифікації застосувань у технології GPGPU
title_fullStr Метод семантичної верифікації застосувань у технології GPGPU
title_full_unstemmed Метод семантичної верифікації застосувань у технології GPGPU
title_short Метод семантичної верифікації застосувань у технології GPGPU
title_sort метод семантичної верифікації застосувань у технології gpgpu
topic CUDA
графічні процесори (GPU)
графічні обчислення загального призначення (GPGPU)
транзиційна система
мережа Петрі
побудова моделі
topic_facet CUDA
graphical processing units (GPU)
General Purpose Graphics Computing (GPGPU)
transition system
Petri net
model design
CUDA
графічні процесори (GPU)
графічні обчислення загального призначення (GPGPU)
транзиційна система
мережа Петрі
побудова моделі
CUDA
графические процессоры (GPU)
графические вычисления общего назначения (GPGPU)
транзиционная система
сеть Петри
построение модели
url https://journal.iasa.kpi.ua/article/view/213203
work_keys_str_mv AT kryvyiserhiil methodofsemanticapplicationverificationingpgputechnology
AT pogorilyysergiyd methodofsemanticapplicationverificationingpgputechnology
AT slynkomaksyms methodofsemanticapplicationverificationingpgputechnology
AT kramovartema methodofsemanticapplicationverificationingpgputechnology
AT kryvyiserhiil metodsemantičeskojverifikaciipriloženijvtehnologiigpgpu
AT pogorilyysergiyd metodsemantičeskojverifikaciipriloženijvtehnologiigpgpu
AT slynkomaksyms metodsemantičeskojverifikaciipriloženijvtehnologiigpgpu
AT kramovartema metodsemantičeskojverifikaciipriloženijvtehnologiigpgpu
AT kryvyiserhiil metodsemantičnoíverifíkacíízastosuvanʹutehnologíígpgpu
AT pogorilyysergiyd metodsemantičnoíverifíkacíízastosuvanʹutehnologíígpgpu
AT slynkomaksyms metodsemantičnoíverifíkacíízastosuvanʹutehnologíígpgpu
AT kramovartema metodsemantičnoíverifíkacíízastosuvanʹutehnologíígpgpu