Comparison between research data processing capabilities of AMD and NVIDIA architecture-based graphic processors
Published in: Вопросы атомной науки и техники (Problems of Atomic Science and Technology), 2015, № 3, p. 148-153. Publisher: Національний науковий центр «Харківський фізико-технічний інститут» НАН України. Online access: https://nasplib.isofts.kiev.ua/handle/123456789/112097
COMPARISON BETWEEN RESEARCH DATA PROCESSING
CAPABILITIES OF AMD AND NVIDIA
ARCHITECTURE-BASED GRAPHIC PROCESSORS
V.A. Dudnik∗, V.I. Kudryavtsev, S.A. Us, M.V. Shestakov
National Science Center "Kharkov Institute of Physics and Technology", 61108, Kharkov, Ukraine
(Received January 26, 2015)
A comparative analysis has been made to describe the potentialities of hardware and software tools of two most
widely used modern architectures of graphic processors (AMD and NVIDIA). Special features and differences of GPU
architectures are exemplified by fragments of GPGPU programs. Time consumption for the program development
has been estimated. Some pieces of advice are given as to the optimum choice of the GPU type for speeding up the
processing of scientific research results. Recommendations are formulated for the use of software tools that reduce
the time of GPGPU application programming for the given types of graphic processors.
PACS: 89.80.+h, 89.70.+c, 01.10.Hx
1. INTRODUCTION
The development of graphic-processor architectures reached its greatest diversity in the 1990s. At that time, a great many companies on the computer-graphics market (S3 Graphics, Matrox, 3D Labs, Cirrus Logic, Oak Technology, Realtek, XGI Technology Inc., Number Nine Visual Technologies, etc.) offered their own graphic-processor architectures. By the present time, however, as a result of stiff competition, two of the proposed architectures, those of NVIDIA and AMD, have taken the key positions in the market. Yet, when considering the modern GPU structures of AMD and NVIDIA processors, one notices more similarity than difference between them, despite the severe competition. The
reasons for this outcome of GPU-processor develop-
ment are similar to the results of the development
of other high-technology products (automobiles, aircraft, etc.): in the course of engineering refinement there is a constant interchange and borrowing of the most successful ideas, with the result that the competing companies arrive at very similar engineering solutions.
2. AMD AND NVIDIA PROCESSOR STRUCTURE
The similarity between the architectures of AMD and
NVIDIA processors can be explained by the fact that
from the outset the GPU microarchitecture was or-
ganized quite differently than that of ordinary CPU.
At the very beginning of the GPU development, the
graphics tasks implied an independent parallel data
processing, and therefore, the GPU architecture, un-
like the CPU architecture, was multithreaded right
from the start. Besides, the fundamental princi-
ples of the GPU organization were initially general
for video accelerators of all manufacturers, because
they had a single target task, namely, shader pro-
grams. Therefore, the general GPU structure of dif-
ferent manufacturers differed only slightly. The dif-
ferences concerned the details of microarchitecture
realization. The internal organization of the GPU
is similar for both the AMD architecture and the
NVIDIA architecture, i.e., it consists of a few tens
(30 for NVIDIA GT200, 20 for Evergreen, 16 for
Fermi) of central processing elements referred to in
the NVIDIA nomenclature as Streaming Multipro-
cessors, and in the ATI terminology as SIMD Engine
(miniprocessors). They can simultaneously perform
a set of computational processes, i.e., threads. Each
miniprocessor has a local storage of size 16 KB for
GT200, 32 KB for Evergreen and 64 KB for Fermi
(in actual fact, this is a program-managed L1 cache). The
local storage is common for all the threads executed
in the miniprocessor. Apart from the local storage,
the miniprocessor has another storage area, approximately four times larger in all the architectures under consideration, which is divided equally among all the executing threads: these are the registers for storing temporary variables and intermediate computation results. Each minipro-
cessor has a great number of computation modules
(8 for GT200, 16 for Radeon and 32 for Fermi), but
all of them can carry out the same instruction with
one and the same software address. However, the
operands can be different in this case, one’s own for
different threads. For example, the instruction "add the contents of two registers" is executed simultaneously by all computing devices, but each device uses its own registers.
∗Corresponding author. E-mail address: vladimir-1953@mail.ru
ISSN 1562-6016. PROBLEMS OF ATOMIC SCIENCE AND TECHNOLOGY, 2015, N3(97). Series: Nuclear Physics Investigations (64), p.148-153.
If, however, because of program branching, the threads diverge in their paths through the code, the so-called serialization
takes place. That is, not all computation modules
are involved, because the threads supply different in-
structions for execution, while the computation mod-
ule block can carry out, as stated above, only the
instruction with one address. The computational ef-
ficiency falls down in this case with reference to the
maximum capability. Another peculiarity of the GPU
in comparison with the CPU is the absence of stack.
On account of a great many simultaneously execut-
ing threads, the GPU does not provide the stack,
which might store the function parameters and lo-
cal variables (simultaneously running 10000 threads
call for an enormous storage). Therefore, there is no
recursion in the GPU, and instead of the call, the
functions are substituted during compilation into the
GPGPU program code. AMD obtained full-scale support for general-purpose computations starting with the Evergreen family, which was also the first to implement the DirectX 11 specification. Of importance is AMD's use, from the very beginning, of the technology named VLIW (Very Long Instruction Word).
The AMD graphics processors have the same num-
ber of registers as, for example, GT200 have, the dif-
ference being that these are 128-bit vector registers
(e.g., one single-cycle instruction can compute the number a1·b1 + a2·b2 + a3·b3 + a4·b4 or the two-dimensional vector (a1·b1 + a2·b2, a3·b3 + a4·b4)). Until recently, NVIDIA supported only
scalar simple instructions operating on scalar regis-
ters and realizing a simple classical RISC, but since
2009 the NVIDIA programmers have presented a fur-
ther development of the CUDA platform, namely, the
Fermi architecture and, later, Kepler. The support of single-precision and double-precision floating-point computations realized in the new architecture was one of the key demands of the high-performance computing market.
3. DIFFERENCES BETWEEN AMD AND NVIDIA MICROSTRUCTURES
Here we mention the principal potentialities of ad-
vanced technical solutions of NVIDIA company,
which specified the main differences between GPU
architectures proposed at present by AMD and
NVIDIA. These differences are due to the initial
tendency of using the NVIDIA graphic processors
not only for processing the computer graphics. The
NVIDIA also marketed its architecture as a means for
high-performance computing. That implied both a
high speed of computation operations and a high reli-
ability combined with high programmability. There-
fore, the last realizations of GPU by NVIDIA have
incorporated the possibilities of finding and correct-
ing errors in the operating storage and cache-memory
subsystems. That provided the fault tolerance and
performance reliability of computational algorithms
comparable with the CPU. Essential modifications
have been made in the memory structure of NVIDIA
graphic processors. The appearance of L2-cache in
the FERMI architecture has radically (by a factor
of a few tens) accelerated operations with memory,
thereby enabling an internal memory manager fast enough to allocate memory through the C-language memory control functions (malloc/free) for each thread at run time. Parallel access to global video memory has been improved, and a general address
space has been realized for all the memory of CPU
and GPU, thereby making it possible to unite in a
single address space all the memory visible for the
computational thread. Besides, the changes in the memory structure have made it possible to use recursive functions (Fig.1).
Fig.1. NVIDIA GPU structure
The simultaneous optimized execution of several ker-
nels, realized in the NVIDIA architecture, permits or-
ganization of simultaneous running of several CUDA
functions of the same application provided that one
CUDA function cannot fully load the computational
capability of the GPU (as the GPU analogue of the
multitask mode for multi-core CPUs). Two inter-
faces for copy operations, realized in the NVIDIA
GPU, have provided practically twice as fast data
exchange due to simultaneous copying of data from
the memory of CPU multiprocessors to the GPU
and vice versa (from the GPU memory to the CPU
memory). Unlike the NVIDIA GPU, new models of
AMD video adapters differ from the previous versions
practically only in quantitative characteristics. They
are still based on the dated architecture Graphics
Core Next (GCN) Tahiti. This architecture forms
the basis for all present-day engineering solutions of
the company, and even the latest graphic chip Hawaii
differs from Tahiti only in a greater number of execution devices, in some computational enhancements (e.g., support for a greater number of simultaneously executed instruction streams), in support of some DirectX 11.2 options, and in the improved AMD PowerTune technology. The
present-day AMD graphic processors still have only
one global read-write memory, and many different
sources of texture memory and constant memory,
both being read-only memories. A peculiarity of the constant memory is that accesses from all GPU streams to one data area are cached (it then operates as quickly as the registers do). Another
peculiarity of the texture memory of AMD graphic processors is read caching (8 KB or more per data-flow processor), as well as access to the memory in real coordinates. Though the L1 and L2 cache sizes in the NVIDIA and AMD cards are approximately similar, evidently because both are tuned for game graphics, the latency of access to these caches differs essentially. The access latency is greater for NVIDIA, and the texture caches in the NVIDIA GeForce serve primarily to reduce the load on the memory bus rather than to accelerate data access directly. This is not noticeable in graphics
programs, but is of importance for general-purpose
programs. Meanwhile, in the AMD Radeon the la-
tency of the texture cache is lower, but the latency
of miniprocessors local memory is higher. The fol-
lowing example can be offered. For optimum matrix
multiplication on the NVIDIA cards it is best to use
the local memory, loading it with the matrix, one
block at a time, while in the AMD case, it is better
to rely on a low-latency texture cache, reading the
matrix elements on demand. This, however, is a rather fine optimization, applicable to an algorithm already adapted in principle to the GPU (Fig.2).
Fig.2. AMD GPU structure
The AMD uses its own instruction placement for-
mat in the computer code. They are arranged not
in succession (following the program listing) but by sections. First comes the section of conditional-transfer instructions, which contains references to sections of contiguous arithmetic instructions corresponding to the different branch paths; these sections, called VLIW bundles, comprise only arithmetic instructions operating on data from the registers or local memory. This organization simplifies control over the instruction flow and its delivery to the execution devices. It is all the more
reasonable, considering that the VLIW instructions
are rather large in size. The AMD structure also pro-
vides sections for the memory access instructions. In
the NVIDIA GPU the instructions are arranged ac-
cording to the listing (for possibilities of debugging,
testing and optimization). When programming the
NVIDIA GPU, this permits the use of the methods
developed when writing programs for the CPU.
4. DIFFERENCES BETWEEN AMD AND NVIDIA IN THE SOFTWARE FOR GPU APPLICATION DEVELOPMENT
The supporting software in using Radeon products
for fast computations is still essentially behind the
development of the hardware (unlike the NVIDIA situation). The AMD-supplied OpenCL compiler all too often produced erroneous code or refused to compile correct source code; only recently has a release with higher operational capability appeared. The absence of
function libraries is also typical of the AMD. For
example, there are no sine, cosine and exponent for
double-precision real numbers. For programming
the applications for the AMD GPU, the firm-specific
technologies Compute Abstraction Layer (CAL) and
Intermediate Language (IL) have been designed. The
CAL technology serves for writing the code interact-
ing with the GPU and is executed by the CPU, while
the IL technology permits writing the code, which
will be executed directly by the GPU. The code for
the AMD GPU is designed as shaders. Below we give
an example of the program code for the AMD IL.
; Compute r0.z, the global thread identifier (uint)
umad r0.__z_, r0.wwww, cb0[0].yyyy, r0.zzzz
; Prepare the first part of the data in register r1
ftoi r1.x___, vWinCoord0.xxxx
mov r1._y__, r0.zzzz
mov r1.__z_, cb[0].xxxx
mov r1.___w, l0.yyyy
; Compute the offset into the output buffer g[]
umul r0.__z_, r0.zzzz, l0.wwww
; Save the first part of the data to storage
mov g[r0.z+0].xyzw, r1.xyzw
; Load the texture data i0;
; first convert the coordinates to float and add 0.5
itof r0.xy__, r0.xyyy
add r0.xy__, r0.xyyy, l0.zzzz
sample_resource(0)_sampler(0)_aoffimmi(0,0,0) r1, r0
; Save the second part of the data to storage
mov g[r0.z+1].xyzw, r1.xyzw
; Exit from the main function
endmain
; Program code termination
end
The IL code looks like assembly code, but there is no sense in applying optimizations typical of an assembler program (rearrangement of independent operations, precomputation of constant operands): this is a pseudo-assembler, and only the IL compiler can perform a correct optimization of the code. The second constituent of the AMD development tools, Compute Abstraction Layer (CAL), is needed to organize the interaction between the CPU-executed parts of the program and the computing procedures executed by the GPU:
• Driver initialization
• Obtaining information about all supported
GPU
• Memory allocating and copying
• Compilation and loading of the GPU kernel
• Kernel run for execution
• Operation synchronization with CPU
Unlike NVIDIA CUDA, which has both a Run-time API and a Driver API, AMD offers only the Driver API. Below we give the program code
fragment for the CAL, which executes data copying
into the GPU memory:
unsigned int pitch;
unsigned char* mappedPointer;
unsigned char* dataBuffer;
CALresult result =
    calResMap( (CALvoid**)&mappedPointer, &pitch, resource, 0 );
unsigned int width;
unsigned int height;
unsigned int elementSize = 16;
if( pitch > width ) {
    for( unsigned int index = 0; index < height; ++index ) {
        memcpy( mappedPointer + index * pitch * elementSize,
                dataBuffer + index * width * elementSize,
                width * elementSize );
    }
} else {
    memcpy( mappedPointer, dataBuffer,
            width * height * elementSize );
}
It can be seen that the CAL program is written in a language similar to C and comprises specific subroutine call sequences. However, despite its rigid binding to the Windows 7 platform, the use of the DirectCompute 5.0 API seems more promising for programming Radeon video cards, because it is much simpler than OpenCL and is expected to be more stable.
5. DEVELOPMENT TOOLS FOR NVIDIA GPU
The hardware performance capabilities of the
NVIDIA GPU architecture have made it possible to
develop a software solution, NVIDIA NEXUS, for improving the processes of creating and checking CUDA applications. The C++ support is the most
important peculiarity of the NVIDIA software devel-
opment tools. Furthermore, the interaction is pro-
vided between the mechanisms of graphics processing
and the tools for executing general-purpose computations. For data processing through the CUDA mechanism, the OpenGL library can be used, and OpenCL is also available for the development of applications. Below, by way of example, we give the
program code fragment for NVIDIA CUDA:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

#define N 1024 // array length (assumed value)

int main(void)
{
    float *a_h, *b_h; // pointers to host memory
    float *a_d, *b_d; // pointers to device memory

    // allocate arrays on host
    a_h = (float *)malloc(sizeof(float)*N);
    b_h = (float *)malloc(sizeof(float)*N);

    // allocate arrays on device
    cudaMalloc((void **)&a_d, sizeof(float)*N);
    cudaMalloc((void **)&b_d, sizeof(float)*N);

    // send data from host to device: a_h to a_d
    cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);

    // copy data within device: a_d to b_d
    cudaMemcpy(b_d, a_d, sizeof(float)*N, cudaMemcpyDeviceToDevice);

    // defect deletion
    MedianFilter(b_d);

    // retrieve data from device: b_d to b_h
    cudaMemcpy(b_h, b_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

    // resource deallocation
    free(a_h); free(b_h);
    cudaFree(a_d); cudaFree(b_d);
    return 0;
}
It can be seen that the program for the NVIDIA GPU is very similar in form and structure to customary C++ programs for the CPU. But perhaps the most essential novelty realized in NVIDIA NEXUS is full-speed application debugging directly on the GPU; previously, a GPU emulation program had to be used for debugging. The use of NVIDIA NEXUS for Microsoft Visual Studio makes it possible to avoid most of the existing debugging problems and thus to increase the speed of application development. Let us consider the new debugging features of NEXUS in more detail. The NEXUS debugger supports debugging of CUDA C and HLSL code directly on the GPU hardware within the Visual Studio 2008 workspace, and includes the following capabilities:
• CUDA home page - provides complete information about the CUDA run state in the user application. Users can filter and obtain detailed information on exceptions, breakpoints, recorded events, and MMU errors, and can easily switch over to debugging the problem.
• CUDA Warp Watch - provides a more efficient technique for navigating resident threads and visualizing thread state at the warp level.
• Graphics and GPU computing support - simple debugging of shaders or of scientific and technical computation programs directly on the GPU.
• Parallel-aware debugging - debugging of applications using thousands of data-processing threads (or graphic primitives).
• Source breakpoints - breakpoints at arbitrary source locations (with hardware evaluation of breakpoint conditions).
• Memory inspection - direct inspection and display of GPU memory using the Visual Studio Memory Window.
• Data breakpoints - write breakpoints anywhere in memory.
• Memory Checker - halting on an overrun of the allocated memory.
• Trace - journaling of actions and events executed by the CPU and GPU on a single correlated timeline. It includes:
1. CUDA C, DX10, OpenGL and Cg API calls;
2. GPU - host memory transfers;
3. GPU workload executions;
4. CPU core, thread and process events;
5. custom user events - marking custom events or time ranges using a C API.
A further tool is the NEXUS Analyzer, which supports tracing and profiling of GPU applications, with acquisition and analysis of kernel-level efficiency information, including hardware performance counters.
6. COMPARISON BETWEEN THE GRAPHIC CAPABILITIES OF NVIDIA AND AMD ARCHITECTURES
The comparison between the AMD and NVIDIA archi-
tecture capabilities of data displaying shows that mainly
they are similar in both the rate and quality. Among
most useful capabilities incorporated into the graphic
architectures we may point out the technology, which
provides a combined connection of several displays to the
personal computer and, thus, permits their use as a single
large display consisting of several parts. In NVIDIA, this
technology is called the multimonitor SLI, and the similar
technology of AMD is known as Eyefinity. Comparing the capabilities of Eyefinity and multimonitor SLI, it is worth noting that the AMD technology appears the more widespread and more advanced of the two. Unlike the
multimonitor NVIDIA SLI technology, which is provided
only for rather expensive video accelerators of profes-
sional series Quadro, the AMD Eyefinity technology is
maintained practically for all AMD-made display cards.
Besides, the AMD Eyefinity capabilities for controlling the connected monitors are broader than those of NVIDIA, and Eyefinity allows more monitors to be connected to one video accelerator.
In AMD graphic processors (starting with RV870) the image output unit has been updated so that the chip supports output to up to six display devices. The number of supported monitors depends on the capabilities of the specific card and can be three or six (through the DisplayPort monitor interface). Multimonitor configurations can operate in the
clone and desktop extend modes. One large image can
be composed of several monitors; this applies both to the Windows desktop display and to full-screen video and 3D applications. This feature is supported in the operating systems Windows Vista, Windows 7 and Windows 8, in Linux, and also in other OS (Fig.3).
Fig.3. Multimonitor configurations in the clone and
desktop extend modes
AMD has announced joint work with display manufacturers, in particular with Samsung. They offer special monitor versions with 23′′ screens at 1920x1080 resolution, with DisplayPort, DVI and VGA interfaces, and with a very narrow screen frame, 7 to 8 mm
in size. A multimonitor connection can prove very efficient in many cases where working with a large display area both raises labor productivity and reduces task completion time. These
are the work with large electronic circuit diagrams and
complicated equipment drawings, the program develop-
ment, the generation and analysis of data represented as
large-size tables, etc. It should be noted that the cost of
a large monitor composed of several monitors of smaller
size turns out to be several orders of magnitude lower
than the cost of a ”monolithic” (one-piece) monitor; and
that may essentially extend the field of application of
these monitors.
7. CONCLUSIONS
In view of the foregoing, some general recommendations
on the use of graphic accelerators AMD and NVIDIA can
be formulated:
• To attain the maximum graphic performance capa-
bilities with minimum expenses (support of multi-
screen monitors, 3D graphics formation), it is more
reasonable to use the AMD architecture.
• To create programs relying on GPGPU potentialities, it is preferable to use NVIDIA video accelerators, which offer more powerful, multi-level and versatile means for program development and debugging.