Modelling videocard memory performance for LLM neural networks
| Date: | 2024 |
|---|---|
| Authors: | , |
| Format: | Article |
| Language: | Ukrainian |
| Published: | PROBLEMS IN PROGRAMMING, 2024 |
| Online access: | https://pp.isofts.kiev.ua/index.php/ojs1/article/view/617 |
| Journal title: | Problems in programming |
| Repositories: | Problems in programming |
Abstract: The paper covers the analysis of the performance characteristics of the Generative Pre-trained Transformer class of neural network algorithms, also known as Large Language Models, on contemporary video cards. The goal is to establish the applicability limits of this class of neural networks on mobile computing platforms. This network class is attractive as a control-system tool, but as the text corpus grows the network's performance degrades and its resource consumption increases rapidly, so we need to determine whether this type of network, built on a smaller text corpus, is a feasible tool for devices with comparatively low computing capability. The performance investigation was carried out with the GPGPU-Sim simulator, which can be freely configured as a virtual video card of any computing capability. Since the computations of this neural network reduce to a sequence of matrix multiplications whose performance is limited by memory bandwidth, we analyze the behavior statistics of the different cache levels of the video card processor and their interaction with main memory. Using GPGPU-Sim, we gathered statistics for Generative Pre-trained Transformer version 2 (GPT-2) configurations from small to XL. The level 2 cache access statistics, level 2 cache misses, and the number of main memory accesses show that even for the mid-level network configurations the level 2 cache miss rate exceeds 7-8%. This rate is high and indicates that the cache is too small for executing this neural network configuration; there is also substantial traffic from the cache to main memory. However, the minimal, so-called small configuration can be computed faster and with moderate resources, and can therefore be used as part of a decision-making system on computing platforms with modest performance and resources when the text corpus is limited. This opens good prospects for using this type of neural network for autonomous decision making.

Problems in programming 2024; 2-3: 37-44
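The abstract notes that this network's computations reduce to a sequence of matrix multiplications limited by memory bandwidth, and that the level 2 cache is too small for the mid-level GPT-2 configurations. A rough back-of-the-envelope sketch (not taken from the paper) illustrates why: the weight matrices of the transformer stack alone dwarf a typical GPU level 2 cache, so generating each token streams most of the weights from main memory. The hyperparameters below are the standard published GPT-2 ones; the FP16 storage format and the 4 MiB level 2 cache size are assumptions made purely for illustration.

```python
# Rough, illustrative estimate (not from the paper): per-token weight traffic of the
# GPT-2 transformer stack vs. an assumed GPU L2 cache size.

GPT2_CONFIGS = {           # name: (number of layers, hidden size d_model)
    "small":  (12,  768),
    "medium": (24, 1024),
    "large":  (36, 1280),
    "xl":     (48, 1600),
}

BYTES_PER_WEIGHT = 2       # assumption: FP16 weight storage
L2_SIZE_BYTES = 4 * 2**20  # assumption: 4 MiB L2, typical of simulated video cards

for name, (layers, d) in GPT2_CONFIGS.items():
    # Per layer: attention projections ~4*d^2 weights, MLP ~8*d^2 weights,
    # so roughly 12*d^2 weights per layer (embeddings excluded).
    weights = layers * 12 * d * d
    weight_bytes = weights * BYTES_PER_WEIGHT
    # Generating one token touches essentially every weight matrix once, so the
    # weight working set is a lower bound on DRAM traffic when it exceeds the cache.
    print(f"{name:>6}: ~{weights / 1e6:7.1f} M weights, "
          f"~{weight_bytes / 2**20:7.1f} MiB per token, "
          f"~{weight_bytes / L2_SIZE_BYTES:5.0f}x the assumed L2")
```

Even the small configuration's weight working set is tens of times larger than the assumed level 2 cache, which is consistent with the high level 2 miss rates and the substantial cache-to-main-memory traffic reported in the abstract.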
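The miss-rate figures quoted above are aggregate counters reported by GPGPU-Sim at the end of a simulation run. A small helper along the following lines could extract comparable numbers from a simulation log; it is a sketch under the assumption that the log contains the aggregate counters `L2_total_cache_accesses` and `L2_total_cache_misses`, which should be verified against the output of the simulator version and configuration actually used.

```python
import re
import sys

# Sketch: pull L2 access/miss counters out of a GPGPU-Sim log and report the miss
# rate. The counter names are assumed to match the simulator's aggregate output;
# verify them against the log produced by your GPGPU-Sim version.
COUNTERS = ("L2_total_cache_accesses", "L2_total_cache_misses")


def l2_miss_rate(log_path: str) -> float:
    totals = {name: 0 for name in COUNTERS}
    pattern = re.compile(r"(%s)\s*=\s*(\d+)" % "|".join(COUNTERS))
    with open(log_path) as log:
        for line in log:
            match = pattern.search(line)
            if match:
                # Keep the last reported value, treating the counters as cumulative.
                totals[match.group(1)] = int(match.group(2))
    accesses = totals["L2_total_cache_accesses"]
    misses = totals["L2_total_cache_misses"]
    return misses / accesses if accesses else float("nan")


if __name__ == "__main__":
    print(f"L2 miss rate: {l2_miss_rate(sys.argv[1]):.1%}")
```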