Space and time optimization in natural language text compression

Bibliographic details
Date:	2023
Authors:	Anisimov, Anatoly; Zavadskyi, Igor
Format:	Article
Language:	Ukrainian
Published:	Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, National Academy of Sciences of Ukraine, 2023
Subjects:	word-based compression archiver code multi-delimiter
Online access:	https://www.fmmit.lviv.ua/index.php/fmmit/article/view/294
Journal title:	Physico-mathematical modeling and informational technologies
Repositories

Physico-mathematical modeling and informational technologies

Description
Abstract:	This paper discusses problems that arise in optimizing the space and time costs of natural language text compression methods. We define a new class of variable-length universal data compression codes with multiple delimiters: the Reverse Multi-Delimiter (RMD) codes. They are synchronizable, allow fast Boyer-Moore-style search in a compressed file, and at the same time provide the best compression ratio among all codes of the class discussed. Combined with a special technique for preprocessing a natural language text and its dictionary, they improve the performance of powerful modern archivers. We also construct a very fast decoding algorithm for RMD codes that runs at almost the same speed as decoding of (s,c)-dense codes and several times faster than decoding of Fibonacci codes. The experiments presented show that RMD codes occupy a very attractive position in terms of the space/decoding-time tradeoff in natural language text compression.
