Ємнісно-часова оптимізація у стисканні природномовних текстів
In this paper, we discuss various problems arising in space and time optimization of natural language text compression methods. We define a new class of variable-length universal data compression codes with multiple delimiters — the Reverse Multi-Delimiter (RMD) codes. They are synchronizable, allow...
Збережено в:
| Дата: | 2023 |
|---|---|
| Автори: | , |
| Формат: | Стаття |
| Мова: | Українська |
| Опубліковано: |
Інститут прикладних проблем механіки і математики ім. Я. С. Підстригача НАН України
2023
|
| Теми: | |
| Онлайн доступ: | https://www.fmmit.lviv.ua/index.php/fmmit/article/view/294 |
| Теги: |
Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
|
| Назва журналу: | Physico-mathematical modeling and informational technologies |
| Завантажити файл: | |
Репозитарії
Physico-mathematical modeling and informational technologies| Резюме: | In this paper, we discuss various problems arising in space and time optimization of natural language text compression methods. We define a new class of variable-length universal data compression codes with multiple delimiters — the Reverse Multi-Delimiter (RMD) codes. They are synchronizable, allow us to perform fast Boyer-Moore-style search in a compressed file, and at the same time provide the best compression ratio among all codes of a discussed class. In combination with a special technique of preprocessing a natural language text and its dictionary, they improve the performance of modern powerful achievers. Also, we construct a very fast decoding algorithm for RMD-codes operating almost at the same speed as (s,c)-dense codes and times faster than Fi-bonacci codes decoding. The provided experiments show that RMD-codes occupy a very attractive position by the means of space/decoding time tradeoffs in natural language text compression. |
|---|