Means and methods of the unstructured data analysis

Analysis of the current trends in the unstructured text data  wide usage  and the development of software tools for their processing causes the high urgency of this research direction and the necessity of intelligent information systems in such processing. A signigicant part of Big Data consists of...

Повний опис

Збережено в:
Бібліографічні деталі
Дата:2019
Автор: Rogushina, J.V.
Формат: Стаття
Мова:Ukrainian
Опубліковано: Інститут програмних систем НАН України 2019
Теми:
Онлайн доступ:https://pp.isofts.kiev.ua/index.php/ojs1/article/view/348
Теги: Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
Назва журналу:Problems in programming
Завантажити файл: Pdf

Репозитарії

Problems in programming
Опис
Резюме:Analysis of the current trends in the unstructured text data  wide usage  and the development of software tools for their processing causes the high urgency of this research direction and the necessity of intelligent information systems in such processing. A signigicant part of Big Data consists of unstructured texts that require the further development of specific Text Mining and algorythms of machine learning. Unstructured  data consisting of natural language text in the general case, do not have a predetermined data model. Their ambiguity, heterogeneity and context dependence considerably complicate the classification of documents, the identification of their components and the automated obtaining of user-oriented knowledge from their content, while the large volumes and dynamism of such data do not involve efficient manual processing. The means and methods of data structuring, their various software implementations are considered. The prospects of using background knowledge for such structuring are analyzed. The feasibility of application such W3C standards as RDF and OWL is substantiated. The use of semantic Wiki-technologies for development of distributed information resources simplifies the process of natural text structuring by users and also generates the source of background knowledge for the analysis of arbitrary texts of the corresponding domains. The models and methods proposed in the work allow to improve this process.Problems in programming 2019; 1: 57-77