Means and methods of the unstructured data analysis

Bibliographic Details
Date: 2019
Main Author: Rogushina, J.V.
Format: Article
Language: Ukrainian
Published: PROBLEMS IN PROGRAMMING 2019
Online Access: https://pp.isofts.kiev.ua/index.php/ojs1/article/view/348
Journal Title: Problems in programming

Institution

Problems in programming
Description
Summary: Unstructured text data are now widely used, and software tools for processing them are developing rapidly. Both trends make this research direction urgent and call for intelligent information systems to support such processing. A significant part of Big Data consists of unstructured texts, which require further development of specialized Text Mining methods and machine learning algorithms. Unstructured data consisting of natural-language text generally have no predetermined data model. Their ambiguity, heterogeneity, and context dependence considerably complicate the classification of documents, the identification of their components, and the automated extraction of user-oriented knowledge from their content, while the large volume and dynamism of such data rule out efficient manual processing. The paper considers means and methods of data structuring and their various software implementations. The prospects of using background knowledge for such structuring are analyzed. The feasibility of applying W3C standards such as RDF and OWL is substantiated. The use of semantic Wiki technologies for developing distributed information resources simplifies the structuring of natural-language text by users and also generates a source of background knowledge for analyzing arbitrary texts from the corresponding domains. The models and methods proposed in the work make it possible to improve this process.
Problems in programming 2019; 1: 57-77