Methods and tools for extracting personal data from theses abstracts




Bibliographic details
Date:	2019
Main authors:	Kudim, K.A., Proskudina, G.Yu.
Format:	Article
Language:	Russian
Published:	PROBLEMS IN PROGRAMMING 2019
Keywords:	data extraction; weakly structured documents; XPath technology; regular expressions; semantic web; UDC 004.82
Online access:	https://pp.isofts.kiev.ua/index.php/ojs1/article/view/359
Journal title:	Problems in programming

Institution

Problems in programming

Description
Abstract:	The problem of extracting data about a person from scarce data collections is studied. The data collections are public resources on the internet; once collected and parsed, they provide additional value to users. Collecting such data is difficult because the documents are only weakly structured. A system is therefore proposed to automate information gathering and parsing. The initial task is to process personal data from thesis documents publicly available on the internet. These data contain information about scientists that cannot be obtained from other sources. The goal is to be able to query the data with its semantics in mind, not only as plain text. A prototype system, developed with PHP and XPath, collects raw documents from the digital repository of the V. I. Vernadskiy National Library of Ukraine. The system also extracts data from the collected documents and stores them locally in the RDF data model, which suits this specific data and allows future exposure to the Semantic Web. A collection of more than 63,000 documents was processed to test the system.
DOI:	10.15407/pp2019.02.038
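The abstract publishes neither the system's code nor the repository's page markup, so the following is only a minimal Python sketch of the pipeline it describes (XPath selection of fields, regular-expression matching inside free text, as the keywords suggest, and RDF output), not the authors' PHP implementation. The record markup, the sample values, the `http://example.org/` IRIs and the `vocab/degree` and `vocab/speciality` predicates are hypothetical; `foaf:name`, `dcterms:title` and `dcterms:creator` are standard vocabulary terms. Python's `xml.etree.ElementTree` supports only a limited XPath subset, so real HTML pages would need a fuller XPath engine.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record markup: the real repository pages are not described
# in the abstract, so this sample only illustrates the technique.
SAMPLE_RECORD = """
<record id="12345">
  <field name="author">Petrenko, Ivan</field>
  <field name="title">Models of distributed data processing</field>
  <field name="description">Thesis abstract for the degree of Candidate of Technical Sciences, speciality 05.13.06.</field>
</record>
"""

# Regular expressions for facts buried in free text (hypothetical phrasing).
DEGREE_RE = re.compile(r"degree of (Candidate|Doctor) of ([A-Z][a-z]+) Sciences")
SPECIALITY_RE = re.compile(r"speciality (\d{2}\.\d{2}\.\d{2})")

# Standard vocabulary terms plus a hypothetical example.org base IRI.
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
DC_TITLE = "<http://purl.org/dc/terms/title>"
DC_CREATOR = "<http://purl.org/dc/terms/creator>"
EX = "http://example.org/"


def extract_person(xml_text):
    """Select fields with (limited) XPath, then refine free text with regexes."""
    root = ET.fromstring(xml_text.strip())
    description = root.findtext("./field[@name='description']") or ""
    degree = DEGREE_RE.search(description)
    speciality = SPECIALITY_RE.search(description)
    return {
        "record": root.get("id"),
        "author": root.findtext("./field[@name='author']"),
        "title": root.findtext("./field[@name='title']"),
        "degree": f"{degree.group(1)} of {degree.group(2)} Sciences" if degree else None,
        "speciality": speciality.group(1) if speciality else None,
    }


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(person):
    """Serialize the extracted fields as RDF triples in N-Triples syntax."""
    thesis = f"<{EX}thesis/{person['record']}>"
    author = f"<{EX}person/{person['record']}>"
    triples = [(thesis, DC_CREATOR, author)]
    for subject, predicate, key in [
        (thesis, DC_TITLE, "title"),
        (author, FOAF_NAME, "author"),
        (thesis, f"<{EX}vocab/degree>", "degree"),
        (thesis, f"<{EX}vocab/speciality>", "speciality"),
    ]:
        if person[key] is not None:  # skip fields the extractor did not find
            triples.append((subject, predicate, literal(person[key])))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(extract_person(SAMPLE_RECORD)))
```

For the sample record this prints five triples. Storing the degree and speciality as separate triples rather than as plain text is what allows a query to address them by meaning, which is the goal the abstract states.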
