The concept and evaluating of big data quality in the semantic environment
Big data refers to large volumes, complex data sets with various autonomous sources, characterized by continuous growth. Data storage and data collection capabilities are now rapidly expanding in all fields of science and technology due to the rapid development of networks. Evaluating the quality of...
| Date: | 2023 |
|---|---|
| Main Author: | |
| Format: | Article |
| Language: | English |
| Published: | PROBLEMS IN PROGRAMMING, 2023 |
| Subjects: | |
| Online Access: | https://pp.isofts.kiev.ua/index.php/ojs1/article/view/527 |
| Journal Title: | Problems in programming |
| Download file: | |
Institution
Problems in programming

| _version_ | 1859502220854165504 |
|---|---|
| author | Novitsky, A.V. |
| author_facet | Novitsky, A.V. |
| author_sort | Novitsky, A.V. |
| baseUrl_str | https://pp.isofts.kiev.ua/index.php/ojs1/oai |
| collection | OJS |
| datestamp_date | 2023-06-25T06:57:27Z |
| description | Big data refers to large volumes, complex data sets with various autonomous sources, characterized by continuous growth. Data storage and data collection capabilities are now rapidly expanding in all fields of science and technology due to the rapid development of networks. Evaluating the quality of data is a difficult task in the context of big data, because the speed of semantic data reasoning directly depends on its quality. The appropriate strategies are necessary to evaluate and assess data quality according to the huge amount of data and its rapid generation. Managing a large volume of heterogeneous and distributed data requires defining and continuously updating metadata describing various aspects of data semantics and its quality, such as conformance to metadata schema, provenance, reliability, accuracy and other properties. The article examines the problem of evaluating the quality of big data in the semantic environment. The definition of big data and its semantics is given below and there is a short excursion on a theory of quality assessment. The model and its components which allow to form and specify metrics for quality have already been developed. This model includes such components as: quality characteristics; quality metric; quality system; quality policy. A quality model for big data that defines the main components and requirements for data evaluation has already been proposed. In particular, such evaluation components as: accessibility, relevance, popularity, compliance with the standard, consistency, etc. are highlighted. The problem of inference complexity is demonstrated in the article. Approaches to improving fast semantic inference through materialization and division of the knowledge base into two components, which are expressed by different dialects of descriptive logic, are also considered below. 
The materialization of big data makes it possible to significantly speed up the processing of requests for information extraction. It is demonstrated how the quality of metadata affects materialization. The proposed model of the knowledge base allows increasing the qualitative indicators of the reasoning speed. Problems in programming 2022; 3-4: 260-270 |
| first_indexed | 2025-07-17T09:41:12Z |
| format | Article |
| fulltext |
Software tools for data analytics
UDC 004.05 http://doi.org/10.15407/pp2022.03-04.260
THE CONCEPT AND EVALUATING
OF BIG DATA QUALITY
IN THE SEMANTIC ENVIRONMENT
Oleksandr Novytskyi
Big data refers to large, complex data sets that come from various autonomous sources and grow continuously. With the rapid development of networks, data storage and data collection capabilities are expanding quickly in all fields of science and technology. Evaluating data quality is a difficult task in the context of big data; for semantic data, both the quality and the speed of reasoning depend directly on the quality of the data. Given the huge volume of data and the speed at which it is generated, appropriate strategies are needed to assess data quality. Managing a large volume of heterogeneous and distributed data requires defining and continuously updating metadata that describe various aspects of data semantics and quality, such as conformance to the metadata schema, provenance, reliability, accuracy and other properties. The article examines the problem of evaluating the quality of big data in the semantic environment. A definition of big data and its semantics is given, together with a short excursion into the theory of quality assessment. A model and its components are developed that make it possible to form and specify quality metrics. The model includes the following components: quality characteristic, quality metric, quality system and quality policy. A quality model for big data is proposed that defines the main components and requirements for data evaluation. In particular, evaluation components such as accessibility, relevance, popularity, compliance with the standard and consistency are highlighted. The problem of inference complexity is demonstrated. Approaches to speeding up semantic inference through materialization and through division of the knowledge base into two components, expressed in different dialects of description logic, are considered, since materialization of big data makes it possible to significantly speed up the processing of information extraction queries. It is demonstrated how the quality of metadata affects materialization. The proposed knowledge base model makes it possible to improve the qualitative indicators of reasoning speed.
1. Introduction
The concept of big data (BD), in the broad sense of the term, is used to describe data processing, distribution and analytics (Stuart Ward & Barker, 2013). The defining feature of such data is that its volume grows exponentially. Considerable effort is directed at the big data problem because new methods and algorithms for BD processing need to be developed.
Defining big data is difficult primarily because it is hard to give a quantitative definition of a set of information objects. The most widely accepted definition comes from the report (Laney, 2001), where the problem of managing large data sets is framed by the three Vs: Volume, Velocity and Variety. These reflect the growth of data volumes and the heterogeneity of data formats and metadata, which complicate the rapid management of data. Later, the criterion of Veracity (Schroeck, et al., 2012) was added to the definition of big data. The term was further clarified and supplemented with criteria concerning the complexity and unstructuredness of the data (Intel IT Center, 2012), (Suthaharan, 2014). A number of big data definitions came from real business problems. For semantic big data, however, we assume that the semantics and structure are given through external ontologies and fixed through metadata. We do not consider the problems of normalization and data extraction; we evaluate the quality of such data. This assumption does not solve the problems of operating with such data, and it creates additional problems related to reasoning over such a BD set. Our semantic data model must satisfy the requirements that data and metadata be Findable, Accessible, Interoperable and Reusable (Wilkinson, et al., 2016).
2. Big Data Semantics
The issue of semantics was studied in (Ceravolo, et al., 2018), where big data was considered on the basis that data semantics refers to the meaningful and effective use of a data object to represent a concept or object in the real world. Such a general concept unites a wide variety of applications (Amsler, 1972). Big data semantic knowledge refers to numerous aspects of rules, expert knowledge and domain information (Woods, 1975). A specific feature of big data in a semantic environment is the complexity of reasoning, even when the data does not look large at first glance. Online web applications are very sensitive to response delays, so combining reasoning with web technology places high demands on big data velocity. This article surveys the problem of big data quality for web applications and the means of increasing velocity.
© O.V. Novytskyi, 2022
ISSN 1727-4907. Problems in Programming. 2022. No. 3-4. Special issue
3. Quality Model of Big Data
The practical suitability of BD is determined primarily by its quality. The urgency of solving the BD quality problem is determined by the scale on which BD is created and distributed.
Let us consider the main concepts related to the quality of BD (Novytskyi, et al., 2014); some of these concepts were taken from the digital library domain and adapted to big data. Quality is a set of properties of objects that give them the ability to satisfy the stipulated or anticipated needs of the consumer in accordance with their purpose.
A quality characteristic is a property or a set of properties of an object by means of which quality can be described and evaluated. Each object has its own nomenclature of characteristics. A characteristic can be a composition of other characteristics, forming a hierarchical structure.
A metric is a formula or rule for determining the degree to which an object possesses a characteristic.
A quality indicator is a quantitative or qualitative value obtained as a result of evaluating the quality of a characteristic according to the evaluation methodology. Quantitative indicators have a numerical expression within a certain scale. Qualitative indicators have a verbal expression within a certain ordered verbal scale.
Quality level is the degree of acceptability of the obtained quality indicator with respect to the expected (planned) quality.
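The relationship between a characteristic, its metric, the resulting indicator and the quality level can be illustrated with a minimal Python sketch. It is not taken from the article: the class and function names, the averaging rule for composite characteristics, and the sample record are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Characteristic:
    """A quality characteristic; it may be composed of sub-characteristics."""
    name: str
    # Metric: a rule mapping an object to a score in [0, 1] (used by leaves only).
    metric: Optional[Callable[[dict], float]] = None
    children: List["Characteristic"] = field(default_factory=list)

    def indicator(self, obj: dict) -> float:
        """Quantitative quality indicator of this characteristic for `obj`."""
        if not self.children:
            if self.metric is None:
                raise ValueError(f"leaf characteristic {self.name!r} has no metric")
            return self.metric(obj)
        # Assumption: a composite characteristic averages its children.
        return sum(c.indicator(obj) for c in self.children) / len(self.children)


def quality_level(indicator: float, expected: float) -> str:
    """Quality level: acceptability of the indicator against the expected value."""
    return "acceptable" if indicator >= expected else "unacceptable"


# Hypothetical metadata record and characteristics.
record = {"title": "Big data quality", "license": None}
has_title = Characteristic("has_title", metric=lambda o: 1.0 if o.get("title") else 0.0)
has_license = Characteristic("has_license", metric=lambda o: 1.0 if o.get("license") else 0.0)
completeness = Characteristic("completeness", children=[has_title, has_license])

score = completeness.indicator(record)
level = quality_level(score, expected=0.8)
print(score, level)
```

Here the hierarchy has one composite characteristic with two leaves; a real quality policy would fix the expected value and the aggregation rule.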
The quality system is a set of organizational structures, methods, processes, procedures and resources necessary for the general direction and management of quality by established methods. It includes the quality policy, the quality model, the quality achievement system and the quality system documentation.
The quality policy is a document developed by the responsible management. It expresses the goals in the quality field, the acceptable level of quality, the duties of various persons and structures for quality assurance, and a set of measures to achieve quality. The quality policy is defined based on the tasks set in the quality field.
The quality model is the set of objects for which quality is described, evaluated and supported. It also includes quality characteristics, methods and means of quality assessment, and metrics and algorithms for determining quality indicators. A specific quality model is selected based on the developed quality policy and other factors.
Achieving quality relies on a set of organizational structures, responsibilities, procedures, processes and resources that implement general quality management (Novytskyi, et al., 2014).
The quality management system is an organizational structure that includes the personnel who implement quality management functions using established methods.
Quality management is the general management of quality, backed by resources, particularly human resources. It organizes quality assurance work, interacts with the external environment, defines policies, goals and plans in the quality field, and makes strategic and important operational decisions regarding quality.
Quality assurance is the creation of confidence that quality requirements will be met. It includes administrative and procedural measures carried out within the framework of the quality system to ensure that requirements and goals are fulfilled. It involves systematic measurement, comparison with a standard, process monitoring, and adjustments to technological or other processes to achieve the required quality.
Quality control is a set of measures, procedures, methods and means that allow a systematic and independent analysis to be performed. Such analysis makes it possible to determine whether activities and results in the quality field comply with the planned measures, how effectively those measures are implemented, and whether they meet the set goals. The quality assurance system is the subject of this system analysis.
[Figure: components of the quality system. Blocks: Quality management, Quality assurance, Quality control, Quality assessment, Tools of support (Methods, Approach, Tools). Relation labels: Manage, Monitors execution, Evaluates efficiency.]
3.1 The quality achievement system
Quality assessment measures the achieved or expected level of quality at every stage of the BD life cycle.
A distinction is made between objective and subjective assessment. Objective assessment is a clearly defined assessment process, usually fixed by mathematical formulas, that does not depend on subjective perception. Subjective assessment is based on personal feelings, views and opinions.
We consider the main requirements for the quality model (Spirin, et al., 2012), which also apply to BD.
A. The quality model should make it possible to distinguish the quality of the product itself from the quality of its interaction with the environment. The following components are distinguished in this context:
- the quality of the product itself, without taking into account its behavior in the external environment (internal quality);
- the quality of the product with regard to its behavior in the external environment (external quality);
- the quality of the technological processes of product development (process quality);
- the quality of the product in use in different contexts, that is, the quality experienced by the user in specific scenarios of product use (quality in use).
B. The quality model should cover all stages of the BD life cycle of development and use, from requirements development to industrial operation.
C. The quality model is relevant to all structural elements of BD. It covers all types of support for the software system: functional, informational, mathematical, technical, etc.
D. An important component of the quality model is the structure of quality characteristics and of the metrics that assess elementary characteristics.
BD consists of two components, the data and the database application: information is retrieved from a computerized BD by a computer program.
The semantic information model for BD is defined as a set of information objects in which each predicate is defined through a top-level ontology.
Each information object (IO) in the BD environment is specified by a directed acyclic graph, where the information object consists of a list of statements of the form «subject - predicate - object». Each such statement is called a triple. The set of such triples forms a directed graph in which the vertices are subjects and objects and the edges are predicates. Each node of such a graph is described by certain metadata. That is, the model of the information object in the BD environment is defined as $IO = \langle s(m), p(m), o(m) \rangle$, where $s$, $p$ and $o$ are the subject, predicate and object and $m$ is the metadata that describes them.
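The information object model described above can be sketched in Python as a list of subject, predicate and object entries, each carrying its own metadata. This is an illustrative sketch only: the class names, the `ex:` and `dc:` prefixed identifiers and the metadata keys are hypothetical and do not come from the article.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Node:
    """A subject, predicate or object together with its metadata m."""
    value: str
    metadata: Tuple[Tuple[str, str], ...] = ()  # hashable key/value pairs


@dataclass
class InformationObject:
    """An IO as a list of (s(m), p(m), o(m)) statements."""
    triples: List[Tuple[Node, Node, Node]] = field(default_factory=list)

    def add(self, s: Node, p: Node, o: Node) -> None:
        self.triples.append((s, p, o))

    def vertices(self) -> Set[str]:
        # Vertices of the graph are the subjects and the objects.
        subjects = {s.value for s, _, _ in self.triples}
        objects = {o.value for _, _, o in self.triples}
        return subjects | objects

    def edges(self) -> List[Tuple[str, str, str]]:
        # Each edge runs from subject to object and is labelled by the predicate.
        return [(s.value, p.value, o.value) for s, p, o in self.triples]


# Hypothetical example: two statements about one article.
io = InformationObject()
article = Node("ex:article527", (("source", "OJS"),))
io.add(article, Node("dc:creator"), Node("ex:Novytskyi"))
io.add(article, Node("dc:language"), Node("en"))
print(sorted(io.vertices()))
print(io.edges())
```

Keeping metadata on every node, rather than only on the whole object, mirrors the statement that each node of the graph is described by certain metadata.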
Evaluating the quality of elementary characteristics involves determining their metrics, which are formulas or rules for determining the degree to which an object has an elementary characteristic (Novitsky, et al., 2016). The metric of an elementary characteristic reflects the degree to which an object or a set of objects possesses a certain property. Let a set of equivalent objects $M_i \in M$ ($i = 1, \dots, N$) be given, each of which may or may not have a certain property. We define the following characteristic function:
$$\chi(M_i, p) = \begin{cases} 1, & \text{if object } M_i \text{ has property } p; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Then the estimate of the degree to which the set of objects $M$ has the property $p$ is equal to:
$$\mu(M, p) = \frac{\sum_{j=1}^{N} \chi(M_j, p)}{N}. \tag{2}$$
If the objects $M_i$ ($i = 1, \dots, N$) are unequal and a weighting factor $K_i$: $0 \le K_i \le 1$ ($i = 1, \dots, N$), which determines the relative importance of the object, is given for each of them, then the above formula takes the following form:
$$\mu(M, p) = \frac{\sum_{j=1}^{N} K_j \, \chi(M_j, p)}{N}. \tag{3}$$
Similarly, a metric can be defined for a situation where one object can have multiple properties and it is necessary to determine to what extent they are inherent to the object.
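Formulas (1) to (3) can be tried out with a short Python sketch. The function names `chi`, `metric` and `weighted_metric`, the sample records and the "has a license" property are assumptions introduced for illustration; only the arithmetic follows the formulas.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def chi(obj: T, has_property: Callable[[T], bool]) -> int:
    """Characteristic function (1): 1 if the object has property p, else 0."""
    return 1 if has_property(obj) else 0


def metric(objects: Sequence[T], has_property: Callable[[T], bool]) -> float:
    """Formula (2): share of the N objects that have property p."""
    if not objects:
        raise ValueError("the set of objects must not be empty")
    return sum(chi(o, has_property) for o in objects) / len(objects)


def weighted_metric(
    objects: Sequence[T],
    weights: Sequence[float],
    has_property: Callable[[T], bool],
) -> float:
    """Formula (3): sum of K_j * chi(M_j, p), divided by N."""
    if not objects:
        raise ValueError("the set of objects must not be empty")
    if len(objects) != len(weights):
        raise ValueError("one weight per object is required")
    if any(not 0.0 <= k <= 1.0 for k in weights):
        raise ValueError("weights must lie in [0, 1]")
    total = sum(k * chi(o, has_property) for o, k in zip(objects, weights))
    return total / len(objects)


# Hypothetical metadata records; property p = "the record states a license".
records = [{"license": "CC-BY"}, {"license": None}, {"license": "CC0"}, {}]
weights = [1.0, 0.5, 0.5, 1.0]


def has_license(record: dict) -> bool:
    return bool(record.get("license"))


print(metric(records, has_license))                    # 0.5
print(weighted_metric(records, weights, has_license))  # 0.375
```

Two of the four records have the property, so formula (2) gives 2/4; with the weights above, formula (3) gives (1.0 + 0.5)/4. Because formula (3) divides by N rather than by the sum of the weights, the weighted value never exceeds the unweighted one.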
For metrics, it is important to establish acceptable values for certain characteristics and to assign a qualitative measure to each range of values. Such a range can be determined experimentally or algorithmically; in many cases it is established by an expert. For example, suppose that expert $j$ with competence $K_j$ specifies a range of values $[X_{ij}, Y_{ij}]$ for the $i$-th characteristic, where $Y_{ij}$ is the optimal value of the characteristic and $X_{ij}$ is its worst value.
M experts evaluated the characteristics. The final score for the range of values is calculated as follows:
Evaluating the quality of elementary characteristics involves determining their metrics represented by formulas
or rules for determining the degree to which an object has an elementary characteristic (Novitsky, et al., 2016). The metric
of an elementary characteristic reflects the degree to which an object or a set of objects possesses a certain property. Let a
set of equivalent objects
Програмні засоби аналітики даних
[Введите текст]
A. The quality model should provide an opportunity to highlight the quality of the product itself and its
interaction with the environment. The following components are distinguished in this context as:
− the quality of the product itself, without taking into account its behavior with the external environment
(internal quality);
− product quality regarding its behavior in the external environment (external quality);
− the quality of technological processes of product development (process quality);
− the quality of the product to its use in different contexts (and the quality experienced by the user in specific
scenarios of product use (quality during use)).
B. The quality model should include all stages of the BD development and use life cycle starting from
requirements development and ending with the industrial operation.
С. The quality model is relevant to all structural elements of BD. It contains all types of support for the software
system — functional, informational, mathematical, technical, etc.
D. An important component of the quality model is the structure of quality characteristics and metrics that assess
elementary characteristics.
BD consist of two components are data and data base application, information is retrieved from a computerized
BD by using a computer program.
The semantic information model for BD defines as a set of information objects in which each predicate define
through top-level ontology.
Each information IO object in the BD environment is specified in a certain directed acyclic graph where the
information object consists of a list of statements in the form «subject - predicate - object». Each such statement is
called a triplet. The set of such triplets forms a directed graph, in which vertices are subjects and objects, and edges are
predicates. Certain metadata describes each node of such a graph. That is, the model of the information object in the BD
environment is defined as ( ), ( ), ( )IO s m p m o m .
Evaluating the quality of elementary characteristics involves determining their metrics represented by formulas
or rules for determining the degree to which an object has an elementary characteristic (Novitsky, et al., 2016). The
metric of an elementary characteristic reflects the degree to which an object or a set of objects possesses a certain
property. Let a set of equivalent objects
i
M M ( 1,...,i N ), be given, which may or may not have a certain
property. We define the following characteristic function:
1, ;
( , ) 0, .
i
i
object M has property p
M p anothercase
.
(1)
Then the estimate of the degree to which the set of objects M has the property p is equal to:
1
,
N
j
j
M p
M p
N .
(2)
If the objects
i
M ( 1,...,i N ) are unequal and their weighting factor : 0 1
i i
K K ( 1,...,i N ), is given
for each of them, which determines the relative importance of the objects, then the above formula takes the following
form:
1
,
N
j j
J
K M p
M p
N .
(3)
Similarly, a metric can be defined for a situation where one object can have multiple properties and it is
necessary to determine to what extent they are inherent to the object.
Establishing acceptable values for certain characteristics and adding a qualitative measure to the appropriate
range is important for metrics. This range can be determined experimentally or algorithmically. An expert establishes it
in many cases. For example, let's imagine j as an expert with
j
K competence specifying a range of values for the i with
its characteristics: ,
ij ij
X Y ,
ij
Y - where the optimal value of the characteristic is
ij
X with its worst value.
M experts evaluated the characteristics. The final score for the range of values is calculated as follows:
, be given, which may or may not have a certain property. We define
the following characteristic function:
Програмні засоби аналітики даних
[Введите текст]
A. The quality model should provide an opportunity to highlight the quality of the product itself and its
interaction with the environment. The following components are distinguished in this context as:
− the quality of the product itself, without taking into account its behavior with the external environment
(internal quality);
− product quality regarding its behavior in the external environment (external quality);
− the quality of technological processes of product development (process quality);
− the quality of the product to its use in different contexts (and the quality experienced by the user in specific
scenarios of product use (quality during use)).
B. The quality model should include all stages of the BD development and use life cycle starting from
requirements development and ending with the industrial operation.
С. The quality model is relevant to all structural elements of BD. It contains all types of support for the software
system — functional, informational, mathematical, technical, etc.
D. An important component of the quality model is the structure of quality characteristics and metrics that assess
elementary characteristics.
BD consist of two components are data and data base application, information is retrieved from a computerized
BD by using a computer program.
The semantic information model for BD defines as a set of information objects in which each predicate define
through top-level ontology.
Each information IO object in the BD environment is specified in a certain directed acyclic graph where the
information object consists of a list of statements in the form «subject - predicate - object». Each such statement is
called a triplet. The set of such triplets forms a directed graph, in which vertices are subjects and objects, and edges are
predicates. Certain metadata describes each node of such a graph. That is, the model of the information object in the BD
environment is defined as ( ), ( ), ( )IO s m p m o m .
Evaluating the quality of elementary characteristics involves determining their metrics, represented by formulas
or rules for determining the degree to which an object has an elementary characteristic (Novitsky, et al., 2016). The
metric of an elementary characteristic reflects the degree to which an object or a set of objects possesses a certain
property. Let a set of equivalent objects $M_i \in M$ ($i = 1, \dots, N$) be given, each of which may or may not have a certain
property. We define the following characteristic function:
$$\chi(M_i, p) = \begin{cases} 1, & \text{object } M_i \text{ has property } p; \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Then the estimate of the degree to which the set of objects $M$ has the property $p$ is equal to:
$$\chi(M, p) = \frac{\sum_{j=1}^{N} \chi(M_j, p)}{N}. \tag{2}$$
If the objects $M_i$ ($i = 1, \dots, N$) are unequal and a weighting factor $K_i$: $0 \le K_i \le 1$ ($i = 1, \dots, N$), which determines
the relative importance of the objects, is given for each of them, then the above formula takes the following
form:
$$\chi(M, p) = \frac{\sum_{j=1}^{N} K_j \, \chi(M_j, p)}{N}. \tag{3}$$
Similarly, a metric can be defined for a situation where one object can have multiple properties and it is
necessary to determine to what extent they are inherent to the object.
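Equations (1)-(3) can be sketched in a few lines of Python. The function names, the sample records, and the choice of "has a non-empty title" as the property p are assumptions made for illustration; the weighted form divides by N, as in equation (3).

```python
def has_property(obj, prop):
    """Characteristic function (1): 1 if the object has the property, else 0."""
    return 1 if prop(obj) else 0


def degree(objects, prop):
    """Equation (2): share of objects in the set that have the property."""
    if not objects:
        raise ValueError("at least one object is required")
    return sum(has_property(o, prop) for o in objects) / len(objects)


def weighted_degree(objects, weights, prop):
    """Equation (3): sum of K_j * characteristic function, divided by N."""
    if not objects or len(objects) != len(weights):
        raise ValueError("objects and weights must be non-empty and aligned")
    if any(not 0 <= k <= 1 for k in weights):
        raise ValueError("each weighting factor must lie in [0, 1]")
    total = sum(k * has_property(o, prop) for o, k in zip(objects, weights))
    return total / len(objects)


# Hypothetical records; the property p is "has a non-empty title".
records = [{"title": "A"}, {"title": ""}, {"title": "B"}, {}]
has_title = lambda r: bool(r.get("title"))

plain = degree(records, has_title)
weighted = weighted_degree(records, [1.0, 0.5, 0.5, 1.0], has_title)
```

With these sample values, two of the four records have the property, so `plain` is 0.5, and the weighted estimate is (1.0 + 0.5) / 4 = 0.375.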
Establishing acceptable values for certain characteristics and assigning a qualitative measure to the appropriate
range is important for metrics. This range can be determined experimentally or algorithmically; in many cases it is
established by an expert. For example, let expert $j$ with competence $K_j$ specify for the $i$-th characteristic
a range of values $[X_{ij}, Y_{ij}]$, where $Y_{ij}$ is the optimal value of the characteristic and $X_{ij}$ is its worst value.
Suppose $M$ experts evaluated the characteristics. The final score for the range of values is calculated as follows:
$$X_i = \frac{\sum_{j=1}^{M} K_j X_{ij}}{\sum_{j=1}^{M} K_j}, \qquad Y_i = \frac{\sum_{j=1}^{M} K_j Y_{ij}}{\sum_{j=1}^{M} K_j}. \tag{4}$$
It should be noted that the intervals $[X_{ij}, Y_{ij}]$ are set by experts or determined algorithmically only for elementary characteristics. At other levels, i.e. for integral characteristics, the minimum and maximum values are calculated according to the defined formulas based on the given or calculated values of the previous levels (Novitsky, et al., 2016).
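The competence-weighted averaging of formula (4) can be sketched as follows. This is a minimal illustration, assuming each expert's range is passed as a pair of bounds; the function name `aggregate_expert_ranges` and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def aggregate_expert_ranges(
    competences: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """Competence-weighted mean of the range bounds proposed by the experts."""
    if not ranges or len(competences) != len(ranges):
        raise ValueError("one range per expert is required")
    total = sum(competences)
    if total <= 0:
        raise ValueError("the sum of competences must be positive")
    # Each bound is averaged separately, weighted by the expert's competence K_j.
    x = sum(k * lower for k, (lower, _) in zip(competences, ranges)) / total
    y = sum(k * upper for k, (_, upper) in zip(competences, ranges)) / total
    return x, y


# Hypothetical input: three experts rate one characteristic.
competences = [1.0, 2.0, 1.0]
ranges = [(0.0, 8.0), (2.0, 10.0), (4.0, 12.0)]
x_i, y_i = aggregate_expert_ranges(competences, ranges)
```

Here the second expert has twice the competence of the others, so the resulting range (2.0, 10.0) coincides with that expert's proposal only because the other two proposals are symmetric around it.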
4. Quality properties of information objects in Big Data
Next, we consider the evaluation of the quality of semantic information objects (IO) through the following IO quality characteristics.
Accessibility is a complex function that depends on many factors, including:
− the IO is actually available in the BD (the information object may be present in the BD but, for some reason, removed from public access, or, because of the amount of data, impossible to identify among the set of objects);
− there is a service that can find the IO (one of the ways to remove an information object from public access is to deactivate its search characteristics);
− the network and its data transmission system are operational;
− there are no restrictions on access to the IO, or, if there are such restrictions, they do not apply to the specific persons or groups of persons.
It should be noted that, in this context, availability refers to a single operation on the IO: reading. Our review does not include other possible operations on the IO (modification, deletion, administration). For BD, this means availability to a specific service that interacts with the BD. As a rule, a distinction is made between availability to all services and availability to certain services. In this case, the restriction of access rights $Acc(S_i, IO_j)$ of the service $S_i$ to $IO_j$ is a function that takes the following values: 1 if the service has no access restrictions or belongs to the group to which access is open; 0 otherwise.
Now, if we denote the other availability indicators, apart from access rights restrictions, as $P_1, \ldots, P_n$, each taking the value 1 if the indicator is satisfied and 0 if it is not, then general availability is calculated as follows:
$$MIN\bigl(P_1, \ldots, P_n, Acc(S_i, IO_j)\bigr). \tag{5}$$
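Formula (5) amounts to taking the minimum over binary indicators, so a single unsatisfied indicator makes the IO unavailable. A minimal sketch, in which the function name `accessibility` and the three example indicators are hypothetical:

```python
from typing import Iterable


def accessibility(indicators: Iterable[bool], access_allowed: bool) -> int:
    """Return 1 only if every availability indicator holds and access is permitted."""
    values = [int(bool(p)) for p in indicators]
    values.append(int(bool(access_allowed)))
    # The minimum of binary values is 1 only when all of them are 1.
    return min(values)


# Hypothetical indicators: IO present in the store, search service up, network up.
ok = accessibility([True, True, True], access_allowed=True)
blocked = accessibility([True, True, True], access_allowed=False)
offline = accessibility([True, True, False], access_allowed=True)
```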
Relevance is the degree to which the information content of the information object meets the information needs of the user. Neither can be strictly formalized. This assessment largely depends on the depth of the user's knowledge about their information needs at the current time and the tasks facing them. The user's current information needs are expressed through their search query as a result of knowledge reasoning. The query implicitly defines the context in which relevance is evaluated. The user (who can be a group of people) evaluates this compliance after receiving a response to the request.
The relevance evaluation function $Relevance(IO_i, S_j, Query_k)$ is as follows:
$$Relevance(IO_i, S_j, Query_k) = \begin{cases} 1, & \text{service } S_j \text{ confirms that } IO_i \text{ is relevant for } Query_k; \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
Accuracy of storage. During its lifetime, an object can pass through different states caused by transitions to other software and technology platforms, including changes of storage format, the use of newer versions of BD, etc. Big data is characterized by constant change, and errors in such data tend to accumulate and scale [14]. All this can lead to a loss of storage accuracy in the new version of the information object compared with the old one. This characteristic assesses the degree of loss of storage accuracy in the cases described above (Novitsky, et al., 2016).
Credibility means that the IO is able to confirm that it is what it claims to be. The ability to verify and measure the extent to which an IO is what it is claimed to be is fundamentally important for its correct perception and use. Reliability determines the extent to which the IO can be relied upon. This is largely determined by the credibility of the developer and of the source of origin. The credibility of the IO can be measured by:
− the attitude of users towards the IO itself;
− the attitude of users towards the source of the IO;
− the availability of information on the chronology of IO changes;
− the attitude of users to the BD in which the IO is located.
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines how accurately the IO reproduces its original. For example, a text document reproducing an ancient book can accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This characteristic evaluates how quickly the set $\langle s(m), p(m), o(m) \rangle$ in the IO is updated compared to the real state of affairs. The characteristic is measured by the ratio of the actual delay time to the permissible one:
$$Timeliness_{IO}(s, p, o) = \frac{\text{real time delay}}{\text{expected time delay}}. \tag{7}$$
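The ratio in formula (7) can be computed directly from timestamps. The sketch below is only an illustration: the function name `timeliness`, its parameters and the sample timestamps are hypothetical, and reading a ratio of at most 1 as "updated within the permissible delay" is an assumption rather than a rule stated in the text.

```python
from datetime import datetime, timedelta


def timeliness(real_change: datetime, stored_update: datetime, permitted_delay: timedelta) -> float:
    """Ratio of the actual update delay to the permitted delay."""
    if permitted_delay <= timedelta(0):
        raise ValueError("the permitted delay must be positive")
    actual_delay = stored_update - real_change
    # Dividing one timedelta by another yields a plain float.
    return actual_delay / permitted_delay


# Hypothetical case: the fact changed at 12:00, the IO was updated at 12:30,
# and a delay of one hour is considered permissible.
ratio = timeliness(
    real_change=datetime(2023, 1, 1, 12, 0),
    stored_update=datetime(2023, 1, 1, 12, 30),
    permitted_delay=timedelta(hours=1),
)
```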
Origin is a quality characteristic of an IO. It indicates how well (correctly, completely, qualitatively) the entire history of the origin and changes of an IO is presented, and how accurately and over what period the history of the IO's existence can be traced. This is an important characteristic, since inference over semantic data depends on the data itself. Understanding the historical information about the data helps to determine the reasons for changes in the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept an IO. It can be used to analyze which set of IOs is most easily perceived by a group of persons, given the tasks they are solving.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
is updated compared to the real state of affairs. The charac-
teristic is measured by the ratio of the actual delay time compared to the permissible one:
$$\mathrm{Timeliness}_{IO}(s, p, o) = \frac{\text{real time delay}}{\text{expected time delay}}. \qquad (7)$$
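As a minimal sketch of formula (7), the ratio can be computed from two timestamps and a permissible delay. The function name `timeliness` and its parameters are illustrative assumptions, not part of the article. Reading values of at most 1 as "updated within the permissible delay" is likewise an interpretation of the ratio, not a statement from the source.

```python
from datetime import datetime, timedelta


def timeliness(real_change: datetime,
               recorded_update: datetime,
               expected_delay: timedelta) -> float:
    """Ratio of the actual update delay to the permissible delay, as in formula (7).

    real_change     -- when the fact changed in the real world (assumed known)
    recorded_update -- when the corresponding triple was updated in the IO
    expected_delay  -- the permissible delay (must be positive)
    """
    if expected_delay <= timedelta(0):
        raise ValueError("expected_delay must be positive")
    real_delay = recorded_update - real_change
    # Dividing two timedelta objects yields a plain float.
    return real_delay / expected_delay


# Hypothetical example: the triple was updated 30 minutes after the real change,
# while a delay of up to one hour is considered permissible.
ratio = timeliness(datetime(2023, 1, 1, 12, 0),
                   datetime(2023, 1, 1, 12, 30),
                   timedelta(hours=1))
```

Here `ratio` is the 30-minute delay divided by the 60-minute allowance.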
Origin (provenance) is a quality characteristic of an IO. It indicates how well (correctly, completely, qualitatively) the entire history of the origin and changes of an IO is presented, and how accurately and over what period the history of the IO's existence can be traced. This is an important characteristic since inference over semantic data depends on the data itself. Understanding the historical information about the data helps to determine the reasons for changes in the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept an IO. It can be used to analyze which set of IOs is most easily perceived by a group of people, given the tasks they are solving.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality metrics is the early detection of data-related problems. Typical problems concern completeness, data integrity and the absence of contradictions. The difficulty is that, in a BD setting, the time needed to detect such issues may exceed the time allowed for answering a request to the BD. That is why methods are needed that detect such problems at an early stage. There are various approaches to this task, such as controlling all data entered into the system through the ontology. In practice, it is often not known what the data model should be, since the requirements for the BD system can change as the data grows, and these requirements may be updated constantly. This means that data previously entered into the BD management environment in the previously specified structure may no longer correspond to the quality model after some time. Identifying these problems is difficult because of the scale.
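One way to picture control of incoming data through the ontology is a check applied to each triple at ingestion time. The sketch below is only an illustration under simple assumptions: the schema format, the predicate names (`authorOf`, `publishedIn`) and the `validate` function are hypothetical and do not come from the article.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


# Hypothetical ontology fragment: predicate -> (expected subject class, expected object class).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "authorOf": ("Person", "Document"),
    "publishedIn": ("Document", "Journal"),
}


def validate(triple: Triple, types: Dict[str, str],
             schema: Dict[str, Tuple[str, str]] = SCHEMA) -> List[str]:
    """Return the problems found for one incoming triple (empty list if none)."""
    if triple.p not in schema:
        return [f"unknown predicate: {triple.p}"]
    domain, range_ = schema[triple.p]
    problems = []
    # types maps each known individual to its class; a missing entry counts as a mismatch.
    if types.get(triple.s) != domain:
        problems.append(f"subject {triple.s} is not a {domain}")
    if types.get(triple.o) != range_:
        problems.append(f"object {triple.o} is not a {range_}")
    return problems


types = {"alice": "Person", "doc1": "Document"}
ok = validate(Triple("alice", "authorOf", "doc1"), types)
bad = validate(Triple("doc1", "authorOf", "alice"), types)
```

Because the check runs per triple, a problem is reported when the data arrives rather than when a query later depends on it. If the schema changes, data accepted earlier has to be re-validated, which is the scale problem described above.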
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most effective method of speeding up responses is materialization [15]. Materialization improves performance at query time by making the required information explicit in advance. Thus, recalculation of the necessary information for each separate request is avoided. However, excessive materialization can make this method ineffective.
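The idea of materialization can be illustrated on a subclass hierarchy: all entailed subclass pairs are computed once, so a query becomes a set lookup. This is a minimal sketch, not the method of [15]; the function name and the class names are assumptions made for illustration.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (subclass, superclass)


def materialize_subclass_closure(edges: Set[Edge]) -> Set[Edge]:
    """Compute every (sub, super) pair entailed by transitivity of the given edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow during the pass.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical hierarchy: Article is a Document, Document is an InformationObject.
asserted = {("Article", "Document"), ("Document", "InformationObject")}
materialized = materialize_subclass_closure(asserted)

# At query time no inference is needed, only a membership test.
is_subclass = ("Article", "InformationObject") in materialized
```

The trade-off is visible even here: for a chain of n classes the closure holds n(n-1)/2 pairs, so materializing everything can cost more in storage and maintenance than it saves at query time.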
Consider a graph of semantic data G in which the connections between concepts are built on the basis of description logic (DL).
We will briefly describe the DL ALC, which is the basis for all DLs of the family. ALC means «Attributive Language with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive Language), to which the complement constructor (negation) was added. Syntax describes the set of well-formed language expressions, and semantics specifies their formal meaning.
Let $CN = \{A_1, \ldots, A_m\}$ and $RN = \{R_1, \ldots, R_n\}$ be finite, non-empty sets of atomic concepts and atomic roles.
The ALC syntax is defined as follows:
– $\top$ and $\bot$ are concepts;
– an arbitrary atomic concept $A$ is a concept;
– if $C$ and $D$ are arbitrary concepts, then $\neg C$, $C \sqcap D$ and $C \sqcup D$ are concepts. The corresponding constructors are called complement, intersection and union;
– if $C$ is a concept and $R$ is an atomic role, then $\exists R.C$ and $\forall R.C$ are concepts.
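The inductive syntax above maps naturally onto a small recursive data type. The following sketch is illustrative only: the class names, `is_well_formed` and `show` are assumptions, and the example signature (`Document`, `Draft`, `authorOf`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Top:            # the universal concept
    pass


@dataclass(frozen=True)
class Bottom:         # the empty concept
    pass


@dataclass(frozen=True)
class Atomic:         # an atomic concept A from CN
    name: str


@dataclass(frozen=True)
class Not:            # complement of a concept
    c: object


@dataclass(frozen=True)
class And:            # intersection of two concepts
    left: object
    right: object


@dataclass(frozen=True)
class Or:             # union of two concepts
    left: object
    right: object


@dataclass(frozen=True)
class Exists:         # existential restriction on an atomic role from RN
    role: str
    c: object


@dataclass(frozen=True)
class Forall:         # universal restriction on an atomic role from RN
    role: str
    c: object


def is_well_formed(c: object, cn: Set[str], rn: Set[str]) -> bool:
    """Check that c is built only by the syntax rules over the signature (cn, rn)."""
    if isinstance(c, (Top, Bottom)):
        return True
    if isinstance(c, Atomic):
        return c.name in cn
    if isinstance(c, Not):
        return is_well_formed(c.c, cn, rn)
    if isinstance(c, (And, Or)):
        return is_well_formed(c.left, cn, rn) and is_well_formed(c.right, cn, rn)
    if isinstance(c, (Exists, Forall)):
        return c.role in rn and is_well_formed(c.c, cn, rn)
    return False


def show(c: object) -> str:
    """Render a concept in the usual DL notation."""
    if isinstance(c, Top):
        return "⊤"
    if isinstance(c, Bottom):
        return "⊥"
    if isinstance(c, Atomic):
        return c.name
    if isinstance(c, Not):
        return "¬" + show(c.c)
    if isinstance(c, And):
        return f"({show(c.left)} ⊓ {show(c.right)})"
    if isinstance(c, Or):
        return f"({show(c.left)} ⊔ {show(c.right)})"
    if isinstance(c, Exists):
        return f"∃{c.role}.{show(c.c)}"
    if isinstance(c, Forall):
        return f"∀{c.role}.{show(c.c)}"
    raise TypeError(f"not an ALC concept: {c!r}")


CN = {"Document", "Draft"}
RN = {"authorOf"}
concept = Exists("authorOf", And(Atomic("Document"), Not(Atomic("Draft"))))
rendered = show(concept)
valid = is_well_formed(concept, CN, RN)
```

Each class corresponds to one rule of the definition, so a value of this type is a concept exactly when `is_well_formed` accepts it for the given CN and RN.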
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems concern completeness, integrity and
consistency (the absence of contradictions) of the data. The difficulty is that, under BD conditions, the time needed to
detect such issues may exceed the time allowed for answering a request to the BD. That is why it is necessary to develop
methods that detect such problems at an early stage. There are various approaches to this task, such as controlling all
data entered into the system through an ontology. In practice, it is often not known in advance what the data model
should be, since the requirements for the BD system can change as the data grows and may be updated constantly. This
means that data previously entered into the BD management environment in a previously specified structure may no longer
correspond to the quality model after some time. Identifying these problems is difficult at such a scale.
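As an illustration of controlling incoming data through an ontology, the following Python sketch checks each incoming triple against a tiny schema at ingestion time. Everything in it is hypothetical: the `SCHEMA` and `TYPES` tables, the predicate names and the function `check_triple` are invented for this example, and the check is a simple closed-world test rather than full ontology reasoning.

```python
# Hypothetical schema: predicate -> (expected subject class, expected object class).
SCHEMA = {
    "authorOf": ("Person", "Document"),
    "cites": ("Document", "Document"),
}

# Hypothetical type assertions already known to the system.
TYPES = {"alice": "Person", "doc1": "Document", "doc2": "Document"}


def check_triple(s, p, o):
    """Return a list of problems found in one incoming triple (empty if none)."""
    if p not in SCHEMA:
        return [f"unknown predicate: {p}"]
    domain, range_ = SCHEMA[p]
    problems = []
    if TYPES.get(s) != domain:
        problems.append(f"subject {s} is not a {domain}")
    if TYPES.get(o) != range_:
        problems.append(f"object {o} is not a {range_}")
    return problems


incoming = [
    ("alice", "authorOf", "doc1"),   # conforms to the schema
    ("doc1", "authorOf", "doc2"),    # subject has the wrong class
    ("doc1", "likes", "doc2"),       # predicate is not in the schema
]
for triple in incoming:
    print(triple, check_triple(*triple))
```

Because each triple is checked on its own, the cost of such a test does not depend on the size of the stored data. It does not help, however, when the schema itself changes after the data has been loaded, which is the situation described above.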
One of the criteria of the quality model is the ability of the BD to respond quickly to user requests. The most
effective method of increasing this speed is materialization [15]. Materialization improves performance at query time
by making the required information explicit in advance, so that it does not have to be recomputed for each separate
request. However, the method becomes ineffective if too much is materialized.
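The trade-off can be seen in a small Python sketch that compares deriving class membership at query time with looking it up in a precomputed index. The class names, the `SUBCLASS_OF` and `ASSERTED_TYPES` tables and the function names are assumptions made for this illustration and are not taken from [15].

```python
from collections import defaultdict

# Hypothetical toy data: direct subclass links and asserted instance types.
SUBCLASS_OF = {"Article": "Document", "Document": "InformationObject"}
ASSERTED_TYPES = {"doc1": "Article", "doc2": "Document"}


def superclasses(cls):
    """Yield cls and every ancestor by walking the subclass links."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_at_query_time(cls):
    """Without materialization: re-derive the hierarchy for every request."""
    return {x for x, t in ASSERTED_TYPES.items() if cls in superclasses(t)}


def materialize():
    """Precompute the inferred instances of every class once."""
    index = defaultdict(set)
    for x, t in ASSERTED_TYPES.items():
        for ancestor in superclasses(t):
            index[ancestor].add(x)
    return index


INDEX = materialize()
# Both strategies give the same answer; the index answers with one lookup.
assert INDEX["InformationObject"] == instances_at_query_time("InformationObject")
print(sorted(INDEX["Document"]))  # ['doc1', 'doc2']
```

In this sketch the two asserted facts grow into five stored memberships. The stored volume increases with the depth of the hierarchy, which gives one concrete picture of how materialization can become excessive.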
Consider a graph of semantic data G in which the connections between concepts are built on the basis of
description logic (DL).
We will briefly describe the DL $\mathcal{ALC}$, which is the basis for all DLs of the family. $\mathcal{ALC}$ means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the complement constructor (negation) was added. Syntax describes the set of well-formed
language expressions, and semantics gives their formal meaning.
Let $CN = \{A_1, \ldots, A_m\}$ and $RN = \{R_1, \ldots, R_n\}$ be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− $\top$ and $\bot$ are concepts;
− an arbitrary atomic concept $A$ is a concept;
− if $C$ and $D$ are arbitrary concepts, then $\neg C$, $C \sqcap D$ and $C \sqcup D$ are concepts. The corresponding constructors are
called complement, intersection and union;
− if $C$ is a concept and $R$ is an atomic role, then $\exists R.C$ and $\forall R.C$ are concepts.
ALC semantics is defined through the notion of an interpretation. An interpretation is a pair $I = (\Delta, \cdot^{I})$,
where $\Delta$ is a non-empty set, called the domain of interpretation, and $\cdot^{I}$ is an interpretation function that assigns a
set $A^{I} \subseteq \Delta$ to each atomic concept $A$ and a binary relation $R^{I} \subseteq \Delta \times \Delta$ to each atomic role $R$. Other formulas are
interpreted as follows:
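\[ \top^{I} = \Delta, \quad \bot^{I} = \emptyset; \tag{8} \]
\[ (\neg A)^{I} = \Delta \setminus A^{I}, \quad (C \sqcap D)^{I} = C^{I} \cap D^{I}, \quad (C \sqcup D)^{I} = C^{I} \cup D^{I}; \tag{9} \]
\[ (\exists R.C)^{I} = \{ a \in \Delta \mid \exists b \in \Delta \, ((a, b) \in R^{I} \wedge b \in C^{I}) \}; \tag{10} \]
\[ (\forall R.C)^{I} = \{ a \in \Delta \mid \forall b \in \Delta \, ((a, b) \in R^{I} \rightarrow b \in C^{I}) \}. \tag{11} \]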
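To make the semantics in (8)–(11) concrete, the following Python sketch computes the extension of a concept over a small finite interpretation. The nested-tuple encoding of concepts, the function name `extension` and the sample domain are assumptions made for this illustration; the sketch evaluates concepts in one given interpretation and is not a DL reasoner.

```python
def extension(concept, domain, concepts, roles):
    """Return the set of domain elements that belong to `concept`.

    Hypothetical encoding: "TOP", "BOTTOM", an atomic concept name,
    ("not", C), ("and", C, D), ("or", C, D), ("exists", R, C), ("forall", R, C).
    """
    if concept == "TOP":
        return set(domain)                      # (8)
    if concept == "BOTTOM":
        return set()                            # (8)
    if isinstance(concept, str):
        return set(concepts[concept])           # atomic concept
    op, *args = concept
    if op == "not":                             # (9), complement
        return set(domain) - extension(args[0], domain, concepts, roles)
    if op in ("and", "or"):                     # (9), intersection and union
        left = extension(args[0], domain, concepts, roles)
        right = extension(args[1], domain, concepts, roles)
        return left & right if op == "and" else left | right
    if op in ("exists", "forall"):              # (10) and (11)
        role = roles[args[0]]
        filler = extension(args[1], domain, concepts, roles)
        result = set()
        for a in domain:
            successors = {b for (x, b) in role if x == a}
            if op == "exists" and successors & filler:
                result.add(a)
            if op == "forall" and successors <= filler:
                result.add(a)
        return result
    raise ValueError(f"unknown constructor: {op!r}")


# Hypothetical interpretation with three elements.
DOMAIN = {"a", "b", "c"}
CONCEPTS = {"Doc": {"a", "b"}, "Trusted": {"b"}}
ROLES = {"cites": {("a", "b"), ("b", "c")}}

print(sorted(extension(("exists", "cites", "Trusted"), DOMAIN, CONCEPTS, ROLES)))  # ['a']
print(sorted(extension(("forall", "cites", "Trusted"), DOMAIN, CONCEPTS, ROLES)))  # ['a', 'c']
```

Element `c` belongs to the universal restriction because it has no `cites` successors, so the implication in (11) holds vacuously.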
semantics is defined through the concept of interpretation. An interpretation is a pair of
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
, where Δ – is a non-empty set, called the domain of interpretation,
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
is an interpreting function that assigns the
measure
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
to each atomic A concept and R to each atomic role as an binary relation
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
. Other formulas
are interpreted as follows:
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
(8)
Програмні засоби аналітики даних
[Введите текст]
Integrity determines to what extent the IO is complete and correct from the point of view of the software object it
represents. Integrity contributes to increasing trust in the IO [13]. Accuracy of reproduction determines the degree of
accuracy of the reproduction of the IO of its original. For example, a text document reproducing an ancient book can
accurately reproduce the text and completely ignore its artistic design.
Timeliness indicates that the IO is introduced and updated on time, as this issue is specific to BD. This
characteristic evaluates how quickly the set ( ), ( ), ( )s m p m o m in IO is updated compared to the real state of affairs.
The characteristic is measured by the ratio of the actual delay time compared to the permissible one:
( , , )
exp
real timedelay
Timeliness IO s p o
ected timedelay
.
(7)
Origin is a characteristic of the quality of an IO. It indicates how well (correctly, completely, qualitatively) the
entire prehistory of the origin and change of an IO is presented, and how accurately and during what period it is
possible to trace the prehistory of the existence of an IO. This is an important characteristic since inference over
semantic data depends on the data itself. Understanding the historical information about the data helps to determine the
reasons for changing the system's behavior, which is not a trivial task in the BD environment.
Susceptibility indicates how easily a person can understand and accept IO. It can be used to analyze which set of
IO is most easily perceived by a group of persons due to the solved tasks.
Practical aspects of assessment of the quality of BD. One of the most challenging tasks in achieving data quality
metrics is the early detection of data-related problems. Typical problems include completeness, the integrity of data and
lack of contradictions. The problem lies in that in the conditions of the BD, the time to detect such issues may exceed
the time requirements for receiving a response to the information from the BD. That is why it is necessary to develop
methods that will allow the detection of such problems at an early stage. There are various approaches to deal with the
task, like the way to control all data entered into the system through the ontology. In practice, it is often not known what
the data model should be since the requirements for the BD system can change as the data increases. These
requirements can be constantly updated. This means that data previously entered into the BD management environment
in the previously specified structure may not correspond to the quality model after some time. Identifying these
problems due to the scale is a difficult problem.
One of the criteria of the quality model is the ability of BD to give a quick response to user requests. The most
effective method of increasing such speed is materialization [15]. Materialization can be used to improve performance
at query time by making the required information explicit in advance. Thus, recalculation of the necessary information
for each separate request is avoided. However, this method can be ineffective if there is excessive materialization.
Consider a certain graph of semantic data G in which the connections between concepts are built on the basis of
descriptive logic.
We will briefly describe the DL, which is the basis for all DL of the family. means «Attributive Language
with Complements». It is defined in [16]. The language is based on the previously introduced language AL (Attributive
Language), to which the addition constructor (negation) was added. Syntax describes a set of correctly constructed
language expressions, and semantics indicates their formal meaning.
Let
1
, . . . ,
m
CN A A і
1
, . . . ,
n
RN R R be finite, non-empty sets of atomic concepts and atomic
roles. The ALC syntax is defined as follows:
− M and L are concepts;
− an arbitrary atomic concept A is a concept;
− if C is an arbitrary concept, then C , C Dh and C Dg are concepts. he corresponding constructors are
called addition, intersection and union;
− if C is a concept, R is an atomic role, then .RCj and .RCi are arbitrary concepts.
semantics is defined through the concept of interpretation. An interpretation is a pair of . ), ( II ,
where Δ – is a non-empty set, called the domain of interpretation, Ia is an interpreting function that assigns the
measure ΔIA 8 to each atomic A concept and R to each atomic role as an binary relation Δ ΔR ×I8 . Other formulas are
interpreted as follows:
ΔI I= , =M L ; (8)
\ , ( ) ( ) ( ), I I I I I I I IA A C D C D C D C Dy h 1 g 2 (9)
{ | (( ) )}. , I IRC a b a b R b Cj 9 j 9 9 o 9 (10)
{ | (( ) )}. , I IRC a b a b R b Ci 9 i 9 9 9 (11)
Next, the essence of an $\mathcal{ALC}$ terminology (TBox) is outlined. However, all the introduced concepts carry over easily to other DLs.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific individuals (their membership in concepts and roles), DL offers a system of facts about individuals, or ABox. For this, a set of individual names $IN$ is introduced into the DL. There are two types of facts: a statement that an individual $a$ belongs to a concept $C$ (written as $C(a)$), and a statement that a pair of individuals $a$ and $b$ belongs to a role $R$ (written as $R(a, b)$).
A system of facts, or ABox, is a finite set of statements of the form $C(a)$ and $R(a, b)$, where $a, b \in IN$ are individuals, $C$ is an arbitrary concept and $R$ is a role.
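An ABox of this form maps naturally onto a small data structure. The sketch below is illustrative only and makes two simplifying assumptions that the article does not state: only atomic concepts are asserted, and every individual name is interpreted as itself. The type and function names are hypothetical.

```python
from typing import NamedTuple

class ConceptAssertion(NamedTuple):
    """A fact C(a)."""
    concept: str
    individual: str

class RoleAssertion(NamedTuple):
    """A fact R(a, b)."""
    role: str
    subject: str
    obj: str

def individuals(abox):
    """Collect the individual names (a subset of IN) mentioned in the ABox."""
    names = set()
    for fact in abox:
        if isinstance(fact, ConceptAssertion):
            names.add(fact.individual)
        else:
            names.update((fact.subject, fact.obj))
    return names

def satisfies(abox, concepts, roles):
    """Check that a finite interpretation makes every ABox fact true."""
    for fact in abox:
        if isinstance(fact, ConceptAssertion):
            if fact.individual not in concepts.get(fact.concept, set()):
                return False
        elif (fact.subject, fact.obj) not in roles.get(fact.role, set()):
            return False
    return True

# Illustrative ABox: Document(doc1), hasAuthor(doc1, alice).
abox = [
    ConceptAssertion("Document", "doc1"),
    RoleAssertion("hasAuthor", "doc1", "alice"),
]
names = individuals(abox)
ok = satisfies(abox, {"Document": {"doc1"}}, {"hasAuthor": {("doc1", "alice")}})
missing_role = satisfies(abox, {"Document": {"doc1"}}, {})
```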
Below are some $\mathcal{ALC}$ extensions that were used to fulfill the tasks of the dissertation work.
An R-follower is an individual that stands on the right-hand side of the role $R$. The set of R-followers of $e \in \Delta$ is denoted $R^{I}(e)$, where $R^{I}(e) = \{d \mid (e, d) \in R^{I}\}$. The cardinality of this set is denoted $|R^{I}(e)|$. The following constructors are called numerical role constraints. If $R$ is a role, $C$ is a concept and $n \geq 0$ is a natural number, then:
− $\leq 1R$ is a concept (functionality restriction);
− $\geq nR$ and $\leq nR$ are concepts (unqualified number restrictions);
− $\geq nR.C$ and $\leq nR.C$ are concepts (qualified number restrictions).
These constructors are interpreted as follows:
$$(\leq 1R)^{I} = \{e \mid |R^{I}(e)| \leq 1\}, \tag{12}$$
$$(\geq nR)^{I} = \{e \mid |R^{I}(e)| \geq n\}, \tag{13}$$
$$(\leq nR)^{I} = \{e \mid |R^{I}(e)| \leq n\}, \tag{14}$$
$$(\geq nR.C)^{I} = \{e \mid |R^{I}(e) \cap C^{I}| \geq n\}, \tag{15}$$
$$(\leq nR.C)^{I} = \{e \mid |R^{I}(e) \cap C^{I}| \leq n\}. \tag{16}$$
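Interpretations (12)-(16) amount to counting R-followers, optionally restricted to a filler concept. A minimal sketch for a finite interpretation follows; the function names and the sample role `hasAuthor` are hypothetical and serve only as an illustration.

```python
def followers(role_pairs, e):
    """R^I(e): all d such that (e, d) is in R^I."""
    return {d for (x, d) in role_pairs if x == e}

def count_followers(role_pairs, e, filler=None):
    """|R^I(e)|, or |R^I(e) ∩ C^I| when a filler extension C^I is given."""
    succ = followers(role_pairs, e)
    if filler is not None:
        succ &= filler
    return len(succ)

def at_least(n, role_pairs, domain, filler=None):
    """Extension of >= n R (unqualified) or >= n R.C (qualified)."""
    return {e for e in domain if count_followers(role_pairs, e, filler) >= n}

def at_most(n, role_pairs, domain, filler=None):
    """Extension of <= n R (unqualified) or <= n R.C (qualified)."""
    return {e for e in domain if count_followers(role_pairs, e, filler) <= n}

# Illustrative interpretation.
domain = {"doc1", "doc2", "alice", "bob"}
has_author = {("doc1", "alice"), ("doc1", "bob"), ("doc2", "alice")}
reviewer = {"bob"}

multi_author = at_least(2, has_author, domain)            # >= 2 hasAuthor
functional = at_most(1, has_author, domain)               # <= 1 hasAuthor
reviewed = at_least(1, has_author, domain, reviewer)      # >= 1 hasAuthor.Reviewer
```

Elements with no followers at all satisfy every "at most" restriction, which is why `functional` also contains the two authors.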
In order to describe the real world, it is sometimes necessary to describe concrete characteristics of an object, for example, the number of pages in an information resource. To solve this problem, a concrete domain with a fixed set of predicates is introduced (Lutz, 2002). A concrete domain is a pair $\mathcal{D} = (\Delta_{\mathcal{D}}, \Phi_{\mathcal{D}})$, where $\Delta_{\mathcal{D}}$ is a non-empty set and $\Phi_{\mathcal{D}}$ is a set of predicates over $\Delta_{\mathcal{D}}$. It is assumed that a set of predicate symbols $PN$ is given, where each predicate symbol $P \in PN$ is associated with an arity $n$ and is interpreted as an $n$-ary relation $P^{\mathcal{D}} \subseteq \Delta_{\mathcal{D}}^{n}$. It should be noted that $\Phi_{\mathcal{D}}$ always contains the unary predicate $\top_{\mathcal{D}}$, that is, $PN$ always includes the symbol $\top_{\mathcal{D}}$, which is interpreted as $\Delta_{\mathcal{D}}$. Also, $\Phi_{\mathcal{D}}$ is always closed with respect to the complement, that is, for every $n$-ary predicate symbol $P$ in $PN$ there is an $n$-ary predicate symbol $\overline{P}$ in $PN$, which is interpreted as $\Delta_{\mathcal{D}}^{n} \setminus P^{\mathcal{D}}$.
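The two closure requirements can be made concrete with a toy example. The sketch below is heavily simplified: it assumes a finite carrier set and unary predicates only, whereas real concrete domains such as the natural numbers are infinite. The class `ConcreteDomain` and the predicate `long` are hypothetical.

```python
class ConcreteDomain:
    """Toy concrete domain: a finite carrier set with unary predicates."""

    def __init__(self, carrier):
        self.carrier = frozenset(carrier)
        # The predicate that holds for every element is always present.
        self.predicates = {"top": self.carrier}

    def add_predicate(self, name, test):
        """Register a predicate together with its complement."""
        ext = frozenset(x for x in self.carrier if test(x))
        self.predicates[name] = ext
        self.predicates["not_" + name] = self.carrier - ext

    def holds(self, name, value):
        return value in self.predicates[name]

# Page counts of an information resource, limited to 0..1000 for the sketch.
pages = ConcreteDomain(range(0, 1001))
pages.add_predicate("long", lambda n: n > 300)

long_book = pages.holds("long", 500)
short_book = pages.holds("not_long", 120)
```

Registering the complement together with each predicate keeps the predicate set closed under complement by construction, so a reasoner never needs a separate negation for concrete values.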
Let a concrete domain $\mathcal{D}$ with a set of predicate symbols $PN$ be given. Also let finite sets of symbols be given: $CN$ are atomic concepts, $RN$ are atomic roles, $AF \subseteq RN$ are atomic abstract attributes, and $CF$ are atomic concrete attributes. A sequence $f_1 \ldots f_k h$ with $k \geq 1$, atomic abstract attributes $f_i \in AF$ and one concrete attribute $h \in CF$ will be called a complex concrete attribute.
Concepts of the logic $\mathcal{ALC}(\mathcal{D})$ are defined by the grammar (Lutz, 2002):
$$C, D ::= \top \mid \bot \mid A \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid \exists u_1, \ldots, u_n.P \tag{17}$$
where $A \in CN$, $R \in RN$, $u_1, \ldots, u_n$ are arbitrary attributes, and $P \in PN$ is an $n$-ary concrete predicate. The semantics of the logic $\mathcal{ALC}(\mathcal{D})$ is given by an interpretation $I = (\Delta, \cdot^{I})$ with the following additions:
- the sets $\Delta$ and $\Delta_{\mathcal{D}}$ must not intersect;
- each atomic abstract attribute $f \in AF$ is assigned a partial function $f^{I}: \Delta \rightarrow \Delta$;
. We denote the power of such a set by
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let a concrete domain $D$ with a set of predicate symbols $PN$ be given. Also let finite sets of symbols be given: $CN$ are atomic concepts, $RN$ are atomic roles, $AF \subseteq RN$ are atomic abstract attributes, and $CF$ are atomic concrete attributes. A sequence $f_1 \dots f_k h$ with $k \ge 1$, consisting of atomic abstract attributes $f_i \in AF$ and one concrete attribute $h \in CF$, is called a complex concrete attribute.
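A complex concrete attribute can be read as a path that is followed step by step. The Python sketch below assumes that every attribute is a partial function, modelled as a dictionary in which missing keys mean "undefined". The attribute names `has_issue` and `page_count` and the sample individuals are hypothetical.

```python
def eval_path(abstract_attrs, concrete_attr, start):
    """Follow f_1 ... f_k from `start`, then apply the concrete attribute h.

    Returns the concrete value, or None if any step of the path is undefined.
    """
    current = start
    for f in abstract_attrs:
        if current not in f:
            return None  # the partial function f is undefined here
        current = f[current]
    return concrete_attr.get(current)


# Hypothetical attributes: f_1 = has_issue (abstract), h = page_count (concrete).
has_issue = {"article1": "issue3"}
page_count = {"issue3": 120}
```

With these data, the path `has_issue page_count` starting at `article1` yields the value 120, while for an individual without an issue the path is undefined.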
Concepts of the logic are defined by the grammar (Lutz, 2002):
$$C ::= A \mid \top \mid \bot \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid \exists u_1, \dots, u_n.P, \tag{17}$$
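where $A \in CN$, $R \in RN$, $u_1, \dots, u_n$ are arbitrary attributes, and $P \in PN$ is an $n$-ary concrete predicate. The semantics of the logic is given by an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ with the following additions:
- the sets $\Delta^{\mathcal{I}}$ and $D$ must not intersect;
- each atomic abstract attribute $f \in AF$ is assigned a partial function $f^{\mathcal{I}} : \Delta^{\mathcal{I}} \to \Delta^{\mathcal{I}}$;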
. The fol-
lowing constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
–
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
is a concept for limitation of functionality;
–
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
is a concept for quantitative limitation;
–
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
is a concept for qualitative limitation.
The following constructors are interpreted as follows:
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
(12)
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
(13)
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
(14)
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
(15)
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
(16)
265
Програмні засоби аналітики даних
There are cases when it is necessary to describe specific characteristics of an object In order to describe the
real world, for example, the number of pages in an information resource. To solve this problem, a specific area with
a fixed set of predicates is created (Lutz, 2002). A concrete domain is a pair
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
Here are some ALC extensions that were used to fulfill the tasks of the dissertation work.
R-follower is an individual who is the right part of the role R. We denote the set of R-followers for e that can be
written as ( )IR e , where e : ( ) | ,I IR e d e d R . We denote the power of such a set by I|R |e . The
following constructors are called numerical role constraints. If R is a concept, n and 0 is a natural number, then:
− 1R is a concept for limitation of functionality;
− nR and nR is a concept for quantitative limitation;
− .nRC and .nRC is a concept for qualitative limitation.
The following constructors are interpreted as follows:
1 | ( ) 1
I IR e R e , (12)
| ( )
I InR e R e n
,
(13)
| ( )
I InR e R e n
,
(14)
. | ( )
I I InRC e R e C n
,
(15)
. | ( )
I I InRC e R e C n1
.
(16)
There are cases when it is necessary to describe specific characteristics of an object In order to describe the real
world, for example, the number of pages in an information resource. To solve this problem, a specific area with a fixed
set of predicates is created (Lutz, 2002). A concrete domain is a pair ,D , where D is a non-empty set and is
a set of predicates in the D . It can be assumed that given a set of predicate symbols PN where each predicate symbol
P PN is associated with an n arity and maps an n-relation to it as nP D8 . It should be noted that always
contains a single predicateD , that is PN always includes M symbol and is interpreted as DM . Also is always closed
with respect to the complement, that is for every n-predicate symbol P in PN there is an n-predicate symbol in P,
which is interpreted as \n P .
Let be a given concrete area D with a set of predicate symbols PN. Also let a finite set of symbols be given: CN
are atomic concepts, RN are atomic roles,AF RN8 are atomic abstract attributes, CF are atomic concrete attributes. A
sequence of
1 k
f f h з k≥1 with atomic abstract attributes
i
f AF and one concrete attributeh CF will be called a
complex, concrete attribute.
Concepts of logic are defined by grammar (Lutz, 2002):
1
| | | | | | |. . .
n
A C D C D RC RC u u PML y h g j i j
(17)
where A CN , R RN ,
1
,...,
n
u u are arbitrary attributes, P PN is the n-concrete predicate. The
semantics of logic is considered as .,( )II interpretation with the following additions:
- sets Δ and D must not intersect;
- each atomic abstract attribute f AF is assigned a partial function :If ;
, where D is a non-empty set
and Ф is a set of predicates in the D. It can be assumed that given a set of predicate symbols PN where each predi-
cate symbol
Програмні засоби аналітики даних
Next the essence of the (TBox ) terminology is revealed for DL . However, all introduced concepts are
easily transferred to other DL.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific
individuals (their belonging to concepts and roles), the DL offers a system of facts about individuals or ABox. For this, a
set of names of individuals is entered into the DL. There are two types of facts: a statement about an individual's
belonging to a concept (written as C a ); the statement about the belonging of a pair of individuals a and b and a
role (written as ,R a b ).
A system of facts or ABox is a finite set of statements of form C a and ,R a b , where a and b IN are
individuals, C is an arbitrary concept and R is a role.
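Below are some extensions of $\mathcal{ALC}$ that were used to accomplish the tasks of the dissertation work.
An R-follower of an individual $e$ is an individual that occurs as the right-hand component of a pair in the role $R$ whose left-hand component is $e$. The set of R-followers of $e \in \Delta^{\mathcal{I}}$ is denoted $R^{\mathcal{I}}(e)$, where $R^{\mathcal{I}}(e) = \{ d \mid (e, d) \in R^{\mathcal{I}} \}$. The cardinality of this set is denoted $|R^{\mathcal{I}}(e)|$. The following constructors are called numerical role constraints. If $R$ is a role, $C$ is a concept and $n \geq 0$ is a natural number, then:
− $\leq 1\,R$ is a concept for a functionality restriction;
− $\geq n\,R$ and $\leq n\,R$ are concepts for quantitative restrictions (unqualified number restrictions);
− $\geq n\,R.C$ and $\leq n\,R.C$ are concepts for qualitative restrictions (qualified number restrictions).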
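These constructors are interpreted as follows:
$$(\leq 1\,R)^{\mathcal{I}} = \{\, e \in \Delta^{\mathcal{I}} \mid |R^{\mathcal{I}}(e)| \leq 1 \,\}, \tag{12}$$
$$(\geq n\,R)^{\mathcal{I}} = \{\, e \in \Delta^{\mathcal{I}} \mid |R^{\mathcal{I}}(e)| \geq n \,\}, \tag{13}$$
$$(\leq n\,R)^{\mathcal{I}} = \{\, e \in \Delta^{\mathcal{I}} \mid |R^{\mathcal{I}}(e)| \leq n \,\}, \tag{14}$$
$$(\geq n\,R.C)^{\mathcal{I}} = \{\, e \in \Delta^{\mathcal{I}} \mid |R^{\mathcal{I}}(e) \cap C^{\mathcal{I}}| \geq n \,\}, \tag{15}$$
$$(\leq n\,R.C)^{\mathcal{I}} = \{\, e \in \Delta^{\mathcal{I}} \mid |R^{\mathcal{I}}(e) \cap C^{\mathcal{I}}| \leq n \,\}. \tag{16}$$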
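Over a finite interpretation, the semantics of the numerical role constraints can be evaluated directly by counting R-followers. The sketch below is an illustration under stated assumptions: the function names and the sample domain, role and concept extensions are hypothetical, and a role extension is modelled as a set of pairs.

```python
def followers(role_ext, e):
    """R-followers of e: all d such that (e, d) is in the role extension."""
    return {d for (x, d) in role_ext if x == e}


def _count(role_ext, e, concept_ext=None):
    """Number of R-followers of e, optionally restricted to a concept extension."""
    succ = followers(role_ext, e)
    if concept_ext is not None:
        succ &= concept_ext
    return len(succ)


def at_least(n, role_ext, domain, concept_ext=None):
    """Extension of (>= n R), or of (>= n R.C) when concept_ext is given."""
    return {e for e in domain if _count(role_ext, e, concept_ext) >= n}


def at_most(n, role_ext, domain, concept_ext=None):
    """Extension of (<= n R), or of (<= n R.C) when concept_ext is given."""
    return {e for e in domain if _count(role_ext, e, concept_ext) <= n}


# Hypothetical finite interpretation.
domain = {"doc1", "doc2", "alice", "bob"}
has_author = {("doc1", "alice"), ("doc1", "bob"), ("doc2", "alice")}
professor = {"alice"}

two_or_more_authors = at_least(2, has_author, domain)          # {"doc1"}
functional = at_most(1, has_author, domain)                    # {"doc2", "alice", "bob"}
has_professor_author = at_least(1, has_author, domain, professor)  # {"doc1", "doc2"}
```

Individuals with no R-followers at all satisfy every "at most" restriction, which is why `alice` and `bob` appear in `functional`.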
In some cases, describing the real world requires describing concrete characteristics of an object, for example the number of pages in an information resource. To solve this problem, a concrete domain with a fixed set of predicates is introduced (Lutz, 2002). A concrete domain $\mathcal{D}$ is a pair $(\Delta_{\mathcal{D}}, \Phi)$, where $\Delta_{\mathcal{D}}$ is a non-empty set and $\Phi$ is a set of predicates over $\Delta_{\mathcal{D}}$. It is assumed that a set of predicate symbols PN is given, where each predicate symbol $P \in PN$ is associated with an arity $n$ and $\Phi$ assigns to it an n-ary relation $P^{\mathcal{D}} \subseteq \Delta_{\mathcal{D}}^{n}$. Note that $\Phi$ always contains the unary predicate $\top_{\mathcal{D}}$, that is, PN always includes the symbol $\top_{\mathcal{D}}$, which is interpreted as $\Delta_{\mathcal{D}}$. In addition, $\Phi$ is always closed under complement, that is, for every n-ary predicate symbol $P$ in PN there is an n-ary predicate symbol $\overline{P}$ in PN, which is interpreted as $\Delta_{\mathcal{D}}^{n} \setminus P^{\mathcal{D}}$.
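To make the two closure requirements concrete, here is a minimal sketch of a concrete domain over a small finite set. This is an assumption made purely for illustration (typical concrete domains such as the integers are infinite), and the function name, the `top` and `not_` naming scheme and the sample predicates are hypothetical.

```python
from itertools import product


def close_under_complement(delta, predicates):
    """Add the unary top predicate and a complement for every predicate.

    `predicates` maps a predicate symbol to (arity, extension), where the
    extension is a set of tuples over `delta`. A complement of `P` is stored
    under the name `not_P`; `P` itself serves as the complement of `not_P`.
    """
    closed = dict(predicates)
    closed["top"] = (1, {(x,) for x in delta})
    for name, (arity, ext) in list(closed.items()):
        universe = set(product(delta, repeat=arity))
        closed["not_" + name] = (arity, universe - ext)
    return closed


# Hypothetical finite carrier set, e.g. small page counts.
delta = set(range(6))
predicates = {
    "less_than": (2, {(x, y) for x in delta for y in delta if x < y}),
    "is_zero": (1, {(0,)}),
}
phi = close_under_complement(delta, predicates)
```

With explicit extensions, the complement of an n-ary predicate is simply the set difference between all n-tuples over the carrier set and the predicate's own extension.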
Let a concrete domain $\mathcal{D}$ with a set of predicate symbols PN be given. Also let the following finite sets of symbols be given: CN, the atomic concepts; RN, the atomic roles; $AF \subseteq RN$, the atomic abstract attributes; and CF, the atomic concrete attributes. A sequence $f_1 \dots f_k h$ with $k \geq 1$, consisting of atomic abstract attributes $f_i \in AF$ and one concrete attribute $h \in CF$, is called a complex concrete attribute.
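The concepts of the logic are defined by the grammar (Lutz, 2002):
$$C, D ::= \top \mid \bot \mid A \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid \exists u_1, \dots, u_n.P \tag{17}$$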
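where $A \in CN$, $R \in RN$, $u_1, \dots, u_n$ are arbitrary attributes, and $P \in PN$ is an n-ary concrete predicate. The semantics of the logic is given by an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ with the following additions:
- the sets $\Delta^{\mathcal{I}}$ and $\Delta_{\mathcal{D}}$ must not intersect;
- each atomic abstract attribute $f \in AF$ is assigned a partial function $f^{\mathcal{I}} : \Delta^{\mathcal{I}} \to \Delta^{\mathcal{I}}$;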
(17)
where
Програмні засоби аналітики даних
Next, the essence of a terminology (TBox) is described for the DL $\mathcal{ALC}$. However, all the concepts introduced carry over easily to other DLs.
Terminologies describe general knowledge about concepts and roles. To describe knowledge about specific individuals (their membership in concepts and roles), a DL offers a system of facts about individuals, or ABox. For this purpose, a set IN of individual names is introduced into the DL. There are two types of facts: a statement that an individual belongs to a concept, written $C(a)$, and a statement that a pair of individuals $a$ and $b$ belongs to a role, written $R(a, b)$.
A system of facts, or ABox, is a finite set of statements of the form $C(a)$ and $R(a, b)$, where $a, b \in IN$ are individuals, $C$ is an arbitrary concept and $R$ is a role.
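The ABox definition above can be sketched as a plain data structure. This is only an illustration: the class names `ConceptAssertion` and `RoleAssertion`, the helper functions, and the individuals `doc1` and `alice` are assumptions introduced here and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptAssertion:
    """A fact C(a): individual a belongs to concept C."""
    concept: str
    individual: str


@dataclass(frozen=True)
class RoleAssertion:
    """A fact R(a, b): the pair (a, b) belongs to role R."""
    role: str
    subject: str
    obj: str


# A hypothetical ABox: a finite set of facts about individuals.
abox = {
    ConceptAssertion("Article", "doc1"),
    ConceptAssertion("Author", "alice"),
    RoleAssertion("hasAuthor", "doc1", "alice"),
}


def individuals(facts):
    """Collect the individual names (the set IN) mentioned in the facts."""
    names = set()
    for fact in facts:
        if isinstance(fact, ConceptAssertion):
            names.add(fact.individual)
        else:
            names.update((fact.subject, fact.obj))
    return names


def instances_of(facts, concept):
    """Individuals explicitly asserted to belong to the given concept."""
    return {
        f.individual
        for f in facts
        if isinstance(f, ConceptAssertion) and f.concept == concept
    }
```

Here `instances_of` only reads the asserted facts; it performs no reasoning over a TBox.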
Below are some extensions of $\mathcal{ALC}$ that were used to accomplish the tasks of the dissertation work.
An R-follower is an individual that stands on the right-hand side of the role $R$. The set of R-followers of an element $e \in \Delta^I$ is denoted by $R^I(e)$:
$$R^I(e) = \{ d \mid (e, d) \in R^I \}.$$
The cardinality of this set is denoted by $|R^I(e)|$. The following constructors are called numerical role constraints (number restrictions). If $R$ is a role, $C$ is a concept and $n \ge 0$ is a natural number, then:
- $(\le 1\, R)$ is a concept (functionality restriction);
- $(\le n\, R)$ and $(\ge n\, R)$ are concepts (unqualified number restrictions);
- $(\le n\, R.C)$ and $(\ge n\, R.C)$ are concepts (qualified number restrictions).
These constructors are interpreted as follows:
$$(\le 1\, R)^I = \{ e \in \Delta^I \mid |R^I(e)| \le 1 \}, \tag{12}$$
$$(\le n\, R)^I = \{ e \in \Delta^I \mid |R^I(e)| \le n \}, \tag{13}$$
$$(\ge n\, R)^I = \{ e \in \Delta^I \mid |R^I(e)| \ge n \}, \tag{14}$$
$$(\le n\, R.C)^I = \{ e \in \Delta^I \mid |R^I(e) \cap C^I| \le n \}, \tag{15}$$
$$(\ge n\, R.C)^I = \{ e \in \Delta^I \mid |R^I(e) \cap C^I| \ge n \}. \tag{16}$$
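The number restrictions can be evaluated directly over a finite interpretation. The sketch below assumes that a role extension is a set of pairs and a concept extension is a set of elements; the function names and the sample data (`doc1`, `has_author`, `person`) are hypothetical.

```python
def followers(role_ext, e):
    """R^I(e): all d such that (e, d) is in the role extension."""
    return {d for (x, d) in role_ext if x == e}


def _counted(role_ext, e, concept_ext):
    """Followers of e, optionally restricted to a concept extension C^I."""
    found = followers(role_ext, e)
    return found if concept_ext is None else found & concept_ext


def at_most(n, role_ext, domain, concept_ext=None):
    """Extension of (<= n R), or of (<= n R.C) when concept_ext is given."""
    return {e for e in domain if len(_counted(role_ext, e, concept_ext)) <= n}


def at_least(n, role_ext, domain, concept_ext=None):
    """Extension of (>= n R), or of (>= n R.C) when concept_ext is given."""
    return {e for e in domain if len(_counted(role_ext, e, concept_ext)) >= n}


# Hypothetical finite interpretation.
domain = {"doc1", "doc2", "a", "b", "c"}
has_author = {("doc1", "a"), ("doc1", "b"), ("doc2", "c")}
person = {"a", "c"}

functional = at_most(1, has_author, domain)            # (<= 1 hasAuthor)
two_or_more = at_least(2, has_author, domain)          # (>= 2 hasAuthor)
one_person = at_least(1, has_author, domain, person)   # (>= 1 hasAuthor.Person)
```

Elements with no followers at all satisfy every "at most" restriction, which is why `a`, `b` and `c` appear in `functional`.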
Sometimes, in order to describe the real world, it is necessary to describe concrete characteristics of an object, for example the number of pages in an information resource. To solve this problem, a concrete domain with a fixed set of predicates is introduced (Lutz, 2002). A concrete domain is a pair $D = (\Delta_D, \Phi_D)$, where $\Delta_D$ is a non-empty set and $\Phi_D$ is a set of predicates over $\Delta_D$. It is assumed that a set of predicate symbols PN is given, where each predicate symbol $P \in PN$ has an arity $n$ and is assigned an n-ary relation $P^D \subseteq \Delta_D^n$. Note that $\Phi_D$ always contains the unary predicate $\top_D$; that is, PN always includes the symbol $\top_D$, which is interpreted as $\Delta_D$. In addition, $\Phi_D$ is always closed under complement: for every n-ary predicate symbol $P \in PN$ there is an n-ary predicate symbol $\overline{P} \in PN$, which is interpreted as $\Delta_D^n \setminus P^D$.
Let a concrete domain $D$ with a set of predicate symbols PN be given. Also let finite sets of symbols be given: CN of atomic concepts, RN of atomic roles, $AF \subseteq RN$ of atomic abstract attributes, and CF of atomic concrete attributes. A sequence $u = f_1 \dots f_k h$, $k \ge 1$, of atomic abstract attributes $f_i \in AF$ followed by one concrete attribute $h \in CF$ is called a composite concrete attribute.
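A concrete domain can be sketched in code as a set of named predicates that is closed under complement. The sketch assumes that $\Delta_D$ is the set of integers (for instance, page counts). The `Predicate` class, the predicate names `gt_100` and `less`, and the helper `close_under_complement` are illustrative assumptions, not part of the article or of (Lutz, 2002).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Predicate:
    """A predicate symbol with its arity and its interpretation over Delta_D."""
    name: str
    arity: int
    holds: Callable[..., bool]

    def complement(self):
        """The predicate interpreted as Delta_D^n minus P^D."""
        return Predicate(
            f"not_{self.name}", self.arity, lambda *xs: not self.holds(*xs)
        )


# Assumed predicates over integers (Delta_D is taken to be the integers).
TOP_D = Predicate("top_D", 1, lambda x: True)        # holds everywhere
GT_100 = Predicate("gt_100", 1, lambda x: x > 100)   # "more than 100 pages"
LESS = Predicate("less", 2, lambda x, y: x < y)


def close_under_complement(predicates):
    """Build the signature PN: every predicate together with its complement."""
    closed = {}
    for p in predicates:
        closed[p.name] = p
        c = p.complement()
        closed[c.name] = c
    return closed


PN = close_under_complement([TOP_D, GT_100, LESS])
```

Representing predicates as functions rather than as explicit sets of tuples keeps the sketch usable for an infinite $\Delta_D$.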
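Concepts of the logic $\mathcal{ALC}(D)$ are defined by the grammar (Lutz, 2002):
$$\top \mid \bot \mid A \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid \exists u_1, \dots, u_n.P \tag{17}$$
where $A \in CN$, $R \in RN$, $u_1, \dots, u_n$ are arbitrary attributes, and $P \in PN$ is an n-ary concrete predicate. The semantics of the logic $\mathcal{ALC}(D)$ is given by an interpretation $I = (\Delta^I, \cdot^I)$ with the following additions:
- the sets $\Delta^I$ and $\Delta_D$ must be disjoint;
- each atomic abstract attribute $f \in AF$ is assigned a partial function $f^I : \Delta^I \to \Delta^I$;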
- each atomic concrete attribute $h \in CF$ is assigned a partial function $h^I : \Delta^I \to \Delta_D$.
A composite concrete attribute $u = f_1 \dots f_k h$ is interpreted as the composition of partial functions $u^I(x) = h^I(f_k^I(\dots f_1^I(x) \dots))$. As a result, a partial function $u^I : \Delta^I \to \Delta_D$ is obtained.
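The composition of partial functions can be illustrated with dictionaries, where a missing key means "undefined". This is a minimal sketch: the function `apply_path` and the sample attributes `has_main_part` and `page_count` are assumptions made for illustration. The sketch also assumes that `None` is never a legitimate value in $\Delta_D$, so it can stand for "undefined".

```python
def apply_path(abstract_attrs, concrete_attr, x):
    """Compute u^I(x) = h^I(f_k^I(... f_1^I(x) ...)).

    abstract_attrs is the list [f_1, ..., f_k] of partial functions on
    Delta^I, concrete_attr is h, a partial function into Delta_D.
    Returns None when any function along the path is undefined.
    """
    current = x
    for f in abstract_attrs:          # f_1 is applied first, f_k last
        if current not in f:
            return None
        current = f[current]
    return concrete_attr.get(current)


# Hypothetical partial functions.
has_main_part = {"book1": "volume1"}   # abstract attribute f_1
page_count = {"volume1": 320}          # concrete attribute h

pages_of_book = apply_path([has_main_part], page_count, "book1")
pages_of_volume = apply_path([has_main_part], page_count, "volume1")
```

`pages_of_volume` is undefined because `has_main_part` has no value at `volume1`, even though `page_count` does.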
The only new type of concept (compared to $\mathcal{ALC}$) is interpreted as follows:
$$(\exists u_1, \dots, u_n.P)^I = \{ e \in \Delta^I \mid \exists x_1, \dots, x_n \in \Delta_D : u_1^I(e) = x_1, \dots, u_n^I(e) = x_n,\ (x_1, \dots, x_n) \in P^D \}. \tag{18}$$
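Over a finite interpretation, the extension of $\exists u_1, \dots, u_n.P$ can be computed by evaluating every path at every element. In the sketch below a path is a tuple of dictionaries whose last entry is the concrete attribute. The names `path_value` and `exists_paths` and all sample individuals are hypothetical; `ram3` in particular is an invented component, used only to obtain a failing case.

```python
def path_value(path, e):
    """Value of a composite concrete attribute at e, or None if undefined."""
    *abstract_attrs, concrete_attr = path
    current = e
    for f in abstract_attrs:
        if current not in f:
            return None
        current = f[current]
    return concrete_attr.get(current)


def exists_paths(paths, predicate, domain):
    """Extension of (exists u_1, ..., u_n . P) over a finite domain."""
    result = set()
    for e in domain:
        values = [path_value(p, e) for p in paths]
        # Every u_i must be defined at e and the tuple must satisfy P.
        if all(v is not None for v in values) and predicate(*values):
            result.add(e)
    return result


# Hypothetical data: a "build" has a RAM module and a motherboard.
domain = {"build1", "build2", "ram1", "ram3", "board1"}
has_ram = {"build1": "ram1", "build2": "ram3"}
has_board = {"build1": "board1", "build2": "board1"}
ram_size = {"ram1": 32, "ram3": 64}
max_memory = {"board1": 32}


def fits(size, limit):
    """Assumed binary concrete predicate: the RAM size is within the limit."""
    return size <= limit


valid_builds = exists_paths(
    [(has_ram, ram_size), (has_board, max_memory)], fits, domain
)
```

Elements such as `ram1`, for which the paths are undefined, are excluded, which matches the existential reading of equation (18).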
The set of points at which the attribute $u$ is defined is expressed by the concept $\exists u.\top_D$, where $\top_D$ is the special predicate that is always present in the signature PN. The following equivalence holds:
$$\neg \exists u_1, \dots, u_n.P \equiv \neg \exists u_1.\top_D \sqcup \dots \sqcup \neg \exists u_n.\top_D \sqcup \exists u_1, \dots, u_n.\overline{P} \tag{19}$$
Indeed, the condition $e \in (\neg \exists u_1, \dots, u_n.P)^I$ means that either one of the functions $u_i^I$ is undefined at the point $e$, or the tuple $(u_1^I(e), \dots, u_n^I(e))$ does not belong to the predicate $P$ and therefore belongs to its complement $\overline{P}$.
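The argument above can be checked mechanically on a small finite interpretation: the elements outside $\exists u_1, \dots, u_n.P$ should be exactly those where some path is undefined or where the tuple of values satisfies the complement of $P$. The following sketch performs this check. All names and data are hypothetical, and a check on a single example illustrates the equivalence without proving it.

```python
def path_value(path, e):
    """Value of a composite concrete attribute at e, or None if undefined."""
    *abstract_attrs, concrete_attr = path
    current = e
    for f in abstract_attrs:
        if current not in f:
            return None
        current = f[current]
    return concrete_attr.get(current)


def exists_paths(paths, predicate, domain):
    """Extension of (exists u_1, ..., u_n . P) over a finite domain."""
    result = set()
    for e in domain:
        values = [path_value(p, e) for p in paths]
        if all(v is not None for v in values) and predicate(*values):
            result.add(e)
    return result


def negation_both_ways(paths, predicate, domain):
    """Return both sides of the equivalence as sets of elements."""
    left = domain - exists_paths(paths, predicate, domain)
    # Right side: the complement predicate holds, or some path is undefined.
    right = exists_paths(paths, lambda *xs: not predicate(*xs), domain)
    for p in paths:
        defined = exists_paths([p], lambda x: True, domain)  # exists u_i.top_D
        right |= domain - defined
    return left, right


# Hypothetical interpretation.
domain = {"build1", "build2", "ram1", "ram3", "board1"}
has_ram = {"build1": "ram1", "build2": "ram3"}
has_board = {"build1": "board1", "build2": "board1"}
ram_size = {"ram1": 32, "ram3": 64}
max_memory = {"board1": 32}
paths = [(has_ram, ram_size), (has_board, max_memory)]

left, right = negation_both_ways(paths, lambda s, m: s <= m, domain)
```

In this example `build2` falls on the right-hand side because its tuple satisfies the complement predicate, while `ram1`, `ram3` and `board1` fall there because the paths are undefined.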
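So, the graph G that we have is given by BD:
$$\top \mid \bot \mid A \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid {\le 1\, R} \mid {\le n\, R} \mid {\ge n\, R} \mid {\le n\, R.C} \mid {\ge n\, R.C} \mid \exists u_1, \dots, u_n.P \tag{20}$$
When a materialization is built, rules are set according to which it must be built.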
Consider the problem of excessive materialization, which can be caused by the following way of constructing concepts. Take, for example, two computer components: a motherboard and RAM. The concept that determines the compatibility of these two components is defined as follows:
$$RamDDR4\_2\_MainboardDDR4 \equiv (Memory \sqcap \exists hasSlotType.DDR4) \sqcup (Mainboard \sqcap (\ge 1\, hasSlotType.DDR4)) \tag{21}$$
| RAM | Main Board | Slots |
|---|---|---|
| Ram Model 1 DDR4 | MainBoard Model 1 DDR4 | 2 |
| Ram Model 2 DDR4 | MainBoard Model 2 DDR4 | 4 |
As a result of the materialization, we obtain a graph G that contains $C_6^2 = 15$ possible combinations, which determine the concept $RamDDR4\_2\_MainboardDDR4$. If we also take into account that the motherboard limits the maximum amount of RAM it supports, the real situation becomes even more complicated.
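The count $C_6^2 = 15$ and its growth can be reproduced with the standard library. The article does not list the six elements being combined, so the labels `item1` to `item6` below are placeholders, and the larger sizes 60 and 600 are chosen here only to show the quadratic growth.

```python
from itertools import combinations
from math import comb


def pair_count(n):
    """Number of unordered pairs among n elements: C(n, 2) = n(n - 1)/2."""
    return comb(n, 2)


# Placeholder labels; the source does not name the six elements.
items = [f"item{i}" for i in range(1, 7)]
pairs = list(combinations(items, 2))

# Pair counts for growing numbers of elements (sizes chosen for illustration).
growth = {n: pair_count(n) for n in (6, 60, 600)}
```

A tenfold increase in the number of elements raises the number of pairs roughly a hundredfold, which is one way to see why the set of materialized relationships grows so quickly.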
| RAM | RAM Size | Main Board | RAM Slots | Max Memory support |
|---|---|---|---|---|
| Ram Model 1 DDR4 | 32 | MainBoard Model 1 DDR4 | 2 | 32 |
| Ram Model 2 DDR4 | 12 | MainBoard Model 2 DDR4 | 4 | 128 |
Such a dependence means that even with a small number of components, the knowledge base representation system has to store a huge number of relationships that determine the materialization. Accordingly, inference on such a graph is very slow because of the huge number of combinations that form the nodes of the graph available for search. As stated in (Lutz, 2002), such an inference problem belongs to the PSpace class. This means that the complexity depends on the size of the input data, so to solve the problems of inference and concept satisfiability it is necessary to reduce the set of input data. To avoid this problem, it is proposed to divide the knowledge base,
which traditionally consists of TBox and ABox into two components, so that the subject area is described DL
and then (Pic. 4.2)
it is necessary to reduce the set of input data. To avoid such a problem, it is proposed to divide the knowledge base,
which traditionally consists of TBox and ABox into two components, so that the subject area is described DL
and then (Pic. 4.2)
is formed.
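The composition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: partial functions are modelled as callables that return `None` where they are undefined, and the names `compose_path`, `installed_ram` and `size_gb` are hypothetical and do not come from the article.

```python
from typing import Callable, Hashable, Optional, Sequence

# A partial function returns None wherever it is undefined (modelling assumption).
Partial = Callable[[Hashable], Optional[Hashable]]


def compose_path(features: Sequence[Partial], concrete: Partial) -> Partial:
    """Build u = f_1 ... f_k h: apply f_1, ..., f_k in order, then h."""

    def u(x: Hashable) -> Optional[Hashable]:
        for f in features:
            x = f(x)
            if x is None:  # undefined at some step, so u is undefined at the start point
                return None
        return concrete(x)

    return u


# Hypothetical interpretation: one abstract feature and one concrete feature.
installed_ram: Partial = {"pc1": "ram1"}.get
size_gb: Partial = {"ram1": 32}.get

u = compose_path([installed_ram], size_gb)
print(u("pc1"))  # 32
print(u("pc2"))  # None
```

The composite attribute is undefined as soon as any function in the chain is undefined, which is why the result is again a partial function.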
The only new type of concept (compared to the base logic) is interpreted as follows:

$$(\exists u_1, \dots, u_n . P)^I = \{\, e \in \Delta^I \mid \exists x_1, \dots, x_n \in \Delta_D : u_1^I(e) = x_1, \dots, u_n^I(e) = x_n, \ (x_1, \dots, x_n) \in P^D \,\}. \qquad (18)$$
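To make definition (18) concrete, the following Python sketch evaluates the extension of $\exists u_1, \dots, u_n . P$ over a small finite interpretation. It assumes that attributes are callables returning `None` where they are undefined and that the predicate is an ordinary Boolean function; `exists_concept`, the board names and the memory values are illustrative assumptions.

```python
from typing import Callable, Hashable, Iterable, Optional, Sequence, Set

Attribute = Callable[[Hashable], Optional[object]]


def exists_concept(
    domain: Iterable[Hashable],
    attributes: Sequence[Attribute],
    predicate: Callable[..., bool],
) -> Set[Hashable]:
    """Elements e where every u_i(e) is defined and P(u_1(e), ..., u_n(e)) holds."""
    result = set()
    for e in domain:
        values = [u(e) for u in attributes]
        if all(v is not None for v in values) and predicate(*values):
            result.add(e)
    return result


# Hypothetical data: "board3" has no recorded maximum memory.
domain = {"board1", "board2", "board3"}
max_memory_gb: Attribute = {"board1": 32, "board2": 128}.get

at_least_64 = lambda x: x >= 64
print(exists_concept(domain, [max_memory_gb], at_least_64))  # {'board2'}
```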
The set of points on which the attribute $u$ is defined is expressed by the concept $\exists u . \top_D$, where $\top_D$ is a specific predicate that is always present in the PN signature. The following equivalence is valid:

$$\neg \exists u_1, \dots, u_n . P \;\equiv\; \exists u_1, \dots, u_n . \overline{P} \;\sqcup\; \neg \exists u_1 . \top_D \;\sqcup\; \dots \;\sqcup\; \neg \exists u_n . \top_D. \qquad (19)$$
Indeed, the condition $e \in (\neg \exists u_1, \dots, u_n . P)^I$ means that either one of the functions $u_i^I$ is undefined at the point $e$, or the tuple $(u_1^I(e), \dots, u_n^I(e))$ does not belong to the predicate $P^D$ but belongs to its complement $\overline{P}^D$.
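Equivalence (19) can be checked mechanically on a finite example. The sketch below uses a hypothetical interpretation with two attributes and a binary predicate; all names and values are assumptions chosen for illustration, and a missing dictionary key stands for "undefined".

```python
domain = {"a", "b", "c", "d"}

# Two hypothetical concrete attributes; a missing key means "undefined".
u1 = {"a": 8, "b": 16, "c": 4}.get
u2 = {"a": 8, "b": 4, "d": 2}.get


def P(x, y):
    """Binary predicate P: the first value does not exceed the second."""
    return x <= y


def defined(e):
    """True when both attributes are defined at e."""
    return u1(e) is not None and u2(e) is not None


# Left-hand side of (19): the complement of the extension of "exists u1, u2 . P".
exists_P = {e for e in domain if defined(e) and P(u1(e), u2(e))}
lhs = domain - exists_P

# Right-hand side: the tuple is in the complement of P, or u1 or u2 is undefined.
exists_not_P = {e for e in domain if defined(e) and not P(u1(e), u2(e))}
undefined_u1 = {e for e in domain if u1(e) is None}
undefined_u2 = {e for e in domain if u2(e) is None}
rhs = exists_not_P | undefined_u1 | undefined_u2

print(lhs == rhs)  # True
```

Here `b` falls into the complement of the predicate, while `c` and `d` are covered only by the "undefined" disjuncts, which is why those disjuncts cannot be dropped.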
So, the graph G that we have is given by BD
$$\top \mid \bot \mid A \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \exists R.C \mid \forall R.C \mid \geq nR \mid \leq nR \mid \geq nR.C \mid \leq nR.C \mid \exists u_1, \dots, u_n . P. \qquad (20)$$
When a materialization is built, rules are set according to which it is constructed.
Consider the problem of excessive materialization, which can be caused by the following way of constructing concepts. For example, take two computer components, a motherboard and RAM. The concept that determines the compatibility of these two components is defined as follows:
Програмні засоби аналітики даних
[Введите текст]
- each atomic abstract attribute h CF is assigned a partial function :If D .
A composite concrete attribute
1
k
u f f h is interpreted as a composition of partial functions
1
(I I I I
k
u x h f f x . As a result, a partial function :Iu D is formed.
The only new (compared to ) type of concept is interpreted as follows:
1 1 1
1 1
( ) { |
}
. :
, ,
I I
n n
I D
n n n
u u P e x x D u e
x u e x x x P
j j
o o o
.
(18)
The set of points on which the attribute u is defined is expressed by the concept u., where is a specific
predicate that is always present in the PN signature. The following equivalence is valid:
1 1 1 1
, , . . . , , . .
n n
u u P u u u u Py y Mg h y Mg y
(19)
Indeed, the condition
1
( ), , . I
n
e u u Py j means that either one of functions I
i
u is undefined at point е or
the tuple
1
, ,I I
n
u e u e does not belong to the predicate P , P, but belongs to its complement Py .
So, the G graph we have is given by BD
1
| | | | | | | 1 |
| | . | . |
. .
.
n
R
nR P
A C D C D RC R
C
C
un n uR nRC R
ML y h g j i
j
.
(20)
When building a materialization, rules are set according to which it should be built.
Consider the problem of excessive materialization, which can be caused by the following way of constructing
concepts. For example, let's take the computer components motherboard and RAM. The concept that will determine the
compatibility of these two components will be defined as follows:
4_2_ 4
) ( ).( 4 1 . 4hasSlotTypeDDR
R
h
amDDR Mai
a
mboardDDR
eMemory MainboarsSlotTyp DDR dg h g
(21)
RAM Main Board Slots
Ram Model 1 DDR4 MainBoard Model 1 DDR 4 2
Ram Model 2 DDR4 MainBoard Model 2 DDR 4 4
As a result of the materialization, we obtain a graph $G$ with $C_6^2 = 15$ possible combinations that determine the concept $\mathit{RamDDR4\_2\_MainboardDDR4}$. If we also take into account that the motherboard limits the maximum amount of RAM it supports, the real situation becomes even more complicated.
| RAM | RAM type | RAM size | Main board | Slot type | RAM slots | Max memory support |
|---|---|---|---|---|---|---|
| Ram Model 1 | DDR4 | 32 | MainBoard Model 1 | DDR4 | 2 | 32 |
| Ram Model 2 | DDR4 | 12 | MainBoard Model 2 | DDR4 | 4 | 128 |
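The figure of 15 combinations and the effect of the extra memory constraint can be illustrated with a short Python sketch. It rests on several assumptions that the article does not state: the number six is read as six individuals taking part in the materialization, sizes in the table are treated as being in the same unit, and a configuration is taken to be a choice of RAM model, board and number of identical modules installed.

```python
from itertools import combinations
from math import comb

# Assumption: six individuals take part in the materialization.
individuals = [f"x{i}" for i in range(1, 7)]
pairs = list(combinations(individuals, 2))
pair_count = len(pairs)  # number of unordered pairs, i.e. C(6, 2)

# The number of pairs grows quadratically with the number of individuals.
growth = {n: comb(n, 2) for n in (6, 60, 600)}

# Data copied from the second table: (type, size) and (type, slots, max memory).
rams = {
    "Ram Model 1": ("DDR4", 32),
    "Ram Model 2": ("DDR4", 12),
}
boards = {
    "MainBoard Model 1": ("DDR4", 2, 32),
    "MainBoard Model 2": ("DDR4", 4, 128),
}

# A configuration (ram, board, k) is valid when the types match, k modules
# fit into the slots, and the total size does not exceed the board's limit.
configurations = [
    (ram, board, k)
    for ram, (ram_type, size) in rams.items()
    for board, (slot_type, slots, max_memory) in boards.items()
    for k in range(1, slots + 1)
    if ram_type == slot_type and k * size <= max_memory
]
```

Even for two RAM models and two boards, every valid configuration is a separate relationship that a fully materialized graph would have to store.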
Such a dependence means that even with a small number of components, the knowledge base representation system has to store a huge number of relationships that determine the materialization. Accordingly, inference on such a graph is very slow because of the huge number of combinations that form the nodes of the graph to be searched. As stated in (Lutz, 2002), this inference problem belongs to the PSpace class. This means that the complexity depends on the size of the input data, so to solve the problems of inference and concept satisfiability it is necessary to reduce the set of input data. To avoid this problem, it is proposed to divide the knowledge base, which traditionally consists of a TBox and an ABox, into two components, each described by its own dialect of description logic (Pic. 4.2).
[Figure: two knowledge bases, each served by description logic reasoning. The first consists of an ABox and a TBox; the second, built from big data and rules, consists of ABox(D) and TBox(D).]
Pic. 4.2. Knowledge base with separation
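Thus, the knowledge base consists of two TBoxes, $\mathcal{T}$ and $\mathcal{T}_D$, and two ABoxes, $\mathcal{A}$ and $\mathcal{A}_D$: $\mathcal{K} = \langle \mathcal{T}, \mathcal{A}, \mathcal{T}_D, \mathcal{A}_D \rangle$. An interpretation $I$ satisfies $\mathcal{K}$ if $I \models \mathcal{T}, \mathcal{A}$ and $I \models \mathcal{T}_D, \mathcal{A}_D$; in this case $\mathcal{K}$ is called satisfiable, and the interpretation is called a model of $\mathcal{K}$, written $I \models \mathcal{K}$.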
5. Estimating the quality of metadata and of an information object family in Big Data
Metadata quality assessment is intended to determine to what extent the metadata or metadata schemas present in a BD meet the tasks that were set for the BD when it was designed. Good metadata contribute to the proper functioning of the semantics in the BD. The quality of metadata affects many processes related to inference, to building connections between IO descriptions, and to the input, storage, identification, search and access of IOs.
There are two aspects of quality related to metadata. The first concerns the IO metadata (what the IO metadata is, how fully it describes the IO, whether it conforms to a certain metadata schema standard). The second concerns the metadata schema (whether the schema is standard, and to what extent the chosen schema meets the needs of the description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard metadata schema is used to describe the IO. The use of a standard metadata schema is fundamental to organizing the search and retrieval of knowledge. The presence in the DB of IOs whose metadata do not meet, or do not fully meet, the standard significantly hampers the resolution of the fundamentally important tasks facing the DB and reduces its quality. Compliance with the standard can be measured as one minus the ratio of the number of non-standard metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
$$S_{standard}(IO) = 1 - \frac{w(IO(md))}{n(IO(md))},$$
(22)
where $n(IO(md))$ is the total number of IO metadata, and $w(IO(md))$ is the number of metadata that do not meet the standard adopted for this BD model.
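The measure in (22) can be computed directly from the list of metadata fields of an IO. The sketch below is a minimal illustration; the function name `standard_compliance`, the sample field names and the choice of the standard's field set are assumptions made for the example.

```python
from typing import Iterable, Set


def standard_compliance(io_fields: Iterable[str], standard_fields: Set[str]) -> float:
    """Return 1 - w/n, where n is the number of metadata fields of the IO
    and w is the number of those fields that are not in the standard."""
    fields = list(io_fields)
    n = len(fields)
    if n == 0:
        raise ValueError("the IO has no metadata, so the measure is undefined")
    w = sum(1 for field in fields if field not in standard_fields)
    return 1 - w / n


# Hypothetical standard: a handful of element names used as the schema.
standard = {"title", "creator", "subject", "date", "identifier", "language"}

# Hypothetical IO description with one non-standard field ("shelfmark").
io_description = ["title", "creator", "date", "shelfmark"]

score = standard_compliance(io_description, standard)
```

A score of 1 means that every field used in the description belongs to the standard, and a score of 0 means that none does.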
The completeness of the description of the IO in relation to the metadata schema. This characteristic indicates the extent to which the metadata schema is used to describe the IO. Note that not all metadata of the selected schema can be applied to some types of IOs. Several metadata schemas can be used simultaneously in the DB network, but completeness is determined relative only to those metadata that participate in the construction of semantic links between IOs.
Therefore, the degree of completeness of the description of the IO according to the selected metadata schema MS is determined as follows:
$$\mathrm{Completeness}(IO, MS) = \frac{\mathrm{Present}(IO(md))}{\mathrm{Required}(IO(md))},$$
(23)
where md is metadata, MS is the metadata schema, $\mathrm{Present}(IO(md))$ is the number of metadata required to describe the IO that are actually present in the IO description, and $\mathrm{Required}(IO(md))$ is the total number of MS metadata required to describe the IO.
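The completeness measure in (23) can be sketched in the same way. In this illustration a field counts as present when it has a non-empty value; that rule, the function name `completeness` and the sample data are assumptions, not part of the article.

```python
from typing import Mapping, Optional, Set


def completeness(io_metadata: Mapping[str, Optional[str]], required_fields: Set[str]) -> float:
    """Return Present / Required for one IO and one metadata schema."""
    if not required_fields:
        raise ValueError("the schema requires no fields, so the measure is undefined")
    present = sum(
        1 for field in required_fields
        if io_metadata.get(field) not in (None, "")
    )
    return present / len(required_fields)


# Hypothetical set of schema fields required for this type of IO.
required = {"title", "creator", "date", "identifier"}

# Hypothetical IO description: "date" is empty and "identifier" is missing.
io_record = {"title": "Sample record", "creator": "A. Author", "date": ""}

ratio = completeness(io_record, required)
```

Only the fields required for the given type of IO enter the denominator, which matches the remark that not all metadata of a schema apply to every type of IO.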
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two
Програмні засоби аналітики даних
ABox
TBox
Knowledge baseDescription logic Reasoning
Dig data Rules
Abox(D)
Tbox(D)
Knowledge baseDescription logic Reasoning
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two TBox , and two ABox , , , , . An I
interpretation satisfies if ,Q and ,Q , in this case is is called executable and the
interpretation is called a model and written as Q .
5. Estimate quality of metadata and an information object family in Big Data
Metadata quality assessment is intended to find out to what extent certain metadata or metadata schemas present
in a BD meet the tasks that were set before the BD when it was designed. They contribute to the quality functioning of
the semantics in the BD. The quality of metadata affects many processes related to the use of inference, building
connections between the IO description, their input, storage, identification, search and access.
There are two aspects of quality related to metadata. The first of them refers to IO metadata (what IO metadata
is, how fully it describes IO, whether it meets a certain metadata schema standard). The second aspect is related to the
schema of metadata (is the schema of metadata standard, to what extent the chosen schema meets the needs of the
description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard IO metadata description scheme
is used. The use of a standard metadata scheme is a fundamental issue in the consideration of the problem of the
organization of search and retrieval of knowledge. The existence of IOs in the DB, the metadata of which do not meet
or do not fully meet the standard, significantly reduces the resolution of fundamentally important issues facing the DB
and reduces its quality. The measure of compliance with the standard can be the ratio of the number of non-standard
metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
( ( ))
1
( ( ))
w IO md
S dard IO
n IO md
tan
,
(22)
where ( ( ))n IO md is the total number of IO metadata, а is the number of metadata that does not meet the
standard adopted for this BD model
The completeness of the description of the IO in relation to the metadata scheme. This characteristic indicates
the extent to which the metadata schema is fully used to describe the IO. Please note that not all metadata of the selected
scheme can be applied to some types of IOs. Several metadata schemes can be used simultaneously in the DB network,
but the completeness is determined relative to only those metadata that participate in the construction of semantic links
between IOs.
Therefore, the degree of completeness of the description of the IO, according to the selected MS metadata
scheme, is determined as follows:
Pr ( ( ))
,
Re ( ( ))
esent IO md
Completeness IO MS
quired IO md ,
(23)
where: md is metadata, MS is metadata schema, Present(md) is the total number of metadata required to describe
the IO, which is actually present in the IO description, Required(md) is the total number of MS metadata required to
describe the IO.
and two
Програмні засоби аналітики даних
ABox
TBox
Knowledge baseDescription logic Reasoning
Dig data Rules
Abox(D)
Tbox(D)
Knowledge baseDescription logic Reasoning
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two TBox , and two ABox , , , , . An I
interpretation satisfies if ,Q and ,Q , in this case is is called executable and the
interpretation is called a model and written as Q .
5. Estimate quality of metadata and an information object family in Big Data
Metadata quality assessment is intended to find out to what extent certain metadata or metadata schemas present
in a BD meet the tasks that were set before the BD when it was designed. They contribute to the quality functioning of
the semantics in the BD. The quality of metadata affects many processes related to the use of inference, building
connections between the IO description, their input, storage, identification, search and access.
There are two aspects of quality related to metadata. The first of them refers to IO metadata (what IO metadata
is, how fully it describes IO, whether it meets a certain metadata schema standard). The second aspect is related to the
schema of metadata (is the schema of metadata standard, to what extent the chosen schema meets the needs of the
description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard IO metadata description scheme
is used. The use of a standard metadata scheme is a fundamental issue in the consideration of the problem of the
organization of search and retrieval of knowledge. The existence of IOs in the DB, the metadata of which do not meet
or do not fully meet the standard, significantly reduces the resolution of fundamentally important issues facing the DB
and reduces its quality. The measure of compliance with the standard can be the ratio of the number of non-standard
metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
( ( ))
1
( ( ))
w IO md
S dard IO
n IO md
tan
,
(22)
where ( ( ))n IO md is the total number of IO metadata, а is the number of metadata that does not meet the
standard adopted for this BD model
The completeness of the description of the IO in relation to the metadata scheme. This characteristic indicates
the extent to which the metadata schema is fully used to describe the IO. Please note that not all metadata of the selected
scheme can be applied to some types of IOs. Several metadata schemes can be used simultaneously in the DB network,
but the completeness is determined relative to only those metadata that participate in the construction of semantic links
between IOs.
Therefore, the degree of completeness of the description of the IO, according to the selected MS metadata
scheme, is determined as follows:
Pr ( ( ))
,
Re ( ( ))
esent IO md
Completeness IO MS
quired IO md ,
(23)
where: md is metadata, MS is metadata schema, Present(md) is the total number of metadata required to describe
the IO, which is actually present in the IO description, Required(md) is the total number of MS metadata required to
describe the IO.
.
An I interpretation satisfies
Програмні засоби аналітики даних
ABox
TBox
Knowledge baseDescription logic Reasoning
Dig data Rules
Abox(D)
Tbox(D)
Knowledge baseDescription logic Reasoning
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two TBox , and two ABox , , , , . An I
interpretation satisfies if ,Q and ,Q , in this case is is called executable and the
interpretation is called a model and written as Q .
5. Estimate quality of metadata and an information object family in Big Data
Metadata quality assessment is intended to find out to what extent certain metadata or metadata schemas present
in a BD meet the tasks that were set before the BD when it was designed. They contribute to the quality functioning of
the semantics in the BD. The quality of metadata affects many processes related to the use of inference, building
connections between the IO description, their input, storage, identification, search and access.
There are two aspects of quality related to metadata. The first of them refers to IO metadata (what IO metadata
is, how fully it describes IO, whether it meets a certain metadata schema standard). The second aspect is related to the
schema of metadata (is the schema of metadata standard, to what extent the chosen schema meets the needs of the
description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard IO metadata description scheme
is used. The use of a standard metadata scheme is a fundamental issue in the consideration of the problem of the
organization of search and retrieval of knowledge. The existence of IOs in the DB, the metadata of which do not meet
or do not fully meet the standard, significantly reduces the resolution of fundamentally important issues facing the DB
and reduces its quality. The measure of compliance with the standard can be the ratio of the number of non-standard
metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
( ( ))
1
( ( ))
w IO md
S dard IO
n IO md
tan
,
(22)
where ( ( ))n IO md is the total number of IO metadata, а is the number of metadata that does not meet the
standard adopted for this BD model
The completeness of the description of the IO in relation to the metadata scheme. This characteristic indicates
the extent to which the metadata schema is fully used to describe the IO. Please note that not all metadata of the selected
scheme can be applied to some types of IOs. Several metadata schemes can be used simultaneously in the DB network,
but the completeness is determined relative to only those metadata that participate in the construction of semantic links
between IOs.
Therefore, the degree of completeness of the description of the IO, according to the selected MS metadata
scheme, is determined as follows:
Pr ( ( ))
,
Re ( ( ))
esent IO md
Completeness IO MS
quired IO md ,
(23)
where: md is metadata, MS is metadata schema, Present(md) is the total number of metadata required to describe
the IO, which is actually present in the IO description, Required(md) is the total number of MS metadata required to
describe the IO.
and
Програмні засоби аналітики даних
ABox
TBox
Knowledge baseDescription logic Reasoning
Dig data Rules
Abox(D)
Tbox(D)
Knowledge baseDescription logic Reasoning
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two TBox , and two ABox , , , , . An I
interpretation satisfies if ,Q and ,Q , in this case is is called executable and the
interpretation is called a model and written as Q .
5. Estimate quality of metadata and an information object family in Big Data
Metadata quality assessment is intended to find out to what extent certain metadata or metadata schemas present
in a BD meet the tasks that were set before the BD when it was designed. They contribute to the quality functioning of
the semantics in the BD. The quality of metadata affects many processes related to the use of inference, building
connections between the IO description, their input, storage, identification, search and access.
There are two aspects of quality related to metadata. The first of them refers to IO metadata (what IO metadata
is, how fully it describes IO, whether it meets a certain metadata schema standard). The second aspect is related to the
schema of metadata (is the schema of metadata standard, to what extent the chosen schema meets the needs of the
description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard IO metadata description scheme
is used. The use of a standard metadata scheme is a fundamental issue in the consideration of the problem of the
organization of search and retrieval of knowledge. The existence of IOs in the DB, the metadata of which do not meet
or do not fully meet the standard, significantly reduces the resolution of fundamentally important issues facing the DB
and reduces its quality. The measure of compliance with the standard can be the ratio of the number of non-standard
metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
( ( ))
1
( ( ))
w IO md
S dard IO
n IO md
tan
,
(22)
where ( ( ))n IO md is the total number of IO metadata, а is the number of metadata that does not meet the
standard adopted for this BD model
The completeness of the description of the IO in relation to the metadata scheme. This characteristic indicates
the extent to which the metadata schema is fully used to describe the IO. Please note that not all metadata of the selected
scheme can be applied to some types of IOs. Several metadata schemes can be used simultaneously in the DB network,
but the completeness is determined relative to only those metadata that participate in the construction of semantic links
between IOs.
Therefore, the degree of completeness of the description of the IO, according to the selected MS metadata
scheme, is determined as follows:
Pr ( ( ))
,
Re ( ( ))
esent IO md
Completeness IO MS
quired IO md ,
(23)
where: md is metadata, MS is metadata schema, Present(md) is the total number of metadata required to describe
the IO, which is actually present in the IO description, Required(md) is the total number of MS metadata required to
describe the IO.
, in this case is
Програмні засоби аналітики даних
ABox
TBox
Knowledge baseDescription logic Reasoning
Dig data Rules
Abox(D)
Tbox(D)
Knowledge baseDescription logic Reasoning
Pic 4.2 Knowledge base with separation
Thus, the knowledge base consists of two TBox , and two ABox , , , , . An I
interpretation satisfies if ,Q and ,Q , in this case is is called executable and the
interpretation is called a model and written as Q .
5. Estimate quality of metadata and an information object family in Big Data
Metadata quality assessment is intended to find out to what extent certain metadata or metadata schemas present
in a BD meet the tasks that were set before the BD when it was designed. They contribute to the quality functioning of
the semantics in the BD. The quality of metadata affects many processes related to the use of inference, building
connections between the IO description, their input, storage, identification, search and access.
There are two aspects of quality related to metadata. The first of them refers to IO metadata (what IO metadata
is, how fully it describes IO, whether it meets a certain metadata schema standard). The second aspect is related to the
schema of metadata (is the schema of metadata standard, to what extent the chosen schema meets the needs of the
description of IS in a specific subject area). The quality of both aspects is described below.
Compliance with the standard. This characteristic indicates whether a standard metadata schema is used to describe the IO. The use of a standard metadata schema is fundamental to organizing the search and retrieval of knowledge. The presence in the DB of IOs whose metadata do not meet, or do not fully meet, the standard significantly hinders the resolution of the fundamentally important issues facing the DB and reduces its quality. Compliance with the standard can be measured as one minus the ratio of the number of non-standard metadata to the total number of metadata used in the description of the IO (Novitsky, et al., 2016):
$$
\mathrm{Standard}(IO) = 1 - \frac{w(IO(md))}{n(IO(md))}, \tag{22}
$$
where $n(IO(md))$ is the total number of IO metadata and $w(IO(md))$ is the number of metadata that do not meet the standard adopted for this BD model.
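Formula (22) can be illustrated with a minimal sketch. The function name `standard_compliance`, the representation of an IO description as a list of metadata element names, and the sample `dc:`-prefixed names are assumptions made for illustration; they are not taken from the article.

```python
def standard_compliance(metadata_names, standard_names):
    """Return Standard(IO) = 1 - w / n, following formula (22).

    metadata_names: element names used in the IO description (hypothetical input).
    standard_names: element names allowed by the adopted standard (hypothetical input).
    """
    n = len(metadata_names)  # total number of IO metadata
    if n == 0:
        raise ValueError("the IO description contains no metadata")
    allowed = set(standard_names)
    # w: number of metadata that do not meet the standard
    w = sum(1 for name in metadata_names if name not in allowed)
    return 1 - w / n


# Illustrative data: one of four elements ("x:mood") is non-standard.
standard = {"dc:title", "dc:creator", "dc:date", "dc:subject"}
description = ["dc:title", "dc:creator", "x:mood", "dc:date"]
score = standard_compliance(description, standard)
print(score)  # 0.75
```

With one non-standard element out of four, the score is 1 - 1/4 = 0.75; a description that uses only standard elements scores 1.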
The completeness of the description of the IO in relation to the metadata schema. This characteristic indicates the extent to which the metadata schema is used to describe the IO. Note that not all metadata of the selected schema are applicable to every type of IO. Several metadata schemas can be used simultaneously in the DB network, but completeness is determined only relative to those metadata that participate in the construction of semantic links between IOs.
Therefore, the degree of completeness of the description of the IO, according to the selected metadata schema MS, is determined as follows:
$$
\mathrm{Completeness}(IO, MS) = \frac{\mathrm{Present}(IO(md))}{\mathrm{Required}(IO(md))}, \tag{23}
$$
where md is metadata, MS is the metadata schema, $\mathrm{Present}(IO(md))$ is the number of metadata required to describe the IO that are actually present in the IO description, and $\mathrm{Required}(IO(md))$ is the total number of MS metadata required to describe the IO.
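A small sketch of formula (23) follows. It assumes, for illustration only, that an IO description is a dictionary mapping metadata names to values and that a field counts as present when it has a non-empty value; the function name `completeness` and the sample field names are hypothetical.

```python
def completeness(io_metadata, required_names):
    """Return Completeness(IO, MS) = Present / Required, following formula (23).

    io_metadata: dict of metadata name -> value for one IO (hypothetical layout).
    required_names: MS metadata required to describe this IO.
    """
    required = set(required_names)
    if not required:
        raise ValueError("the schema requires no metadata for this IO")
    # Assumption: a required field is "present" if it exists and is not empty.
    present = sum(
        1 for name in required
        if io_metadata.get(name) is not None and io_metadata.get(name) != ""
    )
    return present / len(required)


# Illustrative data: "date" is empty and "subject" is missing.
io = {"title": "Report", "creator": "A. Author", "date": "", "format": "pdf"}
required_fields = ["title", "creator", "date", "subject"]
ratio = completeness(io, required_fields)
print(ratio)  # 0.5
```

The extra field `format` does not affect the result, since only the metadata required by the schema are counted.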
Compliance with metadata schema. A metadata schema can prescribe certain properties for its metadata. This characteristic determines how well the properties of the IO metadata correspond to the properties of the corresponding metadata of the selected schema. Such properties include the data type or the attributes of relations between IOs, which in general are also included in the quality model.
Let $n$ be the number of metadata in the MS schema, $m_i$ the number of properties of metadata $md_i$, and $\mathrm{Conformance}_{i,j}$ the compliance of property $j$ of metadata $md_i$ of the IO with the standard specification of the MS schema. $\mathrm{Conformance}_{i,j}$ is calculated by the formula:
$$
\mathrm{Conformance}_{i,j} =
\begin{cases}
1, & \text{if property } j \text{ of metadata } md_i \text{ matches the corresponding property in MS},\\
0, & \text{otherwise.}
\end{cases} \tag{24}
$$
The conformance of the IO to the $i$-th metadata of the MS schema is calculated according to the formula:
$$
\mathrm{Conformance}(md_i) = \frac{\sum_{j=1}^{m_i} \mathrm{Conformance}_{i,j}}{m_i}. \tag{25}
$$
Then the conformance to the metadata schema, $\mathrm{Conformance}(MS)$, is calculated using the formula:
$$
\mathrm{Conformance}(MS) = \frac{\sum_{i=1}^{n} \mathrm{Conformance}(md_i)}{n}. \tag{26}
$$
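The chain of formulas (24) to (26) can be sketched as three small functions. The data layout is an assumption made for illustration: the schema is a dictionary mapping each metadata name to its prescribed properties, and the IO description has the same shape. The function names and the sample properties `type` and `repeatable` are hypothetical.

```python
def property_conformance(actual_props, prop_name, expected_value):
    """Formula (24): 1 if the IO property matches the schema property, else 0."""
    matches = prop_name in actual_props and actual_props[prop_name] == expected_value
    return 1 if matches else 0


def metadata_conformance(actual_props, schema_props):
    """Formula (25): mean of Conformance_{i,j} over the m_i properties of md_i."""
    if not schema_props:
        raise ValueError("the schema defines no properties for this metadata")
    total = sum(
        property_conformance(actual_props, name, expected)
        for name, expected in schema_props.items()
    )
    return total / len(schema_props)


def schema_conformance(io_description, schema):
    """Formula (26): mean of Conformance(md_i) over the n metadata of MS."""
    if not schema:
        raise ValueError("the schema defines no metadata")
    total = sum(
        metadata_conformance(io_description.get(name, {}), props)
        for name, props in schema.items()
    )
    return total / len(schema)


# Illustrative schema and IO description: "date" has the wrong type.
schema = {
    "title": {"type": "string", "repeatable": False},
    "date": {"type": "date", "repeatable": False},
}
io_description = {
    "title": {"type": "string", "repeatable": False},
    "date": {"type": "string", "repeatable": False},
}
print(schema_conformance(io_description, schema))  # 0.75
```

Here `title` conforms fully (1.0) and `date` conforms in one of its two properties (0.5), so the schema-level value is (1.0 + 0.5) / 2 = 0.75. A metadata element missing from the IO description contributes 0 in this sketch.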
Metadata schema quality characteristics. A metadata schema is made up of a specially selected set of metadata. In general, such a set can be arbitrary, but this significantly reduces the quality of the DB: the BD environment becomes isolated from other data sets and cannot take part (at least not fully) in information integration and reasoning, because at this scale of data even mappings between data schemas are inefficient. For this reason, efforts are being made to develop and use standard metadata schemas, which are usually aimed at describing IOs of a certain class. There are many metadata schemas, so the question arises of choosing the one most suitable for a given subject area. Evaluating the quality of the metadata schema makes this choice easier.
Compliance with standard metadata schema. This characteristic evaluates the extent to which all information objects of the DB conform to the standard. Compliance with the standard is also significant for an individual IO, but there it is measured at the IO level. In general, compliance with the standard schema is evaluated as the arithmetic mean of the IO-level compliance with the standard over all $n$ IOs of the DB:
$$
\mathrm{Standard}(MS) = \frac{\sum_{i=1}^{n} \mathrm{Standard}(IO_i)}{n}. \tag{27}
$$
The completeness (usage) of the metadata schema. This characteristic makes it possible to assess how much a given schema is used to describe the entire population of IOs in the BD. It is based on the completeness of the description of the IO in relation to the metadata schema and is its arithmetic mean over all IOs of the BD:
$$
\mathrm{Completeness}(MS) = \frac{\sum_{i=1}^{n} \mathrm{Completeness}(IO_i, MS)}{n}. \tag{28}
$$
This characteristic makes it possible to assess to what extent the decision to use a certain metadata scheme is
justified, and, if necessary, to make a decision to replace it.
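Both schema-level characteristics, formulas (27) and (28), are arithmetic means of per-IO values, so a single helper can illustrate them. The function name `schema_level_score` and the sample scores are assumptions for illustration; the per-IO values are taken as already computed by formulas (22) and (23).

```python
from statistics import fmean


def schema_level_score(per_io_scores):
    """Arithmetic mean of per-IO scores, as used in formulas (27) and (28)."""
    scores = list(per_io_scores)
    if not scores:
        raise ValueError("no IO scores were provided")
    return fmean(scores)


# Hypothetical per-IO values for a BD containing three IOs.
standard_per_io = [1.0, 0.75, 0.5]      # Standard(IO_i), formula (22)
completeness_per_io = [0.5, 1.0, 0.75]  # Completeness(IO_i, MS), formula (23)

standard_ms = schema_level_score(standard_per_io)          # formula (27)
completeness_ms = schema_level_score(completeness_per_io)  # formula (28)
print(standard_ms, completeness_ms)
```

Comparing such values across candidate schemas is one way to support the decision described above; any threshold for replacing a schema would have to be chosen for the specific subject area.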
Let's introduce metrics for evaluating the IO family. A family is a systematized set of IOs that are united into a
single whole based on some meaningful or formal criteria of belonging, for example, regarding the general content,
sources, purpose, semantic independence, method of use, etc.
Completeness of the family. This characteristic establishes to what degree of completeness the family contains
those IOs that it should contain. Completeness can be measured only when it is known what exactly the collection
should contain, that is, when the original family, which acts as a sample, is known [13]. As a rule, families are
distinguished on the basis of IO attributes.
The formula for measuring family completeness is as follows:
is calculated by the formula:
$$Conformance_{i,j} = \begin{cases} 1, & \text{iff property } j \text{ of metadata } md_i \text{ of the IO matches the corresponding property in } MS, \\ 0, & \text{otherwise.} \end{cases} \tag{24}$$
The conformance of the IO to the $i$-th metadata element $md_i$ of the MS schema is calculated according to the formula:
$$Conformance(md_i) = \frac{\sum_{j=1}^{m_i} Conformance_{i,j}}{m_i} \tag{25}$$
Then the conformance to the metadata schema, $Conformance(MS)$, is calculated using the formula:
$$Conformance(MS) = \frac{\sum_{i=1}^{n} Conformance(md_i)}{n} \tag{26}$$
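The three conformance levels of formulas (24) to (26) can be illustrated with a minimal Python sketch. The data representation is an assumption made for illustration only: each metadata element is mapped to a set of property descriptors such as "type=date", and the names `Schema`, `conformance_ij`, `conformance_md` and `conformance_ms` are hypothetical, not part of the article's model.

```python
from typing import Dict, Set

# Hypothetical representation: each metadata element name maps to the set of
# property descriptors (for example "type=date") attached to it.
Schema = Dict[str, Set[str]]


def conformance_ij(io_props: Set[str], schema_prop: str) -> int:
    """Formula (24): 1 if the IO's metadata element carries the schema property, else 0."""
    return 1 if schema_prop in io_props else 0


def conformance_md(io: Schema, ms: Schema, md: str) -> float:
    """Formula (25): mean of conformance_ij over the m_i properties of element md."""
    schema_props = ms[md]
    if not schema_props:
        return 1.0  # assumption: nothing is required, so nothing can be violated
    io_props = io.get(md, set())
    return sum(conformance_ij(io_props, p) for p in schema_props) / len(schema_props)


def conformance_ms(io: Schema, ms: Schema) -> float:
    """Formula (26): mean of conformance_md over the n metadata elements of the schema."""
    if not ms:
        raise ValueError("the metadata schema must define at least one element")
    return sum(conformance_md(io, ms, md) for md in ms) / len(ms)


# Illustrative data: the IO declares "date" with the wrong type.
ms = {"title": {"type=string", "required"}, "date": {"type=date"}}
io = {"title": {"type=string", "required"}, "date": {"type=string"}}
score = conformance_ms(io, ms)  # (2/2 + 0/1) / 2
```

In this sketch an element that is absent from the IO simply scores 0 for every property the schema expects of it; a different policy for missing elements would be an equally valid reading of the formulas.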
Metadata schema quality characteristics. A set of specially selected metadata makes up a metadata schema. In the general case such a set can be arbitrary, but this significantly reduces the quality of the BD: the BD environment becomes isolated from other data sets and cannot take part (at least not fully) in information integration and reasoning, because even mappings between data schemas become inefficient at this scale of data. For this reason, efforts are being made to develop and use standard metadata schemas, which are usually aimed at describing IOs of a certain class. There are many metadata schemas, so the question arises of choosing the one most suitable for a given subject area. Evaluating the quality of the metadata schema facilitates this choice.
Compliance with the standard metadata schema. This characteristic evaluates the extent to which all BD information objects conform to the standard. Compliance with the standard is also a significant characteristic of an individual IO, but there it is evaluated at the IO level. In general, compliance with the standard schema is evaluated as the arithmetic mean of the compliance of the IOs with the standard:
$$Standard(MS) = \frac{\sum_{i=1}^{n} Standard(IO_i)}{n}. \tag{27}$$
Completeness (usage) of the metadata schema. This characteristic shows how fully a given schema is used to describe the entire population of BD IOs. It is based on the completeness of the description of an IO with respect to the metadata schema and is the arithmetic mean of that value over all IOs of the BD:
$$Completeness(MS) = \frac{\sum_{i=1}^{n} Completeness(IO_i, MS)}{n}. \tag{28}$$
This characteristic makes it possible to assess to what extent the decision to use a certain metadata schema is justified and, if necessary, to decide to replace it.
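Both schema-level characteristics, formulas (27) and (28), are plain arithmetic means over the IOs, which the following Python sketch makes explicit. The per-IO completeness used here (the share of schema fields that hold a non-empty value) is an assumption for illustration, and all function and field names are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Set


def standard_ms(standard_io: Sequence[float]) -> float:
    """Formula (27): arithmetic mean of the per-IO compliance scores Standard(IO_i)."""
    return mean(standard_io)


def completeness_io(io: Dict[str, object], schema_fields: Set[str]) -> float:
    """Assumed per-IO completeness: share of schema fields with a non-empty value."""
    filled = sum(1 for f in schema_fields if io.get(f) not in (None, ""))
    return filled / len(schema_fields)


def completeness_ms(ios: Sequence[Dict[str, object]], schema_fields: Set[str]) -> float:
    """Formula (28): arithmetic mean of Completeness(IO_i, MS) over all IOs."""
    return mean(completeness_io(io, schema_fields) for io in ios)


# Illustrative data: the second IO fills only one of the four schema fields.
fields = {"title", "author", "date", "format"}
ios = [
    {"title": "A", "author": "X", "date": "2023", "format": "pdf"},
    {"title": "B", "author": ""},
]
usage = completeness_ms(ios, fields)        # (4/4 + 1/4) / 2
compliance = standard_ms([1.0, 0.5, 0.0])   # mean of three per-IO scores
```

A low `usage` value for a schema would be the signal, in the sense of the text above, that the choice of schema deserves to be reconsidered.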
Let us introduce metrics for evaluating an IO family. A family is a systematized set of IOs united into a single whole on the basis of some meaningful or formal membership criteria, for example common content, sources, purpose, semantic independence, or method of use.
Completeness of the family. This characteristic establishes how completely the family contains the IOs that it should contain. Completeness can be measured only when it is known exactly what the collection should contain, that is, when the original family, which acts as a reference sample, is known [13]. As a rule, families are distinguished on the basis of IO attributes.
The formula for measuring family completeness is as follows:
$$Completeness(F) = \frac{\sum_{i=1}^{n} \left( IO_i \in F \right)}{\sum_{i=1}^{n} \left( IO_i \in F_{original} \right)}. \tag{29}$$
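Family completeness, formula (29), compares the IOs present in a family with those of the reference (original) family. A minimal Python sketch follows, under the assumption that IOs are identified by hashable keys and that only IOs belonging to the reference family are counted in the numerator; the function name and the identifiers are hypothetical.

```python
from typing import Hashable, Set


def family_completeness(family: Set[Hashable], original: Set[Hashable]) -> float:
    """Share of the reference family's IOs that are present in the evaluated family."""
    if not original:
        raise ValueError("the reference (original) family must not be empty")
    return len(family & original) / len(original)


# Illustrative data: two of the four expected IOs are present; "io5" is extra.
original = {"io1", "io2", "io3", "io4"}
family = {"io1", "io2", "io5"}
ratio = family_completeness(family, original)
```

Counting only the intersection keeps the value in the range from 0 to 1; counting every IO of the family instead would let extra IOs push the ratio above 1.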
Conformity of the collection to the standard. This characteristic determines the extent to which the IOs of the collection conform to the standard. Compliance of the family with the standard can be taken as the arithmetic mean of the compliance of its IOs with the standard:
$$Standard(F) = \frac{\sum_{i=1}^{n} Standard(IO_i \in F)}{n}, \tag{30}$$
where $n$ is the number of IOs in the collection.
Variety of standards. It is assumed that the family should be based on one standard metadata schema specified in the external ontology, since the use of many schemas degrades operational characteristics. The quality of this feature can be measured as the inverse of the number of metadata schema standards used.
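The two family-level characteristics just described, the mean compliance of formula (30) and the inverse of the number of standards used, can be sketched in Python as follows. The function names and the schema labels are hypothetical and serve only as an illustration.

```python
from statistics import mean
from typing import Iterable, Sequence


def standard_family(standard_io: Sequence[float]) -> float:
    """Formula (30): mean compliance of the family's IOs with the standard."""
    return mean(standard_io)


def standards_variety(schemas_used: Iterable[str]) -> float:
    """Inverse of the number of distinct metadata schema standards used."""
    distinct = set(schemas_used)
    if not distinct:
        raise ValueError("at least one metadata schema standard is expected")
    return 1 / len(distinct)


# Illustrative data: three IOs described with two different schemas.
family_compliance = standard_family([1.0, 0.5])
variety = standards_variety(["schema_a", "schema_a", "schema_b"])
```

With this measure a family built on a single schema scores 1, and the score falls towards 0 as more schemas are mixed.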
Consistency. There are many different situations in which a collection can be considered inconsistent (conflicting). Without loss of generality, we consider only one situation: two IOs have absolutely identical values of their metadata.
Let the function
Програмні засоби аналітики даних
1
1
( )
( )
n
i
i
n
i original
i
IO F
Completeness F
IO F
.
(29)
Conformity of the collection to the standard. Determines the extent to which collection IOs conform to the
standard. Compliance with the standard of the family can be considered as the arithmetic average of compliance with
the standard of its IO::
1
( )
n
i
i
S dard IO F
S dard F
n
tan
tan
,
(30)
where n is the number of IOs in the collection
A variety of standards. It is believed that the family should be based on one standard metadata scheme specified
in the external ontology, as the use of many schemes deteriorates the operational characteristics. The quality of this
feature can be measured as the inverse of the number of metadata schema standards used.
Consistency. There are many different situations where a collection can be considered inconsistent (conflicting).
For non-limiting generalizations, we consider only one situation when there are two IOs with absolutely identical values
of their metadata.
Let the function ,
i j
IdentMd IO IO acquire the following values:
1 have the same set of metadata
,
0 otherwise
i j
i j
IO and IO
IdentMd IO IO
.
(31)
Then the family matching function is defined as follows:
1 1,
,
1
1
n n
i j
i j j i
IdentMd IO IO
Consistency F
n n
.
(32)
For modeling our approach, we are using Neo4j as a system for storing and managing big data (Miller, 2013),
(Shi, et al., 2021). Neo4j is a database whose data model is a graph, specifically a property graph. We took a database
for electronic components consisting of boxes, main boards, and memory modules. Our goal is to find all available
interpretations which will be models for our knowledge base. It means the need to find all compatible components or
find a list of components that are compatible with the selected. This problem more detail describe in (Trentin, et al.,
2012), (Thorsten, et al., 2004), (Wang, et al., 2020). As specified in these works the quality of the result depends on the
quality of metadata. And another important characteristic for semantic networks is the speed of reasoning for checking
interpretation. It is related to time which needs to get answers about the compatibility of electronic components.
The data quality metrics allow us to reveal the problem of missing metadata that is required for interconnecting components. Using this information and metrics such as compliance with the metadata schema, we cleaned the data and built a graph storage consisting of 44195 relations in which no node is missing important data. This graph contains relations between memory modules, mainboards, and cases. At first glance, this graph does not belong to big data, but if we take only 54 different types of memory modules, 113 types of mainboards, and 119 types of cases, the result of materialization gives 246912 available combinations for our system. This materialization does not include the concrete domain. Materializing the concrete domain would produce an enormous number of nodes: for example, an attribute that describes the number of RAM slots on a mainboard allows those slots to be filled with different combinations of memory modules. Our optimization also includes checking only bi-directional dependencies between components.
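To give a feel for why materializing the concrete domain grows so quickly, the sketch below counts the ways of filling RAM slots with memory modules. It is only an illustration under assumptions that are not in the article: slots are treated as interchangeable, any module type fits any slot, and at least one slot is occupied. The function name `slot_fillings` is hypothetical.

```python
from math import comb


def slot_fillings(module_types: int, slots: int) -> int:
    """Count non-empty multisets of at most `slots` modules drawn from `module_types` types."""
    # The number of multisets of size m over k types is C(k + m - 1, m).
    return sum(comb(module_types + m - 1, m) for m in range(1, slots + 1))


# Two module types and two slots: {A}, {B}, {A, A}, {A, B}, {B, B}.
print(slot_fillings(2, 2))  # 5

# With 54 memory types (as in the test data) the count grows steeply with the slot count.
for slots in (1, 2, 4, 8):
    print(slots, slot_fillings(54, slots))
```

Every such filling would have to be materialized for every compatible mainboard, which is why the quantitative attributes are kept out of the materialized part.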
Our idea of splitting the knowledge base into two parts makes it possible to extract information from a database that is materialized without the concrete domain.
We build the relations in our graph so that they correspond to DL; the main condition for building a relation is to avoid the concrete domain. Pic 5.3 shows the relations between our components.
Pic 5.3. Part of the graph representation for semantic big data
Three approaches were tested on the test data set. In the first approach, q1, relations were built taking into account all possible variations, including the quantities of the selected components.
The second approach, q2, consisted in grouping components by common attribute values so as to avoid building additional connections. The last optimization, q3, first searched for all compatible components and only then checked whether the quantitative restrictions of the concrete domain were satisfied.
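The difference between the approaches can be sketched in plain Python, without a graph database. The sketch assumes a toy data model that is not in the article: each component is a dictionary, compatibility is decided by a single shared attribute `memory_type`, and the quantitative restriction is a slot count. It mirrors q2 by grouping mainboards by the shared attribute, and q3 by checking the quantity restriction only after the compatible candidates have been found.

```python
from collections import defaultdict

# Hypothetical sample data, not taken from the test data set.
mainboards = [
    {"name": "MB-1", "memory_type": "DDR4", "ram_slots": 4},
    {"name": "MB-2", "memory_type": "DDR4", "ram_slots": 2},
    {"name": "MB-3", "memory_type": "DDR3", "ram_slots": 4},
]


def group_by_attribute(components, attribute):
    """q2 idea: one group per attribute value instead of one link per component pair."""
    groups = defaultdict(list)
    for component in components:
        groups[component[attribute]].append(component)
    return groups


def compatible_mainboards(groups, memory_type, module_count):
    """q3 idea: find compatible candidates first, then apply the quantity restriction."""
    candidates = groups.get(memory_type, [])  # step 1: compatibility only
    # Step 2: the concrete-domain check runs on the reduced candidate list.
    return [mb["name"] for mb in candidates if mb["ram_slots"] >= module_count]


groups = group_by_attribute(mainboards, "memory_type")
print(compatible_mainboards(groups, "DDR4", 3))  # ['MB-1']
```

The point of the two-step form is that the numeric comparison is evaluated only for candidates that already passed the cheaper structural check.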
One problem is that the same component can be installed two or more times, depending on the number of previously selected components. For example, if the motherboard has 8 RAM sockets, 8 identical memory modules may be selected, or 8 different ones. Moreover, for the motherboard we must check not only the limit on the number of occupied sockets but also the limit on the maximum amount of memory that the motherboard supports.
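Both restrictions can be expressed as a simple check over a multiset of selected modules. The sketch below is illustrative only: the field names (`ram_slots`, `max_memory_gb`, `size_gb`) and the sample values are assumptions and are not taken from the test data set.

```python
from collections import Counter


def selection_is_valid(mainboard: dict, selection: Counter, modules: dict) -> bool:
    """Check the socket count and the total capacity for a multiset of memory modules."""
    used_slots = sum(selection.values())
    total_gb = sum(modules[name]["size_gb"] * count for name, count in selection.items())
    return used_slots <= mainboard["ram_slots"] and total_gb <= mainboard["max_memory_gb"]


# Hypothetical mainboard with 8 sockets and a 64 GB memory limit.
mainboard = {"ram_slots": 8, "max_memory_gb": 64}
modules = {"M-8": {"size_gb": 8}, "M-16": {"size_gb": 16}}

print(selection_is_valid(mainboard, Counter({"M-8": 8}), modules))             # True: 8 sockets, 64 GB
print(selection_is_valid(mainboard, Counter({"M-16": 8}), modules))            # False: 128 GB exceeds 64 GB
print(selection_is_valid(mainboard, Counter({"M-8": 4, "M-16": 2}), modules))  # True: 6 sockets, 64 GB
```

A `Counter` is used because the same module may appear several times in one selection, which is exactly the situation described above.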
| Optimization type and query | Execution time | Number of results |
|---|---|---|
| q1: list all mainboards | 5612 ms | 6850 |
| q1: list all mainboards for the specific memory modules | 4630 ms | 6432 |
| q2: list all mainboards (specification was grouped) | 3400 ms | 6850 |
| q2: list all mainboards for the specific memory modules (specification was grouped) | 2530 ms | 6432 |
| q3: list all mainboards (specification was grouped and quantity restriction included), time for two queries | 780 ms | 6850 |
| q3: list all mainboards for the specific memory modules (specification was grouped and quantity restriction included), time for two queries | 43 ms | 6432 |
As we can see, simplifying the queries gives a significant increase in execution speed. However, the results of big data (BD) systems depend on characteristics such as the completeness of the description and compliance with the metadata schema. It should be noted that, according to expert evaluation of work with web resources, the response time of a web service should not exceed 600 ms.
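The gains can be quantified directly from the measurements in the table. The short calculation below uses only the reported execution times; the 600 ms limit is the expert estimate quoted above, and the variable and function names are illustrative.

```python
# Execution times in milliseconds, as reported in the table above.
times_ms = {
    "all mainboards": {"q1": 5612, "q2": 3400, "q3": 780},
    "mainboards for specific memory modules": {"q1": 4630, "q2": 2530, "q3": 43},
}
RESPONSE_LIMIT_MS = 600  # expert estimate for an acceptable web-service response


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of the baseline time to the optimized time."""
    return baseline_ms / optimized_ms


for query, t in times_ms.items():
    print(
        f"{query}: q2 is {speedup(t['q1'], t['q2']):.1f}x faster than q1, "
        f"q3 is {speedup(t['q1'], t['q3']):.1f}x faster than q1, "
        f"q3 within limit: {t['q3'] <= RESPONSE_LIMIT_MS}"
    )
```

With these figures, q3 is roughly 7 times faster than q1 when listing all mainboards and roughly 108 times faster for the query restricted to specific memory modules; only the latter (43 ms) falls within the 600 ms limit.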
6. Conclusions
The complexity of big data applications, combined with the lack of standards for the representation of information objects and for their processing and storage, requires significant resources. Data quality is one of the approaches that make it possible to model data in a way that requires simpler algorithms for analysis. Analysis of data quality allows increasing data accuracy in various aspects. Enriching data semantics is a complex process of describing big data by ontological means. However, there is a problem with the speed of inference; the article proposes a method of knowledge base materialization in the big data environment to optimize inference. Data quality plays a key role in this, making it possible to build appropriate graphs of schematic data on the basis of metadata.
Higher data quality not only helps produce better reasoning results but also improves data maintainability, reusability, and integration.
References
1. Amsler, R., 1972. Application of Citation-based Automatic Classification, Austin: s.n.
2. Ceravolo, P. et al., 2018. Big data semantics. Journal on Data Semantics, 7(2), pp. 65-85.
3. Harford, T., 2014. Big data: A big mistake? Significance, 11(5), pp. 14-19.
4. Intel IT Center, 2012. Big Data Analytics: Intel’s IT Manager Survey on How Organizations Are Using Big Data, Santa Clara: s.n.
5. Laney, D., 2001. 3D data management: Controlling data volume, velocity and variety, s.l.: META Group.
6. Lutz, C., 2002. The Complexity of Description Logics with Concrete Domains, Hamburg: s.n.
7. Miller, J. J., 2013. Graph database applications and concepts with Neo4j. In Proceedings of the Southern Association for Information Systems Conference, 2324(36).
8. Novitsky, A., Reznychenko, V. & Romanov, E., 2016. Characteristics and quality metrics of electronic libraries in the semantic web. Software
engineering, 1(25), pp. 17-36.
9. Novytskyi, O., Proskudina, G. & Ovdiy, O., 2014. Development of a digital library quality model. s.l., Lviv Polytechnic Publishing House, pp. 284-285.
10. Novytskyi, O., Proskudina, G. Y., Reznichenko, V. & Ovdiy, O., 2014. Evaluation of the quality of electronic libraries in the web environment.
Software engineering, 20(4).
11. Novytskyi, O. V., 2010. Data integration in the Internet: linked data. Kyiv, Institute of Software Systems of the National Academy of Sciences of
Ukraine, pp. 487-493.
12. Raphael, V., Staab, S. & Motik, B., 2005. Incrementally maintaining materializations of ontologies stored in logic databases. Journal on Data
Semantics, pp. 1-34.
13. Schmidt-Schauß, M. & Smolka, G., 1991. Attributive concept descriptions with complements. Artificial Intelligence, 48(1), pp. 1-26.
14. Schroeck, M. et al., 2012. Analytics: The Real-World Use of Big Data, s.l.: IBM.
15. Shi, P., Fan, G., Li, S. & Kou, D., 2021. Big Data Storage Technology for Smart Distribution Grid Based on Neo4j Graph Database. IEEE 4th
International Conference on Electronics Technology (ICET), pp. 441-445.
16. Spirin, O. M. et al., 2012. Collective monograph. Electronic library information systems of scientific and educational institutions. Kyiv: Pedagogical press.
17. Stuart Ward, J. & Barker, A., 2013. Undefined By Data: A Survey of Big Data Definitions.
18. Suthaharan, S., 2014. Big data classification: Problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Performance Evaluation Review, 41(4), pp. 70-73.
19. Thorsten, B., Nizar, A., Kreutler, G. & Gerhard, F., 2004. Product Configuration Systems: State of the Art, Conceptualization and Extensions.
Munich, University Library of Munich, pp. 25-36.
20. Trentin, A., Perin, E. & Forza, C., 2012. Product configurator impact on product quality. International Journal of Production Economics, 135(2),
pp. 850-859.
21. Wang, Y., Wenlong, Z. & Wayne, X. W., 2020. Needs-based product configurator design for mass customization using hierarchical attention network. IEEE Transactions on Automation Science and Engineering, 18(1), pp. 195-204.
22. Wilkinson, M. et al., 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, pp. 1-9.
23. Woods, W. A., 1975. What’s in a link: Foundations for semantic networks. Representation and understanding, pp. 35-82.
Received 03.08.2022
About the author:
Novytskyi Oleksandr Vadumovuch
PhD, researcher, Kyiv, Hirsch index 9,
number of publications 70,
ORCID 0000-0002-9955-7882,
tel. 067 44 53 173, alex.googl@gmail.com
Place of work:
Institute of Software Systems of NAS of Ukraine
3187, Kyiv, ave. Akademika Glushkova, 40, building 5,
tel. (044) 526-33-19, e-mail: iss@isofts.kiev.ua
Author's surname and initials and title of the paper in Ukrainian:
Новицький О.В.
Поняття якості та оцінювання якості великих даних
в семантичному середовищі
Author's surname and initials and title of the paper in English:
Novytskyi O.V.
The concept and evaluating of big data quality in the semantic environment
|
| id | pp_isofts_kiev_ua-article-527 |
| institution | Problems in programming |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-07-17T09:41:12Z |
| publishDate | 2023 |
| publisher | PROBLEMS IN PROGRAMMING |
| record_format | ojs |
| resource_txt_mv | ppisoftskievua/a7/0e3d015f055b467859318072ecd958a7.pdf |
| spelling | pp_isofts_kiev_ua-article-5272023-06-25T06:57:27Z The concept and evaluating of big data quality in the semantic environment Поняття якості та оцінювання якості великих даних в семантичному середовищі Novitsky, A.V. big data; complex data sets UDC 004.05 великі дані; складних наборів даних; теорія оцінювання якості УДК 004.05 Big data refers to large volumes, complex data sets with various autonomous sources, characterized by continuous growth. Data storage and data collection capabilities are now rapidly expanding in all fields of science and technology due to the rapid development of networks. Evaluating the quality of data is a difficult task in the context of big data, because the speed of semantic data reasoning directly depends on its quality. The appropriate strategies are necessary to evaluate and assess data quality according to the huge amount of data and its rapid generation. Managing a large volume of heterogeneous and distributed data requires defining and continuously updating metadata describing various aspects of data semantics and its quality, such as conformance to metadata schema, provenance, reliability, accuracy and other properties. The article examines the problem of evaluating the quality of big data in the semantic environment. The definition of big data and its semantics is given below and there is a short excursion on a theory of quality assessment. The model and its components which allow to form and specify metrics for quality have already been developed. This model includes such components as: quality characteristics; quality metric; quality system; quality policy. A quality model for big data that defines the main components and requirements for data evaluation has already been proposed. In particular, such evaluation components as: accessibility, relevance, popularity, compliance with the standard, consistency, etc. are highlighted. The problem of inference complexity is demonstrated in the article. 
Approaches to improving fast semantic inference through materialization and division of the knowledge base into two components, which are expressed by different dialects of descriptive logic, are also considered below. The materialization of big data makes it possible to significantly speed up the processing of requests for information extraction. It is demonstrated how the quality of metadata affects materialization. The proposed model of the knowledge base allows increasing the qualitative indicators of the reasoning speed.Prombles in programming 2022; 3-4: 260-270 Великі дані стосуються великих обсягів, складних наборів даних із різними автономними джерелами, що характеризуються постійним зростанням. Зі швидким розвитком мереж, зберігання даних і можливостей збору даних, великі дані швидко розши- рюються в усіх сферах науки та техніки. У контексті великих даних оцінка якості даних є складною задачею. Для семантичних даних якість і швидкість виводу безпосередньо залежить від якості даних. Враховуючи величезний обсяг даних і їх швидке генерування, це вимагає відповідних стратегій для оцінки якості даних. Управління великим обсягом різнорідних і розподілених даних вимагає визначення та постійного оновлення метаданих, що описують різні аспекти семантики та якості даних, такі як від- повідність схемі метаданих, походження, надійність, точність та інші властивості. В статі розглянута проблематика оцінювання якості великих даних у семантичному середовищі. Наведено визначення великих даних та їх семантики, зроблено невеликий екскурс в теорію оцінювання якості. Розроблена модель та її компоненти, що дозволяє сформувати та конкретизувати метрики для якості. В дану модель входять такі компоненти як: характеристика якості, метрика якості, система якості, політка якості. Запро- понована модель якості для великих даних, яка визначає основні компоненти та вимоги до оцінювання даних. 
Зокрема, виділено такі компоненти оцінювання як: доступність, релеватність, популярність, відповідність стандарту, узгодженість тощо. Продемонстрована проблема складності виводу. Розглянуто підходи до покращення швидкого семантичного виводу через матеріалізацію та поділ бази знань на два компоненти, які виражаються різними діалектами дескриптивної логіки. Оскільки матеріалізація великих даних дозволяє значно пришвидшити обробку запитів на екстракцію інформації. Продемонстровано як якість метаданих вливає на матеріалізацію. Запропонована модель бази знань, яка дозволяє підвищити якісні показники швидкості виводу.Prombles in programming 2022; 3-4: 260-270 PROBLEMS IN PROGRAMMING ПРОБЛЕМЫ ПРОГРАММИРОВАНИЯ ПРОБЛЕМИ ПРОГРАМУВАННЯ 2023-01-23 Article Article application/pdf https://pp.isofts.kiev.ua/index.php/ojs1/article/view/527 10.15407/pp2022.03-04.260 PROBLEMS IN PROGRAMMING; No 3-4 (2022); 260-270 ПРОБЛЕМЫ ПРОГРАММИРОВАНИЯ; No 3-4 (2022); 260-270 ПРОБЛЕМИ ПРОГРАМУВАННЯ; No 3-4 (2022); 260-270 1727-4907 10.15407/pp2022.03-04 en https://pp.isofts.kiev.ua/index.php/ojs1/article/view/527/579 Copyright (c) 2023 PROBLEMS IN PROGRAMMING |
| spellingShingle | big data complex data sets UDC 004.05 Novitsky, A.V. The concept and evaluating of big data quality in the semantic environment |
| title | The concept and evaluating of big data quality in the semantic environment |
| title_alt | Поняття якості та оцінювання якості великих даних в семантичному середовищі |
| title_full | The concept and evaluating of big data quality in the semantic environment |
| title_fullStr | The concept and evaluating of big data quality in the semantic environment |
| title_full_unstemmed | The concept and evaluating of big data quality in the semantic environment |
| title_short | The concept and evaluating of big data quality in the semantic environment |
| title_sort | concept and evaluating of big data quality in the semantic environment |
| topic | big data complex data sets UDC 004.05 |
| topic_facet | big data complex data sets UDC 004.05 великі дані складних наборів даних теорія оцінювання якості УДК 004.05 |
| url | https://pp.isofts.kiev.ua/index.php/ojs1/article/view/527 |
| work_keys_str_mv | AT novitskyav theconceptandevaluatingofbigdataqualityinthesemanticenvironment AT novitskyav ponâttââkostítaocínûvannââkostívelikihdanihvsemantičnomuseredoviŝí AT novitskyav conceptandevaluatingofbigdataqualityinthesemanticenvironment |