Algorithms for clearing a statistical sample of anomalies for data science tasks
| Date: | 2023 |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2023 |
| Online access: | https://journal.iasa.kpi.ua/article/view/260175 |
| Journal title: | System research and information technologies |
| author | Pysarchuk, Oleksii; Baran, Danylo; Mironov, Yurii; Pysarchuk, Illya |
|---|---|
| author_institution | Oleksii Pysarchuk: National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv; Danylo Baran: Codeimpact B.V., Kyiv; Yurii Mironov: National Aviation University, Kyiv; Illya Pysarchuk: National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv |
| description | The paper considers the nature of input data used by Data Science algorithms of modern-day application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as a part of the Data Science pipeline. The main advantages of given algorithms are their relative simplicity and a small number of configurable parameters. Parameters are determined by machine learning with respect to the properties of input data. These algorithms are flexible and have no strict dependency on the nature and origin of data. The efficiency of the proposed approaches is verified with a modeling experiment conducted using algorithms implemented in Python. The results are illustrated with plots built using raw and processed datasets. The algorithms application is analyzed, and results are compared. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2023.1.06 |
| format | Article |
| fulltext |
O. Pysarchuk, D. Baran, Yu. Mironov, I. Pysarchuk, 2023
78 ISSN 1681–6048 System Research & Information Technologies, 2023, № 1
UDC 004.5
DOI: 10.20535/SRIT.2308-8893.2023.1.06
ALGORITHMS OF STATISTICAL ANOMALIES CLEARING FOR DATA SCIENCE APPLICATIONS
O. PYSARCHUK, D. BARAN, Yu. MIRONOV, I. PYSARCHUK
Abstract. The paper considers the nature of the input data used by Data Science algorithms in modern application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as part of the Data Science pipeline. The main advantages of these algorithms are their relative simplicity and the small number of configurable parameters. The parameters are determined by machine learning with respect to the properties of the input data. The algorithms are flexible and do not strictly depend on the nature and origin of the data. The efficiency of the proposed approaches is verified in a modeling experiment with the algorithms implemented in Python. The results are illustrated with plots of the raw and processed datasets. The application of the algorithms is analyzed, and the results are compared.
Keywords: anomaly removal, anomaly detection, noise removal, statistical techniques, data analysis, big data, data cleaning.
INTRODUCTION
Data Science techniques and approaches are widely used to solve modern technical problems. One frequent use case is the implementation of back-end software components with intelligent features for distributed CRM and ERP systems. One possible input format for such systems is Big Data arrays that can be interpreted as numeric statistical samples. Data of this kind is used in a variety of application domains. Examples include Commodity Trade and Risk Management systems, automotive applications such as automated driver assistants, Unmanned Aerial Vehicle software, and Computer Vision software responsible for raster-to-vector conversion [1–6].
In solutions from the aforementioned application domains, the underlying mathematical models are known a priori. Therefore, dataset processing reduces to mapping data features to model properties. This makes it possible to determine the trend line of the process under consideration both within the observation range and beyond it. This is done by means of interpolation and extrapolation, that is, retrospective and prospective prognoses [2–4]. This paper suggests treating the configuration of the analytical model as an unsupervised learning activity. The reason is that any solution of an Artificial Intelligence problem is, in its first iteration, implemented as the configuration of parameters with respect to the input dataset [2–4]. Moreover, such models can be associated with artificial intelligence technologies, both for this reason and because of their prognostic features.
Statistical dataset processing is based on the hypothesis of a model of random factors that bias the input dataset. Statistical data are usually obtained through experiments and sampling.
The first stage of step-by-step statistical analysis is a preparatory one: data research [2–4, 7]. Its tasks include checking the input Big Data array for anomalies. Any value that differs significantly from the other values of the dataset can be treated as an anomaly. Abnormal data result from various miscalculations and malfunctions during data sampling. The two major types of anomalies are skipped values and crude (gross) measurement errors. If either type is not handled at the following stages, it distorts the results of statistical analysis. The distortions appear as increased dispersion of the estimates and/or as bias of the estimation result [2–4, 7]. Possible ways to address the problem of distorted data are smoothing or estimation.
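The toy example below illustrates the two distortions just described. The sample values are arbitrary and chosen only for illustration. Adding a single crude measurement to a small sample shifts the mean, which is the bias. It also inflates the standard deviation, which is the dispersion.

```python
import numpy as np

# Toy sample (arbitrary values) and the same sample with one crude measurement added.
clean = np.array([10.0, 11.0, 9.0, 10.0, 10.0])
dirty = np.append(clean, 40.0)

for name, sample in (("clean", clean), ("with outlier", dirty)):
    # ddof=1 gives the unbiased (N - 1) dispersion estimate used throughout the paper
    print(f"{name}: mean={sample.mean():.2f}, sd={sample.std(ddof=1):.2f}")
```

The single outlier has two effects. It moves the mean from 10 to 15, and it raises the standard deviation from about 0.71 to about 12.26.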
To reduce the influence of anomalies on further data processing, it is necessary to detect abnormal values and then restore or remove them. These actions form a sequence of stages that together constitute the process of clearing a dataset of abnormal values [3, 4, 7–10].
Anomalies are detected by analyzing their absolute values, the dynamics of the trend, and changes in statistical properties [3, 4, 7, 9–10].
Related works. Many known approaches to anomaly clearing have been reviewed [7, 8]. They share a common flaw: their parameters must be tuned manually for each input dataset and each kind of anomaly. Moreover, most of them lead to NP-hard problems when applied to Big Data arrays. In practice, this makes them inapplicable.
Therefore, the task of designing simple and efficient data clearing algorithms
for Data Science purposes remains relevant.
Goal. The goal of the paper is to propose precise, performant, efficient and relatively simple algorithms for clearing datasets of anomalies.
Proposal. Experience from multiple IT projects related to Data Science makes it possible to formulate practical requirements for algorithms responsible for anomaly detection and clearing. They include:
1. High detection efficiency, integrity and adequacy of the output, and precise values of dispersion and standard deviation.
2. Relatively low computational complexity and, therefore, high performance on large datasets.
3. Automatic adjustment of parameters to the properties of the input data, or a minimal number of manual settings.
Three algorithms of anomaly detection and removal have been developed to
address these requirements.
Data preprocessing algorithm – the “sliding_wind” algorithm
The main idea of the algorithm is to apply smoothing within a simple sliding window, in which the arithmetic mean is calculated. The window size is chosen so that the trend describing the change of the considered process within the window is quasi-linear.
Sliding_wind algorithm stages:
1. Considering the statistical sample $X = \{x_i\}$, $i = 1 \dots n$, as input.
2. Forming a sliding window of $N_{win}$ elements.
3. Calculating the arithmetic mean:
$$\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j, \quad j = 1, \dots, N_{win}.$$
4. Forming the cleared dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1 \dots n$, by replacing elements of the sample $X$ with the arithmetic mean $\hat{x}_j$: $x_{i+N_{win}} = \hat{x}_j$. Replacement starts from the last entry within the sliding window.
5. Shifting the sliding window one element to the right: $i = i + 1$.
6. Repeating steps 2–5 over the input statistical sample, $i = 1 \dots n$.
7. After the entire dataset has been processed, the data within the first sliding window still need adjustment. For this, a subset of $N = 2N_{win}$ elements is formed, and steps 2–5 are applied to this subset while traversing it in reverse.
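The steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code, and it rests on several stated assumptions:

- Indices are 0-based.
- The mean of each window replaces that window's last element (step 4).
- Every window mean is computed from the raw input values. The text does not say whether already-replaced values feed later windows.
- Step 7 means re-running the same pass backwards over the first 2·N_win values, which fills the first N_win - 1 positions that the forward pass never reaches.

The helper name `trailing_means` is hypothetical.

```python
import numpy as np


def trailing_means(values, n_win):
    """Copy of `values` in which each element from index n_win - 1 onward is
    replaced by the mean of the n_win-element window ending at that element."""
    out = np.array(values, dtype=float)
    kernel = np.full(n_win, 1.0 / n_win)
    # 'valid' mode yields one value per full window: mean(values[k:k + n_win]).
    out[n_win - 1:] = np.convolve(out, kernel, mode="valid")
    return out


def sliding_wind(x, n_win):
    """Sketch of sliding_wind (steps 1-7).

    Assumption: every window mean is computed from the raw input values,
    not from values that were already replaced.
    """
    x = np.asarray(x, dtype=float)
    if n_win < 1 or len(x) < 2 * n_win:
        raise ValueError("need n_win >= 1 and len(x) >= 2 * n_win")
    # Steps 2-6: forward pass; each window mean replaces the window's last entry.
    cleaned = trailing_means(x, n_win)
    # Step 7: the first n_win - 1 entries are never a window's last entry, so
    # the same pass is run in reverse over the first 2 * n_win raw values.
    head = trailing_means(x[: 2 * n_win][::-1], n_win)[::-1]
    cleaned[: n_win - 1] = head[: n_win - 1]
    return cleaned
```

For example, `sliding_wind(np.arange(10.0), 3)` returns `[1, 2, 1, 2, 3, 4, 5, 6, 7, 8]`. The forward pass is a trailing moving average, so it lags a linear trend by (n_win - 1) / 2 samples. The window should therefore stay short compared with the scale on which the trend bends.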
The advantages of the sliding_wind algorithm are:
- its simplicity;
- the minimal number of manual settings (only the window size);
- its potential for effective use on datasets with fuzzy trend properties and anomalies;
- equal sizes of input and output.
A nonlinear estimation model for $\hat{x}_j$ could be included in the sliding_wind algorithm. This is inadvisable, however, because it greatly increases complexity, especially for Big Data inputs.
Results of modeling and efficiency estimation of the sliding_wind algorithm. The modeling uses a dataset of n = 10000 values that contains a quadratic trend and normally distributed noise. The noise has an expected value of 0 and a standard deviation of $\sigma_X = 5.0$. Abnormal entries are uniformly distributed over the sample and make up 10% of the values. The trend and the stochastic component are combined additively. Computations are performed with a Python implementation of the algorithm [5, 6] using the numpy and matplotlib libraries.
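The modeling setup above can be approximated with a small NumPy generator. The paper does not give several details, so the following are hypothetical and labeled as such in the code:

- the trend coefficients;
- the anomaly magnitude, here an extra normal error with standard deviation `anomaly_scale * noise_sd`;
- reading "uniformly distributed" as random positions drawn without repetition.

The function name `make_test_sample` is also hypothetical.

```python
import numpy as np


def make_test_sample(n=10_000, noise_sd=5.0, anomaly_share=0.10,
                     anomaly_scale=3.0, seed=0):
    """Hypothetical generator of a sample resembling the modeling setup.

    Assumptions not fixed by the paper: the quadratic trend coefficients,
    the anomaly magnitude (extra N(0, (anomaly_scale * noise_sd)^2) error)
    and uniformly random anomaly positions drawn without repetition.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    trend = 2.0 + 0.01 * t + 1e-6 * t**2             # hypothetical quadratic trend
    x = trend + rng.normal(0.0, noise_sd, size=n)    # additive trend + N(0, 5^2) noise
    n_anom = int(round(anomaly_share * n))           # 10% of the values
    idx = rng.choice(n, size=n_anom, replace=False)  # anomaly positions
    x[idx] += rng.normal(0.0, anomaly_scale * noise_sd, size=n_anom)
    return t, trend, x, idx
```

The returned `idx` marks the corrupted positions. This makes it possible to check which anomalies a cleaning algorithm actually touched.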
The results of sliding_wind execution are shown in Fig. 1.
Fig. 1, a shows the input dataset $X = \{x_i\}$, $i = 1, \dots, n$, with normal noise and anomalies, together with the trend line obtained by the least squares method [3, 4]. The standard deviation of the input dataset with anomalies is $\sigma_X = 6.64$. Fig. 1, b shows the dataset cleared of anomalies by the sliding_wind algorithm; its standard deviation is $\sigma_X = 2.95$. The comparison of Fig. 1, a and 1, b, together with the reduced standard deviation, indicates the efficiency of the sliding_wind algorithm.
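The sketch below shows one possible reading of how the $\sigma_X$ figures are measured. We assume they are standard deviations of the values around the least-squares quadratic trend line. The text only says "standard deviation" next to an LSM trend, and the raw spread of a quadratic trend over 10000 points would be far larger, so this interpretation is an assumption. The function name `residual_sd` is hypothetical. It uses NumPy's `numpy.polynomial.polynomial.polyfit`/`polyval`.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def residual_sd(x, degree=2):
    """Standard deviation of x around its least-squares polynomial trend.

    Assumption: this is how sigma_X in Fig. 1-3 is measured.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    coeffs = P.polyfit(t, x, degree)   # c0, c1, c2 (lowest degree first)
    trend = P.polyval(t, coeffs)       # LSM trend line at every sample index
    return float(np.std(x - trend, ddof=1))
```

Under this reading, a pure quadratic has a residual SD of essentially zero. Noise or anomalies added on top raise it.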
Fig. 1. Results of sliding_wind execution: (a) input dataset, $\sigma_X = 6.64$; (b) processed dataset, $\sigma_X = 2.95$. Axes: $i = 1 \dots n$ and $X = \{x_i\}$.
Algorithm of dataset statistic properties control – the “medium” algorithm
The main idea of the algorithm is to detect abnormal elements in two steps. First, etalon (reference) parameters of the dataset without anomalies are determined: the expected value and the standard deviation. Second, these are compared with the current statistical features. Detected anomalies are replaced with the arithmetic mean. The etalon and current statistical features are derived from the initial and current sliding windows, respectively.
Stages of the medium algorithm:
1. Considering the statistical sample $X = \{x_i\}$, $i = 1, \dots, n$.
2. Within the first sliding window of $N_{win}$ elements, estimating the etalon expected value, dispersion and standard deviation:
$$\hat{x}_j^{etalon} = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j, \qquad \hat{D}_j^{etalon} = \frac{1}{N_{win}-1} \sum_{j=1}^{N_{win}} \left(x_j - \hat{x}_j^{etalon}\right)^2, \qquad \hat{\sigma}_j^{etalon} = \sqrt{\hat{D}_j^{etalon}}, \quad j = 1, \dots, N_{win}.$$
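3. Forming the next sliding window of $N_{win}$ elements.
4. Determining the expected value, dispersion and standard deviation for the current sliding window:
$$\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j, \qquad \hat{D}_j = \frac{1}{N_{win}-1} \sum_{j=1}^{N_{win}} \left(x_j - \hat{x}_j\right)^2, \qquad \hat{\sigma}_j = \sqrt{\hat{D}_j}, \quad j = 1, \dots, N_{win}.$$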
5. If the condition
$$\hat{\sigma}_j > Q\,\hat{\sigma}_j^{etalon}$$
holds ($Q$ is a weighting factor), the current $j$-th element added to the sliding window is considered abnormal.
The abnormal $j$-th element is replaced with the estimated expected value of the current sliding window, $x_i = \hat{x}_j$, inside the dataset $X = \{x_i\}$.
If the condition $\hat{\sigma}_j > Q\,\hat{\sigma}_j^{etalon}$ is false, the dataset $X = \{x_i\}$ is not modified.
6. Repeating steps 3–5 over the statistical sample, $i = 1, \dots, n$.
7. The result of the algorithm is the dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1, \dots, n$, cleared of anomalies.
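The stages above can be sketched as follows. This is a minimal sketch under stated assumptions:

- The anomaly test is $\hat{\sigma}_j > Q\,\hat{\sigma}_j^{etalon}$.
- The window slides by one element.
- Each new window is read from the partially cleaned array, following "replaced ... inside $X$".
- The replacement value is the mean of the current window, including the suspect element itself.

The function name `medium` and the value `q` used in testing are hypothetical. The paper ties $Q$ to the size of the anomalous bias but gives no number.

```python
import numpy as np


def medium(x, n_win, q):
    """Sketch of the medium algorithm (stages 1-7).

    Assumptions: the test is sigma_j > q * sigma_etalon, the window slides by
    one element, and windows are read from the partially cleaned array.
    """
    cleaned = np.array(x, dtype=float)
    if n_win < 2 or len(cleaned) <= n_win:
        raise ValueError("need n_win >= 2 and more than n_win samples")
    # Stage 2: etalon standard deviation of the first window (assumed anomaly-free).
    sigma_etalon = cleaned[:n_win].std(ddof=1)
    # Stages 3-6: each new window ends at index `last`, the newly added element.
    for last in range(n_win, len(cleaned)):
        window = cleaned[last - n_win + 1 : last + 1]
        mean_j = window.mean()          # estimated expected value of the window
        sigma_j = window.std(ddof=1)    # square root of the unbiased dispersion
        if sigma_j > q * sigma_etalon:  # stage 5: the new element is abnormal
            cleaned[last] = mean_j
    return cleaned
```

Under this literal reading, the replacement mean still contains the spike, so the elements that follow a large spike may also be flagged until it leaves the window. For example, in `medium([0, 1, 0, 1, 0, 1, 0, 1, 50, 0, 1, 0], 4, 2.0)` the value 50 becomes 13.0, and the next few values are replaced as well.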
The main advantages of the suggested algorithm are:
- its simplicity of implementation;
- the statistical approach to determining the abnormality of values;
- the immutability of the dataset when no anomalies are detected;
- the preservation of the dataset size after processing.
One peculiarity of the algorithm is that the etalon window must contain no anomalies. In addition, the weighting factor $Q$ is chosen in proportion to the magnitude of the anomalous bias.
Results of modeling and efficiency estimation of the medium algorithm. The modeling conditions are the same as for the sliding_wind algorithm. The results are shown in Fig. 2, which is organized in the same way as Fig. 1.
The comparison of Fig. 2, a and 2, b, together with the reduced standard deviation ($\sigma_X = 4.45$), indicates the efficiency of the medium algorithm.
Algorithm of control over dataset dynamic properties change – the LSM algorithm
The idea of the algorithm is to detect abnormal values as follows. First, the Least Squares Method (LSM) is used to determine etalon parameters for the speed of the trend's dynamic change. Second, these are compared with the current properties. Etalon speed values are calculated from the whole input dataset, and current values from the sliding-window samples. The comparison takes the statistical estimation errors into account.
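The etalon speed used by this algorithm (step 2 in the list that follows) can be sketched with NumPy's least-squares polynomial fit. It fits $T(x) = c_0 + c_1 x + c_2 x^2$ over the whole sample and takes $c_2$ as the etalon speed. Two choices are our assumptions. The first is that the argument of $T$ is the sample index $0, 1, \dots, n-1$. The second is the function names `lsm_trend` and `etalon_speed`, which are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def lsm_trend(x, degree=2):
    """Least-squares trend T(t) = c0 + c1*t + c2*t**2 + ... of the sample x.

    Assumption: the argument of T is the sample index t = 0, 1, ..., n - 1.
    Returns the coefficients (lowest degree first) and T evaluated at every t.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    coeffs = P.polyfit(t, x, degree)
    return coeffs, P.polyval(t, coeffs)


def etalon_speed(x):
    """Etalon speed = c2 of the quadratic LSM trend fitted to the whole sample."""
    coeffs, _ = lsm_trend(x, degree=2)
    return float(coeffs[2])
```

For an exact quadratic such as `1 + 0.3*t + 0.05*t**2`, the estimate recovers 0.05. The fitted trend values returned by `lsm_trend` are what an LSM-based replacement step would read from.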
LSM algorithm steps:
1. Considering the input dataset $X = \{x_i\}$, $i = 1 \dots n$.
2. Calculating the etalon speed over the dataset with a polynomial LSM model:
$$\mathrm{Speed}^{etalon} = c_2, \qquad T(x) = c_0 + c_1 x + c_2 x^2 + \dots$$
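3. Forming a sliding window of $N_{win}$ elements.
4. Determining the current estimates of the expected value, dispersion and standard deviation for the sliding window:
$$\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j, \qquad \hat{D}_j = \frac{1}{N_{win}-1} \sum_{j=1}^{N_{win}} \left(x_j - \hat{x}_j\right)^2, \qquad \hat{\sigma}_j = \sqrt{\hat{D}_j}, \quad j = 1, \dots, N_{win}.$$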
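5. Determining the controlled abnormality parameters. They are scaled to the sizes of the respective samples and take the estimation errors into account:
$$\mathrm{Ind\_1} = \mathrm{Speed}^{etalon} \cdot n, \qquad \mathrm{Ind\_2} = Q \cdot \hat{\sigma}_j \cdot \mathrm{Speed}^{etalon} \cdot N_{win},$$
($Q$ is a tunable weighting factor).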
6. If the condition
$$\mathrm{Ind\_2} > \mathrm{Ind\_1}$$
is true, the current $j$-th element added to the sliding window is treated as an anomaly.
The abnormal $j$-th element is replaced with the LSM estimate $x_i = T(\hat{x}_j)$ inside the dataset $X = \{x_i\}$.
If the condition $\mathrm{Ind\_2} > \mathrm{Ind\_1}$ is false, the dataset $X = \{x_i\}$ remains unchanged.
Fig. 2. Results of medium algorithm execution: (a) input dataset, $\sigma_X = 6.64$; (b) processed dataset, $\sigma_X = 4.45$. Axes: $i = 1 \dots n$ and $X = \{x_i\}$.
7. Repeating steps 3–6 over the input dataset, $i = 1 \dots n$.
8. The result of the algorithm is the dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1 \dots n$, cleared of anomalies.
The main advantages of the suggested algorithm are:
- its simplicity of implementation;
- dynamic abnormality criteria;
- modification of abnormal elements only;
- equal sizes of the input and output datasets.
One peculiarity of the algorithm is that it requires no parameter tuning to determine the etalon values. The weighting factor $Q$ is chosen in proportion to the magnitude of the anomalous bias.
Results of modeling and efficiency estimation of the LSM algorithm. The modeling conditions are the same as for the sliding_wind algorithm. The results are shown in Fig. 3, which is organized in the same way as Fig. 1.
The comparison of Fig. 3, a and 3, b, together with the reduced standard deviation ($\sigma_X = 4.7$), illustrates the efficiency of the algorithm.
The table summarizes the generalized statistical properties and the reduction of the standard deviation achieved by the three proposed algorithms.
Table. Generalized statistical properties of the three proposed algorithms

| Raw data with anomalies | sliding_wind | medium | LSM (MNK) |
|---|---|---|---|
| $\sigma_X = 6.64$ | $\sigma_X = 2.95$ | $\sigma_X = 4.45$ | $\sigma_X = 4.71$ |
The data in the table show that the sliding_wind algorithm is the most successful in terms of precision: it yields the smallest standard deviation. This algorithm is also the most productive and requires the least configuration (only the window size). However, it may be inapplicable to dynamic, strongly nonlinear processes, where its use may introduce biases. The medium and LSM (MNK) algorithms show the opposite trade-off. The choice of a specific algorithm has to take into account the dataset properties and the anomaly features, such as the margin of error or the nature of the trend model.
A comparison of the proposed methods with more sophisticated alternatives was not possible. On data of equivalent size (N = 10000), these alternatives produced no results in reasonable time.
Conclusions. The proposed algorithms are inexpensive to implement and do not require major manual tuning. They also show acceptable precision and performance for Big Data array processing.
Fig. 3. Results of MNK (LSM) algorithm execution: (a) raw dataset, $\sigma_X = 6.64$; (b) processed dataset, $\sigma_X = 4.71$. Axes: $i = 1 \dots n$ and $X = \{x_i\}$.
REFERENCES
1. F. Provost and T. Fawcett, Data Science for Business. USA: O’Reilly Media, Inc., 2013, 409 p.
2. D. Dietrich, Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, Indiana, USA: John Wiley & Sons, Inc., 2015, 420 p. doi: 10.1002/9781119183686.ch1.
3. O. Pysarchuk and V. Kharchenko, Nonlinear multi-criteria process modeling in traffic management systems, (in Ukrainian). Kyiv: Institute of Gifted Child, 2015, 248 p.
4. S. Kovbasiuk, O. Pysarchuk, and M. Rakushev, Least Squares Method and its practical applications, (in Ukrainian). Zhytomyr: Zhytomyr Military Institute, 2008, 228 p.
5. S. Raschka, Y. Liu, and V. Mirjalili, Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python. Birmingham: Packt, 2022.
6. P. Joshi, Artificial Intelligence with Python. Birmingham: Packt, 2017.
7. K. G. Mehrotra, C. K. Mohan, and H. Huang, Anomaly Detection Principles and Algorithms. Switzerland: Springer, 2017, 229 p. doi: 10.1007/978-3-319-67526-8.
8. O. Pysarchuk and Y. Mironov, Chromosome Feature Extraction and Ideogram-Powered Chromosome Categorization. Switzerland: Springer, 2022. doi: 10.1007/978-3-031-04812-8_36.
9. H. Blomquist and J. Möller, Anomaly detection with Machine learning. Quality assurance of statistical data in the Aid community. Uppsala: Uppsala University, 2015, 60 p.
10. S. Thudumu, P. Branch, J. Jin, and J. Singh, A comprehensive survey of anomaly detection techniques for high dimensional big data, Journal of Big Data, vol. 7, art. no. 42, 2020. doi: 10.1186/s40537-020-00320-x.
Received 10.08.2022
INFORMATION ABOUT THE ARTICLE
Oleksii O. Pysarchuk, ORCID: 0000-0001-5271-0248, National Technical University of
Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail:
PlatinumPA2212@gmail.com
Danylo R. Baran, ORCID: 0000-0002-3251-8897, “Codeimpact B.V”, Ukraine, e-mail:
danil.baran15@gmail.com
Yurii G. Mironov, ORCID: 0000-0002-2291-5864, National Aviation University,
Ukraine, e-mail: yuriymironov96@gmail.com
Illya O. Pysarchuk, ORCID: 0000-0003-4343-0142, National Technical University of
Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: flimka134@gmail.com
ALGORITHMS FOR CLEARING A STATISTICAL SAMPLE OF ANOMALIES
FOR DATA SCIENCE TASKS / O.O. Pysarchuk, D.R. Baran, Yu.G. Mironov,
I.O. Pysarchuk
Abstract. The nature of the data used in the tasks of modern application domains is considered. Several algorithms for clearing a statistical sample of anomalies within the Data Science task pipeline are proposed. The distinctive feature and advantage of the proposed algorithms is their relative simplicity and the limited number of tuning parameters. These parameters are determined by learning techniques according to the properties of the input statistical data. The proposed algorithms are sufficiently flexible in use and do not depend on the nature and origin of the data. A modeling experiment with the proposed approaches, implemented as Python scripts using basic libraries, has proven their efficiency. The results are illustrated with plots built from the initial data and from the data modified by the proposed algorithms. The application of the algorithms is analyzed, and the results of their execution are compared.
Keywords: anomaly clearing, anomaly detection, noise removal, statistical methods, data analysis, big data, data cleaning.
|
| language | English |
| publishDate | 2023 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| source | System research and information technologies; No. 1 (2023); 78-84; ISSN 2308-8893, 1681-6048; published 2023-03-30; full text: https://journal.iasa.kpi.ua/article/view/260175/274360 |
| title | Algorithms for clearing a statistical sample of anomalies for data science tasks |
| title_alt | Algorithms of statistical anomalies clearing for data science applications |
| url | https://journal.iasa.kpi.ua/article/view/260175 |