Algorithms of statistical anomalies clearing for data science applications

The paper considers the nature of input data used by Data Science algorithms of modern-day application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as a part of the Data Science pipeline. The main advantages of given algorithms are their relative...

Bibliographic details
Date:	2023
Authors:	Pysarchuk, Oleksii, Baran, Danylo, Mironov, Yurii, Pysarchuk, Illya
Format:	Article
Language:	English
Published:	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023
Topics:	anomaly removal, anomaly detection, noise removal, statistical techniques, data analysis, big data, data cleaning
Online access:	https://journal.iasa.kpi.ua/article/view/260175
Journal title:	System research and information technologies

Repositories

System research and information technologies

_version_	1867334426393837568
author	Pysarchuk, Oleksii Baran, Danylo Mironov, Yurii Pysarchuk, Illya
author_facet	Pysarchuk, Oleksii Baran, Danylo Mironov, Yurii Pysarchuk, Illya
author_institution_txt_mv	[ { "author": "Oleksii Pysarchuk", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Danylo Baran", "institution": "Codeimpact B.V., Kyiv" }, { "author": "Yurii Mironov", "institution": "National Aviation University, Kyiv" }, { "author": "Illya Pysarchuk", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
author_sort	Pysarchuk, Oleksii
baseUrl_str	http://journal.iasa.kpi.ua/oai
collection	OJS
datestamp_date	2023-05-24T21:28:17Z
description	The paper considers the nature of input data used by Data Science algorithms of modern-day application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as a part of the Data Science pipeline. The main advantages of given algorithms are their relative simplicity and a small number of configurable parameters. Parameters are determined by machine learning with respect to the properties of input data. These algorithms are flexible and have no strict dependency on the nature and origin of data. The efficiency of the proposed approaches is verified with a modeling experiment conducted using algorithms implemented in Python. The results are illustrated with plots built using raw and processed datasets. The algorithms application is analyzed, and results are compared.
doi_str_mv	10.20535/SRIT.2308-8893.2023.1.06
first_indexed	2025-07-17T10:27:54Z
format	Article
fulltext	O. Pysarchuk, D. Baran, Yu. Mironov, I. Pysarchuk, 2023
ISSN 1681–6048 System Research & Information Technologies, 2023, № 1
UDC 004.5
DOI: 10.20535/SRIT.2308-8893.2023.1.06
ALGORITHMS OF STATISTICAL ANOMALIES CLEARING FOR DATA SCIENCE APPLICATIONS
O. PYSARCHUK, D. BARAN, Yu. MIRONOV, I. PYSARCHUK
Abstract. The paper considers the nature of input data used by Data Science algorithms of modern-day application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as a part of the Data Science pipeline. The main advantages of the given algorithms are their relative simplicity and a small number of configurable parameters. Parameters are determined by machine learning with respect to the properties of input data. These algorithms are flexible and have no strict dependency on the nature and origin of data. The efficiency of the proposed approaches is verified with a modeling experiment conducted using algorithms implemented in Python. The results are illustrated with plots built using raw and processed datasets. The application of the algorithms is analyzed, and the results are compared.
Keywords: anomaly removal, anomaly detection, noise removal, statistical techniques, data analysis, big data, data cleaning.
INTRODUCTION
Data Science techniques and approaches are widely used to solve modern technical problems. One frequent use case is implementing back-end software components for distributed CRM and ERP systems with intelligent features. A possible format of input data for such systems is Big Data arrays that can be interpreted as numeric statistical samples. This kind of data is used in a variety of application domains, such as Commodity Trade and Risk Management systems, automotive applications like automated driver assistants, Unmanned Aerial Vehicle software, Computer Vision software responsible for raster-to-vector conversion, et cetera [1–6].
In solutions from the aforementioned application domains, the underlying mathematical models are known a priori. Therefore, dataset processing is reduced to mapping data features to model properties. This makes it possible to determine the trend line of the considered process both within the observation range and beyond it (by means of interpolation and extrapolation, i.e. retrospective and prospective forecasts) [2–4]. This paper suggests treating the configuration of an analytical model as an unsupervised learning activity, since any solution of an Artificial Intelligence problem is, in its first iteration, implemented as a process of configuring parameters with respect to the input dataset [2–4]. Moreover, such models can be associated with artificial intelligence technologies both for this reason and because of their forecasting features.
Statistical dataset processing is based on the hypothesis that random factors bias the input dataset. Statistical data are usually obtained through experiments and sampling.
The first stage of step-by-step statistical analysis is a preparatory one: data research [2–4, 7]. The tasks of this stage include checking the input Big Data array for anomalies. Any value that differs significantly from the other dataset values can be treated as an anomaly. Abnormal data result from various miscalculations and malfunctions during data sampling. The two major types of anomalies are value skips and crude measurements. Both anomaly types, if not handled, will distort the results of statistical analysis at the next stages. The distortions appear as increased dispersion (variance) of the estimates and/or as bias of the estimation result [2–4, 7]. Possible ways to address the problem of distorted data are smoothing or evaluation.
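The effect of crude measurements on the estimates can be illustrated with a minimal sketch. It is not taken from the paper: the helper name `inject_anomalies`, the anomaly magnitude, and the seed are assumptions chosen only for illustration. A fixed share of entries of a normal sample is shifted by a constant, after which both the mean (bias) and the standard deviation (dispersion) of the sample grow.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility


def inject_anomalies(sample, share, magnitude, rng):
    """Return a copy of `sample` with a `share` of its entries shifted by `magnitude`.

    Hypothetical helper: a constant shift is just one simple way to imitate
    crude measurements.
    """
    corrupted = sample.copy()
    count = int(len(sample) * share)
    idx = rng.choice(len(sample), size=count, replace=False)
    corrupted[idx] += magnitude
    return corrupted


# Normal noise with expected value 0 and standard deviation 5.0.
clean = rng.normal(loc=0.0, scale=5.0, size=10_000)
# Assumed anomaly model: 10% of the entries are shifted by +30.
dirty = inject_anomalies(clean, share=0.10, magnitude=30.0, rng=rng)

print(f"clean: mean={clean.mean():.2f}, std={clean.std():.2f}")
print(f"dirty: mean={dirty.mean():.2f}, std={dirty.std():.2f}")
```

With this anomaly model the mean shifts by share × magnitude, and the spread grows as well, which is the kind of distortion the clearing stage is meant to reduce.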
In order to reduce the influence of anomalies on further data processing, it is necessary to detect abnormalities and to restore or remove abnormal values. These actions can be arranged into a sequence of stages that represents the entire process of clearing a dataset of abnormal values [3, 4, 7–10].
Anomaly detection is based on analyzing the properties of anomalies with respect to absolute values, trend dynamics, and changes in statistical properties [3, 4, 7, 9–10].
Related works. A significant number of known approaches to anomaly clearing have been considered [7, 8]. They share a common flaw: parameters have to be tuned manually with respect to the input dataset and anomaly properties. Moreover, when applied to Big Data arrays, the majority of them lead to NP problems, which makes their application impossible in practice. Therefore, the task of designing simple and efficient data clearing algorithms for Data Science purposes remains relevant.
Goal. The goal of the paper is to propose precise, performant, efficient and relatively simple algorithms for clearing datasets of anomalies.
Proposal. Experience from multiple IT projects related to Data Science makes it possible to formulate practical requirements for algorithms responsible for anomaly detection and clearing. They include:
1. High efficiency of detection, integrity and adequacy of output, precise values of dispersion and standard deviation.
2. Relatively low computational complexity and, therefore, high performance with large datasets.
3. Automatic adjustment of parameters with respect to the input data properties, or a minimal number of manual settings.
Three algorithms of anomaly detection and removal have been developed to address these requirements.
Data preprocessing algorithm – the “sliding_wind” algorithm
The main idea of the algorithm is to apply smoothing within a simple sliding window. Within the window, an arithmetic mean value is calculated.
The size of the sliding window is chosen so that the trend describing the change of the considered process within the window is quasi-linear.
Sliding_wind algorithm stages:
1. Considering the statistical sample $X = \{x_i\}$, $i = 1 \ldots n$, as an input.
2. Forming a sliding window of $N_{win}$ elements.
3. Calculating the arithmetic mean within the window: $\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j$, $j = 1, \ldots, N_{win}$.
4. Forming a clean dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1 \ldots n$, by replacing elements of the sample $X$ with the arithmetic mean $\hat{x}_j$: $x_{i+N_{win}} = \hat{x}_j$, starting from the last entry within the sliding window.
5. Shifting the sliding window one element to the right: $i = i + 1$.
6. Repeating steps 2–5 over the input statistical sample, $i = 1 \ldots n$.
7. After processing the entire dataset, in order to adjust the data within the first sliding window, a subset of size $N = 2N_{win}$ is created. After that, steps 2–5 are applied to this subset, traversing it in reverse.
The advantages of the sliding_wind algorithm are its simplicity, a minimal number of manual settings (only the window size), potential for effective use on datasets with fuzzy trend properties and anomalies, and equal sizes of input and output.
A nonlinear estimation model for $\hat{x}_j$ could be included in the sliding_wind algorithm, but this is inadvisable due to a major increase in complexity, which would especially affect Big Data inputs.
Results of modeling and efficiency estimation of sliding_wind algorithm. For the modeling, a dataset of n = 10000 entries has been considered. The dataset contains a quadratic trend and normally distributed noise with an expected value of 0 and a standard deviation $\sigma_X = 5.0$. Abnormal entries are evenly distributed within the sample and constitute 10% of values. The model of trend and stochastic components is additive.
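The stages above and the modeling setup can be sketched as follows. This is a minimal illustration, not the authors' implementation. The names `make_sample`, `sliding_wind` and `residual_std`, the trend coefficient, the anomaly model (extra noise with a three times larger spread on every tenth entry), and the window size are all assumptions. The sketch also assumes that the sample holds at least 2·Nwin entries and that window means are computed from the raw data rather than from already smoothed values.

```python
import numpy as np


def make_sample(n=10_000, noise_std=5.0, anomaly_share=0.10,
                anomaly_scale=3.0, seed=0):
    """Additive model: quadratic trend + normal noise + evenly spread anomalies."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    trend = 5e-7 * t**2                    # assumed trend coefficient
    sample = trend + rng.normal(0.0, noise_std, n)
    # Assumed anomaly model: every k-th entry receives extra noise of larger spread.
    step = int(round(1 / anomaly_share))
    idx = np.arange(0, n, step)
    sample[idx] += rng.normal(0.0, anomaly_scale * noise_std, idx.size)
    return sample


def sliding_wind(sample, n_win):
    """Replace the last entry of every window position with the window mean."""
    x = np.asarray(sample, dtype=float)
    cleaned = x.copy()
    # Forward pass: the window covers x[i : i + n_win]; its last entry gets the mean.
    for i in range(len(x) - n_win + 1):
        cleaned[i + n_win - 1] = x[i:i + n_win].mean()
    # Reverse pass over the first 2 * n_win entries: it fills entries
    # 0 .. n_win - 2, which the forward pass left untouched.
    head = x[:2 * n_win][::-1]
    head_clean = head.copy()
    for i in range(len(head) - n_win + 1):
        head_clean[i + n_win - 1] = head[i:i + n_win].mean()
    cleaned[:n_win - 1] = head_clean[::-1][:n_win - 1]
    return cleaned


def residual_std(values):
    """Standard deviation of the residuals around a least-squares quadratic trend."""
    t = np.arange(len(values))
    fitted = np.polyval(np.polyfit(t, values, 2), t)
    return np.std(values - fitted)


sample = make_sample()
cleaned = sliding_wind(sample, n_win=5)   # window size chosen arbitrarily
print(f"raw: {residual_std(sample):.2f}, cleaned: {residual_std(cleaned):.2f}")
```

The output keeps the size of the input, and the only tunable setting is `n_win`. The printed values depend on the assumed anomaly model and are not expected to reproduce the figures reported in the paper.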
Computations are conducted using a Python implementation of the algorithm [5, 6] that relies on the numpy and matplotlib libraries. Results of sliding_wind execution are provided in Fig. 1. Fig. 1, a shows a plot built from the input dataset $X = \{x_i\}$, $i = 1, \ldots, n$, with normal noise and anomalies, as well as the trend line obtained via the least squares method [3, 4]. The standard deviation of the input dataset with anomalies is $\sigma_X = 6.64$. Fig. 1, b shows a plot built from the dataset cleared of anomalies by means of the sliding_wind algorithm. The standard deviation of this dataset is $\sigma_X = 2.95$. The comparison of plots 1, a and 1, b, as well as the reduced standard deviation, indicates the efficiency of the sliding_wind algorithm.
Fig. 1. Results of sliding_wind execution: a: input dataset, $\sigma_X = 6.64$; b: processed dataset, $\sigma_X = 2.95$
Algorithm of dataset statistic properties control – the “medium” algorithm
The main idea of the algorithm is to detect abnormal elements by determining etalon (reference) parameters of the dataset without anomalies (expected value, standard deviation) and comparing them with the current statistical features. Detected anomalies are replaced with the arithmetic mean. Etalon and current statistical features are derived from the datasets that form the initial and the current sliding windows, respectively.
Stages of medium algorithm:
1. Considering the statistical sample $X = \{x_i\}$, $i = 1 \ldots n$.
2. Within the first sliding window of $N_{win}$ elements, estimating the etalon expected value, dispersion and standard deviation: $\hat{x}_j^{etalon} = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j$, $\hat{D}_j^{etalon} = \frac{1}{N_{win} - 1} \sum_{j=1}^{N_{win}} (x_j - \hat{x}_j^{etalon})^2$, $\hat{\sigma}_j^{etalon} = \sqrt{\hat{D}_j^{etalon}}$, $j = 1 \ldots N_{win}$.
3. Forming the next sliding window of $N_{win}$ elements.
4. Determining the expected value, dispersion and standard deviation for the current sliding window: $\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j$, $\hat{D}_j = \frac{1}{N_{win} - 1} \sum_{j=1}^{N_{win}} (x_j - \hat{x}_j)^2$, $\hat{\sigma}_j = \sqrt{\hat{D}_j}$, $j = 1 \ldots N_{win}$.
5. If the condition $\hat{\sigma}_j > Q \hat{\sigma}_j^{etalon}$ holds ($Q$ is a weighting factor), the current $j$-th element added to the sliding window is considered abnormal. The abnormal $j$-th element is replaced in the dataset $X = \{x_i\}$ with the estimated expected value of the current sliding window: $x_i = \hat{x}_j$. If the condition $\hat{\sigma}_j > Q \hat{\sigma}_j^{etalon}$ is false, the dataset $X = \{x_i\}$ is not modified.
6. Repeating steps 3–5 over the statistical sample, $i = 1 \ldots n$.
7. The result of the algorithm is the dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1 \ldots n$, clear of anomalies.
The main advantages of the suggested algorithm are simplicity of implementation, a statistical approach to determining the abnormality of values, immutability of the dataset when no anomalies are detected, and the same dataset size after processing.
One peculiarity of the algorithm is that the etalon dataset should contain no anomalies. Also, the weighting factor $Q$ is determined proportionally to the size of the abnormality bias.
Results of modeling and efficiency estimation of medium algorithm. Modeling conditions are the same as for the sliding_wind algorithm. Research results are provided in Fig. 2, in the same layout as Fig. 1. The comparison of the plots in Fig. 2, a and 2, b, as well as the standard deviation reduction ($\sigma_X = 4.45$), demonstrates the efficiency of the medium algorithm.
Fig. 2. Results of medium algorithm execution: a: input dataset, $\sigma_X = 6.64$; b: processed dataset, $\sigma_X = 4.45$
Algorithm of control over dataset dynamic properties change – the LSM algorithm
The idea of the algorithm is to detect abnormal values by determining etalon parameters of the trend's rate of change (speed) and comparing them with the current properties using the Least Squares Method (LSM). Etalon values of speed are calculated from the input dataset, and current values are calculated from sliding window selections. During comparison, the statistical estimation errors are taken into account.
LSM algorithm steps:
1. Considering the input dataset $X = \{x_i\}$, $i = 1 \ldots n$.
2. Using the polynomial LSM model, the etalon speed is calculated over the dataset: $Speed^{etalon} = c_2$, with the LSM polynomial $T(x) = c_0 + c_1 x + c_2 x^2 + \ldots$
3. Forming a sliding window of $N_{win}$ elements.
4. Determining the current estimates of expected value, dispersion and standard deviation for the sliding window: $\hat{x}_j = \frac{1}{N_{win}} \sum_{j=1}^{N_{win}} x_j$, $\hat{D}_j = \frac{1}{N_{win} - 1} \sum_{j=1}^{N_{win}} (x_j - \hat{x}_j)^2$, $\hat{\sigma}_j = \sqrt{\hat{D}_j}$, $j = 1 \ldots N_{win}$.
5. Determining the controlled abnormality indicators, scaled to the sizes of the estimated datasets and taking estimation errors into account: $\mathrm{Ind\_1} = Speed^{etalon} \sqrt{n}$, $\mathrm{Ind\_2} = Q \hat{\sigma}_j Speed^{etalon} \sqrt{N_{win}}$ ($Q$ is a configurable weighting factor).
6. If the condition $\mathrm{Ind\_2} > \mathrm{Ind\_1}$ holds, the current $j$-th element added to the sliding window is treated as an anomaly. The abnormal $j$-th element is replaced in the dataset $X = \{x_i\}$ with the LSM estimate: $x_i = \hat{x}_j$, where $\hat{x}_j$ here is the value of the LSM polynomial $T$ for that element. If the condition $\mathrm{Ind\_2} > \mathrm{Ind\_1}$ is false, the dataset $X = \{x_i\}$ remains unchanged.
7. Repeating steps 3–6 over the input dataset, $i = 1 \ldots n$.
8. The result of the algorithm is the dataset $\hat{X} = \{\hat{x}_i\}$, $i = 1 \ldots n$, clear of anomalies.
The main advantages of the suggested algorithm are simplicity of implementation, dynamic criteria of abnormality, modification of abnormal elements only, and the same size of input and result datasets.
One peculiarity of the algorithm is that it does not require parameter control when determining etalon values; the weighting factor $Q$ is determined proportionally to the size of the abnormality bias.
Results of modeling and efficiency estimation of LSM algorithm. Modeling conditions are the same as for the sliding_wind algorithm.
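The medium and LSM algorithms described above can be sketched in the same style. This is again a hedged illustration, not the authors' code. The function names `medium` and `lsm_detect`, the demo data and the values of `n_win` and `q` are assumptions. The sketch also assumes that the dataset is modified in place while the window slides, that the LSM indicators are scaled by the square roots of the dataset and window sizes, and that absolute values of the speed are compared.

```python
import numpy as np


def medium(sample, n_win, q):
    """Replace the newest window entry with the window mean when the window's
    standard deviation exceeds q times the etalon (first-window) value."""
    x = np.asarray(sample, dtype=float).copy()
    sigma_etalon = x[:n_win].std(ddof=1)      # etalon window, assumed anomaly-free
    for i in range(1, len(x) - n_win + 1):
        window = x[i:i + n_win]
        if window.std(ddof=1) > q * sigma_etalon:
            x[i + n_win - 1] = window.mean()  # newest entry of the window
    return x


def lsm_detect(sample, n_win, q):
    """Replace the newest window entry with the LSM trend value when the
    window indicator Ind_2 exceeds the etalon indicator Ind_1."""
    x = np.asarray(sample, dtype=float).copy()
    n = len(x)
    t = np.arange(n)
    # np.polyfit returns coefficients from the highest power down: c2, c1, c0.
    coeffs = np.polyfit(t, x, 2)
    trend = np.polyval(coeffs, t)
    speed_etalon = abs(coeffs[0])             # etalon speed c2 (absolute value assumed)
    ind_1 = speed_etalon * np.sqrt(n)
    for i in range(n - n_win + 1):
        sigma = x[i:i + n_win].std(ddof=1)
        ind_2 = q * sigma * speed_etalon * np.sqrt(n_win)
        if ind_2 > ind_1:
            x[i + n_win - 1] = trend[i + n_win - 1]
    return x


# Small deterministic demo: smooth quadratic signal with one crude measurement.
t = np.arange(200)
demo = 0.001 * t**2 + np.sin(t)
demo[100] += 50.0
by_medium = medium(demo, n_win=5, q=3.0)
by_lsm = lsm_detect(demo, n_win=5, q=1.0)
print(demo[100], by_medium[100], by_lsm[100])
```

In this sketch the medium variant replaces the abnormal entry with the mean of a window that still contains the anomaly, so the value is only partially corrected and the following few windows may also be adjusted. The LSM variant substitutes the trend value and leaves all other entries of the demo untouched. Both behaviours depend strongly on the chosen `q`.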
Results are depicted in Fig. 3, in the same layout as Fig. 1. The comparison of the plots in Fig. 3, a and 3, b, as well as the decreased standard deviation ($\sigma_X = 4.71$), illustrates the efficiency of the algorithm.
Fig. 3. Results of LSM algorithm execution: a: raw dataset, $\sigma_X = 6.64$; b: processed dataset, $\sigma_X = 4.71$
Generalized statistical properties and the reduction of bias achieved with the three proposed algorithms are given in the table.
Generalized statistical properties of three proposed algorithms
Raw data with abnormalities | sliding_wind | medium | LSM
$\sigma_X = 6.64$ | $\sigma_X = 2.95$ | $\sigma_X = 4.45$ | $\sigma_X = 4.71$
The data in the table demonstrate that the sliding_wind algorithm is the most successful precision-wise: it yields the lowest standard deviation. This algorithm is also the most performant and needs the least configuration. However, it may be inapplicable to dynamic, strongly nonlinear processes, where its usage may result in biases. The medium and LSM algorithms have the opposite properties. To choose a specific algorithm, dataset properties and anomaly features (such as the margin of error or the nature of the trend model) have to be considered.
A comparison of the proposed methods with more sophisticated alternatives could not be completed: on data of equivalent size (N = 10000), these alternatives yielded no results in reasonable time.
Conclusions. The proposed algorithms are inexpensive to implement, do not require major manual tuning, and show acceptable precision and performance when processing Big Data arrays.
REFERENCES
1. F. Provost and T. Fawcett, Data Science for Business. USA: O’Reilly Media, Inc, 2013, 409 p.
2. D. Dietrich, Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, Indiana, USA: John Wiley & Sons, Inc, 2015, 420 p. doi: 10.1002/9781119183686.ch1.
3. O. Pysarchuk and V. Kharchenko, Nonlinear multi-criteria process modeling in traffic management systems, (in Ukrainian). Kyiv: Institute of Gifted Child, 2015, 248 p.
4. S. Kovbasiuk, O. Pysarchuk, and M. Rakushev, Least Squares Method and its practical applications, (in Ukrainian). Zhytomyr: Zhytomyr Military Institute, 2008, 228 p.
5. S. Raschka, Y. Liu, and V. Mirjalili, Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python. Birmingham: Packt, 2022.
6. P. Joshi, Artificial Intelligence with Python. Birmingham: Packt, 2017.
7. G. Kishan, K. Chilukuri, and H. HuaMing, Anomaly Detection Principles and Algorithms. Switzerland, Springer, 2017, 229 p. doi: 10.1007/978-3-319-67526-8.
8. O. Pysarchuk and Y. Mironov, Chromosome Feature Extraction and Ideogram-Powered Chromosome Categorization. Switzerland, Springer, 2022. doi: 10.1007/978-3-031-04812-8_36.
9. H. Blomquist and J. Möller, Anomaly detection with Machine learning. Quality assurance of statistical data in the Aid community. Uppsala: Uppsala University, 2015, 60 p.
10. S. Thudumu, P. Branch, J. Jin, and J. Singh, A comprehensive survey of anomaly detection techniques for high dimensional big data. Switzerland, Springer, 2017, 30 p. doi: 10.1186/s40537-020-00320-x.
Received 10.08.2022
INFORMATION ON THE ARTICLE
Oleksii O. Pysarchuk, ORCID: 0000-0001-5271-0248, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: PlatinumPA2212@gmail.com
Danylo R. Baran, ORCID: 0000-0002-3251-8897, “Codeimpact B.V”, Ukraine, e-mail: danil.baran15@gmail.com
Yurii G. Mironov, ORCID: 0000-0002-2291-5864, National Aviation University, Ukraine, e-mail: yuriymironov96@gmail.com
Illya O. Pysarchuk, ORCID: 0000-0003-4343-0142, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: flimka134@gmail.com
ALGORITHMS FOR CLEARING A STATISTICAL SAMPLE OF ANOMALIES FOR DATA SCIENCE TASKS / O.O. Pysarchuk, D.R. Baran, Yu.G. Mironov, I.O. Pysarchuk
Abstract. The nature of the data used in the tasks of modern application domains is considered. Several algorithms for clearing a statistical sample of anomalies within the Data Science pipeline are proposed. The distinguishing feature and advantage of the proposed algorithms is their relative simplicity and a limited number of configuration parameters, which are determined by learning techniques according to the properties of the input statistical data. The proposed algorithms are sufficiently flexible in use and do not depend on the nature and origin of the data. The results of a modeling experiment with the proposed approaches, implemented as Python scripts using basic libraries, proved their efficiency. The results are illustrated with plots built from the initial data and from the data modified by the proposed algorithms. The application of the algorithms is analyzed, and the results of their execution are compared.
Keywords: anomaly removal, anomaly detection, noise removal, statistical techniques, data analysis, big data, data cleaning.
id	journaliasakpiua-article-260175
institution	System research and information technologies
keywords_txt_mv	keywords
language	English
last_indexed	2025-07-17T10:27:54Z
publishDate	2023
publisher	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format	ojs
resource_txt_mv	journaliasakpiua/41/d7419517dc1acdb2265da9bdca60b841.pdf
spelling	journaliasakpiua-article-2601752023-05-24T21:28:17Z Algorithms of statistical anomalies clearing for data science applications АЛГОРИТМЫ ОЧИЩЕНИЯ СТАТИСТИЧЕСКОЙ ВЫБОРКИ ОТ АНОМАЛИЙ ДЛЯ ЗАДАЧ DATA SCIENCE Алгоритми очищення статистичної вибірки від аномалій для задач data science Pysarchuk, Oleksii Baran, Danylo Mironov, Yurii Pysarchuk, Illya очищення від аномалій виявлення аномалій видалення шуму статистичні методи аналіз даних великі дані очищення даних anomaly removal anomaly detection noise removal statistical techniques data analysis big data data cleaning The paper considers the nature of input data used by Data Science algorithms of modern-day application domains. It then proposes three algorithms designed to remove statistical anomalies from datasets as a part of the Data Science pipeline. The main advantages of given algorithms are their relative simplicity and a small number of configurable parameters. Parameters are determined by machine learning with respect to the properties of input data. These algorithms are flexible and have no strict dependency on the nature and origin of data. The efficiency of the proposed approaches is verified with a modeling experiment conducted using algorithms implemented in Python. The results are illustrated with plots built using raw and processed datasets. The algorithms application is analyzed, and results are compared. Розглянуто природу даних, що використовуються в задачах сучасних прикладних областей. Запропоновано декілька алгоритмів очищення статистичної вибірки від аномалій в конвеєрі задач Data Science. Відзнакою та перевагою запропонованих алгоритмів є їх відносна простота та обмежена кількість параметрів налаштувань, що визначаються за технологіями навчання відповідно до властивостей вхідних статистичних даних. Запропоновані алгоритми є достатньо гнучкими у використанні і не залежать від природи та походження даних. 
Результати модельного експерименту запропонованих підходів у вигляді скриптів мовою Python та базових бібліотек довели їх ефективність. Результати проілюстровано графіками, побудованими з використанням початкових даних та даних, що змінені за допомогою запропонованих алгоритмів. Застосування алгоритмів проаналізовано та порівняно результати виконання алгоритмів. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023-03-30 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/260175 10.20535/SRIT.2308-8893.2023.1.06 System research and information technologies; No. 1 (2023); 78-84 Системные исследования и информационные технологии; № 1 (2023); 78-84 Системні дослідження та інформаційні технології; № 1 (2023); 78-84 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/260175/274360
spellingShingle	очищення від аномалій виявлення аномалій видалення шуму статистичні методи аналіз даних великі дані очищення даних Pysarchuk, Oleksii Baran, Danylo Mironov, Yurii Pysarchuk, Illya Алгоритми очищення статистичної вибірки від аномалій для задач data science
title	Алгоритми очищення статистичної вибірки від аномалій для задач data science
title_alt	Algorithms of statistical anomalies clearing for data science applications АЛГОРИТМЫ ОЧИЩЕНИЯ СТАТИСТИЧЕСКОЙ ВЫБОРКИ ОТ АНОМАЛИЙ ДЛЯ ЗАДАЧ DATA SCIENCE
title_full	Алгоритми очищення статистичної вибірки від аномалій для задач data science
title_fullStr	Алгоритми очищення статистичної вибірки від аномалій для задач data science
title_full_unstemmed	Алгоритми очищення статистичної вибірки від аномалій для задач data science
title_short	Алгоритми очищення статистичної вибірки від аномалій для задач data science
title_sort	алгоритми очищення статистичної вибірки від аномалій для задач data science
topic	очищення від аномалій виявлення аномалій видалення шуму статистичні методи аналіз даних великі дані очищення даних
topic_facet	очищення від аномалій виявлення аномалій видалення шуму статистичні методи аналіз даних великі дані очищення даних anomaly removal anomaly detection noise removal statistical techniques data analysis big data data cleaning
url	https://journal.iasa.kpi.ua/article/view/260175
work_keys_str_mv	AT pysarchukoleksii algorithmsofstatisticalanomaliesclearingfordatascienceapplications AT barandanylo algorithmsofstatisticalanomaliesclearingfordatascienceapplications AT mironovyurii algorithmsofstatisticalanomaliesclearingfordatascienceapplications AT pysarchukillya algorithmsofstatisticalanomaliesclearingfordatascienceapplications AT pysarchukoleksii algoritmyočiŝeniâstatističeskojvyborkiotanomalijdlâzadačdatascience AT barandanylo algoritmyočiŝeniâstatističeskojvyborkiotanomalijdlâzadačdatascience AT mironovyurii algoritmyočiŝeniâstatističeskojvyborkiotanomalijdlâzadačdatascience AT pysarchukillya algoritmyočiŝeniâstatističeskojvyborkiotanomalijdlâzadačdatascience AT pysarchukoleksii algoritmiočiŝennâstatističnoívibírkivídanomalíjdlâzadačdatascience AT barandanylo algoritmiočiŝennâstatističnoívibírkivídanomalíjdlâzadačdatascience AT mironovyurii algoritmiočiŝennâstatističnoívibírkivídanomalíjdlâzadačdatascience AT pysarchukillya algoritmiočiŝennâstatističnoívibírkivídanomalíjdlâzadačdatascience
