Data mining tools for complex socio-economic processes and systems

The paper considers discovering new and potentially useful information from large amounts of data that actualizes the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy and their processing using network applications....

Bibliographic details
Date: 2022
Main author: Obelets, Tetyana
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022
Online access: https://journal.iasa.kpi.ua/article/view/256771
Journal title: System research and information technologies

Institution

System research and information technologies
author Obelets, Tetyana
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2023-05-21T20:04:38Z
description The paper considers discovering new and potentially useful information from large amounts of data that actualizes the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy and their processing using network applications. The stages of data mining for complex socio-economic processes and systems were outlined. The algorithm of data mining was considered. It is determined that the previously used stages of data mining, which were limited to the model-building process, can be extended through the use of more powerful computer technology and the emergence of free access to large amounts of multidimensional data. The available stages of data mining for complex socio-economic processes and systems include the processes of facilitating data preparation, evaluation, and visualization of models, as well as in-depth learning. The data mining tools for complex socio-economic processes and systems in the context of technological progress and following the big data paradigm were identified. The data processing cycle has been investigated; this process consists of a series of steps starting with the input of raw data and ending with the output of useful information. The knowledge obtained at the data processing stage is the basis for creating models of complex socio-economic processes and systems. Two types of models (descriptive and predictive) that could be created in the data mining process were outlined. Algorithms for estimating and analyzing data for modeling complex socio-economic processes and systems in accordance with the pre-set task were determined. The efficiency of introducing neural networks and deep learning methods used in data mining was analyzed. It was determined that they would allow effective analysis and use of the existing large data sets for operational human resources management and strategic planning of complex socio-economic processes and systems.
doi_str_mv 10.20535/SRIT.2308-8893.2022.4.06
first_indexed 2025-07-17T10:27:48Z
format Article
fulltext T.V. Obelets, 2022. ISSN 1681–6048 System Research & Information Technologies, 2022, № 4. UDC 330.46:004.89. DOI: 10.20535/SRIT.2308-8893.2022.4.06

DATA MINING TOOLS FOR COMPLEX SOCIO-ECONOMIC PROCESSES AND SYSTEMS

T.V. OBELETS

Abstract. The paper considers discovering new and potentially useful information from large amounts of data that actualizes the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy and their processing using network applications. The stages of data mining for complex socio-economic processes and systems were outlined. The algorithm of data mining was considered. It is determined that the previously used stages of data mining, which were limited to the model-building process, can be extended through the use of more powerful computer technology and the emergence of free access to large amounts of multidimensional data. The available stages of data mining for complex socio-economic processes and systems include the processes of facilitating data preparation, evaluation, and visualization of models, as well as in-depth learning. The data mining tools for complex socio-economic processes and systems in the context of technological progress and following the big data paradigm were identified. The data processing cycle has been investigated; this process consists of a series of steps starting with the input of raw data and ending with the output of useful information. The knowledge obtained at the data processing stage is the basis for creating models of complex socio-economic processes and systems. Two types of models (descriptive and predictive) that could be created in the data mining process were outlined. Algorithms for estimating and analyzing data for modeling complex socio-economic processes and systems in accordance with the pre-set task were determined.
The efficiency of introducing neural networks and deep learning methods used in data mining was analyzed. It was determined that they would allow effective analysis and use of the existing large data sets for operational human resources management and strategic planning of complex socio-economic processes and systems.

Keywords: data mining, complex socio-economic systems, predictive modeling, neural networks, deep learning.

INTRODUCTION

The continuous scaling of data on the Internet is changing the way we interact in economic and social systems. Many users search, publish, and create new data daily, leaving a digital footprint that can help describe their behavior, decisions, and intentions. This highlights the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy and their processing using network applications.

Analysis of recent research and publications. The most significant results in statistical analysis and applications for data mining were achieved in the works of R. Nisbet, G. Miner, O. Maimon, and L. Rokach. Such scientists as J. Han, M. Kamber, P. Jiang, H. Choi, H. Varian, and others devoted their research to the creation of concepts and development of data mining techniques. H. Xiong, H. Pandei, A. Krizhevsky, I. Sutskever, and G.E. Hinton explored machine learning capabilities with deep convolutional neural networks. Successful results in the field of artificial intelligence with deep learning were obtained in the works of Y. LeCun, Y. Bengio, G. Hinton, J. Schmidhuber, and the Ukrainian authors M. Lavrenyuk, N. Kussul, and O. Novikov. The analysis of complex socio-economic systems was carried out by the foreign authors P. dos Santos, N. Wiener, J. 
Stefanovsky, and the issues of forecasting socio-economic processes were studied by the Ukrainian scientists G. Prisenko, E. Ravikovych, and others. At the same time, the progress of computing power and the availability of large amounts of multidimensional data make it necessary to develop data mining tools for complex socio-economic processes and systems.

The purpose of the article is to study the process of identifying new and potentially useful information from large amounts of data, to outline the stages of data mining for complex socio-economic processes and systems, and to identify appropriate tools in the context of the progress of computing power and the emergence of large amounts of freely available multidimensional data.

Presentation of the main material of the study. As data mining has evolved as a professional activity, it is necessary to distinguish it from previous statistical modeling activities and broader knowledge discovery activities. Data mining is defined as the use of machine learning algorithms to find weak patterns of relationship between data elements in large and disordered datasets, which can lead to actions to increase benefits in one form or another (diagnostics, profit, prediction, management, etc.) [1].

Data mining is also called knowledge discovery in databases (KDD), i.e. the process of discovering new and potentially useful information from large amounts of data. The definition of data mining was initially limited to the modeling process, but over time the data analysis tools have come to include processes that facilitate data preparation, as well as evaluation and visualization of models [2] (Fig. 1).

The process of identifying knowledge in databases combines the mathematics used to identify patterns in the data with the whole process of data selection and the use of models to apply to other datasets and use the information for predetermined purposes.
This process combines the development of business systems, statistical methods, and digital technologies to identify the structure of socio-economic processes and systems (relationships, patterns, associations, and basic functions), not just their statistical parameters (averages, weights, etc.) [1].

Fig. 1. Algorithm of data analytics and tools of data mining

The data mining algorithm begins with the definition of objectives in the data matrix design process and ends with the implementation of the identified knowledge. At the stage of designing the data matrix, preparatory work is carried out, goals are defined, and the strategic ideas are formed for whose achievement the process of knowledge discovery in databases begins. Understanding the strategic goal, the end user of the data, and the environment in which the data will be disseminated is a prerequisite for an adequate data mining process.

Data mining uses techniques from different fields of knowledge, such as statistics, machine learning, pattern recognition, database and storage systems, information retrieval, visualization, algorithms, high-performance computing, and many application areas. Statistics examines the collection, analysis, interpretation or explanation, and presentation of data. Statistical models are widely used to model data and data classes. Multivariate graphical methods are used to research and analyze databases and to present the results of data analysis [3].

Identification of data sources and searching for documents or information in documents is an important step in data mining. Documents can be text-based or multimedia, can be kept on paper in archives, or can be available electronically on the Internet. The main source of information for data mining for complex socio-economic processes and systems today is the Internet.
For example, Google Trends (GT) is an online tool that reports on the volume of search queries for a particular keyword or text. The use of GT data for current forecasting of social and economic variables was introduced in 2009 [4]. Social networking sites and blogs are specifically designed to encourage users to express their feelings and opinions, which can potentially be used to predict social variables. Websites and programs (transactional platforms, opinion platforms, and information dissemination platforms) created on the Internet by enterprises, public organizations, charitable foundations, or multinational corporations inform about their products, services, organizational structure, and intentions. In addition to providing information, websites are used for transactions, e-commerce, and online services. According to the big data paradigm, this wide variety of sources requires specific tools for processing them.

The selection, preparation, and processing of the data on which data mining will be carried out is the stage that creates opportunities. The data to be used to identify knowledge must meet the following requirements: the available data must be, firstly, reliable; secondly, up-to-date; thirdly, sufficient to present the information as fully as possible; and fourthly, limited to what is necessary, so as not to overload the research database systems and so that all selected data can be integrated into one set. The minimum set of available data can, if necessary, be extended with additional data to identify nuances that will be taken into account when creating a model. Sometimes the presence of such minor details can be fundamental to the success of the knowledge discovery process in databases. A large number of nuances provides more opportunities to create a multidimensional model that allows the most complete consideration of the studied phenomena and supports data mining.
However, storing, organizing, and managing large and complex databases requires large resources, which are planned in advance and often limited. Research on database systems and data stores focuses on creating, maintaining, and using databases for organizations and end users. Database systems are known for their high scalability in processing very large, relatively structured datasets. Choosing the optimal dataset should balance the requirements of sufficiency and necessity, so this stage of data mining creates the foundation for opportunities. In addition, the choice of data should be guided by their validity and relevance.

Preparing data for further processing increases the validity of the data set. Preparation includes sorting and filtering the data that will eventually be used as input. This involves cleaning up the data by removing missing values and informational noise. Noise removal increases the chances of performing data mining efficiently. Removing noisy objects is an important goal of data cleansing, as noise interferes with most types of data analysis. Most existing data cleaning methods focus on removing noise that is the product of low-level data errors resulting from an imperfect data collection process, but data objects that are irrelevant or only weakly relevant can also significantly impede data analysis. Thus, if the goal is to improve data analysis as much as possible, these objects should also be considered noise, at least with respect to the main analysis. Therefore, there is a need for data cleansing techniques that eliminate both types of noise. Because datasets can contain a lot of noise, these methods should also be able to discard a potentially large share of the data [5].
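The cleaning step described above can be illustrated with a short Python sketch. It drops records with missing values and then discards values that lie far from the median. The function name `clean_records`, the threshold `max_dev`, the sample values, and the choice of the median absolute deviation as a noise criterion are all assumptions made for this illustration; the article does not prescribe a particular cleaning method.

```python
from statistics import median

def clean_records(values, max_dev=3.0):
    """Drop missing values, then drop points far from the median.

    Noise criterion (an assumption of this sketch): a point is treated as
    noise when its distance from the median exceeds max_dev times the
    median absolute deviation (MAD).
    """
    # Step 1: remove missing values (None or NaN; NaN is never equal to itself).
    present = [v for v in values if v is not None and v == v]
    if not present:
        return []
    # Step 2: estimate centre and spread with robust statistics.
    centre = median(present)
    mad = median(abs(v - centre) for v in present)
    if mad == 0:
        return present  # no spread, so nothing can be judged as noise
    # Step 3: keep only points within the allowed deviation.
    return [v for v in present if abs(v - centre) <= max_dev * mad]

# Hypothetical raw indicator values with two gaps and one gross error.
raw = [10, None, 9, 11, float("nan"), 250, 10, 12]
cleaned = clean_records(raw)
```

A robust criterion is used here because a single gross error would inflate a mean-based threshold; a real pipeline would also need rules for weakly relevant records, which depend on the task.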
Data processing is the process of converting raw data into useful information through electronic, mechanical, or automated means. Data processing can take time depending on the complexity of the data and the amount of input data. The preparation step described above helps to make this process faster. Data processing is usually performed step by step: raw data is collected, filtered, sorted, analyzed, stored, and then provided in an accessible format, such as graphs, charts, and documents (Fig. 2).

Fig. 2. Data processing cycle

The data processing cycle consists of a series of steps in which raw data (input) enters the process to obtain useful information (output). Each step is performed in a certain order, but the whole process can be repeated cyclically. The output of the first data processing cycle can be stored and presented as input for the next cycle. If the data obtained in processing is not used as input for the next processing cycle, the process cannot be considered a cycle and remains a one-time activity for data processing and information retrieval.

Information is processed and analyzed in databases. The information obtained at this stage can be useful and become the basis for the formation of knowledge. This is how the data processing phase discovers knowledge in databases. In the next stages, the identified knowledge of social processes and economic phenomena should be formed and presented in such a way as to create a model of complex socio-economic processes and systems. At the stage of data discovery, recognition of useful information, and further knowledge discovery in databases, mathematical and statistical methods and methods of artificial intelligence and machine learning are used.

The evolution of the Internet and social media has led to a huge explosion in the volume and complexity of data, so-called big data. Thus, data mining has also gone beyond traditional data modeling, such as regression and statistical models. Information theory offers tools to make formal conclusions about complex models of economic and social interaction. There are two main theoretical concepts of information that can help guide the observation of the relationship between the economic characteristics of a large number of people: entropy and mutual information. The concepts of entropy and mutual information make it possible to develop non-parametric characteristics of information associations present in the observed data generated by economic and broader social interactions [6].

Creating a model is a stage of data mining that requires special responsibility and diligence. Many algorithms can be used to model complex socio-economic processes and systems in accordance with a predetermined task. Different socio-economic processes and systems have different and complex causal relationships, so it is very important to determine the tactics of finding such connections and to choose the optimal algorithm. This step involves choosing a specific method that will be used to find the patterns and data that most accurately describe the process or system under study. Currently, many algorithms are known for solving data mining problems: support vector machines, the k-nearest neighbors method, neural networks, and decision trees.

When choosing a specific method, it is necessary to take into account the strategic goal of data mining and, accordingly, to determine the priority characteristics of the model that will be created at this stage. On the one hand, we can consider such a characteristic as the accuracy of the model, and on the other, clarity and simplicity of perception.
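To make the entropy and mutual information concepts mentioned in this part of the text more concrete, here is a minimal Python sketch that estimates both from observed categorical data. The function names and the sample observations (`status`, `region`, `coin`) are hypothetical, and the plug-in estimator from empirical frequencies is an assumption of this sketch rather than a method taken from [6].

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy, in bits, of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical observations for eight people.
status = ["emp", "emp", "unemp", "unemp", "emp", "emp", "unemp", "unemp"]
region = ["north", "north", "south", "south", "north", "north", "south", "south"]
coin = ["h", "t", "h", "t", "h", "t", "h", "t"]

mi_related = mutual_information(status, region)    # region fully determines status
mi_unrelated = mutual_information(status, coin)    # coin carries no information
```

In this toy data the region determines the employment status, so the mutual information equals the full one bit of entropy of `status`, while the unrelated `coin` variable shares no information with it. No parametric model of the relationship is assumed, which is the sense in which such characteristics are non-parametric.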
To create a simpler model that should be intuitive, the best choice may be the decision tree method, which is one of the most popular methods of solving classification and forecasting problems. This is a way of presenting rules in a hierarchical, sequential structure, where each object corresponds to a single node that provides the solution. The decision tree method should be used in cases where symbolic representation and good classification are required; the problem does not depend on many attributes; a modest subset of attributes contains the relevant information; linear combinations of features are not critical; and learning speed is important [7].

To create an accurate model, it is appropriate to use neural networks, an extremely powerful modeling method that can reproduce extremely complex relationships. For many years, linear modeling has been the main method of modeling in most areas, as it has well-developed optimization procedures. In problems where the linear approximation is unsatisfactory (and there are many of them), linear models work poorly. In addition, neural networks cope with the “curse of dimensionality”, which does not allow linear relationships to be modeled in the case of a large number of variables.

The advantages of the neural network method are the following [8]:
– nonlinearity: neural networks are nonlinear;
– supervised learning: the network learns from examples; after receiving the primary information from the operator, the learning algorithm is started, which automatically perceives the data structure;
– adaptability, i.e. the network can adapt its synaptic weights, even in real time;
– evidential response: in the context of pattern classification, the network indicates not only which pattern to select but also the confidence in the decision;
– fault tolerance due to massive interconnections;
– suitability for large-scale integration, i.e. the network's parallelism makes it potentially faster for certain tasks and lets it capture complex patterns of behavior;
– homogeneity in analysis and design, i.e. the same notation is used in all areas related to neural networks;
– analogy with neurobiology [9].

In general, neural networks are self-adaptive nonlinear methods that learn from data and do not require specific assumptions about the underlying model.

For each strategy of data mining and modeling in this process, there are several possible methods by which the goals can be achieved. The choice of a particular method is explained by the efficiency of the algorithm in a particular problem. Thus, at this stage of data mining, the most acceptable method of modeling is selected in accordance with the conditions. All these algorithms study the data and create models that are closest to the characteristics of the studied data of complex socio-economic systems and processes. Models created during data mining can be of two types: predictive or descriptive (Fig. 3).

The predictive model is a projection based on the data and information obtained in the earlier stages of data mining. As a rule, the forecast model is created on the basis of directed data analysis, that is, a top-down approach, where mappings from a vector input to a scalar output are obtained by applying a specific one. For example, predictive modeling can be performed using a variety of historical and statistical data.
When creating a prognostic model in the process of data mining, the following tasks are performed: regression, time series analysis, classification, and forecasting [10].

The prognostic model is also known as a statistical regression. It is a monitoring method that involves explaining the relationship of several attribute values among themselves in similar elements and predicting the development of the model, that is, directional modeling based on these observations. As noted earlier, the two common methods of predictive modeling available in many data mining tools are neural networks and decision trees.

The descriptive model presents the main characteristics of the data set in a concise form. In essence, it is a collection of data points that allows important aspects of a data set to be studied. As a rule, the descriptive model is created by undirected data analysis, that is, a bottom-up approach where the data “speaks for itself”. Undirected data analysis finds patterns in the data set, but the patterns are interpreted by analysts. Data mining specialists determine the usability of the patterns found.

Fig. 3. Types of models created in the process of data mining

The most characteristic tasks of descriptive modeling are the following:
– clustering, i.e. decomposing or splitting a data set into groups;
– generalization as the process of providing summary information from data in an easier to understand form;
– association rules, i.e. identification of causal relationships between different features in large data sets;
– sequence detection, which involves the identification of patterns of interest to researchers in the data.

Descriptive models and predictive models can (and often should) be used together in data mining. For example, it seems logical and appropriate to first look for patterns in the data using non-directional methods.
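As one illustration of such an undirected first pass, the clustering task listed above can be sketched with a one-dimensional k-means procedure (Lloyd's algorithm). The function name `kmeans_1d`, the sample `incomes`, and the initial centres are hypothetical and chosen only for this example; k-means is used here as a familiar clustering technique, not as the method of the article.

```python
def kmeans_1d(values, centres, iterations=10):
    """Lloyd's algorithm in one dimension with caller-supplied initial centres."""
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest centre.
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Hypothetical household incomes (arbitrary units) forming three segments.
incomes = [12, 14, 13, 40, 42, 41, 90, 95]
centres, groups = kmeans_1d(incomes, centres=[10, 50, 100])
```

The resulting groups are only candidate segments; as the text notes, it remains the analyst's job to interpret them and decide whether they are usable.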
These descriptive models can offer segments of data sets and ideas that improve the results of directional modeling when creating predictive models.

The modular design of neural network architecture facilitates the creation of models that simultaneously process data presented in different formats, such as creating text annotations from images, synthesizing speech from text, or translation. This makes it possible to solve problems that go beyond traditional classification and regression and is especially convenient when the data comes from different sources, which is often the case when working with big data. In addition, data obtained from different repositories or databases, presented in the form of object maps, can be reused in other contexts and, if necessary, further configured or trained.

Evaluation and analysis of data. The purpose of any predictive modeling is to apply the model to new data. Forecasting models are useful only insofar as the quality of their prediction is adequate; therefore, what matters is not the process of creating a model as such, but the creation of a high-quality model. Both predictive models and descriptive models have their evaluation criteria. For forecast models, the evaluation criterion is the accuracy of the forecast, measured by the size of the forecast error, i.e. the difference between the forecast and the actual value of the studied indicator. For descriptive models, it is more difficult to define obvious evaluation criteria, but they usually capture a discrepancy between the observed data and the proposed model. Thus, at this stage of data mining, different strategies for assessing the quality of models can be used.

Parametric methods for analyzing the accuracy of forecasts.
According to the results of the ex-post forecast, forecast accuracy indicators for m steps are calculated, such as the mean square error, the root mean square error, the mean absolute error, the root mean square error in percentage, and the mean absolute relative error in percentage (MARE). The smaller these values, the higher the quality of the forecast. In practice, these characteristics are used quite often. This approach gives good results if no fundamentally new patterns appear in the period of the retrospective forecast. When a prognostic model of complex socio-economic systems and processes is created, however, each forecast is built in a new situation; therefore, comparing the numerical accuracy of forecasts made at different points in time is not entirely correct. These considerations led to the use of non-parametric methods of analysis of the accuracy of forecasts [11].

Non-parametric methods of forecast accuracy analysis use two types of criteria: the sign (label) criterion and the rank criterion. The sign criterion for comparing the accuracy of two sequences of predictions is based on the percentage of cases in which prediction method A is better than method B. Such a comparison is made for individual predictions of the same events (variables). If the rank criterion is applied, the numerical characteristic of accuracy (absolute error when estimating one forecast, or root mean square error when considering a sequence of predictions) is replaced by ranks, which are then checked for significance. For example, if the sequences of predictions of indicators A and B are obtained using k methods, the root mean square error is calculated first, and then the values are ranked from smallest to largest.
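The parametric indicators and the label (sign) criterion discussed in this section can be sketched in a few lines of Python. The function names, the sample series, and the decision to exclude ties from the sign comparison are assumptions of this illustration; the percentage measure also assumes that no actual value is zero.

```python
from math import sqrt

def rmse(actual, forecast):
    """Root mean square error over all forecast steps."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error over all forecast steps."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mare_percent(actual, forecast):
    """Mean absolute relative error in percent; assumes no actual value is 0."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def sign_criterion(actual, forecast_a, forecast_b):
    """Share of decided cases in which method A has the smaller absolute error."""
    wins_a = wins_b = 0
    for y, a, b in zip(actual, forecast_a, forecast_b):
        if abs(y - a) < abs(y - b):
            wins_a += 1
        elif abs(y - b) < abs(y - a):
            wins_b += 1
    decided = wins_a + wins_b  # ties are excluded (an assumption of this sketch)
    return wins_a / decided if decided else 0.5

# Hypothetical actual values and two competing forecasts for m = 4 steps.
actual = [100, 110, 120, 130]
method_a = [102, 108, 123, 129]
method_b = [105, 110, 115, 135]

share_a_better = sign_criterion(actual, method_a, method_b)
```

Here method A wins three of the four comparisons even though method B is exact at one step, which shows how the sign criterion discards the magnitude of the errors and keeps only their ordering.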
Although non-parametric methods have their advantages, it is important to realize that they ignore some of the available information. Thus, the sign (label) and rank criteria do not take into account the numerical values of errors [11].

Data visualization. The models created in the process of data mining of complex socio-economic systems and processes include numerous and complex parameters. To solve the problem of size and complexity, the best available methods for representing complex systems and visualizing data are used (for example, advanced user interfaces). These technologies increase the level of abstraction, which helps users focus on the most important components and properties of complex models.

In the world of big data, data visualization tools and technologies are needed to analyze large and complex amounts of information and make decisions based on mining this data. Data visualization is a graphical representation of information and data. Using visual elements such as charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, deviations, and patterns in data.

Effective data visualization is a delicate balance between form and function. On the one hand, the simplest chart can be either an overly primitive transfer of information or a clear view of the core of the information that is analyzed and must be presented and understood. On the other hand, the most complex visualization can overload the viewer with information and report many details without conveying the essence of the message. Data and visual elements must work together to create a better understanding and awareness of information. There is a choice of visualization methods for efficient and interesting data presentation. Common types of data visualization are charts, tables, graphs, maps, infographics, dashboards, etc.
Construction and illustration of relationships between different objects of the created models of complex socio-economic systems and processes can be done with the help of modern tools. For example, Draw.io is a free, intuitive browser-based flowchart builder where users can drag object shapes (including ellipses and parallelograms common to data models) onto a canvas and then join them with connecting lines. Lucidchart Chart Designer is similar to Draw.io, but it handles more complex flows and has more reliable data protection. SQuirreL is a free and open-source graphical tool that works with most major relational databases.

The most important trend in the field of data in recent years is the proliferation of data catalogs, largely due to privacy rules such as the GDPR and the CCPA (General Data Protection Regulation of 2016; California Consumer Privacy Act of 2018) [12]. This trend has not escaped the field of data mining and modeling. The line between data discovery tools and applications and data modeling tools is increasingly blurred, as exemplified by Amundsen, a metadata-based data discovery platform developed by Lyft [13].

Open source Metabase is a GUI tool with some useful analytics visualizations but does not support modeling tools [14]. Other notable data visualization tools include erwin, ER/Studio, SAP PowerDesigner, IBM InfoSphere Data Architect, and Microsoft SQL Server Management Studio.

Application of deep learning. The knowledge and models of processes in socio-economic systems that are created during data mining will be in demand only if they can be included in other complex systems in order to predict their development.
Forecasting analytics is interesting and useful in the context of the possibility of making changes to the simulated system and presenting the long-term consequences of these changes. The real structure of complex socio-economic systems is dynamic: data characteristics may change over time, new parameters may appear that were not foreseen in the model, and others may disappear. Therefore, this stage of application of deep learning determines the success and effectiveness of the whole process of data mining.

Progress in the field of deep learning has made it possible to use the powerful capabilities of artificial neural networks in this process. They are a universal tool for data mining and are effective for learning from data presented in various formats. For example, neural networks have demonstrated their effectiveness in performing certain tasks of object or image recognition [15–16]. Recent approaches have shown that deep learning can effectively learn from data represented as variable-length sequences (time series, sound, language, and text), graphs and networks, including social networks, natural language, and even source code in computer programs [17–18].

Another aspect of deep neural networks that is closely related to big data is their ability to perform complex feature engineering. The problem of big data is often related to the difficulty of making reliable predictions when the training data first need to be represented in a suitable form, for example, to identify relevant classes of solutions successfully. An important requirement for the use of deep learning methods is the availability of large training samples, as insufficient training data causes the problem of “overfitting”, when the model does not generalize the information obtained during training but simply memorizes it. In this case, the model shows good results on the training data but does not show such accuracy on unfamiliar data [19].
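The overfitting effect described above can be demonstrated without a neural network. The following Python sketch deliberately swaps in two much simpler models: a 1-nearest-neighbour lookup that memorizes the training set, and an ordinary least-squares line. The synthetic data (a true relation y = 2x with alternating noise of ±3 on the training targets) and all function names are assumptions of this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def memorize(xs, ys):
    """1-nearest-neighbour lookup: returns the stored y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: the true relation is y = 2x; training targets carry noise.
train_x = list(range(8))
train_y = [2 * x + (3 if x % 2 == 0 else -3) for x in train_x]
test_x = [x + 0.5 for x in range(7)]  # unseen points between the training points
test_y = [2 * x for x in test_x]

memo = memorize(train_x, train_y)
line = fit_line(train_x, train_y)

memo_train = mean_abs_error(memo, train_x, train_y)  # memorized perfectly
memo_test = mean_abs_error(memo, test_x, test_y)     # noise is reproduced
line_test = mean_abs_error(line, test_x, test_y)     # noise is averaged out
```

The memorizing model has zero error on the training data yet a clearly larger error on unseen points than the simple line, which is the pattern the text calls overfitting. With more training data the noise averages out, which is why large training samples matter for high-capacity models.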
Previously, the search for useful representations had to be conducted by experts through manual feature design or explicit methods of feature selection and construction [20]. At the present stage of technology development, the most suitable architecture for processing the data that characterize complex socio-economic processes and systems, and consequently for solving the data mining problem, is the convolutional neural network, because it is designed to process data in the form of multidimensional arrays [21]. Some neural models at the deep learning stage make it possible to synthesize features in the form of hidden variables with certain desired properties. Neural networks help automate the tasks set at the beginning of the data mining process for complex socio-economic systems: constructing the features of these systems becomes an integral part of deep learning, closely related to the search in the space of new hypotheses about the development of socio-economic processes.

CONCLUSIONS

Thus, the study of the data mining process showed that the expansion of data analysis tools, driven by the rapid development of technology and the formation of big data sets, creates the ability to track, evaluate, simulate, and ultimately incorporate key economic and social changes and trends into complex processes and systems. An important step that has increased the efficiency of data mining has been the inclusion of steps that facilitate data preparation as well as model evaluation and visualization.

The descriptive and predictive models generated by the mining process can and should be used together.
The logical sequence of model application that will improve the results of directed modeling is seen primarily as follows: first, patterns in the data are sought with the help of descriptive models; then, based on the insights obtained, directed modeling is applied to create predictive models of complex socio-economic processes and systems.

At the present stage of technology development, machine learning is widely used in data mining to devise complex models and algorithms that serve to create descriptive and predictive models of complex socio-economic systems and processes. Machine learning gives computers the ability to "learn", to recognize complex patterns, and to make intelligent decisions based on large data samples without being explicitly programmed. These capabilities underlie the application of deep learning methods, which are designed to process data presented in the form of multidimensional arrays and make it possible to create models of complex socio-economic processes and to take possible changes into account when designing and managing the development of complex systems. That is, the use of the above tools makes it possible to perform the tasks of data mining for complex socio-economic processes and systems successfully and efficiently.

Therefore, digital tools are becoming relevant for maintaining competitiveness: they help to model complex socio-economic processes and systems and to analyze and use existing large data sets effectively for operational human resources management and strategic planning of complex socio-economic processes and systems.

REFERENCES

1. R.R. Nisbet, G. Miner, and K. Yale, “Chapter 2 – Theoretical Considerations for Data Mining,” in R. Nisbet, G. Miner, and K. Yale (Eds.) Handbook of Statistical Analysis and Data Mining Applications, 2nd ed. Academic Press, 2018, pp. 21–37. Available: https://doi.org/10.1016/B978-0-12-416632-5.00002-5
2. O. Maimon and L. Rokach, Data Mining and Knowledge Discovery Handbook, 2nd ed. Springer, January 2010, 1285 p.
Available: https://doi.org/10.1007/978-0-387-09823-4
3. J. Han, M. Kamber, and J. Pei, Data mining: concepts and techniques, 3rd ed. Morgan Kaufmann Publishers, 2012, pp. 23–27.
4. H. Choi and H. Varian, Predicting the Present with Google Trends, 2009. Available: http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf
5. H. Xiong, G. Pandey, M. Steinbach, and V. Kumar, “Enhancing data analysis with noise removal,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, iss. 3, pp. 304–319, 2006. Available: http://datamining.rutgers.edu/publication/tkdehcleaner.pdf
6. P.L. dos Santos and N. Wiener, “Indices of Informational Association and Analysis of Complex Socio-Economic Systems,” Entropy, 21(4), 367, 2019. Available: https://doi.org/10.3390/e21040367
7. J. Stefanowski, Discovering Decision Trees. Institute of Computing Science, Poznań University of Technology, 2010, 45 p. Available: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.176.1423&rep=rep1&type=pdf
8. E. Gómez-Ramos and F. Venegas-Martínez, “A Review of Artificial Neural Networks: How Well Do They Perform in Forecasting Time Series?” Analítika, Revista de análisis estadístico, vol. 6, no. 2, pp. 7–15, 2013.
9. R. Tadeusiewicz, Neural networks: A comprehensive foundation: by Simon HAYKIN. USA, New York: Macmillan College Publishing, 1995, 696 p.
10. U. Johansson, Obtaining Accurate and Comprehensible Data Mining Models – An Evolutionary Approach. Linköping, Sweden: Department of Computer and Information Science, Linköpings universitet, 2007, 272 p. Available: http://www.diva-portal.org/smash/get/diva2:23601/FULLTEXT01.pdf
11. G.V. Prisenko and E.I. Ravikovich, Forecasting of socio-economic processes: Textbook. K: KNEU, 2005, 378 p.
12. “Comparing privacy laws: GDPR v.
CCPA,” Data Guidance and Future of Privacy Forum, 42 p. Available: https://fpf.org/wp-content/uploads/2018/11/GDPR_CCPA_Comparison-Guide.pdf
13. Amundsen. Open source data discovery and metadata engine. Available: https://www.amundsen.io/
14. Metabase. Built for data. Available: https://www.metabase.com/
15. A. Krizhevsky, I. Sutskever, and G.E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, 25, pp. 1097–1105, 2012.
16. J. Ngiam et al., “Multimodal deep learning,” Proceedings of the 28th international conference on machine learning (ICML-11), 2011.
17. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, 521(7553), pp. 436–444, 2015.
18. J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, 61(C), pp. 85–117, 2015.
19. M. Lavreniuk, N. Kussul, and A. Novikov, “Deep Learning Crop Classification Approach Based on Coding Input Satellite Data Into the Unified Hyperspace,” IEEE 38th International Conference on Electronics and Nanotechnology, pp. 239–244, 2018.
20. K. Krawiec, “Evolutionary feature selection and construction,” in C. Sammut and G. Webb (Eds.) Encyclopedia of Machine Learning and Data Mining. Boston, MA: Springer, 2016.
21. M. Reichstein et al., “Deep learning and process understanding for data-driven Earth system science,” Nature, vol. 566, iss. 7743, pp. 195–204, 2019.

Received 16.05.2022

INFORMATION ON THE ARTICLE

Tetyana V. Obelets, ORCID: 0000-0002-1553-5150, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: obelectv@ukr.net

DATA MINING TOOLS FOR COMPLEX SOCIO-ECONOMIC PROCESSES AND SYSTEMS / T.V. Obelets

Abstract.
The paper considers the process of discovering new and potentially useful information from large amounts of data, which actualizes the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy, and the processing of such data using network applications. The stages of data mining for complex socio-economic processes and systems are outlined. The data mining algorithm is considered. It is determined that the previously used stages of data mining, which were limited to the model-building process, can be extended thanks to more powerful computing technology and the emergence of free access to large amounts of multidimensional data. The existing stages of data mining for complex socio-economic processes and systems are supplemented with the processes of facilitating data preparation, model evaluation and visualization, and deep learning. The data mining tools for complex socio-economic processes and systems are identified in the context of technological progress and in accordance with the big data paradigm. The cyclical nature of data processing is investigated; this process consists of a series of steps, starting with the input of raw data and ending with the output of useful information. The knowledge obtained at the data processing stage forms the basis for creating models of complex socio-economic processes and systems. Two types of models (descriptive and predictive) that can be created in the data mining process are outlined. Algorithms for evaluating and analyzing data for modeling complex socio-economic processes and systems in accordance with a pre-set task are determined. The efficiency of introducing neural networks and deep learning methods used in the data mining process is analyzed. It is determined that they will make it possible to analyze and use the existing large data sets effectively both for operational human resources management and for strategic planning of the development of complex socio-economic processes and systems.

Keywords: data mining, complex socio-economic systems, predictive modeling, neural networks, deep learning.
id journaliasakpiua-article-256771
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-07-17T10:27:48Z
publishDate 2022
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/6b/e39c50aebe0eacf3aeeea1394103f26b.pdf
spelling journaliasakpiua-article-2567712023-05-21T20:04:38Z Data mining tools for complex socio-economic processes and systems ИНСТРУМЕНТАРИЙ ИНТЕЛЛЕКТУАЛЬНОГО АНАЛИЗА ДАННЫХ ДЛЯ СОСТАВНЫХ СОЦИАЛЬНО-ЭКОНОМИЧЕСКИХ ПРОЦЕССОВ И СИСТЕМ Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем Obelets, Tetyana інтелектуальний аналіз даних складні соціально-економічні системи прогностичне моделювання нейронні мережі глибинне навчання data mining complex socio-economic systems predictive modeling neural networks deep learning The paper considers discovering new and potentially useful information from large amounts of data that actualizes the role of developing data mining tools for complex socio-economic processes and systems based on the principles of the digital economy and their processing using network applications. The stages of data mining for complex socio-economic processes and systems were outlined. The algorithm of data mining was considered. It is determined that the previously used stages of data mining, which were limited to the model-building process, can be extended through the use of more powerful computer technology and the emergence of free access to large amounts of multidimensional data. The available stages of data mining for complex socio-economic processes and systems include the processes of facilitating data preparation, evaluation, and visualization of models, as well as in-depth learning. The data mining tools for complex socio-economic processes and systems in the context of technological progress and following the big data paradigm were identified. The data processing cycle has been investigated; this process consists of a series of steps starting with the input of raw data and ending with the output of useful information. The knowledge obtained at the data processing stage is the basis for creating models of complex socio-economic processes and systems. 
Two types of models (descriptive and predictive) that could be created in the data mining process were outlined. Algorithms for estimating and analyzing data for modeling complex socio-economic processes and systems in accordance with the pre-set task were determined. The efficiency of introducing neural networks and deep learning methods used in data mining was analyzed. It was determined that they would allow effective analysis and use of the existing large data sets for operational human resources management and strategic planning of complex socio-economic processes and systems. В работе рассмотрен процесс выявления новой и потенциально полезной информации из больших объемов данных, актуализирующих роль разработки инструментария интеллектуального анализа данных для сложных социально-экономических процессов и систем на основе принципов цифровой экономики и их обработки с помощью сетевых приложений. Обозначены этапы интеллектуального анализа данных для сложных социально-экономических процессов и систем. Рассмотрен алгоритм интеллектуального анализа данных. Определено, что ранее используемые этапы интеллектуального анализа данных, которые ограничивались только процессом построения модели, могут быть расширены благодаря использованию более мощной вычислительной техники и появлению в свободном доступе большого количества многомерных данных. В существующие этапы интеллектуального анализа данных для сложных социально-экономических процессов и систем включены процессы облегчения подготовки данных, оценка и визуализация моделей, а также глубинное обучение. Определен инструментарий интеллектуального анализа данных для сложных социально-экономических процессов и систем в контексте технологического прогресса и в соответствии с парадигмой больших данных. Исследована цикличность обработки данных, этот процесс состоит из серии шагов, начиная со входа необработанных данных, заканчивая выводом полезной информации. 
Полученные на этапе обработки данных знания закладываются в основу создания моделей сложных социально-экономических процессов и систем. Обозначены два типа моделей (описательная и прогностическая), которые могут быть созданы в процессе интеллектуального анализа данных. Определены алгоритмы оценки и анализа данных моделирования сложных социально-экономических процессов и систем в соответствии с заранее поставленной задачей. Проанализирована эффективность внедрения нейронных сетей и методов глубинного обучения, применяемых в процессе интеллектуального анализа данных. Определено, что они позволят эффективно анализировать и использовать большие массивы данных как для оперативного управления человеческими ресурсами, так и стратегического планирования развития сложных социально-экономических процессов и систем. Розглянуто процес виявлення нової та потенційно корисної інформації з великих обсягів даних, що актуалізує роль розроблення інструментарію інтелектуального аналізу даних для складних соціально-економічних процесів та систем на основі принципів цифрової економіки та їх оброблення за допомогою мережевих застосунків. Окреслено етапи інтелектуального аналізу даних для складних соціально-економічних процесів та систем. Розглянуто алгоритм інтелектуального аналізу даних. Визначено, що використовувані раніше етапи інтелектуального аналізу даних, які обмежувалися лише процесом побудови моделі, можуть бути розширені завдяки використанню більш потужної обчислювальної техніки та появи у вільному доступі великої кількості багатовимірних даних. До наявних етапів інтелектуального аналізу даних для складних соціально-економічних процесів та систем включено процеси полегшення підготовки даних, оцінювання та візуалізацію моделей, а також глибинне навчання. Визначено інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем у контексті технологічного прогресу та відповідно до парадигми великих даних. 
Досліджено циклічність оброблення даних; цей процес складається із серії кроків, починаючи із входу необроблених даних, закінчуючи виведенням корисної інформації. Отримані на етапі оброблення даних знання закладаються в основу створення моделей складних соціально-економічних процесів та систем. Окреслено два типи моделей (описову та прогностичну), що можуть бути створені у процесі інтелектуального аналізу даних. Визначено алгоритми оцінювання та аналізу даних моделювання складних соціально-економічних процесів та систем відповідно до заздалегідь поставленого завдання. Проаналізовано ефективність запровадження нейронних мереж та методів глибинного навчання, що застосовуються у процесі інтелектуального аналізу даних. Визначено, що вони дозволять ефективно аналізувати та використовувати наявні великі масиви даних як для оперативного управління людськими ресурсами, так і стратегічного планування розвитку складних соціально-економічних процесів та систем. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022-12-27 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/256771 10.20535/SRIT.2308-8893.2022.4.06 System research and information technologies; No. 4 (2022); 68-78 Системные исследования и информационные технологии; № 4 (2022); 68-78 Системні дослідження та інформаційні технології; № 4 (2022); 68-78 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/256771/270206
spellingShingle інтелектуальний аналіз даних
складні соціально-економічні системи
прогностичне моделювання
нейронні мережі
глибинне навчання
Obelets, Tetyana
Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title_alt Data mining tools for complex socio-economic processes and systems
ИНСТРУМЕНТАРИЙ ИНТЕЛЛЕКТУАЛЬНОГО АНАЛИЗА ДАННЫХ ДЛЯ СОСТАВНЫХ СОЦИАЛЬНО-ЭКОНОМИЧЕСКИХ ПРОЦЕССОВ И СИСТЕМ
title_full Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title_fullStr Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title_full_unstemmed Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title_short Інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
title_sort інструментарій інтелектуального аналізу даних для складних соціально-економічних процесів та систем
topic інтелектуальний аналіз даних
складні соціально-економічні системи
прогностичне моделювання
нейронні мережі
глибинне навчання
topic_facet інтелектуальний аналіз даних
складні соціально-економічні системи
прогностичне моделювання
нейронні мережі
глибинне навчання
data mining
complex socio-economic systems
predictive modeling
neural networks
deep learning
url https://journal.iasa.kpi.ua/article/view/256771
work_keys_str_mv AT obeletstetyana dataminingtoolsforcomplexsocioeconomicprocessesandsystems
AT obeletstetyana instrumentarijintellektualʹnogoanalizadannyhdlâsostavnyhsocialʹnoékonomičeskihprocessovisistem
AT obeletstetyana ínstrumentaríjíntelektualʹnogoanalízudanihdlâskladnihsocíalʹnoekonomíčnihprocesívtasistem