Data Science: Definition and Structural Representation

This article is a continuation of the discussion on the existing meanings and formalization of the definition of “Data Science” as an autonomous discipline, field of knowledge, clarification of its defining components, integration, and interaction processes between them. It is noted that most scient...

Bibliographic Details
Date:	2021
Main Authors:	Maslianko, Pavlo, Sielskyi, Yevhenii
Format:	Article
Language:	English
Published:	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2021
Subjects:	data science; Drew Conway's Venn diagram; definition of data science; structure of data science; data; information; knowledge
Online Access:	https://journal.iasa.kpi.ua/article/view/236712
Journal Title:	System research and information technologies

Institution

System research and information technologies

_version_	1867334413633716224
author	Maslianko, Pavlo Sielskyi, Yevhenii
author_facet	Maslianko, Pavlo Sielskyi, Yevhenii
author_institution_txt_mv	[ { "author": "Pavlo Maslianko", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Yevhenii Sielskyi", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
author_sort	Maslianko, Pavlo
baseUrl_str	http://journal.iasa.kpi.ua/oai
collection	OJS
datestamp_date	2021-07-13T11:01:37Z
description	This article continues the discussion on the existing meanings and formalization of the definition of “Data Science” as an autonomous discipline and field of knowledge, and on the clarification of its defining components and the integration and interaction processes between them. It is noted that most scientific results reflect a data-centric presentation and analysis of this discipline, i.e. an emphasis on the word Data. An analysis of the frequency of key terms in definitions of Data Science shows what our colleagues focus on and which terms their definitions are based on. In this paper, we make and argue certain additions to Drew Conway’s Data Science Venn Diagram, which does not reflect all the resources of the components that define the applied side of Data Science and, moreover, reveals the interaction of these resources neither from the point of view of the data researcher nor in its global understanding. We also propose a unified structural representation of Data Science in the format of an updated Drew Conway’s Venn diagram, based on a property/attribute that establishes correspondences providing integration/interoperability between the elements of the sets of the diagram. A new definition of Data Science as an interdisciplinary science and methodology of presenting activities for the analysis and extraction of data, information, and knowledge is substantiated.
doi_str_mv	10.20535/SRIT.2308-8893.2021.1.05
first_indexed	2025-07-17T10:27:15Z
format	Article
fulltext	© P.P. Maslianko, Y.P. Sielskyi, 2021. Системні дослідження та інформаційні технології, 2021, № 1, p. 61. UDC 004.6:33. DOI: 10.20535/SRIT.2308-8893.2021.1.05

DATA SCIENCE: DEFINITION AND STRUCTURAL REPRESENTATION

P.P. MASLIANKO, Y.P. SIELSKYI

Abstract. This article continues the discussion on the existing meanings and formalization of the definition of “Data Science” as an autonomous discipline and field of knowledge, and on the clarification of its defining components and the integration and interaction processes between them. It is noted that most scientific results reflect a data-centric presentation and analysis of this discipline, i.e. an emphasis on the word Data. An analysis of the frequency of key terms in definitions of Data Science shows what our colleagues focus on and which terms their definitions are based on. In this paper, we make and argue certain additions to Drew Conway’s Data Science Venn Diagram, which does not reflect all the resources of the components that define the applied side of Data Science and, moreover, reveals the interaction of these resources neither from the point of view of the data researcher nor in its global understanding. We also propose a unified structural representation of Data Science in the format of an updated Drew Conway’s Venn diagram, based on a property/attribute that establishes correspondences providing integration/interoperability between the elements of the sets of the diagram. A new definition of Data Science as an interdisciplinary science and methodology of presenting activities for the analysis and extraction of data, information, and knowledge is substantiated.

Keywords: Data Science, Drew Conway’s Data Science Venn Diagram, Data Science definition, Data Science structure, data, information, knowledge.
P.P. Maslianko, Y.P. Sielskyi. ISSN 1681–6048 System Research & Information Technologies, 2021, № 1

INTRODUCTION

Since the beginning of the 21st century, the phrase Data Science has attracted considerable attention from the world's academic and professional communities. Why “the phrase”? Although dozens of scholars have tried to interpret its meaning in their own way, and despite numerous discussions about its components, the expression has not acquired the status of a clearly defined scientific term.

This article aims to continue the discussion on the existing definitions and proper formalization of “Data Science” as an autonomous discipline and field of knowledge, and to clarify its defining components and the characteristics of their integration and interaction processes. Data Science is thus the object of analysis, which is performed through in-depth study and synthesis of existing authoritative scientific results, articles and journals, blogs of well-known authors, and trusted publishers.

We systematized the information from all studied sources in a table for further analysis by the following criteria (columns of the table):
0. Definition of Data Science.
1. Keywords of the definition.
2. Semantics of the definition: the list of tools on which it is based (methods, models, algorithms, processes, disciplines, etc.), as well as their interaction.
3. Features of the definition: its purpose (theoretical, practical, specialized, etc.) and scope.
4. Discussion arguments and uncertainties regarding the definition and understanding of Data Science given in the source.

In total, the 11 most common sources were analyzed and cited; their key points related to Data Science are briefly described below.

RELATED WORK

The vast majority of scientific works on Data Science begin either with the famous expression “Data Scientist: The Sexiest Job of the 21st Century” by Thomas Davenport and D.J. Patil [1], or with a reference to Drew Conway’s The Data Science Venn Diagram [2] (Fig. 1), to which we shall return. In some cases, both resources are referenced at once.

It is worth noting that, of all the works dedicated to Data Science, this diagram is perhaps the only attempt at an in-depth presentation of the structure and content of the Data Science model. Nevertheless, it leaves a residue of uncertainty around some areas of the figure. What is the danger zone? What meaning does the author put into traditional research? Why did he choose machine learning as the intersection of mathematics and statistics with hacking skills? And, finally, can the last term be perceived as a scientifically justified, reasonable, and meaningful concept with an unambiguous interpretation? Indeed, how can a job that requires hacking skills not be considered sexy? Engineers and scientists want to see in any definition a reasoned meaning with a solid scientific basis, especially if the notion being described contains the word “science” itself. Otherwise, such loud statements only provoke excitement and fruitless controversy over the newly introduced term, which is in fact what happened.

Fig. 1. The Data Science Venn Diagram [2] (recoverable labels: Substantive Expertise, Danger Zone, Traditional Research, Machine Learning, Data Science)

Because of the ambiguous emergence of Data Science, debates over the interpretation of this name began immediately in the academic and professional communities. In particular, the question arises about the similarity between Data Science and classical, well-known statistics.
For example, Cathy O’Neil and Rachel Schutt in their book Doing Data Science [3] (where, by the way, an entire 15 pages are devoted to the first part, “Introduction: What is Data Science?”) refer to numerous protests by statisticians against the uniqueness of Data Science; the statisticians call it a newfangled rebranding of their own discipline. The authors themselves insist on the differences, emphasizing the specific processes created by the pioneers of Data Science that make it possible to work with larger volumes of data: the processes of Data Science [3]. In general, this resource covers mostly the professional aspect of Data Science, explaining it from the point of view of data scientists as specialists in this field and of the skills that such positions require.

Vasant Dhar, whose arguments can be found in the article Data Science and Prediction in Communications of the ACM [4], also joined the defense in the case of Data Science vs. Statistics. The author focused on a whole list of differences [4]:
1. First of all, data, the main fuel of Data Science, is quickly becoming unstructured and diverse. Therefore, the analysis of “raw” data, as well as the combination of data of different types (feature engineering), demands additional interpretation and understanding based on the foundations of multiple other disciplines (linguistics, sociology, etc.) [4].
2. Nowadays, most data is produced by computers to be consumed by other computers [4]. Under these conditions, it is computers that make decisions, which pushes their operators, data scientists, to retrain: to take on the role of risk managers and to act as guarantors and supervisors of the quality of the developed system, instead of the more classic duty of an expert in the context of statistics.
3. Machine learning, applied to create reliable predictive models, is an essential component of Data Science, which is increasingly concerned with forecasting various values, events, and phenomena [4].
“Data Science, … , is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia.” This definition is given by Jake Vanderplas in the Python Data Science Handbook [5] (also with reference to Fig. 1), where he often uses the concept of “skills”, which, again, emphasizes a more professional application.

“Multifaceted discipline”, say Annalyn Ng and Kenneth Soo, the authors of the book Data Science for the Layman: No Math Added [6], focusing on machine learning as a key component and citing a standard algorithm for carrying out research in the field of Data Science [6]:
1. Data processing and preparation for analysis.
2. Selection of potentially effective machine learning algorithms.
3. Optimization of the (hyper-)parameters of the algorithms: training and validation.
4. Construction of integral models (a combination of certain algorithms or their separate usage), followed by their comparison and selection of the best one.

In addition to applied specifics, there are definitions at a high level of abstraction that are clearer and more intuitive. Well-known experts in the field of Data Science, Foster Provost and Tom Fawcett, formulate the key activity of data scientists as extracting useful information and knowledge from data [7]. Hence, Data Science is also compared to Data Mining: “At a high level, Data Science is a set of fundamental principles that guide the extraction of knowledge from data. Data mining is the extraction of knowledge from data, via technologies that incorporate these principles” [7].

In parallel, the authors focus on the analysis of the structure of Data Science in the context of effective solutions to real business problems [8]. Here, one of the principal processes is data-driven decision making and its progressive automation.
Often, and certainly not without reason, Data Science is closely linked to data analysis. For example, Matthew Waller and Stanley Fawcett describe Data Science quite abstractly: “Generally, Data Science is the application of quantitative and qualitative methods to solve relevant problems and predict outcomes.” [9]. They then derive their own model, in which a data scientist’s performance is shaped by two interdependent components: domain knowledge and analytical skills.

A Ukrainian specialist, Bohdan Pavlyshenko, agrees with Waller and Fawcett, focusing on data analysis, the need for a proper understanding of the nature of data, and the specifics of a particular domain in problem-solving [10].

A certain consensus thus emerges on the applied essence of Data Science as a business tool and a profession. Most of the above resources reflect a data-centric representation and analysis of the discipline, i.e., an emphasis on the word Data. And what about Science? What about the academic side of the coin?

Jeff Leek answers these questions, listing a number of arguments in defense of science and the complexity of solving scientific problems [11], citing, in particular, a quote from John Tukey, a pioneer in data analysis: “The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” [12]. The author also points to the main reason for the outbreak of excitement around Data Science, the focus on data, proclaiming: “The long term impact of data science will be measured by the scientific questions we can answer with the data.” [11].

Based on the aforementioned arguments, Fig. 2 shows the results of a frequency analysis of the most commonly used key terms present in the various definitions of Data Science.
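A frequency analysis of this kind can be sketched in a few lines of Python. The sketch below is only an illustration: it uses the three definitions quoted in this section, and the list of key terms is a hypothetical stand-in, since the paper's own table of terms is not reproduced here. It counts in how many definitions each term occurs.

```python
from collections import Counter

# Definitions quoted in the text; the dictionary keys are labels for this sketch only.
definitions = {
    "Vanderplas [5]": (
        "Data Science, ..., is perhaps the best label we have for the "
        "cross-disciplinary set of skills that are becoming increasingly "
        "important in many applications across industry and academia."
    ),
    "Provost and Fawcett [7]": (
        "At a high level, Data Science is a set of fundamental principles "
        "that guide the extraction of knowledge from data."
    ),
    "Waller and Fawcett [9]": (
        "Generally, Data Science is the application of quantitative and "
        "qualitative methods to solve relevant problems and predict outcomes."
    ),
}

# Hypothetical key-term list (an assumption, not the paper's actual list).
KEY_TERMS = ["skills", "knowledge", "methods", "principles", "statistics"]


def term_frequencies(texts, terms):
    """Count in how many texts each term occurs (case-insensitive substring match)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1
    return counts


freq = term_frequencies(definitions.values(), KEY_TERMS)

# Text-mode histogram, most frequent terms first.
for term, n in freq.most_common():
    print(f"{term:<12}{'#' * n} ({n})")
```

Substring matching is the simplest possible choice here; a more careful analysis would normalize synonyms (for example "domain expertise" and "domain knowledge") before counting.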
This analysis is an example of one of the operations of semantic decomposition, carried out on the basis of the constructed table, whose criteria are described in the introduction. These indicators help to better understand what our colleagues focus on and which terms the definitions of Data Science are based on. For example, the most commonly used terms are “domain expertise/knowledge” and “statistics”, which do not fully reflect the components of the object of our research. By contrast, the word “science” is mentioned only once, which confirms the mostly data-centric nature of existing definitions. Such a simple but quite clear comparative analysis of term frequency approximately reflects the overall vision of the examined authors regarding the structure and content of Data Science.

Fig. 2. Histogram of frequency analysis of terms used in the considered definitions of Data Science (axis label: Term)

Thus, even these brief studies of the publications of authoritative experts yield an ambiguous, incomplete picture of the defining elements of Data Science. Moreover, they highlight the obvious problem of a lack of compromise and of a clear link between Science and Data.

SYNTHESIS

Based on the preliminary conclusions and on the above scientific results, we make some clarifications of the interaction of entities and of the formalization of Data Science. Studies of the representation of Data Science and an analysis of the selection and justification of its attributes provide grounds for adjusting the definition and structure of Data Science. These adjustments imply some additions to the repeatedly mentioned Drew Conway’s Data Science Venn diagram (Fig. 1) [2], which does not reflect all the resources of the components that form the applied side of Data Science and, moreover, reveals the interaction of these resources neither from the point of view of the data scientist nor in its global sense.

In this article, we propose an updated, refined version of Drew Conway’s Data Science Venn diagram and try to explain and justify not only the essence of its components but also the principles of their integration and interoperability (Fig. 3).

Fig. 3. Structural representation of Data Science in the format of an updated Data Science Venn diagram

FORMAL REPRESENTATION OF ELEMENTS OF DATA SCIENCE RESOURCE SETS

The set of formalized data resources is $D$ (data); $d$ is an element of the set $D$, $d \in D$.

The set of theoretical and applied data processing tools is $I$ (instrument); $i$ is an element of the set $I$, $i \in I$.

The set of business processes for data, information, and knowledge acquisition by means of theoretical and applied data processing tools is $P$ (process); $p$ is an element of the set $P$, $p \in P$.

Sets of integrated interdisciplinary resources 1–2, 1–3, and 2–3.

Let us explain each entity separately:
• data: raw materials, a key research resource that determines the features and models of particular domains;
• theoretical knowledge and applied tools: instruments through which the process of extracting information and knowledge from data takes place. These include both exact sciences (mathematics, statistics, computer science, machine learning, data analysis, etc.) and software applications (programming languages, their libraries, frameworks, development environments, visualization tools, etc.);
• business processes: here we are talking about the organization of research at the meta-level, setting goals and objectives, determining the main stages of work, their sequence, the nature and features of the evaluation of results, etc.
Essentially, all of the above actions vary from one problem to another. Why “business”? To emphasize the need to optimize this component in order to maximize the benefits of the research carried out and the systems developed;
• integrated interdisciplinary resources 1–2, 1–3, and 2–3: the essence of the corresponding intersections is not fully compatible with the classical definition of the intersection operation in set theory, given the obvious fact that the elements of the different sets differ in nature. Therefore, we are talking about the existence of functional relationships between different types of resources. For example, zone 1–2 theoretically includes existing data, known to mankind, that can be processed using existing theoretical knowledge and applied tools, although such interaction is not subject to any existing business process.

DECOMPOSITION OF ELEMENTS OF DATA SCIENCE RESOURCE SETS

1. Data (set $D$). Data plays the role of a kind of fuel for Data Science instruments and processes. Obviously, there is a lot of data in the world, and more is created every second. So let us try to bring order to this ocean of natural chaos by defining the main elementary component of the entirety of data: a formalized data resource $d$. Thus, zone 1, shown in Fig. 3, consists of formalized data resources, i.e. information that is collected and stored in a certain form and that can be classified, for example, by the following criteria:
• by type of storage: digital storage media (hard drives, SSDs, USB drives, etc.) as opposed to more classic ones (rock and wall inscriptions, carvings, books, magazines, newspapers, etc.). We should not forget immaterial data either: unrecorded real-time speech, thoughts, the movement of objects;
• by structure: data can be divided into structured (mainly numbers and numerical arrays, standard types of programming languages) and unstructured (audio, images, video, text);
• by availability: confidential and public data.

In any case, in the context of area 1 we are talking about the set $D$ of formalized data resources $d$: $\forall d,\ d \in D:\ D = \{d\}$.

Some examples of data as formalized resources are listed below:
• numeric data arrays: the simplest (at least for a computer) formalized representation of information. Arrays can have different shapes and sizes and contain any number of elements. Note also that the numbers (scalars) themselves can be seen as formalized data resources;
• digital images: structurally, these are the same ordered numerical arrays (pixel values), but at the level of human perception of information (visualization) they play a special, more significant role, so they can also be considered formalized resources;
• audio files: another type of information, which humans perceive by ear. Physically, an audio file is a sequence of numbers (sampled values of the sound signal) that follow a strict order. Therefore, this is data in the form of sequences, or series, which are reproduced in time;
• video files: a more complex case, which includes not only a set of images but also audio. That is, two different types of sequences are present, and the mechanism of their synchronization is an integral element of correct video playback.

As a part of Data Science, all of the above and many other types of data are summarized in one more extensive formalized data resource, a so-called dataset.
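The classification criteria and examples above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the class and field names (`DataResource`, `Storage`, `Structure`, `Availability`, `payload`) and the toy values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    DIGITAL = "digital"        # hard drives, SSDs, USB drives
    CLASSIC = "classic"        # inscriptions, books, newspapers
    IMMATERIAL = "immaterial"  # unrecorded speech, thoughts, movement


class Structure(Enum):
    STRUCTURED = "structured"      # numbers, numerical arrays
    UNSTRUCTURED = "unstructured"  # audio, images, video, text


class Availability(Enum):
    PUBLIC = "public"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class DataResource:
    """A formalized data resource d, described by the three criteria above."""
    name: str
    storage: Storage
    structure: Structure
    availability: Availability
    payload: tuple = ()  # toy content, kept as tuples so the resource is hashable


# The set D of formalized data resources (toy examples).
D = {
    DataResource("scalar", Storage.DIGITAL, Structure.STRUCTURED,
                 Availability.PUBLIC, payload=(42,)),
    DataResource("grayscale image 2x2", Storage.DIGITAL, Structure.UNSTRUCTURED,
                 Availability.PUBLIC, payload=((0, 255), (255, 0))),
    DataResource("audio samples", Storage.DIGITAL, Structure.UNSTRUCTURED,
                 Availability.CONFIDENTIAL, payload=(0.0, 0.5, -0.5)),
}

# Classify the elements of D by one of the criteria.
unstructured = {d.name for d in D if d.structure is Structure.UNSTRUCTURED}
print(sorted(unstructured))
```

Making the resource immutable and hashable is a design choice of this sketch: it lets `D` be an ordinary Python set, which mirrors the set notation used in the text.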
In Data Science for Business, such terms as database table, worksheet (for example, a sheet of an .xls file), and dataset are treated as equivalent, and a more specific decomposition of the latter is presented [7]:
• datasets consist of so-called examples (samples) or instances (table rows) [7];
• each instance, in turn, comprises a fixed (in the classical representation) number of features (columns of the table), whose values uniquely identify instances [7].

2. Theoretical knowledge and applied tools (set $I$). Any instruments, models, algorithms, and human skills, formalized or materially implemented, that are aimed at carrying out certain operations on data for their better understanding. For the purpose of formalization, we define an elementary component of this set as an instrument $i$: $\forall i,\ i \in I:\ I = \{i\}$.

Such elements can have different levels of abstraction and different scales: for instance, individual clusters of knowledge can, in turn, be combined into whole areas of knowledge. Here are the theoretical aspects most interesting for Data Science:
• Statistics: let us start with it to pay tribute to fellow statisticians. Undoubtedly, this is a vast science that includes many $i$’s, but only a certain part of them is used in Data Science, in particular, elements of descriptive statistics at the stages of exploratory data analysis (EDA) [13]: mode, median, mean, standard deviation. EDA is also a part of Data Analysis.
• Data analysis: in order not to invent anything superfluous, we go back to the definition from the most reliable source, John Tukey [14], who provides a comprehensive list of components of the discipline: data analysis procedures, methods of interpreting their results, simplification and improvement of data analysis at the earliest stages of data collection, as well as all the techniques of statistics that are applicable to data analysis [14]. That is, Data Analysis is closely related to the Statistics domain.
• Artificial Intelligence (AI): since the inception of this term, the debate around its essence has never subsided. AI should be considered a separate branch of computer science aimed at programming machines for human-like behavior, thinking, and independent decision-making [15]. In Data Science applications, decision-making is often based on predicting certain results.
• Machine learning: this term originates from Arthur Samuel's article “Some studies in machine learning using the game of checkers” [16], where the author uses the phrase literally: machines that learn, i.e. programming computers to behave in a way that would be described as involving learning if it were exhibited by humans or animals [16]. More specifically, it is about the automated optimization of computer performance based on experience.
• Deep learning [17]: applying machine learning techniques to unstructured data (texts, music, images, etc.) is becoming more and more popular. An informative (for computers) representation of such raw data requires its automatic interpretation through step-by-step processing of numerical input arrays over many stages of data projection onto spaces of higher levels of abstraction. Such a procedure is a key aspect of Deep Learning [17], i.e. learning the layers (stages) of neural networks from the data via a generalized learning process instead of explicitly developing the necessary projections by hand.
• Big Data Analytics: this term should be taken literally. According to Philip Russom [18], it is the field of knowledge about the application of advanced analytical methods to big amounts of data. Big data is characterized not only by volume but also by data variety and velocity (the 3 Vs).
• Data Mining: recall the definition of Provost and Fawcett that Data Mining is the extraction of knowledge from data using technologies that embody the principles of Data Science [7]. This example allows us to trace the direct connection of Data Mining with Data Science as an integral part of it.
• Data visualization: techniques for presenting data of different nature and dimensions in the most understandable and human-readable form, the graphic one [19]. In this set of tools, in addition to countless frameworks and software that implement the full range of possible charts and graphs (in Cartesian or polar coordinates, scatter and line plots, histograms, bar and pie charts, 3D images, etc.), more complex machine learning methods of dimensionality reduction can be highlighted as well. A good example is Principal Component Analysis (PCA).

Let us also note the applied tools: instruments that make it possible to implement the algorithms and methods of the above theoretical knowledge in the form of (open-source or proprietary) software applications, platforms, frameworks, libraries, systems, and so on.
These include:
• programming languages widely used in the domain of Data Science. Here we can consider both languages such as R [20] and Python [5], which are used directly for the development of Data Science systems and for the implementation of the higher-level interfaces and components of such systems, and C-family programming languages, which are used for the development of lower-level APIs (Application Programming Interfaces) in order to optimize and parallelize basic computations. An example of such a hierarchy is the TensorFlow [21] machine learning system from Google Brain;
• whole systems of computer mathematics and algebra (MATLAB [22], MathCad [23]) and statistics (STATISTICA [24]); machine and deep learning systems and big data systems and environments, which are distinguished by interfaces for different programming languages (TensorFlow [21], Torch [25], Spark [26]) or even by embedded graphical user interfaces (e.g. Orange and KNIME [27]);
• separate add-ons of the aforementioned systems at the highest level of abstraction (Keras [28] for TensorFlow); specialized programming language libraries, modules, and packages that provide ready-made software solutions for machine learning (Scikit-Learn [29]), data processing (e.g. NLTK [30] for text data), data visualization and interactive calculations (matplotlib [31, 32], pygal, Plotly, Pyvot [31], pandas, seaborn [32], etc.), and many others.

This list is not exhaustive and can be extended with many other theoretical and applied instruments.

3. Business processes (set $P$). The set of business processes for data, information, and knowledge acquisition by means of theoretical and applied data processing tools. In order to formally represent the interaction of the two previous sets $D$ and $I$, a third set $P$ is introduced.
To optimally extract knowledge and new information from the formalized data resource $d$ using the instrument $i$, we subordinate the entirety of the work that needs to be done to a certain process $p$: $\forall p,\ p \in P:\ P = \{p\}$.

The relationship between $d$, $i$, and $p$ is demonstrated in more detail further on, in the context of the sets of integrated interdisciplinary resources. For now, a basic example of the Data Science process, suggested by Cathy O’Neil and Rachel Schutt [3], is shown in Fig. 4. In this representation, 8 main stages of the process are identified:
1. Collecting raw data from any real-world resources.
2. Data processing.
3. Data cleaning.
4. Exploratory Data Analysis (EDA).
5. Construction of statistical models and training of Machine Learning algorithms using the collected, processed, and cleaned data.
6. Stage of internal communication: presentation of results to members of the development team of a specific Data Science system, as well as to its stakeholders.
7. Creation/production of new data used in the real world.
8. Decision-making based on the obtained results.

It is worth noting that there are direct links between certain stages: for example, EDA may reveal a lack of data that needs to be collected, or cleaned data can be visualized to explain its nature to colleagues.

We compare the above-described Data Science process with the Cross Industry Standard Process for Data Mining (CRISP-DM), analyzed by Foster Provost and Tom Fawcett [7] and presented in Fig. 5.
Let us pay attention to its circular iterative-incremental nature, as well as the presence of two rather abstract, generalizing stages — Business Understanding and Data Understanding [7]:  the first embodies the need for a clear problem statement in accordance with the given task, the search for creative methods to achieve the goal, its opti- mal formalization, which would allow the application of already existing methods as effectively as possible;  while the second stage focuses on the strategic analysis of the main raw material Data Mining — data. Here it is essential to understand the basic struc- ture, pros, and cons of the involved data. The proper assessment of the potential sources of additional information, the necessary investment (both time and finan- cial) in their research and use, is also important. Whereafter is an integral Data Preparation procedure, which, by analogy with the process in Fig. 4, combines data processing, cleaning [7], and EDA, which in themselves can be a multi-iterative subprocess. The next stage of CRISP-DM — Modeling — also has a direct correspon- dence in the presentation of the Data Science process, where, again, more specific names are given. Fig. 4. Data Science Process [3] Data science — definition and structural representation Системні дослідження та інформаційні технології, 2021, № 1 71 Any development should be subject to quality control through regular verification and validation of the built models and the entire system. The authors also implicitly emphasize the need for communication, presentation of results, as well as the concepts of their simple and clear explanation to key stakeholders (investors) [7], who are responsible for making major business decisions. After all, in the context of business, the decisive factor is the successful Deployment of each system approved by management [7]. In essence, the solution to business problems is directly related to obtaining a certain material benefit. 4. 
Integrated interdisciplinary resources 1–2, 1–3, and 2–3.

Here and later in this paper, integrated interdisciplinary resources 1–2, 1–3, and 2–3 are subsets formed on the basis of integration/interoperability properties between elements of the sets $D$, $I$, and $P$. Integration/interoperability of elements of the sets $D$, $I$, and $P$ is the ability to process a certain data resource by means of a certain subset of instruments, following certain processes.

Such a decomposition and generalized systematic representation of the elements of the resource sets of Data Science shows and justifies both the complexity of, and the need for, comprehensive research and ongoing discussion of the existing definitions and formalizations of Data Science as an autonomous discipline and field of knowledge, and of the clarification of its defining components and characteristics and of their integration and interaction processes.

Fig. 5. CRISP-DM [7]

DATA SCIENCE AS A SET OF INTERDISCIPLINARY RESOURCES OF SETS D, I, AND P

To generalize the formal representation of interdisciplinary sets, we define, for example, an arbitrary entity $A$ as a finite resource set $A$, where $a \in A$ are the elements of the set $A$, and an arbitrary entity $B$ as a finite resource set $B$, where $b \in B$ are the elements of the set $B$.

Let us establish the rules of correspondence [33] $C_{ab}$ and $C_{ba}$ between the elements of the resource set $A$ and the elements of the resource set $B$. That is, if $a \in A$, $b \in B$, $(a, b) \in C_{ab}$, then we say that the element $b$ of the set $B$ corresponds to the element $a$ of the set $A$ under the correspondence $C_{ab}$. And if $b \in B$, $a \in A$, $(b, a) \in C_{ba}$, then we say that the element $a$ of the set $A$ corresponds to the element $b$ of the set $B$ under the correspondence $C_{ba}$.
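As a minimal sketch of these rules, a correspondence can be held as a plain set of ordered pairs. The element names and the two example relations below are invented for illustration and do not come from the paper; the helper `corresponds` is likewise hypothetical.

```python
# Hypothetical resource sets; the element names are illustrative only.
A = {"csv_table", "image_archive"}
B = {"regression", "cnn"}

# C_ab: which b corresponds to which a, stored as ordered pairs (a, b).
C_ab = {("csv_table", "regression"), ("image_archive", "cnn")}

# C_ba: which a corresponds to which b, stored as ordered pairs (b, a).
C_ba = {("cnn", "image_archive")}


def corresponds(pair, correspondence):
    """Return True when the ordered pair belongs to the correspondence."""
    return pair in correspondence


# Both correspondences may only relate elements of the declared sets.
assert all(a in A and b in B for a, b in C_ab)
assert all(b in B and a in A for b, a in C_ba)

print(corresponds(("csv_table", "regression"), C_ab))
print(corresponds(("regression", "csv_table"), C_ba))
```

Keeping $C_{ab}$ and $C_{ba}$ as two separate sets mirrors the text, where each direction is stated on its own and neither is assumed to follow from the other.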
Hereinafter, the term “correspondence $C_{xy}$” must be understood as the fundamental concept of set theory that establishes, explains, and formalizes the relationships between the elements of the sets $X = \{x\}$ and $Y = \{y\}$ [33]. Next, we consider the presence of a “correspondence” as a common property/feature of pairs of elements of the sets $A$ and $B$.

Let us now construct a finite set of interdisciplinary elements $M$ from all pairs of elements of the resource set $A$ and elements of the resource set $B$ that have common properties/features and the correspondences $C_{ab}$ and $C_{ba}$ established on the basis of these features.

Definition 1. The set $M$, $m \in M$,

$$m \in \{(a, b) \mid a \in A,\ b \in B,\ (a, b) \in C_{ab}\} \cup \{(b, a) \mid b \in B,\ a \in A,\ (b, a) \in C_{ba}\},$$

of interdisciplinary pairs of elements of arbitrary finite sets $A$ and $B$ is a set of pairs formed by elements of the set $A$ and elements of the set $B$ that have a common property/feature establishing the correspondences $C_{ab}$ and $C_{ba}$ between these elements.

We substantiate Definition 1 as a simple way to form the set $M$ by combining the pairs $(a, b)$ and $(b, a)$, $a \in A$, $b \in B$, selected by a common property/feature that establishes the correspondences $C_{ab}$ and $C_{ba}$ between these elements. To do so, we define and apply such a common property/feature to the elements of the sets $A$ and $B$.

Since many types of properties/features can determine a correspondence $C_{xy}$, we introduce restrictions here and specify the correspondence $C_{xy}$ between elements of the sets $A$ and $B$. In particular, we define the property/feature we need as one that provides integration/interoperability between the elements $a \in A$, $b \in B$. According to this property/feature, a certain subset of pairs of elements of the sets $A$ and $B$ can be distinguished for which the correspondences $C_{ab}$ and $C_{ba}$ are established.
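The selection of pairs by an integration/interoperability property can be sketched as follows. This is only an illustration under stated assumptions: the sets, the `ACCEPTS` table, and the predicate `interoperable` are hypothetical stand-ins for whatever property a practitioner actually chooses, and the property is assumed to hold symmetrically in both directions.

```python
from itertools import product

# Hypothetical resource sets and a hypothetical "supported inputs" table.
A = {"csv_table", "image_archive", "audio_log"}
B = {"regression", "cnn"}
ACCEPTS = {"regression": {"csv_table"}, "cnn": {"image_archive"}}


def interoperable(a, b):
    """Assumed common property: instrument b can process data resource a."""
    return a in ACCEPTS.get(b, set())


# Pairs (a, b) selected by the property: the correspondence C_ab.
C_ab = {(a, b) for a, b in product(A, B) if interoperable(a, b)}
# Pairs (b, a) selected by the same property: the correspondence C_ba.
C_ba = {(b, a) for a, b in product(A, B) if interoperable(a, b)}

# M is the union of both sets of pairs, in the spirit of Definition 1.
M = C_ab | C_ba

print(sorted(M))
```

In this sketch the element `audio_log` appears in no pair of `M`, because no instrument in `B` is declared able to process it; it stays a non-integrated resource.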
Let the set of pairs $\{(a, b) \mid a \in A,\ b \in B,\ (a, b) \in C_{ab}\}$ and the set of pairs $\{(b, a) \mid b \in B,\ a \in A,\ (b, a) \in C_{ba}\}$ be defined as those that have the property/feature which establishes the correspondences $C_{ab}$ and $C_{ba}$ and provides integration/interoperability between the elements $a$ and $b$. Then the set $M$ is defined as the union of these two sets of pairs, selected as having a common property/feature that determines the correspondences $C_{ab}$ and $C_{ba}$. More formally:

$$M = \{(a, b) \mid a \in A,\ b \in B,\ (a, b) \in C_{ab}\} \cup \{(b, a) \mid b \in B,\ a \in A,\ (b, a) \in C_{ba}\}.$$

Similarly, it is possible to form a set of interdisciplinary pairs of elements of any number of sequentially combined arbitrary finite sets $A, B, D, \ldots$ on the basis of a common property/feature defined for them, which provides integration/interoperability between pairs of elements $a \in A$, $b \in B$, $d \in D, \ldots$ and establishes the correspondences $C_{xy}$ between the elements of adjacent sets.

Definition 2. The set of interdisciplinary pairs of elements of an arbitrary number of finite sets $A, B, D, \ldots, X, L$ is the set of successive pairs $(a, b), (b, d), \ldots, (x, l)$, $a \in A$, $b \in B$, $d \in D, \ldots, x \in X$, $l \in L$, such that they have a common property/feature that provides integration/interoperability between the elements $a, b, d, \ldots, x, l$ and establishes the correspondences $C_{xy}$ between the elements of adjacent sets:

$$M = [\{(a, b) \mid a \in A,\ b \in B,\ (a, b) \in C_{ab}\} \cup \{(b, a) \mid b \in B,\ a \in A,\ (b, a) \in C_{ba}\}]$$
$$\cup\ [\{(b, d) \mid b \in B,\ d \in D,\ (b, d) \in C_{bd}\} \cup \{(d, b) \mid d \in D,\ b \in B,\ (d, b) \in C_{db}\}] \cup \ldots$$
$$\cup\ [\{(x, l) \mid x \in X,\ l \in L,\ (x, l) \in C_{xl}\} \cup \{(l, x) \mid l \in L,\ x \in X,\ (l, x) \in C_{lx}\}].$$

The practical application aims to solve the problem of forming common pairs, triples, quadruples, etc.
of elements of any number of arbitrary finite sets $A, B, D, \ldots, X, L$ on the basis of their defined common property/feature, which provides integration/interoperability between the elements $a \in A$, $b \in B$, $d \in D, \ldots, x \in X$, $l \in L$ and establishes the correspondences $C_{abd \ldots xla}$ and $C_{alx \ldots dba}$ between the elements of these sets.

Definition 3. The set $M$, $m \in M$,

$$m \in \{(a, b, d, \ldots, x, l, a) \mid a \in A,\ b \in B,\ d \in D, \ldots, x \in X,\ l \in L,\ (a, b, d, \ldots, x, l, a) \in C_{abd \ldots xla}\}$$
$$\cup\ \{(a, l, x, \ldots, d, b, a) \mid a \in A,\ l \in L,\ x \in X, \ldots, d \in D,\ b \in B,\ (a, l, x, \ldots, d, b, a) \in C_{alx \ldots dba}\},$$

of interdisciplinary pairs, triples, quadruples, etc. of elements is a set of pairs, triples, quadruples, etc. that can be formed by elements of arbitrary finite sets $A, B, D, \ldots, X, L$ having a common property/feature that determines the correspondences $C_{abd \ldots xla}$ and $C_{alx \ldots dba}$ between these elements. Hence:

$$M = \{(a, b, d, \ldots, x, l, a) \mid a \in A,\ b \in B,\ d \in D, \ldots, x \in X,\ l \in L,\ (a, b, d, \ldots, x, l, a) \in C_{abd \ldots xla}\}$$
$$\cup\ \{(a, l, x, \ldots, d, b, a) \mid a \in A,\ l \in L,\ x \in X, \ldots, d \in D,\ b \in B,\ (a, l, x, \ldots, d, b, a) \in C_{alx \ldots dba}\}. \tag{1}$$

In equation (1), the subset $\{(a, b, d, \ldots, x, l, a) \mid a \in A,\ b \in B,\ d \in D, \ldots, x \in X,\ l \in L,\ (a, b, d, \ldots, x, l, a) \in C_{abd \ldots xla}\} \subseteq A \times B \times D \times \ldots \times X \times L \times A$ is given by the correspondence $C_{abd \ldots xla}$, and the subset $\{(a, l, x, \ldots, d, b, a) \mid a \in A,\ l \in L,\ x \in X, \ldots, d \in D,\ b \in B,\ (a, l, x, \ldots, d, b, a) \in C_{alx \ldots dba}\} \subseteq A \times L \times X \times \ldots \times D \times B \times A$ is given by the correspondence $C_{alx \ldots dba}$. Thus, formula (1) can be rewritten as

$$M = \{C_{abd \ldots xla},\ C_{alx \ldots dba}\} = \{\mathrm{tuple}(A, B, D, \ldots, X, L, A, G_{abd \ldots xla}),\ \mathrm{tuple}(A, L, X, \ldots, D, B, A, G_{alx \ldots dba})\}, \tag{2}$$

where $G_{abd \ldots xla}$ and $G_{alx \ldots dba}$ are the graphs/diagrams/matrices of the correspondences $C_{abd \ldots xla}$ and $C_{alx \ldots dba}$, respectively.
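The construction behind formulas (1) and (2) can be illustrated for three small sets. The sketch below rests on assumptions that are not stated in the paper: the repeated letter $a$ at both ends of a tuple is read as the same element, the common property is modelled as a symmetric `LINKS` table, and all names (`linked`, `closed_chains`, the element labels) are hypothetical.

```python
from itertools import product

# Hypothetical finite resource sets; names are illustrative only.
A = {"a1", "a2"}
B = {"b1", "b2"}
D = {"d1"}

# Assumed symmetric "interoperable" links between elements of different sets.
LINKS = {
    frozenset({"a1", "b1"}),
    frozenset({"b1", "d1"}),
    frozenset({"d1", "a1"}),
    frozenset({"a2", "b2"}),
}


def linked(x, y):
    """Assumed common property: x and y can be integrated."""
    return frozenset({x, y}) in LINKS


def closed_chains(sets):
    """Collect tuples over sets[0] x ... x sets[-1] x sets[0] that return to
    their starting element and whose neighbouring elements are all linked."""
    graph = set()
    for combo in product(*sets):
        chain = combo + (combo[0],)
        if all(linked(x, y) for x, y in zip(chain, chain[1:])):
            graph.add(chain)
    return graph


G_forward = closed_chains([A, B, D])   # tuples (a, b, d, a)
G_backward = closed_chains([A, D, B])  # tuples (a, d, b, a)

# In the spirit of formula (2): a correspondence as a tuple of sets plus its graph.
C_forward = (A, B, D, A, G_forward)
C_backward = (A, D, B, A, G_backward)

# In the spirit of formula (1): M as the union of both families of tuples.
M = G_forward | G_backward
print(sorted(M))
```

Only the elements `a1`, `b1`, and `d1` form a closed chain here; `a2` and `b2` are linked to each other but to no element of `D`, so they contribute a pair but no triple.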
Then, for the three sets that are the components of Data Science, namely formalized data resources $D$; theoretical and applied data processing instruments $I$; and business processes $P$ of extracting data, information, and knowledge by means of theoretical and applied data processing instruments, we formalize the definition of Data Science on the basis of the updated Venn diagram (Fig. 3).

Definition 4. The definition “Data Science: interdisciplinary science and methodology of representing activities for analysis and extraction of data, information, and knowledge” can be formalized as a set of triples of elements of interdisciplinary resources from three resource sets, Data $D$, Instruments $I$, and Processes $P$, that have a common property/feature which provides integration/interoperability between the elements $d \in D$, $i \in I$, $p \in P$ and establishes the correspondences $C_{dipd}$ and $C_{dpid}$ between the elements of these sets. That is:

$$DS = \{(d, i, p, d) \mid d \in D,\ i \in I,\ p \in P,\ (d, i, p, d) \in C_{dipd}\} \cup \{(d, p, i, d) \mid d \in D,\ p \in P,\ i \in I,\ (d, p, i, d) \in C_{dpid}\}.$$

Expression (2) in the context of Data Science takes the form

$$M = \{C_{dipd},\ C_{dpid}\} = \{\mathrm{tuple}(D, I, P, D, G_{dipd}),\ \mathrm{tuple}(D, P, I, D, G_{dpid})\},$$

where $G_{dipd}$ and $G_{dpid}$ are the graphs/diagrams/matrices of the correspondences $C_{dipd}$ and $C_{dpid}$, respectively.

Consequently, we formalize the structural representation of Data Science in the updated Venn diagram by the presence of a property/feature that provides integration/interoperability between the elements $d \in D$, $i \in I$, $p \in P$ and establishes the correspondences $C_{dipd}$ and $C_{dpid}$ between the elements of the resource sets $D$, $I$, and $P$.

DYNAMICS OF DEVELOPMENT OF STRUCTURAL REPRESENTATION OF DATA SCIENCE IN THE FORMAT OF THE UPDATED VENN DIAGRAM

Figure 6 depicts a part of the updated Data Science Venn diagram with sets of integrated interdisciplinary resources in the form of intersecting triangles.
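Definition 4 and the pairwise integrated resources 1–2, 1–3, and 2–3 of Fig. 6 can be sketched together in a few lines. Everything concrete here is an assumption made for illustration: the element labels, the three pairwise tables, and the choice to obtain the $dpid$ tuples by reversing the $dipd$ ones (which presumes a symmetric integration property).

```python
from itertools import product

# Hypothetical elements of the three resource sets D, I and P.
D = {"sales_csv", "sensor_stream"}
I = {"dataframe_library", "stream_engine"}
P = {"crisp_dm_cycle"}

# Assumed pairwise integration tables (integrated resources 1-2, 2-3 and 1-3).
DATA_TOOL = {("sales_csv", "dataframe_library"), ("sensor_stream", "stream_engine")}
TOOL_PROCESS = {("dataframe_library", "crisp_dm_cycle")}
DATA_PROCESS = {("sales_csv", "crisp_dm_cycle"), ("sensor_stream", "crisp_dm_cycle")}

# Graph of C_dipd: chains d -> i -> p -> d in which every pair is integrated.
G_dipd = {
    (d, i, p, d)
    for d, i, p in product(D, I, P)
    if (d, i) in DATA_TOOL and (i, p) in TOOL_PROCESS and (d, p) in DATA_PROCESS
}
# Graph of C_dpid: the same chains traversed in the opposite direction.
G_dpid = {(d, p, i, d) for d, i, p, _ in G_dipd}

# DS is the union of both families of tuples, in the spirit of Definition 4.
DS = G_dipd | G_dpid
print(sorted(DS))
```

In this toy setting `sensor_stream` has an integrated instrument and an integrated process, but the instrument and the process are not integrated with each other, so the resource stays in the pairwise areas and does not reach the central Data Science area.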
We will show how the areas of integrated interdisciplinary resources can be narrowed in favor of Data Science using two extreme cases:

1. Figure 6: a structural representation of Data Science as a partial intersection of integrated interdisciplinary resources (excluding the areas of non-integrated resources 1, 2, and 3). For simplicity of visualization, triangles 1–2, 1–3, and 2–3 are drawn equal, but in practice the cardinalities of the corresponding sets may of course differ. In general, it is incorrect to compare sets of pairs of elements whose correspondences are of different natures.

The central area, Data Science, can be expanded in one of three possible directions by moving one of the three sides of the central triangle outward (see the arrows in Fig. 6). The following transformations may take place:
• a new third element emerges that covers an existing pair of interdisciplinary resources belonging to the set adjacent to Data Science (for example, an innovative business process is discovered that makes it possible to organize, at the meta-level, the processing of existing data with existing theoretical knowledge and applied tools): the Data Science domain expands with the advent of new resources;
• a new pair of integrated resources appears that can be covered by an existing third element (for example, a method is invented for processing a new type of data that can be subordinated to existing business processes): the Data Science domain expands with the emergence of new pairs of resources and of links between them.

2. Theoretically, a complete expansion of the Data Science domain, with full correspondence and overlap of the integrated interdisciplinary resource areas, is also possible (Fig. 7).

Fig. 6. Data Science at the intersection of integrated interdisciplinary resources (figure label: 1–2, Data and theoretical knowledge + applied tools)

Based on the cited scientific results of authoritative authors, as well as on the detailed decomposition and justification of the structural representation of Data Science, we propose the following definition of Data Science:

Data Science: interdisciplinary science and methodology of representing activities for analysis and extraction of data, information, and knowledge.

From our point of view, this definition more closely unites both Science and Data in the methodology of scientific and practical activities for the analysis and extraction of data, information, and knowledge.

CONCLUSION

1. During this study, we examined the main scientific results on Data Science and the numerous debates about its right to exist as a separate field. We established the ambiguity of the existing definitions of Data Science, in particular the incompleteness of individual elements of Drew Conway's Data Science Venn diagram [2] and the vague meaning of its components, so that the diagram does not fully reflect the set of skills required of data scientists and engineers of Data Science systems.
2. We propose a unified structural representation of Data Science in the format of an updated Venn diagram, based on a property/feature that establishes correspondences providing integration/interoperability between the elements of the sets of the Venn diagram.
3. A unified diagram of the Data Science domain at the intersection of triangles of integrated interdisciplinary resources is presented, and the potential for expansion of this domain is demonstrated.
4. The new definition of Data Science as an interdisciplinary science and methodology of representing activities for analysis and extraction of data, information, and knowledge is substantiated.

REFERENCES

1. Thomas Davenport and D.J. Patil, “Data Scientist: The Sexiest Job of the 21st Century”, Harvard Business Review, October 2012.
2.
Drew Conway, “The Data Science Venn Diagram”, Personal blog, September 30, 2010.

Fig. 7. Idealistic example of the complete expansion of Data Science

3. Cathy O’Neil and Rachel Schutt, Doing data science: Straight talk from the frontline. O’Reilly Media, Inc., 2013.
4. Vasant Dhar, “Data science and prediction”, Communications of the ACM, 56.12, pp. 64–73, 2013.
5. Jake Vanderplas, Python data science handbook: Essential tools for working with data. O’Reilly Media, Inc., 2016.
6. Annalyn Ng and Kenneth Soo, “Data Science for the Layman: No Math Added”, Numsense!, 2017.
7. Foster Provost and Tom Fawcett, Data Science for Business: What you need to know about data mining and data-analytic thinking. O’Reilly Media, Inc., 2013.
8. Foster Provost and Tom Fawcett, “Data science and its relationship to big data and data-driven decision making”, Big data, 1.1, pp. 51–59, 2013.
9. Matthew A. Waller and Stanley E. Fawcett, “Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management”, Journal of Business Logistics, 34.2, pp. 77–84, 2013.
10. Bohdan Pavlyshenko, “Subjective view on Data Science in Ukraine”, dou.ua article, January 9, 2017.
11. Jeff Leek, “The key word in “Data Science” is not Data, it is Science”, Simply Statistics, December 12, 2013.
12. J.W. Tukey, “Sunset salvo”, The American Statistician, 40(1), pp. 72–76, 1986.
13. J.W. Tukey, Exploratory data analysis, 1977.
14. J.W. Tukey, “The future of data analysis”, The annals of mathematical statistics, 33(1), pp. 1–67, 1962.
15. N.J. Nilsson, The quest for artificial intelligence. Cambridge University Press, 2009.
16. A.L. Samuel, “Some studies in machine learning using the game of checkers”, IBM Journal of research and development, 3(3), pp. 210–229, 1959.
17. Y. LeCun, Y. Bengio, and G.
Hinton, “Deep learning”, Nature, 521(7553), pp. 436–444, 2015.
18. P. Russom, “Big data analytics”, TDWI best practices report, fourth quarter, 19(4), pp. 1–34, 2011.
19. C.H. Chen, W.K. Härdle, and A. Unwin (Eds.), Handbook of data visualization. Springer Science & Business Media, 2007.
20. R. Ihaka and R. Gentleman, “R: a language for data analysis and graphics”, Journal of computational and graphical statistics, 5(3), pp. 299–314, 1996.
21. M. Abadi et al., “Tensorflow: A system for large-scale machine learning”, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.
22. D.J. Higham and N.J. Higham, MATLAB guide. Society for Industrial and Applied Mathematics, 2016.
23. B. Maxfield, Essential PTC® Mathcad Prime® 3.0: A guide for new and current users. Academic Press, 2013.
24. J.P.M. De Sá, Applied statistics using SPSS, Statistica, MatLab and R. Springer Science & Business Media, 2007.
25. R. Collobert, S. Bengio, and J. Mariéthoz, Torch: a modular machine learning software library (No. REP_WORK). Idiap, 2002.
26. X. Meng et al., “Mllib: Machine learning in apache spark”, The Journal of Machine Learning Research, 17(1), pp. 1235–1241, 2016.
27. H. Wimmer and L.M. Powell, “A comparison of open source tools for data science”, Journal of Information Systems Applied Research, 9(2), p. 4, 2016.
28. A. Gulli and S. Pal, Deep learning with Keras. Packt Publishing Ltd., 2017.
29. F. Pedregosa et al., “Scikit-learn: Machine learning in Python”, Journal of machine Learning research, 12, pp. 2825–2830, 2011.
30. E. Loper and S. Bird, “Nltk: The natural language toolkit”, arXiv preprint cs/0205028, 2002.
31. C. Adams, Learning Python data visualization. Packt Publishing Ltd., 2014.
32. C. Rossant, Learning IPython for interactive computing and data visualization. Packt Publishing Ltd., 2013.
33. A.N. Kolmogorov and S.V.
Fomin, Introductory real analysis. Courier Corporation, 1975.

Received 01.03.2021

INFORMATION ON THE ARTICLE

Pavlo P. Maslianko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: mppdom@i.ua

Yevhenii P. Sielskyi, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: youdjin.sel15@gmail.com

DATA SCIENCE: DEFINITION AND STRUCTURAL REPRESENTATION / P.P. Maslianko, Y.P. Sielskyi

Abstract. This work continues the discussion of the existing definitions and the formalization of the definition of “data science” (Data Science) as an autonomous discipline and field of knowledge, the clarification of its defining components, and the integration and interaction processes between them. It is noted that most scientific results show a data-centric approach to presenting and analyzing this discipline, i.e. an emphasis on the word Data. An analysis of the frequency of key terms in definitions of Data Science shows where the main emphasis is placed and which terms those definitions rely on. Certain additions to Drew Conway's Venn diagram are made and argued for; the diagram does not reflect all the resources of the components that characterize the applied nature of data science and does not reveal the interaction of these resources either from the point of view of the data researcher or in its global understanding. A unified structural representation of Data Science is proposed in the format of an updated Drew Conway Venn diagram, based on a property/feature that establishes correspondences providing integration/interoperability between the elements of the sets of Drew Conway's Venn diagram. A new definition of “data science” as an interdisciplinary science and methodology of representing activities for the analysis and extraction of data, information, and knowledge is substantiated.

Keywords: data science, Drew Conway's Venn diagram, definition of data science, structure of data science, data, information, knowledge.
id	journaliasakpiua-article-236712
institution	System research and information technologies
keywords_txt_mv	keywords
language	English
last_indexed	2025-07-17T10:27:15Z
publishDate	2021
publisher	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format	ojs
resource_txt_mv	journaliasakpiua/76/8561cc2f98dd5e5be3996a423a37ab76.pdf
spelling	journaliasakpiua-article-236712 2021-07-13T11:01:37Z Data Science — definition and structural representation Data Science — дефиниция и структурное представление Data Science — дефініція та структурне подання Maslianko, Pavlo Sielskyi, Yevhenii наука про дані діаграма Венна Дрю Конвея означення науки про дані структура науки про дані дані інформація знання Data Science Drew Conway’s Data Science Venn Diagram Data Science definition Data Science structure data information knowledge наука о данных диаграмма Венна Дрю Конвея определение науки о данных структура науки о данных данные информация знания The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2021-07-13 Article application/pdf https://journal.iasa.kpi.ua/article/view/236712 10.20535/SRIT.2308-8893.2021.1.05 System research and information technologies; No. 1 (2021); 61-78 Системные исследования и информационные технологии; № 1 (2021); 61-78 Системні дослідження та інформаційні технології; № 1 (2021); 61-78 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/236712/235252
spellingShingle	наука про дані діаграма Венна Дрю Конвея означення науки про дані структура науки про дані дані інформація знання Maslianko, Pavlo Sielskyi, Yevhenii Data Science — дефініція та структурне подання
title	Data Science — дефініція та структурне подання
title_alt	Data Science — definition and structural representation Data Science — дефиниция и структурное представление
title_full	Data Science — дефініція та структурне подання
title_fullStr	Data Science — дефініція та структурне подання
title_full_unstemmed	Data Science — дефініція та структурне подання
title_short	Data Science — дефініція та структурне подання
title_sort	data science — дефініція та структурне подання
topic	наука про дані діаграма Венна Дрю Конвея означення науки про дані структура науки про дані дані інформація знання
topic_facet	наука про дані діаграма Венна Дрю Конвея означення науки про дані структура науки про дані дані інформація знання Data Science Drew Conway’s Data Science Venn Diagram Data Science definition Data Science structure data information knowledge наука о данных диаграмма Венна Дрю Конвея определение науки о данных структура науки о данных данные информация знания
url	https://journal.iasa.kpi.ua/article/view/236712
work_keys_str_mv	AT masliankopavlo datasciencedefinitionandstructuralrepresentation AT sielskyiyevhenii datasciencedefinitionandstructuralrepresentation AT masliankopavlo datasciencedefiniciâistrukturnoepredstavlenie AT sielskyiyevhenii datasciencedefiniciâistrukturnoepredstavlenie AT masliankopavlo datasciencedefínícíâtastrukturnepodannâ AT sielskyiyevhenii datasciencedefínícíâtastrukturnepodannâ

Data Science — дефініція та структурне подання

Institution

Similar Items