Big Data Analytics: principles, trends and tasks (a survey)
We review directions (avenues) of Big Data analysis and their practical meaning as well as problems and tasks in this field. Big Data Analytics appears a dominant trend in development of modern information technologies for management and planning in business. A few examples of real applications of B...
Збережено в:
| Дата: | 2019 |
|---|---|
| Автор: | |
| Формат: | Стаття |
| Мова: | Ukrainian |
| Опубліковано: |
PROBLEMS IN PROGRAMMING
2019
|
| Теми: | |
| Онлайн доступ: | https://pp.isofts.kiev.ua/index.php/ojs1/article/view/360 |
| Теги: |
Додати тег
Немає тегів, Будьте першим, хто поставить тег для цього запису!
|
| Назва журналу: | Problems in programming |
| Завантажити файл: | |
Репозитарії
Problems in programming| Резюме: | We review directions (avenues) of Big Data analysis and their practical meaning as well as problems and tasks in this field. Big Data Analytics appears a dominant trend in development of modern information technologies for management and planning in business. A few examples of real applications of Big Data are briefly outlined. Analysis of Big Data is aimed to extract useful sense from raw data collection. Big Data and Big Analytics have evolved as computer society’s response to the challenges raised by rapid grows in data volumes, variety, heterogeneity, velocity and veracity. Big Data Analytics may be seen as today’s phase of researches and developments known under names ‘Data Mining’, ‘Knowledge Discovery in Data’, ‘intelligent data analysis’ etc. We suggest that there exist three modes of large-scale usage of Big Data: 1) ‘intelligent information retrieval; 2) massive “intermediate” data processing (concentration, mining), which may be performed during one or two scanning; 3) model inference from data; 4) knowledge discovery in data. Stages in data analysis cycle are outlined. Because of Big Data are raw, distributed, unstructured, heterogeneous and disaggregated (vertically splitted), this data should be prepared for deep analysis. Data preparation may comprise such jobs as data retrieval, access, filtering, cleaning, aggregation, integration, dimensionality reduction, reformatting etc. There are several classes of typical data analysis problems (tasks), including: cases grouping (clustering), predictive model inference (regression, classification, recognition etc.), generative model inference, extracting structures and regularities from data. Distinction between model inference and knowledge discovery is elucidated. We give some suggestion why ‘deep learning’ (one of the most attractive topic by now) is so successive and popular. One of drawbacks of traditional models is they disability to make prediction under incomplete list of predictors (when some predictors are missed) or under augmented list of predictors. One may overcome this drawback using causal model. Causal networks are illuminated in the survey as attractive in that they appear to be expressive generative models and (simultaneously) predictive models in strict sense. This means they pretend to explain how the object at hand is acting (provided they are adequate). Being adequate, causal network facilitates predicting causal effect of local intervention on the object.Methods used in Big Data Analytics will be reviewed in the next paper. |
|---|