The Effectiveness of Programming Languages in the Apache Hadoop MapReduce Framework

The effectiveness of using different languages with the Apache Hadoop framework to process large data collections based on the MapReduce model is studied. The emphasis is on analyzing program execution speed in a Hadoop cluster. Various Hadoop ecosystem projects for distrib...

Bibliographic Details
Published in:	Управляющие системы и машины
Date:	2016
Main Authors:	Глибовец, А.Н., Дмитрук, Я.О.
Format:	Article
Language:	Russian
Published:	Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України 2016
Subjects:	Software engineering and software tools
Online Access:	https://nasplib.isofts.kiev.ua/handle/123456789/113403
Journal Title:	Digital Library of Periodicals of National Academy of Sciences of Ukraine
Cite this:	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce / А.Н. Глибовец, Я.О. Дмитрук // Управляющие системы и машины. — 2016. — № 5. — С. 84-92. — Бібліогр.: 10 назв. — рос.

Institution

Digital Library of Periodicals of National Academy of Sciences of Ukraine

_version_	1862671681723564032
author	Глибовец, А.Н. Дмитрук, Я.О.
author_facet	Глибовец, А.Н. Дмитрук, Я.О.
citation_txt	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce / А.Н. Глибовец, Я.О. Дмитрук // Управляющие системы и машины. — 2016. — № 5. — С. 84-92. — Бібліогр.: 10 назв. — рос.
collection	DSpace DC
container_title	Управляющие системы и машины
description	The effectiveness of using different languages with the Apache Hadoop framework to process large data collections based on the MapReduce model is studied. The emphasis is on analyzing program execution speed in a Hadoop cluster. Various Hadoop ecosystem projects for distributed computing are compared. The experiments described confirm the advantage of using Apache Spark. The speed advantage of MapReduce programs written in Java or another JVM language is found to be substantial. This paper examines how effectively different programming languages can be used with the Apache Hadoop framework to process large data collections under the MapReduce (MR) model. Apache Hadoop is used in many industrial projects around the world, for example at Facebook and Yahoo!. It makes it possible to run tasks efficiently and reliably on a cluster that handles very large amounts of data. The MR model lets developers ignore the complexity of cluster management and start writing a program right away. This work investigates how the choice of programming language affects program speed in the Apache Hadoop framework. The comparison covers programs in Java, Scala, and Python that solve the same simple problem: finding how long each word in the input collection of text documents is searched for. All three programs are written in the same style regardless of language, so that the comparison is objective. The experiments use the Cloudera QuickStart VM virtual machine image, which is convenient because Hadoop, HDFS, and other services are already installed on it. A three-node cluster was also created for the study, with CDH chosen as the distribution of Apache Hadoop and related projects, and each node was configured as required. Each program was run on inputs of 8 MB, 34 MB, 61 MB, 106 MB, and 203 MB. The best results were shown by the program written for Apache Spark. In addition, MR programs for Apache Hadoop are better written in Java or another JVM language than in Python; the speed advantage is clear. The experiments also show that processing speed increases with larger input collections, so Hadoop is unnecessary for small data.
first_indexed	2025-12-07T15:33:56Z
format	Article
fulltext
id	nasplib_isofts_kiev_ua-123456789-113403
institution	Digital Library of Periodicals of National Academy of Sciences of Ukraine
issn	0130-5395
language	Russian
last_indexed	2025-12-07T15:33:56Z
publishDate	2016
publisher	Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
record_format	dspace
spelling	Глибовец, А.Н. Дмитрук, Я.О. 2017-02-07T20:48:53Z 2017-02-07T20:48:53Z 2016 Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce / А.Н. Глибовец, Я.О. Дмитрук // Управляющие системы и машины. — 2016. — № 5. — С. 84-92. — Бібліогр.: 10 назв. — рос. 0130-5395 https://nasplib.isofts.kiev.ua/handle/123456789/113403 681.3:658.56 ru Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України Управляющие системы и машины Программная инженерия и программные средства Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce Ефективність застосування мов програмування в фреймворку Apache Hadoop з використанням MapReduce The Effectiveness of Programming Languages in the Apache Hadoop MapReduce Framework Article published earlier
spellingShingle	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce Глибовец, А.Н. Дмитрук, Я.О. Программная инженерия и программные средства
title	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce
title_alt	Ефективність застосування мов програмування в фреймворку Apache Hadoop з використанням MapReduce The Effectiveness of Programming Languages in the Apache Hadoop MapReduce Framework
title_full	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce
title_fullStr	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce
title_full_unstemmed	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce
title_short	Эффективность применения языков программирования в фреймворке Apache Hadoop с использованием MapReduce
title_sort	эффективность применения языков программирования в фреймворке apache hadoop с использованием mapreduce
topic	Программная инженерия и программные средства
topic_facet	Программная инженерия и программные средства
url	https://nasplib.isofts.kiev.ua/handle/123456789/113403
work_keys_str_mv	AT glibovecan éffektivnostʹprimeneniââzykovprogrammirovaniâvfreimvorkeapachehadoopsispolʹzovaniemmapreduce AT dmitrukâo éffektivnostʹprimeneniââzykovprogrammirovaniâvfreimvorkeapachehadoopsispolʹzovaniemmapreduce AT glibovecan efektivnístʹzastosuvannâmovprogramuvannâvfreimvorkuapachehadoopzvikoristannâmmapreduce AT dmitrukâo efektivnístʹzastosuvannâmovprogramuvannâvfreimvorkuapachehadoopzvikoristannâmmapreduce AT glibovecan theeffectivenessofprogramminglanguagesintheapachehadoopmapreduceframework AT dmitrukâo theeffectivenessofprogramminglanguagesintheapachehadoopmapreduceframework

Similar Items