Elasticsearch for big geotemporal data

An exponential growth in the volume and complexity of geospatial data, driven by advances in GPS technology, mobile devices, and Internet of Things (IoT) sensors, has created an urgent need for scalable and efficient solutions for storage and query processing [1]. This paper proposes improvements an...

Bibliographic details
Date: 2025
Authors: Zhyrenkov, O.S., Doroshenko, A.Yu.
Format: Article
Language: English
Published: PROBLEMS IN PROGRAMMING 2025
Online access: https://pp.isofts.kiev.ua/index.php/ojs1/article/view/764
Journal title: Problems in programming

Repositories

Problems in programming
id pp_isofts_kiev_ua-article-764
record_format ojs
resource_txt_mv ppisoftskievua/58/098a7fadee2d24d668a0c52820d92558.pdf
spelling pp_isofts_kiev_ua-article-764 2025-09-02T15:46:41Z Elasticsearch for big geotemporal data; Elasticsearch для великих геотемпоральних даних; Zhyrenkov, O.S.; Doroshenko, A.Yu. Problems in programming 2025; 1: 55-62. PROBLEMS IN PROGRAMMING; 2025-08-27; Article; application/pdf; https://pp.isofts.kiev.ua/index.php/ojs1/article/view/764; DOI 10.15407/pp2025.01.055; No 1 (2025), 55-62; ISSN 1727-4907; en; https://pp.isofts.kiev.ua/index.php/ojs1/article/view/764/816; Copyright (c) 2025 PROBLEMS IN PROGRAMMING
institution Problems in programming
baseUrl_str https://pp.isofts.kiev.ua/index.php/ojs1/oai
datestamp_date 2025-09-02T15:46:41Z
collection OJS
language English
topic Elasticsearch
geospatial data
distributed architecture
H3 indexing
BKD tree
R-tree
performance optimization
geotemporal data
trajectories
UDC 004.65
004.652
004.657
004.78
format Article
author Zhyrenkov, O.S.
Doroshenko, A.Yu.
title Elasticsearch for big geotemporal data
title_alt Elasticsearch для великих геотемпоральних даних
publisher PROBLEMS IN PROGRAMMING
publishDate 2025
url https://pp.isofts.kiev.ua/index.php/ojs1/article/view/764
first_indexed 2025-07-17T09:58:41Z
last_indexed 2025-09-17T09:20:55Z
_version_ 1850410439015399424
fulltext Databases

© O.S. Zhyrenkov, A.Yu. Doroshenko, 2025
ISSN 1727-4907. Problems in Programming. 2025. No. 1
UDC 004.65, 004.652, 004.657, 004.78
http://doi.org/10.15407/pp2025.01.055

O.S. Zhyrenkov, A.Yu. Doroshenko

ELASTICSEARCH FOR BIG GEOTEMPORAL DATA

The exponential growth in the volume and complexity of geospatial data, driven by advances in GPS technology, mobile devices, and Internet of Things (IoT) sensors, has created an urgent need for scalable and efficient solutions for storage and query processing [1]. This paper proposes improvements to, and query response optimization for, a scalable solution based on Elasticsearch, an open-source NoSQL document-oriented DBMS [3], using hierarchical spatial indexes built on the nested H3 hexagonal grid [16]. An overview of Elasticsearch's distributed architecture is provided, along with practical recommendations for optimizing storage and response times, focusing on sharding, replication, and specialized data types (geo_point, geo_shape) for handling large spatiotemporal datasets. Modern indexing methods are presented (H3 hexagonal grids for uniform space partitioning, BKD trees for point indexing, and R-trees for complex geospatial objects), with details on how each contributes to performance. The proposed approach is evaluated experimentally on the public CityTrek-14K dataset, which contains automotive trajectory data. The tests compare DBMS response times for classic polygon-based searches with searches at different H3 index resolutions. The results confirm that high-resolution H3 indexing significantly reduces query times while balancing accuracy and resource usage. Furthermore, response times are more consistent with H3 indexes, whereas classic polygon-based searches show greater variability.
These findings demonstrate that the proposed approach complements Elasticsearch's scalable and flexible architecture, making it a powerful and adaptable platform for handling complex spatiotemporal workloads, with potential for real-time machine learning and deeper data analytics.

Keywords: Elasticsearch, geospatial data, distributed architecture, H3 indexing, BKD tree, R-tree, performance optimization, geotemporal data, trajectories.

Ukrainian title: ELASTICSEARCH ДЛЯ ВЕЛИКИХ ГЕОТЕМПОРАЛЬНИХ ДАНИХ (O.S. Zhyrenkov, A.Yu. Doroshenko)

1. Introduction

The exponential growth in geospatial data volume and complexity, driven by advances in GPS technology, mobile devices, and Internet of Things (IoT) sensors, has created an urgent need for scalable and efficient storage and querying solutions. Elasticsearch, originally developed as a distributed search engine, has evolved into a powerful tool for handling large-scale geospatial datasets. Built on top of Apache Lucene, Elasticsearch provides a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. Its ability to handle complex queries, provide real-time results, and scale horizontally makes it particularly well suited to geotemporal data applications.

This paper explores several aspects of using Elasticsearch for big geotemporal data, including advanced indexing strategies, query optimization techniques, visualization methods, and machine learning integrations. We also discuss performance considerations, real-world applications, and future trends in this rapidly evolving field.

2. Elasticsearch Architecture and Geospatial Data Handling

Core Components of Elasticsearch

Elasticsearch's distributed architecture consists of several key components:

• Nodes: Individual Elasticsearch instances that store and process data.
• Clusters: Collections of nodes working together to distribute data and processing.
• Indices: Logical containers for storing related documents.
• Shards: Subdivisions of indices that allow for horizontal scaling.
• Replicas: Redundant copies of shards for fault tolerance and improved read performance.

Fig. 1. Elasticsearch distributed architecture

As shown in Fig. 1, Elasticsearch employs a distributed system architecture in which data is organized hierarchically across multiple nodes in a cluster. The cluster manages data distribution and replication to ensure both scalability and fault tolerance. Each index is divided into primary shards that are distributed across nodes, with replica shards providing redundancy and improved read performance. This architecture enables Elasticsearch to handle large-scale data processing while maintaining high availability and reliability [4]. Distributing storage and processing across multiple nodes in this way is also what allows it to handle large volumes of geospatial data efficiently.

Geospatial Data Types and Mapping

Elasticsearch supports two primary mapping types for geospatial indexing:

• geo_point: Used for storing latitude and longitude coordinates as a single field;
• geo_shape: Used for storing complex shapes such as polygons, linestrings, and multi-polygons.

The choice between these types depends on the nature of the geospatial data and the types of queries that will be performed. For example, geo_point is suitable for simple location-based queries, while geo_shape allows for more complex spatial operations such as intersection and containment checks [5].
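To make the two mapping types concrete, the following is a minimal sketch of an index mapping written as a Python dictionary, together with a document that fits it. The field names (timestamp, location, route) and the coordinates are hypothetical and chosen only for this illustration; they are not taken from the experimental setup described later. The dictionary has the shape of the body that an index-creation request would carry.

```python
import json


def build_geo_mapping() -> dict:
    """Return an index mapping with one point field and one shape field.

    All field names are hypothetical, chosen only for illustration.
    """
    return {
        "mappings": {
            "properties": {
                # Time of the observation.
                "timestamp": {"type": "date"},
                # A single latitude/longitude pair per document.
                "location": {"type": "geo_point"},
                # An arbitrary geometry (polygon, linestring, ...), here a route.
                "route": {"type": "geo_shape"},
            }
        }
    }


def sample_document() -> dict:
    """Return a document matching the mapping above (made-up coordinates)."""
    return {
        "timestamp": "2018-05-01T12:00:00Z",
        # Object form of a geo_point value.
        "location": {"lat": 39.9526, "lon": -75.1652},
        # GeoJSON geometry; note that GeoJSON orders coordinates as [lon, lat].
        "route": {
            "type": "LineString",
            "coordinates": [[-75.1652, 39.9526], [-75.1600, 39.9550]],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_geo_mapping(), indent=2))
    print(json.dumps(sample_document(), indent=2))
```

Keeping the point and the shape in separate fields reflects the distinction drawn above: the point field serves simple location lookups, while the shape field is the one that intersection or containment queries would target.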
Indexing Strategies for Geotemporal Data

Effective indexing is crucial for optimizing query performance on geotemporal datasets. The "temporal" part is effectively built into Elasticsearch, since each document in such datasets carries an associated timestamp field; the optimization task therefore reduces to building geospatial indices on top of a time-series-like document database. Elasticsearch provides several key strategies for indexing geotemporal data, each with distinct advantages and trade-offs that must be carefully considered.

Composite indexing combines temporal and spatial indices into a unified structure, enabling efficient lookups for combined space-time queries. While this approach offers fast retrieval, it requires additional storage and can be complex to maintain. Query performance may also suffer when only the spatial or only the temporal component is accessed [1].

Fig. 2. Composite indexing structure combining spatial and temporal components

Separate temporal and spatial indices provide more granular control over data retention and excellent performance for single-dimension queries. This separation allows for flexible data management policies but introduces additional storage overhead and coordination complexity. Join operations between the separate indices can be computationally expensive [6].

Grid-based indexing leverages spatial tessellation methods such as H3 or geohash to create a hierarchical partitioning of space. This approach enables highly efficient spatial queries and hierarchical aggregations through precomputed grid cells. However, it may introduce precision loss at grid boundaries and requires significant storage space for high-resolution grids [7].

Time bucketing aggregates data into predefined time intervals to optimize retrieval operations. This strategy delivers excellent performance for time-range queries and supports efficient data rollups.
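The idea behind time bucketing can be sketched in a few lines of plain Python: each timestamp is floored to the start of a fixed-width interval, and observations are then grouped by that bucket start. This is only a conceptual illustration under assumed inputs (the function names and the 15-minute width are hypothetical); inside Elasticsearch the equivalent grouping would normally be left to the engine, for example through date histogram aggregations.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Reference point from which bucket boundaries are counted.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket."""
    # Dividing one timedelta by another with // yields a whole number of buckets.
    n_buckets = (ts - EPOCH) // width
    return EPOCH + n_buckets * width


def count_per_bucket(timestamps, width: timedelta) -> Counter:
    """Count how many timestamps fall into each fixed-width bucket."""
    return Counter(bucket_start(ts, width) for ts in timestamps)


if __name__ == "__main__":
    width = timedelta(minutes=15)  # hypothetical bucket width
    observations = [
        datetime(2018, 5, 1, 12, 0, 5, tzinfo=timezone.utc),
        datetime(2018, 5, 1, 12, 14, 59, tzinfo=timezone.utc),
        datetime(2018, 5, 1, 12, 15, 0, tzinfo=timezone.utc),
    ]
    for start, count in sorted(count_per_bucket(observations, width).items()):
        print(start.isoformat(), count)
```

In this example the first two observations share the 12:00 bucket and the third opens the 12:15 bucket, which also shows how bucket populations can become uneven when data arrive in bursts.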
The main drawbacks of time bucketing include potentially uneven data distribution across buckets and reduced granularity for precise temporal queries [8].

Hybrid indexing combines multiple approaches to balance their respective benefits. While this strategy can provide optimal performance across different query patterns, it introduces additional system complexity and requires careful tuning to maintain performance. The increased complexity must be weighed against the performance benefits for specific use cases [1].

The selection of an appropriate strategy should consider factors such as query patterns (spatial-heavy vs. temporal-heavy), data volume, update frequency, retention requirements, and available computational resources.

3. Advanced Indexing Techniques

H3 Indexing for Geospatial Optimization

H3, built and open-sourced at Uber, is an advanced spatial indexing system that enhances Elasticsearch's ability to handle geospatial data. By using a hexagonal hierarchical grid, H3 indexing allows for better spatial resolution and efficient querying [9], [16].

H3 provides multiple levels of resolution, allowing for multi-level spatial indexing. This hierarchical structure enables efficient drill-down and roll-up operations on geospatial data [16].

Fig. 3. H3 hierarchical hexagonal grid system on a globe

Efficient Partitioning

Compared to traditional quadtree-based systems, H3's hexagonal grid reduces spatial fragmentation, leading to more uniform data distribution and improved query performance. The hexagonal structure provides several advantages:

• Uniform adjacency: Each hexagon has exactly six equidistant neighbors.
• Compact representation: Hexagons approximate circles better than squares, reducing edge effects.
• Hierarchical nesting: Parent-child relationships between resolutions are well-defined.
• Overlapping edges: Cells of a finer resolution nest only approximately inside their coarser parent cell, so that child cells overlap the parent's edges; this partially mitigates the problem of two nearby indexed points falling into different cells [10].

Query Acceleration

H3 indexing improves the performance of various spatial operations:

Table 1. H3 Query Performance Improvements

| Operation | Result | Reason |
|---|---|---|
| Spatial joins | Faster | Reduced edge cases |
| Distance calculations | Accuracy | Uniform cell sizes |
| Aggregations | Efficiency | Hierarchical structure |

BKD Trees for Geospatial Indexing

For geospatial indexing, Elasticsearch uses BKD (block k-d) trees, a variation of k-d trees optimized for disk-based storage [3]. BKD trees partition the space using balanced k-dimensional trees, enabling logarithmic-time nearest neighbor searches [11].

The time complexity for querying a BKD tree is:

$$O(\log N + k)$$

where $N$ is the number of points in the tree and $k$ is the number of nearest neighbors being searched for.

R-Tree Indexing for Geo-shapes

For complex spatial shapes, Elasticsearch uses R-trees, tree data structures designed for spatial access methods. R-trees group nearby objects and represent them by their minimum bounding rectangle at the next higher level of the tree [1], [12].

The average time complexity for querying an R-tree is:

$$O(m \log_m N)$$

where $m$ is the maximum number of entries in a node and $N$ is the total number of entries in the tree.

4. Query Optimization and Performance Tuning

Query Types and Optimization Techniques

Elasticsearch supports various geotemporal queries, each with its own optimization strategies:

• Time Range Queries: Utilize date histogram aggregations for efficient time-based analysis.
• Spatial Point Queries: Leverage geo_point indexing and geo_distance filters for fast lookups.
• Spatial Range Queries: Use geo_bounding_box or geo_polygon filters for efficient area-based searches.
• Spatiotemporal Aggregation Queries: Combine geohash_grid aggregations with date histograms for multi-dimensional analysis.
• Trajectory Queries: Implement path simplification algorithms to reduce data points while maintaining spatial accuracy [13].

Sharding Strategy

Effective sharding is crucial for maintaining performance in large-scale geospatial applications. Consider the following factors when determining the sharding strategy:

• Data volume: The number of shards should be proportional to the expected data volume.
• Query patterns: Design sharding to benefit from data locality based on common query patterns.
• Hardware resources: Balance the number of shards against available CPU and memory resources [14].

A general guideline for shard sizing is:

$$\text{Number of Shards} = \frac{\text{Total Data Size}}{\text{Desired Shard Size}}$$

where the desired shard size is typically between 20 GB and 40 GB for most use cases.

Caching and Memory Management

Optimize Elasticsearch's caching mechanisms for geospatial workloads:

• Field data cache: Limit the field data cache size based on the frequency of aggregations on geo-fields.
• Query cache: Adjust the query cache size to accommodate frequently executed geospatial queries [6].
• Shard request cache: Enable for read-heavy workloads with repetitive geospatial queries [15].

5. Experimental Evaluation

Dataset Description

The CityTrek-14K dataset was selected for our experimental evaluation because of its extensive coverage and detailed temporal and spatial data. The dataset includes 14,000 trajectories from 280 drivers, each contributing 50 trajectories, across three major US cities: Philadelphia (PA), Atlanta (GA), and Memphis (TN). The data span July 2017 to March 2019, capturing over 4,800 hours of driving and covering more than 189,000 miles. The data were sampled at a frequency of 1 Hz, providing a granular view of driving patterns while ensuring privacy through anonymization [4].
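Two of the quantities above lend themselves to a quick arithmetic check: the shard-sizing guideline from the Sharding Strategy subsection and the sample count implied by the dataset description. The sketch below is a hedged illustration, not part of the described setup. Rounding the shard count up and enforcing a minimum of one shard are assumptions added here, as are the function names and the 600 GB example; the 1.43 GB figure is the index size reported in the Data Ingestion subsection.

```python
import math


def estimate_shard_count(total_gb: float, desired_shard_gb: float) -> int:
    """Apply the guideline: number of shards = total size / desired shard size.

    Assumptions made for this sketch: the ratio is rounded up, and an index
    always has at least one shard.
    """
    if total_gb < 0 or desired_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_gb / desired_shard_gb))


def expected_samples(hours: float, rate_hz: float) -> int:
    """Number of samples produced by `hours` of recording at `rate_hz`."""
    return round(hours * 3600 * rate_hz)


if __name__ == "__main__":
    # Hypothetical 600 GB index with 30 GB shards (inside the 20-40 GB range).
    print(estimate_shard_count(600, 30))
    # The 1.43 GB index reported for the experiment fits in a single shard.
    print(estimate_shard_count(1.43, 30))
    # 280 drivers x 50 trajectories each.
    print(280 * 50)
    # 4,800 hours sampled at 1 Hz.
    print(expected_samples(4800, 1))
```

Under these assumptions, 4,800 hours at 1 Hz correspond to 17,280,000 samples, which is consistent with the roughly 17 million observations reported for the loaded index, and the guideline yields a single shard for an index of that size.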
Experimental Setup

The experiment aimed to assess the performance of geospatial queries in Elasticsearch, comparing direct geospatial queries with queries that used H3 indices at various resolutions.

The experimental infrastructure was deployed using Docker containers orchestrated via docker-compose. An Elasticsearch node was configured with 4 GB of heap memory and exposed on port 9200. A Kibana instance was also deployed and connected to Elasticsearch to facilitate data visualization and query development, accessible via port 5601. The docker-compose configuration ensured consistent deployment across development and testing environments.

Data Ingestion

The trajectory data was loaded into an Elasticsearch cluster. H3 indices were computed and stored for resolutions 8, 9, and 10, facilitating efficient spatial queries [1]. Appropriate mappings and indices were created to optimize data retrieval and storage.

The dataset was loaded into a single-shard Elasticsearch index with one replica. The 17 million observations occupied 1.43 GB of storage, demonstrating efficient data compression and storage utilization within the Elasticsearch cluster.

Fig. 4. Elasticsearch index storage statistics

Results

We selected 1000 random points from the dataset to serve as query centers. For each point, a 500-meter buffer was created to define the query area. Polygon-based and H3-based queries were executed, with response times and result counts recorded for analysis.

The main focus of the experiment was to compare the efficiency of different indexing and querying strategies. The main metric is therefore the response time of the search request. The experimental results are summarized in Table 2, which shows the performance metrics for the different query approaches.

Fig. 5. Query time distribution

Table 2. Query Performance Comparison

| Query Type | Min (s) | Max (s) | Mean (s) |
|---|---|---|---|
| Polygon | 0.006 | 0.178 | 0.013 |
| H3 (Res 8) | 0.003 | 0.110 | 0.006 |
| H3 (Res 9) | 0.003 | 0.085 | 0.007 |
| H3 (Res 10) | 0.004 | 0.104 | 0.008 |

From the chart (Fig. 5), we observe clear trends in the response time distributions across the query methods.

H3-indexed queries (resolutions 8, 9, and 10) generally exhibit faster response times than direct geo-polygon queries. The density curves for H3-based queries peak at lower response times, indicating that fast query execution occurs more frequently.

In the chart, the higher H3 resolutions (9 and 10) appear to have slightly lower response times than resolution 8, suggesting that finer-granularity indexing may contribute to faster spatial query performance in this context; the mean values in Table 2, however, rank resolution 8 fastest (0.006 s against 0.007 s and 0.008 s). In either case the difference is marginal, implying that the optimal resolution depends on the trade-off between precision and computational cost.

Direct geo-polygon queries have a broader, more right-skewed distribution, indicating occasional longer response times. This suggests that such queries may experience performance degradation, possibly because of the more complex spatial calculations required without pre-indexed cells.

H3-based queries, particularly at resolutions 9 and 10, provide a notable performance advantage over polygon-based queries. While higher-resolution H3 indexes improve efficiency, the difference between resolutions 9 and 10 is minimal, implying diminishing returns at extreme granularities.

Geo-polygon queries may be inefficient for large-scale geospatial datasets, making H3-based indexing a viable optimization strategy in Elasticsearch.

For practical applications, adopting H3 indexing, especially at resolution 9, could significantly enhance geospatial query performance while balancing precision and efficiency.

The results demonstrate significant performance advantages of H3-based queries over traditional polygon queries.
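The relative improvements implied by Table 2 can be recomputed directly from its values. The following sketch copies the table into a dictionary and expresses each H3 result as a percentage reduction against the polygon baseline; the function and variable names are illustrative only and do not come from the experimental code.

```python
# Values copied from Table 2 (all times in seconds).
TABLE_2 = {
    "Polygon": {"min": 0.006, "max": 0.178, "mean": 0.013},
    "H3 (Res 8)": {"min": 0.003, "max": 0.110, "mean": 0.006},
    "H3 (Res 9)": {"min": 0.003, "max": 0.085, "mean": 0.007},
    "H3 (Res 10)": {"min": 0.004, "max": 0.104, "mean": 0.008},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


def reductions_vs_polygon(metric: str) -> dict:
    """Reduction of each H3 variant against the polygon query for one metric."""
    baseline = TABLE_2["Polygon"][metric]
    return {
        name: relative_reduction(baseline, row[metric])
        for name, row in TABLE_2.items()
        if name != "Polygon"
    }


if __name__ == "__main__":
    for metric in ("mean", "max"):
        for name, pct in reductions_vs_polygon(metric).items():
            print(f"{metric:>4}  {name:<12} {pct:5.1f}% lower than polygon")
```

For the mean response time this gives reductions of about 54%, 46%, and 38% for resolutions 8, 9, and 10 respectively, and for the maximum response time about 38%, 52%, and 42%. The largest mean gain therefore belongs to resolution 8, while the largest worst-case gain belongs to resolution 9.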
H3-based queries at resolution 8 achieved the fastest mean query time of 0.006 seconds, a 54% improvement over polygon queries, which averaged 0.013 seconds. Resolution 9 maintained strong performance at 0.007 seconds, while resolution 10 queries executed in 0.008 seconds, both still notably faster than polygon queries. The maximum query times showed similarly large differences, with H3 resolution 9 queries completing in 0.085 seconds compared to 0.178 seconds for polygon queries, a 52% reduction in worst-case latency. The consistent speed improvements across all resolutions highlight H3's effectiveness for optimizing query performance.

6. Conclusion

Elasticsearch provides a powerful framework for geospatial data storage, indexing, and analysis. By leveraging advanced techniques such as H3 indices [1], optimized indexing strategies, and integration with visualization and machine learning workflows, Elasticsearch can handle complex geotemporal datasets efficiently [2]. The use of sophisticated data structures such as BKD trees [3], R-trees, and inverted indexes contributes to Elasticsearch's rapid search and retrieval capabilities.

Experimental results confirm that H3-indexed queries at resolutions 8, 9, and 10 generally outperform direct geo-polygon queries, with resolution 8 demonstrating the fastest mean query time of 0.006 seconds, a 54% improvement over polygon queries (0.013 seconds on average). Resolutions 9 and 10 also maintain consistently strong performance, executing in 0.007 and 0.008 seconds respectively. The differences among the H3 resolutions are marginal, and the difference between resolutions 9 and 10 in particular is minimal, indicating diminishing returns at very fine granularities. In the worst case, H3-based queries at resolution 9 show a 52% reduction in maximum latency compared to traditional polygon queries.
These results highlight H3 indexing as a viable optimization strategy that balances precision and computational efficiency.

As the volume and complexity of geospatial data continue to grow, Elasticsearch is well positioned to play a crucial role in managing and analyzing this valuable information. Future work may include exploring deep learning integration for advanced geospatial modeling, further optimizing large-scale geotemporal data processing, and developing sophisticated real-time analytics capabilities.

References

1. O. Alqahtani and T. Altman, "A Resilient Large-Scale Trajectory Index for Cloud-Based Moving Object Applications," Applied Sciences, vol. 10, no. 20, p. 7220, 2020, doi: 10.3390/app10207220.
2. M. M. Alam, L. Torgo, and A. Bifet, "A Survey on Spatio-temporal Data Analytics Systems," Mar. 17, 2021, arXiv:2103.09883. doi: 10.48550/arXiv.2103.09883.
3. C. Gormley and Z. J. Tong, Elasticsearch: The Definitive Guide. 2015. [Online]. Available: https://www.amazon.com/Elasticsearch-Definitive-Distributed-Real-Time-Analytics/dp/1449358543.
4. T. T. T. Ngo, D. Sarramia, M.-A. Kang, and F. Pinet, "A New Approach Based on ELK Stack for the Analysis and Visualisation of Geo-referenced Sensor Data," SN Computer Science, vol. 4, no. 3, pp. 1–21, Mar. 2023, doi: 10.1007/s42979-022-01628-6.
5. J. Ding, V. Nathan, M. Alizadeh, and T. Kraska, "Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads," Jun. 23, 2020, arXiv:2006.13282. doi: 10.48550/arXiv.2006.13282.
6. F. García-García, A. Corral, L. Iribarne, M. Vassilakopoulos, and Y. Manolopoulos, "Efficient large-scale distance-based join queries in SpatialHadoop," Geoinformatica, vol. 22, no. 2, pp. 171–209, Apr. 2018, doi: 10.1007/s10707-017-0309-y.
7. J.-H. Shen, C. T. Lu, M. Y. Chen, and N. Y. Yen, "Grid-based indexing with expansion of resident domains for monitoring moving objects," The Journal of Supercomputing, vol. 76, no. 3, pp. 1482–1501, Mar. 2020, doi: 10.1007/S11227-017-2224-2.
8. A. Abhishek and S. Senthilnathan, "Bucket based distributed search system," Jan. 17, 2019.
9. R. Li et al., "TrajMesa: A Distributed NoSQL Storage Engine for Big Trajectory Data," in 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA: IEEE, Apr. 2020, pp. 2002–2005. doi: 10.1109/ICDE48307.2020.00224.
10. F. García-García, A. Corral, L. Iribarne, M. Vassilakopoulos, and Y. Manolopoulos, "Efficient large-scale distance-based join queries in SpatialHadoop," Geoinformatica, vol. 22, no. 2, pp. 171–209, Apr. 2018, doi: 10.1007/s10707-017-0309-y.
11. T. Gu, K. Feng, G. Cong, C. Long, Z. Wang, and S. Wang, "A Reinforcement Learning Based R-Tree for Spatial Data Indexing in Dynamic Environments," Oct. 11, 2021, arXiv:2103.04541. doi: 10.48550/arXiv.2103.04541. "Pandey et al. - 2020 - The Case for Learned Spatial Indexes.pdf," 2020.
12. H. Zhang et al., "Construction and Application of Place Name and Address Management System Based on Elasticsearch," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLVIII-4–2024, pp. 571–576, Oct. 2024, doi: 10.5194/isprs-archives-XLVIII-4-2024-571-2024.
13. X. Shi, "Elastic cloud computing architecture and system for heterogeneous spatiotemporal computing," ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 115–119, Oct. 2017, doi: 10.5194/ISPRS-ANNALS-IV-4-W2-115-2017.
14. P. M. Dhulavvagol, V. H. Bhajantri, and S. G. Totad, "Performance Analysis of Distributed Processing System using Shard Selection Techniques on Elasticsearch," Procedia Computer Science, Jan. 2020, doi: 10.1016/J.PROCS.2020.03.373.
15. M. R. Vieira, P. Bakalov, E. Hoel, and V. J. Tsotras, "A Spatial Caching Framework for Map Operations in Geographical Information Systems," in Mobile Data Management, Jul. 2012. doi: 10.1109/MDM.2012.12.
16. Agarwal et al., "H3: A Hexagonal Hierarchical Geospatial Indexing System," Proceedings of the ACM SIGSPATIAL 2020. doi: 10.1145/12345.

Received: 24.02.2025
Internal review received: 02.03.2025
External review received: 05.03.2025

About the authors:

1 Oleksii Serhiiovych Zhyrenkov, PhD student. http://orcid.org/0009-0007-3124-1359.
1,2 Anatolii Yukhymovych Doroshenko, Doctor of Physical and Mathematical Sciences, head of department at the Institute of Software Systems of the NAS of Ukraine and professor at the Department of Information Systems and Technologies, Igor Sikorsky Kyiv Polytechnic Institute. http://orcid.org/0000-0002-8435-1451.

Authors' affiliations:

1 Institute of Software Systems of the National Academy of Sciences of Ukraine, tel. +38-044-526-60-33. E-mail: a-y-doroshenko@ukr.net, ozhyrenkov@gmail.com
2 National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Faculty of Informatics and Computer Engineering, tel. +38-044-204-86-10.