Semi-supervised inverted file index approach for approximate nearest neighbor search

This paper introduces a novel modification to the Inverted File (IVF) index approach for approximate nearest neighbor search, incorporating supervised learning techniques to enhance the efficacy of intermediate clustering and achieve more balanced cluster sizes. The proposed method involves creating...

Bibliographic details
Date: 2023
Main author: Bazdyrev, Anton
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023
Subjects:
Online access: https://journal.iasa.kpi.ua/article/view/297400
Journal title: System research and information technologies

Institution

System research and information technologies
_version_ 1866302945921859584
author Bazdyrev, Anton
author_facet Bazdyrev, Anton
author_sort Bazdyrev, Anton
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2024-02-01T21:03:07Z
description This paper introduces a novel modification to the Inverted File (IVF) index approach for approximate nearest neighbor search, incorporating supervised learning techniques to enhance the efficacy of intermediate clustering and achieve more balanced cluster sizes. The proposed method involves creating clusters using a neural network by solving a task to classify query vectors into the same bucket as their corresponding nearest neighbor vectors in the original dataset. When combined with minimizing the standard deviation of the bucket sizes, the indexing process becomes more efficient and accurate during the approximate nearest neighbor search. Through empirical evaluation on a test dataset, we demonstrate that the proposed semi-supervised IVF index approach outperforms the industry-standard IVF implementation with fixed parameters, including the total number of clusters and the number of clusters allocated to queries. This novel approach has promising implications for enhancing nearest-neighbor search efficiency in high-dimensional datasets across various applications, including information retrieval, natural language search, recommendation systems, etc.
doi_str_mv 10.20535/SRIT.2308-8893.2023.4.05
first_indexed 2025-07-17T10:28:25Z
format Article
fulltext © A. Bazdyrev, 2023. System Research & Information Technologies, ISSN 1681–6048, 2023, № 4, p. 69. UDC 004.424.4. DOI: 10.20535/SRIT.2308-8893.2023.4.05

SEMI-SUPERVISED INVERTED FILE INDEX APPROACH FOR APPROXIMATE NEAREST NEIGHBOR SEARCH

A. BAZDYREV

Abstract. This paper introduces a novel modification to the Inverted File (IVF) index approach for approximate nearest neighbor search, incorporating supervised learning techniques to enhance the efficacy of intermediate clustering and achieve more balanced cluster sizes. The proposed method involves creating clusters using a neural network by solving a task to classify query vectors into the same bucket as their corresponding nearest neighbor vectors in the original dataset. When combined with minimizing the standard deviation of the bucket sizes, the indexing process becomes more efficient and accurate during the approximate nearest neighbor search. Through empirical evaluation on a test dataset, we demonstrate that the proposed semi-supervised IVF index approach outperforms the industry-standard IVF implementation with fixed parameters, including the total number of clusters and the number of clusters allocated to queries. This novel approach has promising implications for enhancing nearest-neighbor search efficiency in high-dimensional datasets across various applications, including information retrieval, natural language search, and recommendation systems.

Keywords: approximate nearest neighbor search, inverted file index, high-dimensional data, machine learning.

INTRODUCTION

Approximate Nearest Neighbor (ANN) [1] search is a fundamental problem in many data-driven applications, spanning domains such as information retrieval, image processing, natural language search, and recommendation systems. The efficient retrieval of similar data points from vast datasets is critical for tasks that involve high-dimensional data representations, where exhaustive search methods become computationally infeasible.
As the dataset size grows, the computational cost of an exact nearest neighbor search with brute force algorithms becomes prohibitive. Brute force approaches compare each query vector with every data point in the dataset, which leads to impractical execution times for large datasets. Approximate nearest neighbor algorithms offer a trade-off between search accuracy and efficiency, retrieving reasonably accurate results within a significantly reduced search space. By approximating the nearest neighbors, these algorithms enable faster exploration of large datasets, making them essential for real-world applications where timely responses are crucial, such as image and text search, recommendation systems, and similarity-based clustering.

One popular approach in ANN is the Inverted File (IVF) index method [2]. In its original form, the IVF index is an inverted indexing technique that partitions the dataset into a set of Voronoi cells, or “buckets” [3]. Each bucket corresponds to a cluster of data points, and the indices of the data points within each bucket are stored efficiently. During the search process, queries are mapped to their corresponding buckets, and the search is constrained to the nearest neighbors within these buckets, significantly reducing the search space and accelerating the process.

The standard IVF index has shown remarkable performance gains in nearest neighbor search tasks. However, it faces challenges in scenarios with unevenly distributed data, which lead to imbalanced bucket sizes [4]. These imbalances can result in a suboptimal trade-off between search efficiency and accuracy, as some buckets might be excessively populated while others remain underutilized.
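The bucket-restricted search described above can be sketched in plain Python. This is a minimal illustration, not the Faiss implementation: the centroids are simply sampled from the data (an assumption standing in for trained k-means centroids), and the function names `build_ivf`, `ivf_search` and `brute_force` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(data, centroids):
    """Assign every data index to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, x in enumerate(data):
        nearest = min(range(len(centroids)), key=lambda j: dist(x, centroids[j]))
        buckets[nearest].append(idx)
    return buckets


def ivf_search(query, data, centroids, buckets, nprobe=1):
    """Brute-force search restricted to the nprobe buckets closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: dist(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in buckets[j]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: dist(query, data[i]))


def brute_force(query, data):
    """Exact nearest neighbor: compare the query with every data point."""
    return min(range(len(data)), key=lambda i: dist(query, data[i]))


random.seed(0)
data = [[random.gauss(0, 1) for _ in range(8)] for _ in range(500)]
# Assumption: sampled data points stand in for trained cluster centroids.
centroids = random.sample(data, 10)
buckets = build_ivf(data, centroids)

query = [random.gauss(0, 1) for _ in range(8)]
approx = ivf_search(query, data, centroids, buckets, nprobe=3)
exact = brute_force(query, data)
print(approx, exact, [len(b) for b in buckets])
```

Printing the bucket lengths makes the imbalance issue visible: with arbitrary centroids some buckets hold many more points than others, so the cost of the brute force phase depends heavily on which bucket a query lands in.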
In addition to challenges posed by unevenly distributed data and imbalanced bucket sizes, another significant issue that the standard IVF index may encounter relates to the formation of centroid clusters. The standard approach typically relies on unsupervised clustering techniques to create the centroids, or representatives, for each bucket. This process can lead to suboptimal cluster assignments, especially when the training data for centroid formation is insufficient or poorly representative of the underlying data distribution.

To address this limitation, we propose a novel modification to the IVF index method that leverages supervised learning techniques. Specifically, we train classification neural networks to assign query vectors to their most appropriate bucket, based on the similarity to vectors in the dataset. Moreover, we incorporate an optimization objective to minimize the standard deviation of the bucket sizes, further refining the indexing process. By doing so, we aim to achieve more balanced cluster sizes, effectively mitigating the impact of unevenly distributed data.

PRELIMINARIES

Let us formulate the general ANN problem. Let $X = \{x_i \in \mathbb{R}^d \mid i = \overline{1, N}\}$ be a set of N d-dimensional vectors representing the data points in the dataset. The objective of ANN search is to efficiently find, for a given query vector $q \in \mathbb{R}^d$, an approximate nearest neighbor $x^* \in X$ such that the distance between $q$ and $x^*$ is minimized.

In the Inverted File Index (IVF) approach, we partition the dataset X into K disjoint subsets, or buckets, denoted $B_1, B_2, \ldots, B_K$. Each bucket $B_i$ corresponds to a subset (cluster) of vectors in X and has a centroid $c_i$. The ANN search with the IVF index can be formulated as follows.
Given the metric function dist and a query vector $q \in \mathbb{R}^d$, the goal is to find the bucket $B_{query}$ whose centroid $c_{query}$ minimizes the distance to the query vector:

$$c_{query} = \operatorname*{argmin}_{c_i \in \{c_1, \ldots, c_K\}} \operatorname{dist}(q, c_i).$$

Once the bucket $B_{query}$ is identified, we find $x^*$, the approximate nearest neighbor within that bucket, using brute force search:

$$x^* = \operatorname*{argmin}_{x \in B_{query}} \operatorname{dist}(q, x).$$

Optionally, to improve accuracy, several buckets $B_j$ adjoining $B_{query}$ can also be searched in the last step, depending on the method's hyperparameters.

SEMI-SUPERVISED INVERTED FILE INDEX APPROACH

Let dist be some metric function (Euclidean, Manhattan, etc.). Let $X = \{x_i \in \mathbb{R}^d \mid i = \overline{1, N}\}$ be the vectors representing the data points in the dataset. Let $Q = \{q_i \in \mathbb{R}^d \mid i = \overline{1, M}\}$ be a training set of queries: M d-dimensional vectors with a distribution similar to real-life production queries, $M \le N$. Let $R = \{r_i \in X \mid r_i = \operatorname*{argmin}_{x \in X} \operatorname{dist}(q_i, x),\ i = \overline{1, M}\}$ be the set of ground truth nearest neighbors (responses) from X for each $q_i \in Q$. Let $K \in \mathbb{N}$ be a method hyperparameter, the desired number of buckets $B_1, B_2, \ldots, B_K$, such that $X = \bigcup_{i=1}^{K} B_i$ and $B_i \cap B_j = \emptyset$ if $i \ne j$. Let $NN: \mathbb{R}^d \to \mathbb{R}^K$ be a vector function

$$NN(q_i)_j = P\{q_i \in B_j \mid r_i \in B_j\} \quad \text{for } j = \overline{1, K}, \qquad (1)$$

where $P\{q_i \in B_j \mid r_i \in B_j\}$ is the conditional probability that $q_i \in B_j$ given $r_i \in B_j$. In our case, NN is a multi-layer perceptron [5] with a final softmax layer,

$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = \overline{1, K},$$

that distributes query vectors $q_i$ into buckets $B_1, B_2, \ldots, B_K$. We also want this function to have a specific property: it should assign query vectors $q_i \in Q$ to the same bucket as their corresponding responses $r_i \in R$.
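To make the definitions of R, the softmax layer and NN concrete, here is a small sketch in plain Python. It is illustrative only: the network is an untrained toy perceptron with one tanh hidden layer and random weights, and the helper names (`ground_truth_responses`, `make_layer`, `nn_forward`) are hypothetical, not taken from the paper.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax; subtracting max(z) leaves the result unchanged."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ground_truth_responses(queries, data):
    """R: for each query q_i, the index of its exact nearest neighbor in X."""
    return [min(range(len(data)), key=lambda n: dist(q, data[n])) for q in queries]


def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def nn_forward(x, hidden, output):
    """One tanh hidden layer followed by a softmax over the K buckets."""
    w1, b1 = hidden
    w2, b2 = output
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    return softmax(logits)


rng = random.Random(0)
d, K = 4, 3
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(50)]
Q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]

R = ground_truth_responses(Q, X)  # indices into X, one per query
hidden, output = make_layer(d, 8, rng), make_layer(8, K, rng)
probs = nn_forward(Q[0], hidden, output)  # NN(q_0): one probability per bucket
bucket = max(range(K), key=lambda j: probs[j])
print(R, probs, bucket)
```

Training would adjust the weights so that `nn_forward(Q[i], ...)` and `nn_forward(X[R[i]], ...)` peak at the same bucket; with random weights that agreement is only accidental.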
We can estimate the NN's parameters using the maximum likelihood estimation method [6; 7] if we treat the task as a standard softmax multiclass classification with a cross-entropy loss function:

$$CE(y, \hat{y}) = -\sum_{i=1}^{K} y_i \log(\hat{y}_i).$$

We take Q as the input training set and, at each epoch, compute the current training targets Y as follows:

$$Y = \{\operatorname*{argmax}_{j = \overline{1, K}} (NN(r_i)_j),\ i = \overline{1, M}\},$$

that is, each training query is assigned its ground truth nearest neighbor's bucket as the target bucket. As a result of NN training, we can explicitly distribute input queries by buckets,

$$bucket(q) = \operatorname*{argmax}_{i = \overline{1, K}} (NN(q)_i) \quad \text{for } q \in \mathbb{R}^d,$$

and implicitly obtain the desired buckets $B_1, B_2, \ldots, B_K$:

$$B_j = \{x \in X \mid \operatorname*{argmax}_{i = \overline{1, K}} (NN(x)_i) = j\} \quad \text{for } j = \overline{1, K}. \qquad (2)$$

STANDARD DEVIATION-BASED BUCKET SIZE REGULARIZATION

The vanilla approach proposed in the previous section can produce imbalanced buckets $B_1, B_2, \ldots, B_K$. For example, the NN may assign all query items to a single bucket, in which case the IVF index gives no speed-up. To get the most computational benefit from the IVF index method, we need buckets of nearly equal size, so that the expected time of a brute force search over a random bucket is minimal.

Let $S = \{s_i = |B_i| \mid i = \overline{1, K}\}$ be the set of bucket sizes after we have trained the NN that distributes query vectors by buckets. We can calculate the standard deviation of S:

$$\sigma(S) = \sqrt{\frac{\sum_{i=1}^{K} (s_i - \bar{s})^2}{K - 1}},$$

where $\bar{s}$ is the mean bucket size. If we want buckets of approximately equal sizes, we need to minimize $\sigma(S)$. The problem is that this function is not differentiable with respect to the parameters of the NN model, so we need a differentiable approximation of $\sigma(S)$.
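The per-epoch targets, the cross-entropy term and the hard bucket sizes from equation (2) can be illustrated with a few hand-written probability rows. This is a sketch under stated assumptions: the rows stand in for softmax outputs of a trained network, and the function names are hypothetical.

```python
import math
import statistics


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda j: values[j])


def epoch_targets(response_probs):
    """Y: for each query, the bucket currently predicted for its true neighbor r_i."""
    return [argmax(p) for p in response_probs]


def cross_entropy(query_probs, targets, eps=1e-12):
    """Mean of -log(probability the network gives to the target bucket)."""
    return -sum(math.log(p[t] + eps) for p, t in zip(query_probs, targets)) / len(targets)


def hard_bucket_sizes(data_probs, k):
    """Bucket sizes from equation (2): count the argmax assignments."""
    sizes = [0] * k
    for p in data_probs:
        sizes[argmax(p)] += 1
    return sizes


# Assumed softmax outputs NN(r_i) for three responses and NN(q_i) for their queries.
response_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
query_probs = [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]

Y = epoch_targets(response_probs)
ce = cross_entropy(query_probs, Y)
sizes = hard_bucket_sizes(response_probs, k=3)
sigma = statistics.stdev(sizes)  # sample standard deviation (divides by K - 1)
print(Y, round(ce, 4), sizes, sigma)
```

The hard sizes are integer counts that change only when an argmax flips, which is why their standard deviation cannot be minimized directly by gradient descent.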
Using equations (1), (2), we can calculate the expected size of each bucket as follows:

$$\tilde{s}_j = \sum_{i=1}^{N} NN(x_i)_j, \quad x_i \in X, \quad \text{for } j = \overline{1, K}. \qquad (3)$$

So we have $\tilde{S} = \{\tilde{s}_j \mid j = \overline{1, K}\}$, the set of expected bucket sizes after we have trained the NN that distributes query vectors by buckets, and $\sigma(\tilde{S})$, which is differentiable with respect to the parameters of the NN model.

Finally, we can introduce a combined multiclass cross-entropy loss function with std-based bucket size regularization:

$$L(y, \hat{y}, X) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log(\hat{y}_{ij}) + \lambda \cdot \sigma(\tilde{S}), \qquad (4)$$

where $-\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log(\hat{y}_{ij})$ is the standard cross-entropy component, $\sigma(\tilde{S})$ is the approximated standard deviation of bucket sizes, and $\lambda \in [0, \infty)$ is the regularization scale.

TRAINING ALGORITHM

1. Define K, the desired number of buckets, and M, the desired maximum bucket size.
2. Initialize the weights of the multiclass classification NN [8].
3. On each training epoch:
   1. Calculate the current epoch targets $Y = \{\operatorname*{argmax}_{j = \overline{1, K}} (NN(r_i)_j)\}$.
   2. Calculate the multiclass cross-entropy loss component using $q_i \in Q$ as inputs and $y_i \in Y$ as targets.
   3. Calculate the expected sizes of each cluster using equation (3).
   4. Calculate $\sigma(\tilde{S})$, the std-regularization component.
   5. Calculate the aggregated loss using equation (4).
   6. Do the backpropagation step using a modification of stochastic gradient descent, for example Adam [9], and update the NN's weights.
4. After the training process is complete, select the best checkpoint by the desired performance metric, for example precision, among the checkpoints whose actual maximum bucket size is below M. If no checkpoint has a maximum actual bucket size below the desired one, select the checkpoint whose size is closest to the desired one and display a corresponding warning.
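The forward computation of equations (3) and (4) and the checkpoint rule from step 4 can be sketched in plain Python. This is not the paper's training code: it only evaluates the loss value on assumed softmax outputs (backpropagation and Adam are left to an autodiff framework), and the checkpoint fields `precision` and `max_bucket` are hypothetical names.

```python
import math
import statistics


def expected_bucket_sizes(data_probs):
    """Equation (3): sum the softmax outputs column-wise over all indexed vectors."""
    k = len(data_probs[0])
    return [sum(p[j] for p in data_probs) for j in range(k)]


def combined_loss(query_probs, targets, data_probs, lam, eps=1e-12):
    """Equation (4): cross-entropy plus lam times the std of the expected sizes."""
    ce = -sum(math.log(p[t] + eps) for p, t in zip(query_probs, targets)) / len(targets)
    return ce + lam * statistics.stdev(expected_bucket_sizes(data_probs))


def select_checkpoint(checkpoints, max_size):
    """Step 4: best precision among checkpoints whose largest bucket is below max_size.

    Returns (checkpoint, warn); warn is True when no checkpoint satisfies the limit
    and the one with the largest bucket closest to max_size is returned instead.
    """
    ok = [c for c in checkpoints if c["max_bucket"] < max_size]
    if ok:
        return max(ok, key=lambda c: c["precision"]), False
    return min(checkpoints, key=lambda c: abs(c["max_bucket"] - max_size)), True


# Assumed softmax outputs for three indexed vectors and three training queries.
data_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
query_probs = [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]
targets = [0, 1, 0]

soft_sizes = expected_bucket_sizes(data_probs)
plain = combined_loss(query_probs, targets, data_probs, lam=0.0)
regularized = combined_loss(query_probs, targets, data_probs, lam=1.0)

checkpoints = [
    {"epoch": 1, "precision": 0.05, "max_bucket": 90},
    {"epoch": 2, "precision": 0.08, "max_bucket": 60},
    {"epoch": 3, "precision": 0.09, "max_bucket": 120},
]
best, warn = select_checkpoint(checkpoints, max_size=100)
print(soft_sizes, plain, regularized, best, warn)
```

Unlike the hard counts, the expected sizes are sums of softmax outputs, so the same expression written with tensors remains differentiable with respect to the network weights.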
It could also be useful to apply dynamic scaling of the regularization parameter $\lambda$ to achieve better precision.

EXPERIMENTAL RESULTS

We used 3 different configurations in our experiments:
1. Both indexed and query data have a Normal distribution: $X \sim N(0, 1)$; $Q \sim N(0, 1)$.
2. Both indexed and query data have a skewed Exponential distribution: $X \sim Exponential(1)$; $Q \sim Exponential(1)$.
3. Indexed data has a Normal distribution and query data has an Exponential distribution, a mismatch that can occur in real-life scenarios: $X \sim N(0, 1)$; $Q \sim Exponential(1)$.

In all cases we use 64-dimensional vectors. We also split the query data Q equally into training and testing parts in order to minimize the risk of overfitting and getting incorrect results: we use the training part during optimization of the NN's weights and the test part to calculate the final metrics. We use a three-layer perceptron with tanh activation functions and the Adam [9] optimization algorithm, implemented with the PyTorch framework [10]. We compare our algorithm with the Faiss IVF implementation [11], which is the current industry standard, using the SMAPE and precision metrics:

$$SMAPE(A, F) = \frac{100}{n} \sum_{i=1}^{n} \frac{|F_i - A_i|}{(|A_i| + |F_i|) / 2}; \qquad Precision = \frac{TP}{TP + FP}.$$

In our case, $A_i$ is the distance between the i-th query vector $q_i$ and its actual nearest neighbor from X, and $F_i$ is the distance between the i-th query vector $q_i$ and the approximate nearest neighbor from X suggested by the algorithm. In other words, the SMAPE metric shows how much the distances to the ground truth nearest neighbors and to the approximated neighbors differ on average. For the precision metric, TP is the number of cases where the approximate nearest neighbor equals the actual nearest neighbor and FP is the number of cases where the approximate nearest neighbor differs from the actual nearest neighbor. In other words, this metric shows how often the approximated nearest neighbors exactly coincide with the ground truth ones.
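The two metrics can be computed as follows. This is a minimal sketch that follows the definitions in the text; the input lists are made-up toy values, not results from the paper.

```python
def smape(actual, forecast):
    """SMAPE in percent between true and approximate nearest-neighbor distances."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        # A pair of zero distances contributes no error.
        total += abs(f - a) / denom if denom else 0.0
    return 100.0 * total / len(actual)


def precision(true_ids, approx_ids):
    """Share of queries whose approximate neighbor is the exact nearest neighbor."""
    tp = sum(1 for t, a in zip(true_ids, approx_ids) if t == a)
    fp = len(true_ids) - tp
    return tp / (tp + fp)


# Toy values: distances to the exact (A) and approximate (F) neighbors for 3 queries.
A = [1.0, 2.0, 4.0]
F = [1.0, 3.0, 4.0]
true_ids = [5, 7, 9]
approx_ids = [5, 8, 9]

print(smape(A, F), precision(true_ids, approx_ids))
```

Only the second query misses its true neighbor here, so precision is 2/3 and the SMAPE comes entirely from the single 2.0 versus 3.0 distance pair.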
The final results are presented in Tables 1, 2, 3. The result tables share the following structure:
– X-size: number of vectors in the indexed dataset;
– Q-size: number of vectors in the queries training set;
– K: number of buckets in the algorithm;
– Nprobe: number of adjoining buckets used in the brute force phase in order to achieve better precision;
– IVF Prec. / IVF SMAPE: precision and SMAPE metrics of the Faiss IVF;
– SS-IVF Prec. / SS-IVF SMAPE: precision and SMAPE metrics of the novel semi-supervised IVF proposed in the paper.

Table 1. Results for X ~ N(0, 1); Q ~ N(0, 1)
X-size  Q-size  K     Nprobe  IVF Prec.  IVF SMAPE  SS-IVF Prec.  SS-IVF SMAPE
10K     10K     200   1       0.055      8.7%       0.083         7.7%
10K     10K     200   5       0.200      4.37%      0.255         3.69%
10K     10K     200   20      0.480      1.81%      0.524         1.56%
1M      10K     2000  1       0.063      7.41%      0.071         7.1%
1M      10K     2000  5       0.200      3.8%       0.220         3.72%
1M      10K     2000  20      0.435      1.79%      0.491         1.65%

Table 2. Results for X ~ Exponential(1); Q ~ Exponential(1)
X-size  Q-size  K     Nprobe  IVF Prec.  IVF SMAPE  SS-IVF Prec.  SS-IVF SMAPE
10K     10K     200   1       0.057      8.68%      0.066         8.53%
10K     10K     200   5       0.197      4.40%      0.207         4.33%
10K     10K     200   20      0.473      1.87%      0.460         1.95%
1M      10K     2000  1       0.061      8.16%      0.069         7.99%
1M      10K     2000  5       0.218      4.32%      0.217         4.34%
1M      10K     2000  20      0.490      1.77%      0.498         1.77%

Table 3. Results for X ~ N(0, 1); Q ~ Exponential(1)
X-size  Q-size  K     Nprobe  IVF Prec.  IVF SMAPE  SS-IVF Prec.  SS-IVF SMAPE
10K     10K     200   1       0.025      14.76%     0.137         3.87%
10K     10K     200   5       0.107      6.14%      0.403         1.46%
10K     10K     200   20      0.305      2.49%      0.756         0.41%
1M      10K     2000  1       0.035      11.68%     0.141         3.65%
1M      10K     2000  5       0.130      4.97%      0.419         1.28%
1M      10K     2000  20      0.341      2.44%      0.766         0.40%

CONCLUSION

The experimental results of our novel semi-supervised modification to the Inverted File (IVF) index approach for approximate nearest neighbor search look very promising: the SS-IVF approach outperforms the industry-standard implementation on raw precision and SMAPE in most of the tested configurations, and the gain is largest in scenarios where the query distribution differs significantly from the indexed dataset (Table 3). However, the SS-IVF algorithm is still quite far from a production solution, since we have not yet written an efficient C/C++ implementation that would use parallelization and low-level optimizations.

REFERENCES

1. P. Indyk and R. Motwani, “Approximate nearest neighbors: towards removing the curse of dimensionality,” in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), 1998.
2. H. Jégou, M. Douze, and C. Schmid, “Product Quantization for Nearest Neighbor Search,” IEEE Xplore. [Online]. Available: https://ieeexplore.ieee.org/document/5432202
3. G. Voronoi, “Une méthode géométrique pour la détermination des régions de visibilité dans le voisinage d’un point de l’espace (A geometric method for determining regions of visibility in the vicinity of a point in space),” Journal de Mathématiques Pures et Appliquées (Journal of Pure and Applied Mathematics), 1908.
4. J. Johnson, M. Douze, and H. Jégou, “Optimizing Product Quantization for Nearest Neighbor Search,” IEEE Xplore. [Online]. Available: https://ieeexplore.ieee.org/document/6619223
5. D. E. Rumelhart and J. L. McClelland, “Learning Internal Representations by Error Propagation,” IEEE Xplore. [Online]. Available: https://ieeexplore.ieee.org/document/6302929
6. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. 2016. [Online]. Available: https://www.deeplearningbook.org/
7. X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks. 2010. [Online]. Available: https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
8. D.P. Kingma, Adam: A Method for Stochastic Optimization. 2014. [Online]. Available: https://arxiv.org/abs/1412.6980
9. PyTorch. [Online]. Available: https://pytorch.org/
10. faiss::IndexIVF Class Reference. [Online]. Available: https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVF.html

Received 05.09.2023

INFORMATION ON THE ARTICLE

Anton A. Bazdyrev, ORCID: 0000-0001-8191-897X, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: bazdyrev.anton@gmail.com

SEMI-SUPERVISED LEARNING APPROACH IN THE INVERTED FILE INDEX FOR APPROXIMATE NEAREST NEIGHBOR SEARCH / A.A. Bazdyrev

Abstract. An improvement to the inverted file index approach for approximate nearest neighbor search is proposed, using semi-supervised and supervised learning to increase the effectiveness of intermediate clustering and achieve more balanced cluster sizes. The proposed method creates clusters with a neural network by solving the task of classifying query vectors into the same cluster as their corresponding nearest neighbor vectors in the original dataset. Combined with minimizing the standard deviation of cluster sizes, the indexing process becomes more efficient and accurate during approximate nearest neighbor search. An empirical evaluation on a test dataset demonstrates that the proposed index approach is more accurate than the industry-standard implementation with fixed parameters, including the total number of clusters and the number of clusters allocated to queries. The method is promising for improving nearest neighbor search efficiency in high-dimensional datasets across various applications, such as information retrieval, natural language search, and recommendation systems.

Keywords: approximate nearest neighbor search, inverted file index, high-dimensional data, machine learning.
id journaliasakpiua-article-297400
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-07-17T10:28:25Z
publishDate 2023
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/bf/3cb2d54dcf468bd50926ffd14a14ddbf.pdf
spelling journaliasakpiua-article-2974002024-02-01T21:03:07Z Semi-supervised inverted file index approach for approximate nearest neighbor search Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда Bazdyrev, Anton approximate nearest neighbor search inverted file index high-dimensional data machine learning пошук наближених найближчих сусідів інвертований файловий індекс дані високої розмірності машинне навчання This paper introduces a novel modification to the Inverted File (IVF) index approach for approximate nearest neighbor search, incorporating supervised learning techniques to enhance the efficacy of intermediate clustering and achieve more balanced cluster sizes. The proposed method involves creating clusters using a neural network by solving a task to classify query vectors into the same bucket as their corresponding nearest neighbor vectors in the original dataset. When combined with minimizing the standard deviation of the bucket sizes, the indexing process becomes more efficient and accurate during the approximate nearest neighbor search. Through empirical evaluation on a test dataset, we demonstrate that the proposed semi-supervised IVF index approach outperforms the industry-standard IVF implementation with fixed parameters, including the total number of clusters and the number of clusters allocated to queries. This novel approach has promising implications for enhancing nearest-neighbor search efficiency in high-dimensional datasets across various applications, including information retrieval, natural language search, recommendation systems, etc. Запропоновано удосконалення підходу з використанням інвертованого файлового індексу для пошуку наближених найближчих сусідів з використанням напівкерованого навчання та навчання з учителем з метою підвищення ефективності проміжної кластеризації та досягнення більш збалансованих розмірів кластерів. 
Запропонований метод полягає у створенні кластерів за допомогою нейронної мережі з розв’язанням завдання класифікації векторів запитів у той самий кластер, що і їхні відповідні найближчі сусідні вектори у вихідному наборі даних. У поєднанні з мінімізацією стандартного відхилення розмірів кластерів процес індексування стає більш ефективним і точним під час наближеного пошуку найближчих сусідів. Через емпіричну оцінку на тестовому наборі даних продемонстровано, що запропонований підхід до індексу виявився більш точним порівняно з індустрійно-стандартною реалізацією із фіксованими параметрами, включаючи загальну кількість кластерів та кількість кластерів, що виділяються для запитів. Метод перспективний для підвищення ефективності пошуку найближчих сусідів у великорозмірних наборах даних у різних застосуваннях, таких як інформаційний пошук, пошук за природною мовою, рекомендаційні системи тощо. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023-12-26 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/297400 10.20535/SRIT.2308-8893.2023.4.05 System research and information technologies; No. 4 (2023); 69-75 Системные исследования и информационные технологии; № 4 (2023); 69-75 Системні дослідження та інформаційні технології; № 4 (2023); 69-75 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/297400/290386
spellingShingle пошук наближених найближчих сусідів
інвертований файловий індекс
дані високої розмірності
машинне навчання
Bazdyrev, Anton
Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title_alt Semi-supervised inverted file index approach for approximate nearest neighbor search
title_full Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title_fullStr Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title_full_unstemmed Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title_short Підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
title_sort підхід з напівкерованим навчанням в інвертованому файловому індексі для пошуку наближеного найближчого сусіда
topic пошук наближених найближчих сусідів
інвертований файловий індекс
дані високої розмірності
машинне навчання
topic_facet approximate nearest neighbor search
inverted file index
high-dimensional data
machine learning
пошук наближених найближчих сусідів
інвертований файловий індекс
дані високої розмірності
машинне навчання
url https://journal.iasa.kpi.ua/article/view/297400
work_keys_str_mv AT bazdyrevanton semisupervisedinvertedfileindexapproachforapproximatenearestneighborsearch
AT bazdyrevanton pídhídznapívkerovanimnavčannâmvínvertovanomufajlovomuíndeksídlâpošukunabliženogonajbližčogosusída