Fast filtered vector search (without a database engine)

Many AI/ML systems today, such as recommenders, image search, or text search, rely on finding the most similar vectors to a given query vector. Exact brute force search is linear in the number of vectors searched, which makes it too slow for systems with millions of vectors when single-digit-millisecond retrieval on cheap hardware is desired. Many open source libraries exist for approximate solutions (Approximate Nearest Neighbours, ANN). A good overview of existing ANN library implementations and their performance-accuracy tradeoffs is ann-benchmarks.com.
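
For reference, exact brute force search is nothing more than a linear scan over all vectors. A minimal sketch in NumPy (the function name, dataset size, and dimensionality below are made up for illustration) could look like this:

import numpy as np

def brute_force_knn(query, vectors, k):
    # Squared L2 distance from the query to every vector: one full pass over the data
    distances = np.sum((vectors - query) ** 2, axis=1)
    nearest = np.argsort(distances)[:k]
    return nearest, distances[nearest]

# Illustrative data: 300k random 64-dimensional vectors
vectors = np.float32(np.random.random((300_000, 64)))
query = np.float32(np.random.random(64))
labels, dists = brute_force_knn(query, vectors, k=10)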

The top performing ANN libraries often rely on Hierarchical Navigable Small World graphs (HNSW), and it is easy to achieve good retrieval time at acceptable recall as long as the search spans the entire vector set. However, real world applications often require searching only a fraction of the vectors, e.g. a fashion store might want to recommend the best matches from the summer collection to a website visitor. Once you try to search in only a fraction of the vector set, ANN retrieval time increases. This observation is counter-intuitive and may come as a surprise. The plot below shows an example on the last.fm dataset with approximately 300k vectors. Brute force search on the full vector set takes around 100 milliseconds, whereas ANN yields comparable recall in just a few milliseconds. At the other extreme, when we search in only a small subset of the full vector set (e.g. a small music genre), brute force is fast and HNSW based ANN becomes very slow (>300 ms).

Thus, below some cutoff where the subset becomes very small, one must switch from ANN to brute force search. But what if your search subspace is larger, e.g. 50% of the vectors? In that case brute force becomes slow and ANN has an edge.
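
Switching to brute force for a small subset simply means running the same linear scan, but only over the allowed ids. A minimal sketch (the function name, dataset size, and subset are made up for illustration):

import numpy as np

def brute_force_subset_knn(query, vectors, allowed_ids, k):
    # Scan only the allowed vectors; the cost is linear in the subset size, not the full set
    allowed = np.fromiter(allowed_ids, dtype=np.int64)
    distances = np.sum((vectors[allowed] - query) ** 2, axis=1)
    order = np.argsort(distances)[:k]
    # Map positions within the subset back to the original vector ids
    return allowed[order], distances[order]

# Illustrative data: 300k vectors of which 1% (a "small genre") are searchable
vectors = np.float32(np.random.random((300_000, 64)))
query = np.float32(np.random.random(64))
labels, dists = brute_force_subset_knn(query, vectors, range(0, 300_000, 100), k=10)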

If the ANN library's API only offers searching across all vectors, one could try to search the full set and then filter down the search results. This so-called post-filtering will not yield good latency when the vectors of the subset lie far away from the search vector, because a potentially very large number of vectors must be processed before reaching the desired subspace.
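
As a sketch, post-filtering with a plain (unfiltered) HNSW index amounts to over-fetching and discarding: ask the index for many more neighbours than needed and keep only those in the subset. The index parameters and the over-fetch factor below are made up for illustration, and in the worst case no over-fetch factor is large enough:

import numpy as np
import hnswlib

dim = 16
num_elements = 10000
data = np.float32(np.random.random((num_elements, dim)))

# Plain HNSW index without any filter support (parameters are illustrative, not tuned)
hnsw_index = hnswlib.Index(space='l2', dim=dim)
hnsw_index.init_index(max_elements=num_elements, ef_construction=100, M=16)
hnsw_index.add_items(data)
hnsw_index.set_ef(100)

def post_filtered_knn(query, allowed_ids, k, overfetch=10):
    # Ask the full index for many more neighbours than needed ...
    labels, distances = hnsw_index.knn_query(query, k=k * overfetch)
    # ... then discard everything outside the allowed subset
    keep = [i for i, label in enumerate(labels[0]) if label in allowed_ids]
    return labels[0][keep][:k], distances[0][keep][:k]

allowed = set(range(0, num_elements, 2))  # e.g. only even ids
query = np.float32(np.random.random(dim))
labels, distances = post_filtered_knn(query, allowed, k=10)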

At this point, one might consider moving from an embedded ANN library to one of the many search engines that support filtered ANN search today. For large scale production applications this can be the best option, but it has drawbacks:

  • Using a database engine capable of vector search increases system complexity and thus development cost;
  • Increased operating cost for the engine;
  • Increased latencies (by a few milliseconds) for the communication with the engine as opposed to an in-memory library embedded in the application.

Until recently there was no option for using an ANN library and doing fast filtered vector search. Since its recent 0.7 release, hnswlib offers this capability. This opens up the possibility of building extremely fast, low cost systems for filtered ANN search. The graphs above were produced with hnswlib and you can find the code here.

Let us conclude this article with a Python code example of how to do filtered vector search with hnswlib. The example uses random data for simplicity, but please note that the behaviour of HNSW indices and filtering on random data is not representative of real world data. Also, a fallback to brute force search is highly recommended for real world applications; a minimal sketch of such a fallback follows the example. While the code might look similar, for real world applications the HNSW parameters (space, M, ef_construction, ef) and the cutoff for switching to brute force search must be tuned.

import numpy as np

import hnswlib

dim = 16
num_elements = 10000

# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))

# Declaring index
hnsw_index = hnswlib.Index(space='l2', dim=dim)

# Initializing the index
hnsw_index.init_index(max_elements=num_elements, ef_construction=100, M=16)

# Controlling the recall by setting ef:
# higher ef leads to better accuracy, but slower search
hnsw_index.set_ef(10)

hnsw_index.set_num_threads(4)  # by default using all available cores

# Add data to index
hnsw_index.add_items(data)

# Query all elements, but only accept neighbours with even ids via the filter:
filter_function = lambda label: label % 2 == 0
# Warning: searching with a filter is slow in Python in multithreaded mode, therefore we set num_threads=1
labels, distances = hnsw_index.knn_query(data, k=1, num_threads=1, filter=filter_function)
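
Finally, here is a minimal sketch of the brute force fallback mentioned above, continuing the example (data, num_elements, and hnsw_index are reused from the code above). The 10% cutoff and the helper function filtered_knn are made-up placeholders and must be tuned and adapted to real data and hardware:

# Fallback sketch, continuing the example above.
# The 10% cutoff is a made-up placeholder; it must be tuned on real data and hardware.
BRUTE_FORCE_CUTOFF = 0.1

def filtered_knn(query, allowed_ids, k):
    allowed = np.fromiter(allowed_ids, dtype=np.int64)
    if len(allowed) / num_elements < BRUTE_FORCE_CUTOFF:
        # Very small subset: exact scan over the allowed vectors only
        distances = np.sum((data[allowed] - query) ** 2, axis=1)
        order = np.argsort(distances)[:k]
        return allowed[order], distances[order]
    # Otherwise: filtered ANN search as above (single thread, see the warning above)
    allowed_set = set(allowed_ids)
    labels, distances = hnsw_index.knn_query(
        query, k=k, num_threads=1, filter=lambda label: label in allowed_set)
    return labels[0], distances[0]

labels, distances = filtered_knn(data[0], range(0, num_elements, 2), k=1)

With the even-id subset above, half of the vectors are allowed, so the call takes the filtered ANN path; for a very small subset the same call would fall back to the exact scan.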