How Filtering Transforms Vector Search


Sep 16, 2024


When you work with vector search, you can’t directly use something like a SQL “WHERE” clause to find the data you want. Semantic search finds “things” that are likely to be similar to your target, but it cannot guarantee exact matches the way a SQL query can. This is where “filtering” comes into play.

Filtering is a crucial feature for building a user-friendly search experience, because it can dramatically improve the precision and relevance of your search results. In vector search, filtering helps ensure that your customers find exactly what they're looking for, whether it's a specific product, a service, or a piece of information.

Understanding Filtering in Vector Search

Imagine you're running an e-commerce website that sells a wide range of kids' toys, including LEGO sets, outdoor toys, stuffed animals, and so on. A customer might be looking for the perfect gift for an eight-year-old boy and search for “toy cars for 8 year old boy”. Vector search alone might not surface exactly the results they need, and this is a use case where filtering can help.

Filtering lets you apply additional constraints to your vector search, so that the results are not only semantically similar to the query but also satisfy specific criteria, such as price, category, or other metadata. By combining the power of vector search with the precision of filtering, you can build a search experience that truly caters to your customers' needs. In this example, we might filter on the toys' “age” metadata.
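To make the idea concrete, here is a minimal Python sketch of similarity search with an optional metadata filter. The catalog, its three-dimensional “embeddings”, the `min_age`/`max_age` fields, and the query vector are all invented for illustration; a real system would get embeddings from a model and use a vector database instead of a brute-force scan.

```python
import math

# Hypothetical toy catalog: the 3-D "embeddings" and age ranges are made up
# for illustration; a real system would compute embeddings with a model.
CATALOG = [
    {"id": "race-car-set",     "vec": [0.9, 0.1, 0.0],    "min_age": 6, "max_age": 12},
    {"id": "baby-push-car",    "vec": [0.8, 0.2, 0.1],    "min_age": 1, "max_age": 3},
    {"id": "rc-monster-truck", "vec": [0.85, 0.15, 0.05], "min_age": 8, "max_age": 14},
    {"id": "teddy-bear",       "vec": [0.1, 0.9, 0.2],    "min_age": 0, "max_age": 10},
    {"id": "lego-city-set",    "vec": [0.5, 0.3, 0.8],    "min_age": 5, "max_age": 12},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query, k, predicate=None):
    """Rank items by cosine similarity, keeping only items that pass `predicate`."""
    candidates = [item for item in CATALOG if predicate is None or predicate(item)]
    ranked = sorted(candidates, key=lambda item: cosine(query, item["vec"]), reverse=True)
    return [item["id"] for item in ranked[:k]]


def suits_age_8(item):
    """Metadata filter: the toy's age range includes 8."""
    return item["min_age"] <= 8 <= item["max_age"]


# Stand-in embedding for the query "toy cars for 8 year old boy".
query = [0.9, 0.1, 0.05]

print(search(query, k=3))                         # ['race-car-set', 'rc-monster-truck', 'baby-push-car']
print(search(query, k=3, predicate=suits_age_8))  # ['race-car-set', 'rc-monster-truck', 'lego-city-set']
```

Without the filter, the toddler push car ranks third because its embedding is close to “toy cars”; the age filter removes it and lets the next-best age-appropriate item in.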

Pre-Filtering and Post-Filtering

Traditionally, vector search offers two approaches: pre-filtering and post-filtering. The names refer to the order in which filtering and vector search are performed.

Pre-Filtering: In pre-filtering, the search engine first narrows down the dataset based on specific metadata values, and then performs the vector search within that filtered subset. This reduces unnecessary computation and is particularly effective when the filter leaves only a small subset of the data.

Post-Filtering: In post-filtering, the search engine first performs the vector search to find the most similar results, and then applies the filters to those results. This approach is problematic with low-cardinality filters (filters that only a small fraction of points satisfy), because most of the initial vector search results may be discarded, leaving fewer results than requested.
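The difference between the two orderings can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions: the catalog, its 2-D embeddings, the single `age` value per toy, and the function names `pre_filter_search` and `post_filter_search` are hypothetical, and real engines use approximate indexes rather than a full sort.

```python
import math

# Hypothetical catalog of (id, 2-D embedding, recommended age).
# The embeddings are invented; a real system would compute them with a model.
CATALOG = [
    ("push-car-1", (1.00, 0.00), 2),
    ("push-car-2", (0.98, 0.05), 2),
    ("toy-car",    (0.97, 0.10), 8),
    ("push-car-3", (0.95, 0.12), 3),
    ("rc-car",     (0.90, 0.20), 8),
    ("race-track", (0.85, 0.30), 8),
    ("teddy-bear", (0.10, 0.95), 5),
]


def nearest(items, query, k):
    """Exact k-nearest-neighbour search by Euclidean distance (brute force)."""
    return sorted(items, key=lambda item: math.dist(query, item[1]))[:k]


def pre_filter_search(query, k, predicate):
    # 1) Narrow the dataset with the metadata filter, 2) search only that subset.
    subset = [item for item in CATALOG if predicate(item[2])]
    return [item[0] for item in nearest(subset, query, k)]


def post_filter_search(query, k, predicate):
    # 1) Search the full dataset for the top-k, 2) drop hits that fail the filter.
    hits = nearest(CATALOG, query, k)
    return [item[0] for item in hits if predicate(item[2])]


def is_age_8(age):
    return age == 8


query = (1.0, 0.0)  # stand-in embedding for "toy cars for 8 year old boy"

print(pre_filter_search(query, 3, is_age_8))   # ['toy-car', 'rc-car', 'race-track']
print(post_filter_search(query, 3, is_age_8))  # ['toy-car']
```

Post-filtering returns only one of the three requested results, because two of the three nearest neighbours are toddler toys that the filter then discards. A common mitigation is to over-fetch candidates before filtering, at the cost of extra search work.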

Filterable Vector Index

Qdrant, an open-source vector search engine, takes a different approach that addresses the limitations of traditional pre-filtering and post-filtering: the filterable vector index. It adds extra links to the index graph between data points that share the same indexed metadata (payload) values. As a result, the graph stays connected even when a filter excludes many points, and the vector search can still traverse it to find the most relevant matching results.
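The following toy Python sketch illustrates why those extra links matter. It is not Qdrant's implementation, which builds on a multi-layer HNSW graph; it uses a single-layer k-nearest-neighbour graph over hypothetical 1-D points, and every name in it (`knn_edges`, `merge`, `filtered_search`, the “A”/“B” categories) is made up for illustration. The search only steps onto points that pass the filter, which is exactly where a plain proximity graph can break apart.

```python
import heapq

# Hypothetical toy data: 12 points on a line (1-D "embeddings"), each with a
# category payload. Every third point is category "A", the rest are "B".
POINTS = {i: (float(i), "A" if i % 3 == 0 else "B") for i in range(12)}


def knn_edges(ids, k):
    """Undirected graph linking each id to its k nearest neighbours among `ids`."""
    edges = {i: set() for i in ids}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: abs(POINTS[i][0] - POINTS[j][0]))
        for j in others[:k]:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def merge(*graphs):
    """Union of the edge sets of several graphs over the same points."""
    merged = {i: set() for i in POINTS}
    for graph in graphs:
        for i, neighbours in graph.items():
            merged[i] |= neighbours
    return merged


def filtered_search(graph, entry, query, k, allowed):
    """Best-first traversal that only steps onto points passing the filter.

    For clarity it explores the whole reachable part of the filtered graph
    (no beam width), so the result depends only on graph connectivity.
    """
    def dist(i):
        return abs(POINTS[i][0] - query)

    visited = {entry}
    frontier = [(dist(entry), entry)]
    found = []
    while frontier:
        d, node = heapq.heappop(frontier)
        found.append((d, node))
        for nb in graph[node]:
            if nb not in visited and allowed(nb):
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    return [node for _, node in sorted(found)[:k]]


def is_a(i):
    """Metadata filter: keep only category "A" points."""
    return POINTS[i][1] == "A"


base = knn_edges(list(POINTS), k=2)                     # plain proximity graph
extra = knn_edges([i for i in POINTS if is_a(i)], k=2)  # links among "A" points
filterable = merge(base, extra)

print(filtered_search(base, entry=0, query=8.5, k=2, allowed=is_a))        # [0]
print(filtered_search(filterable, entry=0, query=8.5, k=2, allowed=is_a))  # [9, 6]
```

With only the proximity links, every neighbour of the entry point is filtered out, so the traversal never reaches another “A” point. With the extra per-category links merged in, it reaches 9 and 6, the “A” points closest to the query.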

This approach is particularly beneficial when


Subscribe to The MLnotes Newsletter to keep reading this post and get 7 days of free access to the full post archives.