Open Source · Importance: Medium

Large vector search can be costly to run yourself

r/Rag · Jun 12, 2026 · 22h ago

Vector Space Day, hosted by Qdrant, showed a shift from “vector database” toward “search engine.” Storing vectors is becoming a basic feature, while real differences now come from what happens at search time, such as hybrid retrieval, score control, and configurable execution. HubSpot stores more than 20 billion vectors on self-hosted Qdrant and built an internal “Vectors as a Service” platform with Kafka indexers in front of its clusters.
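Hybrid retrieval, named above as one of the search-time differentiators, can be illustrated with a small sketch. It merges a keyword ranking and a vector ranking using reciprocal rank fusion, one common fusion method. The summary does not say which fusion method was discussed at the event, so the choice of method, the function name, the document IDs, and the constant k=60 are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents found by both searches (doc_a and doc_c) outrank documents found by only one, which is the basic appeal of combining the two signals.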

At that scale, Helm was not enough, so HubSpot built its own Kubernetes operator to watch cluster state, call APIs, react to metrics, and rebalance shards. The operator checks the system every 60 seconds.
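The watch-and-react pattern described above can be sketched as a reconcile loop. This is a toy illustration, not HubSpot's operator: the function names, the node names, the shard counts, and the balancing rule are hypothetical, and a real operator would read state from the Kubernetes and Qdrant APIs rather than from a dictionary. Only the 60-second interval comes from the summary.

```python
import time

RECONCILE_INTERVAL_SECONDS = 60  # interval reported in the summary


def plan_moves(shards_per_node, tolerance=1):
    """Return (source, target) shard moves that even out node load.

    Hypothetical rule: keep moving one shard from the busiest node to
    the idlest node until their counts differ by at most `tolerance`.
    """
    tolerance = max(1, tolerance)  # a tolerance of 0 could loop forever
    counts = dict(shards_per_node)  # work on a copy of the observed state
    moves = []
    while counts:
        busiest = max(counts, key=lambda node: (counts[node], node))
        idlest = min(counts, key=lambda node: (counts[node], node))
        if counts[busiest] - counts[idlest] <= tolerance:
            break
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))
    return moves


def reconcile(read_state, apply_move, cycles=None, sleep=time.sleep):
    """Observe state, act on the difference, wait, and repeat."""
    done = 0
    while cycles is None or done < cycles:
        for source, target in plan_moves(read_state()):
            apply_move(source, target)
        done += 1
        sleep(RECONCILE_INTERVAL_SECONDS)


# Planning only, with made-up shard counts (no waiting involved).
print(plan_moves({"node-a": 5, "node-b": 1, "node-c": 3}))
# [('node-a', 'node-b'), ('node-a', 'node-b')]
```

The sketch shows why a deployment tool alone falls short here: the decision depends on live cluster state that has to be observed again on every cycle.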

Quantization results differed by embedding model, meaning some models kept search quality better than others when vectors were compressed. Salesforce was also noted as still using Solr for search.
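One way to check how much a given embedding model loses under compression is to compare search results before and after quantizing its vectors. The sketch below is a minimal illustration under stated assumptions: it uses simple per-vector int8 scalar quantization, random vectors instead of real embeddings, and dot-product scoring. It is not Qdrant's implementation, and all function names are hypothetical.

```python
import random


def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    peak = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = peak / 127.0
    return [round(x / scale) for x in vector], scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [code * scale for code in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k):
    """Indices of the k vectors with the highest dot product to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: -dot(query, vectors[i]))
    return ranked[:k]


def recall_at_k(query, vectors, k=10):
    """Share of the exact top-k that survives after quantization."""
    exact = set(top_k(query, vectors, k))
    compressed = [dequantize(*quantize_int8(v)) for v in vectors]
    approx = set(top_k(query, compressed, k))
    return len(exact & approx) / k


# Random stand-ins for embeddings; real tests would use model output.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(64)]

recall = recall_at_k(query, vectors)
print(f"recall@10 after int8 quantization: {recall:.2f}")
```

Running the same measurement on vectors from each candidate embedding model, with real queries, would show which models keep their ranking quality when compressed.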

Key points

The market is moving from plain vector storage toward richer search behavior.
HubSpot runs more than 20 billion vectors on self-hosted Qdrant.
Large self-hosted systems may need Kafka indexers and a Kubernetes operator, not just basic deployment tools.
Helm could not handle state-aware automation like reacting to metrics or rebalancing shards.
Quantization can save space, but each embedding model may lose search quality differently.

Quick term guide

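vector database: A data store that holds embeddings (lists of numbers that represent text or other content) so that items with similar meaning can be found quickly. Commonly used for AI memory.
search engine: Here, software that indexes data and retrieves and ranks results for a query (Solr is one example), not a web search site like Google or Bing.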
hybrid retrieval: A search method that combines keyword matching with meaning-based search.
self-hosted: Run on your own server instead of managed by another company.
Kubernetes operator: Software that watches a cluster and automatically handles specific operations.
quantization: Here, a way to compress stored vectors by reducing the precision of their numbers, trading some search quality for lower memory and storage use.
embedding model: An AI model that turns text into an embedding, a list of numbers that captures its meaning.
vector search: A search method that finds text with similar meaning, not only the same words.
