Vespa

Free tier

AI Search Platform for large-scale vector search, ranking, and real-time inference

Free tier available · Technical · API available · Open source

Key strengths

  • Hybrid vector + text + structured search in a single platform
  • Native tensor support for complex ML-driven ranking
  • Real-time inference at sub-100ms latency at billions-of-document scale
  • Streaming search mode for personal/private data (20x cheaper than indexing)
  • Fully managed cloud offering (Vespa Cloud) plus open-source self-hosting
Free tier + paid plans
Oslo, Norway
Founded 2017
Self-hostable
  • Hybrid ANN + BM25 retrieval — Combine dense vector ANN search with sparse BM25 keyword matching in a single YQL query for state-of-the-art retrieval quality.
  • In-engine ML model inference — Deploy ONNX/TensorFlow/PyTorch models directly into Vespa's ranking pipeline to score documents without external model serving hops.
  • Multi-vector & ColBERT-style retrieval — Use Vespa's tensor formalism to implement late interaction models (e.g., ColBERT) and multi-vector representations natively.
  • Real-time data ingestion at scale — Feed billions of documents with continuous updates using Vespa's document API while maintaining live query serving with no downtime.
  • Streaming search for partitioned data — Implement per-user search over private datasets using Vespa's streaming mode, avoiding global ANN index construction costs.
  • Multi-phase ranking pipelines — Configure cheap first-phase retrieval followed by expensive second-phase re-ranking with full ML models, all within the Vespa serving layer.
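
As a rough illustration of the hybrid retrieval item above, the following Python sketch builds a JSON body for Vespa's query API (`POST /search/`) that combines a text match (`userQuery()`) with an approximate nearest-neighbor operator in one YQL statement. The field name `embedding`, the query tensor name `q`, and the rank profile name `hybrid` are assumptions made for this example; they would have to match whatever the application's schema actually defines, and the BM25 and vector-closeness scoring itself would live in that rank profile rather than in the query.

```python
import json


def build_hybrid_query(text, query_vector, target_hits=100, hits=10):
    """Build a request body for Vespa's query API (POST /search/).

    Assumed for illustration (not taken from any real schema):
      - a tensor field named `embedding` with an ANN index
      - a query tensor named `q` declared in the rank profile inputs
      - a rank profile named `hybrid` that combines text and vector scores
    """
    # userQuery() matches the free-text terms passed in "query";
    # nearestNeighbor retrieves the target_hits closest vectors.
    yql = (
        "select * from sources * where userQuery() or "
        f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))"
    )
    return {
        "yql": yql,
        "query": text,                           # text consumed by userQuery()
        "ranking": "hybrid",                     # hypothetical rank profile name
        "input.query(q)": list(query_vector),    # dense query embedding
        "hits": hits,
    }


if __name__ == "__main__":
    # Toy 3-dimensional vector; a real one would come from an embedding model.
    body = build_hybrid_query("vector search platform", [0.1, 0.2, 0.3])
    print(json.dumps(body, indent=2))
```

The body can be sent with any HTTP client to the application's `/search/` endpoint. Using `or` between the two operators retrieves documents that match either the keywords or the vector neighborhood, leaving the final ordering to the rank profile.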