Vespa
AI Search Platform for large-scale vector search, ranking, and real-time inference
Free tier available · Technical · API available · Open source
Key strengths
- Hybrid vector + text + structured search in a single platform
- Native tensor support for complex ML-driven ranking
- Real-time inference at sub-100ms latency at billions-of-document scale
- Streaming search mode for personal/private data (20x cheaper than indexing)
- Fully managed cloud offering (Vespa Cloud) plus open-source self-hosting
Free tier + paid plans
Oslo, Norway
Founded 2017
Self-hostable
Developer Documentation
Vespa's developer documentation is hosted at docs.vespa.ai and covers everything from schema design and query language to tensor expressions and ML model deployment.
Key technical starting points:
- Schema & Document Model — Define document types with fields for text, structured data, and tensors; schemas drive both indexing and ranking configuration.
- Query Language (YQL) — Vespa uses a SQL-like query language (YQL) for combining full-text search, ANN vector search, and structured filters in a single query.
- Tensor Formalism — Vespa's first-class tensor type enables inline evaluation of neural ranking models (ONNX, TensorFlow, PyTorch) at query time.
- Ranking Profiles — Define multi-phase ranking pipelines using built-in rank features (BM25, nativeRank) and custom ML models without external round-trips.
- Streaming Search Mode — A special streaming retrieval mode designed for personal/private data access patterns, reducing vector index costs by up to 20x.
- Vespa CLI & REST API — Deploy and manage applications via the Vespa CLI, or use the document and query REST APIs for data ingestion and retrieval.
- Vespa Cloud & AWS Integration — Deploy to Vespa's managed cloud or configure self-hosted clusters on AWS; automated scaling and zero-downtime upgrades are supported in both modes.
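
To make the query and document APIs above more concrete, here is a minimal Python sketch that builds a document feed request and a hybrid query combining text search, approximate nearest-neighbor search, and a structured filter. It assumes a self-hosted endpoint at `http://localhost:8080` and an application with a schema named `doc`, a tensor field `embedding`, a string field `category`, and a rank profile `hybrid`. All of these names are hypothetical placeholders, not taken from any real application, and the request shapes should be checked against docs.vespa.ai before use.

```python
import json
from urllib import request

# Hypothetical values for illustration only; replace with your own application's names.
ENDPOINT = "http://localhost:8080"
NAMESPACE = "mynamespace"
DOC_TYPE = "doc"
EMBEDDING_FIELD = "embedding"
RANK_PROFILE = "hybrid"


def build_feed_request(doc_id, fields):
    """Return (url, body) for writing one document through the /document/v1 API."""
    url = f"{ENDPOINT}/document/v1/{NAMESPACE}/{DOC_TYPE}/docid/{doc_id}"
    return url, {"fields": fields}


def build_hybrid_query(text, vector, category=None, hits=10):
    """Return a query body mixing text match, ANN search, and an optional filter."""
    # userQuery() matches the free-text "query" parameter; nearestNeighbor
    # searches the tensor field using the query tensor named q.
    yql = (
        f"select * from {DOC_TYPE} where "
        f"(userQuery() or ({{targetHits:{hits}}}nearestNeighbor({EMBEDDING_FIELD}, q)))"
    )
    if category is not None:
        # Assumes a hypothetical string field "category"; the value is not
        # escaped here, so do not pass untrusted input in real code.
        yql += f' and category contains "{category}"'
    return {
        "yql": yql,
        "query": text,
        "input.query(q)": json.dumps(vector),  # tensor passed as a literal string
        "ranking.profile": RANK_PROFILE,
        "hits": hits,
    }


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a running instance, `post_json(*build_feed_request("1", {...}))` would write a document and `post_json(f"{ENDPOINT}/search/", build_hybrid_query("tensor ranking", [0.1, 0.2]))` would run the query. The builders are kept separate from the network call so the request shapes can be inspected without a cluster.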
