Cartesia
Architecting AI that learns and interacts like humans: ultra-low latency voice AI
Free tier available · All audiences · Powered by Cartesia · API available
Key strengths
- Ultra-low latency real-time voice models built on State Space Models (SSMs)
- Full-stack voice platform: STT (Ink), TTS (Sonic), and voice agents (Line)
- Flexible deployment: cloud, on-premise, and on-device
- Enterprise-grade compliance with in-region data residency support
- Pioneer of Mamba & H-Net architectures for efficient large-scale inference
Free tier + paid plans
Self-hostable
Developer Documentation
API Access
Cartesia provides a REST/streaming API for Sonic (TTS) and Ink (STT). Authenticate with your API key and start making requests immediately:
curl -X POST https://api.cartesia.ai/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, how can I help you?", "voice_id": "sonic-3.5", "output_format": "pcm_16000"}'
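For readers who prefer Python to curl, the same request can be sketched with only the standard library. The endpoint, headers, and JSON field names below are copied from the curl example rather than from Cartesia's API reference, so treat them as assumptions to verify. The helper names `build_tts_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Endpoint and field names mirror the curl example above; confirm them
# against Cartesia's current API reference before relying on them.
TTS_STREAM_URL = "https://api.cartesia.ai/tts/stream"


def build_tts_request(api_key: str, text: str,
                      voice_id: str = "sonic-3.5",
                      output_format: str = "pcm_16000") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"text": text, "voice_id": voice_id, "output_format": output_format}
    return urllib.request.Request(
        TTS_STREAM_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str) -> bytes:
    """Send the request and return the raw response body.

    Requires network access and a valid API key.
    """
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Separating request construction from sending makes it possible to inspect the URL, headers, and payload without a network call or a real key.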
SDKs & Integration
- Official SDKs available for Python, Node.js, and other major languages.
- Line (voice agents) integrates with existing telephony and enterprise systems via standard interfaces.
- Supports streaming audio output for real-time, low-latency pipelines.
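As a rough illustration of consuming streamed audio, the sketch below writes PCM chunks to a WAV container as they arrive instead of buffering the whole response. It assumes that `pcm_16000` means 16-bit, mono, little-endian PCM at 16 kHz, which the listing does not state. The function name `write_pcm_stream_to_wav` is hypothetical, and the stream is simulated so the example runs offline.

```python
import io
import wave
from typing import BinaryIO, Iterable, Union


def write_pcm_stream_to_wav(chunks: Iterable[bytes],
                            target: Union[str, BinaryIO],
                            sample_rate: int = 16000) -> int:
    """Write streamed PCM chunks to a WAV file as they arrive.

    Assumes 16-bit mono PCM (an assumption about `pcm_16000`).
    Returns the number of audio bytes written.
    """
    total = 0
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)           # mono (assumed)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            if chunk:                 # skip empty keep-alive chunks
                wav.writeframesraw(chunk)
                total += len(chunk)
    return total                      # the WAV header is patched on close


if __name__ == "__main__":
    # Simulated stream: two 50 ms chunks of silence at 16 kHz.
    fake_stream = [b"\x00\x00" * 800, b"\x00\x00" * 800]
    buffer = io.BytesIO()
    print(write_pcm_stream_to_wav(fake_stream, buffer), "bytes written")
```

With a real HTTP response object, an expression such as `iter(lambda: response.read(4096), b"")` would yield comparable chunks. A real-time pipeline would typically hand each chunk to an audio device rather than a file.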
Deployment Options
- Cloud: Deploy via regional API endpoints (in-region processing for data residency compliance).
- On-premise: Deploy in your own VPC or customer environment for full infrastructure control.
- On-device: Edge deployment on mobile, PC, and robotics with fully private, offline inference.
Key Parameters
- voice_id: Select from Sonic model versions (e.g., sonic-3.5)
- output_format: Audio encoding, e.g., pcm_16000, mp3
- language: Supported multi-language input/output
- stream: Boolean flag for real-time streaming responses
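One way to keep these parameters together in client code is a small dataclass that serializes to a JSON request body. This is a sketch under assumptions: the class name `TTSParams` is hypothetical, the `text` field is taken from the curl example under API Access, the defaults are illustrative, and the accepted `language` codes are not specified in this listing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSParams:
    """Holds the four parameters listed above; defaults are illustrative only."""
    voice_id: str = "sonic-3.5"
    output_format: str = "pcm_16000"
    language: Optional[str] = None   # e.g. "en"; accepted codes are an assumption
    stream: bool = True

    def to_json(self, text: str) -> str:
        """Serialize the parameters plus the input text into a JSON body."""
        body = {
            "text": text,
            "voice_id": self.voice_id,
            "output_format": self.output_format,
            "stream": self.stream,
        }
        if self.language is not None:   # omit the field when it is unset
            body["language"] = self.language
        return json.dumps(body)


if __name__ == "__main__":
    print(TTSParams(output_format="mp3", language="en").to_json("Hello"))
```

Omitting `language` when it is unset leaves the choice of default to the service instead of guessing one on the client.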
