Modal

High-performance AI infrastructure with sub-second cold starts and instant autoscaling

Free tier available · Technical · API available

Key strengths

  • Sub-second cold starts with instant container boot times
  • Autoscale from 0 to 1000+ GPUs on demand with no capacity planning
  • Python-native SDK: define infrastructure and logic in a single file
  • Full support for inference, training, sandboxes, and batch processing
  • SOC 2 & HIPAA compliant with battle-tested isolation and data residency controls
Free tier + paid plans · from $30 USD/mo
San Francisco, USA
Founded 2021
  • LLM inference serving: deploy any Hugging Face or custom model behind a modal.web_endpoint() with token streaming, WebSocket support, and sub-10 ms overhead latency via globally distributed compute.
  • Multi-node distributed training: configure gang-scheduled multi-node runs on up to 128 B200s with 3200 Gbps InfiniBand using Modal's cluster API in a single Python file.
  • Batch & async inference pipelines: process large-scale embedding generation, re-ranking, or dataset synthesis jobs across thousands of parallel GPU workers with no job orchestration overhead.
  • Sandbox execution for RL rollouts: programmatically start hundreds of thousands of concurrent modal.Sandbox environments for reinforcement learning trajectory collection, keeping GPU inference resources saturated.
  • Parallel hyperparameter sweeps: use .map() or .starmap() to fan out hundreds of training experiments simultaneously, with automatic resource cleanup and per-second billing.
  • Secure agent execution environments: build background or coding agents that run in fully isolated sandboxes with custom images, injected secrets, and controlled network access.