NVIDIA DGX Cloud Lepton
Connect developers to a global network of GPU compute for building and deploying AI
Paid · Technical · Powered by NVIDIA · API available
Key strengths
- Unified global network of GPU compute
- Designed for AI-native teams and model builders
- Powered by NVIDIA DGX-class hardware (Blackwell, Hopper)
- Supports the full ML lifecycle: build, train, deploy
- No infrastructure management required
Paid only
Santa Clara, USA
Developer Documentation: NVIDIA DGX Cloud Lepton
Platform Access
Authenticate via your NVIDIA account. API keys and CLI tooling are available to interact with the Lepton platform programmatically.
Key Concepts
- Unified Compute: Lepton aggregates GPU resources across multiple data centers and cloud providers into a single control plane.
- Job Scheduling: Submit training or inference jobs via API or UI; the platform handles GPU allocation and cluster orchestration using NVIDIA Run:ai under the hood.
- GPU Architectures Supported: H100 (Hopper) and Blackwell-class systems such as GB200/GB300 NVL72, all with NVLink and NVSwitch interconnects for multi-GPU workloads.
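To make the "single control plane" idea concrete, here is a toy sketch in Python of several GPU pools exposed behind one submit call. It is purely illustrative: the `GpuPool` and `ControlPlane` classes, the provider names, and the first-fit placement rule are all assumptions made for this example, not the Lepton API or the scheduling policy that NVIDIA Run:ai actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuPool:
    """One provider's or data center's GPU capacity (hypothetical model)."""
    provider: str
    gpu_type: str
    free_gpus: int


class ControlPlane:
    """Toy single control plane over several GPU pools (not the real platform)."""

    def __init__(self, pools: List[GpuPool]) -> None:
        self.pools = pools

    def total_free(self, gpu_type: str) -> int:
        # Aggregate view: free GPUs of one type across every pool.
        return sum(p.free_gpus for p in self.pools if p.gpu_type == gpu_type)

    def submit(self, gpu_type: str, gpus: int) -> Optional[str]:
        # First-fit placement: use the first pool with enough free GPUs.
        for pool in self.pools:
            if pool.gpu_type == gpu_type and pool.free_gpus >= gpus:
                pool.free_gpus -= gpus
                return pool.provider
        return None  # no single pool can host the job


plane = ControlPlane([
    GpuPool("provider-a", "h100", 8),
    GpuPool("provider-b", "h100", 16),
    GpuPool("provider-b", "gb200", 72),
])
first = plane.submit("h100", 8)    # fits in provider-a
second = plane.submit("h100", 8)   # provider-a is now full, so provider-b is used
third = plane.submit("h100", 16)   # only 8 h100 GPUs remain in any one pool
print(first, second, third, plane.total_free("h100"))
```

The caller never picks a provider; it asks for a GPU type and count, and the control plane decides where the job lands. A production scheduler would also handle queuing, preemption, and topology, none of which is modeled here.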
Example Workflow (CLI)
# Install Lepton CLI
pip install leptonai
# Authenticate
lep login
# Deploy a model as an inference endpoint
lep photon run --name my-model --model hf:meta-llama/Llama-3-8b
# List running deployments
lep deployment list
Key Parameters
- --model: Specifies the model source (HuggingFace, custom, etc.)
- --resource-shape: Select GPU type and count (e.g., gpu.a10, gpu.h100)
- --replicas: Number of inference replicas for horizontal scaling
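As a minimal sketch of how these parameters combine, the Python helper below assembles a `lep photon run` argument list from the flags named above. The `build_run_command` function is hypothetical, and it assumes `--resource-shape` and `--replicas` are accepted by the same `lep photon run` command used in the example workflow; check `lep photon run --help` before relying on it. The script only prints the command and does not execute it.

```python
import shlex
from typing import List, Optional


def build_run_command(
    name: str,
    model: str,
    resource_shape: Optional[str] = None,
    replicas: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for `lep photon run` (hypothetical helper)."""
    if replicas is not None and replicas < 1:
        raise ValueError("replicas must be a positive integer")

    cmd = ["lep", "photon", "run", "--name", name, "--model", model]
    # Optional flags are appended only when the caller sets them.
    if resource_shape is not None:
        cmd += ["--resource-shape", resource_shape]
    if replicas is not None:
        cmd += ["--replicas", str(replicas)]
    return cmd


cmd = build_run_command(
    name="my-model",
    model="hf:meta-llama/Llama-3-8b",
    resource_shape="gpu.h100",
    replicas=2,
)
# Print a shell-safe rendering of the command instead of running it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting mistakes if the list is later passed to `subprocess.run`.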
Integrations
Works with NVIDIA AI Enterprise Suite, CUDA-X libraries, Base Command Manager, and NVIDIA Run:ai for orchestration.
