Developer Documentation: NVIDIA DGX Cloud Lepton

Platform Access

Authenticate with your NVIDIA account. For programmatic access to the Lepton platform, use API keys and the CLI.

Key Concepts

Unified Compute: Lepton aggregates GPU resources across multiple data centers and cloud providers into a single control plane.
Job Scheduling: Submit training or inference jobs via API or UI; the platform handles GPU allocation and cluster orchestration using NVIDIA Run:ai under the hood.
GPU Architectures Supported: H100 (Hopper) and Blackwell-class systems, including GB200/GB300 NVL72, all with NVLink and NVSwitch interconnects for multi-GPU workloads.

Example Workflow (CLI)

# Install Lepton CLI
pip install leptonai

# Authenticate
lep login

# Deploy a model as an inference endpoint
lep photon run --name my-model --model hf:meta-llama/Meta-Llama-3-8B

# List running deployments
lep deployment list
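
The steps above can also be collected into one script. The following is a minimal sketch, not an official tool: run_step, lepton_workflow, and the DRY_RUN variable are hypothetical helpers introduced here, and my-model and hf:my-org/my-model are placeholder values. By default it only prints the commands, so the sequence can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash

# DRY_RUN=1 (the default in this sketch) prints each command without running it.
# Set DRY_RUN=0 to execute the commands for real.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command with a "+ " prefix; execute it only when DRY_RUN is 0.
run_step() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Run the four workflow steps in order, stopping at the first failure.
# $1: deployment name, $2: model source (both chosen by the caller).
lepton_workflow() {
  local name="$1" model="$2"
  run_step pip install leptonai &&
  run_step lep login &&
  run_step lep photon run --name "$name" --model "$model" &&
  run_step lep deployment list
}

# Dry run with hypothetical placeholder values.
lepton_workflow my-model hf:my-org/my-model
```

Chaining the steps with && means a failed install or login stops the script before a deployment is attempted.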

Key Parameters

--model: Specifies the model source (a Hugging Face model, a custom model, etc.)
--resource-shape: Selects the GPU type and count (e.g., gpu.a10, gpu.h100)
--replicas: Sets the number of inference replicas for horizontal scaling
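
To show how these parameters fit together, here is a small sketch that assembles a command line from them and prints it without executing anything. It assumes the three flags can be combined on lep photon run, as the list above suggests; confirm this against the CLI's own help output before relying on it. build_run_command is a hypothetical helper, and my-model and hf:my-org/my-model are placeholder values.

```shell
#!/usr/bin/env bash

# Print a "lep photon run" command line built from the parameters listed above.
# Arguments: deployment name, model source, resource shape, replica count.
# Nothing is executed; the function only prints the command for review.
build_run_command() {
  if [ "$#" -ne 4 ]; then
    echo "usage: build_run_command NAME MODEL RESOURCE_SHAPE REPLICAS" >&2
    return 2
  fi
  local name="$1" model="$2" shape="$3" replicas="$4"

  # Reject empty, non-numeric, zero, or zero-padded replica counts.
  case "$replicas" in
    ''|*[!0-9]*|0*)
      echo "replicas must be a positive integer, got: '$replicas'" >&2
      return 1
      ;;
  esac

  printf 'lep photon run --name %s --model %s --resource-shape %s --replicas %s\n' \
    "$name" "$model" "$shape" "$replicas"
}

# Example with placeholder values (assumption: flags combine as shown).
build_run_command my-model hf:my-org/my-model gpu.h100 2
```

Validating the replica count before building the command catches a typo locally instead of at deployment time.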

Integrations

Lepton works with the NVIDIA AI Enterprise suite, CUDA-X libraries, and Base Command Manager, and uses NVIDIA Run:ai for orchestration.