RunPod

Free tier

The AI Developer Cloud — experiment, train, fine-tune, deploy, and scale on one platform

Free tier available · Technical · API available

Key strengths

  • Sub-200ms cold starts with FlashBoot, with no warm-up tax
  • 30+ GPU SKUs across 31 global regions
  • Autoscaling from 0 to thousands of workers in under 250ms
  • Zero idle cost on Serverless endpoints
  • Full AI lifecycle: pods, serverless, and multi-node clusters in one account

Free tier + paid plans
Moorestown, United States

Technical Setup & API Usage

Serverless Endpoint (Handler Pattern)

Write a Python handler function, package it in a Docker image, and deploy that image to RunPod Serverless:

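import runpod

# my_model is a placeholder: load your own model once at module level
# so that every job handled by this worker reuses it.

def handler(job):
    # The "input" object from the request body arrives under job["input"]
    job_input = job["input"]
    # Your inference logic here
    result = my_model.predict(job_input["prompt"])
    # The returned dict is sent back as the job result
    return {"output": result}

# Register the handler and start the worker loop
runpod.serverless.start({"handler": handler})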
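
Because the handler is an ordinary function that takes a job dictionary, it can be exercised locally before the container is built. The following is a minimal sketch, not RunPod's reference implementation: `EchoModel` is a hypothetical stand-in for a real model, and returning an `error` field for a missing prompt is an assumed convention. The `runpod.serverless.start` call is left out so the block runs without the SDK installed.

```python
class EchoModel:
    """Hypothetical stand-in for a real model; replace with your own."""

    def predict(self, prompt: str) -> str:
        return f"echo: {prompt}"


my_model = EchoModel()  # created once per worker, not once per job


def handler(job: dict) -> dict:
    """Validate the job payload, run inference, and return a JSON-serializable dict."""
    job_input = job.get("input") or {}
    prompt = job_input.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        # Assumption: bad input is reported through an "error" field
        return {"error": "input.prompt must be a non-empty string"}
    return {"output": my_model.predict(prompt)}


if __name__ == "__main__":
    # Call the handler directly with the same shape the REST API delivers
    print(handler({"input": {"prompt": "Hello, world!"}}))
    print(handler({"input": {}}))
```

Keeping validation inside the handler means a malformed request fails fast with a readable message instead of raising a `KeyError` inside the worker.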

Key Deployment Steps

  • Containerize your model with any framework (PyTorch, TensorFlow, JAX, etc.) using a standard Dockerfile.
  • Push the image to a container registry (Docker Hub, GHCR, etc.) and reference it in the RunPod console or via the API.
  • Configure autoscaling — set min/max worker counts, concurrency, and execution timeout per endpoint.
  • Cold start optimization — RunPod's FlashBoot technology reduces cold start times to sub-200ms, eliminating the need for keep-warm hacks.
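
The autoscaling settings listed above interact with each other, so it can help to sanity-check them before entering them in the console or sending them to the API. The sketch below is purely illustrative: `EndpointScaling`, its field names, and its default values are hypothetical and do not correspond to any RunPod SDK class or documented defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointScaling:
    """Hypothetical container for the per-endpoint settings named above."""

    min_workers: int = 0            # 0 lets the endpoint scale to zero when idle
    max_workers: int = 3            # illustrative default, not a RunPod value
    concurrency: int = 1            # jobs one worker handles at a time
    execution_timeout_s: int = 600  # illustrative default, not a RunPod value

    def __post_init__(self) -> None:
        if self.min_workers < 0:
            raise ValueError("min_workers must be >= 0")
        if self.max_workers < max(1, self.min_workers):
            raise ValueError("max_workers must be >= 1 and >= min_workers")
        if self.concurrency < 1:
            raise ValueError("concurrency must be >= 1")
        if self.execution_timeout_s <= 0:
            raise ValueError("execution_timeout_s must be positive")

    def max_in_flight(self) -> int:
        """Upper bound on jobs running at once across all workers."""
        return self.max_workers * self.concurrency


if __name__ == "__main__":
    scaling = EndpointScaling(min_workers=0, max_workers=4, concurrency=2)
    print(scaling.max_in_flight())
```

A check like this catches inverted bounds (for example, a minimum above the maximum) before they reach a live endpoint.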

REST API

All Serverless endpoints expose a standard REST interface:

curl -X POST https://api.runpod.ai/v2/{endpoint_id}/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello, world!"}}'
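
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL pattern, headers, and body shape). The helper names `build_run_request` and `submit_job` are hypothetical, and the response is assumed to be JSON; its fields are not specified here.

```python
import json
import os
import urllib.request


def build_run_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the endpoint's /run route."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"
    # The handler receives whatever is placed under the "input" key
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def submit_job(endpoint_id: str, api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_run_request(endpoint_id, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "your-endpoint-id" is a placeholder; RUNPOD_API_KEY must be set in the environment
    api_key = os.environ.get("RUNPOD_API_KEY")
    if api_key:
        print(submit_job("your-endpoint-id", api_key, {"prompt": "Hello, world!"}))
```

Separating request construction from sending makes the URL, headers, and body easy to inspect or unit-test without network access.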

Key Parameters

Parameter                   Description
RUNPOD_API_KEY              Auth token from your RunPod dashboard
endpoint_id                 Unique ID for your Serverless endpoint
input                       JSON payload passed to your handler
min_workers / max_workers   Autoscaling bounds