Technical Setup & API Usage

Serverless Endpoint (Handler Pattern)

Write a Python handler function, package it in a Docker image, and deploy that image to RunPod Serverless:

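import runpod

class EchoModel:
    """Placeholder so the example runs; replace with your real model."""
    def predict(self, prompt):
        return f"echo: {prompt}"

my_model = EchoModel()  # created once per worker and reused across jobs

def handler(job):
    job_input = job["input"]  # the "input" object from the request body
    # Your inference logic here
    result = my_model.predict(job_input["prompt"])
    return {"output": result}

runpod.serverless.start({"handler": handler})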
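
The handler above raises a KeyError if the request omits "prompt". A minimal sketch of input validation follows, written without the runpod import so it can be run and tested locally. EchoModel is a hypothetical stand-in for a real model, and returning a dictionary with an "error" key is assumed here to be how a failure is reported back to the caller; check the SDK documentation before relying on it.

```python
from typing import Any


class EchoModel:
    """Hypothetical stand-in for a real model."""

    def predict(self, prompt: str) -> str:
        return f"echo: {prompt}"


my_model = EchoModel()


def handler(job: dict[str, Any]) -> dict[str, Any]:
    """Validate the job input, then run inference."""
    job_input = job.get("input")
    if not isinstance(job_input, dict):
        return {"error": "request body must contain an 'input' object"}

    prompt = job_input.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        # Assumption: an "error" key signals a failed job to the caller.
        return {"error": "input.prompt must be a non-empty string"}

    return {"output": my_model.predict(prompt)}


if __name__ == "__main__":
    # Call the handler directly with the same shape the platform would pass in.
    print(handler({"input": {"prompt": "Hello, world!"}}))
    print(handler({"input": {}}))
```

Because the handler is a plain function, it can be exercised with ordinary dictionaries before the image is built.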

Key Deployment Steps

Containerize your model with any framework (PyTorch, TensorFlow, JAX, etc.) using a standard Dockerfile.
Push the image to a container registry (Docker Hub, GHCR, etc.) and reference it in the RunPod console or via the API.
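Configure autoscaling: set the minimum and maximum worker counts, concurrency, and an execution timeout for each endpoint.
Optimize cold starts: RunPod advertises that its FlashBoot feature can bring cold starts below 200 ms. Actual start time still depends on image size and on how long your model takes to load, so measure it for your workload before dropping any keep-warm workaround.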
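
Independent of any platform feature, a common way to keep cold-start cost from being paid on every request is to load the model once per worker and cache it. The sketch below illustrates that pattern only. SlowModel, get_model, and STATS are hypothetical names, and the sleep merely simulates loading weights.

```python
import time

_MODEL = None          # cached for the lifetime of the worker process
STATS = {"loads": 0}   # counts how many times the model was actually loaded


class SlowModel:
    """Hypothetical model whose construction is expensive."""

    def __init__(self) -> None:
        time.sleep(0.05)  # simulate reading weights from disk

    def predict(self, prompt: str) -> str:
        return prompt.upper()


def get_model() -> SlowModel:
    """Load the model on first use, then return the cached instance."""
    global _MODEL
    if _MODEL is None:
        _MODEL = SlowModel()
        STATS["loads"] += 1
    return _MODEL


def handler(job: dict) -> dict:
    model = get_model()  # only the first job on a worker pays the load cost
    return {"output": model.predict(job["input"]["prompt"])}


if __name__ == "__main__":
    for text in ("first", "second", "third"):
        print(handler({"input": {"prompt": text}}))
    print("model loads:", STATS["loads"])
```

Loading at module import time instead of lazily works equally well; the point is that the load happens outside the per-job code path.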

REST API

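Every Serverless endpoint exposes a REST interface. A POST to /run queues the job and returns a job ID immediately; fetch the result later from /status/{job_id}, or POST to /runsync instead to wait for the result in the same request: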

curl -X POST https://api.runpod.ai/v2/{endpoint_id}/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello, world!"}}'
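
The same call can be made from Python. The sketch below uses only the standard library, submits a job to /run, and polls /status/{job_id} until the job reaches a terminal state. The helper names (build_request, send, run_and_wait), the polling interval, and the exact set of terminal status strings are assumptions made for illustration; the transport is injectable so the polling logic can be tested without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional

API_BASE = "https://api.runpod.ai/v2"
# Assumed terminal job states; anything else is treated as still pending.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def build_request(endpoint_id: str, route: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST (with JSON body) or GET (without) for an endpoint route."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/{route}",
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST" if payload is not None else "GET",
    )


def send(req: urllib.request.Request) -> dict:
    """Default transport: perform the HTTP call and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_and_wait(endpoint_id: str, api_key: str, job_input: dict,
                 send_fn: Callable[[urllib.request.Request], dict] = send,
                 poll_seconds: float = 1.0, max_polls: int = 60) -> dict:
    """Queue a job, then poll its status until it finishes or polls run out."""
    job = send_fn(build_request(endpoint_id, "run", api_key, {"input": job_input}))
    job_id = job["id"]
    for _ in range(max_polls):
        if job.get("status") in TERMINAL:
            return job
        time.sleep(poll_seconds)
        job = send_fn(build_request(endpoint_id, f"status/{job_id}", api_key))
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A typical call would be run_and_wait(endpoint_id, os.environ["RUNPOD_API_KEY"], {"prompt": "Hello, world!"}), after importing os. For short jobs, a single request to /runsync avoids the polling loop entirely.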

Key Parameters

Parameter	Description
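`RUNPOD_API_KEY`	API key from your RunPod dashboard, sent as the Bearer token in the Authorization header
`endpoint_id`	Unique ID of your Serverless endpoint, used in the request URL
`input`	JSON object in the request body, passed to your handler as job["input"]
`min_workers` / `max_workers`	Autoscaling bounds: the fewest and most workers the endpoint may run (an endpoint setting, not part of the request)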
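
To make the autoscaling bounds concrete, the sketch below shows how a minimum and maximum constrain whatever worker count the current load would call for. It illustrates the concept only and is not RunPod's scaling algorithm; ScalingBounds and clamp are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingBounds:
    """Hypothetical container for an endpoint's min/max worker settings."""

    min_workers: int
    max_workers: int

    def __post_init__(self) -> None:
        if self.min_workers < 0:
            raise ValueError("min_workers must be >= 0")
        if self.max_workers < max(1, self.min_workers):
            raise ValueError("max_workers must be >= 1 and >= min_workers")

    def clamp(self, desired: int) -> int:
        """Restrict a desired worker count to the configured bounds."""
        return max(self.min_workers, min(self.max_workers, desired))


if __name__ == "__main__":
    bounds = ScalingBounds(min_workers=1, max_workers=3)
    for desired in (0, 2, 10):
        print(desired, "->", bounds.clamp(desired))
```

With a minimum of 0 the endpoint can scale down to no workers when idle, which saves cost but means the next request incurs a cold start; a minimum of 1 or more trades cost for latency.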