Ollama

The easiest way to build and run open-source AI models locally

Free tier available · All audiences · Powered by open-source models (Llama, Mistral, Gemma, etc.) · API available · Open source

Key strengths

  • Run open-source LLMs fully offline and locally
  • One-command install and model management via CLI
  • OpenAI-compatible REST API for easy integration
  • Optional cloud tier for larger, faster models
  • Privacy-first: data is never used for training
Free tier + paid plans · from $20 USD/mo
San Francisco, United States
Founded 2023
Self-hostable

Developer Documentation

Installation

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version

Key CLI Commands

ollama pull llama3          # Download a model
ollama run llama3           # Run interactive session
ollama list                 # List installed models
ollama serve                # Start the local API server (default: port 11434)
ollama launch openclaw      # Launch a compatible app
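
A script may want to confirm that `ollama serve` is up before it sends requests. The following is a minimal Python sketch that probes the default port listed above. The helper name is_server_listening is hypothetical, and the host and port are assumptions based on the stated default. A successful TCP connection only shows that something is listening on that port, not that it is Ollama.

```python
import socket


def is_server_listening(host: str = "127.0.0.1", port: int = 11434,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: 11434 is the default port mentioned above;
    adjust it if the server is configured differently.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_server_listening() with no arguments checks 127.0.0.1:11434.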

REST API (OpenAI-compatible)

Ollama exposes an HTTP API at http://localhost:11434. Its /v1/chat/completions endpoint accepts requests in the OpenAI Chat Completions format:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
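
The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: the function names (build_chat_request, extract_reply, chat) are hypothetical, the base URL is the default address given above, and the response is assumed to follow the OpenAI Chat Completions shape (choices[0].message.content).

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # default address from the text above


def build_chat_request(model: str, messages: list[dict],
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body (assumed shape)."""
    return response_json["choices"][0]["message"]["content"]


def chat(model: str, prompt: str) -> str:
    """Send one user message and return the reply; needs a running server."""
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With the server running and the model already pulled, print(chat("llama3", "Hello!")) prints the model's reply.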

Key Parameters

Parameter   Description
model       Model name (e.g., llama3, mistral, gemma)
messages    Chat history array
stream      Boolean; enables token streaming
options     Model options: temperature, num_ctx, top_p, etc.
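
To illustrate how these parameters fit together, here is a Python sketch that builds a request body with stream and options, then reads a streamed reply. It assumes Ollama's native /api/chat endpoint, which is not shown above: there, options is a nested object and streamed output arrives as one JSON object per line, each carrying message.content and a done flag. The helper names build_chat_payload and iter_stream_text are hypothetical.

```python
import json
from typing import Iterable, Iterator


def build_chat_payload(model: str, messages: list[dict],
                       stream: bool = True, **options) -> dict:
    """Assemble a request body; keyword arguments become the options object."""
    payload = {"model": model, "messages": messages, "stream": stream}
    if options:
        payload["options"] = options  # e.g. temperature, num_ctx, top_p
    return payload


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON chunks.

    Assumed chunk shape: {"message": {"content": "..."}, "done": false}.
    Stops after the first chunk whose done flag is true.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        text = chunk.get("message", {}).get("content", "")
        if text:
            yield text
        if chunk.get("done"):
            break
```

Iterating over the response object returned by urllib.request.urlopen yields one bytes line at a time, so it can be passed straight to iter_stream_text.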

Hardware Acceleration

  • Apple Silicon: Metal (automatic)
  • NVIDIA: CUDA (automatic if CUDA drivers present)
  • AMD: ROCm
  • Fallback: CPU inference