Banana

GPU inference hosting for AI teams who ship fast and scale faster

Paid · Technical · API available

Key strengths

  • Automatic GPU autoscaling with pass-through, zero-markup compute pricing
  • Full DevOps platform: GitHub integration, CI/CD, CLI, rolling deploys
  • Built-in observability with real-time traffic, latency, and error monitoring
  • Powered by Potassium, an open-source HTTP framework for writing inference backends
  • Automation API with SDKs and CLI for programmatic deployment management
Paid only · from $1200 USD/mo
San Francisco, USA
Founded 2021

Technical Setup & API Usage

Potassium Framework (Python)

Banana inference servers are written using Potassium, an open-source Python HTTP framework:

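from potassium import Potassium, Request, Response
from transformers import pipeline
import torch

app = Potassium("my_app")

# @app.init runs once at startup: load the model and return it in a context dict.
@app.init
def init() -> dict:
    # Use the first GPU if one is available, otherwise fall back to CPU.
    device = 0 if torch.cuda.is_available() else -1
    model = pipeline("fill-mask", model="bert-base-uncased", device=device)
    return {"model": model}

# @app.handler runs for every request sent to the given route.
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
    model = context.get("model")
    prompt = request.json.get("prompt")  # e.g. "Hello, I'm a [MASK] model."
    outputs = model(prompt)
    # fill-mask returns a ranked list of predictions; return the top one.
    return Response(status=200, json={"outputs": outputs[0]})

if __name__ == "__main__":
    app.serve()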
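To call this handler, a client sends a POST request with a JSON body that carries a "prompt" field, because the handler reads request.json.get("prompt"). The following is a minimal sketch that uses only the standard library. It assumes the app is running locally on port 8000, which is believed to be Potassium's default. BASE_URL, build_request and call_handler are illustrative names. A deployed Banana endpoint would use a different URL and possibly authentication headers, and neither is shown here.

```python
import json
import urllib.request

# Assumption: the Potassium app above is running locally on port 8000,
# believed to be Potassium's default; change BASE_URL for other setups.
BASE_URL = "http://localhost:8000"


def build_request(prompt: str, route: str = "/") -> urllib.request.Request:
    """Build a JSON POST matching what the handler reads via request.json."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_handler(prompt: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the handler's JSON reply ({"outputs": ...})."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server up, call_handler("Hello, I'm a [MASK] model.") should return a dict whose "outputs" key holds the top fill-mask prediction. The prompt needs a [MASK] token because bert-base-uncased is used for fill-mask here.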

Key Concepts

  • @app.init: Runs once at cold start; load models and return a context dict.
  • @app.handler("/route"): Handles incoming HTTP requests; reads from context and request.json.
  • Autoscaling: Banana monitors GPU demand and scales replicas automatically; no manual configuration is required.
  • Automation API: Use the REST API or SDKs to trigger deployments, manage projects, and query endpoint metadata programmatically.
  • CLI: Deploy and manage endpoints from the terminal; integrates into CI/CD pipelines.
  • Custom GPU Types: Team and Enterprise plans support custom GPU configurations for specialized workloads.
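The init/handler lifecycle in the first two bullets can be modeled in a few lines of plain Python. This is a toy, not Potassium's code. ToyApp, start, dispatch and the str.upper "model" are hypothetical names chosen only for illustration. The sketch shows just the contract: init runs once, and every handler call receives the same context dict.

```python
from typing import Any, Callable, Dict, Optional

Context = Dict[str, Any]
Handler = Callable[[Context, Dict[str, Any]], Dict[str, Any]]


class ToyApp:
    """Illustrative stand-in for the init/handler contract (NOT Potassium's code).

    The init function runs once at startup, and the context dict it returns
    is passed to every handler call for the matching route.
    """

    def __init__(self) -> None:
        self._init_fn: Optional[Callable[[], Context]] = None
        self._handlers: Dict[str, Handler] = {}
        self._context: Optional[Context] = None

    def init(self, fn: Callable[[], Context]) -> Callable[[], Context]:
        # Register the startup function (mirrors the role of @app.init).
        self._init_fn = fn
        return fn

    def handler(self, route: str = "/") -> Callable[[Handler], Handler]:
        # Register a per-request function for a route (mirrors @app.handler).
        def register(fn: Handler) -> Handler:
            self._handlers[route] = fn
            return fn
        return register

    def start(self) -> None:
        # Run init exactly once, like a cold start, and keep its context.
        if self._context is None:
            if self._init_fn is None:
                raise RuntimeError("no init function registered")
            self._context = self._init_fn()

    def dispatch(self, route: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Every request reuses the same context; init is not re-run.
        if self._context is None:
            raise RuntimeError("call start() before dispatching requests")
        return self._handlers[route](self._context, payload)


# Usage with a hypothetical stand-in "model" (str.upper) instead of a real pipeline.
app = ToyApp()
init_calls = []

@app.init
def init() -> Context:
    init_calls.append("loaded")
    return {"model": str.upper}

@app.handler("/")
def handle(context: Context, payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"outputs": context["model"](payload["prompt"])}

app.start()
first = app.dispatch("/", {"prompt": "hello"})
second = app.dispatch("/", {"prompt": "banana"})
```

The split is the point of the design. Expensive model loading happens once per replica in init, so each request pays only for inference.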