Banana
GPU inference hosting for AI teams who ship fast and scale faster
Paid · Technical · API available
Key strengths
- Automatic GPU autoscaling with pass-through, zero-markup compute pricing
- Full DevOps platform: GitHub integration, CI/CD, CLI, rolling deploys
- Built-in observability with real-time traffic, latency, and error monitoring
- Powered by Potassium, an open-source HTTP framework for writing inference backends
- Automation API with SDKs and CLI for programmatic deployment management
Paid only · from $1200 USD/mo
San Francisco, USA
Founded 2021
No ratings yet
Technical Setup & API Usage
Potassium Framework (Python)
Banana inference servers are written using Potassium, an open-source Python HTTP framework:
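from potassium import Potassium, Request, Response
from transformers import pipeline

app = Potassium("my_app")

# Runs once at cold start: load the model and return it in the context dict.
@app.init
def init():
    model = pipeline('fill-mask', model='bert-base-uncased', device=0)  # device=0: first GPU
    return {"model": model}

# Handles requests to "/": read the prompt, run the model, return the top result.
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
    model = context.get("model")
    prompt = request.json.get("prompt")
    outputs = model(prompt)
    return Response(status=200, json={"outputs": outputs[0]})

if __name__ == "__main__":
    app.serve()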
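To try the handler above, a client posts JSON with a "prompt" field to the "/" route. The following is a minimal sketch using only the Python standard library. The URL is an assumption (a Potassium app running locally on port 8000), and the helper names build_request and call_endpoint are hypothetical, not part of any Banana SDK.

```python
import json
import urllib.request

# Assumption: the Potassium app is running locally and listening on port 8000.
DEFAULT_URL = "http://localhost:8000/"


def build_request(prompt: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the "prompt" field the handler reads."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(prompt: str, url: str = DEFAULT_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call_endpoint("Hello, I am a [MASK] model.") should return a dict whose "outputs" key holds the first prediction. Fill-mask models expect a [MASK] token in the prompt.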
Key Concepts
- @app.init: Runs once at cold start; load models and return a context dict.
- @app.handler("/route"): Handles incoming HTTP requests; reads from context and request.json.
- Autoscaling: Banana monitors GPU demand and scales replicas automatically; no manual configuration is required.
- Automation API: Use the REST API or SDKs to trigger deployments, manage projects, and query endpoint metadata programmatically.
- CLI: Deploy and manage endpoints from the terminal; integrates into CI/CD pipelines.
- Custom GPU Types: Team and Enterprise plans support custom GPU configurations for specialized workloads.
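The init/handler pattern described in the list above can be illustrated without a GPU or any dependencies. The sketch below is a hypothetical stand-in (MiniApp is not Potassium's implementation, and dispatch is an invented name); it only shows why expensive model loading belongs in the init function: the function runs once, and every request reuses the returned context. As a simplification, this sketch runs init lazily on the first dispatch rather than at server start.

```python
from typing import Any, Callable, Dict, Optional

Context = Dict[str, Any]


class MiniApp:
    """Hypothetical stand-in that mimics the init/handler decorator pattern."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._init_fn: Optional[Callable[[], Context]] = None
        self._handlers: Dict[str, Callable[[Context, dict], dict]] = {}
        self._context: Optional[Context] = None

    def init(self, fn: Callable[[], Context]) -> Callable[[], Context]:
        # Remember the loader; it is executed once, on first use.
        self._init_fn = fn
        return fn

    def handler(self, route: str = "/"):
        def register(fn):
            # Map the route to the function that serves it.
            self._handlers[route] = fn
            return fn
        return register

    def dispatch(self, route: str, payload: dict) -> dict:
        # Build the context on the first request only, then reuse it.
        if self._context is None:
            self._context = self._init_fn() if self._init_fn else {}
        return self._handlers[route](self._context, payload)


app = MiniApp("demo")
load_count = 0  # counts how many times the "model" is loaded


@app.init
def init() -> Context:
    global load_count
    load_count += 1
    return {"model": str.upper}  # stand-in for a real model


@app.handler("/")
def handler(context: Context, payload: dict) -> dict:
    model = context.get("model")
    return {"outputs": model(payload["prompt"])}


first = app.dispatch("/", {"prompt": "hello"})
second = app.dispatch("/", {"prompt": "again"})
```

After both calls, load_count is still 1: the second request reused the context created for the first.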
