How to Self-Host LiteLLM

A production-ready LiteLLM proxy in 15 minutes: Docker Compose with PostgreSQL for spend tracking, virtual team API keys with budgets, multi-provider routing, and rate limits. Every config is copy-pasteable — verify against the official docs for your LiteLLM version.

By the LLM Academy team · Reviewed June 2026 · Tested with LiteLLM v1.x proxy and PostgreSQL 16

Why self-host LiteLLM?

Self-hosting LiteLLM gives you a single OpenAI-compatible endpoint that routes to 100+ providers — OpenAI, Anthropic, Google, AWS Bedrock, Azure, and your own self-hosted vLLM replicas — with zero per-request markup. You pay providers directly, and LiteLLM tracks every cent. For teams spending meaningful money on LLM APIs, the control and cost transparency justify the modest operational overhead of running one Docker container and a Postgres database.

When to self-host vs. use a managed gateway

If you want zero infrastructure, OpenRouter or Portkey's SaaS are faster to start. If you want cost control, data residency, and no markup, self-hosted LiteLLM wins. See our gateway comparison for the full tradeoff.

Prerequisites

You need a host with Docker and Docker Compose installed — any cloud VM or on-prem server works. You need provider API keys for whichever models you want to route to (at minimum, an OpenAI key). For production, allocate a PostgreSQL database (the Docker Compose below includes one) so LiteLLM can persist spend tracking, team keys, and usage logs across restarts.
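Before moving on, a small preflight script can confirm the prerequisites above. This is a minimal sketch: the function name missing_prerequisites and the REQUIRED_ENV list are illustrative choices, not part of LiteLLM, and you should add any other provider keys your setup relies on.

```python
import os
import shutil

# Variables this guide's Compose file expects; extend for your own providers.
REQUIRED_ENV = ("OPENAI_API_KEY", "LITELLM_MASTER_KEY")


def missing_prerequisites(env, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if which("docker") is None:
        problems.append("docker not found on PATH")
    for name in REQUIRED_ENV:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ):
        print("MISSING:", problem)
```

The env and which parameters are injectable so the check can be exercised without touching the real environment.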

Step 1 — Create config.yaml

The config.yaml is the heart of a LiteLLM deployment. It declares which models are available, which provider keys to use, and the master key for admin access. This minimal config exposes GPT-4o and GPT-4o-mini through OpenAI, and a self-hosted Llama through a vLLM endpoint — LiteLLM presents all three behind one unified API.

# config.yaml
model_list:
  - model_name: gpt-4o            # the name your app calls
    litellm_params:
      model: openai/gpt-4o        # the actual provider model
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

  - model_name: llama-self-hosted
    litellm_params:
      model: openai/meta-llama/Llama-3.1-70B-Instruct
      api_base: http://vllm:8000/v1   # your self-hosted vLLM; its OpenAI-compatible API is served under /v1
      api_key: os.environ/VLLM_KEY

router_settings:
  routing_strategy: usage-based-routing   # balance load across replicas

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # admin access
  database_url: os.environ/DATABASE_URL       # Postgres for spend tracking

The os.environ/ prefix pulls values from environment variables, keeping secrets out of the config file. The router_settings block enables load balancing when you list multiple deployments under one model_name.
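To make the os.environ/ convention concrete, here is a minimal sketch of how such references could be resolved in an already-parsed config. It illustrates the idea only and is not LiteLLM's own loader; resolve_env_refs is a hypothetical helper, and raising on an unset variable is a choice made for this example.

```python
ENV_PREFIX = "os.environ/"


def resolve_env_refs(node, env):
    """Recursively replace 'os.environ/NAME' strings with env[NAME]."""
    if isinstance(node, dict):
        return {key: resolve_env_refs(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(value, env) for value in node]
    if isinstance(node, str) and node.startswith(ENV_PREFIX):
        name = node[len(ENV_PREFIX):]
        if name not in env:
            raise KeyError(f"config references unset variable {name}")
        return env[name]
    return node  # plain values (model names, numbers) pass through untouched
```

Passing os.environ as env resolves against the real environment. Failing loudly on a missing variable catches a forgotten secret before the first request rather than after it.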

Step 2 — Run with Docker Compose

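The official Docker quick start bundles LiteLLM with PostgreSQL. This docker-compose.yml brings up both, mounts your config, and wires the database connection. Once DATABASE_URL points at Postgres, team keys, budgets, and spend records are persisted there instead of being lost on restart. The STORE_MODEL_IN_DB flag is a separate switch: it lets models added through the admin UI or API be saved in the database alongside those declared in config.yaml.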

# docker-compose.yml
version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"           # proxy endpoint
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - VLLM_KEY=${VLLM_KEY}
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
      - DATABASE_URL=postgresql://litellm:litellm@db:5432/litellm
      - STORE_MODEL_IN_DB=True
    command: --config /app/config.yaml --port 4000 --num_workers 4
    depends_on:
      - db

  db:
    image: postgres:16
    environment:
      - POSTGRES_USER=litellm
      - POSTGRES_PASSWORD=litellm   # placeholder: use a strong secret and update DATABASE_URL to match
      - POSTGRES_DB=litellm
    volumes:
      - litellm_db:/var/lib/postgresql/data
    ports:
      - "5432:5432"           # optional: only for host-side DB access; remove in production

volumes:
  litellm_db:

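Bring it up with docker compose up -d (or docker-compose up -d if you still use the standalone binary). The TECHSY 2026 setup guide notes that this Compose file, with PostgreSQL, virtual team keys, budgets, cost tracking, and rate limits, is the production baseline; the in-memory quick start is only for local testing. To verify the proxy is up, request http://localhost:4000/health/liveliness, which needs no key. The fuller http://localhost:4000/health check requires the master key as a bearer token and makes a real call to each configured model.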
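The proxy can take a few seconds to accept connections after the containers start, so a deploy script usually polls rather than checking once. The following is a minimal sketch of that poll using only the standard library. The function name wait_until_ready is hypothetical, and the /health/liveliness route is an assumption about current releases; substitute whichever unauthenticated health route your LiteLLM version documents.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, attempts=30, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll url until it answers HTTP 200; return True on success, else False."""
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or error status: proxy is not ready yet
        sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed route; verify it against the docs for your version.
    ok = wait_until_ready("http://localhost:4000/health/liveliness")
    print("proxy ready" if ok else "proxy did not come up in time")
```

The opener and sleep parameters exist only so the loop can be tested without a running proxy.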

[IMAGE: Docker Compose topology — app → LiteLLM proxy → Postgres (spend DB) + providers]

Step 3 — Create virtual team keys with budgets

Instead of handing every team your raw OpenAI key, mint virtual keys per team with their own budgets and rate limits. LiteLLM tracks spend against each key and rejects requests once the budget is exhausted. Use the master key to authenticate the admin call.

# Create a team key capped at $50/month
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "marketing-team",
    "max_budget": 50.00,
    "budget_duration": "1mo",
    "rpm_limit": 100,
    "models": ["gpt-4o-mini", "gpt-4o"]
  }'

# Response: { "key": "sk-litellm-abc123...", ... }

The returned sk-litellm-... key is what the marketing team uses in their app. They can only call the models you allow, they are capped at $50/month, and rate-limited to 100 requests/minute. Every call is logged to Postgres with token counts and cost, so you get per-team spend dashboards for free — this is the budget governance lever from our cost optimization playbook.
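To see how the three controls on a key interact, here is a toy model of the checks applied before a request is forwarded. It is a sketch for intuition, not LiteLLM's implementation: the VirtualKey class, check_request, and record_cost are hypothetical names, and the 60-second sliding window is an assumption about how a per-minute limit might be counted.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualKey:
    max_budget: float   # dollars allowed per budget window
    rpm_limit: int      # requests allowed per minute
    models: tuple       # model names this key may call
    spend: float = 0.0
    request_times: list = field(default_factory=list)


def check_request(key, model, now):
    """Return None if the request is allowed, else a short rejection reason."""
    if model not in key.models:
        return "model not allowed"
    if key.spend >= key.max_budget:
        return "budget exceeded"
    # Keep only timestamps from the last 60 seconds (sliding window).
    key.request_times = [t for t in key.request_times if now - t < 60]
    if len(key.request_times) >= key.rpm_limit:
        return "rate limit exceeded"
    key.request_times.append(now)
    return None


def record_cost(key, cost):
    """Add the cost of a completed request to the key's running spend."""
    key.spend += cost
```

In this model the budget check runs before the request, using spend recorded so far, so the request that crosses the cap still completes and the next one is rejected.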

Step 4 — Call the proxy from your app

LiteLLM exposes an OpenAI-compatible API, so any OpenAI SDK client works unchanged. Point base_url at your proxy and pass the virtual key.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-litellm-abc123...",   # the virtual team key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",              # routes via config.yaml
    messages=[{"role": "user", "content": "Summarize this article."}],
)

Behind the scenes, LiteLLM translates this to the provider's native format, tracks the token cost against the marketing team's budget, applies the rate limit, and logs the request. Your app code never changes when you swap providers or add fallbacks — that all lives in config.yaml.
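The cost tracking described above comes down to simple arithmetic over token counts. The sketch below shows that arithmetic and a per-team rollup. The row layout, the function names, and the prices are all assumptions for illustration: real prices come from your provider, and LiteLLM's own log schema may differ.

```python
from collections import defaultdict


def request_cost(prompt_tokens, completion_tokens, price):
    """Cost of one request given (input, output) prices per token in USD."""
    input_price, output_price = price
    return prompt_tokens * input_price + completion_tokens * output_price


def spend_by_team(log_rows, prices):
    """Sum request costs per team_id from a list of usage rows."""
    totals = defaultdict(float)
    for row in log_rows:
        totals[row["team_id"]] += request_cost(
            row["prompt_tokens"], row["completion_tokens"], prices[row["model"]]
        )
    return dict(totals)


if __name__ == "__main__":
    # Made-up prices for a made-up model, purely to exercise the functions.
    prices = {"example-model": (1e-6, 2e-6)}
    rows = [
        {"team_id": "marketing-team", "model": "example-model",
         "prompt_tokens": 1000, "completion_tokens": 500},
    ]
    print(spend_by_team(rows, prices))
```

Input and output tokens are priced separately because most providers charge different rates for each.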

Production checklist

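Health checks: point load balancer probes at /health/readiness or /health/liveliness, which need no key. Avoid probing /health itself: it requires the master key and calls every configured model on each check, which spends tokens.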
Secrets: keep all keys in environment variables or a secret manager — never commit them in config.yaml.
Backups: the Postgres database holds spend history and team keys — back it up regularly.
Fallbacks: configure fallback chains in config.yaml so a provider outage fails over automatically.
Observability: LiteLLM logs support export to Langfuse or LangSmith for tracing — see our observability cluster (coming).
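Scaling: run multiple LiteLLM replicas behind a load balancer; keys, budgets, and spend live in Postgres, so the replicas hold no durable state of their own. Rate limits that must hold across replicas also need a shared Redis instance; check the docs for the settings in your version.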
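The fallback item in the checklist describes behavior that LiteLLM handles inside the proxy once configured. For intuition about what a fallback chain does, here is a minimal sketch of the same idea in plain Python. The call_with_fallbacks helper is hypothetical and is not how you configure LiteLLM; in a real deployment the chain belongs in config.yaml so application code stays unchanged.

```python
def call_with_fallbacks(models, call, retryable=(Exception,)):
    """Try each model in order; return (model, result) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except retryable as exc:
            errors.append((model, exc))  # remember why this model failed
    raise RuntimeError(f"all models failed: {errors}")


if __name__ == "__main__":
    def fake_call(model):
        # Stand-in for a real completion call; the first model is "down".
        if model == "gpt-4o":
            raise TimeoutError("provider outage")
        return f"ok from {model}"

    print(call_with_fallbacks(["gpt-4o", "gpt-4o-mini"], fake_call))
```

Narrowing retryable to timeouts and rate-limit errors keeps the chain from masking genuine bugs such as malformed requests.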

Common pitfalls

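Spend or keys reset on restart: DATABASE_URL is missing or wrong, so the proxy is not writing to Postgres, where team keys and budgets live. Also confirm the litellm_db volume is mounted so the database itself survives a restart.
429 from providers: you are sending more traffic than the provider's quota allows. Set rpm limits at or below that quota, or configure fallbacks so excess traffic moves to another deployment.
Cost tracking shows zero: LiteLLM cannot resolve a price for the underlying model, which is typical for self-hosted or custom deployments such as llama-self-hosted. Set input_cost_per_token and output_cost_per_token on the deployment, and verify those key names against the docs for your version.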

Related deep dives

LiteLLM vs Portkey vs OpenRouter — confirm LiteLLM is the right gateway choice
LLM Cost Optimization Playbook — the 5 levers LiteLLM enables
Deploy vLLM in Production — pair self-hosted inference with LiteLLM routing

Sources

LiteLLM Documentation, "Docker Quick Start" and "Deploy (Docker, Helm, Terraform)," 2026 (docs.litellm.ai)
TECHSY, "LiteLLM Proxy: 1 API for 100+ LLMs (15-min Setup)," 2026
tanyongsheng, "LiteLLM Proxy for High-Availability LLM Services," 2026
BerriAI/litellm GitHub repository, accessed June 2026

LiteLLM ships frequent releases; config keys and Docker image tags change. Verify against docs.litellm.ai for your installed version before deploying these configs to production.