I had multiple frustrations with ChatGPT and Claude:

  • No good way to save and reuse prompts
  • $20/month per service for occasional use
  • Chat history scattered across services
  • No way to compare model responses side-by-side

No single pain was unbearable, but together they pushed me to build my own setup.

Why I Care About This

I experiment with different models—open source and proprietary—to understand their strengths and weaknesses. But this experimentation has a high mental and monetary cost.

What I Liked About Claude/ChatGPT

The subscription services have powerful features:

  • Chat history that persists
  • Memory that references past conversations
  • Multi-modal inputs (PDFs, images, etc.)
  • Multi-modal outputs (images, voice)
  • Tool support (web search, code execution)

These are great. But at $20/month per service, the costs add up fast—and scattered chat history across services undermines the memory features I actually wanted.

What I Tried Before

I initially wrote custom code to interact with model APIs directly. Problems:

  • No nice UI/chat interface
  • Lots of work to reproduce features ChatGPT/Claude already have
  • Not shareable with non-technical friends or family

Not sustainable. Time to explore what’s already out there.

What I Was Looking For

I wanted a single tool that offered:

  1. A unified chat interface with the features I loved from ChatGPT/Claude
  2. Support for multiple models via API keys (including local models)
  3. Built-in prompt management
  4. Tool support (web search, custom tools)
  5. Agent workflows

I expected I’d need to compromise—maybe sacrifice some features or build custom integrations. I hoped I wouldn’t have to build everything from scratch.

What I Actually Found

No single tool did everything. I needed to compose two:

  • OpenWebUI for the interface (includes chat history, prompt management, memory via plugins)
  • LiteLLM as a multi-model API gateway (translates between different provider APIs)

This got me 4 out of 5 requirements. The missing piece: agent support. OpenWebUI has pipelines that could theoretically support agentic workflows, but native agent support isn't on the roadmap yet. I'll need to find another solution for that.

The Actual Setup

The architecture is straightforward:

  • OpenWebUI (Docker container on a $5 VPS)
  • LiteLLM (API gateway, also dockerized)
  • Tailscale (VPN to secure access—only authorized devices can reach it)
  • API keys for Claude and GPT-4, plus Ollama for local models
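As a rough sketch, the first two pieces can be wired together with docker-compose. Image names, ports, and the environment variable are the defaults I'd expect at the time of writing; the Tailscale and Ollama pieces are left out for brevity, so treat this as a starting point rather than a drop-in file:

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml"]
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    ports:
      - "4000:4000"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Point OpenWebUI at LiteLLM's OpenAI-compatible endpoint
      - OPENAI_API_BASE_URL=http://litellm:4000/v1
    ports:
      - "3000:8080"
    depends_on:
      - litellm
```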

OpenWebUI talks to LiteLLM, which translates requests to the appropriate API format for each provider (Claude, OpenAI, Ollama).
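That translation is driven by LiteLLM's `model_list` configuration, which maps a unified model name to a provider-prefixed model and its credentials. A minimal example (the specific model identifiers here are illustrative; substitute current ones):

```yaml
model_list:
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
```

Any of these names can then be requested through the same OpenAI-style `/v1/chat/completions` endpoint, which is all OpenWebUI needs to see.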

Setup time: ~2 hours, mostly reading documentation and generating API keys.

Monthly costs:

  • $5 VPS (Linode/Hetzner)
  • $5-15 API usage (varies with how heavily I use Claude/GPT-4)
  • Total: ~$10-20/month vs. $40/month for subscriptions

The setup was surprisingly smooth—no gotchas, no debugging sessions, just straightforward configuration. Rare for self-hosted projects.

Screenshot: side-by-side model comparison in OpenWebUI, asking the same question to Claude, GPT-3.5, and two local models (gpt-oss and gemma3)

What I Learned

  1. Terminology mismatch: OpenWebUI’s concept of “model” includes the base LLM + custom system prompt. Coming from thinking of “model” as just the base LLM (GPT-4, Claude, etc.), this took me a minute to understand.

  2. Needed two tools, not one: I expected one tool would handle everything. Reality: OpenWebUI handles the UI, LiteLLM handles the multi-API gateway. Neither works well alone.

  3. The real value isn’t cost: I built this to save money, but the actual benefits are:

    • Unified prompt library across all models
    • Side-by-side comparison in one interface
    • Complete chat history in one searchable place
    • Privacy and data control

  4. Bonus discovery: This setup naturally supports multiple users (family sharing) without additional costs or compromising privacy. Haven’t implemented this yet, but it’s architecturally possible—something I didn’t consider when I started.
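Lesson 2 is easier to see in miniature. This toy sketch is my own illustration of what a gateway does, not LiteLLM's actual code: accept one request shape, then route it to a provider-specific endpoint and payload.

```python
# Toy illustration of an API gateway's job: one request shape in,
# provider-specific endpoint and payload out. (Not LiteLLM's actual code;
# names and endpoints are for illustration.)

PROVIDERS = {
    "claude":      ("anthropic", "https://api.anthropic.com/v1/messages"),
    "gpt-4":       ("openai",    "https://api.openai.com/v1/chat/completions"),
    "local-llama": ("ollama",    "http://localhost:11434/api/chat"),
}

def route(model: str, prompt: str) -> dict:
    """Translate a unified chat request into a provider-specific one."""
    provider, url = PROVIDERS[model]
    messages = [{"role": "user", "content": prompt}]
    if provider == "anthropic":
        # Anthropic's Messages API requires max_tokens and its own endpoint
        payload = {"model": model, "max_tokens": 1024, "messages": messages}
    else:
        # OpenAI and Ollama both accept an OpenAI-style chat payload
        payload = {"model": model, "messages": messages}
    return {"url": url, "payload": payload}
```

Calling `route("claude", ...)` versus `route("gpt-4", ...)` differs only in the name; the gateway absorbs the provider differences. LiteLLM does this for far more providers, plus auth, retries, and spend tracking, which is why it earns its place as a second tool.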

Should You Do This?

Worth it if you:

  • Are comfortable with Docker and basic VPS management
  • Use multiple LLMs regularly (not just occasionally)
  • Care about privacy and want control over your data
  • Want all your chat history in one searchable place
  • Love experimenting with the latest AI models (often available via API before official chat releases)

Skip this if you:

  • Use ChatGPT casually (a few times a week)
  • Don’t want to maintain infrastructure
  • Are happy jumping between services to find chats
  • Just need something that works without caring about the details

My honest take:

The 2-hour setup was worth it. I now have one place to access and compare all the LLM models I use. The savings ($40 → $10-20/month) are nice, but the real win is unified chat history. Being able to search across every conversation—regardless of which model I used—unlocks the memory features in a way scattered chat histories never could.

If I did this again? I wouldn’t change a thing. The process was painless, which is rare for self-hosted projects.

What’s Next

This is the first post in a series on my LLM playground setup. I’m planning to write about:

  • Cost analysis with real API usage data
  • My prompt library workflow
  • How I compare models side-by-side
  • Possibly: family sharing setup and local models deep-dive

No fixed schedule—I’ll write these as I have time and as they become useful to document. If you’re building something similar or have questions, reach out.