In modern AI applications, understanding how much you’re spending on large language model (LLM) calls — and why— is no longer optional. Developers and platform teams alike often struggle to answer questions such as:
-
Which models are costing us the most?
-
How much does each team or feature contribute to the bill?
-
Can we centralize logs and costs from multiple providers?
LiteLLM — an open-source LLM gateway — paired with a PostgreSQL backend and its built-in UI provides an elegant self-hosted solution to answer these questions with clarity and control.
What Is LiteLLM and Why It Matters
LiteLLM acts as a proxy between your applications and multiple LLM providers like OpenAI, Anthropic, Azure, and others. It standardizes API calls, offers unified spend tracking, and supports advanced features such as budgets, rate limits, and tagging — all without needing separate tooling for each provider.
The benefits include:
-
Centralized visibility across providers
-
Automatic cost attribution per API request
-
Governance with budgets and rate limits
-
Built-in UI for exploring logs, token usage, and spend patterns
This approach means you no longer have to stitch together custom dashboards or query individual logs from every model service you use.
Step-by-Step Setup
1. Provision a PostgreSQL Database
First, deploy a PostgreSQL instance — whether locally, in Docker, or via a hosted service. This database will store all:
You can use tools like pgAdmin alongside Postgres to inspect or query this data visually.
2. Configure LiteLLM with PostgreSQL
Create a configuration file (e.g., litellm_config.yaml) where you define:
-
Your database connection string
-
LLM models you want to proxy
-
The master API key your clients will use
Example snippet:
With this in place, LiteLLM will log every request into PostgreSQL so that spend and usage can be analyzed later.
3. Launch LiteLLM Proxy
Run the LiteLLM process, pointing it to your config file. If you’re using Docker, mount the config and expose the proxy port — typically 4000 — so your applications can connect:
Once running, all LLM requests sent through this proxy will be logged into PostgreSQL with usage details and cost info.
4. Send Test Requests
You can then send requests through LiteLLM using your favorite client — whether it’s Python, curl, or frameworks like LangChain:
Every request will now be reflected in PostgreSQL, capturing token counts, costs, model used, and any metadata you attach — like project tags or user IDs.
5. Explore Spend with the Built-In Dashboard
LiteLLM comes with a helpful web interface accessible at:
In this UI, you can:
-
View token usage and cost over time
-
Break down spend by model, user, tag, or API key
-
Spot anomalies or cost spikes quickly
-
Drill into individual request logs
This dashboard makes it easy for platform engineers or cost accountants to visualize exactly what’s happening in their AI stack without writing complex SQL queries.
Making It Even More Powerful
Once live, you can extend the setup further:
- Attach tags to your requests to categorize spend by team or feature.
- Set budgets or rate limits for specific tags so departments stay within allocated spend.
- Integrate telemetry systems (e.g., PostHog, Datadog) for deeper analytics across performance and cost.
Tracking LLM usage and cost doesn’t have to be ad-hoc or opaque. By routing all AI calls through LiteLLM with PostgreSQL storage and its UI, you build a transparent, extensible, and governable cost monitoring pipeline.
This approach equips teams with the data they need to optimize spending, analyze usage patterns, and build more cost-efficient AI systems — all on a self-hosted stack they control.