Tracking LLM Usage and Cost with LiteLLM + PostgreSQL

By Wendy Munoz

  

In today's AI world, keeping a close eye on how much you're spending on large language model (LLM) usage isn't just a “nice to have” — it's essential. Whether you’re a platform team managing access or a development group optimizing costs, LiteLLM paired with a PostgreSQL backend can give you clear, granular visibility into your usage patterns and spending. In this post, we'll walk through how to set it up and leverage its built-in tools to make data-driven decisions.

What Is LiteLLM, and Why Use It?

LiteLLM is an LLM gateway (proxy) that supports 100+ models from providers such as OpenAI, Azure, and Anthropic.
It offers:

  • Unified API access across multiple LLM providers

  • Cost tracking (models, users, API keys)

  • Budgeting and rate limiting via tags

  • Observability tools and logging integrations

Because LiteLLM is compatible with many LLM providers, you don’t need to build custom spend-tracking for each one: you centralize it.
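
To make the unified API concrete, here is a minimal sketch using the LiteLLM Python SDK (pip install litellm). The model names are illustrative, and provider credentials are assumed to be set as environment variables:

from litellm import completion

# Assumes OPENAI_API_KEY / ANTHROPIC_API_KEY are set in the environment;
# the model names below are illustrative.
messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# The same completion() call works across providers; only the model string changes.
openai_resp = completion(model="gpt-4", messages=messages)
anthropic_resp = completion(model="claude-3-haiku-20240307", messages=messages)

print(openai_resp.choices[0].message.content)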

Setting Up the Infrastructure

1. Deploy PostgreSQL

Start by deploying a PostgreSQL database. You can use Docker + docker-compose (or your preferred setup), and optionally pgAdmin if you want a UI for exploring the data; a quick connectivity check is sketched after the list below. The database will store:

  • Users, teams, organizations

  • Virtual API keys

  • Budget configurations

  • Detailed usage logs for each API request
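
Before pointing LiteLLM at the database, it can help to confirm it is reachable. A quick connectivity check, assuming psycopg2 is installed (pip install psycopg2-binary) and using placeholder credentials you would replace with your own:

import psycopg2

# Placeholder connection details; substitute the host, database name, and
# credentials from your own PostgreSQL deployment.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="litellm_db",
    user="user",
    password="password",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # e.g. "PostgreSQL 16.x ..."
conn.close()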

2. Configure LiteLLM to Use Postgres

In your litellm_config.yaml, set up the database URL to point to your Postgres instance. For example:

general_settings:
  master_key: "YOUR_MASTER_KEY"
  database_url: "postgresql://user:password@host:5432/litellm_db"

model_list:
  - model_name: "gpt-4"
    litellm_params:
      api_key: "…"
      api_base: "https://api.openai.com/v1"
      rpm: 10
      max_tokens: 400
      temperature: 0.7

Make sure that master_key matches what your clients will use to authenticate with the LiteLLM proxy.

3. Run the LiteLLM Proxy

Once configured, start LiteLLM (for example, via Docker). It will proxy all your LLM requests, log them to Postgres, and compute cost per call automatically.
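
Before sending real traffic, you can probe the proxy's health endpoint. The sketch below assumes the proxy is listening on its default port 4000 and that the requests library is installed:

import requests

# LiteLLM's proxy exposes /health/liveliness for liveness probes; adjust
# the host and port if you mapped them differently in Docker.
resp = requests.get("http://localhost:4000/health/liveliness")
print(resp.status_code, resp.text)  # expect HTTP 200 with a short "alive" message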

4. Send Test Requests

You can send test LLM requests via Python (using the OpenAI SDK pointed at the proxy, or the LiteLLM SDK), curl, or LangChain. For example, using Python with the OpenAI client:

import openai

client = openai.OpenAI(
    api_key="sk-…",
    base_url="http://your-litellm-host",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello, LLM!"}],
    user="user-id-123",
    extra_body={"metadata": {"tags": ["project:alpha", "team:frontend"]}},
)
print(response)

By passing metadata tags in your requests, you can attribute spend to specific teams, projects, or use cases.

Analyzing Spend & Usage

LiteLLM provides APIs for exploring usage:

  • Daily spend breakdown: You can query /user/daily/activity over a date range to get per-day metrics: spend, prompt tokens, completion tokens, and number of requests, broken down by model, provider, or API key (see the sketch after this list).

  • User-level totals: Get a summary of total spend for a user and their associated API keys.

  • UI dashboard: LiteLLM includes a built-in web UI (usually at http://<proxy-host>/ui) where you can explore logs and break down cost by model, user, and tag.
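
As a sketch of the daily-breakdown bullet above, the snippet below queries /user/daily/activity through the proxy using the master key. The parameter names and response fields here are assumptions based on the docs this post cites, so verify them against your LiteLLM version:

import requests

# Pull per-day spend metrics for a date range. Treat "start_date"/"end_date"
# and the response shape ("results", "metrics", "spend") as assumptions to
# confirm against your LiteLLM version.
resp = requests.get(
    "http://localhost:4000/user/daily/activity",
    headers={"Authorization": "Bearer YOUR_MASTER_KEY"},
    params={"start_date": "2025-06-01", "end_date": "2025-06-30"},
)
resp.raise_for_status()
for day in resp.json().get("results", []):
    print(day.get("date"), day.get("metrics", {}).get("spend"))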

Advanced Options: Tag Budgets & Rate Limits

You can enforce budgets based on tags: for example, limit spending for "marketing" vs. "engineering", or "chatbot-research" vs. "summarization". This gives you strong cost governance: if you hit a "soft" budget, you can throttle or block requests, or raise alerts.
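
Budgets and rate limits can also be attached to virtual keys. /key/generate is LiteLLM's key-management endpoint; the exact field set below is an assumption to confirm against your proxy version:

import requests

# Issue a virtual API key with its own budget and rate limit; confirm the
# exact fields against your proxy version's /key/generate docs.
resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer YOUR_MASTER_KEY"},
    json={
        "max_budget": 50.0,        # USD cap for this key
        "budget_duration": "30d",  # budget window before reset
        "rpm_limit": 10,           # requests per minute
        "metadata": {"tags": ["team:marketing"]},
    },
)
print(resp.json()["key"])  # hand this virtual key to the team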

Scaling Considerations

If your LiteLLM usage logs grow very large (e.g., 1M+ rows), queries against Postgres can become slow.
To mitigate:

  • Export logs from Postgres to object storage or a data warehouse (S3, GCS, Snowflake, etc.); a minimal export sketch follows this list

  • Perform analysis in a separate analytics system (Redash, Databricks, etc.)

  • Optionally, disable real-time logging in the proxy for long-term production loads
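
As a minimal example of the export path, the sketch below pulls the last week of rows from the spend-logs table with pandas and SQLAlchemy. The table and column names ("LiteLLM_SpendLogs", "startTime") follow LiteLLM's Prisma schema, but verify them in your own database first:

import pandas as pd
from sqlalchemy import create_engine

# Connection string and table/column names are assumptions; check your
# database. Writing Parquet requires pyarrow or fastparquet.
engine = create_engine("postgresql://user:password@host:5432/litellm_db")
query = """
    SELECT *
    FROM "LiteLLM_SpendLogs"
    WHERE "startTime" >= NOW() - INTERVAL '7 days'
"""
df = pd.read_sql(query, engine)
df.to_parquet("spend_logs_last_7d.parquet")  # ship this file to S3/GCS/Snowflake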

Complementary Tools

  • llm-usage-tracker: A Python library that can monkey-patch LLM calls (OpenAI, LiteLLM, Gemini) to automatically compute token usage and cost.

  • llm-accounting: A package to track LLM usage, costs, tokens, and more, with support for PostgreSQL or SQLite as a backend.

  • PostHog integration: You can send LiteLLM usage events to PostHog for behavioral analytics, combining cost tracking with user journeys (a wiring sketch follows this list).
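
For the PostHog route, "posthog" is one of LiteLLM's documented logging callbacks. A short wiring sketch in the Python SDK, with environment-variable names taken from LiteLLM's logging docs (verify for your version):

import os
import litellm
from litellm import completion

# PostHog credentials; the env-var names follow LiteLLM's logging docs.
os.environ["POSTHOG_API_KEY"] = "phc_your_project_key"  # placeholder
os.environ["POSTHOG_API_URL"] = "https://app.posthog.com"

# Log every successful LLM call to PostHog.
litellm.success_callback = ["posthog"]

completion(model="gpt-4", messages=[{"role": "user", "content": "Hello!"}])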

Why This Setup Is Powerful

  1. Centralized cost attribution — All LLM calls go through LiteLLM, so you have a single source of truth for spend.

  2. Granular visibility — By using metadata tags, you can break down usage by project, user, team, or feature.

  3. Governance & control — You can enforce budgets, rate limits, or other rules per tag or user.

  4. Open architecture — Because you own the Postgres DB, you can export, aggregate, or analyze data however you want.

  5. Scalability — While LiteLLM's built-in UI is great for smaller workloads, you can scale to big data workflows using standard analytics tools.

If you're building or operating LLM-based systems, cost visibility is not just a back-office concern — it's essential for optimizing performance, managing risk, and scaling responsibly. Using LiteLLM with PostgreSQL gives you a self-hosted, transparent, and flexible foundation for tracking LLM usage and spend. Once it's in place, you can monitor, enforce, and optimize — making your AI infrastructure both powerful and cost-conscious.
