A Battle-Tested Guide: Migrating Grafana from SQLite to PostgreSQL on Kubernetes

By Hiren Dave posted 8 days ago

  


Upgrading your Grafana backend from the default SQLite to a production-grade PostgreSQL database is a critical step for scalability, reliability, and high availability. While the concept is simple, the journey on a live Kubernetes environment is often fraught with subtle pitfalls—from data migration woes to cryptic database errors.

This guide provides a complete, battle-tested walkthrough for migrating a Grafana 8 instance. We won't just show you the ideal path; we'll show you how to do it safely by cloning your production environment first, and then we'll dive deep into the real-world errors you're likely to face and exactly how to solve them.

   

Why Migrate from SQLite to PostgreSQL?

If you're running a serious Grafana setup, the default SQLite database is a bottleneck. Migrating to PostgreSQL unlocks:

  • High Availability (HA): Run multiple Grafana replicas simultaneously against a shared database.
  • Scalability: Effortlessly handle thousands of users, dashboards, and alert rules.
  • Robust Backups: Integrate your Grafana data into standard, reliable database backup and recovery workflows.
  • Performance: Eliminate the concurrency limitations of a single-file database.

    

The Game Plan: A Four-Phase Approach

We'll tackle this migration in four distinct phases to ensure safety and success.

  1. Phase 1: Clone Production. We'll create a perfect, isolated clone of your Grafana 8 (SQLite) instance in a new namespace. This is our risk-free sandbox.
  2. Phase 2: The pgloader Migration. We will deploy a new PostgreSQL database and use the powerful pgloader tool to migrate the data from the cloned SQLite database.
  3. Phase 3: The Real World Strikes Back. We'll confront and solve the inevitable schema mismatch errors that trip up most migrations.
  4. Phase 4: Finalize and Clean Up. We'll configure the cloned Grafana to be fully stateless and ready for a true HA deployment.

     


Phase 1: Clone Your Production Grafana (The Safe Way)

Never operate on a live system. Our first step is to replicate the production environment, including its data, in a new, isolated namespace.

Let's assume your production Grafana runs in the default namespace, and we will create our clone in a new namespace called grafana-migration.

    

1. Prepare the New Namespace and Manifests

First, get the YAML definitions for all your existing production components.

# Create a dedicated directory
mkdir grafana-clone && cd grafana-clone

# Backup all relevant production components from the 'default' namespace
kubectl get deployment grafana -n default -o yaml > deployment.yaml
kubectl get pvc grafana-db-volume -n default -o yaml > grafana-pvc.yaml
kubectl get cm grafana-config -n default -o yaml > grafana-config.yaml
# ... and so on for services, secrets, etc.

Now, create the new namespace and update your backup files to use it.

export NEW_NAMESPACE="grafana-migration"
kubectl create namespace $NEW_NAMESPACE

# Use sed to replace 'namespace: default' with our new namespace in all YAML files
for file in *.yaml; do
  sed -i.bak "s/namespace: default/namespace: $NEW_NAMESPACE/g" "$file"
done
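One pitfall worth noting: `kubectl get -o yaml` exports include cluster-generated, read-only fields (resourceVersion, uid, creationTimestamp, and the whole status block) that should be stripped before re-applying the manifests in the new namespace. Below is a minimal, self-contained sketch of that cleanup run against a stand-in manifest (real exports carry more fields; for anything complex, a YAML-aware tool like yq is safer than line-based sed):

```shell
# Stand-in for a real 'kubectl get -o yaml' export (hypothetical content)
cat > /tmp/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: default
  resourceVersion: "123456"
  uid: 6b9d7f0a-0000-0000-0000-000000000000
  creationTimestamp: "2024-01-01T00:00:00Z"
spec:
  replicas: 1
status:
  readyReplicas: 1
EOF

# Drop the read-only fields; 'status:' is the last top-level block in
# kubectl's output, so deleting from it to end-of-file works here
sed -i.bak \
  -e '/resourceVersion:/d' \
  -e '/uid:/d' \
  -e '/creationTimestamp:/d' \
  -e '/^status:/,$d' \
  /tmp/deployment.yaml

cat /tmp/deployment.yaml
```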

    

2. Copy the SQLite Database (grafana.db)

This is the most critical part of the cloning process. We need to copy the grafana.db file from the production PVC to a new PVC for our clone. One caveat up front: PVCs are namespace-scoped, so a pod in grafana-migration cannot mount the original claim from default directly. Instead, we'll use a temporary "utility" pod that mounts the new clone PVC, and shuttle the database file through kubectl cp.

  1. Deploy the New PVC: Apply the modified grafana-pvc.yaml to create an empty volume for the clone.

    # Make sure the PVC name in grafana-pvc.yaml is unique, e.g., 'grafana-db-clone-volume'
    # Then apply it:
    kubectl apply -f grafana-pvc.yaml -n $NEW_NAMESPACE
    
  2. Create a Utility Pod: Create a file named utility-pod.yaml with the following content. This pod mounts the new clone PVC (read-write) so we can write the copied database into it.

    apiVersion: v1
    kind: Pod
    metadata:
      name: grafana-copier-pod
      namespace: grafana-migration
    spec:
      containers:
      - name: utility
        image: busybox
        command: ["/bin/sh", "-c", "sleep 3600"]
        volumeMounts:
        - name: dest-grafana-data
          mountPath: /dest
      volumes:
      - name: dest-grafana-data
        persistentVolumeClaim:
          claimName: grafana-db-clone-volume # The NAME of your NEW PVC
    
  3. Deploy the Pod and Copy the Data: Pull grafana.db out of the running production pod, then push it into the clone PVC. (Copying a live SQLite file can catch it mid-write; run this during a quiet period, or briefly stop production writes if you can afford to.)

    kubectl apply -f utility-pod.yaml -n $NEW_NAMESPACE
    
    # Wait for the pod to be running
    kubectl wait --for=condition=ready pod/grafana-copier-pod -n $NEW_NAMESPACE
    
    # Pull the database from the production Grafana pod
    # (find the pod name with 'kubectl get pods -n default')
    kubectl cp default/<production-grafana-pod>:/var/lib/grafana/grafana.db ./grafana.db
    
    # Push it into the clone PVC
    kubectl cp ./grafana.db $NEW_NAMESPACE/grafana-copier-pod:/dest/grafana.db
    
    # Clean up the utility pod
    kubectl delete pod grafana-copier-pod -n $NEW_NAMESPACE
    

    

3. Deploy and Verify the Cloned Grafana Instance

Now, deploy the cloned Grafana using your modified manifests. Make sure deployment.yaml points to the new grafana-db-clone-volume PVC.

kubectl apply -f grafana-config.yaml -n $NEW_NAMESPACE
kubectl apply -f deployment.yaml -n $NEW_NAMESPACE
kubectl apply -f service.yaml -n $NEW_NAMESPACE

Access the new Grafana instance via its service IP or LoadBalancer. Log in and confirm it's a perfect replica of production. You now have a safe environment for the real migration.

     


Phase 2: The pgloader Migration

With our cloned Grafana 8 running on SQLite, we can now begin the migration to PostgreSQL. We'll use pgloader, a powerful tool for this exact purpose.

    

1. Deploy a New PostgreSQL Instance

First, deploy a new Postgres database in your grafana-migration namespace. You can use a simple StatefulSet for this.

postgres-statefulset.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-ss
  namespace: grafana-migration
spec:
  serviceName: postgres-svc
  replicas: 1
  selector:
    matchLabels:
      app: postgres-ss
  template:
    metadata:
      labels:
        app: postgres-ss
    spec:
      containers:
      - name: postgres
        image: postgres:14
        env:
        - name: POSTGRES_DB
          value: "grafana"
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

Deploy the StatefulSet along with a corresponding Secret and Service.
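The Secret and Service are easy to overlook, so here is a minimal sketch. The names (postgres-secret, postgres-svc) match what the StatefulSet above references; the password is a placeholder you must replace, and the Service is headless so it doubles as the StatefulSet's serviceName:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: grafana-migration
type: Opaque
stringData:
  username: postgres
  password: CHANGE_ME   # placeholder -- substitute a real value
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-svc
  namespace: grafana-migration
spec:
  clusterIP: None  # headless; matches the StatefulSet's serviceName
  selector:
    app: postgres-ss
  ports:
  - name: postgres
    port: 5432
    targetPort: 5432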

    

2. The pgloader Workflow

This process ensures we get a perfect schema and data transfer.

  1. Stop the Cloned Grafana Server:

    kubectl scale deployment grafana --replicas=0 -n $NEW_NAMESPACE
    
  2. Get the SQLite Database File: We need the grafana.db file from our cloned PVC. Since kubectl cp requires a running pod and Grafana is now scaled down, re-apply the utility pod from Phase 1 (it mounts the clone PVC at /dest) and copy the file to your local machine for pgloader:

    kubectl apply -f utility-pod.yaml -n $NEW_NAMESPACE
    kubectl wait --for=condition=ready pod/grafana-copier-pod -n $NEW_NAMESPACE
    kubectl cp $NEW_NAMESPACE/grafana-copier-pod:/dest/grafana.db ./grafana.db
    kubectl delete pod grafana-copier-pod -n $NEW_NAMESPACE
    
  3. Prepare the Postgres Schema: This is a crucial step. We let Grafana create the schema first, then we use that as a perfect template.

    • Temporarily modify the Grafana deployment YAML to point to the new Postgres DB.
    • Scale Grafana up: kubectl scale deployment grafana --replicas=1 -n $NEW_NAMESPACE.
    • Wait for the pod to start and then crash (it will). This is enough to create the initial tables.
    • Dump the schema: kubectl exec -n $NEW_NAMESPACE postgres-ss-0 -- pg_dump --schema-only -U postgres grafana > grafana-schema.sql.
    • Scale Grafana back down: kubectl scale deployment grafana --replicas=0 -n $NEW_NAMESPACE.
  4. Wipe and Recreate the Postgres Database:

    # Exec into the pod
    kubectl exec -n $NEW_NAMESPACE -it postgres-ss-0 -- bash
    
    # Inside the pod:
    dropdb -U postgres grafana
    createdb -U postgres grafana
    exit
    
  5. Restore the Clean Schema:

    cat grafana-schema.sql | kubectl exec -n $NEW_NAMESPACE -i postgres-ss-0 -- psql -U postgres -d grafana
    
  6. Migrate Data with pgloader: Create a file named main.load on your local machine:

    LOAD DATABASE
      FROM sqlite:///path/to/your/grafana.db
      INTO postgresql://postgres:YOUR_PASSWORD@localhost:5432/grafana
    WITH data only, reset sequences
    SET work_mem to '16MB', maintenance_work_mem to '512MB';
    

    To run this, we'll port-forward the Postgres service and run pgloader locally.

    # In one terminal, run port-forward
    kubectl port-forward svc/postgres-svc 5432:5432 -n $NEW_NAMESPACE
    
    # In another terminal, run pgloader
    pgloader main.load
    

    Wait for the migration to complete.

  7. Start Grafana and Verify:

    • Ensure your Grafana deployment permanently points to the Postgres service DNS (postgres-svc.grafana-migration.svc.cluster.local:5432).
    • Scale the deployment back up: kubectl scale deployment grafana --replicas=1 -n $NEW_NAMESPACE.
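Steps 3 and 7 both hinge on "pointing Grafana at Postgres," which the manifests above never spell out. One clean way is environment variables on the Grafana container (each GF_DATABASE_* variable overrides the corresponding grafana.ini setting); a sketch, reusing the postgres-secret from the StatefulSet:

# In the Grafana deployment's container spec:
env:
- name: GF_DATABASE_TYPE
  value: "postgres"
- name: GF_DATABASE_HOST
  value: "postgres-svc.grafana-migration.svc.cluster.local:5432"
- name: GF_DATABASE_NAME
  value: "grafana"
- name: GF_DATABASE_USER
  valueFrom:
    secretKeyRef:
      name: postgres-secret
      key: username
- name: GF_DATABASE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: postgres-secret
      key: password
- name: GF_DATABASE_SSL_MODE
  value: "disable"   # in-cluster traffic; enable SSL for anything external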

At this point, you expect victory. But this is where the real-world issues begin.

     


Phase 3: The Real World Strikes Back - Troubleshooting the BIGINT Error

Even after following the pgloader process, we were hit with the dreaded invalid input syntax for type bigint: "true" error in our Grafana logs.

The Diagnosis: The schema dumped by Grafana 8 for a fresh Postgres install still defined boolean-like columns (is_admin, is_disabled, etc.) as BIGINT. The pgloader process correctly loaded 0s and 1s into these columns. But the Grafana application code itself, when performing updates, was trying to send the keyword true, causing PostgreSQL to reject the query.

The Definitive Solution: Fix the Schema with ALTER TABLE

The pgloader method got us 90% of the way there, but we had to manually correct the schema to match what the application code expected.

Connect to your Postgres pod again (kubectl exec ...) and run the full set of schema correction commands.

-- Fix the 'alert_notification' table
ALTER TABLE alert_notification ALTER COLUMN is_default DROP DEFAULT;
ALTER TABLE alert_notification ALTER COLUMN is_default TYPE BOOLEAN USING (is_default::integer = 1);
ALTER TABLE alert_notification ALTER COLUMN is_default SET DEFAULT false;
-- Repeat the same pattern for the other boolean-like columns
ALTER TABLE alert_notification ALTER COLUMN send_reminder DROP DEFAULT;
ALTER TABLE alert_notification ALTER COLUMN send_reminder TYPE BOOLEAN USING (send_reminder::integer = 1);
ALTER TABLE alert_notification ALTER COLUMN send_reminder SET DEFAULT false;
ALTER TABLE alert_notification ALTER COLUMN disable_resolve_message DROP DEFAULT;
ALTER TABLE alert_notification ALTER COLUMN disable_resolve_message TYPE BOOLEAN USING (disable_resolve_message::integer = 1);
ALTER TABLE alert_notification ALTER COLUMN disable_resolve_message SET DEFAULT false;

-- Fix the 'user_auth_token' table
ALTER TABLE user_auth_token ALTER COLUMN auth_token_seen TYPE BOOLEAN USING (auth_token_seen::integer = 1);

-- Fix the 'user' table
ALTER TABLE "user" ALTER COLUMN is_disabled DROP DEFAULT;
ALTER TABLE "user" ALTER COLUMN is_disabled TYPE BOOLEAN USING (is_disabled::integer = 1);
ALTER TABLE "user" ALTER COLUMN is_disabled SET DEFAULT false;

ALTER TABLE "user" ALTER COLUMN is_admin TYPE BOOLEAN USING (is_admin::integer = 1);
ALTER TABLE "user" ALTER COLUMN is_admin SET DEFAULT false;

ALTER TABLE "user" ALTER COLUMN email_verified TYPE BOOLEAN USING (email_verified::integer = 1);
ALTER TABLE "user" ALTER COLUMN email_verified SET DEFAULT false;
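After applying the fixes, it's worth auditing for any boolean-like columns we missed. A quick heuristic query (the name patterns here are just guesses, not an exhaustive list — eyeball the results before altering anything):

-- List BIGINT columns whose names suggest they should be BOOLEAN
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type = 'bigint'
  AND (column_name LIKE 'is\_%'
       OR column_name IN ('send_reminder', 'disable_resolve_message',
                          'auth_token_seen', 'email_verified'));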

After running these commands and restarting the Grafana pod one last time, the errors vanished. Grafana started successfully against a fully migrated and schema-corrected PostgreSQL database.

     


Phase 4: Final Cleanup

Your Grafana is now running on Postgres, but the deployment YAML still has a mount for the old SQLite PVC. Let's remove it to make our Grafana deployment truly stateless.

  1. Edit the Deployment:

    kubectl edit deployment grafana -n $NEW_NAMESPACE
    
  2. Remove the Volume Mount: Delete the entry for /var/lib/grafana/ from the volumeMounts section.

  3. Remove the Volume: Delete the corresponding grafana-database volume from the volumes section.

  4. Save and Exit. Kubernetes will roll out a new pod without the PVC attached.

  5. Delete the Obsolete PVC:

    kubectl delete pvc grafana-db-clone-volume -n $NEW_NAMESPACE
    

You now have a clean, scalable Grafana instance ready for high availability. You can safely scale the deployment to 2 or more replicas.

     


Bonus: Upgrading the Clone to Grafana 9

With our clone running, we could safely test the upgrade to Grafana 9.

1. Migrate to Unified Alerting (On Grafana 8)

Grafana 9 removes legacy alerting. You must migrate your alerts before upgrading.

  1. Enable the Feature Toggle: Edit your grafana-config ConfigMap and add the ngalert feature toggle to grafana.ini.
    [feature_toggles]
    enable = ngalert
    
  2. Restart Grafana: kubectl rollout restart deployment/grafana -n $NEW_NAMESPACE
  3. Run the Migration Tool: Log in to the Grafana 8 UI, go to the Alerting page, and use the now-visible migration tool to convert your legacy alerts to Grafana-managed alerts.
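If you manage grafana.ini through the ConfigMap, the change looks roughly like this — a sketch that assumes your grafana-config ConfigMap exposes grafana.ini as a data key; merge the section into your existing file rather than overwriting it:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: grafana-migration
data:
  grafana.ini: |
    ; ...your existing settings...
    [feature_toggles]
    enable = ngalert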

2. Perform the Upgrade

This is the easy part. Simply edit the deployment and change the image tag.

kubectl edit deployment/grafana -n $NEW_NAMESPACE

Change this line:

# FROM:
image: grafana/grafana:8.5.26
# TO:
image: grafana/grafana:9.5.21

Save, and Kubernetes will roll out Grafana 9.

       


Conclusion

Migrating Grafana on Kubernetes is a prime example of where theory meets reality. While tools like pgloader are invaluable, they don't always account for application-level expectations or subtle schema differences. By adopting a safe cloning strategy and being prepared to dive into the database to correct the schema, you can navigate the migration confidently. The result is a robust, production-ready monitoring platform that can grow with your needs.

     

