IBM Guardium

Long Term Retention in Guardium Data Protection 12.2.1 - low cost way to store and query years of cold data

By POLLY LAU posted Mon February 02, 2026 06:59 PM

  

A simpler, low‑cost way to store and query years of cold data

The use case

As someone who works with Guardium clients around the world, I hear this challenge from time to time: “We need to keep years of audit data for compliance, but we rarely access it. And when we do need it, the process of digging it out can be tedious.”

If you are a Guardium user, this probably sounds familiar.  Maybe Legal has asked for logs from four or five years ago, or Compliance needs historical visibility for a case that resurfaced long after the fact.

With Guardium Data Protection 12.2.1, we introduced a new approach to exactly that problem, one that is easier to understand and makes historical data far easier to retrieve.

We call it Long‑Term Retention (LTR).

LTR gives you a modern, cloud‑friendly way to store cold historical data in your own S3‑compatible storage, while still being able to query it on demand — no restores, no archive recovery. Think of it as a lightweight data lakehouse layer purpose‑built for your Guardium audit data: low‑cost storage, modern formats, and online querying when you need it.

How it actually works

The architecture introduces two new pieces:

1.    Your S3 storage

This is your infrastructure — AWS S3, MinIO, whatever you choose. This is where the actual datamart files live, converted to efficient Parquet format for fast querying.

2.    New LTR appliance

This is a new Guardium unit type that sits between your Central Manager and the S3 storage. It houses:

-       Trino Query Engine (the coordinator and workers that actually run your cold queries)

-       Hive Metastore Catalog (keeps track of where all your data lives in S3)

Think of this as your query powerhouse that knows how to make sense of years of data sitting in object storage.
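To make the catalog's role concrete, here is a minimal Python sketch of a partition lookup. It is not how Guardium or the Hive Metastore is implemented: the bucket name, the table name, and the `day=YYYY-MM-DD` layout are all assumptions chosen for illustration. Hive-style `key=value` prefixes are a common convention, but this post does not describe how LTR lays out its files.

```python
from datetime import date

# Hypothetical catalog: partition day -> S3 prefix holding that day's Parquet files.
# A real metastore keeps this mapping in a database; a dict stands in for it here.
CATALOG = {
    date(2021, 3, 1): "s3://example-ltr-bucket/session_datamart/day=2021-03-01/",
    date(2021, 3, 2): "s3://example-ltr-bucket/session_datamart/day=2021-03-02/",
    date(2024, 7, 15): "s3://example-ltr-bucket/session_datamart/day=2024-07-15/",
}


def prefixes_for_range(catalog, start, end):
    """Return the S3 prefixes whose partition day falls within [start, end]."""
    return [prefix for day, prefix in sorted(catalog.items()) if start <= day <= end]


if __name__ == "__main__":
    # Only the March 2021 partitions are selected; the 2024 data is never touched.
    for prefix in prefixes_for_range(CATALOG, date(2021, 3, 1), date(2021, 3, 31)):
        print(prefix)
```

The point of the sketch is that a query engine with a catalog can skip data outside the requested time range instead of scanning the whole bucket.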

Architecture: How Cold Data Flows Through LTR

When your Guardium collectors capture database activity, here is what happens:

  1. Collectors upload datamart files to S3: the data is converted to Parquet files (a format optimized for long-term data queries) and lands in your S3 bucket.
  2. Collectors notify the Central Manager: the CM receives a queue record about the new cold storage ingestion.
  3. CM updates the LTR metastore: the Central Manager tells the LTR appliance about the new datamart file, and the Hive Metastore is updated with the metadata. Now the LTR unit knows this data exists and where to find it.

It's a continuous flow: data lands in S3, metadata gets updated, and everything stays indexed and ready to query.
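The three ingestion steps can be modeled as a toy simulation in Python. This is only a sketch under stated assumptions: the class names, method names, and object keys are hypothetical, in-memory dictionaries stand in for S3 and the Hive Metastore, and nothing here reflects Guardium's actual interfaces.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Stand-in for the S3 bucket: object key -> bytes."""
    objects: dict = field(default_factory=dict)

    def put(self, key, body):
        self.objects[key] = body


@dataclass
class Metastore:
    """Stand-in for the metastore on the LTR unit: table -> list of object keys."""
    tables: dict = field(default_factory=dict)

    def register(self, table, key):
        self.tables.setdefault(table, []).append(key)


@dataclass
class CentralManager:
    metastore: Metastore
    queue: deque = field(default_factory=deque)

    def notify(self, table, key):
        # Step 2: a collector reports a new cold-storage ingestion.
        self.queue.append((table, key))

    def drain(self):
        # Step 3: forward each queued record to the LTR metastore.
        while self.queue:
            table, key = self.queue.popleft()
            self.metastore.register(table, key)


def collector_upload(store, cm, table, key, parquet_bytes):
    # Step 1: upload the already-converted file, then notify the CM.
    store.put(key, parquet_bytes)
    cm.notify(table, key)


if __name__ == "__main__":
    store, metastore = ObjectStore(), Metastore()
    cm = CentralManager(metastore)
    # b"PAR1" is placeholder content, not a valid Parquet file.
    collector_upload(store, cm, "session_datamart",
                     "session_datamart/part-0001.parquet", b"PAR1")
    cm.drain()
    print(metastore.tables)
```

Until `drain()` runs, the file exists in storage but the metastore does not know about it, which is why the metadata update in step 3 is what makes new data queryable.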

Reporting on Cold Data

Users can open the Data Lake Reports page on the Central Manager, which offers:

  • Six predefined long‑term reports based on the most common use cases
  • A build-your-own custom report option
  • Email delivery with a CSV download link (no Guardium login required)

The cold data reporting experience stays within the CM, just like everything else in GDP.

To run a data lake report, start it from the page, enter your time range, and add recipients. Results appear when the report finishes running. If your internal teams need audit data but are not Guardium users, they get an email with a direct download link. Simple.


[Screenshot: Data Lake Reports page]

Data Lake reports are not limited to the six predefined reports. The Create custom report option lets you start from a template and adjust the query to suit your needs.


What happens under the covers? The technical flow

When you need to run a report on historical data, here is what happens:

  1. You kick off a report in the CM UI — Pick your predefined report or create a custom one, set your time range, add recipients.
  2. CM hands off to the LTR appliance — The Central Manager tells the LTR unit: "Run this query for me."
  3. Trino does its thing — The Trino coordinator (on LTR unit) plans the query (figures out which data to scan, how to optimize it), then the Trino worker(s) execute it across your S3 data. This is distributed query processing — multiple workers crunching through your data in parallel.
  4. Results land in an Iceberg table — The query results get stored in a new output table (using Apache Iceberg format for efficient data management), and a CSV export is generated.
  5. CM displays the results — The Guardium UI queries that output table to show you the results. Recipients get an email with a download link for the CSV — no Guardium login required.

The whole flow is transparent to the user. You work with the CM as usual, and reports appear in the CM or arrive by email when they are ready.
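For readers curious what a time-bounded report query might look like, the sketch below builds a Trino `CREATE TABLE ... AS SELECT` statement in Python. It is an illustration, not the SQL that Guardium generates: the catalog names (`hive`, `iceberg`), schema names, table names, and column names (`db_user`, `server_ip`, `session_start`) are all hypothetical.

```python
from datetime import datetime


def build_report_sql(source_table, output_table, start, end):
    """Build a CREATE TABLE AS SELECT statement for a time-bounded report.

    Identifiers are interpolated as-is, so callers must pass trusted names only.
    The time bounds come from datetime objects and are formatted here.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"CREATE TABLE {output_table} AS\n"
        f"SELECT db_user, server_ip, COUNT(*) AS session_count\n"
        f"FROM {source_table}\n"
        f"WHERE session_start >= TIMESTAMP '{start.strftime(fmt)}'\n"
        f"  AND session_start < TIMESTAMP '{end.strftime(fmt)}'\n"
        f"GROUP BY db_user, server_ip"
    )


if __name__ == "__main__":
    print(build_report_sql(
        "hive.guardium.session_datamart",   # hypothetical source over Parquet files
        "iceberg.reports.report_0001",      # hypothetical per-report output table
        datetime(2021, 1, 1),
        datetime(2022, 1, 1),
    ))
```

Writing the result into its own output table mirrors step 4 above: the UI and the CSV export can then read a small result table instead of rescanning years of data.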


Design Considerations

  • You control the S3 layer — HA, DR, and performance depend on your provider.
  • Storage tier matters — colder tiers save money but slow down queries.
  • Keep LTR close to the S3 bucket — same region avoids latency and egress charges.
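The cost side of these trade-offs comes down to simple arithmetic, sketched below in Python. Every price in the example is an invented placeholder rather than any provider's actual rate, and the function and parameter names are hypothetical. The model also ignores query latency, which is the other half of the storage tier trade-off.

```python
def monthly_cost(stored_gb, scanned_gb, storage_price_per_gb,
                 retrieval_price_per_gb=0.0, egress_price_per_gb=0.0):
    """Estimate monthly cost as storage plus per-GB charges on scanned data.

    retrieval_price_per_gb models a colder tier's read fee.
    egress_price_per_gb models reading from a different region than the bucket.
    """
    storage = stored_gb * storage_price_per_gb
    query = scanned_gb * (retrieval_price_per_gb + egress_price_per_gb)
    return storage + query


if __name__ == "__main__":
    # Placeholder prices only; substitute your provider's published rates.
    same_region = monthly_cost(10_000, 500, storage_price_per_gb=0.01)
    cross_region = monthly_cost(10_000, 500, storage_price_per_gb=0.01,
                                egress_price_per_gb=0.05)
    print(f"same region:  {same_region:.2f}")
    print(f"cross region: {cross_region:.2f}")
```

Because cold data is scanned rarely, the storage term usually dominates, but a cross-region deployment adds a charge to every query that a same-region deployment avoids.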

The Bottom Line

Guardium 12.2.1 LTR gives you a clean, cloud‑friendly way to retain years of cold historical data while still being able to query it on demand. You can:

  • Use your own S3‑compatible storage
  • Store multi‑year data at low cost
  • Run long‑term queries directly on Parquet files
  • Share results easily via emailed CSV links
  • Maintain control over retention and performance

It’s a modern, flexible replacement for classic archive workflows — perfect for compliance‑heavy environments or anyone modernizing their Guardium data strategy.

No extra Guardium license needed for LTR

If you're already licensed for Guardium Data Protection, the great news is that LTR is included — no additional Guardium software license required. Just upgrade to 12.2.1, deploy your LTR appliance(s), connect your S3 cold‑data storage, and you’re ready to start running long‑term queries.

If managing on-premises S3 and an LTR appliance is not ideal for you, there is another option:

Guardium Data Security Center (GDSC) DDR SaaS provides long‑term reporting as a fully hosted cloud service: no LTR nodes, no S3 administration, no on-premises infrastructure.

You get the same long‑term visibility into historical activity, but without owning any of the storage or compute behind it. Just send your data and let the service handle the rest.

