Scheduling High-Volume Database cleanup tasks with Kubernetes
In the 11.0.1.0 release of IBM Verify Identity Access, administrators gained the ability to run High-Volume Database (HVDB) cleanup tasks via API or CLI requests.
This article demonstrates how to combine this capability with the Kubernetes CronJob resource definition to schedule HVDB cleanup at particular times of the day, days of the week, etc.
Container run mode
The configuration container was updated with a new role for running the HVDB cleanup tasks. If the HVDB_CLEANUP_TASKS environment property is set when the configuration container is bootstrapping, the management interface only listens on localhost and the given cleanup tasks are run (until the HVDB_CLEANUP_TIMEOUT is reached, which defaults to 1 hour).
Trace is output to the container's log file while the cleanup tasks are run, recording how many rows were removed and how long the SQL commit operation took. Administrators can also use monitoring tools like Instana or Glowroot to collect additional metrics about cleanup task performance.
Defining the cleanup job
The trick to getting this job to run is bootstrapping the configuration container with an existing configuration snapshot. Typically configuration snapshots are generated by the configuration container, published, then unpacked by runtime containers. For this example, the snapshot will provide the configuration container with the database connection information needed to run the cleanup tasks.
To bootstrap a configuration container, you will need to set the SOURCE_CONFIG_SERVICE_URL, SOURCE_CONFIG_SERVICE_USER_NAME, SOURCE_CONFIG_SERVICE_USER_PWD and SOURCE_CONFIG_SERVICE_TLS_CACERT environment variables. For a full list of the available options, see the configuration container knowledge center documentation. For this example we will rely on the Verify Access Operator to host the configuration snapshot, and use the verify-access-operator secret generated by the operator to supply the source authentication information.
We also need to define the cleanup tasks to run. In this example we will just run the entire list of available tasks, but you should run only the tasks that are relevant to your deployment. For production, a schedule such as 3am on a Sunday is a reasonable starting point, since this is typically a low-traffic period for most enterprises; update the schedule to best suit your deployment. Additionally, administrators can add an HVDB_CLEANUP_TIMEOUT property to change how long cleanup tasks are allowed to run, if the default 1 hour limit is not suitable.
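As a minimal sketch of choosing a subset of tasks, the helper below builds a comma-separated HVDB_CLEANUP_TASKS value and rejects names that are not in the task list used in this article. The function name build_cleanup_tasks is hypothetical, and the list of known tasks is simply copied from the example manifest, so treat it as an assumption rather than an authoritative list.

```python
# Task names copied from the HVDB_CLEANUP_TASKS value used in this article.
KNOWN_TASKS = (
    "mmfaTransactions",
    "mmfaAuthenticators",
    "oauthToken",
    "dmapCache",
    "deviceRegistrations",
    "authsvcSessionCache",
)


def build_cleanup_tasks(selected):
    """Return a comma-separated HVDB_CLEANUP_TASKS value for the selected tasks.

    Raises ValueError for names not in KNOWN_TASKS, so typos are caught
    before the CronJob is deployed.
    """
    unknown = [name for name in selected if name not in KNOWN_TASKS]
    if unknown:
        raise ValueError("unknown cleanup task(s): " + ", ".join(unknown))
    # Drop duplicates while keeping the order the caller asked for.
    ordered = list(dict.fromkeys(selected))
    return ",".join(ordered)


if __name__ == "__main__":
    # Example: a deployment that only needs OAuth token and DMAP cache cleanup.
    print(build_cleanup_tasks(["oauthToken", "dmapCache"]))
```

The returned string can be pasted into the value of the HVDB_CLEANUP_TASKS environment variable.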
The resulting Kubernetes CronJob YAML looks like:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ivia-hvdb-clean
spec:
  # Every three minutes, for testing only; use a less frequent schedule in production.
  schedule: "0/3 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: iviaconfigvol
              emptyDir: {}
            # CA certificate taken from the secret generated by the operator.
            - name: operator-cert
              secret:
                secretName: verify-access-operator
                items:
                  - path: verify-access-operator.crt
                    key: tls.cert
          containers:
            - name: ivia-hvdb-cleanup
              image: icr.io/ivia/ivia-config:11.0.1.0
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - mountPath: /var/shared
                  name: iviaconfigvol
                - mountPath: /tmp/verify-access-operator.crt
                  name: operator-cert
                  subPath: verify-access-operator.crt
              command:
                - "bash"
                - "-c"
                - "mkdir -p /var/shared/snapshots && /usr/sbin/bootstrap.sh"
              env:
                #- name: SNAPSHOT_ID
                #  value: "idp-published"
                # Comma-separated list of cleanup tasks to run.
                - name: HVDB_CLEANUP_TASKS
                  value: mmfaTransactions,mmfaAuthenticators,oauthToken,dmapCache,deviceRegistrations,authsvcSessionCache
                #- name: HVDB_CLEANUP_TIMEOUT
                #  value: "90"
                # Snapshot source and credentials, read from the operator secret.
                - name: SOURCE_CONFIG_SERVICE_URL
                  valueFrom:
                    secretKeyRef:
                      name: verify-access-operator
                      key: url
                - name: SOURCE_CONFIG_SERVICE_TLS_CACERT
                  value: file:/tmp/verify-access-operator.crt
                - name: SOURCE_CONFIG_SERVICE_USER_NAME
                  valueFrom:
                    secretKeyRef:
                      name: verify-access-operator
                      key: user
                - name: SOURCE_CONFIG_SERVICE_USER_PWD
                  valueFrom:
                    secretKeyRef:
                      name: verify-access-operator
                      key: ro.pwd
          restartPolicy: OnFailure
Note: The cron schedule is set to run every three minutes! This is probably too frequent for any deployment, but it means you won't have to wait very long for the first run to start. Make sure you update the schedule to something less frequent before deploying to production.
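To make the difference between the test schedule and a weekly schedule concrete, the sketch below splits a five-field cron expression into its named fields. The helper name split_schedule is hypothetical, and the weekly expression "0 3 * * 0" (03:00 every Sunday) is an assumed example of a low-traffic window, not a recommendation for every deployment.

```python
# Standard five-field cron layout, in order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_schedule(expression):
    """Split a five-field cron expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # Test schedule from the example manifest: every third minute.
    print(split_schedule("0/3 * * * *"))
    # Assumed weekly alternative: minute 0, hour 3, day_of_week 0 (Sunday).
    print(split_schedule("0 3 * * 0"))
```

Keep in mind that Kubernetes evaluates the schedule in the time zone of the kube-controller-manager unless the CronJob sets spec.timeZone, so "3am" may not be 3am local time.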
Verifying cleanup with trace
Once the cleanup tasks have run successfully, a summary of how many records were removed and the time taken is written to the log as JSON data. Administrators can process these log messages to verify that the tasks ran successfully, and act accordingly:
{"type":"bootstrap","host":"ivia-hvdb-clean-29199072-tgkgb","ibm_userDir":"\/","ibm_serverName":"runtime","message":"WGAWA1007I The High Volume Database cleanup tasks completed successfully.","ibm_threadId":"1","ibm_datetime":"2025-07-08T03:13:46+0000","ibm_messageId":"WGAWA1007I","module":"/usr/sbin/bootstrap.sh","loglevel":"AUDIT"}
{"mmfaTransactions":{"lastRan":"Tue Jul 08 03:12:55 GMT 2025","runTime":"670 ms","recordsRemoved":0,"status":"IDLE"},"mmfaAuthenticators":{"lastRan":"Tue Jul 08 03:13:04 GMT 2025","runTime":"106 ms","recordsRemoved":0,"status":"IDLE"},"authsvcSessionCache":{"lastRan":"Tue Jul 08 03:13:14 GMT 2025","runTime":"0 ms","recordsRemoved":0,"status":"IDLE"},"deviceRegistrations":{"lastRan":"Tue Jul 08 03:13:25 GMT 2025","runTime":"396 ms","recordsRemoved":0,"status":"IDLE"},"oauthToken":{"lastRan":"Tue Jul 08 03:13:34 GMT 2025","runTime":"147 ms","recordsRemoved":0,"status":"IDLE"},"dmapCache":{"lastRan":"Tue Jul 08 03:13:44 GMT 2025","runTime":"103 ms","recordsRemoved":0,"status":"IDLE"}}
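One way to process the summary line is sketched below. It assumes the summary keeps the shape shown in the sample above (one object per task with recordsRemoved and status fields), and it treats any status other than IDLE as worth a closer look, which is an assumption based only on that sample. The function name summarize_cleanup is hypothetical.

```python
import json


def summarize_cleanup(line):
    """Parse one summary line and return (total_removed, not_idle_tasks).

    total_removed is the sum of recordsRemoved across all tasks;
    not_idle_tasks is a sorted list of task names whose status is not "IDLE".
    """
    summary = json.loads(line)
    total_removed = sum(task["recordsRemoved"] for task in summary.values())
    not_idle = sorted(
        name for name, task in summary.items() if task["status"] != "IDLE"
    )
    return total_removed, not_idle


if __name__ == "__main__":
    # Two entries taken from the sample summary above.
    sample = (
        '{"mmfaTransactions":{"lastRan":"Tue Jul 08 03:12:55 GMT 2025",'
        '"runTime":"670 ms","recordsRemoved":0,"status":"IDLE"},'
        '"oauthToken":{"lastRan":"Tue Jul 08 03:13:34 GMT 2025",'
        '"runTime":"147 ms","recordsRemoved":0,"status":"IDLE"}}'
    )
    total, not_idle = summarize_cleanup(sample)
    print(f"records removed: {total}, tasks not idle: {not_idle}")
```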
If the cleanup threads determine that they are being run on both the management interface and the aac/fed runtime, a warning message is logged in the management server's log file, suggesting that the automatic cleanup thread be disabled (by setting its frequency to -1):
{"type":"liberty_message","host":"ivia-hvdb-clean-29199072-tgkgb","ibm_userDir":"\/opt\/ibm\/wlp\/usr\/","ibm_serverName":"default","message":"FBTFDB015W TokenCleanupThread cleanup thread appears to be running in multiple places, this could lead to High-Volume Database errors.","ibm_threadId":"0000007a","ibm_datetime":"2025-07-08T03:13:34.678+0000","module":"TokenCleanupThread","loglevel":"WARNING","ibm_methodName":"run","ibm_className":"TokenCleanupThread","ibm_sequence":"1751944414678_00000000000AD","ext_thread":"Thread-54"}
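A minimal sketch of scanning the container log for these messages follows. It matches on the start of the message field rather than on ibm_messageId, because the warning record shown above does not carry that field. The helper name find_messages is hypothetical, and the two message IDs are the ones that appear in this article's samples.

```python
import json


def find_messages(log_lines, prefixes=("WGAWA1007I", "FBTFDB015W")):
    """Return parsed JSON log records whose message starts with one of the IDs."""
    matches = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not JSON.
        if not isinstance(record, dict):
            continue
        message = record.get("message", "")
        # str.startswith accepts a tuple of candidate prefixes.
        if isinstance(message, str) and message.startswith(prefixes):
            matches.append(record)
    return matches


if __name__ == "__main__":
    # Abbreviated versions of the records shown in this article.
    lines = [
        '{"type":"bootstrap","message":"WGAWA1007I The High Volume Database '
        'cleanup tasks completed successfully.","loglevel":"AUDIT"}',
        '{"type":"liberty_message","message":"FBTFDB015W TokenCleanupThread '
        'cleanup thread appears to be running in multiple places, this could '
        'lead to High-Volume Database errors.","loglevel":"WARNING"}',
        "plain text line that is not JSON",
    ]
    for record in find_messages(lines):
        print(record["loglevel"], record["message"].split()[0])
```

Lines can be piped in from the job's pod logs, for example by reading sys.stdin instead of the hard-coded list.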
Tuning the cleanup tasks
The 11.0.1 release also added a number of configuration properties for controlling how frequently SQL commit operations are performed. All of the cleanup tasks now have the option of specifying the SQL batch size to use when removing rows. SQL batch operations concatenate multiple similar statements together, and update the database in one single transaction lock. This is often more performant, particularly when a large volume of data needs to be removed.
For all cleanup tasks, it is recommended to use batch mode cleanup with a batch size of at least 1000 transactions/rows where possible. For other tuning parameters, such as the maximum number of transactions per user, the appropriate limits will depend on your business and regulatory requirements.
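To illustrate why batching helps, here is a generic sketch of batched deletion using Python's built-in sqlite3 module. This is not how the product implements its cleanup threads; the tokens table, its columns and the delete_in_batches helper are all hypothetical. It only demonstrates the idea of removing rows in groups of 1000 with one commit per group instead of one commit per row.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with expires_at < cutoff, committing once per batch.

    Returns the total number of rows removed.
    """
    removed = 0
    while True:
        # Pick the next batch of expired row ids.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM tokens WHERE expires_at < ? LIMIT ?",
                (cutoff, batch_size),
            )
        ]
        if not ids:
            return removed
        # Send the whole batch of deletes, then commit them in one transaction.
        conn.executemany("DELETE FROM tokens WHERE id = ?", [(i,) for i in ids])
        conn.commit()
        removed += len(ids)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, expires_at INTEGER)")
    conn.executemany(
        "INSERT INTO tokens (expires_at) VALUES (?)", [(i,) for i in range(2500)]
    )
    conn.commit()
    # Rows with expires_at 0..1999 are removed in two batches of 1000.
    print(delete_in_batches(conn, cutoff=2000))
```

Larger batches mean fewer commits, at the cost of holding each transaction open for longer, which is why the batch size is exposed as a tuning parameter.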
If administrators are using manual cleanup tasks to remove data from the HVDB, the automatic cleanup threads should be disabled on the runtime. This can be done by setting the frequency advanced configuration property for the applicable thread to -1.
For more information, see the knowledge center documentation.