Artificial intelligence is rapidly becoming part of modern database operations. From troubleshooting and performance analysis to conversational workflows and intelligent automation, enterprises want AI to improve productivity and operational efficiency.
But for many organizations, especially in highly regulated industries, there is a fundamental challenge:
AI cannot come at the cost of data control.
Financial institutions, government agencies, defence organizations, healthcare providers, and other compliance-driven enterprises often operate under strict policies that prohibit sensitive operational data from leaving controlled environments. Sending prompts, telemetry, or database context to public inferencing endpoints is simply not an option.
To address this reality, IBM Db2 Genius Hub introduces Customer-Managed Air-Gapped Inferencing, a deployment model designed for organizations that require AI inferencing to remain fully inside their own infrastructure and security boundary.
This is more than an alternative hosting option. It is a practical approach to sovereign AI for enterprise database operations.
Understanding the AI Stack Options in Db2 Genius Hub
IBM Db2 Genius Hub provides three inferencing models, allowing organizations to choose the approach that best fits their operational and security requirements.
- IBM-Managed Cloud Inferencing
- In this model, IBM manages the inferencing infrastructure, endpoints, and operations using services such as AWS Bedrock. It is the simplest deployment option and requires minimal setup from customers
- Customer-Managed Cloud Inferencing
- Customers use their own cloud-based AI services, such as AWS Bedrock, watsonx.ai, or GCP Vertex AI, and configure those endpoints within Db2 Genius Hub.
- Customer-Managed Air-Gapped Inferencing
- This option is designed for environments where inferencing must remain entirely behind the firewall. Customers deploy and manage their own AI stack on-premises using supported accelerator infrastructure while Db2 Genius Hub securely connects to the internally managed endpoint.
This blog focuses on the third option — the air-gapped deployment model built for organizations that need maximum control over AI infrastructure and data handling.
Why Air-Gapped Inferencing Matters
For many enterprises, AI adoption is not limited by interest or use cases. It is limited by architecture and compliance requirements.
Organizations often need guarantees around:
- Data sovereignty
- Internal infrastructure ownership
- Network isolation
- Regulatory compliance
- Predictable operational boundaries
These requirements become especially important when AI interacts with operational database systems, particularly in regulated industries where organizations must maintain strict control over operational metadata, infrastructure boundaries, and network exposure.
Customer-Managed Air-Gapped Inferencing allows organizations to keep:
- Prompts
- Operational telemetry
- Database context
- Inferencing workloads
inside infrastructure they fully control.
The result is an AI deployment model aligned with enterprise governance and security policies while still enabling AI-assisted database operations.
Customer-Managed Air-Gapped Inferencing allows teams to adopt AI-powered Db2 operations without changing their security posture while still enabling intelligent diagnostics, troubleshooting, and conversational workflows.
What Customer-Managed Air-Gapped Inferencing Looks Like
In this deployment model, both Db2 Genius Hub and the inferencing stack run on-premises.
The customer deploys a dedicated inferencing server on supported accelerator hardware. Two deployment options are available:
Option 1: Red Hat Enterprise Linux AI (RHEL AI) Package
A comprehensive package that includes:
- Red Hat OS 9.6+
- Red Hat vLLM container
- Podman container runtime
- Model optimizer tools
This is a complete solution where Red Hat provides support for the entire stack.
Option 2: Red Hat vLLM Container with Podman
A flexible option for customers who already have a VM with an operating system. This requires:
- Red Hat vLLM container
- Podman container runtime
- Compatible with CentOS Stream 9.6+, Ubuntu 22.04+, or RHEL 9.6+
This option provides Red Hat support for the vLLM container while allowing customers to use their existing infrastructure.
Both options enable:
- IBM Granite 4 model deployment
- OpenAI-compatible API endpoints
The currently documented deployment path uses:
ibm-granite/granite-4.0-h-small
The customer is responsible for managing:
- Accelerator infrastructure
- Model deployment
- Container lifecycle
- Endpoint exposure
- Networking and security
- Runtime monitoring and tuning
Db2 Genius Hub then connects to the configured inferencing endpoint through its AI Configuration interface.
In simple terms:
The customer owns and operates the AI stack, while Db2 Genius Hub consumes the inferencing service.
How the Architecture Works
The architecture can be viewed as four integrated layers:
1. Db2 Genius Hub Platform
Hosts the core platform capabilities, including:
- Console and APIs
- Agentic AI services
- Workflow orchestration
- Memory and reasoning services
2. Agentic Reasoning Layer
Handles intelligent orchestration and combines:
- Live Db2 operational context
- Historical insights
- Institutional Db2 knowledge
to support troubleshooting, diagnostics, and conversational workflows.
3. Customer-Managed Inferencing Layer
Runs:
- Red Hat vLLM runtime
- IBM Granite model
on supported accelerator infrastructure inside the customer network
4. Connected Db2 Environments
Supports interaction with:
- On-premises Db2 systems
- Cloud-hosted Db2 deployments
through the Genius Hub platform.
This architecture enables AI-powered workflows without requiring external inferencing services.
Supported Model and Deployment Pattern
The documented deployment approach currently supports NVIDIA, AMD, and Intel Gaudi accelerator environments.
The key setup pattern includes:
- Logging in to the Red Hat container registry
- Pulling the appropriate vLLM container image
- Authenticating with Hugging Face to download the model
- Running the model server in a containerized vLLM runtime
- Configuring accelerator-specific runtimes and dependencies
- Exposing the endpoint to Db2 Genius Hub over an internal host and port
Supported deployment platforms include:
- NVIDIA deployments use NVIDIA H100 GPU systems with the Red Hat CUDA-based vLLM container.
- AMD deployments use AMD MI300X accelerator systems with the ROCm-based Red Hat vLLM container.
- Intel Gaudi 2/3 is also supported. For Intel-specific setup details, see our Intel Gaudi deployment guide.
The video walkthrough specifically demonstrates the NVIDIA H100 path and shows how the model is loaded and exposed for Genius Hub to consume.
A Practical Setup Flow
At a high level, the air-gapped setup follows this sequence:
1. Prepare the accelerator host
Provision a supported accelerator server in your private environment. The walkthrough highlights an NVIDIA H100-based system, while the documentation also covers AMD MI300X and Intel Gaudi 2/3 support.
Note: The setup steps for AMD MI300X are identical to those for NVIDIA H100, with the only difference being the use of the ROCm-based vLLM container instead of the CUDA-based container. For AMD-specific details, see the vLLM AMD setup guide.
Prerequisites include:
- A Red Hat account
- Podman installed
- Python 3 and pip
- Access to Hugging Face for model download
- Supported accelerator hardware and drivers
2. Pull the RHEL AI vLLM container
Authenticate to the Red Hat registry and pull the appropriate vLLM container image for your accelerator platform (for example, the CUDA-based container for NVIDIA, the ROCm-based container for AMD, or the Gaudi-optimized container for Intel Gaudi environments).
3. Download the Granite model
Authenticate to Hugging Face using a token and download the vLLM-compatible model. For this deployment option, Db2 Genius Hub uses the IBM Granite 4 model, specifically ibm-granite/granite-4.0-h-small for the air-gapped inferencing path.
4. Start the vLLM server with Podman
Run the container with the appropriate accelerator runtime flags, shared memory settings, model path, API key, and host/port bindings.
Key runtime considerations include:
- Tensor parallel size should match the number of GPUs assigned to the model
- Prefix caching and chunked prefill can improve response behavior
- GPU memory utilization must be tuned to the available hardware
- The server can be run interactively for testing or detached for ongoing service
Once the container is running and the model is loaded into GPU memory, the endpoint is ready for inferencing.
For detailed setup commands and configuration options, refer to the IBM Db2 Genius Hub documentation:
5. Configure the endpoint in Db2 Genius Hub
In the AI Configuration page in Db2 Genius Hub, select Bring your own AI stack and choose your inference provider.
Db2 Genius Hub supports multiple customer-managed providers:
- RHEL vLLM (for air-gapped deployments)
- watsonx.ai
- AWS Bedrock
- GCP Vertex AI
For air-gapped scenarios, select RHEL vLLM and provide:
- Model inference URL (e.g., http://10.10.10.10:9000/v1)
- Model identifier (ibm-granite/granite-4.0-h-small)
- API key
Once configured, use the Test Connection feature to verify that Genius Hub can successfully communicate with your air-gapped inferencing endpoint.
Here is a glimpse of the AI Configuration interface in IBM Db2 Genius Hub used to configure customer-managed inferencing providers for air-gapped deployments.
Video Walkthrough
For a complete visual demonstration of the setup process on NVIDIA H100 GPUs, watch our step-by-step video guide:
What the AI Configuration Choice Changes
Choosing Customer-Managed Air-Gapped Inferencing is an architectural decision that shifts both control and responsibility to the customer.
Benefits
Organizations gain:
- Complete control over inferencing infrastructure
- Private AI operations within internal networks
- Strong alignment with regulatory requirements
- Ownership of model lifecycle and deployment
- Sovereign AI capabilities for database operations
Responsibilities
Customers also manage:
- GPU sizing and availability
- Container updates
- Model deployment and maintenance
- Endpoint security
- Runtime performance tuning
- Operational monitoring
This model is intentionally designed for enterprises that prioritize control, governance, and security over operational simplicity.
Not Just Private AI—Operationally Useful AI
Air-gapped inferencing matters not only because it preserves isolation, but because it enables practical AI-powered database operations within enterprise-controlled environments.
Db2 Genius Hub uses its agentic AI service to support capabilities such as:
- Natural language interactions for database questions
- Performance and troubleshooting analysis
- Conversational search and guided diagnostics
- Reasoning over live and historical Db2 context
- Database-aware workflows informed by institutional Db2 knowledge
The AI Configuration experience, provider validation workflow, and integrated inferencing support further reinforce this as a built-in operational capability rather than an external add-on.
A Clear Path to Sovereign AI with Db2 Genius Hub
Customer-Managed Air-Gapped Inferencing extends the Db2 Genius Hub AI stack in a way that meets real enterprise constraints.
It gives customers a documented pattern for deploying Red Hat vLLM with Granite on supported accelerator hardware, connecting that endpoint to Db2 Genius Hub, and enabling AI-powered database operations without relying on external inferencing services.
For organizations that want agentic AI capabilities without compromising on network isolation, infrastructure ownership, or data sovereignty, this option provides a practical and enterprise-ready path forward.
Final Thought
The future of enterprise AI will not be defined by a single deployment model.
It will be defined by choice.
Customer-Managed Air-Gapped Inferencing in IBM Db2 Genius Hub gives organizations a way to adopt agentic AI on their own terms—with their own infrastructure, inside their own security boundary, and with the operational control that regulated environments demand.
Ready to explore air-gapped AI for your Db2 environment?
Contact our team today for technical guidance and deployment support.
About Authors
Ashok Kumar
Ashok Kumar is a Program Director of Engineering at IBM with over 15 years of industry experience. He leads global engineering teams within IBM's Hybrid Data Management organization and drives innovation across the Db2 product portfolio, including on-premises, cloud, and BYOC deployments. His work spans advanced database technologies, from infusing AI into the Db2 query optimizer to enabling Db2 as a vector store and building agentic AI capabilities. He leads the integration of large language models into the Db2 Intelligence Center for natural language interaction, root-cause analysis, text-to-SQL translation, and automated troubleshooting. His focus sits at the intersection of AI, databases, and enterprise data management. Ashok can be reached at ashokku@us.ibm.com.
Merlin Moncy
Merlin Moncy is a Software Developer in Hybrid Data Management at IBM’s Ireland Lab, focusing on containerized Db2 offerings and automation workflows. She has experience in Python development, QA automation, CI/CD pipelines, and containerized environments, and is currently contributing to CAE team initiatives around Db2 Genius Hub and AI-powered database operations. She holds a Master’s degree in Data Analytics from the University of Galway, with a focus on machine learning, NLP, and data-driven systems. Merlin can be reached at merlin.moncy@ibm.com.
Taniya Bagh
Taniya Bagh is a Software Developer in Hybrid Data Management at IBM's Ireland Lab, focusing on Data Virtualisation, agentic AI solutions, and intelligent data-driven systems. She has experience in Java and Python development, containerized environments, and building AI-powered workflows and automation solutions. As part of the CAE team, She contributes to initiatives focused on AI-enhanced data management and next-generation enterprise solutions.
She holds a Master's degree in Data Analytics from National College of Ireland, with a focus on Machine Learning, Deep Learning models, and data-centric intelligent systems. Taniya can be reached at taniya.bagh@ibm.com.