Intel Gaudi accelerators are emerging as a powerful alternative for AI inferencing workloads, offering enterprise-grade performance for large language models in air-gapped environments. For organizations running IBM Db2 Genius Hub with Intel Gaudi 2 or Gaudi 3 infrastructure, deploying customer-managed air-gapped inferencing enables AI-powered database operations while maintaining complete control over data sovereignty and network isolation.
This blog provides a practical guide for setting up IBM Granite 4.0 inferencing on Intel Gaudi accelerators using Red Hat Enterprise Linux AI (RHEL AI) containers, specifically designed for organizations that require AI capabilities without external connectivity.
Why Air-Gapped Inferencing Matters
For many enterprises, AI adoption is not limited by interest or use cases. It is limited by architecture and compliance requirements. Organizations often need guarantees around:
- Data sovereignty
- Internal infrastructure ownership
- Network isolation
- Regulatory compliance
- Predictable operational boundaries
These requirements become especially important when AI interacts with operational database systems, particularly in regulated industries where organizations must maintain strict control over operational metadata, infrastructure boundaries, and network exposure.
Customer-Managed Air-Gapped Inferencing allows organizations to keep:
- Prompts
- Operational telemetry
- Database context
- Inferencing workloads
inside infrastructure they fully control.
The result is an AI deployment model aligned with enterprise governance and security policies while still enabling AI-assisted database operations.
Not Just Private AI—Operationally Useful AI
Air-gapped inferencing matters not only because it preserves isolation, but because it enables practical AI-powered database operations within enterprise-controlled environments.
Db2 Genius Hub uses its agentic AI service to support capabilities such as:
• Natural language interactions for database questions
• Performance and troubleshooting analysis
• Conversational search and guided diagnostics
• Reasoning over live and historical Db2 context
• Database-aware workflows informed by institutional Db2 knowledge
The AI Configuration experience, provider validation workflow, and integrated inferencing support further reinforce this as a built-in operational capability rather than an external add-on.
Intel Gaudi Setup: A Practical Guide
This guide walks through deploying IBM Granite 4.0 on Intel Gaudi accelerators for air-gapped Db2 Genius Hub deployments.
Prerequisites
- RHEL 9.6+ system
- Intel Gaudi 2 or Gaudi 3 hardware
- Gaudi software stack installed and validated (verify with `hl-smi`)
- Podman 5.x installed
- Red Hat registry access (account or service token)
- Habana runtime support for Podman
Step-by-Step Setup
Step 1: Authenticate to Red Hat Container Registry
podman login registry.redhat.io
Step 2: Pull the RHEL AI vLLM Gaudi Container
podman pull registry.redhat.io/rhaii/vllm-gaudi-rhel9:3.4.0
Step 3: Run the Container with Gaudi Support
PODMAN_OPTS="-e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host"
podman run -it --name vllm-gaudi --runtime=habana -e HABANA_VISIBLE_DEVICES=all $PODMAN_OPTS vault.habana.ai/gaudi-docker/1.24.0/ubuntu22.04/habanalabs/pytorch-installer-2.10.0:latest
Step 4: Validate Gaudi Availability
hl-smi
Step 5: Clone and Setup vLLM Gaudi Repository
cd /workspace
git clone https://github.com/vllm-project/vllm-gaudi.git
cd vllm-gaudi
git checkout gaudi3Benchmarks
Step 6: Configure HAProxy Load Balancer
Intel Gaudi deployments support scalable inferencing using HAProxy-based load balancing across multiple vLLM instances.
Install HAProxy:
yum install -y haproxy procps-ng
Or manually download if needed:
cd /tmp
curl -LO https://www.haproxy.org/download/2.8/src/haproxy-2.8.10.tar.gz
tar xzf haproxy-2.8.10.tar.gz
cd haproxy-2.8.10
make TARGET=linux-glibc USE_OPENSSL=1 USE_PCRE= -j$(nproc)
cp haproxy /usr/local/sbin/haproxy
ln -s /usr/local/sbin/haproxy /usr/local/bin/haproxy
Verify installation:
haproxy -v
Step 7: Start vLLM Servers with Load Balancing
cd /workspace/vllm-gaudi/gaudi3Benchmarks/1.23.0/v0.17.1/haproxy_lb
Start with 8 vllm/Gaudi instances (default) 8 vllm → 8 Gaudi3
Start with a custom number of vllm / Gaudi instances 4 vllm → 4 Gaudi3
Stop all processes including
Step 8: Test the Inferencing Endpoint
curl -sS http://localhost:30360/v1/models -H "Authorization: Bearer granite4.0h-g3key"
Step 9: Configure Db2 Genius Hub
1. Select Bring your own AI stack
2. Choose RHEL vLLM as the provider
3. Enter endpoint details
4. Use Test Connection to verify connectivity
Video Walkthrough
For a complete visual demonstration of the Intel Gaudi setup process, watch our step-by-step video guide that walks through container deployment and HAProxy configuration:
Additional Resources
Red Hat AI Container Catalog: https://catalog.redhat.com/en/software/containers/rhaii/vllm-gaudi-rhel9/69e0e3a360eb49b3bbffc4a8
IBM Granite 4.0 Load Balancer Configuration: https://github.com/vllm-project/vllm-gaudi/tree/gaudi3Benchmarks/gaudi3Benchmarks/1.23.0/v0.17.1/haproxy_lb
A Clear Path to Sovereign AI with Intel Gaudi
Intel Gaudi accelerators extend the Db2 Genius Hub AI stack with a cost-effective, enterprise-grade option for air-gapped inferencing. With HAProxy-based load balancing for horizontal scaling and optimized performance for large-scale inferencing workloads, Intel Gaudi provides organizations with another powerful choice for sovereign AI deployment.
Combined with enterprise support from Red Hat and Intel, plus IBM Granite 4.0 models, organizations can achieve AI-powered database operations while maintaining complete control over their infrastructure and data.
For organizations with Intel Gaudi infrastructure, this deployment path enables agentic AI capabilities without compromising on network isolation, infrastructure ownership, or data sovereignty.
Final Thought
The future of enterprise AI will not be defined by a single deployment model or hardware platform.
It will be defined by choice.
Intel Gaudi air-gapped inferencing for IBM Db2 Genius Hub gives organizations another way to adopt agentic AI on their own terms with their own infrastructure, inside their own security boundary, and with the operational control that regulated environments demand.
Ready to deploy Intel Gaudi air-gapped AI for your Db2 environment?
Contact our team today for technical guidance and deployment support.
About Authors
IBM Contributors:
- Ashok Kumar, Merlin Moncy and Taniya Bagh
- from IBM's Db2 Genius Hub team specialize in AI-powered database operations and enterprise deployment solutions.
Intel Contributors:
- Murali Madhanagopal, Shankar Ratneshwaran, Pramod Pai, and Suresh B. Nampalli
- from Intel's Gaudi AI accelerator team bring deep expertise in Intel Gaudi hardware optimization and deployment.