Db2 12.1.3 introduces capabilities for AI builders across three areas: vector support in SQL routines, high-performance vector loading, and LLM library integration. Each section comes straight from the engineers who built it, including technical insights and learning resources.
Build AI Applications with LangChain and LlamaIndex

Db2 provides native Python packages for LangChain and LlamaIndex, two widely-used frameworks for building RAG applications and AI agents. These integrations address a critical need in the AI developer community: seamless access to enterprise-grade vector storage capabilities in Db2 through a familiar, widely-adopted Python framework.
The Python packages handle database integration, so you write application logic rather than SQL. Initialize a vector store with intuitive Python commands and perform similarity search using the supported distance metrics: cosine similarity, Euclidean distance, and dot product. All operations run through familiar Python workflows, making it easier to integrate Db2 into modern GenAI and agentic AI applications without requiring database expertise.
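The three supported distance metrics are easy to build intuition for in plain Python. This is a standalone sketch of the math, not the Db2 or LangChain API; in practice the database computes these distances server-side:

```python
import math

def dot(a, b):
    # Dot product: raw alignment between two embeddings.
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    # Euclidean (L2) distance: straight-line distance in embedding space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine similarity: angle between vectors, ignoring magnitude.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0, 1.0]
doc = [0.5, 0.5, 1.0]
print(round(cosine_similarity(query, doc), 4))  # → 0.866
```

Cosine similarity is the common default for text embeddings because it ignores vector magnitude; Euclidean distance and dot product are better fits when magnitude carries meaning.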
The Production Challenge
Db2 now enables seamless integration of AI applications with routine operational workloads by managing both structured and unstructured data. Db2 is known for its strong ACID guarantees and performance in large-scale business-critical applications, supporting low-latency transactions and real-time analytics.
Db2 supports semantic similarity search through SQL queries with vector distance metrics, enabling hybrid data queries that combine traditional SQL with vector operations for rich, context-aware analytics. Your vector embeddings can work alongside structured data within the same database platform, eliminating the need for separate systems and complex data synchronization.
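A hybrid query of this kind can be sketched as a parameterized SQL statement built in Python. The table and column names here are illustrative, and the exact signature of the vector distance function should be checked against the Db2 documentation; the point is that relational predicates and vector ranking live in one statement:

```python
def hybrid_search_sql(metric: str = "COSINE", top_k: int = 5) -> str:
    # Illustrative hybrid query: structured filters plus vector ranking.
    # The query embedding and region are bound via parameter markers (?)
    # at execution time; VECTOR_DISTANCE is Db2's vector distance built-in.
    return (
        "SELECT id, title, "
        f"VECTOR_DISTANCE(embedding, ?, {metric}) AS dist "
        "FROM products "
        "WHERE in_stock = 1 AND region = ? "
        "ORDER BY dist "
        f"FETCH FIRST {top_k} ROWS ONLY"
    )

print(hybrid_search_sql())
```

Because the filters and the distance ranking appear in the same statement, the optimizer can apply the structured predicates before or alongside the vector comparison rather than scoring every row.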
Teams can start with familiar Python-based development workflows and seamlessly scale to enterprise requirements without architectural changes. Your RAG application doesn't need a rewrite when it moves from development to production.
Getting Started
Both packages are available on PyPI with straightforward installation and minimal configuration. Tutorial notebooks demonstrate common usage scenarios including document embedding, semantic search implementation, and RAG pipeline construction. The connectors work with Db2 12.1 Mod Pack 2 and later versions, which include the native vector capabilities.
View the Db2 LangChain notebook
View the Db2 LlamaIndex tutorial notebook
Custom AI Logic in SQL Routines

Developers can now use VECTOR data types in user-defined SQL functions and stored procedures. This lets you encapsulate AI application logic at the database layer where your data lives.
SQL routines eliminate the application-database boundary for vector operations. Instead of fetching vectors to your application server, processing them, and sending results back, you write the logic once in SQL and execute it where the data resides. With vector support in SQL routines, you can:
Encapsulate domain logic: Write a user-defined function that implements your company's specific similarity metric. Perhaps combine vector distance with business rules like product category weights or customer segment preferences. Applications call this function in their queries without reimplementing the logic in each codebase.
Build reusable vector operations: Create stored procedures for common tasks like embedding normalization, dimension validation, or outlier detection. When your normalization strategy changes (switching from L2 to max normalization), you update one procedure instead of every application that processes vectors.
Compose hybrid queries: Write functions that combine vector similarity with structured filters in ways the query optimizer can see and optimize. A product search function might accept a query vector and filter parameters, then execute a similarity search restricted to in-stock items in the user's region. The optimizer can push predicates down because it sees the complete operation, improving query performance.
Process vectors in batch: Stored procedures can iterate over vector sets to compute aggregate statistics, detect clusters, or update derived columns. Calculate the centroid of customer segment embeddings, identify vectors that deviate from their cluster, or refresh precomputed similarity scores. All without transferring vectors across the network.
Maintain consistency across applications: When multiple applications access the same vector data (a web application, batch analytics jobs, and operational reports), they can call the same SQL routines. This ensures consistent preprocessing, distance calculations, and business logic regardless of which team or language built the application.
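The normalization-strategy switch described above can be sketched in plain Python. In production this logic would live in a single SQL routine so every application picks up the change automatically; the function names here are hypothetical:

```python
import math

def l2_normalize(v):
    # Scale so the Euclidean (L2) norm is 1.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def max_normalize(v):
    # Scale so the largest absolute coordinate is 1.
    m = max(abs(x) for x in v)
    return [x / m for x in v]

# Swapping strategies means changing one routine, not every caller.
normalize = l2_normalize
print(normalize([3.0, 4.0]))  # → [0.6, 0.8]
```

Centralizing this in a stored procedure gives the same benefit as the `normalize` indirection above: callers never reference a specific strategy, so changing it is a one-place edit.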
Production-Grade Data Movement

Mod Pack 2 (Db2 12.1.2) introduced native vector data types and built-in functions, and added vector support to the IMPORT and EXPORT utilities. Mod Pack 3 (Db2 12.1.3) extends vector handling to the additional data movement utilities developers rely on for production AI workloads at scale.
LOAD command: The LOAD command now supports VECTOR columns. LOAD handles high-volume data insertion, moving millions of rows in minutes rather than hours. When you migrate historical embeddings into Db2, transform vector data between systems, or refresh vector embeddings, LOAD provides the performance your AI workloads require. The command accepts vector data as strings and converts them to vector format based on the column's dimension and coordinate type, similar to how it handles VARCHAR and CLOB values.
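Preparing a delimited input file for LOAD can be sketched as follows. The bracketed coordinate-list string is an assumed serialization (check the LOAD documentation for the accepted vector formats); the helper name is hypothetical:

```python
import csv
import io

def vector_literal(v):
    # Serialize an embedding as a bracketed coordinate string, which
    # LOAD converts to the column's dimension and coordinate type
    # (string format assumed here; verify against the Db2 docs).
    return "[" + ",".join(repr(float(x)) for x in v) + "]"

rows = [(1, [0.1, 0.2, 0.3]), (2, [0.4, 0.5, 0.6])]
buf = io.StringIO()
writer = csv.writer(buf)
for row_id, emb in rows:
    # csv quotes the vector field because it contains commas.
    writer.writerow([row_id, vector_literal(emb)])
print(buf.getvalue())
```

In a real pipeline, `buf` would be a file on disk handed to the LOAD command alongside the target table definition.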
ADMIN_COPY_SCHEMA procedure: This procedure copies entire schemas, including tables, indexes, and constraints. With vector support, you can clone development schemas to production or create test environments that include vector data without manual intervention.
ADMIN_MOVE_TABLE procedure: This procedure relocates tables between tablespaces or storage groups. When embedding tables grow and require different storage tiers, ADMIN_MOVE_TABLE now handles tables with VECTOR columns.
db2move command: This command migrates entire databases or specific schemas between Db2 instances. When you promote AI applications from development to production or migrate between Db2 versions, db2move now handles vector data alongside your traditional tables.
External tables: External tables let you query data files stored outside Db2 as if they were database tables. No loading required. You define the table structure and point to files in object storage, file systems, or data lakes, then run SQL queries directly against that data. With vector support, you can now query embedding files stored externally and join them with data in Db2. This matters when you generate embeddings in batch processing systems and want to query them alongside your database records without importing terabytes of vector data.
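An external table definition for embedding files might look like the sketch below. The DDL options and the VECTOR type spelling are illustrative assumptions, and the table, column, and path names are hypothetical; consult the external-table documentation for the exact syntax your environment accepts:

```python
def external_embeddings_ddl(path: str, dim: int = 384) -> str:
    # Illustrative external-table DDL over a delimited embeddings file;
    # exact options (DATAOBJECT, DELIMITER, etc.) vary by configuration.
    return (
        "CREATE EXTERNAL TABLE ext_embeddings ("
        "doc_id INT, "
        f"embedding VECTOR({dim}, FLOAT32)"
        ") USING ("
        f"DATAOBJECT '{path}' DELIMITER ','"
        ")"
    )

print(external_embeddings_ddl("/data/lake/embeddings.csv"))
```

Once defined, the external table can be joined against regular Db2 tables in ordinary SQL, without first importing the embedding files.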
Logical backup and restore: Logical backup and restore operations preserve vector data at the schema and table level. This provides a flexible, lightweight way to back up and restore specific schemas or individual tables containing vector data. You can perform full backups or incremental backups (cumulative or delta) of schemas with vector columns, then restore the entire schema or select specific tables as needed.
Why This Matters
These three capabilities work together to eliminate architectural compromises in AI application development.
Start fast, scale without changes: Prototype your RAG application using LangChain or LlamaIndex against Db2. When you move to production, your vector data stays in the same database that handles your transactions. No migration between vector databases. No separate infrastructure to maintain. Your LOAD utilities, backup procedures, and SQL routines work from day one through production scale.
Keep logic and data together: SQL routines let you implement business-specific similarity metrics, preprocessing pipelines, and hybrid search logic once at the database layer. Your Python application calls these routines through LangChain or LlamaIndex. Multiple applications (web interfaces, batch jobs, analytics dashboards) use the same logic without code duplication. When you refine your similarity algorithm, you update one stored procedure instead of hunting through application codebases.
Simplify operations: Your database team already knows how to back up Db2, secure it, monitor it, and tune it. Vector data gets the same treatment as your other data. When you need to refresh embeddings at scale, LOAD handles millions of vectors. When you need to copy a schema for testing, ADMIN_COPY_SCHEMA includes the vectors. When you need to query vectors in a data lake, external tables eliminate the import step.
Eliminate data silos: Vector similarity searches join naturally with structured data. A product recommendation query filters by inventory status, user permissions, and price ranges while ranking by semantic similarity. One query across one database. No application-level joins between a vector database and a relational database. No synchronization logic to keep embeddings aligned with source records.
Db2 provides vector search at scale using the LLM frameworks where developers build AI applications. Your embeddings live where your data lives. Your AI logic lives where your business logic lives. Your operations stay simple.
About Authors

Shaikh Quader
AI Architect and Master Inventor, IBM Db2
Shaikh Quader is AI Architect and Master Inventor at IBM Db2, where he designs and implements AI systems for enterprise database platforms. With 20 years at IBM, he transitioned from software development to AI research and engineering in 2016. Shaikh leads technical AI initiatives at IBM and directs collaborative research programs with Canadian universities. He writes regularly on LinkedIn and Substack about AI implementation, productivity systems, and knowledge work transformation. His work has resulted in multiple patents and publications in database AI applications. He is currently completing his PhD at York University, where his research examines practical AI integration in relational database systems. His expertise spans machine learning deployment, database optimization, and enterprise AI architecture for solving complex data challenges.

Raymond Yao
Senior Development Manager, IBM Db2
Raymond Yao is Senior Development Manager for the Db2 AI & Query Compiler team. His team leads major AI initiatives in Db2, including Vector DB support, LLM integration, AI Query Optimizer, and AI Query Tuner.
With over 20 years of experience at IBM, Raymond began as a Db2 software developer and has since managed diverse development teams. His projects span from modernizing Db2 with cloud technologies to enhancing the Query Engine with AI.

Yiwei He
Backend Developer, IBM Db2
Yiwei He is a backend software developer at IBM with about four years of full-time experience, specializing in database engine and storage technologies and proficient in C++. He has worked extensively on the storage and data compression layers of Db2's Columnar Data Engine (CDE), improving both storage efficiency and query performance through techniques such as columnar dictionaries and string compression. He also has experience with IBM watsonx.data (IBM's data platform for generative AI) and lakehouse projects, including Prestissimo (an open-source Presto engine with C++ workers and a key engine for watsonx.data) and the Optimizer Plus project, which applies Db2's enterprise-proven query compilation, advanced query rewrite, and cost-based optimization techniques to improve the query performance of Presto/Prestissimo.

Adam Zygmunt
Developer, IBM Db2
Adam Zygmunt is a Software Developer with IBM Db2 LUW kernel development. As part of the Data Management team, with a focus on the Data Movement utilities, he is involved in feature development, L3 support and problem determination.

Ioana Delaney
Senior Software Engineer, IBM Db2
Ioana Delaney is a software engineer with a database background and experience in query optimization and performance. With 25 years at IBM, she has worked on various releases of DB2 LUW and DB2 with BLU Acceleration in the areas of query semantics, rewrite, and optimization.