By Christian Garcia-Arellano, Zach Hoggard and Chris Stojanovsky
As vector data sets grow from thousands to millions of embeddings, brute-force nearest-neighbor search quickly becomes a bottleneck, making low-latency semantic search impractical. In Db2 v12.1.2.0, we laid the foundation for vector workloads with native vector support and similarity search, and in the October 2025 Db2 Early Access Program (EAP) release we introduced an early preview of vector indexing based on approximate nearest-neighbor (ANN) search. The latest Db2 EAP release significantly matures this capability, delivering enterprise-grade tunability and dramatic performance gains. With this release, the DiskANN-based vector index can reduce query latency by 100× to 1000× or more compared to exact search while maintaining high recall (>95%), enabling real-time semantic search, interactive recommendations, and low-latency RAG applications at scale. These gains come from aggressively shrinking the search space — traversing a navigable graph and filtering candidates using compressed vectors before computing exact distances. In this blog, we break down the index structure and tuning parameters so you can deliberately trade off speed, accuracy, and resource usage for your workload.
What This Blog Covers
This blog, the second in our vector index series, provides a deep technical exploration of the Db2 vector index architecture and search tuning methodology. We structure the discussion in multiple parts:
1. Index Structure and Physical Organization We begin by dissecting the vector index's internal structure, examining the two persistent tables that store the main index components: the graph table, which encodes the navigable ANN structure, and the compressed vectors table, which provides space-efficient vector representations for rapid candidate filtering. Understanding this physical organization is essential for reasoning about index behavior and interpreting the effects of tuning parameters.
2. Search Algorithm Mechanics We explain how the greedy graph traversal algorithm works, detailing the iterative process of candidate exploration and refinement. This section provides the conceptual foundation for understanding how search parameters influence query execution.
3. Search Parameter Tuning We provide detailed guidance on tuning the SEARCH_LIST_SIZE and SEARCH_BEAM_WIDTH parameters, explaining their interaction and impact on recall, latency, and resource consumption. This section includes practical tuning strategies for these parameters.
4. Practical Application We end this blog with a demonstration of the concepts through a realistic "Entity Customer" workload based on enriched TPC-DS data, showing how to construct vector embeddings from complex structured data and leverage the index for customer intelligence applications.
What does the Db2 vector index based on DiskANN index look like?
Db2’s vector index implementation leverages a DiskANN-inspired architecture that separates the index into two distinct but complementary components: a graph table that encodes the approximate nearest-neighbor (ANN) navigation structure, and a compressed vectors table that stores space-efficient vector representations. This separation enables the index to balance search performance with storage efficiency, a critical consideration when working with high-dimensional vectors at scale.
When you create a vector index, Db2 automatically creates two hidden, row-organized system tables in the SYSIBM schema. These tables are assigned unique system-generated names and work together to support both index construction and query execution. Understanding the structure and purpose of each table is essential for reasoning about index behavior, tuning parameters, and performance characteristics.
Note that some of the parameters of both the graph and the compressed vector tables can be customised as part of create index operation, and they also have an impact on the performance/recall trade-off, but we will discuss these in a subsequent blog – stay tuned for these. If you are eager in learning more about these, have a look at the EAP documentation.
Graph Table: The ANN Navigation Structure
The graph table is the heart of the vector index. It represents the ANN index as a navigable graph where each node corresponds to a vector in your base table, and edges between nodes represent proximity relationships in the vector space. During a similarity search, the query engine traverses this graph to efficiently identify candidate nearest neighbors without exhaustively comparing against every vector in the dataset.
Each row in the graph table represents a single node in this graph and contains the following columns:
- NODEID (BIGINT): A unique identifier for the node within the ANN graph. This ID is used internally to reference nodes during graph traversal and is distinct from the base table's primary key.
- RECORDID (BIGINT): The record ID (tuple sequence number) that links this graph node back to the corresponding row in your base table. This mapping is crucial for returning actual query results after the ANN search identifies candidate vectors.
- VECTOR (INT8 | FLOAT32, <size of base vector>): The full-precision vector from the base table. The data type matches your base table's vector column definition—either INT8 for quantized integer vectors or FLOAT32 for floating-point vectors.
- NUMNEIGHBORS (INT): The actual number of neighbors (edges) that this node has in the graph. This value can vary from node to node and is bounded by the maximum degree parameter specified during index creation. Nodes with higher connectivity typically serve as "hub" nodes that facilitate efficient graph traversal.
- NEIGHBOR<N> (INT): A series of columns (NEIGHBOR1, NEIGHBOR2, ..., NEIGHBOR64) that store the NODEIDs of this node's neighbors in the graph. The maximum of 128 neighbors represents the upper bound on node degree in the current implementation. During graph traversal, the search algorithm examines these neighbor relationships to navigate toward the query vector's nearest neighbors.
The graph structure itself is constructed using a greedy algorithm that iteratively adds vectors and establishes edges based on proximity in the vector space. The resulting graph exhibits "small-world" properties: most nodes can be reached from any starting point through a relatively short path, enabling logarithmic search complexity in practice.
Compressed Vectors Table: Space-Efficient Vector Storage
While the graph table contains full-precision vectors necessary for accurate distance calculations during certain phases of search, the compressed vectors table provides a space-efficient alternative representation used during the initial stages of candidate filtering. This two-tier approach—using compressed vectors for broad filtering and full vectors for refinement—is a key optimization that reduces memory bandwidth and I/O requirements.
Each row in the compressed vectors table corresponds to a node in the graph and contains:
- NODEID (BIGINT): The unique node identifier that links this compressed vector back to its corresponding entry in the graph table.
- COMPVECT (VARBINARY(<compressed vector size>)): The compressed representation of the base vector. The product quantization compression scheme used by Db2 reduces the vector's storage footprint while preserving enough information to compute approximate distances. The maximum size of 512 bytes accommodates compressed representations of high-dimensional vectors while maintaining a fixed upper bound on storage per vector.
During search execution, the query engine first uses compressed vectors to rapidly identify a broad set of candidates, then refines this candidate set using the full-precision vectors from the graph table. This staged approach significantly reduces the number of expensive full-precision distance calculations required, directly impacting query latency. Also, there is an expectation is that this compressed vector table is fully in memory in order to get the most benefit of this optimization, but discussing this also is beyond the scope of this blog, and we will come back to this point in a subsequent blog.
Search Algorithm in a Nutshell
The DiskANN-based vector index in Db2 uses a greedy search strategy to quickly find the nearest neighbors of a query vector. This approach is designed to balance speed and accuracy, even when working with large datasets stored on disk.
There are two key parameters that control the search:
- Search List Size (L): How many candidate points we keep track of during the search in the search list that is ordered by distance to the query point.
- Search Beam Width (W): How many candidates we explore in each step.
Here’s the simplified process from the starting point to find the K closest points to the reference query point:
- Build a Candidate List: We begin with the starting node in our search list.
- Iterative Search:
- From the current search list, pick the W closest candidates to the query using a fast, compressed representation of the data (this is very efficient because the compressed vectors are cached in memory, more on this later).
- Load the full-precision vectors for these W candidates from the graph table, along with their neighbors. Each of these will result in a single disk access, as both the vector and the neighbor list are stored together in the graph table.
- Compute accurate distances to each of the neighbors using the full-precision data.
- Add the neighbors of these candidates to the search list of size L for further exploration, and when the list is full, discarding the ones that are more distant than the last element in the list.
- Stop When Done: The process continues until the search list is exhausted or an early stopping condition is met.
- Return Results: Finally, the algorithm returns the K closest points to the query.
The speed vs accuracy trade-off explained
The DiskANN-based vector index in Db2 achieves its remarkable performance through an approximate nearest-neighbor (ANN) search strategy that trades perfect accuracy for dramatic speed improvements. Understanding this trade-off and how to control it through search parameters is essential for optimizing the index for your specific workload requirements.
Search Parameters That Control the Trade-off
Two key parameters govern the behavior of the ANN search algorithm at query time: SEARCH_LIST_SIZE (L) and SEARCH_BEAM_WIDTH (W). These parameters are specified in the similarity search query itself through OPTGUIDELINES, allowing you to tune the speed-accuracy balance on a per-query basis without rebuilding the index. This is an example of such a query extracted from the EAP documentation where we are searching for the 10 nearest-neighbors (K) and using a SEARCH_LIST_SIZE (L) of 100:
SELECT C0,
VECTOR_DISTANCE(CI3, VECTOR('[5,9,11]', 3, INT8), EUCLIDEAN) AS DISTANCE
FROM VS1
ORDER BY DISTANCE
FETCH FIRST 10 ROWS ONLY
/* <OPTGUIDELINES> <IXSCAN TABLE='VD1' SEARCH_LIST_SIZE='100'/> </
OPTGUIDELINES> */;
SEARCH_LIST_SIZE (L): Controlling Search Breadth
The SEARCH_LIST_SIZE parameter determines the maximum number of candidate nodes maintained in the search list during graph traversal. This ordered list tracks the most promising candidates encountered so far, ranked by their distance to the query vector.
How it works:
As the algorithm explores the graph, it continuously adds newly discovered nodes to the search list.
When the list reaches size L, the algorithm discards the farthest candidate before adding a closer one.
The final K nearest neighbors are selected from this list of L candidates.
Impact on performance:
- Larger L values (e.g., 100-200): The algorithm maintains a broader set of candidates, increasing the probability of capturing the true nearest neighbors. This improves recall but requires more memory to maintain the candidate list and triggers more distance computations, increasing query latency.
- Smaller L values (e.g., 20-50): The algorithm focuses on a narrower set of candidates, reducing memory overhead and computational cost. This accelerates queries but may miss true nearest neighbors if they're pruned too early, reducing recall.
Tuning guidance:
- Use the default L=50 as a baseline for most workloads.
- L should typically be at least 2-3× the desired k
- For recall-critical applications, increase L to 100-200
- For latency-sensitive applications (e.g., real-time recommendations), decrease L to 10-25
- Monitor recall metrics by comparing the exact results with the approximate results, and adjust accordingly
SEARCH_BEAM_WIDTH (W): Controlling Search Depth per Iteration
The SEARCH_BEAM_WIDTH parameter determines how many candidates are expanded (explored) in each iteration of the graph traversal.
How it works:
In each iteration, the algorithm selects the W closest unexplored candidates from the search list.
For each of these W candidates, it loads the full-precision vector and neighbor list from the graph table (requiring disk I/O when those rows are not yet present in the buffer pool).
It computes accurate distances to all neighbors and adds them to the search list for future exploration.
Impact on performance:
- Larger W values (e.g., 10-50): The algorithm explores more nodes per iteration, broadening the search frontier and increasing the likelihood of discovering optimal paths through the graph. This improves recall but increases disk I/O (W random reads per iteration from the graph table, if not already present in the buffer pool) and also CPU utilization for distance calculations.
- Smaller W values (e.g., 1-2): The algorithm follows fewer paths per iteration, reducing disk access and computational overhead. This minimizes latency but may cause the search to get "stuck" in local regions of the graph, missing distant but relevant neighbors. The default of 2 falls in this category.
Tuning guidance:
- Start with the default W = 2 as a baseline for minimal latency
- For high-recall requirements, increase W to 10-20 to explore more graph paths
- W should be tuned in conjunction with L—higher L values can compensate for lower W values and vice versa.
Understanding the Trade-off: The L-W Interaction
The relationship between SEARCH_LIST_SIZE and SEARCH_BEAM_WIDTH is not independent—these parameters interact to determine overall search behavior:
Complementary effects:
- High L, Low W: Maintains many candidates but explores them conservatively. Good for datasets where the graph has clear "highways" to nearest neighbors. Reduces disk I/O per iteration but may require more iterations to converge.
- Low L, Low W: Minimum latency at the cost of reduced recall. Use for real-time applications with strict latency SLAs where approximate results are acceptable.
- High L, High W: Maximum recall at the cost of maximum resource consumption. Use for offline batch processing or when accuracy is paramount.
- Low L, High W: Maintains fewer candidates but explores them aggressively. Useful when the graph structure is complex or when nearest neighbors are scattered. Increases disk I/O per iteration but may converge faster.
Practical tuning strategy:
- Establish baseline: Start with the defaults of L=50, W=2 and measure recall and latency.
- Optimize for recall: If recall is insufficient, first increase L (cheaper than increasing W), then increase W if needed to get additional accuracy.
- Optimize for latency: If queries are too slow, decrease L, always measuring latency improvements, until you find reach your expected latency. You may try to increase W afterwards to see if that improves recall without hurting latency.
- Validate at scale: Test with representative query workloads and dataset sizes—optimal parameters may shift as data volume grows.
The beauty of these search-time parameters is that they allow you to adapt the index behavior to different query types without rebuilding. A single index can serve both high-recall analytical queries (large L and W) and low-latency operational queries (small L and W) by simply adjusting the parameters in the SQL query itself.
Seeing It in Practice: The Entity Customer Workload
To demonstrate the vector index capabilities in a realistic scenario, we'll walk through a practical example using an enriched customer dataset. This workload illustrates how vector indexes can power semantic search over complex, multi-dimensional customer profiles—a common use case in customer analytics, personalization, and recommendation systems.
Dataset Construction: From TPC-DS to Entity Customer
Our example builds upon the standard TPC-DS benchmark customer table, enriching it with comprehensive transactional and behavioral data to create a holistic "entity customer" representation. This enrichment process transforms simple customer records into rich, multi-faceted profiles suitable for semantic similarity search.
Data Enrichment Process
Starting with the base TPC-DS customer table, we augment each customer record with three categories of behavioral data:
1. Top 3 Purchased Items For each customer, we identify their three most frequently purchased items, capturing:
- Item description and detailed categorization (category, class, color)
- Current item price
- Total spending on the item
- Return amounts
- Average spending minus returns with categorical classification (very_high, high, medium, low, very_low)
2. Top 3 Stores We track the customer's three most-visited stores, including:
- Store operational details (number of employees, division, company)
- Geographic information (city, county, state, ZIP, country)
- Customer's spending and return behavior at each store
- Net spending classification
3. Monthly Spending Patterns We capture the customer's financial behavior over the past three years with monthly granularity:
- Total spending per month
- Total returns per month
- Net spending (spending minus returns)
- Temporal context (year and month)
- Spending range classification
JSON Representation
This enriched data is serialized into a JSON structure for each customer, creating a comprehensive text representation that captures the customer's profile, preferences, and behavior patterns. The JSON format provides a flexible, human-readable representation that preserves the hierarchical relationships in the data.
Here's an example of parts of the entity customer record for customer ID 148:
{
"CUSTOMER": {
"C_CUSTOMER_SK": 148,
"C_FIRST_NAME": "George",
"C_LAST_NAME": "Wiseman",
"C_AGE": "57",
"C_BIRTH_COUNTRY": "PHILIPPINES",
"top_n_item_sp_minus_ret": [
{
"I_CURRENT_PRICE": 3.73,
"I_CATEGORY": "Electronics",
...
"SPENDING": 10637.15,
"RETURN_AMT": 0.00,
"AVG_SPENDING_MINUS_RETURNS": 10637.1
},
...
],
"top_n_store_sp_minus_ret": [
{
"S_NUMBER_EMPLOYEES": 220,
"S_CITY": "Four Corners",
...
"SPENDING": 26452.80,
"RETURN_AMT": 0.00,
"SPENDING_MINUS_RETURNS": 26452.80
},
...
],
"cust_sp_minus_ret_last_n_month": [
{
"SPENDING": 57221.34,
"RETURN_AMT": 792.21,
"SPENDING_MINUS_RETURNS": 56429.13,
"YEAR": 2001,
"MONTH": 12
...
},
...
]
}
}
This example shows George Wiseman, an older customer from the Philippines, who exhibits high spending patterns on electronics, children's products, and classical music. His shopping behavior is distributed across stores in Georgia, Texas, and North Carolina, with consistently high monthly spending throughout late 2001, though with some return activity in August.
Vector Embedding Generation
Once the entity customer data is in JSON format, we generate vector embeddings that capture the semantic meaning of each customer profile. These embeddings enable similarity-based queries such as "find customers similar to this high-value customer" or "identify customers with comparable purchasing patterns."
Embedding Model: Slate 125m
We use the Slate 125m embedding model to transform the JSON string representation of each customer into a dense vector representation. Slate 125m is specifically designed for encoding structured and semi-structured text, making it well-suited for our entity customer JSON documents.
Key characteristics:
- Dimensionality: 768 dimensions (FLOAT32)
- Input: Raw JSON string representation of the customer entity
- Output: Dense vector embedding capturing semantic relationships
Distance Metric: Euclidean Distance
For this workload, we use Euclidean distance as the similarity metric. Euclidean distance measures the straight-line distance between two points in the 768-dimensional vector space, making it appropriate for embeddings from models like Slate that produce normalized or near-normalized vectors.
The choice of Euclidean distance over other metrics (such as Cosine similarity) depends on the embedding model's characteristics. Slate 125m embeddings work well with Euclidean distance because the model is trained to produce vectors where geometric distance correlates with semantic similarity.
In Db2, when creating the vector index, we specify this as:
CREATE VECTOR INDEX entity_customer_vec_idx
ON entity_customer(embedding_vector)
WITH DISTANCE EUCLIDEAN
Results
The chart below illustrates the performance and recall tradeoffs of approximate vector index search compared to exact similarity search across datasets of 1, 5, and 10 million 768-dimensional FLOAT32 vectors, using a fixed memory budget of 5% of the dataset size and retrieving the top-10 nearest neighbors. The bars show the speedup achieved by the vector index, while the line indicates recall relative to exact results. Three search configurations are evaluated: parameters tuned to prioritize speed (L=10, W=2), default parameters (L=50, W=2), and parameters tuned to prioritize recall (L=100, W=2). When speed is prioritized, the index delivers the highest acceleration, reaching approximately 881× speedup at 10 million vectors with around 68% recall. Default search parameters provide a balanced operating point, achieving roughly 375× speedup while maintaining about 90% recall. When recall is prioritized, more conservative search behavior increases graph exploration, resulting in higher recall—up to approximately 97%—while still delivering substantial performance gains of around 148× at the largest scale. Across all configurations, speedup increases with dataset size, highlighting both the scalability of the vector index and the ability, through tuning, to achieve anything from extreme performance to near-exact result quality with orders-of-magnitude improvement over exact similarity search.
Conclusion
The Db2 vector index, built on the DiskANN algorithm, represents a powerful tool for scaling semantic search and similarity-based applications to production workloads. By understanding the index's internal architecture, you can make informed decisions about tuning parameters and deployment configurations that align with your specific performance and accuracy requirements. This latest Early Access Program release of this feature marks a significant step forward in the feature's evolution, introducing critical capabilities like expanded distance metric support, removed scale limitations, and comprehensive tuning parameters. These enhancements provide the flexibility needed to optimize the speed-accuracy trade-off for diverse workloads, from latency-sensitive real-time applications to recall-critical analytical queries. As demonstrated through the Entity Customer workload, vector indexes enable sophisticated customer intelligence applications that would be impractical with exact similarity search at scale. The ability to perform semantic search over millions of rich, multi-dimensional customer profiles opens new possibilities for personalization, recommendation, and predictive analytics. We encourage you to experiment with the vector index feature in your own applications and share your feedback as we continue to refine the capability toward general availability. The tuning strategies and architectural insights covered in this blog provide a foundation for getting started, but every workload has unique characteristics that may benefit from further optimization.
For more information about the Db2 vector index and to participate in the Early Access Program, visit the Db2 LUW Early Access program page.
About the Authors
Christian Garcia-Arellano is STSM and Db2 OLTP Architect at the IBM Toronto Lab, and has a MSc in Computer Science from the University of Toronto. Christian has been working in various DB2 Kernel development areas since 2001. Initially Christian worked on the development of the self tuning memory manager (STMM) and led various of the high availability features for DB2 pureScale that make it the industry leading database in availability. More recently, Christian was one of the architects for Db2 Event Store, and the leading architect of the Native Cloud Object Storage feature in Db2 Warehouse. Christian can be reached at cmgarcia@ca.ibm.com.
Zach Hoggard is a Senior Software Developer for IBM Db2 in the IBM Data & AI organization at the IBM Canada Lab. Zach has over 10 years of Db2 Kernel development experience and his areas of expertise are the Buffer Pool and Storage Manager components of Db2, but also has extensive experience with all Db2 Kernel components. Zach has strong interest in Db2 problem determination as well as Db2 system design in the Kernel area. Zach is an IBM Certified Database Associate for Db2 v11.1 and v10.5, and holds a Bachelor’s degree in Software Engineering from Western University in London, Ontario, Canada.
Chris Stojanovski is an Advisory Software Developer for IBM Db2 in the IBM Data & AI organization at the IBM Canada Lab. Chris has been with IBM for five years, contributing primarily to the Index Manager team while also working with Data Management Services and XML. As a member of the Db2 Kernel team, he is passionate about expanding his knowledge of database internals and system architecture. Chris holds a Bachelor of Computer Science from the University of Waterloo, a Bachelor of Business from Wilfrid Laurier University, and a Master’s in Computer Science from Western University.
#Db2 #Db2Warehouse #AI #vector