Setting up IBM watsonx.data Developer Edition & Integrating it with Langflow

By Agnes George posted 30 days ago

  

Setting up IBM Milvus Developer Edition & Integrating it with Langflow

Overview

This guide explains how to install and configure IBM Milvus Developer Edition and integrate it with Langflow to build Retrieval-Augmented Generation (RAG) workflows. By the end, you'll have Milvus running locally, connected to Langflow, ready for experimentation with vector search and generative AI pipelines.

Prerequisites

  • A system running Red Hat Enterprise Linux (RHEL) 9, Ubuntu 22.04, or macOS
  • Docker or Podman (version ≥ 4.6)
  • Access to the IBM Container Registry (ICR)
    • To pull from the IBM Cloud Container Registry, log in as the cp user with the entitlement key created from the container software library

  • Network access to pull container images
  • Recommended Python Version Range: Python 3.10 - 3.12

Step-by-Step Installation of IBM watsonx.data Developer Edition (with Milvus)

1) Set up installation directory & environment variables

# Terminal
mkdir ~/watsonx_dev
cd ~/watsonx_dev

export LH_ROOT_DIR=$(pwd)
export LH_RELEASE_TAG=latest
export IBM_LH_TOOLBOX=cp.icr.io/cpopen/watsonx-data/ibm-lakehouse-toolbox:$LH_RELEASE_TAG
export LH_REGISTRY=cp.icr.io/cp/watsonx-data
export PROD_USER=cp
export IBM_ENTITLEMENT_KEY=<your_IBM_entitlement_key>
export IBM_ICR_IO=cp.icr.io
Why this matters: LH_ROOT_DIR sets your working directory; LH_RELEASE_TAG selects the release to install (latest here; set a specific tag to pin a version); the registry variables enable image pulls using your IBM entitlement credentials.


2) Pull the developer package image and extract

# Choose docker or podman
export DOCKER_EXE=docker   # or podman

# Pull the toolbox image
$DOCKER_EXE pull $IBM_LH_TOOLBOX

# Extract the package
id=$($DOCKER_EXE create $IBM_LH_TOOLBOX)
$DOCKER_EXE cp $id:/opt /tmp/

# Verify and extract on host
cat /tmp/opt/bom.txt
cksum /tmp/opt/*/*
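# Note: this overrides the LH_ROOT_DIR set in step 1; choose a different path if you are not running as root
export LH_ROOT_DIR=/root/lakehouse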
mkdir -p $LH_ROOT_DIR
tar -xf /tmp/opt/dev/ibm-lh-dev-*.tgz -C $LH_ROOT_DIR
Why this matters: You’re extracting the developer edition bundle (containers + scripts) and verifying checksums for integrity.

3) Authenticate to the IBM Container Registry

$DOCKER_EXE login ${IBM_ICR_IO} --username=${PROD_USER} --password=${IBM_ENTITLEMENT_KEY}

If you work in an air-gapped or private environment, also log in to your private registry.
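
Passing the entitlement key with --password can leave it in shell history and process listings. Both Docker and Podman also accept --password-stdin, which reads the password from standard input. The following is a minimal sketch of that approach, assuming the environment variables from step 1; the function names build_login_command and registry_login are hypothetical helpers, not part of the watsonx.data tooling.

```python
import subprocess


def build_login_command(runtime: str, registry: str, user: str) -> list[str]:
    """Build a registry login command that reads the password from stdin."""
    return [runtime, "login", registry, "--username", user, "--password-stdin"]


def registry_login(runtime: str, registry: str, user: str, key: str) -> None:
    """Run the login command, piping the entitlement key through stdin."""
    cmd = build_login_command(runtime, registry, user)
    # check=True raises CalledProcessError if the login fails
    subprocess.run(cmd, input=key.encode(), check=True)


# Example call (assumes the variables exported in step 1):
# import os
# registry_login(os.environ["DOCKER_EXE"], os.environ["IBM_ICR_IO"],
#                os.environ["PROD_USER"], os.environ["IBM_ENTITLEMENT_KEY"])
```

Because the key never appears in the argument list, it is not visible to other users who list running processes.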

4) Run the setup script

$LH_ROOT_DIR/ibm-lh-dev/bin/setup --license_acceptance=y --runtime=$DOCKER_EXE
# Optional:
# $LH_ROOT_DIR/ibm-lh-dev/bin/setup --license_acceptance=y --runtime=$DOCKER_EXE --password=<yourPassword>
Why this matters: Initializes the environment, pulls images, configures containers, and ensures license acceptance.

5) Start the services

$LH_ROOT_DIR/ibm-lh-dev/bin/start

After starting, required containers (including Milvus) should be running.

6) Verify Milvus service status

$LH_ROOT_DIR/ibm-lh-dev/bin/status --all
$LH_ROOT_DIR/ibm-lh-dev/bin/start-milvus   # start if needed
$LH_ROOT_DIR/ibm-lh-dev/bin/stop-milvus    # stop when needed

Once startup is complete, check the status output and note the port that each service exposes; you need these ports in the following steps.
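
If you want to confirm from a script that the noted ports accept connections, a plain TCP check is enough. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the ports in the usage comment (19530 for Milvus, 9443 for lhconsole-ui) are the values mentioned in this guide and may differ on your system.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def check_services(ports: dict[str, int], host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


# Example call:
# print(check_services({"milvus": 19530, "lhconsole-ui": 9443}))
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy, so treat this as a quick first check alongside the status script.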

7) Access the watsonx.data console

Open your browser and navigate to: https://localhost:<https_port>

  • Username: ibmlhadmin
  • Password: your custom password, or "password" if you kept the default

    <https_port> is the port number of lhconsole-ui, typically 9443 or 8443.

Installing and Launching Langflow

Install Langflow

# Option 1
pip install langflow -U

# Option 2: Using uv
pip install uv
uv pip install langflow -U

Run Langflow

langflow run
# By default, Langflow launches at: http://localhost:7860

Connecting Langflow to IBM Milvus

  1. Open the Langflow Playground (http://localhost:7860).
  2. Drag the Milvus component to the canvas.
  3. Enter connection details:
     • Connection URI: https://localhost:19530
     • Collection Name: langflow_collection
     • Primary Field Name: id
     • Text Field Name: text
     • Vector Field Name: embedding

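Then add the IBM watsonx.ai Embeddings and Model components, and configure the API endpoint, project ID, and API key for each. For embeddings, use ibm/slate-125m-english-rtrvr; for the LLM, use ibm/granite-3-2-8b-instruct. Use the same embedding model that populates the collection, so that query vectors and stored vectors have the same dimension.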

Configure & Connect Milvus for a RAG Use Case

Create the Collection

from pymilvus import MilvusClient, DataType

# Configuration
COLLECTION_NAME = "langflow_collection"
DIMENSION = 768  # slate-125m-english-rtrvr models return 768-dimensional vectors (the slate-30m variants return 384)

mc = MilvusClient(uri="https://localhost:19530", token="<token-if-required>")

# Create schema
schema = mc.create_schema(auto_id=False, enable_dynamic_field=True)

# Add fields
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=65535)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=DIMENSION)

# Create collection
mc.create_collection(collection_name=COLLECTION_NAME, schema=schema)

print(f"✅ Created collection '{COLLECTION_NAME}'")
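
Milvus rejects rows that do not match the schema, so it can help to check rows on the client side before inserting them. The following is a minimal sketch of such a check for the four fields defined above. The function validate_row is a hypothetical helper, and it assumes that max_length for VARCHAR fields is measured in bytes.

```python
def validate_row(row: dict, dimension: int, max_length: int = 65535) -> list[str]:
    """Return a list of problems found in one row; an empty list means the row looks valid."""
    problems = []

    # The primary key is INT64; bool is excluded because it is a subclass of int
    row_id = row.get("id")
    if not isinstance(row_id, int) or isinstance(row_id, bool):
        problems.append("id must be an integer")

    # Both VARCHAR fields must be strings within the configured length
    for field in ("text", "title"):
        value = row.get(field)
        if not isinstance(value, str):
            problems.append(f"{field} must be a string")
        elif len(value.encode("utf-8")) > max_length:
            problems.append(f"{field} exceeds {max_length} bytes")

    # The vector length must equal the dim declared on the FLOAT_VECTOR field
    vector = row.get("embedding")
    if not isinstance(vector, (list, tuple)) or len(vector) != dimension:
        problems.append(f"embedding must have exactly {dimension} values")

    return problems
```

Calling validate_row(row, DIMENSION) on each row before insertion surfaces a dimension mismatch between the embedding model and the collection early, with a readable message.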

Create a Vector Index

# Prepare index parameters
index_params = mc.prepare_index_params()

# Add index configuration
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",  # Automatic indexing
    metric_type="COSINE"     # Cosine similarity
)

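# Create index
mc.create_index(collection_name=COLLECTION_NAME, index_params=index_params)

print("✅ Created vector index")

# Load the collection into memory; a collection created without index parameters
# is not loaded automatically, and searches fail until it is loaded
mc.load_collection(collection_name=COLLECTION_NAME)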
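
To make the COSINE metric concrete, the sketch below computes cosine similarity by hand with the standard library. It is only an illustration of the formula, not how Milvus implements it internally. Scores range from -1 to 1, and a larger score means the two vectors point in more similar directions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compute dot(a, b) / (|a| * |b|) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Because the result depends only on direction, scaling a vector does not change its score, which is why cosine similarity is a common choice for text embeddings.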

Configure IBM watsonx.ai Embeddings

from ibm_watsonx_ai.foundation_models import Embeddings

# IBM Cloud credentials
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "YOUR_API_KEY"  # Use environment variables in production
}

project_id = "YOUR_PROJECT_ID"

# Initialize embedding model
embedding_model = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr-v2",
    credentials=credentials,
    project_id=project_id,
    verify=False  # Set True in production with proper certificates
)

To create a watsonx.ai project, refer to the steps in creating-project. There you create the API key and look up the project ID that replace "YOUR_API_KEY" and "YOUR_PROJECT_ID" in the code above.
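
As the comment in the code suggests, credentials are better read from environment variables than hardcoded. The following is a minimal sketch of that pattern. The variable names WATSONX_APIKEY, WATSONX_PROJECT_ID, and WATSONX_URL are assumptions chosen for this example, not names the SDK requires, and load_watsonx_settings is a hypothetical helper.

```python
import os


def load_watsonx_settings(env=None) -> dict:
    """Read watsonx.ai settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    required = ("WATSONX_APIKEY", "WATSONX_PROJECT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "credentials": {
            # Falls back to the us-south endpoint used earlier in this guide
            "url": env.get("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
            "apikey": env["WATSONX_APIKEY"],
        },
        "project_id": env["WATSONX_PROJECT_ID"],
    }
```

The returned credentials dictionary and project_id have the same shape as the hardcoded values above, so they can be passed to Embeddings in the same way.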


Build Your First RAG Application

Step 1 — Prepare Sample Data

# Sample texts for demonstration
sample_texts = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn from data.",
    "Vector databases store high-dimensional embeddings for similarity search applications.",
    "Natural language processing helps computers understand and generate human language."
]

# Generate embeddings
generated_embeddings = embedding_model.embed_documents(sample_texts)

# Prepare data for insertion
batch_data = [
    {
        "id": i + 1,
        "text": txt,
        "title": title,
        "embedding": vec
    }
    for i, (txt, title, vec) in enumerate(zip(
        sample_texts,
        ["ML Introduction", "Vector DB Guide", "NLP Fundamentals"],
        generated_embeddings
    ))
]

# Insert data
insert_result = mc.insert(collection_name=COLLECTION_NAME, data=batch_data)

print("✅ Data inserted successfully")
print(f"Inserted {len(batch_data)} documents")

Step 2 — Verify Data Insertion

# Get collection statistics (row_count can lag behind recent inserts until Milvus flushes the data)
stats = mc.get_collection_stats(COLLECTION_NAME)
print(f"Collection stats: {stats}")

# List all collections
collections = mc.list_collections()
print(f"Available collections: {collections}")

Step 3 — Run a Test Query

# Example: Semantic search query
query_text = "What is machine learning?"

# Generate query embedding
query_embedding = embedding_model.embed_query(query_text)

# Search in Milvus
search_results = mc.search(
    collection_name=COLLECTION_NAME,
    data=[query_embedding],
    limit=3,
    output_fields=["text", "title"]
)

# Display results
for hits in search_results:
    for hit in hits:
        print(f"Title: {hit['entity']['title']}")
        print(f"Text: {hit['entity']['text']}")
        print(f"Score: {hit['distance']}")
        print("-" * 50)
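
Retrieval is only half of a RAG pipeline; the retrieved passages still have to be handed to the LLM. The following is a minimal sketch of that step, assuming search results shaped like the ones printed above (one list of hits per query, each hit exposing an 'entity' with 'title' and 'text'). The function build_rag_prompt and the prompt wording are illustrative assumptions, not part of Milvus or watsonx.ai.

```python
def build_rag_prompt(question: str, search_results: list, max_passages: int = 3) -> str:
    """Assemble a grounded prompt from the hits returned for the first query."""
    hits = list(search_results[0]) if search_results else []
    passages = [
        f"[{i}] {hit['entity']['title']}: {hit['entity']['text']}"
        for i, hit in enumerate(hits[:max_passages], start=1)
    ]
    context = "\n".join(passages) if passages else "(no passages retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string can then be sent to the LLM of your choice, for example the Granite model configured in Langflow. Numbering the passages makes it easier to ask the model to cite which passage supports its answer.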

Step 4 — Try Langflow Playground

Click the Playground option in the top-right corner of the Langflow window and start querying your flow there.

References

IBM Docs – watsonx.data Developer Edition install guides
Milvus.io - Milvus x Langflow

