From Chat to Action:
Connecting AI Agents to Enterprise Data with the IBM watsonx.data intelligence MCP Server
Ravi Prasad Pentakota | IBM watsonx.data intelligence | March 2026
This post is written for developers and data engineers building AI agents on MCP-compatible frameworks. Familiarity with SQL and basic agent concepts is assumed; no prior knowledge of watsonx.data intelligence is required.
1. The Problem
If you've built an AI agent that touches enterprise data, you've already hit the wall. The agent generates SQL that looks plausible but fails at runtime because it guessed a table name that doesn't exist in your schema. Or it suggests a change, you ship it, and something downstream breaks that nobody thought to check. These aren't edge cases — they're the default failure modes when agents operate without real knowledge of your data estate.
These aren't model quality problems. The model is doing its job; what it lacks is context about your specific data estate. The standard workarounds (stuffing schema docs into the system prompt, hardcoding table names, building bespoke retrieval pipelines for each agent) don't scale. They're brittle and expensive to maintain, and they still don't give the agent real-time signals about data quality, ownership, lineage, or access controls.
What agents actually need is a semantic layer: a governed, queryable interface to metadata about your data — what exists, what it means, how trustworthy it is, and what depends on it. That's the context layer that turns an agent that guesses into one that knows.
2. A Governed Context Layer for AI Agents
IBM watsonx.data intelligence is a metadata management and data governance platform: it holds your data catalog, quality scores, lineage graphs, business glossary, data protection rules, and data products. The watsonx.data intelligence MCP server exposes all of that as structured, callable tools at agent runtime, through any MCP-compatible framework.
MCP turns watsonx.data intelligence into a runtime context layer for AI agents — exposing governed metadata as callable tools that agents can reason over in real time.
MCP (Model Context Protocol) is an open standard that allows AI agents to call external tools and data sources through a consistent, framework-agnostic interface — letting any MCP-compatible agent runtime connect to any MCP server without custom integration code.
Unlike traditional APIs, MCP exposes tools with structured schemas and natural language descriptions that agent frameworks can dynamically discover and reason over — eliminating the need for hardcoded integrations.
There are two deployment options: a managed, IBM-hosted endpoint available now in tech preview, and an open-source package you can run locally or in any environment you control. We'll cover both in detail in Section 6.
3. Our MCP Server Implementation
If you're already familiar with MCP as a protocol, this section focuses on how we've implemented it specifically — and what that means for what your agents can actually do.
New to MCP? At a protocol level, MCP works as a client-server handshake: the agent (client) calls tools/list to receive the full catalog of available tools with their typed schemas, then calls tools/call to invoke a specific tool. The server executes the call and returns a structured response. No custom routing or integration code is required on the client side — the agent reasons over tool descriptions to select the right tool for each step. For a full protocol reference, see the MCP specification at modelcontextprotocol.io.
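To make the handshake concrete, here is a minimal sketch of the two messages as JSON-RPC 2.0 payloads, the envelope MCP uses on the wire. It only builds the payloads and does not open a connection. The tool name and argument names come from the example tool definition in Section 3.1; the container name trading_catalog is a made-up placeholder.

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_ids = count(1)


def list_tools_request() -> dict:
    """Build the request an MCP client sends to discover available tools."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Build the request an MCP client sends to invoke one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(list_tools_request()))
    print(json.dumps(call_tool_request(
        "get_data_quality_for_asset",
        {
            "asset_id_or_name": "eu_daily_trades",
            "container_id_or_name": "trading_catalog",  # hypothetical name
            "container_type": "catalog",
        },
    ), indent=2))
```

In practice an MCP client library or agent framework produces these messages for you; the sketch is only meant to show how little structure sits between the agent and the tool.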
3.1 Tool Architecture
With these MCP tools, AI agents can discover datasets without knowing schema names, validate data quality before running queries, check lineage before deploying schema changes, generate accurate SQL grounded in the catalog, and manage governed data products. The tools span five capability domains: Asset Discovery, Automated Data Governance, Lineage Analysis, Data Analysis (Text-to-SQL), and Data Product Management. Each tool has a typed input schema, a typed output, and a natural language description that agents use during tool selection. Here’s what a representative tool definition looks like:
|
{
  "name": "get_data_quality_for_asset",
  "description": "Retrieve data quality metrics for a specific asset, including overall quality score and dimensions: consistency, validity, and completeness.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "asset_id_or_name": {
        "type": "string",
        "description": "UUID or name of the asset"
      },
      "container_id_or_name": {
        "type": "string",
        "description": "Project or catalog name"
      },
      "container_type": {
        "type": "string",
        "enum": ["project", "catalog"]
      }
    },
    "required": ["asset_id_or_name", "container_id_or_name", "container_type"]
  }
}
|
When an agent calls tools/list, it receives the full catalog of tools with schemas like this. The agent reasons over the descriptions to select the right tool for each step of the task — no hardcoding, no routing logic required on the client side.
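Because every tool ships a typed input schema, a client can check arguments before sending a call. The sketch below is a deliberately small validator covering only the schema features used in the example above (required fields, string types, enums). It is not a full JSON Schema implementation, and rejecting unknown arguments is a stricter choice than JSON Schema's default.

```python
# Input schema copied from the example tool definition above.
QUALITY_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "asset_id_or_name": {"type": "string"},
        "container_id_or_name": {"type": "string"},
        "container_type": {"type": "string", "enum": ["project", "catalog"]},
    },
    "required": ["asset_id_or_name", "container_id_or_name", "container_type"],
}


def validate_arguments(input_schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = input_schema.get("properties", {})
    for name in input_schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            # Stricter than JSON Schema, which allows extra properties by default.
            problems.append(f"unknown argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


if __name__ == "__main__":
    bad_call = {"asset_id_or_name": "eu_daily_trades", "container_type": "workspace"}
    for problem in validate_arguments(QUALITY_TOOL_SCHEMA, bad_call):
        print(problem)
```

A check like this gives the agent a clear, local error to correct instead of a failed round trip to the server.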
3.2 Connection Configuration
Connecting to the managed server (fastest path):
|
{
  "mcpServers": {
    "watsonx-di": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.dataplatform.cloud.ibm.com/semantic_agents/public/v1/mcp_server/mcp",
        "--header",
        "x-api-key:<your-api-key>"
      ]
    }
  }
}
|
For the open source / local server:
|
{
  "mcpServers": {
    "watsonx-di-local": {
      "command": "node",
      "args": ["path/to/data-intelligence-mcp-server/dist/index.js"],
      "env": {
        "IBM_API_KEY": "<your-api-key>",
        "INSTANCE_URL": "<your-saas-or-cpd-endpoint>"
      }
    }
  }
}
|
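If you would rather not paste an API key into a config file by hand, one option is to generate the managed-server entry from an environment variable. The sketch below reproduces the managed config shown above. The variable name WXDI_API_KEY is an assumption made for this example, not something the server or mcp-remote defines.

```python
import json
import os

MANAGED_URL = (
    "https://api.dataplatform.cloud.ibm.com"
    "/semantic_agents/public/v1/mcp_server/mcp"
)


def managed_server_config(api_key: str, server_name: str = "watsonx-di") -> dict:
    """Build the mcpServers entry for the managed endpoint."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": [
                    "mcp-remote",
                    MANAGED_URL,
                    "--header",
                    f"x-api-key:{api_key}",
                ],
            }
        }
    }


if __name__ == "__main__":
    # WXDI_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("WXDI_API_KEY", "")
    if key:
        print(json.dumps(managed_server_config(key), indent=2))
    else:
        print("Set WXDI_API_KEY to generate the config.")
```

Where the generated JSON should be written depends on your MCP client, so the sketch prints it rather than guessing a path.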
4. What You Can Build: Agent Capabilities
4.1 Asset Discovery
Your agent can discover datasets without knowing a single table name. Point it at a question like “find our customer satisfaction data” and it queries the catalog using natural language, returning matching assets with enriched schemas, quality scores, and ownership metadata. No schema docs in the prompt. No hardcoded table names. This is the entry point for almost every agent workflow that touches data.
4.2 Automated Data Governance
Your agent can onboard a new data source end-to-end from a single natural language prompt: import metadata from the connected source, enrich it with profiling and LLM-generated descriptions, run data quality analysis, classify sensitive columns automatically, and deploy data protection rules. Workflows that previously required hours of manual steward effort become single agent calls.
4.3 Lineage Analysis
Your agent can pull upstream and downstream lineage graphs for any asset in the catalog, drawn from real traced lineage — not model inference. This unlocks three concrete workflows:
• Impact analysis — before making a schema change, an agent queries downstream lineage to identify every table, ETL job, and dashboard that depends on the affected asset, so the developer knows the full blast radius before touching anything.
• Root cause analysis — when a data pipeline fails or a dashboard shows unexpected values, an agent traces upstream lineage to find where the bad data entered the system, dramatically reducing the time to identify and fix the source of the issue.
• Regulatory audit preparation — compliance teams can use an agent to generate a complete, point-in-time lineage trace for any data asset, showing exactly where data originated, how it was transformed, and where it flows — documentation that would otherwise take days to assemble manually.
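Impact analysis comes down to a bounded graph traversal: start at the asset being changed and follow downstream edges up to a fixed number of hops. The sketch below shows that traversal over an in-memory adjacency map. The asset names and edges are invented for illustration, and the map is only a stand-in for whatever shape the lineage tools actually return.

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets that read from it.
EDGES = {
    "orders_raw": ["orders_clean"],
    "orders_clean": ["orders_daily", "revenue_model"],
    "orders_daily": ["sales_dashboard"],
    "revenue_model": ["finance_report"],
    "finance_report": ["board_pack"],
}


def downstream_within(edges: dict[str, list[str]], start: str,
                      max_hops: int) -> dict[str, int]:
    """Map every asset reachable downstream of start to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for child in edges.get(node, []):
            if child not in distance:  # first visit is the shortest path
                distance[child] = distance[node] + 1
                queue.append(child)
    del distance[start]
    return distance


if __name__ == "__main__":
    for asset, hops in downstream_within(EDGES, "orders_clean", 2).items():
        print(f"{hops} hop(s): {asset}")
```

Root cause analysis is the same traversal run over reversed edges, walking upstream from the broken asset instead of downstream from the changed one.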
4.4 Data Analysis / Text-to-SQL
Your agent can translate natural language questions into SQL, execute them via secure read-only access, and optionally persist results as reusable assets in the catalog. The key differentiator is how SQL is generated: rather than guessing at table and column names, the agent grounds every query in the enriched metadata catalog. Enterprise schemas are full of opaque abbreviations: a column called ARR_DLY_MIN or a table called OPS_EVT_FCT_42 means nothing to a model without context. The platform enriches that metadata with LLM-generated descriptions and vectorizes it, so the agent can match natural language questions to the right tables and columns before generating a single line of SQL. The result is SQL that references real tables and columns, so it executes instead of failing on a guessed name.
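The matching step can be illustrated with a toy example. The sketch below scores each column's enriched description against the question using word overlap (Jaccard similarity). This is a simple stand-in chosen so the example runs with no dependencies; it is not the vector search the platform performs. The column names and descriptions are the ones used in the walkthrough in Section 5.2.

```python
import re

# Column name -> enriched description, as shown in the Section 5.2 walkthrough.
COLUMNS = {
    "NPS_SCORE": "Net Promoter Score (0-10)",
    "LVL_CD": "Loyalty Tier Level Code (GOLD, SILVER, BRONZE)",
    "TRIP_LEN_CD": "Trip Length Code (Long-haul, Short-haul)",
    "ARR_DLY_MIN": "Arrival Delay in Minutes",
}


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(question: str, description: str) -> float:
    """Jaccard overlap between the words of a question and a description."""
    q, d = tokens(question), tokens(description)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def best_match(question: str, descriptions: dict[str, str]) -> str:
    """Return the column whose description overlaps most with the question."""
    return max(descriptions, key=lambda name: similarity(question, descriptions[name]))


if __name__ == "__main__":
    print(best_match("how many minutes was the arrival delay", COLUMNS))
```

Note that the raw name ARR_DLY_MIN shares no words with the question; only the enriched description makes the match possible, which is the point of grounding SQL generation in the catalog.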
4.5 Data Product Management
Your agent can manage the full data product lifecycle: discover publishable assets, package them into data products, attach governance contracts and delivery methods, assign business domains, and publish — all through tool calls. This enables automated publishing workflows that would otherwise require manual steward intervention at each step.
As a concrete example: a data platform team receives a request to publish a new “Customer 360” data product. Previously, a steward would manually identify candidate tables, document their schema, apply classification tags, draft a data contract, and route for approval — a process taking several days. With the MCP server, an agent can execute that entire workflow from a single prompt: it calls search_asset to locate candidate tables, get_data_quality_for_asset to confirm they meet quality thresholds, classify_asset to apply sensitivity labels, create_data_product to package them, and publish_data_product to make them discoverable — end to end, in minutes.
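The sequence above can be written out as plain code to show how the calls chain together. In a real agent the model chooses these calls itself; this sketch hard-codes the order only to make the data flow visible. The tool names come from the paragraph above, but every argument name other than those of get_data_quality_for_asset, every response field, and the canned responses in fake_call_tool are assumptions made for illustration.

```python
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def publish_workflow(call_tool: ToolCaller, query: str, product_name: str,
                     container: str, min_score: float = 0.8) -> dict:
    """Search, quality-gate, classify, package, and publish in one pass."""
    found = call_tool("search_asset", {"query": query})
    approved = []
    for asset in found["assets"]:  # response shape is hypothetical
        quality = call_tool("get_data_quality_for_asset", {
            "asset_id_or_name": asset,
            "container_id_or_name": container,
            "container_type": "catalog",
        })
        if quality["overall_score"] >= min_score:  # hypothetical field
            call_tool("classify_asset", {"asset_id_or_name": asset})
            approved.append(asset)
    if not approved:
        raise RuntimeError("no candidate asset met the quality threshold")
    product = call_tool("create_data_product",
                        {"name": product_name, "assets": approved})
    return call_tool("publish_data_product", {"data_product_id": product["id"]})


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Canned responses standing in for a real MCP session."""
    if name == "search_asset":
        return {"assets": ["CUSTOMER_PROFILE", "CUSTOMER_ORDERS"]}
    if name == "get_data_quality_for_asset":
        scores = {"CUSTOMER_PROFILE": 0.93, "CUSTOMER_ORDERS": 0.61}
        return {"overall_score": scores[arguments["asset_id_or_name"]]}
    if name == "classify_asset":
        return {"status": "classified"}
    if name == "create_data_product":
        return {"id": "dp-1", "assets": arguments["assets"]}
    if name == "publish_data_product":
        return {"id": arguments["data_product_id"], "state": "published"}
    raise KeyError(name)


if __name__ == "__main__":
    print(publish_workflow(fake_call_tool, "customer", "Customer 360",
                           "customer_catalog"))
```

The quality gate in the middle is the design point worth keeping: an asset that fails the threshold never reaches classification or packaging.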
5. Walkthroughs: Agents in Action
5.1 Data Engineering: Schema Change Impact Analysis
Every data team knows the pain of making a schema change and having something unexpectedly break downstream. This walkthrough shows how a coding agent uses the watsonx.data intelligence MCP server to get a full impact analysis before any change is deployed.
The Prompt
|
Demo Prompt
Help me create a change to a schema in my SQL-based database. In the eu_daily_trades table: 1) Rename TRADE_AMOUNT → GROSS_TRADE_AMOUNT, 2) Add NET_TRADE_AMOUNT (nullable for now). Before deploying, give me a report of the downstream impact of this change using watsonx.data intelligence lineage.
|
What the Agent Does
The coding agent writes the SQL migration script, then — before executing it — calls the watsonx.data intelligence MCP server to pull real downstream lineage:
|
Step 1: lineage_search_lineage_assets(query="EU_DAILY_TRADES")
  → Resolves asset to lineage ID: wxdi://asset/eu_daily_trades_abc123
Step 2: lineage_get_lineage_graph(
    asset_id="wxdi://asset/eu_daily_trades_abc123",
    upstream_hops=0,
    downstream_hops=3
)
  → Returns:
    Downstream tables: TRADES_SUMMARY, RISK_EXPOSURE_DAILY, EU_COMPLIANCE_AGG, PNL_DAILY
    ETL pipeline jobs: nightly_trades_aggregation, risk_calc_pipeline
    BI dashboards: Trading Desk Overview, Risk Exposure Report, EU Compliance Dashboard
|
The agent returns a structured impact report before any change is deployed. The developer knows exactly what to coordinate, test, and communicate — derived from real lineage, not guesswork. You get the blast radius before the blast.
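As an illustration of what such a report might look like, the sketch below groups the downstream assets from Step 2 by type and renders a plain-text summary. The dictionary layout is an assumption made for this example, not the server's actual response format. The asset names are the ones returned in the walkthrough.

```python
# Downstream assets from Step 2, grouped by type (layout is hypothetical).
LINEAGE = {
    "Downstream tables": [
        "TRADES_SUMMARY", "RISK_EXPOSURE_DAILY", "EU_COMPLIANCE_AGG", "PNL_DAILY",
    ],
    "ETL pipeline jobs": ["nightly_trades_aggregation", "risk_calc_pipeline"],
    "BI dashboards": [
        "Trading Desk Overview", "Risk Exposure Report", "EU Compliance Dashboard",
    ],
}


def impact_report(asset: str, change: str, lineage: dict[str, list[str]]) -> str:
    """Render a plain-text impact report listing dependencies by category."""
    lines = [f"Impact report for {asset}", f"Proposed change: {change}", ""]
    total = 0
    for category, names in lineage.items():
        lines.append(f"{category} ({len(names)}):")
        lines.extend(f"  - {name}" for name in sorted(names))
        total += len(names)
    lines.append("")
    lines.append(f"Total downstream dependencies: {total}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(impact_report(
        "eu_daily_trades",
        "rename TRADE_AMOUNT to GROSS_TRADE_AMOUNT",
        LINEAGE,
    ))
```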
5.2 BI Agent: Metric Definition + Trusted SQL Execution
This walkthrough demonstrates how a BI agent uses watsonx.data intelligence to answer business questions against a schema it has never seen — without hallucinating a single table or column name.
The Prompt
|
Demo Prompt
Help me do some data analysis on our net promoter score data. I want to answer: 1) What is the NPS for customers delayed by more than 45 mins? 2) How does a customer's loyalty tier impact their NPS score on long-haul vs short-haul flights?
|
What the Agent Does
Without watsonx.data intelligence, the agent would have to guess table names like FBK_METRIC_REC_88 and column names like LVL_CD. Instead, it uses the catalog:
|
Step 1: search_asset(query="NPS net promoter score feedback")
  → Returns: FBK_METRIC_REC_88 ("Customer Feedback NPS Record")
Step 2: search_asset(query="flight operations delay data")
  → Returns: OPS_EVT_FCT_42 ("Flight Operations Event Fact Table")
Step 3: get_asset_details(asset_id="FBK_METRIC_REC_88", ...)
  → Enriched schema:
    NPS_SCORE → "Net Promoter Score (0-10)"
    LVL_CD → "Loyalty Tier Level Code (GOLD, SILVER, BRONZE)"
    TRIP_LEN_CD → "Trip Length Code (Long-haul, Short-haul)"
Step 4: get_asset_details(asset_id="OPS_EVT_FCT_42", ...)
  → Enriched schema:
    ARR_DLY_MIN → "Arrival Delay in Minutes"
    FK: joins to FBK_METRIC_REC_88 on BOOKING_REF
Step 5: generate_sql_query(
    question="NPS for customers delayed > 45 mins",
    assets=["FBK_METRIC_REC_88", "OPS_EVT_FCT_42"]
)
  → Generates valid SQL using real table names, real columns, real join condition
Step 6: sql_query_execution(sql="...", connection="reporting_db")
  → Result: NPS = -24.3 for customers delayed >45 mins (detractors outnumber promoters)
|
The agent didn't guess a single table or column name. Every reference in the generated SQL came from the catalog. The enriched metadata — not just raw schema, but LLM-expanded descriptions and relationship mappings — is what makes the difference between SQL that executes correctly and SQL that hallucinates.
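For readers less familiar with the metric behind Step 6: Net Promoter Score is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6), so it ranges from -100 to 100. The sketch below computes it from a list of ratings. The sample ratings are made up for illustration and are not the data behind the walkthrough result.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: percent promoters (9-10) minus percent detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    # Made-up ratings: 2 promoters, 2 passives, 4 detractors.
    sample = [10, 9, 8, 7, 6, 5, 3, 2]
    print(f"NPS = {nps(sample):.1f}")
```

A negative score means detractors outnumber promoters, and a sample made up entirely of detractors scores -100.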
6. Deployment Options
There are two ways to run the watsonx.data intelligence MCP server.
Use Managed if:
• you want zero infrastructure — IBM hosts and operates the server
• you want the full IBM toolset, including tools that wrap internal IBM APIs not in the open source package
Use Open Source if:
• you want local experimentation
• you want to extend the toolset — fork it, add custom tools for your own enterprise systems
The comparison table below has the full details.
| | Managed (SaaS) | Open Source / Local |
|---|---|---|
| Hosted by | IBM | You |
| Infrastructure | Zero (IBM-managed) | Self-managed |
| Auth | API key (x-api-key header) | API key (configurable) |
| Tool set | Full, including tools that wrap internal IBM APIs | Core tool set |
| Availability | SaaS: available now in tech preview; Software: June 2026 | Available now |
| Best for | Enterprise production workloads | Dev/test, custom builds |
| Extensibility | Standard IBM toolset | Fully open source; fork and extend |
6.1 Managed MCP Server (SaaS)
The managed server runs inside IBM Cloud infrastructure, co-located with the watsonx.data intelligence SaaS services, so the server's calls to those services never traverse external networks. Your MCP client reaches the server over HTTPS and authenticates with an API key in the x-api-key request header.
The managed server is available now in tech preview, and becomes generally available and included in the Software (on-prem) version beginning 1H 2026. It includes the full tool set, including tools that wrap internal IBM APIs not available in the open source implementation.
Security and access control: All tool calls are scoped to the permissions of the authenticated API key, and the MCP server enforces the same role-based access controls as the watsonx.data intelligence platform itself. Agents cannot read, modify, or discover assets that the authenticated user does not have permission to access. Sensitive column classifications and data protection rules applied through the Automated Data Governance tools are enforced at query time, not just at cataloging time, so a Text-to-SQL agent cannot inadvertently return masked or restricted data through a generated query.
6.2 Open Source / Local
The open source server is available on IBM GitHub. You can target either a watsonx.data intelligence SaaS or on-premises environment.
Because it's fully open, you can inspect the implementation, add custom tools for your own enterprise systems, and fork freely. Best suited for development, experimentation, and custom agent builds.
7. Getting Started
The fastest path:
1. Start a watsonx.data intelligence SaaS trial at https://www.ibm.com/solutions/data-intelligence
2. Generate an API key from your platform settings
3. Add the managed server config to your MCP client (Claude, Langflow, or any MCP-compatible framework — see Section 3.2)
4. Run one of the demo prompts from Section 5 to verify the connection
To self-host instead, visit the watsonx.data intelligence open source MCP GitHub page for full installation instructions, configuration guides, and examples:
github.com/IBM/data-intelligence-mcp-server
8. Closing
The agents that hold up in production aren't the ones with the best prompts — they're the ones with the best context. Governed metadata, reliable data quality signals, real lineage, enforced access controls: this is what turns an agent from a capable prototype into something you can trust with your data estate. The watsonx.data intelligence MCP server puts that context layer in reach of any agent developer, through any MCP-compatible framework.
Whether you're building a data engineering copilot, a BI agent, a governance automation workflow, or something we haven't thought of yet — the full capability of watsonx.data intelligence is available to your agents through a governed semantic layer.