From Simple Instructions to Precision and Guardrails: The Journey of Engineering Production-Ready Prompts for IBM InfoSphere Change Data Capture
Part 3: The Evolution of the Prompt: From Simple to Complex
Authors: HS Manoj Kumar, Dev Sarkar
Recap: From Failures to Solutions
In Part 2, we documented nine critical failure patterns in our naive prompt: from engine name contamination to confident hallucinations. Each failure revealed a gap between what LLMs naturally do (be helpful) and what we needed them to do (be accurate).
Now comes the transformation: twelve specific changes that turned our simple prompt into a rigorous rules engine.
The Evolution: Twelve Critical Transformations
Each transformation addresses one or more of the failures we encountered. We'll show you the problem, the solution we implemented, and the real-world impact.
1. Product Name Normalization (Solving the Terminology Chaos)
The Problem: CDC has evolved through multiple branding iterations. The same engine might appear as:
- "CDC Oracle"
- "IIDR CDC Oracle"
- "InfoSphere CDC for Oracle"
- "CDC Replication Engine for Oracle"
Our retrieval system would find documents using different names, and the model treated them as separate products.
The Solution: We created an equivalence map at the top of the prompt:
CDC Oracle ≡ IIDR CDC Oracle ≡ CDC Replication Engine for Oracle ≡
InfoSphere CDC for Oracle ≡ IBM CDC for Oracle
We did the same for every CDC engine and for the major components.
The Impact:
- Before: Fragmented answers across "different products"
- After: Unified, coherent responses recognizing all names as equivalent
- Result: Reduction in "I don't have information about X engine" false negatives
Real example:
User: "How does IIDR handle log shipping for Oracle?"
Before: "I don't have information about IIDR and log shipping."
After: "IBM CDC for Oracle (also known as IIDR CDC Oracle) uses log-based capture..."
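The same equivalence map can be applied as a pre-processing step before retrieval. A minimal Python sketch (the alias entries and canonical name here are illustrative, not the full production map):

```python
# Canonical engine name mapped from its historical aliases.
# Entries mirror the equivalence map in the prompt; this subset
# is illustrative only.
ENGINE_ALIASES = {
    "cdc oracle": "IBM CDC for Oracle",
    "iidr cdc oracle": "IBM CDC for Oracle",
    "cdc replication engine for oracle": "IBM CDC for Oracle",
    "infosphere cdc for oracle": "IBM CDC for Oracle",
    "ibm cdc for oracle": "IBM CDC for Oracle",
}

def normalize_engine_name(name: str) -> str:
    """Collapse any known alias to its canonical product name."""
    return ENGINE_ALIASES.get(name.strip().lower(), name)
```

Running every engine mention in both the question and the retrieved context through one function like this is what lets the model stop treating "IIDR CDC Oracle" and "InfoSphere CDC for Oracle" as separate products.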
2. Context Relevance Filtering (Stopping the Noise)
The Problem: The model was pulling in any context that contained matching keywords, even when discussing completely different topics. A question about "containerization" would pull in HA cluster documentation just because it mentioned "deployment."
The Solution: We introduced a mandatory scoring system:
STEP 2: CONTEXT SCORING (CRITICAL)
For each piece of context:
- DIRECT MATCH: Context directly discusses the primary topic → USE THIS
- PARTIAL MATCH: Mentions topic but focuses on related aspects → USE CAREFULLY
- INDIRECT MENTION: Topic mentioned in passing → IGNORE
- NO MATCH: Different topic entirely → IGNORE
ONLY USE CONTEXT THAT:
- Directly addresses the primary topic identified in the question
- Contains specific, actionable information about that topic
- Uses the same key terminology as the question
Special handling for keyword-stuffed context from Milvus:
If context contains excessive keyword repetition:
→ Normalize importance by semantic content, not keyword count
→ Don't give disproportionate weight to repeated terms
→ Focus on unique information density
The Impact:
- Before: Answers mixed containerization (Docker) with HA clustering
- After: Clean separation between image building and cluster management
- Result: Reduction in "confused context" errors
- Bonus: Keyword-stuffed documents no longer dominated responses
Real example:
User: "How do I build a CDC Docker image?"
Before: [Mixed answer about Docker images AND HA cluster shared storage]
After: [Clean answer about Dockerfile configuration, image layers, and container setup]
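The four-level scoring rubric can be approximated in code. Below is a simplified sketch using term overlap against the question's topic terms; the thresholds are illustrative, and in production this rubric was expressed as prompt rules rather than code:

```python
def score_context(topic_terms: set[str], doc_text: str) -> str:
    """Classify a retrieved chunk against the question's primary topic.

    Overlap is computed over *unique* terms, so a keyword-stuffed chunk
    cannot win just by repeating the same word. Thresholds are illustrative.
    """
    unique_words = set(doc_text.lower().split())
    overlap = topic_terms & unique_words
    if not overlap:
        return "NO MATCH"
    density = len(overlap) / len(topic_terms)
    if density >= 0.75:
        return "DIRECT MATCH"
    if density >= 0.5:
        return "PARTIAL MATCH"
    return "INDIRECT MENTION"
```

Only chunks scoring `DIRECT MATCH` (or, carefully, `PARTIAL MATCH`) would feed the answer; the rest are dropped, which is exactly what separated the Docker-image context from the HA-cluster context.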
3. Engine Category Validation (Preventing Impossible Scenarios)
The Problem: Users would ask if feature X works on engine Y, and the model would give generic CDC answers without checking if the feature was even applicable to that engine type.
Example:
- User: "Does DDL Awareness work for Teradata?"
- Model: "Yes, DDL Awareness captures schema changes..."
But Teradata is target-only. DDL Awareness only applies to source engines.
The Solution: We encoded engine categories explicitly:
ENGINE CATEGORIES:
SOURCE-ONLY: MySQL, DB2z Remote Engine, MariaDB
DUAL: Postgres, Oracle, DB2, MSSQL Server, Informix, Sybase, IBM i
TARGET-ONLY: Netezza, Teradata, Kafka, MongoDB, BigQuery, Snowflake, etc.
VALIDATION RULES:
- DDL Awareness → Only applicable to SOURCE or DUAL engines
- Apply methods/target mappings → Only applicable to TARGET or DUAL engines
- Before answering about a feature, validate it applies to the engine category
The Impact:
- Before: Generic answers that didn't account for engine capabilities
- After: Accurate "This feature does not apply to [engine type]" responses
- Result: Eliminated entire class of misleading answers
Real example after the fix:
User: "Does DDL Awareness work for Teradata?"
Model: "DDL Awareness is not applicable to Teradata. Teradata is a target-only engine in CDC, and DDL Awareness only applies to source engines that capture schema changes from databases."
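The category tables and validation rules above translate directly into a lookup check. A minimal sketch (the engine sets and the feature-to-role map are abbreviated for illustration):

```python
# Abbreviated engine categories, mirroring the prompt's tables.
SOURCE_ONLY = {"MySQL", "DB2z Remote Engine", "MariaDB"}
DUAL = {"Postgres", "Oracle", "DB2", "MSSQL Server", "Informix", "Sybase", "IBM i"}
TARGET_ONLY = {"Netezza", "Teradata", "Kafka", "MongoDB", "BigQuery", "Snowflake"}

# Feature -> roles it applies to (subset, for illustration).
FEATURE_ROLES = {
    "DDL Awareness": {"source"},
    "Apply methods": {"target"},
}

def feature_applies(feature: str, engine: str) -> bool:
    """Validate a feature against the engine's category before answering."""
    roles = set()
    if engine in SOURCE_ONLY or engine in DUAL:
        roles.add("source")
    if engine in TARGET_ONLY or engine in DUAL:
        roles.add("target")
    return bool(FEATURE_ROLES.get(feature, set()) & roles)
```

A `False` result here is what licenses the "This feature does not apply to [engine type]" answer instead of a generic one.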
4. Source vs Target Context Separation (Fixing Role Confusion)
The Problem: Questions about source configuration would get answers about target configuration, and vice versa. The model couldn't distinguish between source-side and target-side operations.
The Solution: We added explicit directionality detection:
SOURCE vs TARGET CLASSIFICATION:
Before answering, determine:
1. Is the question about SOURCE side (capture, scraping, log reading)?
Keywords: "source", "capture", "scrape", "redo logs", "transaction logs"
2. Is the question about TARGET side (apply, write, insert)?
Keywords: "target", "apply", "destination", "write to"
3. Is it about BOTH or CONNECTION between them?
Keywords: "replication", "subscription", "mapping"
FILTERING RULES:
- If question is about SOURCE → Ignore all target-specific context
- If question is about TARGET → Ignore all source-specific context
- If question is about END-TO-END → Include both but label clearly
The Impact:
- Before: Source and target contexts mixed freely in responses
- After: Clean separation based on question scope
- Result: User confusion about "which side" dropped significantly
Real example:
User: "How do I configure Oracle as a source?"
Before: [Mixed answer about source capture AND target apply methods]
After: [Clean answer focused solely on source-side configuration: redo log access, capture parameters, log reader settings]
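The directionality keywords above can be sketched as a simple classifier; in production this logic lived in the prompt, and the keyword lists here are the abbreviated ones from the rules:

```python
SOURCE_KEYWORDS = {"source", "capture", "scrape", "redo logs", "transaction logs"}
TARGET_KEYWORDS = {"target", "apply", "destination", "write to"}
END_TO_END_KEYWORDS = {"replication", "subscription", "mapping"}

def classify_direction(question: str) -> str:
    """Label a question SOURCE, TARGET, or END-TO-END by keyword scan.

    End-to-end keywords are checked first: a question that spans both
    sides should include both, clearly labeled.
    """
    q = question.lower()
    if any(k in q for k in END_TO_END_KEYWORDS):
        return "END-TO-END"
    if any(k in q for k in SOURCE_KEYWORDS):
        return "SOURCE"
    if any(k in q for k in TARGET_KEYWORDS):
        return "TARGET"
    return "UNKNOWN"
```

The label then drives the filtering rules: a `SOURCE` question drops all target-specific context before the model ever sees it.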
5. Engine-Specific Scoping (The Nuclear Option)
The Problem: This was our biggest issue. Questions about Oracle CDC were getting answers that mixed in DB2, SQL Server, and other engines.
The Solution: We added hard-coded scoping rules:
ENGINE SCOPING (MOST CRITICAL):
1. SPECIFIC ENGINE QUESTIONS:
- Question mentions "Oracle", "SQL Server", "DB2" → Answer ONLY for that engine
- If no context for that engine → "No information available for [engine]"
2. GENERAL CDC QUESTIONS:
- No engine specified → Provide general approach
- If procedures vary by engine → State this explicitly
3. NEVER create engine sections that only say "No specific information available"
The Impact: This single change had the biggest effect on user trust.
- Before: "I asked about Oracle, why is it telling me about DB2?"
- After: Laser-focused answers for the requested engine only
Real example:
User: "How does CDC Oracle handle DDL replication?"
Before: "CDC uses log-based capture. For Oracle, you configure XStream... For DB2, you use Q Capture... For SQL Server..."
After: "CDC Oracle handles DDL replication through log-based capture using Oracle XStream. When a DDL event occurs, the capture agent..."
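The scoping rules can also be mirrored as a context pre-filter. A sketch assuming a short, illustrative engine list; plain substring matching here is a simplification of what the prompt rules express:

```python
KNOWN_ENGINES = ["Oracle", "SQL Server", "DB2", "Postgres", "Kafka"]  # illustrative subset

def engines_mentioned(text: str) -> list[str]:
    """Engines explicitly named in a piece of text."""
    low = text.lower()
    return [e for e in KNOWN_ENGINES if e.lower() in low]

def filter_context(question: str, chunks: list[str]) -> list[str]:
    """Drop context about engines the user did not ask about."""
    requested = engines_mentioned(question)
    if not requested:
        return chunks  # general CDC question: keep everything
    kept = []
    for chunk in chunks:
        found = engines_mentioned(chunk)
        # Keep engine-agnostic chunks and chunks about a requested engine.
        if not found or set(found) & set(requested):
            kept.append(chunk)
    return kept
```

With this filter in place, an Oracle question simply never carries DB2 or SQL Server chunks into the generation step.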
6. Query Variation Normalization (Fixing Semantic Blindness)
The Problem: Users would ask the same question in different ways and get inconsistent results:
- "rename table in CDC" → detailed answer
- "table rename process" → "no information available"
- "how to rename a table" → generic answer
The Solution: Added query analysis step:
QUERY VARIATION HANDLING:
Before filtering context, identify:
1. Core concept (e.g., "table renaming", "encryption", "DDL handling")
2. Acceptable variations (rename table, table rename, renaming tables)
3. Related synonyms (DDL → schema change, encryption → SSL/TLS)
When matching context:
→ Use semantic similarity, not exact string matching
→ Treat "rename table" and "table renaming" as identical
→ Look for concept matches, not word-for-word matches
The Impact:
- Before: Query phrasing dramatically affected result quality
- After: Consistent responses regardless of how the question was phrased
- Result: User satisfaction improved as the system felt "smarter"
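Conceptually, the normalization step reduces each query to an order-independent set of canonical terms. A toy sketch (the synonym and stopword tables are illustrative; the production system relied on the model's semantic matching rather than hand-built tables, and "CDC" is treated as domain-implicit):

```python
# "cdc" is treated as domain-implicit, like a stopword.
STOPWORDS = {"how", "to", "a", "the", "in", "do", "i", "process", "cdc"}
SYNONYMS = {"renaming": "rename", "renamed": "rename", "tables": "table"}

def core_concept(query: str) -> frozenset[str]:
    """Reduce a query to an order-independent set of canonical terms,
    so 'rename table in CDC' and 'table rename process' compare equal."""
    tokens = [t.strip("?.,").lower() for t in query.split()]
    return frozenset(SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS)
```

All three phrasings from the problem statement collapse to the same concept set, which is why the answer no longer depends on how the question was worded.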
7. Hallucination Prevention (Honesty Over Fluency)
The Problem: LLMs are trained to be helpful and complete. When information was missing, our model would "fill in the gaps" with plausible-sounding but incorrect details.
The Solution: Explicit anti-hallucination rules:
IF no directly relevant context exists:
→ State "No specific information available about [topic]"
IF insufficient context for engine-specific question:
→ "Based on available context, general CDC approach is X, but engine-specific
steps may vary. For [engine] details, please consult official documentation."
CONFIDENCE ASSESSMENT:
- If you do not have high confidence in the retrieved context, state
"I do not know about this..."
- DO NOT try to imagine/cook up answers
The Impact:
- Before: Some of the answers contained fabricated "helpful" details
- After: Model says "I don't know" when it doesn't know
- Result: Accuracy improved, as confirmed through continuous testing
User feedback shifted from "It gives wrong answers" to "Sometimes it doesn't have the answer, but what it does say is reliable."
Real example:
User: "Does CDC support MongoDB as a source?"
Before: "Yes, CDC supports MongoDB through the universal connector with log-based replication capabilities..."
After: "Based on available documentation, MongoDB is supported as a TARGET in CDC, not as a source. CDC cannot capture changes from MongoDB databases."
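The abstention rule reduces to a simple gate in front of answer generation. A minimal sketch, with the generation step stubbed out:

```python
def answer_or_abstain(topic: str, relevant_chunks: list[str]) -> str:
    """Abstain explicitly instead of letting generation fill the gap."""
    if not relevant_chunks:
        return f"No specific information available about {topic}."
    # Placeholder: in the real pipeline, grounded generation runs here.
    return "\n".join(relevant_chunks)
```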
8. Domain Classification (Handling Ambiguity)
The Problem: Some concepts in CDC have multiple meanings depending on context. "Encryption" could mean:
- Agent-to-agent communication (in-flight)
- Staging store files (at-rest)
- Database log files
- CDC-to-database connections
- Masking sensitive data in CDC logs
Questions like "Does CDC support encryption?" got muddled answers mixing all these domains.
The Solution: Forced classification before answering:
ENCRYPTION DOMAINS IN CDC:
1. In-Flight Encryption (CDC-to-CDC Components)
2. At-Rest Encryption (Staging Store)
3. Database Logs Encryption
4. CDC-to-Database Connection Encryption
5. CDC Instance Logs Encryption (masking)
If question asks "Does CDC support encryption?":
→ First classify which domain
→ Then answer for THAT domain only
→ If question is ambiguous, explain all domains briefly
We did similar classification for:
- Containerization (Docker images vs. HA clusters)
- DDL handling (CREATE vs. ALTER vs. DROP vs. RENAME)
- Replication modes (continuous vs. scheduled/net change)
Real-world example:
User: "Does CDC support TDE?"
Classification: Database Logs Encryption (domain 3)
Answer: "Yes, CDC supports Transparent Data Encryption (TDE) for database logs, provided the CDC agent has proper access credentials and the appropriate CDC version is used. For Oracle TDE, CDC requires version X.X or higher..."
The Impact:
- Before: Encryption questions got mixed answers about multiple unrelated topics
- After: Clear, domain-specific responses
- Result: Elimination of ambiguous multi-domain answers
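The domain classification can be sketched as a keyword router; the keyword lists here are illustrative stand-ins for the prompt's fuller domain definitions:

```python
# Illustrative keyword lists for the five encryption domains.
ENCRYPTION_DOMAINS = {
    "in-flight": {"agent", "in-flight", "component"},
    "at-rest": {"staging store", "at-rest"},
    "database-logs": {"tde", "redo", "transaction log", "database log"},
    "db-connection": {"jdbc", "database connection"},
    "instance-logs": {"mask", "masking", "trace"},
}

def classify_encryption_domain(question: str) -> list[str]:
    """Route an encryption question to the domain(s) it actually asks about."""
    q = question.lower()
    hits = [domain for domain, keywords in ENCRYPTION_DOMAINS.items()
            if any(k in q for k in keywords)]
    # Ambiguous question: the caller should briefly cover all domains.
    return hits or list(ENCRYPTION_DOMAINS)
```

A TDE question routes to the database-logs domain only, while a bare "Does CDC support encryption?" falls through to all five, matching the rule that ambiguous questions get a brief overview of each domain.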
9. Structured JSON Output (Making It Machine-Readable)
The Problem: Free-text answers were:
- Hard to parse downstream
- Inconsistent in structure
- Difficult to audit
- Impossible to extract metadata from
The Solution: Mandated strict JSON schema:
{
"reasoning_steps": {
"question_analysis": "What aspect of CDC?",
"engine_scope": "Which engines?",
"context_coverage": "What info is available?",
"information_gaps": "What's missing?",
"source_quality": "Official docs vs internal discussions"
},
"answer": "Technical response with sources",
"confidence_level": "High/Medium/Low",
"missing_information": "Specific gaps",
"referenced_sources": {
"official_documentation": ["URLs"],
"technotes": ["URLs with descriptions"],
"internal_discussions": ["Slack references"]
},
"engine_coverage": {
"engines_with_info": ["Oracle", "DB2"],
"engines_missing": ["SQL Server"]
}
}
The Impact:
- Enabled automated quality scoring
- Made it possible to build dashboards tracking answer quality
- Allowed programmatic extraction of source citations
- Result: A significant improvement in our ability to audit and refine the system
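With a fixed schema, every response can be validated before anything downstream consumes it. A minimal stdlib-only sketch that checks the required top-level keys:

```python
import json

# Top-level keys required by the response schema.
REQUIRED_KEYS = {
    "reasoning_steps", "answer", "confidence_level",
    "missing_information", "referenced_sources", "engine_coverage",
}

def validate_response(raw: str) -> dict:
    """Parse the model's output and fail fast if the schema is violated."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    if data["confidence_level"] not in {"High", "Medium", "Low"}:
        raise ValueError("invalid confidence_level")
    return data
```

Rejecting malformed output at this boundary is what made the dashboards and automated quality scoring trustworthy.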
10. Technote Attribution (Traceability)
The Problem: IBM publishes hundreds of technotes (IBM's knowledge base articles). Our model would often ignore these even when they were highly relevant, or mention them without linking.
The Solution: Mandatory technote rules:
TECHNOTES INCLUSION RULES:
- If a technote in context is semantically related → MUST be included
- Use semantic similarity, not exact string matching
- Always include complete URL
- Provide brief description of what it covers
The Impact:
- Before: Few relevant technotes were cited
- After: Increase in citation rate for relevant technotes
- Result: Every answer could be verified against official IBM sources during testing
Real example:
Before: "CDC supports encryption through SSL/TLS configuration."
After: "CDC supports encryption through SSL/TLS configuration. See IBM Technote #1234567 (https://www.ibm.com/support/pages/technote-1234567) which covers 'Configuring SSL/TLS encryption for CDC agent communication' for detailed setup instructions."
11. Voice Normalization (Maintaining Professional Identity)
The Problem: The model would use first-person plural ("we") in responses, creating confusion about whether the chatbot was speaking as IBM:
"We support encryption in CDC through SSL/TLS configuration. We recommend enabling this in production environments."
This made it sound like the chatbot itself was IBM, which was inappropriate.
The Solution: Added explicit voice rules:
VOICE AND TERMINOLOGY:
- NEVER use "we/us/our" to refer to IBM/CDC team
- Instead use: "IBM CDC", "the CDC team", "IBM documentation states"
- Maintain third-person professional tone
- Speak as a consultant, not as IBM itself
SYNONYM MAPPING:
WE ≡ IBM ≡ CDC ≡ IIDR ≡ IBM CDC team
→ Replace "we" with appropriate third-person reference
The Impact:
- Before: Chatbot sounded like it was claiming to be IBM
- After: Professional third-person consultant voice
- Result: Clearer identity separation between chatbot and company
Real example:
Before: "We support encryption and we recommend enabling it in production."
After: "IBM CDC supports encryption, and IBM documentation recommends enabling it in production environments."
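Beyond the prompt rule, the voice policy can be enforced as a post-processing safety net. A sketch using word-boundary regex replacements (the replacement table is illustrative and deliberately small):

```python
import re

# Third-person replacements for first-person plural; word boundaries
# (\b) avoid touching words like "weekly" or "answer". Specific phrases
# come before the bare "we" fallback.
REPLACEMENTS = [
    (r"\bwe recommend\b", "IBM documentation recommends"),
    (r"\bwe support\b", "IBM CDC supports"),
    (r"\bour\b", "IBM CDC's"),
    (r"\bwe\b", "IBM CDC"),
]

def normalize_voice(text: str) -> str:
    """Rewrite first-person plural into third-person references."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```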
12. Fact Deduplication (Eliminating Repetition)
The Problem: Early versions would repeat the same information multiple times:
- Once in "General CDC" section
- Again in engine-specific sections
- Again in the conclusion
The Solution: Fact merging rules:
FACT MERGING RULES:
- If multiple sources say the same thing → merge into single statement,
list all sources
- Don't restate same fact in "General CDC" and engine sections
- Keep "General CDC" strictly for engine-agnostic facts
- Include only ONE conclusion at the end
The Impact:
- Before: Verbose, repetitive responses that frustrated users
- After: Concise, non-redundant answers
- Result: Improved readability and user satisfaction
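The merging rule can be sketched as a dedup pass keyed on normalized fact text. Exact-match dedup is shown here; the prompt's actual rule relies on the model recognizing semantic duplicates:

```python
def merge_facts(sourced_facts: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Collapse duplicate (fact, source) pairs, collecting every source."""
    merged: dict[str, tuple[str, list[str]]] = {}
    for fact, source in sourced_facts:
        key = " ".join(fact.lower().split())  # normalize case and whitespace
        entry = merged.setdefault(key, (fact, []))
        if source not in entry[1]:
            entry[1].append(source)
    return list(merged.values())
```

Each fact then appears once in the answer with all of its sources attached, instead of being restated in the general section, the engine section, and the conclusion.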
The Complete Picture: How the Transformations Work Together
These twelve transformations don't work in isolation—they form a system of checks and balances:
- Product Name Normalization ensures consistent terminology
- Context Relevance Filtering removes noise
- Engine Category Validation prevents impossible scenarios
- Source vs Target Separation maintains architectural clarity
- Engine-Specific Scoping keeps answers focused
- Query Variation Normalization handles different phrasings
- Hallucination Prevention enforces honesty
- Domain Classification resolves ambiguity
- Structured JSON Output enables quality monitoring
- Technote Attribution provides traceability
- Voice Normalization maintains professional identity
- Fact Deduplication improves readability
Together, they transform the LLM from a helpful but unreliable assistant into a rigorous technical support tool.
The Before and After: A Complete Example
Let's see how these transformations work on a real query:
User: "How do I set up DDL replication for Oracle in CDC?"
Before (Naive Prompt):
"CDC supports DDL replication across multiple databases. For Oracle, you configure XStream for log capture. For DB2, you use Q Capture with journal access. For SQL Server, you need CDC tracking enabled. We recommend testing in a non-production environment first. Contact support if you need help."
Problems:
- ❌ Mentions DB2 and SQL Server (engine contamination)
- ❌ Uses "we" (voice inconsistency)
- ❌ No sources cited
- ❌ Generic advice, no specific steps
- ❌ No confidence indication
After (Evolved Prompt):
{
"reasoning_steps": {
"question_analysis": "User asking about DDL replication setup for Oracle source",
"engine_scope": "Oracle only",
"context_coverage": "Documentation available for Oracle DDL Awareness feature",
"information_gaps": "Version-specific configuration details may vary",
"source_quality": "Official documentation + technote available"
},
"answer": "IBM CDC for Oracle (also known as IIDR CDC Oracle) supports DDL replication through the DDL Awareness feature, which is applicable because Oracle is a dual-purpose engine (can act as both source and target). To set up DDL replication:\n\n1. Enable supplemental logging on Oracle source database\n2. Configure the capture agent with DDL Awareness parameter enabled\n3. Subscribe to DDL events in the subscription configuration\n4. Set up event handlers for DDL event ID 9505\n\nWhen a DDL event occurs (CREATE, ALTER, DROP, RENAME operations), the subscription will pause and generate event 9505. IBM documentation recommends reviewing the captured DDL statement before proceeding.\n\nFor detailed configuration steps, see IBM Technote #1234567: 'Configuring DDL Awareness for Oracle in CDC' (https://www.ibm.com/support/pages/technote-1234567).",
"confidence_level": "High",
"missing_information": "Version-specific parameter names may vary between CDC 11.4 and 11.5",
"referenced_sources": {
"official_documentation": ["CDC for Oracle Configuration Guide"],
"technotes": ["#1234567: Configuring DDL Awareness for Oracle"],
"internal_discussions": []
},
"engine_coverage": {
"engines_with_info": ["Oracle"],
"engines_missing": []
}
}
Improvements:
- ✅ Oracle-only (no engine contamination)
- ✅ Third-person professional voice
- ✅ Sources cited with URLs
- ✅ Specific, actionable steps
- ✅ Confidence level indicated
- ✅ Engine category validated (dual-purpose)
- ✅ Source-side operation clarified
- ✅ Structured JSON for downstream processing
The Cost: From 200 Words to 5,000 Words
Our evolved prompt went from a simple 200-word instruction set to a comprehensive 5,000-word rules engine.
Was it worth it?
Absolutely. Every word was earned through a production failure. Every constraint prevents a specific class of wrong answers.
The prompt now includes:
- 15+ engine name equivalence mappings
- 3 engine category definitions with 40+ engines classified
- Context scoring criteria (4 relevance levels)
- Source vs target classification keywords
- Engine-specific scoping rules
- 5 encryption domain definitions
- JSON schema requirements
- Voice normalization rules
- Anti-hallucination guidelines
- Technote attribution requirements
- Fact deduplication rules
- Query variation handling logic
The Engineering Mindset: Prompts as Code
This evolution taught us to treat prompts like production code:
- Version controlled - Every change tracked in Git
- Tested - Regression suite with known failure cases
- Documented - Comments explaining why each rule exists
- Reviewed - Team review before deployment
- Monitored - Quality dashboards tracking effectiveness
Each of the twelve transformations went through:
- Problem identification - Real production failure
- Root cause analysis - Why did the naive prompt fail?
- Solution design - How do we prevent this systematically?
- Implementation - Add explicit rules to prompt
- Testing - Verify fix doesn't break other cases
- Monitoring - Track improvement in production
Conclusion: Complexity with Purpose
Our prompt is complex because CDC is complex. Our prompt is long because domain expertise can't be compressed.
Every transformation solved real problems:
- Product name normalization → unified terminology
- Context filtering → eliminated noise
- Engine validation → prevented impossible scenarios
- Source/target separation → architectural clarity
- Engine scoping → focused answers
- Query normalization → consistent experience
- Hallucination prevention → trustworthy responses
- Domain classification → resolved ambiguity
- JSON structure → quality monitoring
- Technote attribution → traceability
- Voice normalization → professional identity
- Fact deduplication → readability
In Part 4, we'll distill the key lessons from this journey and share principles that apply beyond CDC to any production LLM application where accuracy matters.
Next: Part 4 - Key Lessons Learned and Best Practices for Prompt Engineering
Previous: Part 2 - The Initial Prompt and the Failures We Encountered
#watsonx.ai
#PromptLab
#GenerativeAI