Some thoughts on Trustworthy Generative AI

Some thoughts on Trustworthy Generative AI - and a recorded webinar

By James Taylor posted Tue November 14, 2023 05:15 PM

I recently gave a webinar on How to achieve more trustworthy Generative AI with Decision Automation and it's now available on-demand. You can also download the PDF of the slides here. I thoroughly enjoyed giving the webinar and it's definitely worth your time! Please email me or post any questions you have.

I was hoping to be joined by a colleague, Dr Jan Purchase, but we had some technical challenges and he wasn't able to make it. He shared a couple of really good points about the slides that I wanted to capture:

When mixing decision automation and GenAI/LLMs, one of the key questions is where the boundary between them should be. Jan observed that:

GenAI is very good at interpretation (processing inputs) and explicability (generating human outputs) but poor at:
- following complex rules
- spotting contradictions in complex rules
- providing a rationale for an answer/outcome - by quoting regulations for example
- explaining complex reasoning (although some keep trying to do this).
All of these things belong in decision services built with your decision automation platform because they can:
- articulate complex rules in a scalable way (both those created by humans and those discovered from data using machine learning)
- connect rules with regulations and check rule consistency
He also pointed out that machine learning models can degrade in performance as their inputs change over time, and that framing them in a decision provides a basis for monitoring this and triggering retraining as needed.
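To make the monitoring point concrete, here is a minimal sketch of one way a decision could watch a model input for drift. It assumes the decision logs input values and compares recent values against the training baseline using the Population Stability Index (PSI). The choice of PSI, the 0.2 threshold, and the function names are illustrative assumptions, not something Jan or the webinar prescribed.

```python
import math
from typing import Sequence


def psi(baseline: Sequence[float], recent: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric input."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = max(0, min(bins - 1, idx))  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, r = shares(baseline), shares(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


def needs_retraining(baseline, recent, threshold: float = 0.2) -> bool:
    """Assumed rule: flag the model for retraining when drift exceeds the threshold."""
    return psi(baseline, recent) > threshold


if __name__ == "__main__":
    training_inputs = [i / 100 for i in range(1000)]     # values the model was trained on
    shifted_inputs = [5 + i / 200 for i in range(1000)]  # recent values, shifted upward
    print(needs_retraining(training_inputs, training_inputs))
    print(needs_retraining(training_inputs, shifted_inputs))
```

In a decision service, a check like `needs_retraining` would sit alongside the model call, so that the same decision that uses the prediction also records whether the inputs still look like the training data.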

He also had some thoughts on error reduction and curating answers from GenAI/LLMs:

Retrieval Augmented Generation or RAG is the ability to automatically incorporate curated snippets of your own documents into a prompt to answer questions confined to your policies, your products, etc – rather than common sense or general knowledge.
This keeps your answers more focused on your business and can be orchestrated using decision automation.
While decision automation can help make sure your chatbots don't make mistakes of decision-making, the chatbots can also prevent errors in how decision services are used, providing a multi-step conversation that makes it easier for the user to get the right data entered into the decision service and so makes the whole interaction more robust.
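The prompt-assembly half of RAG can be sketched in a few lines. This is only an illustration under stated assumptions: a naive keyword-overlap score stands in for the embedding search a real system would likely use, the policy snippets are made up, and no LLM is actually called. The names `retrieve` and `build_prompt` are hypothetical.

```python
import re

# Made-up policy snippets standing in for curated company documents.
POLICY_SNIPPETS = [
    "Refunds are issued within 14 days of a returned purchase.",
    "Premium accounts include free shipping on all orders.",
    "Gift cards cannot be exchanged for cash.",
]


def tokenize(text: str) -> set:
    """Lowercase word and number tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, snippets: list, k: int = 2) -> list:
    """Return up to k snippets that share at least one token with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(s)), s) for s in snippets]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(question: str, snippets: list, k: int = 2):
    """Build a prompt confined to retrieved snippets, or None if nothing matches."""
    context = retrieve(question, snippets, k)
    if not context:
        return None  # the orchestrating decision can route this to a fallback
    lines = ["Answer using only the policy extracts below.", ""]
    lines += [f"- {s}" for s in context]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("How many days do refunds take?", POLICY_SNIPPETS))
```

Returning `None` when nothing relevant is found gives the orchestrating decision an explicit signal to hand off or ask a clarifying question, rather than letting the model answer from general knowledge.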

While I discussed the value of a decision model (see my blog series on decision modeling for more details), Jan pointed out that there is a popular orchestration layer for ML and LLMs called LangChain that has recently been challenged for not yet having a strong visual interface. Decision modeling and decision automation tooling can provide this visual orchestration for combining LLMs, decision services and ML right now.

Jan has a lot of experience with mixing decision automation and AI and made the point that when LLMs or GenAI falter, you need to find out fast. Keeping interim states and collateral to quickly diagnose issues is critical and decision automation allows you to automate your reaction to unexpected results so you respond (and correct) more quickly.

This brought up the whole issue of testing AI where Jan had some experience to share:

LLMs and GenAI can be poor at reproducibility (although this is improving) so you’ll need to test LLM system behavior in the aggregate.
The fact that the responses can vary even with the same inputs and the need for consistency to protect your reputation means you should build your CI/CD platform early. This will let you support experimentation with prompt design and rapidly assess the impact of new API features.
You can use GenAI/LLMs to test GenAI/LLMs :)
You should expect more regulatory impact. You'll need some means of showing you have tested your use of GenAI and this may well become mandatory as AI regulations evolve.
Decision automation architectures provide a framework for automating this testing of your AI - it can make GenAI more robust by fact-checking and voting to reduce hallucination and it can reduce bias by applying KPI-based corrections.
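Two of the ideas above, testing in the aggregate and voting, can be sketched as follows. Everything here is an assumption made for illustration: `make_flaky_model` is a stand-in for a real LLM call that simply returns the wrong answer at a configurable rate, and the eligibility questions are made up.

```python
import random
from collections import Counter


def majority_vote(answers: list):
    """Return the most common answer and its share of the votes."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


def pass_rate(ask, cases: list, runs: int = 20) -> float:
    """Fraction of all runs, across all cases, where the answer matched the expected one."""
    passed = total = 0
    for prompt, expected in cases:
        for _ in range(runs):
            passed += ask(prompt) == expected
            total += 1
    return passed / total


def voted(ask, prompt: str, n: int = 5) -> str:
    """Ask the same question n times and keep the majority answer."""
    winner, _share = majority_vote([ask(prompt) for _ in range(n)])
    return winner


def make_flaky_model(seed: int, error_rate: float = 0.2):
    """Hypothetical stand-in for an LLM: answers correctly except at error_rate."""
    rng = random.Random(seed)
    answers = {"Is a 17-year-old eligible?": "no", "Is a 30-year-old eligible?": "yes"}

    def ask(prompt: str) -> str:
        correct = answers[prompt]
        if rng.random() < error_rate:
            return "yes" if correct == "no" else "no"
        return correct

    return ask


if __name__ == "__main__":
    cases = [("Is a 17-year-old eligible?", "no"), ("Is a 30-year-old eligible?", "yes")]
    ask = make_flaky_model(seed=1)
    print("single-call pass rate:", pass_rate(ask, cases))
    print("voted pass rate:", pass_rate(lambda p: voted(ask, p), cases))
```

A CI pipeline could run something like `pass_rate` on every prompt change and fail the build when the rate drops below an agreed level, which also leaves a record that the testing was done.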

Jan's final observation was that it is important to keep it simple. The key to success with LLMs is to keep each step simple. Projects fail because they expect the AI to do too much in one shot (e.g., understand customers’ intent while obeying corporate and compliance constraints and manipulating database records). The result is a solution that works sometimes and fails unpredictably at other times, which is bad news for your reputation. A step-by-step orchestration of smaller AI components works much more reliably and can be automated and documented using decision automation.
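As a rough illustration of this step-by-step approach, the sketch below splits one customer request into three small steps: an interpretation step, a deterministic decision step, and a wording step, with each interim result recorded for later diagnosis. The two LLM steps are plain-Python stubs so the example runs on its own, and the refund rules, limits, and policy references (R-0, R-1, R-2) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Keeps each step's output so a failure can be diagnosed quickly."""
    steps: list = field(default_factory=list)

    def record(self, name: str, output):
        self.steps.append((name, output))
        return output


def extract_request(message: str) -> dict:
    """Stub for an LLM call that only interprets the customer's message."""
    words = message.replace("$", " ").split()
    amount = next((int(w) for w in words if w.isdigit()), None)
    intent = "refund" if "refund" in message.lower() else "unknown"
    return {"intent": intent, "amount": amount}


def decide(request: dict) -> dict:
    """Deterministic decision step: explicit rules, each with a cited rationale."""
    if request["intent"] != "refund" or request["amount"] is None:
        return {"outcome": "refer", "rationale": "Request not understood (hypothetical policy R-0)"}
    if request["amount"] <= 100:
        return {"outcome": "approve", "rationale": "Refunds up to 100 are automatic (hypothetical policy R-1)"}
    return {"outcome": "refer", "rationale": "Refunds above 100 need review (hypothetical policy R-2)"}


def phrase_reply(decision: dict) -> str:
    """Stub for an LLM call that only words a decision that has already been made."""
    return f"Outcome: {decision['outcome']}. Reason: {decision['rationale']}"


def handle(message: str):
    """Run the three steps in order, recording every interim state."""
    trace = Trace()
    request = trace.record("extract_request", extract_request(message))
    decision = trace.record("decide", decide(request))
    reply = trace.record("phrase_reply", phrase_reply(decision))
    return reply, trace


if __name__ == "__main__":
    reply, trace = handle("I'd like a refund of $40 please")
    print(reply)
    for name, output in trace.steps:
        print(name, "->", output)
```

The design choice is that neither stubbed LLM step can change the outcome: the first only produces structured input and the last only words the result, so compliance rules live entirely in `decide`.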

Drop me a line if you have questions for me and Jan and we'll get them answered for you!

#DecisionAutomation #GenerativeAI #ArtificialIntelligence(AI) #MachineLearning #decisionmgt #IBMChampions

Comments

James Taylor

Fri November 17, 2023 10:38 AM

@Christian Nørgaard glad you found it helpful! Drop me a line any time with questions.

Christian Nørgaard

Fri November 17, 2023 10:23 AM

Thanks for great Webinar James. Helped me see the benefits of using Decision Services in connection with GenAI and/or ML.
