
Generative AI Agent Architecture for Software Assistant

By Yohan Bensoussan posted 28 days ago

  

Co-authored by: Yohan Bensoussan and Jaffa Sztejnbok, Build Lab EMEA

If you're here to find a tutorial on integrating LangChain with watsonx.ai, this might not be the blog you're looking for! This post proposes a high-level architecture for building production-ready AI assistants using an agent-based approach. Following critiques from the community about the limitations of relying on function calling alone, our demo attempts to answer this challenge.

We aim to equip enterprises with an architecture for building autonomous agents that surpass traditional methods while steering clear of deep integration, ensuring AI assistants are not only innovative but also practical for production and real-world applications.

Building an Assistant for Your Software with AI: A Comprehensive Demo Using Agent Architecture.

The promise of generative AI for technology companies, like startups and ISVs, comes with a simple yet necessary question: How do we embed generative AI into our software?

In the fintech sector, imagine an intelligent assistant that truly leverages the added value of your technology to offer personalized advice to your customers. In the cybersecurity sector, securing data can become more intuitive: with specialized agents, you can monitor and react to threats with precision.

In this blog post, we aim to harmonize the agent-based approach with function calling, encapsulating both within a toolkit designed to segment complex problems into manageable parts in order to demystify the integration of generative AI and transform the abstract into the concrete.

For our purposes, we will use as an example an imagined observability platform called Observ.ai.

The Challenge

Observability tools are instrumental in monitoring and managing complex distributed systems. They provide comprehensive insights into the performance, health, and behavior of applications, infrastructure, and services in real-time. Leveraging advanced instrumentation and analytics, these tools collect data across various layers of the technology stack, including application code, network, and underlying infrastructure. By offering metrics, traces, and logs, observability tools enable teams to quickly detect, diagnose, and resolve issues, enhancing system reliability, performance, and scalability. Moreover, they facilitate proactive monitoring, anomaly detection, and trend analysis, empowering organizations to optimize their digital operations and deliver exceptional user experiences.

Keep this observability platform in mind as we aim to integrate an AI assistant deeply into the core capabilities of the platform. This assistant will be at the users' fingertips, ready to answer their questions, perform actions, schedule tasks, define alerts, discover new insights, and provide other capabilities related to observability. So, let's dive into building such an assistant for Observ.ai using a powerful multi-LLM agent AI architecture.

The Architecture

Everything starts with a correct architecture. By correct, we mean:

  1. Modular: The agent architecture should be designed with functional modularity in mind. This involves decomposing the assistant's functionalities into independent, self-contained agent modules. Each module can be responsible for a specific task, such as question answering, data retrieval, action execution, or natural language generation. This modularity allows for agent reusability, independent development and maintenance, and scalability (new modules can be seamlessly integrated to extend the assistant's capabilities as needed).
  2. Optimized: The agent's performance and cost are optimized by using an ensemble of specialized, small agents. These agents offer several advantages: reduced cost, faster responses, enhanced accuracy, and decreased hallucinations.
  3. Integrative: By leveraging specialized agents, it becomes simpler to connect to Observ.ai's APIs, databases, and other services to access data, execute actions, and retrieve results.
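
To make the first point concrete, here is a minimal sketch of what such a self-contained agent module could look like in Python. The interface, module names, and registry are illustrative assumptions for this post, not the demo's actual code.

# A minimal sketch of the modularity principle: every capability lives behind
# one small, self-contained interface, so modules can be developed, maintained,
# and added independently. All names here are illustrative assumptions.
from typing import Protocol


class AgentModule(Protocol):
    name: str
    description: str

    def handle(self, user_query: str) -> str:
        """Run this module's task and return its result as text."""
        ...


class DataRetrievalAgent:
    name = "sql_retrieval"
    description = "Fetch Observ.ai data through a generated SQL query."

    def handle(self, user_query: str) -> str:
        # Placeholder: delegate to the SQL-specialist model described later.
        raise NotImplementedError


# Extending the assistant means registering one more module; nothing else changes.
REGISTRY: dict[str, AgentModule] = {DataRetrievalAgent.name: DataRetrievalAgent()}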

About our imagined software, Observ.ai...

Observ.ai empowers users to understand their systems through logs, metrics, traces, alerts, and dashboards. Manually sifting through this data can be a time-consuming and tedious process, especially when it can be handled automatically as alerts begin to appear or after events occur. A virtual assistant can streamline this process by offering a variety of functionalities, including:

  • Software Data Requests: Instead of wading through dashboards, users can ask natural language questions like, "What's the CPU utilization for the past hour?" or "Is there an increase in API latency for service X?"
  • Documentation Q&A: "How to define a new alert for specific logs?"
  • Performing Actions: The assistant can take specific actions upon user request. Imagine asking, "Activate service X for 2 hours," or "Schedule a load test for the e-commerce platform every Monday at 2 AM."
  • Additional Actions: The assistant can be extended to perform a wider range of observability tasks, such as triggering alerts, visualizing data, or generating charts and graphs based on user requests.

Understanding the Flow with a Conversation Example.

Let's walk through a happy-path conversation to illustrate how the different components work together:

User: "Give me the critical error logs for today."

Watsonx Assistant recognizes the main intent ("action on Observ.ai") from among the defined intents: greetings, action on Observ.ai, how-to support, and task assignment.

Watsonx Assistant: Sends the user query to the LLM orchestrator agent.

Watsonx.ai - Mixtral Model: Analyzes the intent and determines that a tool call is required for data retrieval from the database.

Watsonx.ai - CodeLlama Model: Generates an SQL query based on the user query ("critical error logs for today") that observ.ai's database can understand.

Observ.ai Database: Executes the SQL query and retrieves the relevant data (the critical error logs for the requested timeframe).

Watsonx.ai - Mixtral Model: Receives the Observ.ai data and decides on the next tool call (explaining the answer in natural language).

Watsonx.ai - Granite or Mixtral Model: The specialized LLM model receives the data (likely in JSON format) and transforms it into natural language.

Watsonx Assistant: Receives the response from the Orchestrator ("I have found 2 critical error logs for today: 1. Database connection timeout. 2. Failed to authenticate user.") and presents it back to the user.
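
To make this walkthrough concrete, here is a minimal sketch of that flow in Python. The generate and run_sql helpers are placeholders for a watsonx.ai inference call and the Observ.ai database endpoint; the model IDs and prompts are illustrative assumptions, not the demo's actual code.

# Minimal sketch of the conversation flow above. generate, run_sql, the model
# IDs, and the prompts are illustrative assumptions, not the demo's actual code.

def generate(model_id: str, prompt: str) -> str:
    """Placeholder for a watsonx.ai text-generation call (SDK or REST)."""
    raise NotImplementedError("wire this to your watsonx.ai inference client")

def run_sql(query: str) -> list[dict]:
    """Placeholder for executing a query against the Observ.ai database."""
    raise NotImplementedError

def handle_user_query(user_query: str) -> str:
    # 1. Orchestrator (Mixtral) analyzes the intent and picks the next tool.
    decision = generate(
        "mistralai/mixtral-8x7b-instruct-v01",
        "Choose the tool to call for this request (sql_retrieval or "
        f"explain_answer) and reply with its name only.\nUser: {user_query}",
    )
    if "sql_retrieval" in decision:
        # 2. SQL specialist (CodeLlama) translates the request into SQL.
        sql = generate(
            "codellama/codellama-34b-instruct-hf",
            f"Write one SQL query over the logs table that answers: {user_query}",
        )
        rows = run_sql(sql)  # 3. The Observ.ai database executes the query.
        # 4. Explainer model (Granite or Mixtral) turns the JSON rows into text.
        return generate(
            "ibm/granite-13b-chat-v2",
            f"Answer the user's question using only this data: {rows}",
        )
    return decision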

Multi-agent approach

And now, let’s deep dive. This architecture leverages a multi-agent approach, where different AI models work together seamlessly:

1. Intent recognition with Watsonx Assistant 

Watsonx Assistant has multiple roles. It acts as the user interface: it captures queries and uses intent recognition to understand the user's main goal. It also handles digressions, dynamically changing the conversation topic as needed and returning to the original topic once the digression ends.

Here, basic actions are defined:

We can assume that the "Assign Task" action is linked to a straightforward flow, using a basic call to the Monday GraphQL API as an example, and that the "How-to?" action is connected to a RAG flow. For the purposes of the demo, we will focus on another action, "Observ.ai Action", which needs to manage multiple flows and operations over the Observ.ai software.

This process is empowered by an orchestrator LLM agent based on watsonx.ai.
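
As an illustration, the handoff from Watsonx Assistant to that orchestrator can be as simple as a small HTTP endpoint called by a custom extension. The route, payload shape, and ask_orchestrator helper below are assumptions for the sketch, not the demo's actual integration.

# Sketch of the glue between Watsonx Assistant and the orchestrator agent.
# The route, payload fields, and ask_orchestrator helper are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def ask_orchestrator(user_query: str, session_id: str) -> str:
    """Placeholder: forwards the query to the orchestrator LLM agent."""
    raise NotImplementedError

@app.post("/assistant/observai-action")
def observai_action():
    payload = request.get_json()
    answer = ask_orchestrator(payload["query"], payload.get("session_id", ""))
    # Watsonx Assistant's custom extension reads this field back into the dialog.
    return jsonify({"answer": answer})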

2. Orchestrator LLM Agent (using Mixtral-8x7b)

It acts as a decision engine, analyzing the user's query in depth and selecting the most appropriate chain of actions. It delegates tasks to specialized functions when needed. A function can be an assembly of tasks, queries, and models, for instance. Integrating these techniques allows users to understand the reasoning behind the assistant's decisions, fostering AI explainability.

As the orchestrator is the central mechanism managing the flow, its prompt needs to cover all the potential cases handled by the assistant's actions. The orchestrator also needs natural language capabilities, function-calling capabilities, and sometimes coding capabilities. This is the reason we have chosen the Mixtral 8x7b model, which is proficient at such tasks. Here's a great post from Niklas Heidloff on the topic: Mixtral Agents: Tools for Multi-Turn SQL.

Orchestrator agent prompt components:

  • Instruction
  • Tools: function tools to run externally. This can be an API call, an SQL query, or an additional inference to another model.
  • Helpers: other tools that are directly handled by the orchestrator.
  • Conversation history: an aggregation of the conversation between the user, assistant, and external system, built on the fly (example described below).
  • Context: for example, the current date. Here, we can consider including more data about the profile of the connected user, the currently open application, and more.
  • User input: none other than the user's request to the assistant.
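
As a rough illustration, these components can be assembled into a single prompt string before each orchestrator call. The instruction wording, tool names, and helper text below are assumptions for the sketch, not the prompt used in the demo.

# Sketch of assembling the orchestrator prompt from the components listed above.
# The instruction wording, tool names, and helper rules are illustrative.
from datetime import date

INSTRUCTION = (
    "You are the Observ.ai assistant orchestrator. Decide which tool to call "
    "next, or answer the user directly if no tool is needed."
)
TOOLS = {
    "sql_retrieval": "Fetch data from the Observ.ai database (delegates to the SQL agent).",
    "run_load_test": "Schedule or trigger a load test through the Observ.ai API.",
    "explain_answer": "Turn raw system output (JSON) into natural language.",
}
HELPERS = "If the user only greets you or asks for clarification, answer directly."

def build_orchestrator_prompt(history: list[str], user_input: str) -> str:
    tools_block = "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
    context = f"Context: the current date is {date.today().isoformat()}."
    return "\n\n".join([
        INSTRUCTION,
        "Tools:\n" + tools_block,
        HELPERS,
        "Conversation so far:\n" + "\n".join(history),
        context,
        f"User input: {user_input}",
    ])

# Example usage
print(build_orchestrator_prompt([], "Give me the critical error logs for today."))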

3. SQL Specialized Model (CodeLlama-34b)

This model acts as an SQL query specialist, producing the proper SQL query that the Observ.ai database will interpret.
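
A minimal sketch of this step, assuming the ibm-watsonx-ai Python SDK's ModelInference interface; the model ID, table schema, and prompt are illustrative assumptions rather than the demo's exact setup.

# Sketch of the SQL-specialist call, assuming the ibm-watsonx-ai SDK's
# ModelInference interface. Model ID, schema, and prompt are illustrative.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

sql_agent = ModelInference(
    model_id="codellama/codellama-34b-instruct-hf",   # assumed CodeLlama model ID
    credentials=Credentials(url="https://eu-de.ml.cloud.ibm.com", api_key="<API_KEY>"),
    project_id="<PROJECT_ID>",
)

SCHEMA = "logs(timestamp TIMESTAMP, level TEXT, service TEXT, message TEXT)"  # assumed schema

def generate_sql(user_query: str) -> str:
    prompt = (
        f"Given the table {SCHEMA}, write one SQL query that answers the request "
        f"below. Return only the SQL.\nRequest: {user_query}"
    )
    return sql_agent.generate_text(prompt=prompt).strip()

print(generate_sql("Give me the critical error logs for today."))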

4. Tool Calling

Once the agents have prepared all the necessary data for the function call (in this example, a database request), the endpoint system runs a parser on the query:

  1. Query is Correct - The endpoint system provides the database response (often in JSON format).
  2. Query is Incorrect - The endpoint system reports errors, and an iteration can be run again with the SQL specialized agent.

Iteration of the SQL specialized model (CodeLlama): when the SQL query is not accepted by the Observ.ai system, iterating with the agent, this time taking the previous errors into account, significantly improves the output.
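
A minimal sketch of this parse-execute-retry loop follows, assuming a generate_sql helper for the CodeLlama agent and a run_sql helper for the Observ.ai endpoint (both placeholders, not the demo's actual functions).

# Sketch of the tool-calling retry loop: execute the generated SQL, and on
# failure feed the parser's error report back to the SQL agent for another try.
# generate_sql and run_sql are placeholders, not the demo's actual functions.

def generate_sql(user_query: str, previous_error: str | None = None) -> str:
    """Placeholder: call the CodeLlama SQL agent, optionally with the last error."""
    raise NotImplementedError

def run_sql(query: str) -> list[dict]:
    """Placeholder: the Observ.ai endpoint that parses and executes the query."""
    raise NotImplementedError

def retrieve_data(user_query: str, max_attempts: int = 3) -> list[dict]:
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(user_query, previous_error=error)
        try:
            return run_sql(sql)        # query is correct: return the JSON response
        except Exception as exc:       # query is incorrect: capture the error report
            error = str(exc)
    raise RuntimeError(f"SQL generation failed after {max_attempts} attempts: {error}")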

5. Answer is Returned to the Orchestrator LLM (Mixtral)

The Orchestrator decides on the next step. It chooses the explainer tool, responsible for converting system outputs into natural language. At this step, it might decide to run the same model or a specialized conversational agent (for example, an agent based on the Granite chat model). Using a specialized model like Granite offers several advantages: Granite is trained on professional, trusted enterprise datasets and chat interactions, leading to more accurate and human-like outputs in several domains compared to more general-purpose LLMs.

In general, the use of smaller, specialized models leads to efficiency: they require fewer computational resources to run, making them more cost-effective, better performing, and more scalable for real-world applications.

When the orchestrator handles this step itself, it builds the final answer directly. The choice of whether to redirect to a specialized tool or to manage it within the orchestrator lies with the prompt engineer.
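
A minimal sketch of the explainer step, assuming a generic generate helper for the watsonx.ai call; the model ID and prompt wording are illustrative assumptions.

# Sketch of the explainer step: turning the database's JSON output into a
# short, user-facing answer. The model ID and prompt are illustrative.
import json

def generate(model_id: str, prompt: str) -> str:
    """Placeholder for a watsonx.ai text-generation call."""
    raise NotImplementedError

def explain(rows: list[dict], user_query: str) -> str:
    prompt = (
        "Answer the user's question in one or two plain-language sentences, "
        "using only the data provided.\n"
        f"Question: {user_query}\n"
        f"Data: {json.dumps(rows)}"
    )
    return generate("ibm/granite-13b-chat-v2", prompt)  # or reuse Mixtral directly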

6. A Symphony of Agents

In a similar way, the orchestrator continues to manage requests, flows, and answers, even across a multi-turn conversation. Here, for example, when a user asks for more details and requests another action, the assistant chains the operations to run: first the database answer, then the connection load test, which is a function calling an API.

At any step, it's possible to consider merging model actions or splitting them based on complexity. For example, Mixtral can manage the initial generation of the SQL query directly as part of the tool choice step. Alternatively, it can automatically delegate this task to a Specialized SQL agent. Here, the evaluation of cost and model performance will play the role of the architect.

The integration between the different agents, which exchange information among themselves, is incorporated into the actual functions. For instance, the Observ.ai database retrieval function always calls back to the orchestrator agent at the end and provides it with the database output, so the orchestrator can decide what to do with this new information.
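
A minimal sketch of that callback pattern: every tool result is appended to the running conversation and handed back to the orchestrator, which decides the next step until it produces a final answer. The decide_next helper and tool registry are assumptions for the sketch.

# Sketch of the "always call back to the orchestrator" pattern. Each tool
# result is appended to the conversation history and the orchestrator decides
# the next step. decide_next and TOOLS are illustrative placeholders.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    # "sql_retrieval": ...,   # database retrieval function
    # "run_load_test": ...,   # Observ.ai API call for the load test
}

def decide_next(history: list[str]) -> tuple[str, str]:
    """Placeholder: ask the orchestrator (Mixtral) for (tool_name, tool_input);
    it returns ("final", answer) when it wants to reply to the user directly."""
    raise NotImplementedError

def run_turn(user_input: str, history: list[str]) -> str:
    history.append(f"User: {user_input}")
    while True:
        tool, tool_input = decide_next(history)
        if tool == "final":
            history.append(f"Assistant: {tool_input}")
            return tool_input
        result = TOOLS[tool](tool_input)                    # run the external function
        history.append(f"Tool {tool} returned: {result}")   # feed the output back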

This assistant architecture can typically handle software operations in an enterprise, and when we talk about enterprise AI, it’s impossible to ignore the Trust Layer.

Watsonx.Governance

We were very excited about enabling generative AI and LLMs in our solutions, but we soon realized that we need to be cautious to ensure they do not provide incorrect or biased responses or generate hallucinations. Using generative AI and LLMs can be risky, and we quickly came to realize that we must closely monitor this new capability.

We simply need an Observing Eye. This is where Watsonx.Governance comes in. Watsonx.Governance acts as the watchful guardian of the entire process. It monitors and evaluates all models and prompts throughout their lifecycle. All the facts are centralized for AI Explainability. Here’s how it ensures a smooth and reliable experience:

Consistency: Governance maintains consistency across all interactions by ensuring that prompt templates are clear, concise, and aligned with the desired functionalities.

Accuracy: It continuously evaluates the accuracy of responses generated by the models, identifying any potential biases or factual errors. Watsonx.Governance plays a vital role in this by scrutinizing prompt templates for potential biases.

Security: Governance also plays a role in security by monitoring for potential vulnerabilities in prompt templates and ensuring they do not generate outputs that could expose sensitive information or lead to unintended actions.

So, let’s see how it acts as our Observing Eye. We will follow the lifecycle of a prompt:

  • The environments we will monitor (lifecycle)
  • The LLM we are using
  • The prompt template
  • The prompt feature

The evaluation of the prompt in the Development environment shows 10 alerts, indicating that our prompt is not performing well and the output is far from what we were expecting.

Model health in DEV environment

In the Validate environment, we will only display the test results to show that we have improved our prompt. The only alert is on readability, which is expected, as the output should be an SQL statement or an API call, etc., and is not meant to be readable text for humans.

In the Production environment, the prompt is called with PII data, and we can see that Watsonx.governance alerts on that.

The Conclusion

To wrap it up, think of this agent-based AI architecture as your next-level upgrade for integrating generative AI into enterprise systems. Modular, optimized, and seamlessly integrative - this framework isn’t just about enhancing AI assistants, it's about redefining them. 

By using a conversational platform like Watsonx Assistant together with specialized agents, we can optimize performance and cost, while also ensuring that the assistant can easily connect to software, databases, and other services. The architecture is also flexible, allowing new modules to be integrated as needed and the assistant's capabilities to be extended to handle a wider range of tasks.

With Watsonx.governance in the mix, the operations are not only powered but also protected, ensuring every interaction is secure, accurate, and consistent. 

It's strategic enterprise AI, with Watsonx leading the way.

Yohan Bensoussan

AI Architect, EMEA Build Lab

Jaffa Sztejnbok

Cloud Architect, EMEA Build Lab


#watsonx.ai #PromptLab


#GenerativeAI