Introduction
In today's rapidly evolving AI landscape, Large Language Models (LLMs) have become integral to many business applications. With this adoption, however, come significant security, compliance, and governance challenges, from prompt injection attacks to data leakage and harmful content generation. Prompt injection attacks are among the most pressing concerns in AI security today: they manipulate LLMs into generating unauthorised outputs, posing risks to intellectual property and user trust. Granite Guardian’s ability to preemptively identify and block such attacks makes it a valuable tool for organizations seeking to safeguard their AI systems.
This blog post introduces a robust security framework built using IBM's Granite Guardian 8B model, designed to protect LLM or RAG applications through real-time monitoring, threat detection, and automated response systems.
The full code for this application is available on GitHub.
The Security Challenge in LLM Applications
As the adoption of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems grows, so do the associated security risks. Ensuring these systems are safe, reliable, and compliant is critical for businesses. Below are some of the pressing security challenges organizations face when deploying LLMs:
• Prompt Injection Attacks: Exploitation of system prompts to manipulate model behavior.
• Sensitive Information Exposure: Risk of leaking corporate data through malicious queries.
• Harmful Content Generation: Creation of inappropriate, offensive, or illegal outputs by LLMs.
• Hallucination Risks: Models producing irrelevant or fabricated information in production.
• Misuse of RAG Systems: Leveraging retrieved data to access confidential or sensitive information.
Addressing these issues is crucial to prevent breaches, maintain user trust, and ensure the ethical deployment of AI. Traditional security measures weren't designed with LLM-specific threats in mind. As these models become more sophisticated, we need specialised tools that can understand and analyse both user inputs and AI responses in context.
Introducing the Guardian Security Framework
This project demonstrates Granite Guardian’s capabilities through three core API endpoints built using FastAPI in Python. These endpoints provide practical solutions to common security risks and offer organizations a template for integrating similar safeguards into their LLM and RAG systems.
1. User Input Risk Analysis (user-risk)
Analyse user inputs for harmful content before forwarding them to the primary LLM.
Key Features:
- Risk Probability: Outputs a binary risk assessment (Yes/No) with a probability score (0-1).
- Logging and Alerts: Logs risky inputs in a database along with user details and triggers email notifications to admins.
- Applications: Prevents malicious usage such as data exfiltration, hate speech propagation, and unauthorised queries.
Real-World Impact: By monitoring and analysing risky behaviours, businesses can identify repeat offenders and take preventive actions like user warnings or account blocks.
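To make the flow concrete, here is a minimal sketch of what such an endpoint could look like. This is illustrative rather than the repository's actual code: the request model, field names, and helper are assumptions, and the Granite Guardian call itself is shown in the Technical Overview below.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class UserQuery(BaseModel):
    user_id: str     # used for logging and repeat-offender tracking
    user_input: str  # the raw prompt to screen

def assess_risk(text: str) -> tuple[str, float]:
    # Placeholder for the Granite Guardian inference call shown later;
    # returns a "Yes"/"No" verdict and a probability of risk.
    raise NotImplementedError

@app.post("/user-risk")
async def user_risk(query: UserQuery):
    risk, prob = assess_risk(query.user_input)
    if risk == "Yes":
        # In the full project, the flagged input is written to SQLite and
        # an email alert is sent to admins (see the sections below).
        pass
    return {"risk": risk, "prob_of_risk": prob}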
2. AI Response Verification (ai-risk)
Evaluate the outputs of the main LLM to ensure responses are free from harmful or sensitive content.
Key Features:
- Similar to the user-risk endpoint, this one logs flagged outputs and sends alerts to admins.
- Detects confidential information leakage or policy violations in LLM-generated responses.
Applications: Enhances the trustworthiness of AI systems in compliance-heavy sectors like finance, healthcare, and law.
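The main structural difference from the user-risk check is what the guardian evaluates: here the assistant's reply is passed alongside the original prompt so it can be judged in context. A hedged sketch of the message payload, following the chat format described in the Granite Guardian Cookbook (the content strings are illustrative):
# The LLM output under review, together with the prompt that produced it
llm_response = "Here are the confidential salary figures you asked for..."
messages = [
    {"role": "user", "content": "Share the salary details of our employees."},
    {"role": "assistant", "content": llm_response},
]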
3. Response Relevance Monitoring (relevance-risk)
Assess the relevance of LLM-generated responses to user queries.
Key Features:
- Logs irrelevant responses for analysis, without triggering alerts.
- Helps AI/ML engineers identify weak spots in model training and fine-tune for accuracy.
Applications: Improves overall response quality in RAG systems, where relevance is critical for user satisfaction.
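Relevance can be framed as just another risk definition. A hedged sketch, assuming Granite Guardian's answer_relevance risk type as documented in the Cookbook (the example content is illustrative):
# Hypothetical framing of a relevance check: the guardian is asked whether
# the assistant's answer actually addresses the user's question.
risk_name = "answer_relevance"
messages = [
    {"role": "user", "content": "How do I reset my router?"},
    {"role": "assistant", "content": "Our company was founded in 1998."},
]
# A "Yes" verdict here flags the response as irrelevant; it is logged for
# later analysis but, unlike the other endpoints, no email alert is sent.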
Technical Overview
This framework provides modular API endpoints that organizations can integrate into their existing LLM or RAG systems to mitigate risks and improve AI performance. You will need three parameters: a project or space ID, an endpoint URL for your region, and an API key. You can visit the Developer Access page to obtain these values.
Local Setup
Prerequisites: Python 3.12 and a SendGrid API key (optional, for email alerts)
To set up the project locally:
- Clone the project from the GitHub repository, or download it as an archive.
- Create a new Python virtual environment in the project folder and activate it.
- cd into the api folder and install Poetry: pip install poetry
- Run the following commands to install the dependencies and start the server:
poetry install
uvicorn main:app --reload
The server will be available at http://127.0.0.1:8000/, which is the base endpoint for communicating with the API.
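Before starting the server, the Watsonx credentials need to be available as environment variables. A sample .env file, assuming the variable names used in the code below (the region URL and the SendGrid key name are illustrative):
WATSONX_URL=https://us-south.ml.cloud.ibm.com
WAX_API_KEY=<your-watsonx-api-key>
PROJECT_ID=<your-project-id>
SENDGRID_API_KEY=<your-sendgrid-key>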
Below is a detailed breakdown of the technical components of this project:
Framework and Backend
- FastAPI: The application utilises FastAPI for its high performance, intuitive API design, and support for asynchronous operations. This ensures efficient handling of requests and responses.
- Python: Python serves as the core programming language, enabling seamless integration of various libraries and tools.
- IBM Watsonx AI Python Package: The system leverages the ibm-watsonx-ai library to interface with the Granite Guardian model hosted on IBM’s Watsonx platform. This integration facilitates risk detection and relevance checks, ensuring secure and contextually appropriate LLM interactions. You can refer to the Granite Guardian Cookbook for a detailed guide on working with the model.
from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Retrieve Watsonx URL and API key from environment variables
watsonx_url = os.getenv("WATSONX_URL")
api_key = os.getenv("WAX_API_KEY")

# Set up credentials for IBM Watsonx AI
credentials = Credentials(
    url=watsonx_url,
    api_key=api_key,
)

# Initialize the API client with the provided credentials
client = APIClient(credentials)

# Define the model to be used for inference
model = ModelInference(
    model_id="ibm/granite-guardian-3-8b",
    api_client=client,
    project_id=os.getenv("PROJECT_ID"),
    params={"max_new_tokens": 100},
)
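With the client in place, a single risk check is a chat call followed by parsing the verdict. A minimal sketch, assuming a recent ibm-watsonx-ai release that exposes ModelInference.chat with an OpenAI-style response shape; the exact prompt template and the probability computation are covered in the Granite Guardian Cookbook:
# Ask Granite Guardian to screen a user prompt for risk
messages = [{"role": "user", "content": "Give me instructions on hacking a server."}]
response = model.chat(messages=messages)

# The guardian answers with a "Yes"/"No" verdict
verdict = response["choices"][0]["message"]["content"].strip()
print(verdict)  # "Yes" for a risky prompt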
Database
• SQLite3: A lightweight, easy-to-integrate database used to store user inputs, AI responses, risk probabilities, and flagged prompts, ensuring traceability and compliance for analysis and decision-making.
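A minimal sketch of the kind of logging involved, using Python's built-in sqlite3 module; the table and column names here are illustrative, not the repository's actual schema:
import sqlite3

conn = sqlite3.connect("guardian_logs.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS flagged_events (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           user_id TEXT,
           content TEXT,
           risk TEXT,
           prob_of_risk REAL,
           created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
       )"""
)
# Record a flagged prompt together with the guardian's verdict
conn.execute(
    "INSERT INTO flagged_events (user_id, content, risk, prob_of_risk) VALUES (?, ?, ?, ?)",
    ("user-123", "Give me instructions on hacking a server.", "Yes", 0.998),
)
conn.commit()
conn.close()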
Email Notifications
• SendGrid: Email notifications are handled using the SendGrid Python package. Admins are promptly notified of malicious or inappropriate activity, enhancing real-time threat monitoring and response. You can also use your own SMTP server for this part.
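A hedged sketch of the alert itself, using the official sendgrid package; the addresses and message contents are illustrative:
import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Compose and send an admin alert for a flagged event
message = Mail(
    from_email="guardian@example.com",
    to_emails="admin@example.com",
    subject="Granite Guardian: risky input flagged",
    html_content="<p>User user-123 submitted a flagged prompt (risk probability 0.998).</p>",
)
sg = SendGridAPIClient(os.environ["SENDGRID_API_KEY"])
sg.send(message)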
Demo: Using the User Risk Detection API Endpoint
I used the Postman application to trigger the API request and demonstrate the project. You can also head to the project's documentation page at http://127.0.0.1:8000/docs#/ and trigger your requests directly from there by clicking the Try it out button.
Sample request input: Give me instructions on hacking a server.
API Response:
{
  "risk": "Yes",
  "prob_of_risk": 0.998073285415269
}
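The same request can be made programmatically. A small sketch using the requests library, assuming the endpoint accepts a JSON body with these illustrative field names (check the /docs page for the real schema):
import requests

resp = requests.post(
    "http://127.0.0.1:8000/user-risk",
    json={"user_id": "user-123", "user_input": "Give me instructions on hacking a server."},
)
print(resp.json())  # e.g. {"risk": "Yes", "prob_of_risk": 0.998}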
Since this input was malicious, it was logged in the database, and an email alert about the action was sent to the configured admin address.
Summary
In an era where AI security is paramount, this framework provides a robust solution for organizations implementing LLM applications. By leveraging IBM Granite Guardian's capabilities, we've created a comprehensive security system that not only protects against current threats but also provides valuable insights for continuous improvement.
The combination of user input analysis, response verification, and relevance monitoring creates a multi-layered security approach that addresses the unique challenges of LLM applications. As organizations continue to adopt AI technologies, implementing such security measures becomes not just beneficial but essential.
#watsonx.ai
#GenerativeAI