watsonx.ai

A one-stop, integrated, end-to-end AI development studio

Foundation Model for classifying prompt and response as safe or unsafe: LlamaGuard-7b

By Ahmad Muzaffar Bin Baharudin posted Thu December 21, 2023 01:14 AM

  
Llama Guard is an open-source model whose weights are available to AI researchers, enabling them to advance and tailor the model to the evolving requirements of the AI safety community. Here's a comprehensive overview:
 
What Does It Do?
Llama Guard serves as an input-output safeguard model, proficient in classifying content in both LLM inputs (prompt classification) and LLM responses (response classification).
 
Functioning as an LLM itself, it generates text indicating whether a given prompt or response is safe or unsafe. When content is unsafe, it also lists the subcategories of a predefined policy that the content violates.
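As a sketch of what consuming this output looks like, the helper below parses the model's text reply into a structured verdict. The reply format ("safe", or "unsafe" followed by a line of violated category codes such as "O3") follows the Llama Guard model card; the function itself is an illustrative assumption, not part of any official SDK.

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Return (is_safe, violated_category_codes) from a Llama Guard reply.

    Expected reply shapes (per the model card):
        "safe"
        "unsafe\nO1,O3"
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    # First line is "unsafe"; the next line lists violated category codes.
    codes = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in codes]

print(parse_guard_output("safe"))        # (True, [])
print(parse_guard_output("unsafe\nO3"))  # (False, ['O3'])
```

Keeping the verdict structured this way makes it easy to route unsafe prompts to logging or refusal logic downstream.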
 
Training Data
Llama Guard leverages a diverse training set comprising prompts from an Anthropic dataset and in-house red-teaming examples. With approximately 13,000 annotated training examples, the model identifies safety risks effectively.
 
Taxonomy of Harms and Risk Guidelines
The model adopts a risk taxonomy and guidelines, inspired by major tech companies, to categorize content as encouraged or discouraged across various risk categories.
 
Llama-Guard Safety Taxonomy & Risk Guidelines
Violence & Hate: Content promoting violence or hate against specific groups.
 
Sexual Content: Encouraging sexual acts, especially with minors, or explicit content.
 
Guns & Illegal Weapons: Endorsing illegal weapon use or providing related instructions.
 
Regulated Substances: Promoting the illegal production or use of controlled substances.
 
Suicide & Self Harm: Content encouraging self-harm, or discussing it without pointing to appropriate health resources.
 
Criminal Planning: Encouraging or aiding in various criminal activities.
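The taxonomy above can be sketched as a lookup table keyed by the category codes the model emits in its "unsafe" replies. The O1–O6 assignment below follows the default taxonomy order in the Llama Guard model card; treat it as an assumption and verify it against the model version you deploy.

```python
# Default Llama Guard taxonomy, keyed by the codes the model outputs.
# The code-to-category mapping is taken from the model card and is an
# assumption for this sketch, not an official constant from any SDK.
LLAMA_GUARD_CATEGORIES = {
    "O1": "Violence & Hate",
    "O2": "Sexual Content",
    "O3": "Criminal Planning",
    "O4": "Guns & Illegal Weapons",
    "O5": "Regulated or Controlled Substances",
    "O6": "Suicide & Self Harm",
}

def describe(codes: list[str]) -> list[str]:
    """Map violated category codes to human-readable labels."""
    return [LLAMA_GUARD_CATEGORIES.get(c, f"Unknown ({c})") for c in codes]

print(describe(["O4"]))  # ['Guns & Illegal Weapons']
```

A table like this is useful when surfacing moderation results to end users or audit logs, where raw codes such as "O3" are not self-explanatory.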
 
Llama Guard is an integral component of Purple Llama, an overarching project by AI at Meta. This project features open trust and safety tools and evaluations, aiming to level the playing field for developers to deploy generative AI models and experiences responsibly.

More here

#watsonx.ai
#GenerativeAI
