IBM Build Partners Technical Group

Part of the Build to Win Group - Connect, Learn, Share




#watsonx
#AI
  • 1.  Agent use case - emotion AI

    Posted 4 days ago
    Edited by John Pegram 4 days ago

    Is there a roadmap or guidance for designing and deploying an agent that analyses video and voice (video enrichment)?

    The use case is to assess and identify the impact of police misconduct during stop-and-search or other police incidents. For example, say an end user records a stop-and-search incident during which someone is assaulted by the police, an event that understandably causes trauma. If the end user gives the video to a client (a law firm) to help build a police complaint or a personal injury civil claim, the agent provides an assessment/analytics report on the impact of the event. We understand that impact assessment tools are gradually being adopted by the legal sector. I would suggest that a tool such as this would not be used in a courtroom, due to ethical considerations such as machine bias and human rights.

    The value lies in its ability to enhance a legal team's evaluation of impact. As a second example, the same agent might assess video to identify the impact of domestic abuse. Our overall proposition as a build partner is an IVA specialising in public law and human rights for civil liberties law firms. Our project partner wants to create an agent that could be deployed alongside our main solution, either as an integration or as a higher-end version. Our clients are more interested in capabilities that assist with client intake, and perhaps in a tool that gives end users legal information and guidance. I would really welcome some feedback here.

    Technology-wise, we intend to build on watsonx Orchestrate, but from what I have been told this sort of agent would be created in watsonx.ai and would then become part of a watsonx Orchestrate assistant.



    ------------------------------
    John Pegram
    Managing Director / Owner
    Future Bound IT Ltd
    London
    0208 1875870
    ------------------------------



  • 2.  RE: Agent use case - emotion AI

    Posted 5 hours ago

    Hi John,

    IBM publishes practical guidance and patterns for building agents that analyze video and voice. This includes design patterns (speech + vision), product docs and APIs (Speech to Text, Vision), agent development and orchestration guidance (watsonx.ai, watsonx Orchestrate), and Operator/AgentOps guidance for deploying and operating agents in production.

    1. Design pattern for speech + vision (RAG) - IBM Cloud has an explicit Speech & Vision recognition design considerations pattern that covers conversational speech-to-text, text-to-speech and computer-vision considerations for Retrieval-Augmented Generation (RAG) agent workflows. This is the closest thing to a technical "roadmap" or blueprint IBM provides for multimodal (video+voice) agents.

    2. watsonx.ai - agent development + AgentOps - the watsonx.ai pages describe Agent Builder/AgentOps features, guidance on choosing models, tracing/evaluation, and scaling agents from experimentation to production (including best practices for monitoring and optimizing agent performance). Use watsonx.ai as the developer studio for building the agent. See: AI agent development.

    3. Speech and TTS APIs - IBM documents production-grade services (Watson Speech to Text, Text to Speech) with features useful for voice analysis: streaming transcription, speaker diarization, interim results, domain-tuned models. These are the recommended building blocks for the audio side. See: Watson Speech to Text.

    4. Orchestration & prebuilt agents - watsonx Orchestrate and the agent catalog include prebuilt domain agents and tooling (Agent Builder, Flow Builder) for composing workflows and integrating agents into enterprise systems - useful when you need to chain voice/video analysis into business processes. The Orchestrate docs also include specific integration guidance (e.g., content repositories).

    5. Thought leadership + how-to material - IBM Think articles and tutorials explain agentic RAG, use cases for agents across industries, and provide hands-on resources (tutorials, videos) that often link to patterns and code examples. These are good for higher-level design decisions and examples. See: The 2025 Guide to AI Agents.

    I recommend these practical next steps:

    1. Read the Speech & Vision design pattern to get architecture diagrams and specific design considerations (preprocessing, transcription, diarization, vision models, RAG pipelines).

    2. Prototype using Watson Speech to Text (streaming + diarization) + a computer-vision model (IBM patterns show options) and connect them via a RAG pipeline in watsonx.ai. Use the Agent Builder/Flow Builder to compose the workflow.

    3. Plan operational concerns up front: model selection, evaluation/tracing, latency (real-time vs batch), data retention/privacy (video/audio PII), and deployment (on-prem/hybrid/cloud). The watsonx docs include guidance for AgentOps and monitoring.
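
    To make the audio side of step 2 a little more concrete, here is a minimal Python sketch of one post-processing step: grouping transcribed words into per-speaker turns before passing them on to later analysis. It assumes the JSON shape that Watson Speech to Text returns when speaker labels are requested (a `results` list with word `timestamps`, plus a `speaker_labels` list with `from`, `to` and `speaker` fields). The function name `speaker_turns` and the sample data are hypothetical, and no service call is made here, so please treat it as an illustration rather than a reference implementation.

```python
from typing import Dict, List, Tuple

Turn = Tuple[int, float, float, str]  # (speaker, start_sec, end_sec, text)


def speaker_turns(response: dict) -> List[Turn]:
    """Group transcribed words into consecutive per-speaker turns.

    Assumes each speaker label's "from" time equals the start time of
    exactly one word in the transcript timestamps.
    """
    # Index every transcribed word by its start time.
    words: Dict[float, str] = {}
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Each timestamp entry is [word, start_sec, end_sec].
        for word, start, _end in alternatives[0].get("timestamps", []):
            words[start] = word

    turns: List[Turn] = []
    labels = sorted(response.get("speaker_labels", []), key=lambda l: l["from"])
    for label in labels:
        word = words.get(label["from"])
        if word is None:
            continue  # no matching word for this label; skip it
        speaker = label["speaker"]
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            _, start, _, text = turns[-1]
            turns[-1] = (speaker, start, label["to"], text + " " + word)
        else:
            # Speaker changed: start a new turn.
            turns.append((speaker, label["from"], label["to"], word))
    return turns


# Hypothetical sample shaped like a diarized transcription response.
sample = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "good morning everyone hello ",
                    "timestamps": [
                        ["good", 0.0, 0.4],
                        ["morning", 0.4, 0.7],
                        ["everyone", 0.7, 1.1],
                        ["hello", 1.5, 1.8],
                    ],
                }
            ]
        }
    ],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.4, "to": 0.7, "speaker": 0},
        {"from": 0.7, "to": 1.1, "speaker": 0},
        {"from": 1.5, "to": 1.8, "speaker": 1},
    ],
}

for speaker, start, end, text in speaker_turns(sample):
    print(f"speaker {speaker} [{start:.1f}s-{end:.1f}s]: {text}")
```

    Turns like these give each utterance a speaker and a time range, which is one way to line the audio up with frames from the video before anything is handed to a RAG pipeline.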



    ------------------------------
    Sancia Matthyssen
    Program Director, AI Partnerships
    IBM
    Austin
    ------------------------------



  • 3.  RE: Agent use case - emotion AI

    Posted 5 hours ago
    Thanks for the detailed overview and guidance. It seems that in the legal sector there are ethical issues around using such deployments to support claims, assess impact and assist with legal decision-making.
    We may put the project on GitHub as an AI project in its own right and ask whether people can volunteer their time to help get a solution together. It has gone back to the original concept of helping to identify the impact of unconscious bias, and may be something the police would use to train their officers. The Black Police Association in the UK has expressed an interest. Our main ESA solution, which has been approved, is for the legal sector. This is really looking more like a public sector project, and again the guidance here is great.


    John Pegram

    Managing Director

    Future Bound IT Ltd

    Mobile: 07724 901130 

    Sales:  0208 1875870

    Support: 0208 0513958

    Email  johnpegram@futureboundit.com

    Website www.futureboundit.com

     

     


     


    Future Bound IT Ltd, company number 11240184, 86- 90 Paul Street, London, EC2A 4NE