watsonx Assistant

Creating a Voice Assistant Using IBM Watson in Python

By Youssef Sbai Idrissi posted Thu July 27, 2023 09:21 PM

  

Voice assistants have become increasingly popular in our daily lives, offering convenience and ease of access to information and services. In this article, we will explore how to create a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. By the end of this guide, you'll have a basic voice assistant that can understand your spoken commands and respond with synthesized speech.

Prerequisites

To follow along with this tutorial, you'll need the following:

  1. Python installed on your system (Python 3.6 or later is recommended).
  2. An IBM Cloud account to access the Watson services. You can sign up for a free account if you don't have one.
  3. The ibm_watson and pyaudio Python libraries, which can be installed using pip.

Step 1: Set Up IBM Watson Services

  1. Go to the IBM Cloud website (https://cloud.ibm.com) and log in to your account.
  2. Create a new Speech to Text service and a new Text to Speech service from the IBM Cloud Catalog.
  3. Note down the API credentials (API key and URL) for both services as you will need them later in the code.

Step 2: Install Required Libraries

Open a terminal or command prompt and install the necessary libraries using pip:

    pip install ibm_watson
    pip install pyaudio

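Note: pyaudio is a wrapper around the PortAudio library, so you may need to install PortAudio first. On Debian or Ubuntu this is typically the portaudio19-dev package, and on macOS it can be installed with Homebrew (brew install portaudio).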
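
Before moving on, it can help to confirm that the installation actually worked. The following is a minimal sketch of such a check, not part of the original tutorial: the helper name `missing_modules` is hypothetical, and the list contains the import names used in the code later in this article.

```python
import importlib.util

# Import names used in this tutorial (import names, not pip package names).
REQUIRED_MODULES = ["ibm_watson", "ibm_cloud_sdk_core", "pyaudio"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If pyaudio is reported as missing after a pip install, the usual cause is a failed build due to absent system audio libraries.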

Step 3: Initialize the Watson Services

In your Python script, import the required modules and initialize the Watson services with your API credentials:

    from ibm_watson import SpeechToTextV1, TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Replace these values with your own API credentials
    speech_to_text_apikey = "your_speech_to_text_apikey"
    speech_to_text_url = "your_speech_to_text_url"
    text_to_speech_apikey = "your_text_to_speech_apikey"
    text_to_speech_url = "your_text_to_speech_url"

    # Speech to Text client
    authenticator_stt = IAMAuthenticator(speech_to_text_apikey)
    speech_to_text = SpeechToTextV1(authenticator=authenticator_stt)
    speech_to_text.set_service_url(speech_to_text_url)

    # Text to Speech client
    authenticator_tts = IAMAuthenticator(text_to_speech_apikey)
    text_to_speech = TextToSpeechV1(authenticator=authenticator_tts)
    text_to_speech.set_service_url(text_to_speech_url)
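
Hard-coding API keys in a script makes them easy to leak through version control. One common alternative is to read them from environment variables. The sketch below assumes four variable names (STT_APIKEY, STT_URL, TTS_APIKEY, TTS_URL) that are purely illustrative and are not defined by IBM or by this tutorial; the helper `load_credentials` is likewise hypothetical.

```python
import os

# Hypothetical variable names; choose whatever fits your own setup.
REQUIRED_VARS = ("STT_APIKEY", "STT_URL", "TTS_APIKEY", "TTS_URL")


def load_credentials(env=None):
    """Return the credential values from `env`, raising if any are missing.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without real credentials.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

With this in place, the placeholder strings in the initialization code could be replaced by values such as creds["STT_APIKEY"], where creds is the dictionary returned by load_credentials().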

Step 4: Capture and Convert Speech to Text

Now, let's capture speech from the user's microphone and convert it to text using IBM Watson's Speech to Text service:

    import pyaudio
    import wave

    def record_audio(filename):
        """Record a few seconds of microphone audio into a WAV file."""
        CHUNK = 1024              # frames read per buffer
        FORMAT = pyaudio.paInt16  # 16-bit samples
        CHANNELS = 1              # mono
        RATE = 44100              # sampling rate in Hz
        RECORD_SECONDS = 5

        audio = pyaudio.PyAudio()
        stream = audio.open(format=FORMAT, channels=CHANNELS,
                            rate=RATE, input=True,
                            frames_per_buffer=CHUNK)

        print("Recording...")
        frames = []
        for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
            frames.append(stream.read(CHUNK))
        print("Finished recording.")

        stream.stop_stream()
        stream.close()
        sample_width = audio.get_sample_size(FORMAT)
        audio.terminate()

        with wave.open(filename, 'wb') as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(sample_width)
            wf.setframerate(RATE)
            wf.writeframes(b''.join(frames))

    def transcribe_audio(filename):
        """Send a WAV file to Speech to Text and return the transcript."""
        # This function must not be named speech_to_text: that would shadow
        # the service object created in Step 3 and break the call below.
        with open(filename, 'rb') as audio_file:
            result = speech_to_text.recognize(
                audio=audio_file, content_type='audio/wav').get_result()
        if not result['results']:
            return ""  # nothing was recognized
        return result['results'][0]['alternatives'][0]['transcript'].strip()

    # Capture audio and save it as 'audio.wav'
    record_audio("audio.wav")

    # Convert speech to text
    user_input = transcribe_audio("audio.wav")
    print("You said:", user_input)
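
The code above reads only the first entry of the results list. When a recording contains pauses, the service may return several entries, and when nothing is recognized the list can be empty. As a hedged sketch, the hypothetical helper below joins the top alternative of every entry. It operates on a plain dictionary shaped like the one returned by recognize(...).get_result(), and the sample data is invented for illustration.

```python
def extract_transcript(result):
    """Join the top alternative of every result entry into one string.

    Returns an empty string when the service recognized nothing.
    """
    pieces = []
    for entry in result.get("results", []):
        alternatives = entry.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Invented sample that mimics the shape of a recognition result.
sample = {
    "results": [
        {"alternatives": [{"transcript": "turn on the lights "}]},
        {"alternatives": [{"transcript": "and play some music "}]},
    ]
}
print(extract_transcript(sample))
```

Because the helper never touches the network, it can be tried out without calling the Watson service at all.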

Step 5: Convert Text to Speech

Finally, let's use IBM Watson's Text to Speech service to convert the assistant's response (text) into speech:

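    def synthesize_speech(text, output_filename="output.wav"):
        """Synthesize `text` and save the audio as a WAV file."""
        # This function must not be named text_to_speech: that would shadow
        # the service object created in Step 3 and break the call below.
        response = text_to_speech.synthesize(text, accept='audio/wav').get_result()
        with open(output_filename, 'wb') as audio_file:
            audio_file.write(response.content)

    # Sample response from the assistant
    assistant_response = "Hello! How can I assist you today?"

    # Convert text to speech
    synthesize_speech(assistant_response, "response.wav")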
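
The sample response above is fixed, so the assistant says the same thing regardless of what it heard. A minimal sketch of keyword-based response selection is shown below. The rules, the replies, and the function name `choose_response` are all illustrative assumptions rather than part of any Watson API.

```python
import re

# Each rule pairs a set of trigger words with a reply (illustrative only).
RULES = [
    ({"hello", "hi", "hey"}, "Hello! How can I assist you today?"),
    ({"bye", "goodbye"}, "Goodbye! Have a great day."),
    ({"thanks", "thank"}, "You're welcome!"),
]
FALLBACK = "Sorry, I didn't understand that."


def choose_response(transcript):
    """Return the reply of the first rule whose trigger words appear."""
    # Match whole words so that "hi" does not fire on "this".
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for keywords, reply in RULES:
        if words & keywords:
            return reply
    return FALLBACK
```

The returned string can then be passed to the text-to-speech helper defined in this step in place of the fixed sample response.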

Conclusion

Congratulations! You've successfully created a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. This voice assistant can now capture your spoken commands, convert them to text, and produce a synthesized spoken response, which is saved as a WAV file.

This is just a basic example, and you can expand the functionality of your voice assistant by integrating natural language understanding (NLU) capabilities and adding more sophisticated conversational logic. Additionally, you can integrate your voice assistant with other services and APIs to provide personalized responses and deliver a richer user experience. Happy coding!
