Creating a Voice Assistant Using IBM Watson in Python

Voice assistants have become increasingly popular in our daily lives, offering convenience and ease of access to information and services. In this article, we will explore how to create a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. By the end of this guide, you'll have a basic voice assistant that can understand your spoken commands and respond with synthesized speech.

Prerequisites

To follow along with this tutorial, you'll need the following:

Python installed on your system (Python 3.6 or later is recommended).
An IBM Cloud account to access the Watson services. You can sign up for a free account if you don't have one.
The ibm_watson Python library, which can be installed using pip.

Step 1: Set Up IBM Watson Services

Go to the IBM Cloud website (https://cloud.ibm.com) and log in to your account.
Create a new Speech to Text service and a new Text to Speech service from the IBM Cloud Catalog.
Note down the API credentials (API key and URL) for both services as you will need them later in the code.

Step 2: Install Required Libraries

Open a terminal or command prompt and install the necessary libraries using pip:


pip install ibm_watson
pip install pyaudio

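Note: PyAudio is a Python binding for the PortAudio library, so on some systems you must install PortAudio before pip can build it. For example, on Debian or Ubuntu run sudo apt-get install portaudio19-dev, and on macOS with Homebrew run brew install portaudio. On Windows, pip normally installs a prebuilt PyAudio wheel.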
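To confirm that both packages installed correctly, you can ask Python for their installed versions with the standard importlib.metadata module (Python 3.8 or later). This is a small sketch: the helper name installed_versions is our own, and it looks the packages up by their PyPI distribution names, ibm-watson and PyAudio.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(*packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions("ibm-watson", "PyAudio").items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If either package prints NOT INSTALLED, rerun the matching pip command and check its output for build errors.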

Step 3: Initialize the Watson Services

In your Python script, import the required modules and initialize the Watson services with your API credentials:


from ibm_watson import SpeechToTextV1, TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Replace these values with your own API credentials
speech_to_text_apikey = "your_speech_to_text_apikey"
speech_to_text_url = "your_speech_to_text_url"
text_to_speech_apikey = "your_text_to_speech_apikey"
text_to_speech_url = "your_text_to_speech_url"

authenticator_stt = IAMAuthenticator(speech_to_text_apikey)
speech_to_text = SpeechToTextV1(authenticator=authenticator_stt)
speech_to_text.set_service_url(speech_to_text_url)

authenticator_tts = IAMAuthenticator(text_to_speech_apikey)
text_to_speech = TextToSpeechV1(authenticator=authenticator_tts)
text_to_speech.set_service_url(text_to_speech_url)
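Hardcoding API keys makes them easy to leak, for example by committing the script to a public repository. A common alternative is to read the credentials from environment variables. The sketch below assumes a naming scheme of our own choosing (STT_APIKEY, STT_URL, TTS_APIKEY, TTS_URL); these are not names that IBM Watson requires, and load_credentials is a hypothetical helper.

```python
import os


def load_credentials(prefix):
    """Read an API key and service URL from <PREFIX>_APIKEY and <PREFIX>_URL.

    The variable naming scheme is a hypothetical convention for this sketch.
    Raises KeyError listing any variable that is missing or empty.
    """
    keys = (f"{prefix}_APIKEY", f"{prefix}_URL")
    missing = [k for k in keys if not os.environ.get(k)]
    if missing:
        raise KeyError(f"missing environment variable(s): {', '.join(missing)}")
    return os.environ[keys[0]], os.environ[keys[1]]
```

After setting the variables in your shell (for example, export STT_APIKEY=...), you could call speech_to_text_apikey, speech_to_text_url = load_credentials("STT") and likewise for "TTS", then pass the values to IAMAuthenticator and set_service_url exactly as in Step 3.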

Step 4: Capture and Convert Speech to Text

Now, let's capture speech from the user's microphone and convert it to text using IBM Watson's Speech to Text service:


import wave

import pyaudio

def record_audio(filename):
    CHUNK = 1024              # frames per buffer read
    FORMAT = pyaudio.paInt16  # 16-bit samples
    CHANNELS = 1
    RATE = 44100
    RECORD_SECONDS = 5

    audio = pyaudio.PyAudio()
    sample_width = audio.get_sample_size(FORMAT)  # 2 bytes for paInt16

    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=CHUNK)

    print("Recording...")

    # int(44100 / 1024 * 5) = 215 reads of 1024 frames, just under 5 seconds
    frames = []
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        frames.append(stream.read(CHUNK))

    print("Finished recording.")

    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Write the captured samples to a WAV file
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

def transcribe_audio(filename):
    # Named transcribe_audio, not speech_to_text, so that it does not
    # overwrite the speech_to_text client created in Step 3.
    with open(filename, 'rb') as audio_file:
        result = speech_to_text.recognize(audio=audio_file, content_type='audio/wav').get_result()

    # 'results' is empty when the service recognized no speech
    if not result.get('results'):
        return ""
    return result['results'][0]['alternatives'][0]['transcript'].strip()

# Capture audio and save it as 'audio.wav'
record_audio("audio.wav")

# Convert speech to text
user_input = transcribe_audio("audio.wav")
print("You said:", user_input)
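The recognize() call returns JSON in which results is a list of recognized segments, each holding a list of alternatives with the most likely one first. The code above reads only the first segment, so a longer utterance that the service splits at a pause would be cut short. The sketch below joins the top alternative of every segment and collects the confidence values the response includes. It is tested against a hand-written sample response, not real service output, and extract_transcript is a hypothetical helper name.

```python
def extract_transcript(result):
    """Join the top alternative of every segment in a Speech to Text result.

    Returns (transcript, confidences), where confidences lists the
    per-segment confidence values present in the response.
    """
    pieces = []
    confidences = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # the most likely alternative comes first
        pieces.append(best.get("transcript", "").strip())
        if "confidence" in best:
            confidences.append(best["confidence"])
    return " ".join(p for p in pieces if p), confidences


# Hand-written example shaped like a recognize() result with two segments
sample = {
    "result_index": 0,
    "results": [
        {"final": True, "alternatives": [{"transcript": "what time is it ", "confidence": 0.93}]},
        {"final": True, "alternatives": [{"transcript": "in London ", "confidence": 0.88}]},
    ],
}
text, confidences = extract_transcript(sample)
print(text)         # what time is it in London
print(confidences)  # [0.93, 0.88]
```

In the Step 4 code you could pass the recognize() result to extract_transcript instead of indexing results[0] directly, and, for example, ask the user to repeat the command when the confidence is low.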

Step 5: Convert Text to Speech

Finally, let's use IBM Watson's Text to Speech service to convert the assistant's response (text) into speech:


def synthesize_speech(text, output_filename="output.wav"):
    # Named synthesize_speech, not text_to_speech, so that it does not
    # overwrite the text_to_speech client created in Step 3.
    response = text_to_speech.synthesize(text, accept='audio/wav').get_result()
    with open(output_filename, 'wb') as audio_file:
        audio_file.write(response.content)

# Sample response from the assistant
assistant_response = "Hello! How can I assist you today?"

# Convert text to speech and save it as 'response.wav'
synthesize_speech(assistant_response, "response.wav")
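The code above saves the reply to response.wav but does not play it. One way to play it through the speakers is a PyAudio output stream fed with chunks read by Python's wave module. This sketch assumes response.wav is a standard PCM WAV file that the wave module can open; the helper names wav_chunks and play_wav are our own.

```python
import wave


def wav_chunks(wf, chunk=1024):
    """Yield successive blocks of raw frames from an open wave.Wave_read."""
    data = wf.readframes(chunk)
    while data:
        yield data
        data = wf.readframes(chunk)


def play_wav(filename, chunk=1024):
    """Play a PCM WAV file on the default output device using PyAudio."""
    import pyaudio  # imported here so wav_chunks stays usable without PyAudio

    with wave.open(filename, "rb") as wf:
        pa = pyaudio.PyAudio()
        try:
            stream = pa.open(
                format=pa.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True,
            )
            for data in wav_chunks(wf, chunk):
                stream.write(data)  # blocks until the chunk is queued
            stream.stop_stream()
            stream.close()
        finally:
            pa.terminate()
```

After synthesizing the reply, calling play_wav("response.wav") would speak it aloud, which completes the listen, transcribe, and answer cycle.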

Conclusion

Congratulations! You've built a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. It captures your spoken commands, converts them to text, and synthesizes a spoken reply, which it saves as a WAV file.

This is just a basic example, and you can expand the functionality of your voice assistant by integrating natural language understanding (NLU) capabilities and adding more sophisticated conversational logic. Additionally, you can integrate your voice assistant with other services and APIs to provide personalized responses and deliver a richer user experience. Happy coding!
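As a first step toward conversational logic, the transcript can be routed to a reply with simple keyword matching before the reply is sent to Text to Speech. The respond function and its keywords are hypothetical examples, not part of any Watson API; a real assistant would more likely use an NLU service to detect intents.

```python
import re
from datetime import datetime


def respond(user_input, now=None):
    """Return a reply for a transcript using simple keyword rules."""
    # Compare whole words so that, e.g., "sometimes" does not match "time"
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    if not words:
        return "Sorry, I didn't catch that. Could you say it again?"
    if "time" in words:
        now = now or datetime.now()
        return f"It is {now:%H:%M}."
    if words & {"hello", "hi", "hey"}:
        return "Hello! How can I assist you today?"
    if words & {"bye", "goodbye"}:
        return "Goodbye!"
    return "Sorry, I don't know how to help with that yet."


print(respond("what time is it", now=datetime(2024, 1, 1, 9, 30)))  # It is 09:30.
print(respond("hello there"))  # Hello! How can I assist you today?
```

In the full script, you would pass the transcript from Step 4 to respond() and synthesize the returned string in place of the fixed sample response.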
