Voice assistants have become increasingly popular in our daily lives, offering convenience and ease of access to information and services. In this article, we will explore how to create a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. By the end of this guide, you'll have a basic voice assistant that can understand your spoken commands and respond with synthesized speech.
Prerequisites
To follow along with this tutorial, you'll need the following:
- Python installed on your system (Python 3.6 or later is recommended).
- An IBM Cloud account to access the Watson services. You can sign up for a free account if you don't have one.
- The ibm_watson Python library, which can be installed using pip.
Step 1: Set Up IBM Watson Services
- Go to the IBM Cloud website (https://cloud.ibm.com) and log in to your account.
- Create a new Speech to Text service and a new Text to Speech service from the IBM Cloud Catalog.
- Note down the API credentials (API key and URL) for both services as you will need them later in the code.
Step 2: Install Required Libraries
Open a terminal or command prompt and install the necessary libraries using pip:
pip install ibm_watson
pip install pyaudio
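Note: The pyaudio library depends on PortAudio, so you may need to install additional system packages first, depending on your operating system (for example, portaudio19-dev on Debian or Ubuntu, or portaudio via Homebrew on macOS).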
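Before moving on, it can help to confirm that both libraries import cleanly. The following is a minimal sketch using only the standard library; the helper name find_missing_modules is hypothetical and not part of either package.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(["ibm_watson", "pyaudio"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If pyaudio is reported as missing even after running pip, the system-level audio dependencies are the most likely cause.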
Step 3: Initialize the Watson Services
In your Python script, import the required modules and initialize the Watson services with your API credentials:
from ibm_watson import SpeechToTextV1, TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Replace these placeholders with the credentials you noted in Step 1.
speech_to_text_apikey = "your_speech_to_text_apikey"
speech_to_text_url = "your_speech_to_text_url"
text_to_speech_apikey = "your_text_to_speech_apikey"
text_to_speech_url = "your_text_to_speech_url"

# Speech to Text client
authenticator_stt = IAMAuthenticator(speech_to_text_apikey)
speech_to_text = SpeechToTextV1(authenticator=authenticator_stt)
speech_to_text.set_service_url(speech_to_text_url)

# Text to Speech client
authenticator_tts = IAMAuthenticator(text_to_speech_apikey)
text_to_speech = TextToSpeechV1(authenticator=authenticator_tts)
text_to_speech.set_service_url(text_to_speech_url)
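Hardcoding API keys in a script makes them easy to leak through version control. One common alternative is to read them from environment variables. The sketch below assumes variable names of the form SPEECH_TO_TEXT_APIKEY and SPEECH_TO_TEXT_URL; both these names and the load_credentials helper are choices made for this example, not something the tutorial's setup requires.

```python
import os


def load_credentials(prefix, environ=os.environ):
    """Read <PREFIX>_APIKEY and <PREFIX>_URL from the environment.

    The variable naming scheme is an assumption made for this sketch.
    """
    names = (f"{prefix}_APIKEY", f"{prefix}_URL")
    values = [environ.get(name) for name in names]
    # Collect every variable that is unset or empty so the error is complete.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values[0], values[1]
```

With the variables exported in your shell, the placeholder assignments could become, for example, `speech_to_text_apikey, speech_to_text_url = load_credentials("SPEECH_TO_TEXT")`.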
Step 4: Capture and Convert Speech to Text
Now, let's capture speech from the user's microphone and convert it to text using IBM Watson's Speech to Text service:
import pyaudio
import wave

def record_audio(filename):
    CHUNK = 1024              # frames read per buffer
    FORMAT = pyaudio.paInt16  # 16-bit samples
    CHANNELS = 1              # mono
    RATE = 44100              # samples per second
    RECORD_SECONDS = 5

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=CHUNK)

    print("Recording...")
    frames = []
    # Read enough CHUNK-sized buffers to cover RECORD_SECONDS of audio.
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("Finished recording.")

    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Save the captured frames as a WAV file.
    wf = wave.open(filename, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(audio.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
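def transcribe_audio(filename):
    # Named differently from the speech_to_text service object so that
    # defining this function does not overwrite the client.
    with open(filename, 'rb') as audio_file:
        result = speech_to_text.recognize(audio=audio_file, content_type='audio/wav').get_result()
    results = result.get('results', [])
    if not results:
        # No speech was recognized in the recording.
        return ""
    return results[0]['alternatives'][0]['transcript'].strip()

record_audio("audio.wav")
user_input = transcribe_audio("audio.wav")
print("You said:", user_input)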
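The transcription code above reads only the first entry of the results list. Longer recordings may come back split into several entries, and silence may produce none. The sketch below shows one way to handle both cases. The sample dictionary is hand-written for illustration: its shape follows the indexing used above, and the confidence values are made up. The helper name extract_transcript is hypothetical.

```python
def extract_transcript(result):
    """Join the top alternative of every result segment into one string."""
    pieces = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            # The first alternative is treated as the preferred transcript.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Hand-written sample response, not real service output.
sample = {
    "result_index": 0,
    "results": [
        {"final": True, "alternatives": [{"transcript": "turn on ", "confidence": 0.94}]},
        {"final": True, "alternatives": [{"transcript": "the lights ", "confidence": 0.91}]},
    ],
}

print(extract_transcript(sample))                 # turn on the lights
print(repr(extract_transcript({"results": []})))  # ''
```

Returning an empty string for silence lets the caller decide how to respond instead of crashing with an IndexError.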
Step 5: Convert Text to Speech
Finally, let's use IBM Watson's Text to Speech service to convert the assistant's response (text) into speech:
def synthesize_speech(text, output_filename="output.wav"):
    # Named differently from the text_to_speech service object so that
    # defining this function does not overwrite the client.
    response = text_to_speech.synthesize(text, accept='audio/wav').get_result()
    with open(output_filename, 'wb') as audio_file:
        audio_file.write(response.content)

assistant_response = "Hello! How can I assist you today?"
synthesize_speech(assistant_response, "response.wav")
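The steps so far produce a transcript and a spoken reply, but nothing yet connects the two. The following is a minimal sketch of that glue, assuming simple keyword matching. The rules, the function names choose_response and handle_turn, and the default file names are all illustrative choices rather than part of the Watson SDK.

```python
def choose_response(user_input):
    """Pick a reply with simple keyword matching (illustrative rules only)."""
    text = user_input.lower()
    if not text.strip():
        return "Sorry, I did not catch that."
    if "hello" in text or "hi" in text.split():
        return "Hello! How can I assist you today?"
    if "your name" in text:
        return "I am a simple voice assistant."
    # Fall back to echoing what was heard.
    return "You said: " + user_input.strip()


def handle_turn(record, transcribe, synthesize,
                audio_in="audio.wav", audio_out="response.wav"):
    """Run one listen-and-reply cycle using the functions passed in."""
    record(audio_in)
    reply = choose_response(transcribe(audio_in))
    synthesize(reply, audio_out)
    return reply
```

To use it, pass record_audio together with the transcription and synthesis functions from Steps 4 and 5. Taking the three functions as arguments also makes the logic easy to try out with stand-ins, without a microphone or network access.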
Conclusion
Congratulations! You've created a simple voice assistant using IBM Watson's Speech to Text and Text to Speech services in Python. It can capture a spoken command from the microphone, convert it to text, and save a synthesized spoken reply as a WAV file that you can play back.
This is just a basic example, and you can expand the functionality of your voice assistant by integrating natural language understanding (NLU) capabilities and adding more sophisticated conversational logic. Additionally, you can integrate your voice assistant with other services and APIs to provide personalized responses and deliver a richer user experience. Happy coding!