IBM Cloud Global


Transcribe Audio to Text with IBM Watson Speech to Text using Python

By Gunasekaran Venkatesan posted Wed April 03, 2024 11:58 AM


In this tutorial, we will explore how to use IBM Watson's Speech to Text service in Python to transcribe audio files into text. IBM Watson offers powerful speech recognition capabilities that can be easily integrated into your Python applications via its API.


Before we begin, make sure you have the following:

  1. An IBM Cloud account.
  2. A Speech to Text service instance created in IBM Cloud, along with its API key and service URL.
  3. Jupyter Notebook, which is used in this tutorial to write and test the Python code.
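Rather than pasting the API key and service URL directly into your code or notebook, you can read them from environment variables. The sketch below is one way to do that. The variable names STT_APIKEY and STT_URL are hypothetical placeholders chosen for this example, not names defined by IBM; use whatever names suit your setup.

```python
import os
from typing import Tuple


def load_stt_credentials(key_var: str = "STT_APIKEY", url_var: str = "STT_URL") -> Tuple[str, str]:
    """Return (api_key, service_url) read from environment variables.

    The default variable names are hypothetical placeholders for this sketch.
    """
    api_key = os.environ.get(key_var)
    service_url = os.environ.get(url_var)
    # Report every missing variable at once instead of failing on the first.
    missing = [name for name, value in ((key_var, api_key), (url_var, service_url)) if not value]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return api_key, service_url
```

The returned values can then be passed to IAMAuthenticator and set_service_url in place of the hard-coded 'your_api_key' and 'your_service_url' strings.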

Step 1: Install the IBM Watson SDK

You can install the ibm-watson SDK using pip:

pip install ibm-watson
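To confirm that the install worked (for example, from a Jupyter notebook cell), you can ask Python which version of the package is installed. This sketch uses the standard library's importlib.metadata; it is a generic Python check, not part of the Watson SDK, and the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "ibm-watson") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Calling print(installed_version()) prints a version string if the SDK is available in the current environment, or None if pip installed it into a different interpreter than the one your notebook uses.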

Step 2: Write the Python Code

Below is an example of how to use the IBM Watson Speech to Text service in Python:

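from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Replace 'your_api_key' with your actual API key
authenticator = IAMAuthenticator('your_api_key')
speech_to_text = SpeechToTextV1(authenticator=authenticator)

# Replace 'your_service_url' with your actual service URL
speech_to_text.set_service_url('your_service_url')

# Transcribe audio from a file (content_type must match the file's format)
with open('path_to_audio_file', 'rb') as audio_file:
    result = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav',
        model='en-US_BroadbandModel'
    ).get_result()

# Print the transcription result (a dictionary parsed from the JSON response)
print(result)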

Note: Make sure to replace 'your_api_key', 'your_service_url', and 'path_to_audio_file' with your actual API key, service URL, and the path to your audio file, respectively. You can also change the model parameter based on your language and audio quality needs.
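When you swap in your own audio file, the content_type sent to recognize should describe that file's format. A small helper can pick it from the file extension. The mapping below is an assumption covering a few common formats only; check the Speech to Text documentation for the complete list of supported audio types.

```python
from pathlib import Path

# Assumed extension-to-MIME-type mapping for a few common audio formats.
AUDIO_CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".mp3": "audio/mp3",
    ".ogg": "audio/ogg",
    ".webm": "audio/webm",
}


def content_type_for(path: str) -> str:
    """Choose a content_type for recognize() from the audio file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return AUDIO_CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported or unknown audio extension: {suffix!r}") from None
```

For example, content_type_for('meeting.FLAC') returns 'audio/flac', which you would pass as content_type instead of the fixed 'audio/wav'.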

  • The code authenticates with the IBM Watson Speech to Text service using an API key ('your_api_key') and the IAMAuthenticator from the ibm_cloud_sdk_core.authenticators module. It also sets the service URL ('your_service_url') so that requests are sent to your service instance.
  • The code opens an audio file ('path_to_audio_file') in binary mode and sends it to the Speech to Text service using the recognize method of the SpeechToTextV1 instance. Parameters such as content_type (specifying the audio format, e.g., 'audio/wav') and model (specifying the language model to use, e.g., 'en-US_BroadbandModel') are provided.
  • The API call returns a result object containing the transcription of the audio. This result object is stored in the result variable.
  • Finally, the code prints the transcription result using print(result).
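Printing the whole result shows the full JSON response, including confidence scores. If you only need the text, you can pull the top-ranked transcript out of each result segment. This sketch assumes result is a dictionary (for example, obtained by calling get_result() on the response) with the results, alternatives, and transcript fields the service returns; the function name extract_transcript is hypothetical.

```python
from typing import Any, Dict, List


def extract_transcript(result: Dict[str, Any]) -> str:
    """Join the top-ranked alternative of every result segment into one string.

    Assumes the shape {"results": [{"alternatives": [{"transcript": ...}, ...]}, ...]}.
    """
    pieces: List[str] = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            # The first alternative is the service's best guess for this segment.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in pieces if p)
```

With the code above, print(extract_transcript(result)) would print just the recognized text instead of the raw dictionary.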

Step 3: Run the Code

Save the Python code in a .py file and run it from a terminal with the python command, or run it directly in a Jupyter notebook cell.

Either way, the code makes an API call to the IBM Watson Speech to Text service to transcribe the audio file.
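If you prefer to run the transcription from a terminal with the audio path as an argument, the same call can be wrapped in a small command-line script. This is a sketch under stated assumptions: the credentials come from hypothetical environment variables STT_APIKEY and STT_URL, the default model and content type mirror the example above, and the SDK is imported inside transcribe so the argument parsing works even where ibm-watson is not installed.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a hypothetical transcription script."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with IBM Watson Speech to Text.")
    parser.add_argument("audio_path", help="Path to the audio file")
    parser.add_argument("--content-type", default="audio/wav", help="MIME type of the audio")
    parser.add_argument("--model", default="en-US_BroadbandModel", help="Speech to Text model name")
    return parser


def transcribe(audio_path: str, content_type: str, model: str) -> dict:
    """Send one file to the service and return the JSON result as a dict."""
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Hypothetical environment variable names; adjust to your setup.
    authenticator = IAMAuthenticator(os.environ["STT_APIKEY"])
    stt = SpeechToTextV1(authenticator=authenticator)
    stt.set_service_url(os.environ["STT_URL"])
    with open(audio_path, "rb") as audio_file:
        return stt.recognize(audio=audio_file, content_type=content_type, model=model).get_result()


def main(argv: Optional[List[str]] = None) -> None:
    """Parse arguments, transcribe the file, and print the JSON result."""
    args = build_parser().parse_args(argv)
    print(transcribe(args.audio_path, args.content_type, args.model))
```

To use it as a script, add the usual if __name__ == "__main__": main() guard at the bottom of the file and pass the audio path (plus --content-type or --model if needed) on the command line.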

Sample Demo: 

Here I ran the code in a Jupyter notebook; the text transcribed from the audio file is highlighted in the image below.


Congratulations! You've successfully transcribed audio to text using IBM Watson's Speech to Text service and Python. This tutorial covered the setup, authentication, and transcription process. Feel free to explore the further customization options and features that IBM Watson's Speech to Text service offers for your applications.

Getting started with Speech to Text

Python API
