October 16, 2024
Learn how to add speech-to-text conversion to a Retool app using the OpenAI Whisper API
With OpenAI Whisper, converting speech in audio files to text becomes easy, which opens up opportunities for building powerful applications. Whisper transcribes audio into raw text that can then be stored in a database, searched, and processed further. Integrating this capability into Retool adds even more potential, because your data and automation live in one place. Let’s explore how to build a simple app that uses Whisper’s speech-to-text API in Retool.
Before we begin, you’ll need a Retool account and an OpenAI API key.
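Start by creating a REST API resource in Retool with the Whisper transcription endpoint as its base URL:
https://api.openai.com/v1/audio/transcriptions
Authenticate with your OpenAI API key by adding an `Authorization` header with the value `Bearer <your OpenAI API key>`. Then create a query on this resource and set its method to POST.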
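The resource and query settings come down to an endpoint, an HTTP method, and one authentication header. The sketch below spells them out as a plain JavaScript object, which can help when testing the same call outside Retool. `whisperRequestConfig` is a hypothetical helper name, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption.

```javascript
// Endpoint for OpenAI's audio transcription API (Whisper).
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical helper: gathers the settings entered on the Retool resource
// and query (URL, method, auth header) into one object.
function whisperRequestConfig(apiKey) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("An OpenAI API key is required");
  }
  return {
    url: WHISPER_URL,
    method: "POST",
    // OpenAI expects the key as a Bearer token in the Authorization header.
    // No Content-Type is set here: for multipart bodies the HTTP client adds
    // it, including the boundary parameter.
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Usage (assumes Node.js with the key stored in OPENAI_API_KEY):
// const config = whisperRequestConfig(process.env.OPENAI_API_KEY);
```

Leaving `Content-Type` out is deliberate. A hand-written multipart header without the boundary parameter is a common cause of rejected uploads.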
We’ll now create a minimal interface that allows users to upload an audio file for transcription.
Add a File Picker component to your Retool app so users can upload an audio file. In this example, the component is named fileButton1.
The Whisper API expects a multipart/form-data request, so set the query’s body type to Form Data and add two fields, `model` and `file`:
Set the `model` field to `whisper-1`.
For the `file` field, bind it to the File Picker component using ``.
If everything is connected correctly, pressing Run on the query should return a JSON response whose `text` field contains the transcription.
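The request body described above is sent as multipart/form-data with one text part and one file part. The sketch below builds the equivalent body in JavaScript. It assumes a runtime with global `FormData` and `Blob` (Node.js 18+ or a browser). `buildTranscriptionForm` and its parameters are hypothetical names, and the `audio/mpeg` default MIME type is an assumption for MP3 input.

```javascript
// Hypothetical helper: builds the multipart/form-data body the Whisper
// endpoint expects, with a "model" field and a "file" field.
// Assumes global FormData and Blob (Node.js 18+ or a browser).
function buildTranscriptionForm(audioBytes, filename, mimeType = "audio/mpeg") {
  const form = new FormData();
  form.append("model", "whisper-1");
  // Appending a Blob with a filename sends it as a file part. The filename
  // travels with the upload, so keeping the original extension is the safe choice.
  form.append("file", new Blob([audioBytes], { type: mimeType }), filename);
  return form;
}

// Usage with fetch (not executed here). Content-Type is left unset so that
// fetch can add the multipart boundary itself:
// const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
//   method: "POST",
//   headers: { Authorization: `Bearer ${apiKey}` },
//   body: buildTranscriptionForm(bytes, "meeting.mp3"),
// });
// const { text } = await res.json();
```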
To display the converted text, we’ll add a Text Area component that updates with the transcription result.
Drag a Text Area component into the interface.
Bind the component to the transcription response using:
{{ query1.data?.text }}
This will display the text returned from the Whisper API after a successful transcription.
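With the default JSON response format, Whisper returns an object with a `text` field, which is why the binding reads `query1.data?.text`. The optional chaining (`?.`) avoids a TypeError while `query1.data` is still null or undefined, for example before the query has run. The plain function below mirrors that expression. `transcriptText` is a hypothetical name, and the empty-string fallback is a choice made for this sketch, not Retool’s behavior.

```javascript
// Mirrors the Text Area binding {{ query1.data?.text }}.
// If data is null or undefined, ?. short-circuits to undefined instead of
// throwing; ?? then substitutes an empty string.
function transcriptText(data) {
  return data?.text ?? "";
}

// Example response body from the transcriptions endpoint (default JSON format):
const sample = { text: "Hello from the meeting recording." };
console.log(transcriptText(sample)); // Hello from the meeting recording.
console.log(transcriptText(undefined)); // prints an empty line
```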
One common issue when working with audio files is hitting the file size limit: the Whisper API accepts uploads of up to 25 MB. For larger recordings, split the audio into smaller chunks, transcribe each chunk individually, and concatenate the transcribed results in order.
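Splitting by bytes only works reliably for uncompressed audio. For compressed formats such as MP3 or M4A, cut the audio with a dedicated tool (ffmpeg, for example). As a sketch of the idea, the function below splits a PCM WAV file into standalone WAV files that stay under a byte limit. It copies the header into each piece and patches its size fields, then joins the per-chunk transcripts. The sketch assumes the canonical 44-byte WAV header, with a 16-byte `fmt ` chunk followed directly by the `data` chunk. `splitWav` and `joinTranscripts` are hypothetical names, and the limit constant treats 25 MB conservatively as 25,000,000 bytes.

```javascript
// OpenAI's documented 25 MB upload limit, taken conservatively as 25,000,000 bytes.
const WHISPER_MAX_BYTES = 25 * 1000 * 1000;
// Assumption: canonical PCM WAV layout with a 44-byte header.
const WAV_HEADER_BYTES = 44;

// Hypothetical helper: splits a WAV file (Uint8Array) into standalone WAV
// files no larger than maxBytes, cutting only on whole sample frames.
function splitWav(bytes, maxBytes = WHISPER_MAX_BYTES) {
  const tag = (off) => String.fromCharCode(...bytes.subarray(off, off + 4));
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(36) !== "data") {
    throw new Error("Expected a canonical 44-byte PCM WAV header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const blockAlign = view.getUint16(32, true); // bytes per sample frame, all channels
  const dataSize = Math.min(view.getUint32(40, true), bytes.length - WAV_HEADER_BYTES);
  const framesPerChunk = Math.floor((maxBytes - WAV_HEADER_BYTES) / blockAlign);
  if (blockAlign === 0 || framesPerChunk < 1) {
    throw new Error("maxBytes is too small to hold a single frame");
  }
  const chunkDataBytes = framesPerChunk * blockAlign;

  const header = bytes.slice(0, WAV_HEADER_BYTES);
  const chunks = [];
  for (let offset = 0; offset < dataSize; offset += chunkDataBytes) {
    const size = Math.min(chunkDataBytes, dataSize - offset);
    const chunk = new Uint8Array(WAV_HEADER_BYTES + size);
    chunk.set(header, 0);
    const start = WAV_HEADER_BYTES + offset;
    chunk.set(bytes.subarray(start, start + size), WAV_HEADER_BYTES);
    // Patch both size fields so each piece is a valid WAV file on its own.
    const cv = new DataView(chunk.buffer);
    cv.setUint32(4, 36 + size, true); // RIFF chunk size: file size minus 8
    cv.setUint32(40, size, true); // data chunk size
    chunks.push(chunk);
  }
  return chunks;
}

// Joins per-chunk transcripts in their original order, skipping empty results.
function joinTranscripts(texts) {
  return texts.map((t) => t.trim()).filter((t) => t.length > 0).join(" ");
}
```

Each chunk can be sent as its own request using the two-field Form Data body configured earlier. The returned `text` fields are then passed to `joinTranscripts`. Cuts fall on sample frames, not word boundaries, so a word that straddles a cut may come back garbled.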
Another possible issue is the request timing out on longer files. If this happens, raise the timeout in the Advanced tab of your Whisper query, for example to 120 seconds (120000 ms), to allow for the longer processing time.
Once you have your raw text, the sky's the limit. You can add more components to your Retool app, such as buttons to store the transcribed text in a database or a search feature to browse through past transcriptions.
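For the search feature, one option is a JavaScript transformer that filters the rows returned by a query over stored transcriptions. The sketch below assumes the rows are an array of objects with hypothetical `id`, `createdAt`, and `text` properties. It also assumes the search term comes from a text input. It performs a simple case-insensitive word match; with a real database, the same filtering could be pushed into the query itself.

```javascript
// Hypothetical row shape for a stored transcription: { id, createdAt, text }.
// Returns the rows whose text contains every word of the search term,
// ignoring case, sorted newest first.
function searchTranscripts(rows, term) {
  const words = term.toLowerCase().split(/\s+/).filter(Boolean);
  return rows
    .filter((row) => {
      const text = (row.text ?? "").toLowerCase();
      return words.every((w) => text.includes(w));
    })
    .sort((a, b) => new Date(b.createdAt) - new Date(a.createdAt));
}

const sampleRows = [
  { id: 1, createdAt: "2024-10-01", text: "Budget review with the design team" },
  { id: 2, createdAt: "2024-10-08", text: "Design sync: budget approved" },
  { id: 3, createdAt: "2024-10-09", text: "Hiring plan" },
];
console.log(searchTranscripts(sampleRows, "design budget").map((r) => r.id)); // [ 2, 1 ]
```

An empty search term matches every row, so the same function also serves as the default "show all, newest first" view.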
With these steps, you now have a fully functional Retool app that integrates OpenAI Whisper for speech-to-text conversion. From here, you can extend its functionality as needed—perhaps building a searchable repository of meeting notes or automating workflows that involve audio data.