Customize speech models with fine-tuning

Article
03/28/2025

With custom speech, you can enhance speech recognition accuracy for your applications by using a custom model for real-time speech to text, speech translation, and batch transcription.

You create a custom speech model by fine-tuning an Azure AI Speech base model with your own data. You can upload your data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint.

This article shows you how to use fine-tuning to create a custom speech model. For more information about custom speech, see the custom speech overview documentation.

Start fine-tuning

A custom speech fine-tuning project contains models, training and testing datasets, and deployment endpoints. Each project is specific to a locale. For example, you might create a project for English in the United States (en-US).

To create a custom speech project in Speech Studio, follow these steps:

Sign in to Speech Studio.
Select the subscription and Speech resource to work with.

Important

If you plan to train a custom model with audio data, select a Speech resource in a region that has dedicated hardware for training with audio data. See the footnotes in the regions table for more information.
Select Custom speech > Create a new project.
Follow the instructions provided by the wizard to create your project.

Select the new project by name or select Go to project. The left panel then shows these menu items: Speech datasets, Train custom models, Test models, and Deploy models.
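
The steps above use Speech Studio. As a rough programmatic counterpart, the following sketch builds the HTTP request that would create a project through the Speech to text REST API. It is an illustration under stated assumptions, not the documented procedure of this article: the `v3.2` API version, the `eastus` region, the key placeholder, and the helper name `build_create_project_request` are all assumptions chosen for the example. The function only constructs the request so that you can inspect it before sending.

```python
import json
import urllib.request


def build_create_project_request(region, key, display_name, locale, api_version="v3.2"):
    """Build (but do not send) a POST request that creates a custom speech project.

    Assumptions: the endpoint path and API version follow the Speech to text
    REST API; region, key, and display_name are placeholders you supply.
    """
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/{api_version}/projects"
    )
    # Each project is tied to a single locale, for example "en-US".
    body = {"displayName": display_name, "locale": locale}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_project_request(
        region="eastus",                 # hypothetical region
        key="YOUR_SPEECH_RESOURCE_KEY",  # placeholder, replace with a real key
        display_name="My custom speech project",
        locale="en-US",
    )
    print(request.get_method(), request.full_url)
    # To actually create the project, send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to check the URL, locale, and headers before any call is made against your Speech resource.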
