A long time ago, back in 2013 and just prior to Swift launching, I created a tutorial in Objective-C called AVSpeechSynthesizer Tutorial – Letting Your App Speak Like Siri. A lot has changed since then: we now have Swift, SwiftUI, and Personal Voice. People with an iPhone 12 or later, and various Mac and iPad devices, can spend time recording phrases on screen, and after the device has processed those phrases, they have a personalised voice that can be used in various places on the device.
This tutorial looks at how to use AVSpeechSynthesizer to select your Personal Voice and have it speak anything you type in.
Using AVSpeechSynthesisVoice to Get Your Personal Voice
We will begin by defining the method that fetches your Personal Voice, prompting the user for permission just before doing so.
func fetchPersonalVoices() async {
    // Prompt the user for permission to use their Personal Voice.
    let status = await AVSpeechSynthesizer.requestPersonalVoiceAuthorization()

    if status == .authorized {
        // Keep only the voices flagged as Personal Voices.
        let voices = AVSpeechSynthesisVoice.speechVoices()
            .filter { $0.voiceTraits.contains(.isPersonalVoice) }
        personalVoices = voices
        selectedVoice = voices.first
    } else {
        print("Personal Voice authorization denied or not available.")
        personalVoices = []
    }
}
This method is marked as async so that we can await the authorisation status while the user responds to the permission prompt.
If authorised, we get the available speech voices and keep only those whose voiceTraits contain .isPersonalVoice. Assuming you have a Personal Voice set up, this adds the voices to personalVoices (we will declare this later in the view) and also selects the first voice that was returned. In my case, I have just one Personal Voice set up.
If the user didn’t authorise access to their Personal Voice, then we pretty much stop there. At this point it would be ideal to just grab the voices without the filter above and let the user select from the built-in options on the device, as sketched below.
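Here is a rough sketch of that fallback, switching over the authorisation status and dropping back to the built-in voices when Personal Voice isn’t available. The fetchVoicesWithFallback name and the fallback behaviour are my own suggestion rather than part of the method above.
func fetchVoicesWithFallback() async {
    let status = await AVSpeechSynthesizer.requestPersonalVoiceAuthorization()

    switch status {
    case .authorized:
        // Same as before: keep only Personal Voices.
        personalVoices = AVSpeechSynthesisVoice.speechVoices()
            .filter { $0.voiceTraits.contains(.isPersonalVoice) }
    case .denied, .unsupported, .notDetermined:
        // Fall back to every installed voice so the app still speaks.
        personalVoices = AVSpeechSynthesisVoice.speechVoices()
    @unknown default:
        personalVoices = AVSpeechSynthesisVoice.speechVoices()
    }

    selectedVoice = personalVoices.first
}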
Preparing the Text to be Uttered
Now that we have our voice to work with, we need a way to use that voice. This is done with the AVSpeechUtterance class.
func speakText(voice: AVSpeechSynthesisVoice, settings: SpeechSettings) {
    // Nothing to say, so bail out early.
    guard !textToSpeak.isEmpty else { return }

    let utterance = AVSpeechUtterance(string: textToSpeak)
    utterance.rate = settings.rate
    utterance.pitchMultiplier = settings.pitch
    utterance.voice = voice

    // Cut off any in-progress speech before starting the new utterance.
    synthesizer.stopSpeaking(at: .immediate)
    synthesizer.speak(utterance)
}
This method accepts an AVSpeechSynthesisVoice as well as some settings, which I will show shortly.
We first check to see if the text to speak is empty or not and return if it is. This variable is set in the View and is wrapped with @State.
We create our utterance by passing in the textToSpeak.
We then set our rate and pitch multiplier that are passed in via the settings.
We then set the voice that utterance will use.
Next, we instruct the synthesizer to stop speaking. This will be noticeable if you hit the speak button in the view while it is already speaking.
We then tell the synthesizer to start speaking.
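As an aside, .immediate is not the only boundary: stopSpeaking(at:) and pauseSpeaking(at:) also accept .word, which lets the synthesizer finish the current word first. Here is a minimal sketch of the pause and resume controls, using a hypothetical togglePause helper:
func togglePause() {
    if synthesizer.isPaused {
        // Resume the utterance we paused earlier.
        synthesizer.continueSpeaking()
    } else if synthesizer.isSpeaking {
        // .word pauses at the end of the current word rather than mid-syllable.
        synthesizer.pauseSpeaking(at: .word)
    }
}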
Settings Struct
Rather than pass multiple items to the speakText method, I opted to bundle them up into a struct so that they pass as a single parameter.
struct SpeechSettings {
    var rate: Float = 0.5
    var pitch: Float = 1.0
}
We set some defaults for the rate and pitch. Volume is another option, although in my testing I found it didn’t make any noticeable difference to my Personal Voice.
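If you’d rather not hard-code those numbers, AVFoundation exposes constants for the rate range, so a variant of the struct could anchor its defaults to them. The volume property here is the option mentioned above; including it is my own addition:
struct SpeechSettings {
    // The default rate sits halfway between AVSpeechUtteranceMinimumSpeechRate
    // (0.0) and AVSpeechUtteranceMaximumSpeechRate (1.0).
    var rate: Float = AVSpeechUtteranceDefaultSpeechRate
    var pitch: Float = 1.0   // pitchMultiplier accepts 0.5 through 2.0
    var volume: Float = 1.0  // 0.0 (silent) through 1.0 (loudest)
}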
Setting up our View
Now that we have all of the pieces in place, let’s build a view so that we can interact with the AVSpeechSynthesizer.
import SwiftUI
import AVFoundation

struct ContentView: View {
    @State private var textToSpeak: String = ""
    @State private var personalVoices: [AVSpeechSynthesisVoice] = []
    @State private var selectedVoice: AVSpeechSynthesisVoice?
    @State private var settings = SpeechSettings()

    private let synthesizer = AVSpeechSynthesizer()

    var body: some View {
        VStack(spacing: 20) {
            Text("Speak with Your Personal Voice")
                .font(.headline)

            TextField("Enter text here", text: $textToSpeak)
                .textFieldStyle(RoundedBorderTextFieldStyle())
                .padding()

            if personalVoices.isEmpty {
                Text("No Personal Voices found")
                    .foregroundColor(.gray)
            } else {
                Picker("Select Voice", selection: $selectedVoice) {
                    ForEach(personalVoices, id: \.identifier) { voice in
                        Text(voice.name)
                            .tag(Optional(voice))
                    }
                }
                .pickerStyle(MenuPickerStyle())
                .padding(.horizontal)
            }

            VStack(spacing: 15) {
                HStack {
                    Text("Rate: \(String(format: "%.2f", settings.rate))")
                    Slider(value: $settings.rate, in: 0.1...1.0, step: 0.05)
                }
                HStack {
                    Text("Pitch: \(String(format: "%.2f", settings.pitch))")
                    Slider(value: $settings.pitch, in: 0.5...2.0, step: 0.1)
                }
            }
            .padding(.horizontal)

            Button(action: {
                if let voice = selectedVoice ?? personalVoices.first {
                    speakText(voice: voice, settings: settings)
                }
            }) {
                Text("Speak")
                    .font(.title2)
                    .padding()
                    .frame(maxWidth: .infinity)
                    .background(textToSpeak.isEmpty || personalVoices.isEmpty ? Color.gray : Color.blue)
                    .foregroundColor(.white)
                    .cornerRadius(10)
            }
            .disabled(textToSpeak.isEmpty || personalVoices.isEmpty)
            .padding(.horizontal)

            Spacer()
        }
        .padding()
        .task {
            await fetchPersonalVoices()
        }
    }

    // fetchPersonalVoices() and speakText(voice:settings:) from earlier
    // live here inside the view.
}
This is our view. We declare four @State property wrapped variables to hold the information we need, along with a private synthesizer constant.
I won’t go into the detail of the view too much, but in here we have a basic view with Text, TextField, and a Picker, to name a few of the views we are using.
These give a title, a place to enter text, and a picker to select your voice, if you have more than one.
The inner VStack contains the Slider views that allow you to alter the rate and pitch of your voice.
Below that we have a Button that makes a call to our speakText method and passes in the voice we selected and the settings we have chosen.
At the end of the view we attach a .task modifier that awaits the fetchPersonalVoices method.
If you run the app now, and have a Personal Voice set up on your device, you will be able to type text and hear it spoken in your own voice.
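If you’re starting from a blank project, a minimal entry point along these lines will launch the view; the PersonalVoiceApp name is just a placeholder:
import SwiftUI

@main
struct PersonalVoiceApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}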
Any questions, please reach out in the comments.