vapi4k-core/com.vapi4k.dtos.voice/PlayHTVoiceDto

PlayHTVoiceDto

@Serializable

data class PlayHTVoiceDto(var inputPreprocessingEnabled: Boolean? = null, var inputReformattingEnabled: Boolean? = null, var inputMinCharacters: Int = -1, val inputPunctuationBoundaries: MutableSet<PunctuationType> = mutableSetOf(), var fillerInjectionEnabled: Boolean? = null, var voiceId: String = "", var voiceIdType: PlayHTVoiceIdType = PlayHTVoiceIdType.UNSPECIFIED, var customVoiceId: String = "", var speed: Double = -1.0, var temperature: Double = -1.0, var emotion: PlayHTVoiceEmotionType = PlayHTVoiceEmotionType.UNSPECIFIED, var voiceGuidance: Double = -1.0, var styleGuidance: Double = -1.0, var textGuidance: Double = -1.0) : PlayHTVoiceProperties, CommonVoiceDto

Constructors

PlayHTVoiceDto

constructor(inputPreprocessingEnabled: Boolean? = null, inputReformattingEnabled: Boolean? = null, inputMinCharacters: Int = -1, inputPunctuationBoundaries: MutableSet<PunctuationType> = mutableSetOf(), fillerInjectionEnabled: Boolean? = null, voiceId: String = "", voiceIdType: PlayHTVoiceIdType = PlayHTVoiceIdType.UNSPECIFIED, customVoiceId: String = "", speed: Double = -1.0, temperature: Double = -1.0, emotion: PlayHTVoiceEmotionType = PlayHTVoiceEmotionType.UNSPECIFIED, voiceGuidance: Double = -1.0, styleGuidance: Double = -1.0, textGuidance: Double = -1.0)

Properties

customVoiceId

@Transient

open override var customVoiceId: String

This enables specifying a voice that doesn't already exist as a PlayHTVoiceIdType enum value.

emotion

open override var emotion: PlayHTVoiceEmotionType

An emotion to be applied to the speech.

fillerInjectionEnabled

open override var fillerInjectionEnabled: Boolean?

This determines whether fillers are injected into the model output before inputting it into the voice provider.
Defaults to `false` because you can achieve better results by prompting the model.

inputMinCharacters

open override var inputMinCharacters: Int

This is the minimum number of characters before a chunk is created. Chunks are sent to the voice provider for voice generation as the model tokens stream in. Defaults to 30.
Increasing this value might add latency, because it waits for the model to output a full chunk before sending it to the voice provider. On the other hand, increasing it might be a good idea if you want to give the voice provider bigger chunks, so it can pronounce them better.
Decreasing this value might decrease latency but might also decrease quality if the voice provider struggles to pronounce the text correctly.
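
The trade-off described above can be illustrated with a minimal sketch. This is not the chunking code used by Vapi or vapi4k; `chunkByMinCharacters` is a hypothetical helper that only buffers streamed tokens until the minimum length is reached, ignoring punctuation boundaries.

```kotlin
// Hypothetical sketch: illustrates the effect of a minimum chunk size only.
// It is NOT the actual Vapi or vapi4k chunking implementation.
fun chunkByMinCharacters(tokens: List<String>, minCharacters: Int): List<String> {
    val chunks = mutableListOf<String>()
    val buffer = StringBuilder()
    for (token in tokens) {
        buffer.append(token)
        // Emit a chunk as soon as the buffer reaches the minimum size.
        if (buffer.length >= minCharacters) {
            chunks += buffer.toString()
            buffer.clear()
        }
    }
    // Flush whatever is left when the token stream ends.
    if (buffer.isNotEmpty()) chunks += buffer.toString()
    return chunks
}

fun main() {
    val tokens = listOf("Hello", ", ", "how ", "can ", "I ", "help ", "you ", "today", "?")
    // A small minimum yields more, shorter chunks (lower latency per chunk).
    println(chunkByMinCharacters(tokens, minCharacters = 10))
    // A larger minimum yields fewer, longer chunks (more context per chunk).
    println(chunkByMinCharacters(tokens, minCharacters = 30))
}
```

With a larger minimum, the first chunk cannot be sent until more tokens have arrived, which is where the added latency comes from.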

inputPreprocessingEnabled

open override var inputPreprocessingEnabled: Boolean?

This determines whether the model output is preprocessed into chunks before being sent to the voice provider.
Defaults to `true` because voice generation sounds better when the output is chunked (and the chunks are reformatted).
To send every token from the model output directly to the voice provider and rely on the voice provider's audio generation logic, set this to `false`.
If disabled, Vapi-provided audio control tokens will not work.

inputPunctuationBoundaries

open override val inputPunctuationBoundaries: MutableSet<PunctuationType>

These are the punctuation marks that are considered valid boundaries before a chunk is created. Chunks are sent to the voice provider for voice generation as the model tokens stream in. Defaults are chosen differently for each provider.
Constraining the delimiters might add latency, because it waits for the model to output a full chunk before sending it to the voice provider. On the other hand, constraining them might be a good idea if you want to give the voice provider longer chunks, so it can sound less disjointed across chunks. For example: ['.'].
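
As a rough illustration of how the boundary set affects chunk length, the sketch below splits a token stream only where the buffered text ends in one of the allowed characters. It is an assumption-laden example, not the real implementation: `chunkAtBoundaries` is a hypothetical helper, and plain `Char` values stand in for `PunctuationType`, whose entries are not listed on this page.

```kotlin
// Hypothetical sketch: plain Char values stand in for PunctuationType.
// It is NOT the actual Vapi or vapi4k chunking implementation.
fun chunkAtBoundaries(
    tokens: List<String>,
    boundaries: Set<Char>,
    minCharacters: Int = 0,
): List<String> {
    val chunks = mutableListOf<String>()
    val buffer = StringBuilder()
    for (token in tokens) {
        buffer.append(token)
        // Look at the last non-whitespace character buffered so far.
        val lastChar = buffer.trimEnd().lastOrNull()
        // Emit only when the buffer is long enough AND ends on an allowed boundary.
        if (buffer.length >= minCharacters && lastChar != null && lastChar in boundaries) {
            chunks += buffer.toString()
            buffer.clear()
        }
    }
    // Flush the remainder when the token stream ends.
    if (buffer.isNotEmpty()) chunks += buffer.toString()
    return chunks
}

fun main() {
    val tokens = listOf("Sure", ", ", "I ", "can ", "help", ". ", "What ", "day", "?")
    // Many allowed boundaries: short chunks, sent early.
    println(chunkAtBoundaries(tokens, boundaries = setOf(',', '.', '?')))
    // Only '.' allowed: longer chunks, sent later.
    println(chunkAtBoundaries(tokens, boundaries = setOf('.')))
}
```

Restricting the set to `'.'` merges the comma-delimited fragment into the following sentence, which is the "longer chunks, more latency" effect described above.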

inputReformattingEnabled

open override var inputReformattingEnabled: Boolean?

This determines whether the chunk is reformatted before being sent to the voice provider. Many things are reformatted including phone numbers, emails and addresses to improve their enunciation.
Default `true` because voice generation sounds better with reformatting.
To disable chunk reformatting, set this to `false`.
To disable chunking completely, set `inputPreprocessingEnabled` to `false`.
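
The interplay between `inputPreprocessingEnabled` and `inputReformattingEnabled` can be summarized in a small sketch. The `InputMode` enum and the `inputMode` function are hypothetical names introduced for illustration only. The sketch assumes that `null` means "use the documented default of `true`" and that disabling preprocessing also bypasses reformatting, since reformatting is described as operating on chunks.

```kotlin
// Hypothetical names for illustration; not part of vapi4k.
enum class InputMode { RAW_TOKENS, CHUNKED, CHUNKED_AND_REFORMATTED }

// Assumption: a null flag falls back to the documented default (true for both).
fun inputMode(inputPreprocessingEnabled: Boolean?, inputReformattingEnabled: Boolean?): InputMode {
    val preprocessing = inputPreprocessingEnabled ?: true
    val reformatting = inputReformattingEnabled ?: true
    return when {
        // No chunking at all: every token goes straight to the voice provider.
        !preprocessing -> InputMode.RAW_TOKENS
        // Chunking plus reformatting of phone numbers, emails, addresses, etc.
        reformatting -> InputMode.CHUNKED_AND_REFORMATTED
        // Chunking only.
        else -> InputMode.CHUNKED
    }
}

fun main() {
    println(inputMode(null, null))   // both flags left unset
    println(inputMode(true, false))  // reformatting disabled
    println(inputMode(false, true))  // preprocessing disabled
}
```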

provider

@EncodeDefault

val provider: VoiceProviderType

speed

open override var speed: Double

This is the speed multiplier that will be used.

styleGuidance

open override var styleGuidance: Double

A number between 1 and 30. Use lower numbers to reduce how strong your chosen emotion will be. Higher numbers will create a very emotional performance.

temperature

open override var temperature: Double

A floating point number between 0, exclusive, and 2, inclusive. If equal to null or not provided, the model's default temperature will be used. The temperature parameter controls variance. Lower temperatures produce more predictable results; higher temperatures allow each run to vary more, so the voice may sound less like the baseline voice.

textGuidance

open override var textGuidance: Double

A number between 1 and 2. This number influences how closely the generated speech adheres to the input text. Use lower values to create more fluid speech, but with a higher chance of deviating from the input text. Higher numbers will make the generated speech more accurate to the input text, ensuring that the words spoken align closely with the provided text.

voiceGuidance

open override var voiceGuidance: Double

A number between 1 and 6. Use lower numbers to reduce how unique your chosen voice will be compared to other voices.
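
The numeric ranges documented for `temperature`, `voiceGuidance`, `styleGuidance`, and `textGuidance` can be checked with a sketch like the one below. It is not the body of `verifyValues()`, which this page does not describe. `guidanceErrors` and `UNSET` are hypothetical names; the sketch assumes that the constructor default of `-1.0` means "not set" and that the "between" ranges are inclusive at both ends.

```kotlin
// Assumption: -1.0 (the constructor default shown above) marks an unset value.
const val UNSET = -1.0

// Hypothetical helper: returns one message per out-of-range value.
// It is NOT the actual implementation of verifyValues().
fun guidanceErrors(
    temperature: Double = UNSET,
    voiceGuidance: Double = UNSET,
    styleGuidance: Double = UNSET,
    textGuidance: Double = UNSET,
): List<String> {
    val errors = mutableListOf<String>()
    fun isSet(value: Double) = value != UNSET

    // temperature: 0 exclusive, 2 inclusive.
    if (isSet(temperature) && !(temperature > 0.0 && temperature <= 2.0))
        errors += "temperature must be in (0, 2], was $temperature"
    // The remaining ranges are assumed to be inclusive at both ends.
    if (isSet(voiceGuidance) && voiceGuidance !in 1.0..6.0)
        errors += "voiceGuidance must be in [1, 6], was $voiceGuidance"
    if (isSet(styleGuidance) && styleGuidance !in 1.0..30.0)
        errors += "styleGuidance must be in [1, 30], was $styleGuidance"
    if (isSet(textGuidance) && textGuidance !in 1.0..2.0)
        errors += "textGuidance must be in [1, 2], was $textGuidance"
    return errors
}

fun main() {
    // All values within the documented ranges.
    println(guidanceErrors(temperature = 0.7, voiceGuidance = 3.0, styleGuidance = 10.0, textGuidance = 1.5))
    // temperature of 0 is excluded; styleGuidance above 30 is out of range.
    println(guidanceErrors(temperature = 0.0, styleGuidance = 31.0))
}
```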

voiceId

var voiceId: String

voiceIdType

@Transient

open override var voiceIdType: PlayHTVoiceIdType

This is the provider-specific ID that will be used.

Functions

assignEnumOverrides

fun assignEnumOverrides()

verifyValues

open override fun verifyValues()