
Direct TTS and Background Utilities

Beyond standard conversations, the Unity SDK gives you several ways to use text-to-speech and utilities. You can play TTS audio directly, run utilities in the background, or combine both. For example, you can narrate a story page while data for it is processed in parallel.

Direct Text-to-Speech

Use UGSDK.Speak to play text-to-speech audio independently of any conversation.

Basic Usage

```csharp
// Play TTS for any text
var result = await UGSDK.Speak.SpeakAndPlayAsync("Hello, welcome to the story!");
```

TTS Result Data

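The SpeakResult returned by SpeakAndPlayAsync contains subtitle timing information. Each entry in Subtitles has a Text, a StartTimeSec, and a DurationSec: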

```csharp
var result = await UGSDK.Speak.SpeakAndPlayAsync(text);

if (result.Subtitles != null && result.Subtitles.Count > 0)
{
    foreach (var subtitle in result.Subtitles)
    {
        Debug.Log($"'{subtitle.Text}' at {subtitle.StartTimeSec}s for {subtitle.DurationSec}s");
    }
}
```
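Each subtitle carries its own start time and duration. You can use that to find which subtitle is being spoken at a given moment, for example to highlight the current word or phrase. The sketch below is plain C#. SubtitleEntry is a hypothetical stand-in that mirrors only the three fields used above, because the SDK's own subtitle type name isn't shown in this section. SubtitleLookup is also hypothetical, and your code has to track the elapsed playback time itself.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's subtitle entries (the real type name
// is not shown here); mirrors only Text, StartTimeSec and DurationSec.
public readonly struct SubtitleEntry
{
    public SubtitleEntry(string text, float startTimeSec, float durationSec)
    {
        Text = text;
        StartTimeSec = startTimeSec;
        DurationSec = durationSec;
    }

    public string Text { get; }
    public float StartTimeSec { get; }
    public float DurationSec { get; }
}

public static class SubtitleLookup
{
    // Returns the index of the subtitle whose [StartTimeSec, StartTimeSec + DurationSec)
    // window contains timeSec, or -1 if none does (for example, during a pause).
    public static int IndexAt(IReadOnlyList<SubtitleEntry> subtitles, float timeSec)
    {
        if (subtitles == null)
        {
            return -1;
        }

        for (int i = 0; i < subtitles.Count; i++)
        {
            var subtitle = subtitles[i];
            if (timeSec >= subtitle.StartTimeSec &&
                timeSec < subtitle.StartTimeSec + subtitle.DurationSec)
            {
                return i;
            }
        }
        return -1;
    }
}
```

In a MonoBehaviour, one option is to record Time.time right after SpeakAndPlayAsync returns. Each frame, pass Time.time minus that value. This only approximates the playback position, because the method returns when audio is queued rather than when it starts.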

Waiting for Playback Completion

SpeakAndPlayAsync returns as soon as the audio is queued for playback, not when playback ends. To wait until playback actually finishes, compute the total audio duration from the subtitle timing data and wait that long:

```csharp
var result = await UGSDK.Speak.SpeakAndPlayAsync(text);

// Calculate total audio duration from subtitles
// (assumes the last subtitle in the list is the one that ends last)
if (result.Subtitles != null && result.Subtitles.Count > 0)
{
    var lastSubtitle = result.Subtitles[result.Subtitles.Count - 1];
    float audioDurationSec = lastSubtitle.StartTimeSec + lastSubtitle.DurationSec;

    // Wait for playback to complete (Awaitable requires Unity 2023.1 or later)
    await Awaitable.WaitForSecondsAsync(audioDurationSec);
}

// Audio has finished playing (or there was no subtitle data to wait on)
_nextButton.interactable = true;
```
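Using the last subtitle assumes the list is ordered by start time. A slightly more defensive sketch takes the latest end time across all entries and returns 0 when there are no subtitles. As before, SubtitleEntry is a hypothetical stand-in for the SDK's subtitle type, and SubtitleTiming is a hypothetical helper. Replace SubtitleEntry with the real type, or adapt the loop, in your project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's subtitle entries; mirrors only the
// fields used in this section.
public readonly struct SubtitleEntry
{
    public SubtitleEntry(string text, float startTimeSec, float durationSec)
    {
        Text = text;
        StartTimeSec = startTimeSec;
        DurationSec = durationSec;
    }

    public string Text { get; }
    public float StartTimeSec { get; }
    public float DurationSec { get; }
}

public static class SubtitleTiming
{
    // Returns the latest end time (start + duration) across all subtitles,
    // or 0 when the list is null or empty. Taking the maximum means the
    // result does not depend on the list being sorted by start time.
    public static float TotalDurationSec(IReadOnlyList<SubtitleEntry> subtitles)
    {
        if (subtitles == null || subtitles.Count == 0)
        {
            return 0f;
        }

        float endSec = 0f;
        foreach (var subtitle in subtitles)
        {
            endSec = Math.Max(endSec, subtitle.StartTimeSec + subtitle.DurationSec);
        }
        return endSec;
    }
}
```

Once the parameter type matches the SDK's, `SubtitleTiming.TotalDurationSec(result.Subtitles)` can replace the last-subtitle arithmetic. When it returns 0, you can skip the wait.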

Displaying TTS Text

Display the spoken text on a UI element while audio plays:

```csharp
[SerializeField] private TextMeshProUGUI _storyText;

public async void PlayNarration(string text)
{
    // Show text on screen
    _storyText.text = text;
    _storyText.gameObject.SetActive(true);

    // Play audio
    var result = await UGSDK.Speak.SpeakAndPlayAsync(text);

    // Wait for completion using subtitle timing
    if (result.Subtitles?.Count > 0)
    {
        var last = result.Subtitles[result.Subtitles.Count - 1];
        await Awaitable.WaitForSecondsAsync(last.StartTimeSec + last.DurationSec);
    }
}
```
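Instead of showing the whole text at once, you can use the subtitle timing to reveal it progressively. This sketch assumes each subtitle's Text is a word or phrase that can be joined with single spaces, which this section does not confirm. SubtitleEntry is again a hypothetical stand-in, SubtitleReveal is a hypothetical helper, and your own code must track the elapsed time.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's subtitle entries; mirrors only the
// fields used in this section.
public readonly struct SubtitleEntry
{
    public SubtitleEntry(string text, float startTimeSec, float durationSec)
    {
        Text = text;
        StartTimeSec = startTimeSec;
        DurationSec = durationSec;
    }

    public string Text { get; }
    public float StartTimeSec { get; }
    public float DurationSec { get; }
}

public static class SubtitleReveal
{
    // Joins, in list order, the text of every subtitle that has started by
    // timeSec, so a label can grow as the narration progresses.
    public static string RevealedText(IReadOnlyList<SubtitleEntry> subtitles, float timeSec)
    {
        if (subtitles == null)
        {
            return string.Empty;
        }

        var parts = new List<string>();
        foreach (var subtitle in subtitles)
        {
            if (subtitle.StartTimeSec <= timeSec)
            {
                parts.Add(subtitle.Text);
            }
        }
        return string.Join(" ", parts);
    }
}
```

If you call this from Update, you could assign the result to _storyText.text only when it differs from the current value.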

Text-Mode Conversations

Text-mode conversations execute the conversation flow (including utilities) without generating audio output. This is useful for running utilities in the background.

Configuration

```csharp
// Set up conversation configuration.
// GetConfigJson is your own helper that returns the configuration JSON, and
// context is a Dictionary<string, object> like the one built in Running Utilities.
string configJson = GetConfigJson("your_config_name");
UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
UGSDK.ConversationManager.GetConfiguration().Context = context;

// Enable text mode - no audio output from conversation
UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;

// Start conversation
UGSDK.ConversationManager.StartConversation();
```
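The configuration snippet leaves GetConfigJson undefined, and this section doesn't say where configuration JSON comes from. The following is a minimal sketch that assumes each configuration is saved as `<name>.json` in a folder you choose. ConfigLoader is hypothetical. Unlike the one-argument call above, it takes the folder as an extra parameter.

```csharp
using System;
using System.IO;

// Hypothetical loader: assumes configurations live as "<configName>.json"
// files inside configDirectory.
public static class ConfigLoader
{
    public static string GetConfigJson(string configDirectory, string configName)
    {
        if (string.IsNullOrWhiteSpace(configName))
        {
            throw new ArgumentException("Config name must be non-empty.", nameof(configName));
        }

        string path = Path.Combine(configDirectory, configName + ".json");
        if (!File.Exists(path))
        {
            throw new FileNotFoundException($"No configuration named '{configName}'.", path);
        }

        // Returned unchanged; SetConfigurationFromJson does the parsing.
        return File.ReadAllText(path);
    }
}
```

In a Unity project, you might instead keep the files in a Resources folder and load them with `Resources.Load<TextAsset>(configName).text`. Direct file I/O on streaming assets does not work on every platform (for example, Android and WebGL).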

Running Utilities

Provide the context values your utilities need, and list the utilities to run when the conversation starts with SetOnInputUtilities:

```csharp
// Build context for utilities
var context = new Dictionary<string, object>
{
    { "page_text", "Once upon a time..." },
    { "image_description", "A colorful forest scene" },
    { "question_examples", "What color is the tree?" }
};

// Configure and start
UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
UGSDK.ConversationManager.GetConfiguration().Context = context;
UGSDK.ConversationManager.SetOnInputUtilities(new List<string>
{
    "get_teachables",
    "get_question_w_guidelines"
});
UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;
UGSDK.ConversationManager.StartConversation();
```
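Context keys are plain strings, so a misspelled key is easy to miss. One way to guard against that is to build the dictionary from a small page model. StoryPage and its members are hypothetical. The three keys are copied from the example above, and the keys a given utility actually reads depend on your configuration. As a design choice, missing optional fields become empty strings rather than nulls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical page model that owns the context keys used in the example above.
public sealed class StoryPage
{
    public const string PageTextKey = "page_text";
    public const string ImageDescriptionKey = "image_description";
    public const string QuestionExamplesKey = "question_examples";

    public StoryPage(string pageText, string imageDescription, string questionExamples)
    {
        PageText = pageText ?? throw new ArgumentNullException(nameof(pageText));
        ImageDescription = imageDescription ?? string.Empty;
        QuestionExamples = questionExamples ?? string.Empty;
    }

    public string PageText { get; }
    public string ImageDescription { get; }
    public string QuestionExamples { get; }

    // Builds the Dictionary<string, object> assigned to GetConfiguration().Context.
    public Dictionary<string, object> ToContext() => new Dictionary<string, object>
    {
        { PageTextKey, PageText },
        { ImageDescriptionKey, ImageDescription },
        { QuestionExamplesKey, QuestionExamples },
    };
}
```

Usage: `UGSDK.ConversationManager.GetConfiguration().Context = page.ToContext();`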

Receiving Utility Results

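Subscribe to OnConversationEvent to receive utility results. DataReceived events carry a Data collection of key/value pairs, and RunCompleted events carry Results and Errors collections: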

```csharp
// Place this directive at the top of the script file:
using UG.Models.WebSocketResponseMessages;

// Inside your MonoBehaviour:
private void OnEnable()
{
    UGSDK.ConversationManager.OnConversationEvent += HandleConversationEvent;
}

private void OnDisable()
{
    UGSDK.ConversationManager.OnConversationEvent -= HandleConversationEvent;
}

private void HandleConversationEvent(ConversationEvent conversationEvent)
{
    switch (conversationEvent.Type)
    {
        case ConversationEventType.DataReceived:
            var dataReceived = conversationEvent.Data as DataReceivedData;
            if (dataReceived?.Data != null)
            {
                foreach (var kvp in dataReceived.Data)
                {
                    Debug.Log($"Utility result: {kvp.Key} = {kvp.Value}");
                }
            }
            break;

        case ConversationEventType.RunCompleted:
            var runCompleted = conversationEvent.Data as RunCompletedData;
            if (runCompleted?.Results != null)
            {
                foreach (var kvp in runCompleted.Results)
                {
                    Debug.Log($"Run completed: {kvp.Key} = {kvp.Value}");
                }
            }
            if (runCompleted?.Errors != null)
            {
                foreach (var error in runCompleted.Errors)
                {
                    Debug.LogError($"Utility error: {error.Key} - {error.Value}");
                }
            }
            break;
    }
}
```
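The handler only logs results. To use them later (for example, on the next page), you can collect them in a store. UtilityResultStore is a hypothetical helper, not part of the SDK. It assumes the SDK's Data, Results, and Errors behave like string-keyed dictionaries, as the foreach/Key/Value usage suggests. Values are kept as object because their concrete types aren't documented here.

```csharp
using System.Collections.Generic;

// Hypothetical helper that keeps utility output after the event handler has run.
public sealed class UtilityResultStore
{
    private readonly Dictionary<string, object> _results = new Dictionary<string, object>();
    private readonly Dictionary<string, object> _errors = new Dictionary<string, object>();

    public IReadOnlyDictionary<string, object> Results => _results;
    public IReadOnlyDictionary<string, object> Errors => _errors;

    // Merge pairs from DataReceived data or RunCompleted results.
    // A later value for the same key replaces the earlier one.
    public void AddResults(IEnumerable<KeyValuePair<string, object>> entries) => Merge(_results, entries);

    // Merge pairs from RunCompleted errors.
    public void AddErrors(IEnumerable<KeyValuePair<string, object>> entries) => Merge(_errors, entries);

    // Returns true only if the key exists and its value is of type T.
    public bool TryGet<T>(string key, out T value)
    {
        if (_results.TryGetValue(key, out var raw) && raw is T typed)
        {
            value = typed;
            return true;
        }
        value = default;
        return false;
    }

    // Call before starting a new run so stale results are not reused.
    public void Clear()
    {
        _results.Clear();
        _errors.Clear();
    }

    private static void Merge(Dictionary<string, object> target, IEnumerable<KeyValuePair<string, object>> entries)
    {
        if (entries == null)
        {
            return;
        }
        foreach (var kvp in entries)
        {
            target[kvp.Key] = kvp.Value;
        }
    }
}
```

In HandleConversationEvent, you would pass dataReceived.Data or runCompleted.Results to AddResults and runCompleted.Errors to AddErrors. If the SDK's collections use a value type other than object, adjust the parameter types to match.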

Combining TTS with Background Utilities

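Run utilities in the background while playing direct TTS audio. Because the conversation runs in text mode, it produces no audio of its own, so it does not compete with the direct TTS playback: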

```csharp
public async void ShowStoryPage(string pageText, Dictionary<string, object> context)
{
    // Start utilities in background (text mode, no audio).
    // configJson is assumed to be a field holding the configuration JSON.
    UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
    UGSDK.ConversationManager.GetConfiguration().Context = context;
    UGSDK.ConversationManager.SetOnInputUtilities(new List<string> { "get_teachables" });
    UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;
    UGSDK.ConversationManager.StartConversation();

    // Play TTS directly (separate from the conversation)
    _storyText.text = pageText;
    var result = await UGSDK.Speak.SpeakAndPlayAsync(pageText);

    // Wait for audio completion
    if (result.Subtitles?.Count > 0)
    {
        var last = result.Subtitles[result.Subtitles.Count - 1];
        await Awaitable.WaitForSecondsAsync(last.StartTimeSec + last.DurationSec);
    }

    // Utility results arrive separately via OnConversationEvent, typically while
    // TTS plays, but they are not guaranteed to have arrived at this point.
    _nextButton.interactable = true;
}
```
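ShowStoryPage enables the next button as soon as the audio ends, whether or not the utilities have finished. If the next page depends on their output, one option is to turn the RunCompleted event into an awaitable task. RunCompletionAwaiter is a hypothetical helper, not part of the SDK. It assumes one RunCompleted event per started run and that the results can be passed as a string-keyed dictionary.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical bridge from the event-based RunCompleted notification to async/await.
public sealed class RunCompletionAwaiter
{
    private TaskCompletionSource<IReadOnlyDictionary<string, object>> _tcs = NewSource();

    // Completes when Complete(...) is called for the current run.
    public Task<IReadOnlyDictionary<string, object>> Completion => _tcs.Task;

    // Call before StartConversation so each run gets a fresh, pending task.
    public void Reset() => _tcs = NewSource();

    // Call from the RunCompleted branch of the event handler.
    // TrySetResult ignores duplicate completions instead of throwing.
    public void Complete(IReadOnlyDictionary<string, object> results) =>
        _tcs.TrySetResult(results ?? new Dictionary<string, object>());

    // RunContinuationsAsynchronously keeps awaiting code from running inline
    // inside Complete(...), that is, inside the event handler.
    private static TaskCompletionSource<IReadOnlyDictionary<string, object>> NewSource() =>
        new TaskCompletionSource<IReadOnlyDictionary<string, object>>(
            TaskCreationOptions.RunContinuationsAsynchronously);
}
```

With a field such as `_runAwaiter`, you would:
- call `_runAwaiter.Reset()` before `StartConversation()`;
- call `_runAwaiter.Complete(...)` in the RunCompleted branch;
- `await _runAwaiter.Completion` after the TTS wait, before enabling the next button.

If a run can fail without raising RunCompleted, add a timeout so the page cannot hang.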

Summary

| Capability | API | Notes |
|---|---|---|
| Direct TTS | UGSDK.Speak.SpeakAndPlayAsync() | Independent of conversation |
| TTS timing | SpeakResult.Subtitles | Calculate duration for playback completion |
| Text-mode conversation | Mode = ConversationMode.Text | Runs utilities without audio output |
| Utility results | ConversationEventType.DataReceived, RunCompleted | Via conversation event handler |

Use Cases

  • Story narration: Play scripted text with TTS while preparing next-page content
  • Background processing: Generate teachable moments, image prompts, or questions without blocking UI
  • Non-interactive pages: Display content with audio but no user conversation
  • Parallel workflows: Combine direct audio playback with AI-powered data extraction