Direct TTS and Background Utilities
Beyond standard conversations, the Unity SDK lets you play TTS audio directly, run utilities in the background, or combine both, for example to narrate a story while data is processed in parallel.
Direct Text-to-Speech
Use UGSDK.Speak to play text-to-speech audio independently of any conversation.
Basic Usage
// Play TTS for any text
var result = await UGSDK.Speak.SpeakAndPlayAsync("Hello, welcome to the story!");
TTS Result Data
The SpeakResult contains subtitle timing information:
var result = await UGSDK.Speak.SpeakAndPlayAsync(text);
if (result.Subtitles != null && result.Subtitles.Count > 0)
{
    foreach (var subtitle in result.Subtitles)
    {
        Debug.Log($"'{subtitle.Text}' at {subtitle.StartTimeSec}s for {subtitle.DurationSec}s");
    }
}
Waiting for Playback Completion
SpeakAndPlayAsync returns once audio is queued for playback, not when playback ends. To wait until playback actually finishes, estimate the audio duration from the subtitle timing data:
var result = await UGSDK.Speak.SpeakAndPlayAsync(text);
// Calculate total audio duration from subtitles
if (result.Subtitles != null && result.Subtitles.Count > 0)
{
    var lastSubtitle = result.Subtitles[result.Subtitles.Count - 1];
    float audioDurationSec = lastSubtitle.StartTimeSec + lastSubtitle.DurationSec;
    // Wait for playback to complete
    await Awaitable.WaitForSecondsAsync(audioDurationSec);
}
// Audio has finished playing
_nextButton.interactable = true;
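The duration calculation above can be factored into a small helper. The following is a minimal sketch, not SDK code: `SubtitleCue` and `SubtitleTiming` are hypothetical stand-ins that mirror only the three subtitle fields used in the examples (`Text`, `StartTimeSec`, `DurationSec`), and the field types are assumed to be `float`. It takes the latest end time across all entries rather than assuming the last entry in the list ends last.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's subtitle entries; only the three
// fields used in the examples above are mirrored here.
public sealed class SubtitleCue
{
    public string Text { get; set; }
    public float StartTimeSec { get; set; }
    public float DurationSec { get; set; }
}

public static class SubtitleTiming
{
    // Returns the latest end time across all cues, or 0 when there are none.
    // Taking the maximum avoids assuming the last cue in the list ends last.
    public static float GetTotalDurationSec(IReadOnlyList<SubtitleCue> cues)
    {
        if (cues == null || cues.Count == 0)
        {
            return 0f;
        }

        float latestEnd = 0f;
        foreach (var cue in cues)
        {
            latestEnd = Math.Max(latestEnd, cue.StartTimeSec + cue.DurationSec);
        }
        return latestEnd;
    }
}
```

Returning 0 for a missing or empty list means the caller can pass the result straight to a wait call without a separate null check. The value is still an estimate derived from subtitle timing, so a small margin may be appropriate if the UI must not unlock early.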
Displaying TTS Text
Display the spoken text on a UI element while audio plays:
[SerializeField] private TextMeshProUGUI _storyText;
public async void PlayNarration(string text)
{
    // Show text on screen
    _storyText.text = text;
    _storyText.gameObject.SetActive(true);
    // Play audio
    var result = await UGSDK.Speak.SpeakAndPlayAsync(text);
    // Wait for completion using subtitle timing
    if (result.Subtitles?.Count > 0)
    {
        var last = result.Subtitles[result.Subtitles.Count - 1];
        await Awaitable.WaitForSecondsAsync(last.StartTimeSec + last.DurationSec);
    }
}
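If the UI should show only the phrase currently being spoken rather than the full text, the subtitle timings can drive that choice. The sketch below is an illustration under assumptions: `SubtitleLookup` is a hypothetical helper, the cues are passed as plain tuples copied from the subtitle fields shown earlier, and the caller is assumed to track elapsed playback time itself (for example by accumulating frame time after the call returns).

```csharp
using System.Collections.Generic;

public static class SubtitleLookup
{
    // Returns the text of the cue whose time window contains elapsedSec,
    // or null when no cue is active (before the first cue, in a gap
    // between cues, or after the last cue has ended).
    public static string FindActiveText(
        IReadOnlyList<(string Text, float StartSec, float DurationSec)> cues,
        float elapsedSec)
    {
        if (cues == null)
        {
            return null;
        }

        foreach (var cue in cues)
        {
            float endSec = cue.StartSec + cue.DurationSec;
            if (elapsedSec >= cue.StartSec && elapsedSec < endSec)
            {
                return cue.Text;
            }
        }
        return null;
    }
}
```

A linear scan is enough for the handful of cues in a typical narration line. A null result can be treated as "keep the previous text" or "hide the label", depending on the desired look.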
Text-Mode Conversations
Text-mode conversations execute the conversation flow (including utilities) without generating audio output. This is useful for running utilities in the background.
Configuration
// Set up conversation configuration
string configJson = GetConfigJson("your_config_name");
UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
UGSDK.ConversationManager.GetConfiguration().Context = context;
// Enable text mode - no audio output from conversation
UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;
// Start conversation
UGSDK.ConversationManager.StartConversation();
Running Utilities
Specify utilities to run when the conversation starts:
// Build context for utilities
var context = new Dictionary<string, object>
{
    { "page_text", "Once upon a time..." },
    { "image_description", "A colorful forest scene" },
    { "question_examples", "What color is the tree?" }
};
// Configure and start
UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
UGSDK.ConversationManager.GetConfiguration().Context = context;
UGSDK.ConversationManager.SetOnInputUtilities(new List<string>
{
    "get_teachables",
    "get_question_w_guidelines"
});
UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;
UGSDK.ConversationManager.StartConversation();
Receiving Utility Results
Subscribe to conversation events to receive utility results:
using UG.Models.WebSocketResponseMessages;
private void OnEnable()
{
    UGSDK.ConversationManager.OnConversationEvent += HandleConversationEvent;
}
private void OnDisable()
{
    UGSDK.ConversationManager.OnConversationEvent -= HandleConversationEvent;
}
private void HandleConversationEvent(ConversationEvent conversationEvent)
{
    switch (conversationEvent.Type)
    {
        case ConversationEventType.DataReceived:
            var dataReceived = conversationEvent.Data as DataReceivedData;
            if (dataReceived?.Data != null)
            {
                foreach (var kvp in dataReceived.Data)
                {
                    Debug.Log($"Utility result: {kvp.Key} = {kvp.Value}");
                }
            }
            break;
        case ConversationEventType.RunCompleted:
            var runCompleted = conversationEvent.Data as RunCompletedData;
            if (runCompleted?.Results != null)
            {
                foreach (var kvp in runCompleted.Results)
                {
                    Debug.Log($"Run completed: {kvp.Key} = {kvp.Value}");
                }
            }
            if (runCompleted?.Errors != null)
            {
                foreach (var error in runCompleted.Errors)
                {
                    Debug.LogError($"Utility error: {error.Key} - {error.Value}");
                }
            }
            break;
    }
}
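Logging is rarely the end goal, so the handler above usually needs somewhere to put the results. One possible approach is sketched below: `UtilityResultCollector` is a hypothetical helper, not part of the SDK, that stores each result and completes a task once every expected utility has reported. It assumes the keys delivered in the event data identify the utilities, which the source does not confirm, and it is not thread-safe, so it assumes events arrive on a single thread.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the SDK: gathers utility results by name
// and completes a task once every expected utility has reported.
public sealed class UtilityResultCollector
{
    private readonly HashSet<string> _pending;
    private readonly Dictionary<string, object> _results =
        new Dictionary<string, object>();
    private readonly TaskCompletionSource<IReadOnlyDictionary<string, object>> _done =
        new TaskCompletionSource<IReadOnlyDictionary<string, object>>();

    public UtilityResultCollector(IEnumerable<string> expectedUtilities)
    {
        _pending = new HashSet<string>(expectedUtilities);
        if (_pending.Count == 0)
        {
            // Nothing to wait for, so complete immediately.
            _done.TrySetResult(_results);
        }
    }

    // Completes when all expected utilities have reported a result.
    public Task<IReadOnlyDictionary<string, object>> Completion => _done.Task;

    // Call from the event handler for each key/value pair received.
    // Results for names that were not expected are stored but do not
    // affect completion.
    public void Report(string utilityName, object value)
    {
        _results[utilityName] = value;
        if (_pending.Remove(utilityName) && _pending.Count == 0)
        {
            _done.TrySetResult(_results);
        }
    }
}
```

In the handler, the `Debug.Log` call inside the DataReceived loop could be replaced with `collector.Report(kvp.Key, kvp.Value)`, and any code that needs the full set of results can `await collector.Completion`.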
Combining TTS with Background Utilities
Run utilities in the background while playing direct TTS audio:
public async void ShowStoryPage(string pageText, Dictionary<string, object> context)
{
    // Start utilities in background (text-mode, no audio).
    // configJson is assumed to be loaded beforehand, as shown under Configuration.
    UGSDK.ConversationManager.SetConfigurationFromJson(configJson);
    UGSDK.ConversationManager.GetConfiguration().Context = context;
    UGSDK.ConversationManager.SetOnInputUtilities(new List<string> { "get_teachables" });
    UGSDK.ConversationManager.GetConfiguration().Mode = ConversationMode.Text;
    UGSDK.ConversationManager.StartConversation();
    // Play TTS directly (separate from conversation)
    _storyText.text = pageText;
    var result = await UGSDK.Speak.SpeakAndPlayAsync(pageText);
    // Wait for audio completion
    if (result.Subtitles?.Count > 0)
    {
        var last = result.Subtitles[result.Subtitles.Count - 1];
        await Awaitable.WaitForSecondsAsync(last.StartTimeSec + last.DurationSec);
    }
    // Utility results arrive via OnConversationEvent while TTS plays
    _nextButton.interactable = true;
}
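The example above enables the Next button as soon as narration ends, whether or not the utilities have finished. If the next page depends on their results, one option is to wait for them with an upper bound so a slow or failed utility cannot block the UI indefinitely. The sketch below is a generic .NET pattern rather than an SDK feature: `TaskTimeouts` is a hypothetical helper, and the task it waits on is assumed to be one you complete yourself when the expected results arrive.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeouts
{
    // Returns true if work finished within the timeout, false otherwise.
    // The work task is not cancelled on timeout; it simply stops being awaited.
    public static async Task<bool> CompletesWithinAsync(Task work, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(work, Task.Delay(timeout));
        if (finished != work)
        {
            return false;
        }

        // Awaiting again surfaces any exception thrown by the work task.
        await work;
        return true;
    }
}
```

Note that `Task.Delay` measures real time and is unaffected by `Time.timeScale`, which may or may not be what a paused game wants.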
Summary
| Capability | API | Notes |
|---|---|---|
| Direct TTS | UGSDK.Speak.SpeakAndPlayAsync() | Independent of conversation |
| TTS timing | SpeakResult.Subtitles | Calculate duration for playback completion |
| Text-mode conversation | Mode = ConversationMode.Text | Runs utilities without audio output |
| Utility results | ConversationEventType.DataReceived, ConversationEventType.RunCompleted | Via conversation event handler |
Use Cases
- Story narration: Play scripted text with TTS while preparing next-page content
- Background processing: Generate teachable moments, image prompts, or questions without blocking UI
- Non-interactive pages: Display content with audio but no user conversation
- Parallel workflows: Combine direct audio playback with AI-powered data extraction
Related Documentation
- Conversation Events - Full event reference
- Utilities - Utility configuration and types