# Interaction Flow
This guide explains how conversations work with the UG Labs API, covering the complete flow from connection to response.
## Overview
All interactions with the UG Labs platform occur through a single WebSocket connection that manages the conversation history for one session. The platform handles:
- Real-time bidirectional communication
- Audio and text input/output
- Streaming responses
- Conversation context management
## Basic Interaction Flow
A typical conversation follows these phases:
### 1. Initial Setup

Before any interaction can occur:

- Establish a WebSocket connection to `wss://pug.stg.uglabs.app/interact`
- Authenticate using your access token
- Configure the conversation with a prompt and settings
### 2. Audio Streaming
Audio can be streamed to the server in chunks:
- Chunks can be of arbitrary size - no specific fragmentation required
- The first chunk must specify audio format (sampling rate and MIME type)
- Subsequent chunks maintain the same format
- Audio buffer is cleared after interaction but preserved for transcription requests
Audio format specification:

```json
{
  "kind": "add_audio",
  "audio": "base64-encoded-data",
  "config": {
    "sample_rate": 48000,
    "mime_type": "audio/pcm"
  }
}
```
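The chunking rules above can be sketched as a small helper that base64-encodes each chunk and attaches the format config only to the first message. `makeAudioSender` and `buildAudioMessage` are illustrative names, not part of any official client:

```javascript
// Illustrative helper: wraps raw audio chunks in add_audio messages.
// The format config is required on the first chunk; subsequent chunks
// keep the same format, so we attach the config only once.
function makeAudioSender(sampleRate, mimeType) {
  let first = true;
  return function buildAudioMessage(chunk) {
    const message = {
      kind: 'add_audio',
      audio: Buffer.from(chunk).toString('base64'),
    };
    if (first) {
      message.config = { sample_rate: sampleRate, mime_type: mimeType };
      first = false;
    }
    return message;
  };
}

// Usage: chunk sizes are arbitrary, so any slicing of the stream works.
const build = makeAudioSender(48000, 'audio/pcm');
const firstMsg = build(new Uint8Array([0, 1, 2, 3]));
const nextMsg = build(new Uint8Array([4, 5]));
```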
### 3. Response Generation

Responses are streamed back in real time:

- **Text responses:** streamed word-by-word or phrase-by-phrase
- **Audio responses:** streamed as generated (if audio output is enabled)
- **Utilities output:** extracted insights based on configuration
### 4. Concurrent Operations
While a response is streaming, you can:
- Send new configuration updates
- Queue additional interactions
- Update conversation context
## Authentication
Authentication tokens expire approximately 60 minutes after issuance. Implement token refresh logic in your application.
Before any operations, authenticate with:
```json
{
  "type": "request",
  "kind": "authenticate",
  "uid": "unique-request-id",
  "access_token": "your-access-token"
}
```
See the Authentication Guide for obtaining access tokens.
## Prompt Management
Prompts define the AI assistant's behavior using Jinja2 templating syntax, allowing dynamic context injection.
### Basic Prompt Example

```
prompt = "You are a helpful assistant helping {{ user_name }} with their questions."
```
### Context Variables

Pass context at runtime:

```json
{
  "kind": "set_configuration",
  "prompt": "You are helping {{ user_name }} who is {{ user_age }} years old.",
  "context": {
    "user_name": "Alice",
    "user_age": 8
  }
}
```
### Handling Optional Variables

The system operates in strict mode: all referenced variables must be defined. Use Jinja2's `defined` test for optional variables:

```jinja
You are assisting {{ user_name }}
{% if difficulty is defined %}at {{ difficulty }} difficulty level{% endif %}.
```
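To see why strict mode matters, here is a minimal sketch of strict-mode substitution. The platform renders prompts server-side with Jinja2; this toy `renderStrict` function only illustrates the failure mode when a referenced variable is missing from the context:

```javascript
// Minimal sketch of strict-mode variable substitution. This is NOT the
// platform's template engine (which is Jinja2); it only demonstrates
// that a referenced-but-undefined variable is an error, not an empty string.
function renderStrict(template, context) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => {
    if (!(name in context)) {
      throw new Error(`undefined template variable: ${name}`);
    }
    return String(context[name]);
  });
}

const ok = renderStrict('You are helping {{ user_name }}.', { user_name: 'Alice' });

let failed = false;
try {
  renderStrict('Difficulty: {{ difficulty }}', {});
} catch (e) {
  failed = true; // strict mode rejects the missing variable
}
```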
## Utilities System
Utilities extract structured information from conversations. There are two types:
### Classify Utility

Matches the conversation against a set of predefined answers. Well suited for:
- Detecting user intent
- Identifying conversation topics
- Triggering specific actions
Example - Sentiment Analysis:

```json
{
  "utilities": {
    "sentiment": {
      "type": "classify",
      "classification_question": "What is the sentiment of the user's message?",
      "answers": ["positive", "negative", "neutral"]
    }
  }
}
```
### Extract Utility

Generates free-form text responses without predefined constraints. Well suited for:
- Entity extraction
- Conversation summarization
- Custom data extraction
Example - Name Extraction:

```json
{
  "utilities": {
    "user_name": {
      "type": "extract",
      "extract_prompt": "Extract the user's name from the conversation. Return just the name or 'unknown' if not mentioned."
    }
  }
}
```
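Both utility types can be registered together. The fragment below combines the two examples above into one configuration; note that attaching `utilities` to `set_configuration` follows the configuration pattern shown earlier in this guide and should be confirmed against the WebSocket Protocol reference:

```json
{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "utilities": {
    "sentiment": {
      "type": "classify",
      "classification_question": "What is the sentiment of the user's message?",
      "answers": ["positive", "negative", "neutral"]
    },
    "user_name": {
      "type": "extract",
      "extract_prompt": "Extract the user's name from the conversation. Return just the name or 'unknown' if not mentioned."
    }
  }
}
```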
### Execution Timing

Utilities can run at different times, affecting latency:

| Timing | When it runs | Latency impact | Use case |
|---|---|---|---|
| `on_input` | Before LLM generation | High | Output should influence the response |
| `on_output` | After LLM generation | Low | Output is for logging/analytics |
| `on_input_non_blocking` | In parallel with the LLM | Minimal | Output doesn't affect the response |
Example with timing:

```json
{
  "kind": "interact",
  "text": "I love this!",
  "on_input": ["sentiment"],
  "on_output": ["conversation_summary"]
}
```
## Transcription
The platform supports two transcription modes:
### 1. Interaction-Integrated Transcription

Automatic transcription during `interact` requests:

```json
{
  "kind": "interact",
  "audio_input": true
}
```
The response includes the transcription:

```json
{
  "kind": "transcription",
  "text": "Hello, how can I help you?",
  "is_final": true
}
```
### 2. Standalone Transcription

For voice input integration or real-time phrase detection:

```json
{
  "kind": "transcribe",
  "uid": "unique-request-id"
}
```
Standalone transcription preserves the audio buffer, allowing multiple transcriptions from the same audio without re-uploading.
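Since every server message carries a `kind` field, a client can dispatch incoming frames with a small router. This is a sketch of one possible client-side pattern; `createRouter` and the handler names are illustrative, not part of the API:

```javascript
// Illustrative dispatcher for incoming server messages, keyed on "kind".
function createRouter(handlers) {
  return function route(rawFrame) {
    const message = JSON.parse(rawFrame);
    const handler = handlers[message.kind];
    if (!handler) return { handled: false, kind: message.kind };
    handler(message);
    return { handled: true, kind: message.kind };
  };
}

const transcripts = [];
const route = createRouter({
  // Collect transcription messages; is_final marks the completed phrase.
  transcription: (msg) => transcripts.push({ text: msg.text, final: msg.is_final }),
});

const result = route(JSON.stringify({
  kind: 'transcription',
  text: 'Hello, how can I help you?',
  is_final: true,
}));
```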
## Session Management

### Starting Fresh

Sessions begin with an empty conversation history:

```json
{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant."
}
```
### Resuming Previous Conversations

Import prior messages for context continuity:

```json
{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "history": [
    {
      "role": "user",
      "content": "What's the weather?"
    },
    {
      "role": "assistant",
      "content": "I don't have access to real-time weather data."
    }
  ]
}
```
### Session Context

Each WebSocket connection corresponds to exactly one session, which:
- Maintains conversation history
- Preserves configuration
- Isolated from other sessions
When the connection closes, the session ends.
## Service Profiles
Service profiles control LLM models and TTS providers. The default profile optimizes for most use cases.
### Custom Profiles

For specific latency or provider requirements, request custom profiles from support:

```json
{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "service_profile": "my-team:fast-response-smaller-model"
}
```
Custom profiles are configured by administrators and referenced by identifier.
## Best Practices

### 1. Handle Token Expiration
```javascript
// The callback must be async so it can await the token refresh.
conversation.on('error', async (error) => {
  if (error.code === 'TOKEN_EXPIRED') {
    // Refresh the token and reconnect
    await refreshAuthentication();
  }
});
```
### 2. Stream Audio Efficiently

```javascript
// Send audio in ~100 ms chunks
const timeslice = 100;
const recorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm',
  audioBitsPerSecond: 16000
});
recorder.ondataavailable = (event) => {
  sendAudioChunk(event.data);
};
recorder.start(timeslice);
```
### 3. Use Utilities Wisely

- Use `on_input` only when the utility's output affects the response
- Use `on_output` for analytics and logging
- Use `on_input_non_blocking` for parallel processing
### 4. Manage Context Size
Long conversation histories can slow down responses. Consider:
- Summarizing old conversations
- Limiting history to recent messages
- Using utilities to extract key information
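The history-limiting suggestion can be sketched as a helper that keeps only the most recent messages before resuming a session. `trimHistory` is an illustrative client-side helper, and the cap is an arbitrary example value:

```javascript
// Keep only the most recent messages when importing history, so long
// conversations don't inflate response latency.
function trimHistory(history, maxMessages = 20) {
  if (history.length <= maxMessages) return history;
  return history.slice(history.length - maxMessages);
}

// Usage with a set_configuration message:
const fullHistory = Array.from({ length: 50 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));

const config = {
  kind: 'set_configuration',
  prompt: 'You are a helpful assistant.',
  history: trimHistory(fullHistory, 10),
};
```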
## Next Steps
- WebSocket Protocol - Detailed message specifications
- Utilities Guide - Deep dive into utilities
- Examples - Working code examples