Interaction Flow

This guide explains how conversations work with the UG Labs API, covering the complete flow from connection to response.

Overview

All interactions with the UG Labs platform occur through a single WebSocket connection that manages the conversation history for one session. The platform handles:

  • Real-time bidirectional communication
  • Audio and text input/output
  • Streaming responses
  • Conversation context management

Basic Interaction Flow

A typical conversation follows these phases:

1. Initial Setup

Before any interaction can occur (a minimal sketch follows this list):

  1. Establish WebSocket connection to wss://pug.stg.uglabs.app/interact
  2. Authenticate using your access token
  3. Configure the conversation with prompt and settings
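
A minimal sketch of this sequence in browser JavaScript, using the message shapes documented below. The uid value and the placement of set_configuration inside onopen are illustrative, not a prescribed client design:

const ACCESS_TOKEN = 'your-access-token'; // obtained per the Authentication Guide

const ws = new WebSocket('wss://pug.stg.uglabs.app/interact');

ws.onopen = () => {
  // 1. Authenticate first. In practice, wait for the success reply
  //    before sending anything else (see the authenticate sketch below).
  ws.send(JSON.stringify({
    type: 'request',
    kind: 'authenticate',
    uid: 'auth-1', // placeholder request id
    access_token: ACCESS_TOKEN
  }));

  // 2. Configure the conversation before interacting.
  ws.send(JSON.stringify({
    kind: 'set_configuration',
    prompt: 'You are a helpful assistant.'
  }));
};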

2. Audio Streaming

Audio can be streamed to the server in chunks:

  • Chunks can be of arbitrary size; no specific fragmentation is required
  • The first chunk must specify the audio format (sampling rate and MIME type)
  • Subsequent chunks maintain the same format
  • The audio buffer is cleared after an interaction but preserved for transcription requests

Audio Format Specifications:

{
  "kind": "add_audio",
  "audio": "base64-encoded-data",
  "config": {
    "sample_rate": 48000,
    "mime_type": "audio/pcm"
  }
}
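
A sketch of chunked upload built on that message shape. sendAudioChunk is a helper of our own, and the assumption that only the first chunk carries the config block follows the list above:

let firstChunk = true;

function sendAudioChunk(ws, base64Audio) {
  const msg = { kind: 'add_audio', audio: base64Audio };
  if (firstChunk) {
    // Only the first chunk must carry the format; later chunks are
    // assumed to inherit it for the rest of the stream.
    msg.config = { sample_rate: 48000, mime_type: 'audio/pcm' };
    firstChunk = false;
  }
  ws.send(JSON.stringify(msg));
}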

3. Response Generation

Responses are streamed back in real-time:

  • Text responses: Streamed word-by-word or phrase-by-phrase
  • Audio responses: Streamed as generated (if audio output is enabled)
  • Utility output: Insights extracted according to your utilities configuration

4. Concurrent Operations

While a response is streaming, you can (see the sketch after this list):

  • Send new configuration updates
  • Queue additional interactions
  • Update conversation context
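
For instance, with the connection from the setup sketch, a client might queue another turn and push an update mid-stream; the payloads follow the shapes used throughout this guide:

// Queue a follow-up turn without waiting for the current stream to end.
ws.send(JSON.stringify({ kind: 'interact', text: 'One more question...' }));

// Push a configuration update while the response is still arriving.
ws.send(JSON.stringify({
  kind: 'set_configuration',
  prompt: 'You are a concise assistant.'
}));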

Authentication

Token Expiration

Authentication tokens expire approximately 60 minutes after issuance. Implement token refresh logic in your application.

Before any operations, authenticate with:

{
  "type": "request",
  "kind": "authenticate",
  "uid": "unique-request-id",
  "access_token": "your-access-token"
}
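
A sketch of request/response correlation, assuming replies echo the request's uid (that is what the field is for, but the reply shape checked here is an illustration, not a documented schema):

function authenticate(ws, accessToken) {
  const uid = crypto.randomUUID();
  return new Promise((resolve, reject) => {
    const onMessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.uid !== uid) return; // not our reply
      ws.removeEventListener('message', onMessage);
      // Assumed error convention; adapt to the actual response schema.
      msg.error ? reject(msg.error) : resolve(msg);
    };
    ws.addEventListener('message', onMessage);
    ws.send(JSON.stringify({
      type: 'request',
      kind: 'authenticate',
      uid,
      access_token: accessToken
    }));
  });
}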

See the Authentication Guide for obtaining access tokens.

Prompt Management

Prompts define the AI assistant's behavior using Jinja2 templating syntax, allowing dynamic context injection.

Basic Prompt Example

prompt = "You are a helpful assistant helping {{ user_name }} with their questions."

Context Variables

Pass context at runtime:

{
  "kind": "set_configuration",
  "prompt": "You are helping {{ user_name }} who is {{ user_age }} years old.",
  "context": {
    "user_name": "Alice",
    "user_age": 8
  }
}

Handling Optional Variables

The system operates in strict mode: all referenced variables must be defined. Use Jinja2's defined test for optional variables:

You are assisting {{ user_name }}
{% if difficulty is defined %}at {{ difficulty }} difficulty level{% endif %}.
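
For example, the same configuration works whether or not the optional variable is supplied; here difficulty is simply omitted from the context:

{
  "kind": "set_configuration",
  "prompt": "You are assisting {{ user_name }}{% if difficulty is defined %} at {{ difficulty }} difficulty level{% endif %}.",
  "context": {
    "user_name": "Alice"
  }
}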

Utilities System

Utilities extract structured information from conversations. There are two types:

Classify Utility

Matches conversation against predefined answers. Perfect for:

  • Detecting user intent
  • Identifying conversation topics
  • Triggering specific actions

Example - Sentiment Analysis:

{
  "utilities": {
    "sentiment": {
      "type": "classify",
      "classification_question": "What is the sentiment of the user's message?",
      "answers": ["positive", "negative", "neutral"]
    }
  }
}

Extract Utility

Generates custom text responses without predefined constraints. Perfect for:

  • Entity extraction
  • Conversation summarization
  • Custom data extraction

Example - Name Extraction:

{
  "utilities": {
    "user_name": {
      "type": "extract",
      "extract_prompt": "Extract the user's name from the conversation. Return just the name or 'unknown' if not mentioned."
    }
  }
}

Execution Timing

Utilities can run at different times, affecting latency:

  • on_input: Runs before LLM generation. High latency impact. Use when the utility's output should influence the response.
  • on_output: Runs after LLM generation. Low latency impact. Use when the output is for logging/analytics.
  • on_input_non_blocking: Runs in parallel with LLM generation. Minimal latency impact. Use when the output doesn't affect the response.

Example with timing:

{
  "kind": "interact",
  "text": "I love this!",
  "on_input": ["sentiment"],
  "on_output": ["conversation_summary"]
}

Transcription

The platform supports two transcription modes:

1. Interaction-Integrated Transcription

Automatic transcription during interact requests:

{
  "kind": "interact",
  "audio_input": true
}

Response includes transcription:

{
  "kind": "transcription",
  "text": "Hello, how can I help you?",
  "is_final": true
}
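
A handler sketch for these messages. It assumes partial transcriptions arrive with is_final set to false before the final one, which is an inference from the flag rather than documented behavior; updateLiveCaption is a hypothetical UI helper:

ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.kind !== 'transcription') return;
  if (msg.is_final) {
    console.log('Final transcript:', msg.text);
  } else {
    updateLiveCaption(msg.text); // hypothetical UI helper
  }
});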

2. Standalone Transcription

For voice input integration or real-time phrase detection:

{
  "kind": "transcribe",
  "uid": "unique-request-id"
}
Tip: Standalone transcription preserves the audio buffer, allowing multiple transcriptions from the same audio without re-uploading.
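
Putting the pieces together, a sketch of the standalone flow, reusing the sendAudioChunk helper from the audio section (base64Chunks is an illustrative stand-in for your encoded audio):

// 1. Stream audio into the server-side buffer (format on first chunk).
for (const chunk of base64Chunks) {
  sendAudioChunk(ws, chunk);
}

// 2. Request a transcription of the buffered audio.
ws.send(JSON.stringify({ kind: 'transcribe', uid: crypto.randomUUID() }));

// 3. The buffer is preserved, so a later transcribe request can reuse
//    the same audio without re-uploading it.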

Session Management

Starting Fresh

Sessions begin with empty conversation history:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant."
}

Resuming Previous Conversations

Import prior messages for context continuity:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "history": [
    {
      "role": "user",
      "content": "What's the weather?"
    },
    {
      "role": "assistant",
      "content": "I don't have access to real-time weather data."
    }
  ]
}

Session Context

Each WebSocket connection = one session:

  • Maintains conversation history
  • Preserves configuration
  • Isolated from other sessions

When the connection closes, the session ends.
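
Because a closed connection ends the session, resuming after a disconnect means opening a fresh WebSocket and replaying saved messages via set_configuration, as in the resumption example above. A minimal sketch, where savedHistory is your own persisted copy of the transcript:

function resumeSession(savedHistory) {
  const ws = new WebSocket('wss://pug.stg.uglabs.app/interact');
  ws.onopen = () => {
    // Re-authenticate first (omitted here; see the authenticate sketch),
    // then restore context on the new session.
    ws.send(JSON.stringify({
      kind: 'set_configuration',
      prompt: 'You are a helpful assistant.',
      history: savedHistory
    }));
  };
  return ws;
}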

Service Profiles

Service profiles control which LLM models and TTS providers are used. The default profile is optimized for most use cases.

Custom Profiles

For specific latency or provider requirements, request custom profiles from support:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "service_profile": "my-team:fast-response-smaller-model"
}

Custom profiles are configured by administrators and referenced by identifier.

Best Practices

1. Handle Token Expiration

conversation.on('error', async (error) => {
  if (error.code === 'TOKEN_EXPIRED') {
    // Refresh the token, then re-authenticate on a new connection.
    await refreshAuthentication();
  }
});

2. Stream Audio Efficiently

// Send audio in ~100 ms chunks
const timeslice = 100;
const recorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm',
  audioBitsPerSecond: 16000
});

recorder.ondataavailable = (event) => {
  sendAudioChunk(event.data);
};

recorder.start(timeslice);

3. Use Utilities Wisely

  • Use on_input only when the utility affects the response
  • Use on_output for analytics and logging
  • Use on_input_non_blocking for parallel processing

4. Manage Context Size

Long conversation histories can slow down responses. Consider the following (a trimming sketch follows this list):

  • Summarizing old conversations
  • Limiting history to recent messages
  • Using utilities to extract key information
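
A minimal sketch of the second option, assuming history entries use the {role, content} shape from the resumption example (MAX_MESSAGES is an illustrative limit, not a platform parameter):

const MAX_MESSAGES = 20; // illustrative cap; tune for your use case

function trimmedHistory(history) {
  // Keep only the most recent messages to bound prompt size.
  return history.slice(-MAX_MESSAGES);
}

ws.send(JSON.stringify({
  kind: 'set_configuration',
  prompt: 'You are a helpful assistant.',
  history: trimmedHistory(savedHistory)
}));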

Next Steps