Interaction Flow

This guide explains how conversations work with the UG Labs API, covering the complete flow from connection to response.

Overview

All interactions with the UG Labs platform occur through a single WebSocket connection that manages the conversation history for one session. The platform handles:

  • Real-time bidirectional communication
  • Audio and text input/output
  • Streaming responses
  • Conversation context management

Basic Interaction Flow

A typical conversation follows these phases:

1. Initial Setup

Before any interaction can occur (a minimal sketch follows this list):

  1. Establish WebSocket connection to wss://pug.stg.uglabs.app/interact
  2. Authenticate using your access token
  3. Configure the conversation with prompt and settings
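
A minimal sketch of this sequence in browser JavaScript, using the message shapes documented below. The uid value and the placement of set_configuration inside onopen are illustrative, not a prescribed client design:

const ACCESS_TOKEN = 'your-access-token'; // obtained per the Authentication Guide

const ws = new WebSocket('wss://pug.stg.uglabs.app/interact');

ws.onopen = () => {
  // 1. Authenticate first. In practice, wait for the success reply
  //    before sending anything else (see the authenticate sketch below).
  ws.send(JSON.stringify({
    type: 'request',
    kind: 'authenticate',
    uid: 'auth-1', // placeholder request id
    access_token: ACCESS_TOKEN
  }));

  // 2. Configure the conversation before interacting.
  ws.send(JSON.stringify({
    kind: 'set_configuration',
    prompt: 'You are a helpful assistant.'
  }));
};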

2. Audio Streaming

Audio can be streamed to the server in chunks:

  • Chunks can be of arbitrary size; no specific fragmentation is required
  • The first chunk must specify the audio format (sampling rate and MIME type)
  • Subsequent chunks maintain the same format
  • The audio buffer is cleared after an interaction but preserved for transcription requests

Audio Format Specifications:

{
  "kind": "add_audio",
  "audio": "base64-encoded-data",
  "config": {
    "sample_rate": 48000,
    "mime_type": "audio/pcm"
  }
}
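
A sketch of chunked upload built on that message shape. sendAudioChunk is a helper of our own, and the assumption that only the first chunk carries the config block follows the list above:

let firstChunk = true;

function sendAudioChunk(ws, base64Audio) {
  const msg = { kind: 'add_audio', audio: base64Audio };
  if (firstChunk) {
    // Only the first chunk must carry the format; later chunks are
    // assumed to inherit it for the rest of the stream.
    msg.config = { sample_rate: 48000, mime_type: 'audio/pcm' };
    firstChunk = false;
  }
  ws.send(JSON.stringify(msg));
}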

3. Response Generation

Responses are streamed back in real-time:

  • Text responses: Streamed word-by-word or phrase-by-phrase
  • Audio responses: Streamed as generated (if audio output is enabled)
  • Utility output: Insights extracted according to your utilities configuration

4. Concurrent Operations

While a response is streaming, you can (see the sketch after this list):

  • Send new configuration updates
  • Queue additional interactions
  • Update conversation context
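
For instance, with the connection from the setup sketch, a client might queue another turn and push an update mid-stream; the payloads follow the shapes used throughout this guide:

// Queue a follow-up turn without waiting for the current stream to end.
ws.send(JSON.stringify({ kind: 'interact', text: 'One more question...' }));

// Push a configuration update while the response is still arriving.
ws.send(JSON.stringify({
  kind: 'set_configuration',
  prompt: 'You are a concise assistant.'
}));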

Authentication

Token Expiration

Authentication tokens expire approximately 60 minutes after issuance. Implement token refresh logic in your application.

Before any operations, authenticate with:

{
  "type": "request",
  "kind": "authenticate",
  "uid": "unique-request-id",
  "access_token": "your-access-token"
}
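
A sketch of request/response correlation, assuming replies echo the request's uid (that is what the field is for, but the reply shape checked here is an illustration, not a documented schema):

function authenticate(ws, accessToken) {
  const uid = crypto.randomUUID();
  return new Promise((resolve, reject) => {
    const onMessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.uid !== uid) return; // not our reply
      ws.removeEventListener('message', onMessage);
      // Assumed error convention; adapt to the actual response schema.
      msg.error ? reject(msg.error) : resolve(msg);
    };
    ws.addEventListener('message', onMessage);
    ws.send(JSON.stringify({
      type: 'request',
      kind: 'authenticate',
      uid,
      access_token: accessToken
    }));
  });
}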

See the Authentication Guide for obtaining access tokens.

Prompt Management

Prompts define the AI assistant's behavior using Jinja2 templating syntax, allowing dynamic context injection.

Basic Prompt Example

prompt = "You are a helpful assistant helping {{ user_name }} with their questions."

Context Variables

Pass context at runtime:

{
  "kind": "set_configuration",
  "prompt": "You are helping {{ user_name }} who is {{ user_age }} years old.",
  "context": {
    "user_name": "Alice",
    "user_age": 8
  }
}

Handling Optional Variables

The system operates in strict mode: all referenced variables must be defined. Use Jinja2's defined test for optional variables:

You are assisting {{ user_name }}
{% if difficulty is defined %}at {{ difficulty }} difficulty level{% endif %}.
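
For example, the same configuration works whether or not the optional variable is supplied; here difficulty is simply omitted from the context:

{
  "kind": "set_configuration",
  "prompt": "You are assisting {{ user_name }}{% if difficulty is defined %} at {{ difficulty }} difficulty level{% endif %}.",
  "context": {
    "user_name": "Alice"
  }
}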

Utilities System

Utilities extract structured information from conversations. There are two types:

Classify Utility

Matches conversation against predefined answers. Perfect for:

  • Detecting user intent
  • Identifying conversation topics
  • Triggering specific actions

Example - Sentiment Analysis:

{
  "utilities": {
    "sentiment": {
      "type": "classify",
      "classification_question": "What is the sentiment of the user's message?",
      "answers": ["positive", "negative", "neutral"]
    }
  }
}

Extract Utility

Generates custom text responses without predefined constraints. Perfect for:

  • Entity extraction
  • Conversation summarization
  • Custom data extraction

Example - Name Extraction:

{
  "utilities": {
    "user_name": {
      "type": "extract",
      "extract_prompt": "Extract the user's name from the conversation. Return just the name or 'unknown' if not mentioned."
    }
  }
}

Execution Timing

Utilities can run at different times, affecting latency:

  • on_input: Runs before LLM generation. High latency impact. Use when the utility's output should influence the response.
  • on_output: Runs after LLM generation. Low latency impact. Use when the output is for logging/analytics.
  • on_input_non_blocking: Runs in parallel with LLM generation. Minimal latency impact. Use when the output doesn't affect the response.

Example with timing:

{
  "kind": "interact",
  "text": "I love this!",
  "on_input": ["sentiment"],
  "on_output": ["conversation_summary"]
}

Transcription

The platform supports two transcription modes:

1. Interaction-Integrated Transcription

Automatic transcription during interact requests:

{
  "kind": "interact",
  "audio_input": true
}

Response includes transcription:

{
  "kind": "transcription",
  "text": "Hello, how can I help you?",
  "is_final": true
}
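
A handler sketch for these messages. It assumes partial transcriptions arrive with is_final set to false before the final one, which is an inference from the flag rather than documented behavior; updateLiveCaption is a hypothetical UI helper:

ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.kind !== 'transcription') return;
  if (msg.is_final) {
    console.log('Final transcript:', msg.text);
  } else {
    updateLiveCaption(msg.text); // hypothetical UI helper
  }
});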

2. Standalone Transcription

For voice input integration or real-time phrase detection:

{
  "kind": "transcribe",
  "uid": "unique-request-id"
}
Tip: Standalone transcription preserves the audio buffer, allowing multiple transcriptions from the same audio without re-uploading.
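
Putting the pieces together, a sketch of the standalone flow, reusing the sendAudioChunk helper from the audio section (base64Chunks is an illustrative stand-in for your encoded audio):

// 1. Stream audio into the server-side buffer (format on first chunk).
for (const chunk of base64Chunks) {
  sendAudioChunk(ws, chunk);
}

// 2. Request a transcription of the buffered audio.
ws.send(JSON.stringify({ kind: 'transcribe', uid: crypto.randomUUID() }));

// 3. The buffer is preserved, so a later transcribe request can reuse
//    the same audio without re-uploading it.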

Session Management

Starting Fresh

Sessions begin with empty conversation history:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant."
}

Resuming Previous Conversations

Import prior messages for context continuity:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "history": [
    {
      "role": "user",
      "content": "What's the weather?"
    },
    {
      "role": "assistant",
      "content": "I don't have access to real-time weather data."
    }
  ]
}

Session Context

Each WebSocket connection = one session:

  • Maintains conversation history
  • Preserves configuration
  • Isolated from other sessions

When the connection closes, the session ends.
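
Because a closed connection ends the session, resuming after a disconnect means opening a fresh WebSocket and replaying saved messages via set_configuration, as in the resumption example above. A minimal sketch, where savedHistory is your own persisted copy of the transcript:

function resumeSession(savedHistory) {
  const ws = new WebSocket('wss://pug.stg.uglabs.app/interact');
  ws.onopen = () => {
    // Re-authenticate first (omitted here; see the authenticate sketch),
    // then restore context on the new session.
    ws.send(JSON.stringify({
      kind: 'set_configuration',
      prompt: 'You are a helpful assistant.',
      history: savedHistory
    }));
  };
  return ws;
}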

Service Profiles

Service profiles control which LLM models and TTS providers are used. The default profile is optimized for most use cases.

Custom Profiles

For specific latency or provider requirements, request custom profiles from support:

{
  "kind": "set_configuration",
  "prompt": "You are a helpful assistant.",
  "service_profile": "my-team:fast-response-smaller-model"
}

Custom profiles are configured by administrators and referenced by identifier.

Best Practices

1. Handle Token Expiration

conversation.on('error', async (error) => {
  if (error.code === 'TOKEN_EXPIRED') {
    // Refresh the token, then re-authenticate on a new connection.
    await refreshAuthentication();
  }
});

2. Stream Audio Efficiently

// Send audio in ~100 ms chunks
const timeslice = 100;
const recorder = new MediaRecorder(stream, {
  mimeType: 'audio/webm',
  audioBitsPerSecond: 16000
});

recorder.ondataavailable = (event) => {
  sendAudioChunk(event.data);
};

recorder.start(timeslice);

3. Use Utilities Wisely

  • Use on_input only when the utility affects the response
  • Use on_output for analytics and logging
  • Use on_input_non_blocking for parallel processing

4. Manage Context Size

Long conversation histories can slow down responses. Consider the following (a trimming sketch follows this list):

  • Summarizing old conversations
  • Limiting history to recent messages
  • Using utilities to extract key information
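
A minimal sketch of the second option, assuming history entries use the {role, content} shape from the resumption example (MAX_MESSAGES is an illustrative limit, not a platform parameter):

const MAX_MESSAGES = 20; // illustrative cap; tune for your use case

function trimmedHistory(history) {
  // Keep only the most recent messages to bound prompt size.
  return history.slice(-MAX_MESSAGES);
}

ws.send(JSON.stringify({
  kind: 'set_configuration',
  prompt: 'You are a helpful assistant.',
  history: trimmedHistory(savedHistory)
}));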

Next Steps