Building a Chatbot

Learn how to build a production-ready chatbot with conversation history, streaming responses, and error handling.

Basic Chatbot

A chatbot maintains conversation context by passing all previous messages with each request:

Python

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.llm.kiwi/v1",
    api_key=os.environ.get("LLM_KIWI_API_KEY")
)

# Conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(user_input: str) -> str:
    # Add user message to history
    messages.append({"role": "user", "content": user_input})
    
    # Get response
    response = client.chat.completions.create(
        model="default",
        messages=messages
    )
    
    # Extract assistant message
    assistant_message = response.choices[0].message.content
    
    # Add to history for context
    messages.append({"role": "assistant", "content": assistant_message})
    
    return assistant_message

# Usage
print(chat("Hi, I'm learning Python!"))
print(chat("What's a good first project?"))
print(chat("Can you show me how to start?"))
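
The example above keeps one module-level `messages` list, so every caller shares the same history. A minimal sketch of per-user history follows. It assumes a hypothetical `Conversation` helper (not part of any SDK) and takes the API call as an injected `send` callable, so it runs offline with a stand-in:

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Holds one user's message history (hypothetical helper, not part of any SDK)."""

    def __init__(self, send: Callable[[List[Message]], str],
                 system_prompt: str = "You are a helpful assistant.") -> None:
        # `send` is any callable that takes the message list and returns the reply text.
        self._send = send
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        try:
            reply = self._send(self.messages)
        except Exception:
            # Roll back so a failed request does not leave a dangling user turn.
            self.messages.pop()
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for the real API call so the sketch runs without network access.
def echo_send(messages: List[Message]) -> str:
    return f"You said: {messages[-1]['content']}"


alice = Conversation(echo_send)
bob = Conversation(echo_send)
print(alice.chat("Hi, I'm learning Python!"))
print(len(alice.messages), len(bob.messages))  # 3 1
```

To talk to the real API, `send` could wrap the call shown earlier, for example `lambda msgs: client.chat.completions.create(model="default", messages=msgs).choices[0].message.content`.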

Streaming Responses

For a better user experience, stream responses in real time so users see text as it is generated:

Python

def chat_stream(user_input: str):
    messages.append({"role": "user", "content": user_input})
    
    stream = client.chat.completions.create(
        model="default",
        messages=messages,
        stream=True
    )
    
    full_response = ""
    for chunk in stream:
        # Skip chunks that carry no choices or no text
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)
    
    print()  # Newline after response
    messages.append({"role": "assistant", "content": full_response})
    return full_response
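
`chat_stream` prints each piece directly, which suits a terminal. A web handler usually needs the pieces themselves so it can forward them. The sketch below is one way to separate the two concerns: a hypothetical `iter_stream_text` generator that yields text from any iterable of chunks shaped like the ones above. The `fake_chunk` helper is only a stand-in so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_stream_text(stream: Iterable) -> Iterator[str]:
    """Yield non-empty text pieces from a chat completion stream."""
    for chunk in stream:
        # Some chunks may arrive with an empty `choices` list; skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            yield content


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (offline testing only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk("Hel"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk("lo")]
pieces = list(iter_stream_text(fake_stream))
print(pieces)  # ['Hel', 'lo']
```

The caller can then join the pieces to build the assistant message for the history, as `chat_stream` does with `full_response`.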

JavaScript Implementation

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.llm.kiwi/v1',
  apiKey: process.env.LLM_KIWI_API_KEY
});

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' }
];

async function chat(userInput) {
  messages.push({ role: 'user', content: userInput });
  
  const response = await client.chat.completions.create({
    model: 'default',
    messages
  });
  
  const assistantMessage = response.choices[0].message.content;
  messages.push({ role: 'assistant', content: assistantMessage });
  
  return assistantMessage;
}

// With streaming
async function chatStream(userInput) {
  messages.push({ role: 'user', content: userInput });
  
  const stream = await client.chat.completions.create({
    model: 'default',
    messages,
    stream: true
  });
  
  let fullResponse = '';
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    fullResponse += content;
    process.stdout.write(content);
  }
  
  console.log();
  messages.push({ role: 'assistant', content: fullResponse });
  return fullResponse;
}

Managing Context Length

Conversation history grows with each turn. Trim older messages before each request so the history stays within the model's context limit:

Python

def trim_conversation(messages, max_messages=10):
    """Keep the system message plus the last `max_messages` messages."""
    if not messages:
        return messages
    
    # Always keep system message
    system_msg = messages[0] if messages[0]["role"] == "system" else None
    
    # Simple approach: keep last N messages (counts messages, not tokens)
    recent_messages = messages[-max_messages:]
    
    if system_msg and recent_messages[0] != system_msg:
        recent_messages = [system_msg] + recent_messages
    
    return recent_messages

# Before each request
messages = trim_conversation(messages)
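
`trim_conversation` keeps a fixed number of messages regardless of their length. A rough sketch of trimming against a token budget follows. It assumes about 4 characters per token plus a small per-message overhead, which is only a heuristic; a real tokenizer for your model would be more accurate. `estimate_tokens` and `trim_to_budget` are hypothetical names.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    """Rough estimate (assumption): ~4 characters per token, plus 4 per message."""
    return len(message["content"]) // 4 + 4


def trim_to_budget(messages: List[Message], max_tokens: int = 3000) -> List[Message]:
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(estimate_tokens(m) for m in system)

    kept: List[Message] = []
    # Walk from newest to oldest, keeping messages while they still fit.
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    return system + kept[::-1]


history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(50):
    history.append({"role": "user", "content": f"question {i} " + "x" * 400})
    history.append({"role": "assistant", "content": f"answer {i} " + "y" * 400})

trimmed = trim_to_budget(history, max_tokens=1000)
print(len(history), len(trimmed))
```

Walking from the newest message backwards means the most recent turns always survive, and the system message is budgeted first so it is never dropped.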

Error Handling

Retry rate-limited requests with exponential backoff, and roll back the conversation history when a request fails:

Python

from openai import RateLimitError, APIError
import time

def chat_with_retry(user_input: str, max_retries: int = 3) -> str:
    messages.append({"role": "user", "content": user_input})
    
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="default",
                messages=messages
            )
            assistant_message = response.choices[0].message.content
            messages.append({"role": "assistant", "content": assistant_message})
            return assistant_message
            
        except RateLimitError:
            if attempt == max_retries - 1:
                break  # No retries left, so don't sleep again
            wait_time = 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            
        except APIError as e:
            print(f"API error: {e}")
            messages.pop()  # Remove failed user message
            raise
    
    messages.pop()  # Remove user message if all retries failed
    raise RuntimeError("Max retries exceeded")
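
When many clients hit a rate limit at once, fixed delays of 1s, 2s, 4s can make them all retry at the same moment. A common variation adds random jitter. The sketch below is a generic, hypothetical `retry_with_backoff` helper that uses only the standard library; the `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` and retry on the given exceptions with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error.
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise ValueError("max_retries must be at least 1")


attempts = []


def flaky() -> str:
    """Fail twice, then succeed (stand-in for a rate-limited API call)."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, retry_on=(TimeoutError,), sleep=lambda s: None)
print(result, len(attempts))  # ok 3
```

With the client from this guide, `call` could be a lambda wrapping `client.chat.completions.create(...)` and `retry_on` could be `(RateLimitError,)`.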

Adding System Prompts

Customize chatbot behavior with system prompts:

Python

# Customer support bot
messages = [{
    "role": "system",
    "content": """You are a customer support agent for llm.kiwi.
    
    Guidelines:
    - Be friendly and professional
    - If you don't know something, say so
    - Provide links to documentation when relevant
    - Keep responses concise but helpful
    
    Available resources:
    - Documentation: docs.llm.kiwi
    - Dashboard: llm.kiwi/dashboard
    - Support email: support@llm.kiwi
    """
}]
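
A triple-quoted prompt written inside indented code carries that indentation into the text sent to the model. One possible cleanup, sketched below with a hypothetical `build_system_message` helper, is `textwrap.dedent`. Note that `dedent` only strips indentation common to every line, so the prompt here starts on the line after the opening quotes.

```python
import textwrap


def build_system_message(prompt: str) -> dict:
    """Return a system message with common indentation and outer blank lines removed."""
    return {"role": "system", "content": textwrap.dedent(prompt).strip()}


support_prompt = """
    You are a customer support agent for llm.kiwi.

    Guidelines:
    - Be friendly and professional
    - If you don't know something, say so
"""

messages = [build_system_message(support_prompt)]
print(messages[0]["content"])
```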

Production Checklist

Before deploying your chatbot:

Implement rate limiting on your side
Add request timeouts
Log conversations (with privacy compliance)
Handle edge cases (empty input, malformed responses)
Set up error alerting
Monitor token usage and costs
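
Two of the items above, client-side rate limiting and empty-input handling, can be sketched with the standard library alone. `SlidingWindowLimiter` and `clean_input` are hypothetical names, and the limits are placeholder values. The limiter tracks a single process only, so a multi-server deployment would need shared state.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period` seconds (per instance)."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True


def clean_input(user_input: str, max_chars: int = 4000) -> str:
    """Reject empty input and cap overly long input before it reaches the API."""
    text = user_input.strip()
    if not text:
        raise ValueError("Message is empty")
    return text[:max_chars]


# A fake clock keeps the demonstration instant and deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, period=60, clock=lambda: fake_now[0])
print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
fake_now[0] = 61.0
print(limiter.allow())  # True
print(clean_input("  hello  "))  # hello
```

A chat handler would check `limiter.allow()` and run `clean_input` before appending the user message to the history.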

Add Function Calling

Give your chatbot the ability to perform actions and fetch real-time data.
