To ensure platform stability and fair access for all users, llm.kiwi implements rate limits across all API endpoints.

Rate Limits by Tier

Daily request quotas reset at midnight UTC. Per-minute (RPM) limits apply within each one-minute window.
Tier               Daily Requests   Requests/Min (RPM)   Max Tokens
Anonymous          100 (shared)     10                   2,048
Free (registered)  500              20                   4,096
Pro                Unlimited*       100                  32,768
Enterprise         Custom           Custom               Custom
*Pro tier has soft limits for abuse prevention. Contact support if you need higher throughput.
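If you would rather stay under your tier's limits than react to 429 responses, you can throttle on the client. The sketch below is a hypothetical helper (`ClientRateLimiter` is not part of any llm.kiwi SDK) that enforces a per-minute cap with a sliding 60-second window and a daily cap that resets at midnight UTC. It assumes the server counts requests roughly the same way, which may not match llm.kiwi's exact windowing, so treat it as a safety margin rather than a guarantee. It also cannot see other clients' usage, so it does not help with the shared Anonymous quota.

```python
import time
from collections import deque
from datetime import datetime, timedelta, timezone


class ClientRateLimiter:
    """Hypothetical client-side throttle for a per-minute and a daily request cap.

    Not thread-safe; use one instance per thread or add a lock.
    """

    def __init__(self, rpm, daily=None, clock=time.time):
        self.rpm = rpm          # e.g. 20 for the Free tier
        self.daily = daily      # None means no daily cap
        self.clock = clock      # injectable for testing
        self._recent = deque()  # send times within the last 60 seconds
        self._day = None        # UTC date the daily count belongs to
        self._daily_count = 0

    def _refresh(self, now):
        # Drop send times that have slid out of the 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        # Start a new daily count when the UTC date changes (midnight UTC).
        today = datetime.fromtimestamp(now, tz=timezone.utc).date()
        if today != self._day:
            self._day = today
            self._daily_count = 0

    def wait_time(self):
        """Seconds until another request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._refresh(now)
        if self.daily is not None and self._daily_count >= self.daily:
            next_midnight = datetime.combine(
                self._day + timedelta(days=1), datetime.min.time(), tzinfo=timezone.utc
            )
            return next_midnight.timestamp() - now
        if len(self._recent) >= self.rpm:
            # The oldest request in the window expires 60 s after it was sent.
            return 60 - (now - self._recent[0])
        return 0.0

    def acquire(self, sleep=time.sleep):
        """Block until a request is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        now = self.clock()
        self._refresh(now)
        self._recent.append(now)
        self._daily_count += 1
```

For the Free tier you might create `ClientRateLimiter(rpm=20, daily=500)` once and call `acquire()` before every API request. The `clock` and `sleep` parameters exist only so the logic can be tested without real waiting.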

Rate Limit Headers

Each API response includes headers to help you track your usage:
Header                  Description
X-RateLimit-Limit       Maximum requests allowed per minute
X-RateLimit-Remaining   Requests remaining in current window
X-RateLimit-Reset       Unix timestamp when the limit resets
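A minimal sketch of reading these headers, assuming their values are plain decimal numbers and that `X-RateLimit-Reset` is in seconds since the Unix epoch. `RateLimitStatus` is a hypothetical name for illustration; the function accepts any mapping with an `.items()` method, such as a plain dict or the header object your HTTP client returns.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]       # X-RateLimit-Limit: max requests per minute
    remaining: Optional[int]   # X-RateLimit-Remaining: requests left in window
    reset_at: Optional[float]  # X-RateLimit-Reset: Unix timestamp of reset

    @classmethod
    def from_headers(cls, headers: Mapping[str, str]) -> "RateLimitStatus":
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}

        def read(name, cast):
            value = lowered.get(name.lower())
            return cast(value) if value is not None else None

        return cls(
            limit=read("X-RateLimit-Limit", int),
            remaining=read("X-RateLimit-Remaining", int),
            reset_at=read("X-RateLimit-Reset", float),
        )

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets (0.0 if unknown or already past)."""
        if self.reset_at is None:
            return 0.0
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)

    def should_pause(self) -> bool:
        """True when the current window is exhausted."""
        return self.remaining is not None and self.remaining <= 0
```

Checking `should_pause()` after each response and sleeping for `seconds_until_reset()` lets a client slow down before it triggers a 429 at all.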

Handling Rate Limits

When a rate limit is reached, the API returns a 429 Too Many Requests response.
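A 429 tells you to stop, and the rate-limit headers can tell you for how long. The hypothetical helper below turns a status code and response headers into a retry delay: it prefers `X-RateLimit-Reset` when the response includes it and otherwise falls back to exponential backoff with jitter (2s, 4s, 8s, ...), capped at `max_delay`. Whether llm.kiwi sends the reset header on every 429 is an assumption here.

```python
import random
import time
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str], attempt: int,
                now: Optional[float] = None, max_delay: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status_code != 429:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        # Prefer the server's hint: wait until the window resets.
        now = time.time() if now is None else now
        return max(0.0, float(reset) - now)
    # No hint: exponential backoff (2s, 4s, 8s, ...) plus up to 1s of jitter.
    return min(max_delay, 2 ** (attempt + 1) + random.uniform(0, 1))
```

The server hint is deliberately not capped, because retrying before the reset time would only produce another 429. If the returned delay is very long (for example, once a daily quota is exhausted), it is usually better to stop and report the error than to sleep.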

Retry Strategy

Retry rate-limited requests with exponential backoff plus random jitter, so that clients throttled at the same moment do not all retry in lockstep:
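import random
import time


class RateLimitError(Exception):
    """Placeholder: raise this (or use your HTTP client's equivalent) on HTTP 429."""


def make_request_with_retry(func, max_retries=5):
    """Call func(), retrying with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            # Out of retries: surface the error to the caller.
            if attempt == max_retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... plus up to 1s of random jitter.
            wait_time = 2 ** (attempt + 1) + random.uniform(0, 1)
            time.sleep(wait_time)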

Best Practices

  1. Exponential Backoff: Wait progressively longer between retries (2s, 4s, 8s, etc.)
  2. Request Batching: Combine multiple operations into fewer requests when possible
  3. Usage Monitoring: Check the X-RateLimit-* response headers to slow down before you hit a limit
  4. Caching: Cache responses for repeated identical queries
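Caching can be as simple as keying responses on the request payload. The sketch below is a hypothetical wrapper around whatever function you already use to send a request; it assumes the payload is JSON-serializable. The cache is unbounded and never expires, so a production version would add a size limit or TTL. If your requests use sampling, identical prompts may legitimately return different outputs, so caching changes behavior and is best reserved for deterministic or repeated lookups.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict


def cached(send: Callable[[Dict[str, Any]], Any]) -> Callable[[Dict[str, Any]], Any]:
    """Wrap a request function so identical payloads hit the API only once."""
    cache: Dict[str, Any] = {}

    @functools.wraps(send)
    def wrapper(payload: Dict[str, Any]) -> Any:
        # sort_keys makes the key independent of dict insertion order.
        key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        if key not in cache:
            cache[key] = send(payload)
        return cache[key]

    return wrapper
```

Usage: `cached_send = cached(send_request)`, where `send_request` is your own function that posts a payload to the API and returns the parsed response.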

Increasing Your Limits

Upgrade to Pro

Unlock unlimited requests and higher token limits.

Enterprise

Custom limits tailored to your organization’s needs.