Audio Transcription

Audio transcription is a Pro feature. Upgrade to access.

Transform audio files into accurate text transcriptions using our optimized Whisper implementation.

Endpoint

POST https://api.llm.kiwi/v1/audio/transcriptions

Request Parameters

Parameter	Type	Required	Description
`file`	file	Yes	Audio file to transcribe (max 25MB).
`model`	string	Yes	Use `whisper-1` for standard transcription.
`language`	string	No	ISO-639-1 code (e.g., `en`, `es`, `fr`). Auto-detected if omitted.
`prompt`	string	No	Optional context to guide transcription style.
`response_format`	string	No	Output format: `json`, `text`, `srt`, `vtt`, `verbose_json`.
`timestamp_granularities`	array	No	Include `word` and/or `segment` timestamps.

Supported Audio Formats

mp3, mp4, mpeg, mpga, m4a, wav, webm
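
A quick local check before uploading can save a failed request. The sketch below is illustrative only: the helper name `upload_problems` is hypothetical, and it assumes "25MB" means 25 * 1024 * 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes; the docs do not say
# whether the limit uses decimal or binary megabytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons the upload would likely be rejected (empty list if none)."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_UPLOAD_BYTES}")
    return problems
```

For a file on disk, the size can be read with `Path("recording.mp3").stat().st_size` and passed as `size_bytes`.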

Example Usage

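from openai import OpenAI

# Point the OpenAI SDK at the llm.kiwi endpoint instead of the default host.
client = OpenAI(
    base_url="https://api.llm.kiwi/v1",
    api_key="YOUR_API_KEY"
)

# Open the audio in binary mode; the SDK uploads it as multipart form data.
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"]
    )

# The full transcription text is available on the response object.
print(transcript.text)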

Response Formats

Format	Description
`json`	Simple JSON with text only.
`text`	Plain text output.
`srt`	SubRip subtitle format.
`vtt`	WebVTT subtitle format.
`verbose_json`	JSON with the text, timestamps (per `timestamp_granularities`), and metadata such as language and duration.
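
As a sketch of how the subtitle formats might be used, the helper below requests `srt` or `vtt` output and writes it to disk. The function name `save_subtitles` is hypothetical, and it assumes `client` is an OpenAI SDK client configured as in Example Usage. It also assumes subtitle formats come back as a plain string, falling back to a `.text` attribute if they do not.

```python
def save_subtitles(client, audio_path: str, out_path: str, fmt: str = "srt") -> str:
    """Transcribe audio_path as subtitles and write the result to out_path."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")

    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format=fmt,
        )

    # Assumption: subtitle formats are returned as a plain string; fall back
    # to a .text attribute in case the client wraps the body in an object.
    subtitles = result if isinstance(result, str) else result.text

    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(subtitles)
    return subtitles
```

A call such as `save_subtitles(client, "recording.mp3", "recording.srt")` would then leave a subtitle file next to the audio.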

Verbose JSON Response

{
  "text": "Hello, welcome to our podcast.",
  "language": "en",
  "duration": 5.2,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "welcome", "start": 0.6, "end": 1.0 },
    { "word": "to", "start": 1.1, "end": 1.2 },
    { "word": "our", "start": 1.3, "end": 1.5 },
    { "word": "podcast", "start": 1.6, "end": 2.1 }
  ]
}
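
The word entries can be consumed directly once the response is parsed. The following is a minimal sketch, assuming the response body has been loaded into a plain dict shaped like the sample above. The helper names `words_between` and `format_word` are illustrative and not part of the API.

```python
import json

# The sample response shown above, parsed into a dict.
sample = json.loads("""
{
  "text": "Hello, welcome to our podcast.",
  "language": "en",
  "duration": 5.2,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "welcome", "start": 0.6, "end": 1.0 },
    { "word": "to", "start": 1.1, "end": 1.2 },
    { "word": "our", "start": 1.3, "end": 1.5 },
    { "word": "podcast", "start": 1.6, "end": 2.1 }
  ]
}
""")


def words_between(response: dict, start: float, end: float) -> list[str]:
    """Return words whose time span lies entirely within [start, end] seconds."""
    return [
        w["word"]
        for w in response.get("words", [])
        if w["start"] >= start and w["end"] <= end
    ]


def format_word(entry: dict) -> str:
    """Render one word entry as '[start-end] word' with two decimal places."""
    return f"[{entry['start']:.2f}-{entry['end']:.2f}] {entry['word']}"


timeline = [format_word(w) for w in sample["words"]]
```

This kind of lookup is useful for jumping to a phrase in a player or highlighting words as audio plays.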

Best Practices

Audio Quality: Clear audio with minimal background noise yields the best results.
File Size: Keep files under 25MB (compress the audio if needed).
Language Hints: Specify `language` for non-English audio to improve accuracy.
Prompt Context: Use `prompt` to supply domain-specific terminology.
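
The language and prompt hints above could be assembled as follows. This is a sketch only: `transcription_options` is a hypothetical helper, and phrasing the prompt as a comma-separated list of terms is an assumption about what works well, not documented behavior.

```python
def transcription_options(language=None, terms=()):
    """Build keyword arguments for a transcription request with optional hints."""
    options = {"model": "whisper-1"}
    if language:
        # ISO-639-1 code such as "es"; omit it to let the API auto-detect.
        options["language"] = language
    if terms:
        # Assumption: a short sentence listing expected terms is one
        # reasonable way to pass domain vocabulary through `prompt`.
        options["prompt"] = "Terms that may appear: " + ", ".join(terms) + "."
    return options
```

With a client configured as in Example Usage, the result can be unpacked into the call, for example `client.audio.transcriptions.create(file=audio_file, **transcription_options("es", ["Kubernetes", "gRPC"]))`.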

Batch Processing

Need to process large volumes? Contact us for batch API access.
