The LLM pipeline is OpenAI API-compatible but does not implement all features of the OpenAI API.
The default Gateway used in this guide is the public Livepeer.cloud Gateway. It is free to use but not intended for production applications. For production, consider the Livepeer Studio Gateway, which requires an API token. Alternatively, you can set up your own Gateway node or partner with one via the ai-video channel on Discord.

Streaming Responses

Ensure your client supports SSE and processes each data: line as it arrives.
By default, the /llm endpoint returns a single JSON response in the OpenAI chat/completions format, as shown in the sidebar. To receive responses token-by-token, set "stream": true in the request body. The server will then use Server-Sent Events (SSE) to stream output in real time. Each streamed chunk will look like:
data: {
  "choices": [
    {
      "delta": {
        "content": "...token...",
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}
The final chunk will have empty content and "finish_reason": "stop":
data: {
  "choices": [
    {
      "delta": {
        "content": "",
        "role": "assistant"
      },
      "finish_reason": "stop"
    }
  ]
}