
Writings 2026

Streaming AI Responses with the Vercel AI SDK

Streaming is one of the simplest ways to make an AI feature feel responsive. Instead of waiting for the model to finish an entire answer, the server sends incremental events as generation happens. The user sees the response begin quickly, and the interface continues updating until the model is done.

The Vercel AI SDK handles the two awkward parts of this flow: translating a provider’s stream into a consistent protocol on the server and merging those events into React state on the client.

This example uses Next.js App Router, React, and the current transport-based useChat API.

Install the SDK

Start with a Next.js application and install the core SDK plus its React bindings:

bash
npm install ai @ai-sdk/react

The example uses Vercel AI Gateway, which lets the SDK address a model with a provider/model string. Add the gateway key and the model ID to .env.local:

bash
AI_GATEWAY_API_KEY=your_gateway_key
AI_MODEL=openai/gpt-5-mini

Keep this key on the server. Environment variables without the NEXT_PUBLIC_ prefix are not exposed to browser code by Next.js.
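
A missing key is easier to debug at startup than in the middle of a request. The sketch below shows one way to check the two variables up front. The names ServerConfig and readServerConfig are hypothetical and not part of the SDK, and the helper takes an env-like object so it can run without a real environment.

```typescript
// Hypothetical helper; ServerConfig and readServerConfig are illustrative names.
interface ServerConfig {
  gatewayKey: string;
  model: string;
}

// Accepts an env-like object so the check can run without touching process.env.
function readServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const gatewayKey = (env['AI_GATEWAY_API_KEY'] ?? '').trim();
  if (!gatewayKey) {
    throw new Error('AI_GATEWAY_API_KEY is not set. Add it to .env.local.');
  }

  // Fall back to the same default model ID the route uses.
  const model = (env['AI_MODEL'] ?? '').trim() || 'openai/gpt-5-mini';

  return { gatewayKey, model };
}
```

In a server-only module this would be called as readServerConfig(process.env). It should never be imported from a client component.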

Create the streaming route

Add a Route Handler at app/api/chat/route.ts:

app/api/chat/route.ts
import { convertToModelMessages, streamText, type UIMessage } from 'ai';

export const maxDuration = 30;

export async function POST(request: Request) {
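  // Malformed JSON resolves to null here, so it yields a 400 instead of an unhandled error.
  const body = (await request.json().catch(() => null)) as { messages?: UIMessage[] } | null;

  if (!body || !Array.isArray(body.messages)) {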
    return Response.json({ error: 'The request must include a messages array.' }, { status: 400 });
  }

  const result = streamText({
    model: process.env.AI_MODEL ?? 'openai/gpt-5-mini',
    system: 'You are a concise technical assistant.',
    messages: await convertToModelMessages(body.messages),
    abortSignal: request.signal,
  });

  return result.toUIMessageStreamResponse();
}

There are three distinct operations here:

  1. The route receives UIMessage[], the message shape used by the chat interface.
  2. convertToModelMessages converts that UI state into the model-facing message format.
  3. toUIMessageStreamResponse returns an HTTP response using the AI SDK UI stream protocol.
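
The route above only checks that messages is an array. A stricter structural check on step 1 might look like the following sketch. IncomingMessage is a simplified, hypothetical shape with fewer fields than the SDK's UIMessage type, and parseMessages is an illustrative name.

```typescript
// Simplified, hypothetical shape; the SDK's UIMessage type carries more fields.
interface IncomingMessage {
  id: string;
  role: 'system' | 'user' | 'assistant';
  parts: unknown[];
}

const ALLOWED_ROLES = new Set<string>(['system', 'user', 'assistant']);

function isIncomingMessage(value: unknown): value is IncomingMessage {
  if (typeof value !== 'object' || value === null) return false;

  const candidate = value as Record<string, unknown>;
  const id = candidate['id'];
  const role = candidate['role'];
  const parts = candidate['parts'];

  return (
    typeof id === 'string' &&
    typeof role === 'string' &&
    ALLOWED_ROLES.has(role) &&
    Array.isArray(parts)
  );
}

// Returns the messages when every entry passes the check, otherwise null.
function parseMessages(body: unknown): IncomingMessage[] | null {
  if (typeof body !== 'object' || body === null) return null;

  const messages = (body as Record<string, unknown>)['messages'];
  if (!Array.isArray(messages)) return null;

  return messages.every(isIncomingMessage) ? messages : null;
}
```

A null result maps naturally onto the 400 response the route already returns.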

It is tempting to pass the request body directly to the model, but UI messages and model messages serve different purposes. UI messages can contain metadata and typed parts for text, tool calls, tool results, and other interface state that should not be forwarded blindly.
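
To make the distinction concrete, here is a deliberately simplified sketch of the kind of reduction involved. It is not how convertToModelMessages is implemented. All types and the function name are hypothetical, and the 'ui-only' part is a stand-in for any interface state the model should not see.

```typescript
// Hypothetical, simplified types; not the SDK's UIMessage or model message types.
type SketchUIPart =
  | { type: 'text'; text: string }
  | { type: 'ui-only'; note: string }; // stand-in for interface-only state

interface SketchUIMessage {
  id: string;
  role: 'user' | 'assistant';
  metadata?: Record<string, unknown>;
  parts: SketchUIPart[];
}

interface SketchModelMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Keeps only role and text; ids, metadata, and UI-only parts are dropped.
function toSketchModelMessages(messages: SketchUIMessage[]): SketchModelMessage[] {
  const result: SketchModelMessage[] = [];

  for (const message of messages) {
    let content = '';
    for (const part of message.parts) {
      if (part.type === 'text') content += part.text;
    }
    // Skip messages that contained no text at all.
    if (content.length > 0) result.push({ role: message.role, content });
  }

  return result;
}
```

The real conversion handles many more part types, which is a good reason to rely on the SDK function rather than a hand-written filter like this one.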

streamText starts the model generation and gives us a result object immediately. The response method then connects the generation stream to the outgoing HTTP response. We do not concatenate tokens or create a ReadableStream ourselves.

Render the stream with useChat

The client owns the input field, while useChat owns the conversation and request lifecycle. Create app/page.tsx:

app/page.tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import { type FormEvent, useState } from 'react';

export default function ChatPage() {
  const [input, setInput] = useState('');
  const { messages, sendMessage, status, error, stop } = useChat({
    transport: new DefaultChatTransport({
      api: '/api/chat',
    }),
  });

  const isGenerating = status === 'submitted' || status === 'streaming';

  async function handleSubmit(event: FormEvent<HTMLFormElement>) {
    event.preventDefault();

    const text = input.trim();
    if (!text || isGenerating) return;

    setInput('');
    await sendMessage({ text });
  }

  return (
    <main>
      <h1>Streaming chat</h1>

      <section aria-label="Conversation">
        {messages.map((message) => (
          <article key={message.id}>
            <h2>{message.role === 'user' ? 'You' : 'Assistant'}</h2>

            {message.parts.map((part, index) => {
              if (part.type !== 'text') return null;

              return <p key={`${message.id}-${index}`}>{part.text}</p>;
            })}
          </article>
        ))}
      </section>

      <p role="status" aria-live="polite">
        {status === 'submitted' && 'Sending message…'}
        {status === 'streaming' && 'Generating response…'}
      </p>

      {error && <p role="alert">The response could not be generated. Please try again.</p>}

      <form onSubmit={handleSubmit}>
        <label htmlFor="message">Message</label>
        <textarea
          id="message"
          name="message"
          value={input}
          onChange={(event) => setInput(event.currentTarget.value)}
          disabled={isGenerating}
        />

        <button type="submit" disabled={!input.trim() || isGenerating}>
          Send
        </button>

        {isGenerating && (
          <button type="button" onClick={stop}>
            Stop generating
          </button>
        )}
      </form>
    </main>
  );
}

The key detail is message.parts. Current AI SDK UI messages are not represented by one content string. They contain typed parts, and a production interface should render each supported type explicitly. This example handles text and safely ignores everything else.
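
Handling each supported type explicitly could look like the sketch below. It returns plain strings instead of JSX so it can run outside React. LoosePart is a hypothetical, simplified shape, and the 'reasoning' and 'file' fields shown here are assumptions for illustration rather than the SDK's exact part definitions.

```typescript
// Hypothetical, simplified part shape for illustration only.
interface LoosePart {
  type: string;
  text?: unknown;
  filename?: unknown;
}

// Returns display text for supported parts and null for everything else.
function describePart(part: LoosePart): string | null {
  switch (part.type) {
    case 'text':
      return typeof part.text === 'string' ? part.text : null;
    case 'reasoning':
      return typeof part.text === 'string' ? `[reasoning] ${part.text}` : null;
    case 'file':
      return typeof part.filename === 'string' ? `[file: ${part.filename}]` : null;
    default:
      // Unknown part types are skipped rather than rendered as raw data.
      return null;
  }
}
```

Adding support for a new part type then means adding one case, and unsupported types keep falling through to null.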

The status value describes the request lifecycle:

Status      Meaning
submitted   The user message was sent and the response has not started.
streaming   Response events are arriving.
ready       The chat can accept another message.
error       The most recent request failed.

Because status comes from the hook itself, it cannot drift out of sync with the request the way a hand-maintained isLoading flag can. It also makes it easy to disable duplicate submissions and provide meaningful feedback.
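
The page component derives its button and status text inline. The same logic can be pulled into a small pure function, sketched here with hypothetical names (ChatControls, deriveControls), which makes the mapping from status to interface state easy to test.

```typescript
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

// Hypothetical view-model for the form controls.
interface ChatControls {
  canSubmit: boolean;
  canStop: boolean;
  statusText: string;
}

function deriveControls(status: ChatStatus, input: string): ChatControls {
  const isGenerating = status === 'submitted' || status === 'streaming';

  let statusText = '';
  if (status === 'submitted') statusText = 'Sending message…';
  if (status === 'streaming') statusText = 'Generating response…';

  return {
    // Submitting is allowed again after an error so the user can retry.
    canSubmit: !isGenerating && input.trim().length > 0,
    canStop: isGenerating,
    statusText,
  };
}
```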

What happens over the network

When the user presses Send, the sequence is:

text
useChat
  → POST /api/chat with the UI message history
  → convertToModelMessages(messages)
  → streamText({ model, messages })
  → AI SDK UI stream response
  → useChat merges each event into messages
  → React renders the latest text part

This is not a page refresh and it is not repeated polling. One HTTP request remains open while the server sends a sequence of events. React re-renders as useChat applies those events to the current assistant message.
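
Conceptually, the merging step is a reducer over incoming events. The sketch below uses two hypothetical event shapes to show the idea. The real UI stream protocol defines more event types and fields, and useChat does this work internally.

```typescript
// Hypothetical, simplified events; the real protocol has more types and fields.
type SketchEvent =
  | { type: 'text-delta'; delta: string }
  | { type: 'finish' };

interface AssistantDraft {
  text: string;
  done: boolean;
}

// Returns a new draft for each event so that React-style state updates stay immutable.
function applyEvent(draft: AssistantDraft, event: SketchEvent): AssistantDraft {
  if (event.type === 'text-delta') {
    return { text: draft.text + event.delta, done: draft.done };
  }
  return { text: draft.text, done: true };
}

const sampleEvents: SketchEvent[] = [
  { type: 'text-delta', delta: 'Streaming ' },
  { type: 'text-delta', delta: 'works.' },
  { type: 'finish' },
];

let draft: AssistantDraft = { text: '', done: false };
for (const event of sampleEvents) {
  draft = applyEvent(draft, event); // each step would trigger a re-render
}
```

Each intermediate draft corresponds to one render of the partially complete assistant message.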

Add useful server-side logging

Streaming changes when errors can occur. A route can return successfully and still encounter a provider failure after generation begins. Use the callbacks on streamText for server-side telemetry:

app/api/chat/route.ts
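// Replaces the streamText call inside POST; the rest of the route is unchanged.
const result = streamText({
  model: process.env.AI_MODEL ?? 'openai/gpt-5-mini',
  system: 'You are a concise technical assistant.',
  messages: await convertToModelMessages(body.messages),
  abortSignal: request.signal,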
  onFinish({ finishReason, usage }) {
    console.info('chat generation finished', {
      finishReason,
      inputTokens: usage.inputTokens,
      outputTokens: usage.outputTokens,
    });
  },
  onError({ error }) {
    console.error('chat generation failed', { error });
  },
});

Avoid sending raw provider errors to the browser. They can contain implementation details and are rarely useful to the user. Log the full error on the server and show a stable, actionable message in the interface.
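
One way to keep the two audiences apart is to build both views of an error in a single place. This is a minimal sketch with hypothetical names (ErrorReport, buildErrorReport). How the public message reaches the browser depends on how the application wires up its error handling.

```typescript
// Hypothetical helper that separates server-side detail from user-facing text.
interface ErrorReport {
  logEntry: { name: string; detail: string };
  publicMessage: string;
}

const GENERIC_MESSAGE = 'The response could not be generated. Please try again.';

function buildErrorReport(error: unknown): ErrorReport {
  const name = error instanceof Error ? error.name : 'UnknownError';
  const detail = error instanceof Error ? error.message : String(error);

  return {
    // Full detail stays on the server.
    logEntry: { name, detail },
    // The browser only ever sees a fixed string.
    publicMessage: GENERIC_MESSAGE,
  };
}
```

The public message is a constant, so nothing from the provider's error text can leak into it.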

Passing request.signal as the abort signal allows cancellation to propagate when the browser disconnects or the user presses Stop generating. Providers differ in how quickly they honor cancellation, so track both cancelled requests and billed usage rather than assuming cancellation is instantaneous.
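
Tracking both facts can be as simple as recording them together. The sketch below assumes the application has the request's signal and a usage object in hand when a generation ends. Where exactly that happens is left open, and GenerationRecord and recordGeneration are hypothetical names.

```typescript
// Token counts may be missing, so both fields are optional.
interface UsageSketch {
  inputTokens?: number;
  outputTokens?: number;
}

// Hypothetical telemetry record combining outcome and usage.
interface GenerationRecord {
  outcome: 'completed' | 'cancelled';
  inputTokens: number;
  outputTokens: number;
}

// The signal parameter is structural, so an AbortSignal such as request.signal fits it.
function recordGeneration(signal: { aborted: boolean }, usage: UsageSketch): GenerationRecord {
  return {
    outcome: signal.aborted ? 'cancelled' : 'completed',
    // A cancelled request can still report tokens that were generated before the abort.
    inputTokens: usage.inputTokens ?? 0,
    outputTokens: usage.outputTokens ?? 0,
  };
}
```

Comparing cancelled records against their token counts shows how much work a provider did after the user stopped waiting.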

Common mistakes

The first version of a streaming chat often works locally but misses a few production constraints:

  • Using an old useChat example. Current versions use sendMessage, typed message parts, and a transport-based API. Older tutorials commonly use hook-managed input state and handleSubmit.
  • Rendering message.content. Render message.parts so the interface can evolve to support tools, files, reasoning, and custom data.
  • Exposing the provider key. Model calls belong in the server route, never in a client component.
  • Accepting unlimited history. Long conversations increase latency and cost. Validate, trim, or summarize history before calling the model.
  • Treating streaming as authentication. The route still needs authorization, rate limiting, abuse controls, and request-size limits.
  • Ignoring disconnects. Propagate the request’s abort signal and record incomplete generations.
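
As an illustration of the history point, a basic trimming step might keep only the most recent messages. This sketch counts messages rather than tokens, which is an assumption made for simplicity, and trimHistory is a hypothetical name.

```typescript
interface HistoryMessage {
  role: 'system' | 'user' | 'assistant';
}

// Keeps at most maxMessages of the most recent messages.
function trimHistory<T extends HistoryMessage>(messages: T[], maxMessages: number): T[] {
  if (maxMessages <= 0) return [];
  if (messages.length <= maxMessages) return messages;

  const recent = messages.slice(-maxMessages);

  // Drop leading assistant messages so the window starts on a user turn.
  while (recent.length > 0 && recent[0]?.role === 'assistant') {
    recent.shift();
  }

  return recent;
}
```

In the route, this would run on the validated message array before conversion. A token-based budget or a summary of older turns is a natural refinement.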

A sensible next step

Once basic text streaming is stable, add one capability at a time: persisted conversations, tool calls, message metadata, or custom data parts. The UI stream protocol supports these features without replacing the transport established here.

The official AI SDK documentation covers the current API and additional options in its Next.js App Router guide, chatbot guide, and useChat reference.