Writings 2026
Streaming AI Responses with the Vercel AI SDK
Streaming is one of the simplest ways to make an AI feature feel responsive. Instead of waiting for the model to finish an entire answer, the server sends incremental events as generation happens. The user sees the response begin quickly, and the interface continues updating until the model is done.
The Vercel AI SDK handles the two awkward parts of this flow: translating a provider’s stream into a consistent protocol on the server and merging those events into React state on the client.
This example uses Next.js App Router, React, and the current transport-based useChat API.
Install the SDK
Start with a Next.js application and install the core SDK plus its React bindings:

    npm install ai @ai-sdk/react

The example uses Vercel AI Gateway, which lets the SDK address a model with a provider/model string. Add the gateway key to .env.local:

    AI_GATEWAY_API_KEY=your_gateway_key
    AI_MODEL=openai/gpt-5-mini

Keep this key on the server. Environment variables without the NEXT_PUBLIC_ prefix are not exposed to browser code by Next.js.
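A missing variable otherwise shows up only when the first chat request fails, so it can help to read the configuration in one place. The following is a minimal sketch under stated assumptions: readChatConfig and ChatConfig are hypothetical names that are not part of the AI SDK or Next.js, and the fallback model simply mirrors the value used in this article.

```typescript
// Hypothetical helper: not part of the AI SDK or Next.js.
interface ChatConfig {
  gatewayKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

function readChatConfig(env: Env): ChatConfig {
  // Fail early with a clear message instead of on the first request.
  const gatewayKey = (env['AI_GATEWAY_API_KEY'] ?? '').trim();
  if (!gatewayKey) {
    throw new Error('AI_GATEWAY_API_KEY is not set.');
  }

  // Fall back to the model used in this article when AI_MODEL is empty.
  const model = (env['AI_MODEL'] ?? '').trim() || 'openai/gpt-5-mini';
  return { gatewayKey, model };
}
```

On the server this would be called as readChatConfig(process.env). The check exists only to make a misconfiguration obvious; the returned key should never be passed to a client component.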
Create the streaming route
Add a Route Handler at app/api/chat/route.ts:
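
    import { convertToModelMessages, streamText, type UIMessage } from 'ai';

    // Let the streamed response run for up to 30 seconds.
    export const maxDuration = 30;

    export async function POST(request: Request) {
      // Treat an unparsable body like a missing one instead of throwing.
      const body = (await request.json().catch(() => null)) as { messages?: UIMessage[] } | null;

      // Reject malformed requests before calling the model.
      if (!body || !Array.isArray(body.messages)) {
        return Response.json({ error: 'The request must include a messages array.' }, { status: 400 });
      }

      const result = streamText({
        model: process.env.AI_MODEL ?? 'openai/gpt-5-mini',
        system: 'You are a concise technical assistant.',
        // Convert UI messages into the model-facing format.
        messages: await convertToModelMessages(body.messages),
        // Stop generation when the client disconnects or cancels.
        abortSignal: request.signal,
      });

      // Stream the result back using the AI SDK UI stream protocol.
      return result.toUIMessageStreamResponse();
    }

There are three distinct operations here: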
- The route receives UIMessage[], the message shape used by the chat interface.
- convertToModelMessages converts that UI state into the model-facing message format.
- toUIMessageStreamResponse returns an HTTP response using the AI SDK UI stream protocol.
It is tempting to pass the request body directly to the model. Keeping UI and model messages separate is important, though. UI messages can contain metadata and typed parts for text, tool calls, tool results, and other interface state that should not be forwarded blindly.
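To make the distinction concrete, here is a deliberately simplified sketch of the idea. It is not how convertToModelMessages is implemented: the SDK's converter also handles tools, files, and other part types. All type and function names below are hypothetical and exist only for illustration.

```typescript
// Hypothetical, simplified shapes. The SDK's real UIMessage and model
// message types carry more fields and more part types than these.
type UiPartSketch =
  | { type: 'text'; text: string }
  | { type: 'other'; payload: unknown };

interface UiMessageSketch {
  id: string;
  role: 'user' | 'assistant';
  metadata?: Record<string, unknown>;
  parts: UiPartSketch[];
}

interface ModelMessageSketch {
  role: 'user' | 'assistant';
  content: string;
}

function toModelMessagesSketch(messages: UiMessageSketch[]): ModelMessageSketch[] {
  const result: ModelMessageSketch[] = [];
  for (const message of messages) {
    // Keep only text; ids, metadata, and interface-only parts stay behind.
    let content = '';
    for (const part of message.parts) {
      if (part.type === 'text') content += part.text;
    }
    // Skip messages that contribute no text at all.
    if (content.length > 0) result.push({ role: message.role, content });
  }
  return result;
}
```

The point of the sketch is the direction of the mapping: the model-facing list is derived from UI state, and anything that exists only for the interface is left out on purpose.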
streamText starts the model generation and gives us a result object immediately. The response method
then connects the generation stream to the outgoing HTTP response. We do not concatenate tokens or create
a ReadableStream ourselves.
Render the stream with useChat
The client owns the input field, while useChat owns the conversation and request lifecycle. Create
app/page.tsx:

    'use client';

    import { useChat } from '@ai-sdk/react';
    import { DefaultChatTransport } from 'ai';
    import { type FormEvent, useState } from 'react';

    export default function ChatPage() {
      // The component owns the input; useChat owns the conversation.
      const [input, setInput] = useState('');
      const { messages, sendMessage, status, error, stop } = useChat({
        transport: new DefaultChatTransport({
          api: '/api/chat',
        }),
      });

      const isGenerating = status === 'submitted' || status === 'streaming';

      async function handleSubmit(event: FormEvent<HTMLFormElement>) {
        event.preventDefault();
        const text = input.trim();
        // Ignore empty input and duplicate submissions.
        if (!text || isGenerating) return;
        setInput('');
        await sendMessage({ text });
      }

      return (
        <main>
          <h1>Streaming chat</h1>
          <section aria-label="Conversation">
            {messages.map((message) => (
              <article key={message.id}>
                <h2>{message.role === 'user' ? 'You' : 'Assistant'}</h2>
                {message.parts.map((part, index) => {
                  // Render text parts only; other part types are skipped.
                  if (part.type !== 'text') return null;
                  return <p key={`${message.id}-${index}`}>{part.text}</p>;
                })}
              </article>
            ))}
          </section>
          <p role="status" aria-live="polite">
            {status === 'submitted' && 'Sending message…'}
            {status === 'streaming' && 'Generating response…'}
          </p>
          {error && <p role="alert">The response could not be generated. Please try again.</p>}
          <form onSubmit={handleSubmit}>
            <label htmlFor="message">Message</label>
            <textarea
              id="message"
              name="message"
              value={input}
              onChange={(event) => setInput(event.currentTarget.value)}
              disabled={isGenerating}
            />
            <button type="submit" disabled={!input.trim() || isGenerating}>
              Send
            </button>
            {isGenerating && (
              <button type="button" onClick={stop}>
                Stop generating
              </button>
            )}
          </form>
        </main>
      );
    }

The key detail is message.parts. Current AI SDK UI messages are not represented by one content string.
They contain typed parts, and a production interface should render each supported type explicitly. This
example handles text and safely ignores everything else.
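Rendering each supported type explicitly might look like the following sketch, written as a plain function so it can be tested without React. The part shape is a simplified assumption and the bracketed labels are invented for illustration; consult the SDK's UIMessage types for the full set of part types and their fields.

```typescript
// Hypothetical, simplified part shape for illustration only.
interface RenderablePart {
  type: string;
  text?: string;
  filename?: string;
}

// Returns a display string for supported parts, or null to skip the part.
function describePart(part: RenderablePart): string | null {
  switch (part.type) {
    case 'text':
      return part.text ?? '';
    case 'reasoning':
      return '[reasoning] ' + (part.text ?? '');
    case 'file':
      return '[file] ' + (part.filename ?? 'attachment');
    default:
      // Unknown or unsupported part types are skipped, not guessed at.
      return null;
  }
}

function describeMessage(parts: RenderablePart[]): string[] {
  const lines: string[] = [];
  for (const part of parts) {
    const line = describePart(part);
    if (line !== null) lines.push(line);
  }
  return lines;
}
```

Keeping the default branch explicit means a new part type never breaks rendering; it simply stays invisible until the interface adds a case for it.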
The status value describes the request lifecycle:
| Status | Meaning |
|---|---|
| submitted | The user message was sent and the response has not started. |
| streaming | Response events are arriving. |
| ready | The chat can accept another message. |
| error | The most recent request failed. |
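These states come directly from the hook's request lifecycle, so they are more reliable than a separately maintained isLoading flag, which can drift out of sync with the request. They also make it easy to disable duplicate submissions and provide meaningful feedback.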
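One way to keep that logic in a single place is to derive every status-dependent flag from the status value. This is a small sketch, not part of the SDK: viewForStatus and StatusView are hypothetical names, and the labels repeat the strings used in the page component.

```typescript
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

// Hypothetical view model: everything the form needs, derived from status.
interface StatusView {
  canSubmit: boolean;
  showStop: boolean;
  label: string;
}

function viewForStatus(status: ChatStatus, input: string): StatusView {
  const isGenerating = status === 'submitted' || status === 'streaming';
  const labels: Record<ChatStatus, string> = {
    submitted: 'Sending message…',
    streaming: 'Generating response…',
    ready: '',
    error: 'The response could not be generated. Please try again.',
  };
  return {
    // Submitting requires non-empty input and no request in flight.
    canSubmit: !isGenerating && input.trim().length > 0,
    showStop: isGenerating,
    label: labels[status],
  };
}
```

Because the function is pure, the rules for disabling the form can be unit tested without rendering the component.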
What happens over the network
When the user presses Send, the sequence is:

    useChat
    → POST /api/chat with the UI message history
    → convertToModelMessages(messages)
    → streamText({ model, messages })
    → AI SDK UI stream response
    → useChat merges each event into messages
    → React renders the latest text part

This is not a page refresh and it is not repeated polling. One HTTP request remains open while the server sends a sequence of events. React re-renders as useChat applies those events to the current assistant message.
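The merging step can be pictured as a reducer that folds each incoming event into the assistant message. The sketch below uses hypothetical, simplified event shapes; the real UI stream protocol defines more event types and fields, and useChat performs this work internally.

```typescript
// Hypothetical, simplified events for illustration only.
type StreamEventSketch =
  | { type: 'text-start' }
  | { type: 'text-delta'; delta: string }
  | { type: 'text-end' };

interface AssistantDraft {
  text: string;
  done: boolean;
}

// Returns a new draft for each event, as a state update would in React.
function applyEvent(draft: AssistantDraft, event: StreamEventSketch): AssistantDraft {
  if (event.type === 'text-start') {
    return { text: '', done: false };
  }
  if (event.type === 'text-delta') {
    return { text: draft.text + event.delta, done: false };
  }
  // 'text-end': the text is complete.
  return { text: draft.text, done: true };
}

// Folds a full event sequence and records the text after every event,
// which corresponds to what each re-render would display.
function replay(events: StreamEventSketch[]): { final: AssistantDraft; snapshots: string[] } {
  let draft: AssistantDraft = { text: '', done: false };
  const snapshots: string[] = [];
  for (const event of events) {
    draft = applyEvent(draft, event);
    snapshots.push(draft.text);
  }
  return { final: draft, snapshots };
}
```

Each snapshot is one intermediate render: the text grows delta by delta over a single open request rather than arriving in one piece.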
Add useful server-side logging
Streaming changes when errors can occur. The response status and headers are sent before generation finishes, so a route can return a successful status and still encounter a provider failure partway through the stream. Use the callbacks on streamText for server-side telemetry:

    const result = streamText({
      model: process.env.AI_MODEL ?? 'openai/gpt-5-mini',
      messages: await convertToModelMessages(body.messages),
      abortSignal: request.signal,
      // Runs when generation completes; record why it stopped and what it cost.
      onFinish({ finishReason, usage }) {
        console.info('chat generation finished', {
          finishReason,
          inputTokens: usage.inputTokens,
          outputTokens: usage.outputTokens,
        });
      },
      // Runs when the stream fails, including after the response has started.
      onError({ error }) {
        console.error('chat generation failed', { error });
      },
    });

Avoid sending raw provider errors to the browser. They can contain implementation details and are rarely useful to the user. Log the full error on the server and show a stable, actionable message in the interface.
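A stable message can come from a small mapping between internal failures and user-facing text. The sketch below is an assumption-heavy illustration: the categories, the messages, and the string matching are all hypothetical, and matching on error text is brittle. A real implementation would prefer structured fields such as status codes when the provider exposes them.

```typescript
// Hypothetical categories and messages; adjust to the failures you observe.
type ErrorCategory = 'rate_limited' | 'timeout' | 'unknown';

const CLIENT_MESSAGES: Record<ErrorCategory, string> = {
  rate_limited: 'Too many requests. Please wait a moment and try again.',
  timeout: 'The response took too long. Please try again.',
  unknown: 'The response could not be generated. Please try again.',
};

// Brittle by design: inspects the message text only as a last resort.
function categorize(error: unknown): ErrorCategory {
  const message = error instanceof Error ? error.message.toLowerCase() : '';
  if (message.indexOf('rate limit') !== -1 || message.indexOf('429') !== -1) {
    return 'rate_limited';
  }
  if (message.indexOf('timeout') !== -1 || message.indexOf('timed out') !== -1) {
    return 'timeout';
  }
  return 'unknown';
}

// The returned text never contains any part of the original error.
function toClientMessage(error: unknown): string {
  return CLIENT_MESSAGES[categorize(error)];
}
```

The full error still belongs in the server log; only the mapped string should cross the network.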
Passing request.signal as the abort signal allows cancellation to propagate when the browser disconnects
or the user presses Stop generating. Providers differ in how quickly they honor cancellation, so track
both cancelled requests and billed usage rather than assuming cancellation is instantaneous.
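Tracking both numbers can start as a simple tally. This sketch assumes each request ends in exactly one outcome and that token counts are available for cancelled requests, which depends on the provider. GenerationRecord and summarize are hypothetical names, and a real system would write to its metrics store instead of an in-memory array.

```typescript
type Outcome = 'finished' | 'cancelled' | 'failed';

// Hypothetical record written once per request.
interface GenerationRecord {
  outcome: Outcome;
  inputTokens: number;
  outputTokens: number;
}

interface GenerationSummary {
  counts: Record<Outcome, number>;
  // Output tokens generated on requests the user or browser cancelled.
  cancelledOutputTokens: number;
}

function summarize(records: GenerationRecord[]): GenerationSummary {
  const counts: Record<Outcome, number> = { finished: 0, cancelled: 0, failed: 0 };
  let cancelledOutputTokens = 0;
  for (const record of records) {
    counts[record.outcome] += 1;
    if (record.outcome === 'cancelled') {
      cancelledOutputTokens += record.outputTokens;
    }
  }
  return { counts, cancelledOutputTokens };
}
```

A persistently high cancelledOutputTokens value would suggest that cancellation reaches the provider late, which is the situation the paragraph above warns about.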
Common mistakes
The first version of a streaming chat often works locally but misses a few production constraints:
- Using an old useChat example. Current versions use sendMessage, typed message parts, and a transport-based API. Older tutorials commonly use hook-managed input state and handleSubmit.
- Rendering message.content. Render message.parts so the interface can evolve to support tools, files, reasoning, and custom data.
- Exposing the provider key. Model calls belong in the server route, never in a client component.
- Accepting unlimited history. Long conversations increase latency and cost. Validate, trim, or summarize history before calling the model.
- Leaving the route unprotected. Streaming changes how a response is delivered, not who may request it. The route still needs authorization, rate limiting, abuse controls, and request-size limits.
- Ignoring disconnects. Propagate the request’s abort signal and record incomplete generations.
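For the history limit in particular, a first version can simply keep the most recent messages that fit a budget. The sketch below is one possible approach, not the SDK's: HistoryMessage and trimHistory are hypothetical, and character count is used as a rough stand-in for tokens, which is only an approximation.

```typescript
// Hypothetical, simplified message shape for illustration only.
interface HistoryMessage {
  role: 'user' | 'assistant';
  text: string;
}

// Keeps the newest messages that fit both limits, in their original order.
function trimHistory(
  messages: HistoryMessage[],
  maxMessages: number,
  maxChars: number,
): HistoryMessage[] {
  const kept: HistoryMessage[] = [];
  let chars = 0;
  // Walk from newest to oldest so recent context wins.
  for (const message of messages.slice().reverse()) {
    if (kept.length >= maxMessages) break;
    if (chars + message.text.length > maxChars) break;
    chars += message.text.length;
    kept.push(message);
  }
  // Restore chronological order before sending to the model.
  return kept.reverse();
}
```

If the newest message alone exceeds the character budget, the function returns an empty list; a route using it would need to reject that request explicitly instead of calling the model with no messages.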
A sensible next step
Once basic text streaming is stable, add one capability at a time: persisted conversations, tool calls, message metadata, or custom data parts. The UI stream protocol supports these features without replacing the transport established here.
The official Next.js App Router guide, chatbot guide, and useChat reference document the current API and additional options.