The AI chat UIs I trust most share one quality: they never create a false impression of what's happening. Not through explicit deception. Through the gap between what the interface suggests and what's actually going on.
I've built several streaming chat interfaces for ChainsAtlas, with an Express.js proxy between the front-end and OpenAI. Here's what I've learned about being honest with users in real time.
## There are more states than you think
Most implementations treat the response as binary: loading or done. In practice there are at least six distinct states, and each one needs different UI:
```ts
type ChatStatus =
  | 'idle'      // no request in flight
  | 'waiting'   // request sent, first token not yet received
  | 'streaming' // tokens arriving
  | 'done'      // complete response received
  | 'partial'   // stream cut mid-way through
  | 'error'     // request failed entirely
```
The difference between waiting and streaming is worth showing explicitly. Waiting is pure latency — show a pulsing indicator. Streaming means tokens are arriving — show a cursor after the last character. Users feel this difference even when they can't describe it.
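One way to keep that distinction honest is to make the mapping from state to indicator a pure function, so the UI can't accidentally show a cursor while nothing is streaming. A minimal sketch — `indicatorFor` and the indicator names are mine, not from any library, and the `ChatStatus` type is repeated here so the snippet stands alone:

```ts
type ChatStatus = 'idle' | 'waiting' | 'streaming' | 'done' | 'partial' | 'error'
type Indicator = 'none' | 'pulse' | 'cursor' | 'incomplete-badge' | 'error-banner'

// Map each state to exactly one visual treatment.
function indicatorFor(status: ChatStatus): Indicator {
  switch (status) {
    case 'waiting':
      return 'pulse' // pure latency: pulsing dot, no text yet
    case 'streaming':
      return 'cursor' // tokens arriving: cursor after the last character
    case 'partial':
      return 'incomplete-badge' // keep the text, mark it as cut off
    case 'error':
      return 'error-banner'
    default:
      return 'none' // idle and done need no indicator
  }
}
```

Because it's a pure function, a unit test can pin down every state's indicator once and the rendering layer stays trivial.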
## Streaming with fetch
```ts
async function streamChat(
  messages: Message[],
  onChunk: (text: string) => void,
  onDone: () => void,
  onError: (err: Error) => void
) {
  let response: Response
  try {
    response = await fetch('/api/chat', {
      method: 'POST',
      body: JSON.stringify({ messages }),
      headers: { 'Content-Type': 'application/json' },
    })
  } catch {
    onError(new Error('Network request failed'))
    return
  }
  if (!response.ok || !response.body) {
    onError(new Error(`Request failed with status ${response.status}`))
    return
  }
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = '' // an SSE line can be split across two chunks
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      buffer += decoder.decode(value, { stream: true })
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? '' // keep the trailing, possibly incomplete line
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue
        const data = line.slice(6)
        if (data === '[DONE]') { onDone(); return }
        const delta = JSON.parse(data)?.choices?.[0]?.delta?.content
        if (delta) onChunk(delta)
      }
    }
    onDone()
  } catch {
    // Stream cut mid-way — partial response, not total failure
    onError(new Error('partial'))
  }
}
```
The error thrown after the stream starts is a partial failure. Treat it differently from a clean fetch error.
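At the call site that means classifying the failure before updating state, not just flipping to a generic error. A sketch of that decision — `classifyStreamFailure` is a name I'm inventing here, and it leans on the `'partial'` error message the streaming function above emits:

```ts
type FailureKind = 'partial' | 'error'

// Decide which failure state to enter: a stream that died after
// producing text is 'partial'; anything else is a clean 'error'.
function classifyStreamFailure(err: Error, textSoFar: string): FailureKind {
  if (err.message === 'partial' && textSoFar.length > 0) return 'partial'
  return 'error'
}
```

The `textSoFar.length > 0` guard matters: a stream that dies before the first token has nothing worth preserving, so it should render as an ordinary error.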
## When the stream cuts
It happens on congested networks, after rate limits, after server timeouts. The wrong response: clear the partial text and show a generic error. You've destroyed useful content and given the user no way to continue.
Keep the partial text visible. Mark it as incomplete with a subtle indicator. Offer a "Continue" button that passes the existing text back as context:
```ts
// Pass the truncated response back as context for the next request
const continuationMessages = [
  ...previousMessages,
  { role: 'assistant', content: partialResponse },
  { role: 'user', content: 'Please continue from where you left off.' },
]
```
You already have the text. Using it is four lines of code.
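Wrapped as a reusable helper — the names here are mine, with a minimal `Message` type so the snippet is self-contained:

```ts
interface Message {
  role: 'system' | 'user' | 'assistant'
  content: string
}

// Build the message list for a continuation request: replay the history,
// append the truncated assistant turn, then ask the model to resume.
function buildContinuation(previous: Message[], partial: string): Message[] {
  return [
    ...previous,
    { role: 'assistant', content: partial },
    { role: 'user', content: 'Please continue from where you left off.' },
  ]
}
```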
## Error messages that actually help
OpenAI errors have types. Use them:
```ts
function getChatError(status: number, retryAfter?: string): string {
  switch (status) {
    case 429:
      return `Rate limited — try again in ${retryAfter ?? '30'} seconds`
    case 413:
      return 'Conversation too long. Start a new chat or summarise the last few messages.'
    case 400:
      return 'This message was flagged. Try rephrasing it.'
    default:
      return 'Something went wrong. Try again in a moment.'
  }
}
```
"An error occurred" is not helpful. The user wants to know whether it's worth retrying. A 429 is. A 400 with a content policy hit is not (without changing the message). Give them that information.
The "thinking" state
Models with extended reasoning can spend 10, 30, even 60 seconds in a reasoning phase before generating output. If your UI just shows a spinner, you're hiding real progress.
If your API surfaces thinking tokens, show them: visually distinct, clearly labelled as reasoning, collapsible. If it doesn't, at minimum show elapsed time. "Thinking... 34s" is more honest than a static spinner. Users wait longer if they believe something is actually happening.
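The elapsed-time label is a one-liner once you record when the thinking phase started. A minimal sketch — the function name is mine; in practice you'd re-render it on a timer:

```ts
// Format the "Thinking..." label from a start timestamp and the current time.
function thinkingLabel(startedAtMs: number, nowMs: number): string {
  const seconds = Math.floor((nowMs - startedAtMs) / 1000)
  return `Thinking... ${seconds}s`
}
```

In a React-style UI you would call this from a `setInterval` tick (say, once a second) so the label counts up while the reasoning phase runs.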
## The principle behind all of it
AI interfaces create unusual expectations. Users know the model is capable but also unpredictable. The UI's job is to calibrate those expectations accurately, not paper over the reality of how these systems work.
Latency is real. Show it. Failures happen. Show them clearly and specifically. Partial responses are useful. Preserve them. The interfaces users trust most are the ones that tell the truth about what's happening at every moment.
