Hamza Yerrou
Software Engineer
2025-12-11 · 8 min read

Designing AI chat UIs that don't lie about what's happening

Streaming, retries, partial failures, 'the model is thinking' — every honest pattern I've shipped.

The AI chat UIs I trust most share one quality: they never create a false impression of what's happening. Interfaces rarely deceive explicitly; they deceive through the gap between what they suggest and what's actually going on.

I've built several streaming chat interfaces for ChainsAtlas, with an Express.js proxy between the front-end and OpenAI. Here's what I've learned about being honest with users in real time.

There are more states than you think

Most implementations treat the response as binary: loading or done. In practice there are at least six distinct states, and each one needs different UI:

```ts
type ChatStatus =
  | 'idle'      // no request in flight
  | 'waiting'   // request sent, first token not yet received
  | 'streaming' // tokens arriving
  | 'done'      // complete response received
  | 'partial'   // stream cut mid-way through
  | 'error'     // request failed entirely
```

The difference between waiting and streaming is worth showing explicitly. Waiting is pure latency — show a pulsing indicator. Streaming means tokens are arriving — show a cursor after the last character. Users feel this difference even when they can't describe it.
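
Concretely, this can be a direct mapping from status to indicator. A minimal sketch, assuming placeholder indicator names that your component layer would map to real UI:

```ts
// Placeholder indicator names: map these to real components in your UI
type Indicator = 'pulse' | 'cursor' | 'incomplete-badge' | 'error-banner' | null

function indicatorFor(status: ChatStatus): Indicator {
  switch (status) {
    case 'waiting':
      return 'pulse' // pure latency: pulsing dot, no text yet
    case 'streaming':
      return 'cursor' // tokens arriving: cursor after the last character
    case 'partial':
      return 'incomplete-badge' // keep the text, mark it as cut off
    case 'error':
      return 'error-banner'
    case 'idle':
    case 'done':
      return null
  }
}
```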

Streaming with fetch

```ts
async function streamChat(
  messages: Message[],
  onChunk: (text: string) => void,
  onDone: () => void,
  onError: (err: Error) => void
) {
  let response: Response
  try {
    response = await fetch('/api/chat', {
      method: 'POST',
      body: JSON.stringify({ messages }),
      headers: { 'Content-Type': 'application/json' },
    })
  } catch {
    onError(new Error('Network request failed'))
    return
  }

  const reader = response.body!.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      buffer += decoder.decode(value, { stream: true })
      // SSE events can straddle chunk boundaries: keep the last,
      // possibly incomplete line in the buffer for the next read
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? ''
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue
        const data = line.slice(6)
        if (data === '[DONE]') {
          onDone()
          return
        }
        const delta = JSON.parse(data)?.choices?.[0]?.delta?.content
        if (delta) onChunk(delta)
      }
    }
    onDone()
  } catch {
    // Stream cut mid-way — partial response, not total failure
    onError(new Error('partial'))
  }
}
```

The error thrown after the stream starts is a partial failure. Treat it differently from a clean fetch error.
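
In the caller, that distinction is a single branch. A sketch, assuming `setStatus` and `appendToLastMessage` are your own UI handlers (neither appears in the function above):

```ts
// Hypothetical wiring: route the two failure modes to different statuses
streamChat(
  messages,
  (text) => appendToLastMessage(text),
  () => setStatus('done'),
  (err) => setStatus(err.message === 'partial' ? 'partial' : 'error')
)
```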

When the stream cuts

It happens on congested networks, after rate limits, after server timeouts. The wrong response: clear the partial text and show a generic error. You've destroyed useful content and given the user no way to continue.

Keep the partial text visible. Mark it as incomplete with a subtle indicator. Offer a "Continue" button that passes the existing text back as context:

```ts
// Pass the truncated response back as context for the next request
const continuationMessages = [
  ...previousMessages,
  { role: 'assistant', content: partialResponse },
  { role: 'user', content: 'Please continue from where you left off.' },
]
```

You already have the text. Using it is four lines of code.

Error messages that actually help

OpenAI errors have types. Use them:

```ts
function getChatError(status: number, retryAfter?: string): string {
  switch (status) {
    case 429:
      return `Rate limited — try again in ${retryAfter ?? '30'} seconds`
    case 413:
      return 'Conversation too long. Start a new chat or summarise the last few messages.'
    case 400:
      return 'This message was flagged. Try rephrasing it.'
    default:
      return 'Something went wrong. Try again in a moment.'
  }
}
```

"An error occurred" is not helpful. The user wants to know whether it's worth retrying. A 429 is. A 400 with a content policy hit is not (without changing the message). Give them that information.

The "thinking" state

Models with extended reasoning can spend 10, 30, even 60 seconds in a reasoning phase before generating output. If your UI just shows a spinner, you're hiding real progress.

If your API surfaces thinking tokens, show them: visually distinct, clearly labelled as reasoning, collapsible. If it doesn't, at minimum show elapsed time. "Thinking... 34s" is more honest than a static spinner. Users wait longer if they believe something is actually happening.
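
The elapsed-time fallback is a few lines. A minimal sketch, assuming a `render` callback that updates whatever label your UI shows:

```ts
// Start a one-second timer while status is 'waiting'.
// Returns a cleanup function to call when the first token arrives.
function startThinkingTimer(render: (label: string) => void): () => void {
  const startedAt = Date.now()
  const id = setInterval(() => {
    const seconds = Math.round((Date.now() - startedAt) / 1000)
    render(`Thinking... ${seconds}s`)
  }, 1000)
  return () => clearInterval(id)
}
```

Call the cleanup the moment the first token lands, so the label switches from thinking to streaming exactly when reality does.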

The principle behind all of it

AI interfaces create unusual expectations. Users know the model is capable but also unpredictable. The UI's job is to calibrate those expectations accurately, not paper over the reality of how these systems work.

Latency is real. Show it. Failures happen. Show them clearly and specifically. Partial responses are useful. Preserve them. The interfaces users trust most are the ones that tell the truth about what's happening at every moment.
