Tegy Read-Aloud TTS Evidence

This report backs the read-aloud design with concrete provider samples. Claude.ai uses server-side TTS over an authenticated WebSocket that streams raw PCM into Web Audio; Tegy's current browser speechSynthesis path should be replaced with a comparable server-side path.

Recommended MVP Primary

Cloudflare Aura-2

Fast first audio, existing Cloudflare credentials, no new vendor.

Best Claude-Like Alternate

Gemini TTS PCM

OpenRouter through Cloudflare AI Gateway (CF AIG) returns audio/pcm at 24 kHz, but the samples in this run clipped the opening text.

Cheapest Sample

Kokoro

Very low measured cost, but the output is MP3 and its quality needs human approval.

Quality Prompt

Tegy read aloud quality sample. This is a strategy memo excerpt: the decision is whether to enter a new market now, or wait for stronger evidence.

Listen To The Generated Samples

Cloudflare Workers AI / Deepgram Aura-2

@cf/deepgram/aura-2-en, voice luna

status 200, first audio 344 ms, 10.2 s audio, $0.00438 estimated

Fastest viable production path using existing Cloudflare credentials. The request specified linear16 / container:none. Cloudflare's REST response header said audio/mpeg, but the bytes played as raw signed 16-bit PCM and were converted to WAV for this report.
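
The PCM-to-WAV step mentioned above can be sketched with Python's standard wave module. This is a minimal sketch, not the tooling used for this report: it assumes mono, little-endian samples at the 16 kHz rate that was requested, and the function names are illustrative.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap headerless signed 16-bit PCM in a WAV container.

    Assumes little-endian samples, which is what WAV expects.
    """
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm16_duration_seconds(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> float:
    """Duration implied by the byte count of raw 16-bit PCM."""
    return len(pcm) / (2 * channels * sample_rate)
```

The duration helper doubles as a sanity check on the format assumption: if the bytes really are 16 kHz mono 16-bit PCM, the computed duration should match what is heard on playback.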

Cloudflare Workers AI / Deepgram Aura-1

@cf/deepgram/aura-1, voice luna

status 200, first audio 345 ms, 8.057 s audio, $0.00219 estimated

Lower-cost Cloudflare option. Same header mismatch as Aura-2. Useful fallback if quality is acceptable after listening.

Cloudflare AI Gateway -> OpenRouter Gemini TTS

google/gemini-3.1-flash-tts-preview, voice Kore

status 200, first audio 1108 ms, 5.48 s audio, CF AIG cache MISS

Most Claude-like existing OpenRouter path because it returns audio/pcm;rate=24000;channels=1 and passes through Cloudflare AI Gateway logging. Gateway log id: 01KW97Y5H8M68ZGGBWRQ4CE2B6.

OpenRouter Gemini TTS Direct

google/gemini-3.1-flash-tts-preview, voice Kore

status 200, first audio 1625 ms, 7.92 s audio, $0.003991 measured

Direct OpenRouter sample with generation metadata available after a short delay. Useful for cost accounting; production should prefer the CF AIG route for Gateway observability if we use Gemini.

OpenRouter Kokoro

hexgrad/kokoro-82m, voice af_heart

status 200, first audio 1273 ms, 10.223 s audio, $0.00009052 measured

Cost winner in this bakeoff, but it is MP3 rather than raw PCM. It may be a cheap fallback if subjective quality is good enough.

OpenRouter MAI-Voice-2

microsoft/mai-voice-2, voice en-US-Harper:MAI-Voice-2

status 200, first audio 1631 ms, 10.56 s audio, $0.003212 measured

MP3 output with measured cost close to Gemini direct on this sample. Less aligned with Claude's PCM path and not faster in this run.

Measured Results

Provider route	Model / voice	Status	First audio	Total	Audio duration	Cost	Format
Cloudflare Workers AI REST	`@cf/deepgram/aura-2-en` / `luna`	200	344 ms	4065 ms	10.2 s	$0.00438 estimated	Requested PCM 16 kHz; header said `audio/mpeg`
Cloudflare Workers AI REST	`@cf/deepgram/aura-1` / `luna`	200	345 ms	463 ms	8.057 s	$0.00219 estimated	Requested PCM 16 kHz; header said `audio/mpeg`
CF AI Gateway -> OpenRouter	`google/gemini-3.1-flash-tts-preview` / `Kore`	200	1108 ms	4080 ms	5.48 s	Gateway route did not return OpenRouter cost metadata	`audio/pcm;rate=24000;channels=1`
OpenRouter direct	`google/gemini-3.1-flash-tts-preview` / `Kore`	200	1625 ms	3915 ms	7.92 s	$0.003991 measured	`audio/pcm;rate=24000;channels=1`
OpenRouter direct	`hexgrad/kokoro-82m` / `af_heart`	200	1273 ms	1922 ms	10.223 s	$0.00009052 measured	`audio/mpeg`
OpenRouter direct	`microsoft/mai-voice-2` / `en-US-Harper:MAI-Voice-2`	200	1631 ms	2311 ms	10.56 s	$0.003212 measured	`audio/mpeg`
OpenRouter direct	`mistralai/voxtral-mini-tts-2603` / `nova`	404	391 ms to error	392 ms	n/a	n/a	Provider returned 404
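
Because the samples differ in audio duration, raw cost and total time are easier to compare once normalized. The sketch below derives two such figures from the table columns. The metric names and helper functions are illustrative and are not part of the measurements file.

```python
def cost_per_audio_minute(cost_usd: float, audio_seconds: float) -> float:
    """USD per 60 seconds of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cost_usd / audio_seconds * 60.0


def real_time_factor(total_ms: float, audio_seconds: float) -> float:
    """Wall-clock generation time divided by audio duration.

    Values below 1.0 mean the audio was produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return (total_ms / 1000.0) / audio_seconds


# Rows copied from the table above: (label, total ms, audio seconds, cost in USD).
ROWS = [
    ("Aura-2", 4065, 10.2, 0.00438),
    ("Aura-1", 463, 8.057, 0.00219),
    ("Gemini TTS direct", 3915, 7.92, 0.003991),
    ("Kokoro", 1922, 10.223, 0.00009052),
    ("MAI-Voice-2", 2311, 10.56, 0.003212),
]

for label, total_ms, audio_s, cost in ROWS:
    print(
        f"{label}: ${cost_per_audio_minute(cost, audio_s):.6f} per audio minute, "
        f"real-time factor {real_time_factor(total_ms, audio_s):.3f}"
    )
```

The Gemini samples dropped part of the prompt (see the transcripts in the next section), so their shorter durations make the normalized figures less comparable with the other rows.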

Intelligibility Proxy

Each successful audio sample was converted to 16 kHz mono MP3 and transcribed with Cloudflare Workers AI @cf/deepgram/nova-3. This is not a replacement for human voice-quality review, but it catches missing or garbled spoken content.

Sample	STT status	STT latency	Word-error proxy	Transcript
Cloudflare Aura-2	200	572 ms	0.077	take a read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence
Cloudflare Aura-1	200	340 ms	0.077	tachy read aloud quality sample this is a strategy memo search the decision is whether to enter a new market now or wait for stronger evidence
CF AIG -> OpenRouter Gemini TTS	200	264 ms	0.423	the decision is whether to enter a new market now or wait for stronger evidence
OpenRouter Gemini TTS Direct	200	292 ms	0.308	the strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence
OpenRouter Kokoro	200	300 ms	0.115	teddy red allowed quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence
OpenRouter MAI-Voice-2	200	429 ms	0.038	peggy read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence
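
The word-error proxy values above are consistent with a word-level edit distance divided by the number of words in the quality prompt (26). The sketch below is written under that assumption. The tokenization rule and function names are illustrative and may differ from the script that produced the measurements.

```python
import re

PROMPT = (
    "Tegy read aloud quality sample. This is a strategy memo excerpt: "
    "the decision is whether to enter a new market now, or wait for "
    "stronger evidence."
)


def tokenize(text: str) -> list:
    """Lowercase the text and keep only word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_proxy(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # reference word dropped
                cur[j - 1] + 1,                        # extra word in transcript
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

For the CF AIG Gemini transcript, which is missing the first 11 words of the prompt, this gives 11 / 26, which rounds to the 0.423 shown in the table.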

Implementation Decision

Recommended next implementation

Build a Tegy-authenticated WebSocket that fetches the stored assistant message server-side, streams provider audio, and feeds a Web Audio PCM player. Start with Cloudflare Aura-2 as primary. Keep OpenRouter Gemini TTS as a PCM alternate only after resolving the observed clipped opening / text-omission behavior. Do not keep browser speechSynthesis as the normal product path.
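
One detail a streaming PCM player has to handle is that a network chunk can end in the middle of a 16-bit sample. The sketch below shows the decode step in Python for clarity. It assumes little-endian signed 16-bit mono input, and the class name is hypothetical. A browser implementation would apply the same carry-over logic before handing float samples to Web Audio.

```python
import struct


class Pcm16ChunkDecoder:
    """Turn arbitrarily split PCM byte chunks into float samples in [-1.0, 1.0)."""

    def __init__(self) -> None:
        self._carry = b""  # leftover byte from a chunk that ended mid-sample

    def feed(self, chunk: bytes) -> list:
        data = self._carry + chunk
        usable = len(data) - (len(data) % 2)  # whole 16-bit samples only
        self._carry = data[usable:]
        samples = struct.unpack(f"<{usable // 2}h", data[:usable])
        return [s / 32768.0 for s in samples]


decoder = Pcm16ChunkDecoder()
first = decoder.feed(b"\x00\x40\xff")  # one full sample plus half of the next
second = decoder.feed(b"\x7f")         # completes the split sample
```

Here `first` holds a single sample (0.5) and `second` holds the sample that was split across the two chunks, so no bytes are lost or misaligned at chunk boundaries.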

Use Tegy-managed R2 caching if replay cost matters. Cloudflare AI Gateway logged the OpenRouter TTS requests successfully, but repeated identical TTS requests still returned cache MISS, so the Gateway cache did not help with binary audio in these probes.
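
A Tegy-managed cache needs a key that changes whenever the text, model, voice, or output format changes. A minimal sketch of one way to derive such a key follows. The `tts/` prefix, the `.bin` suffix, and the function name are hypothetical, and the actual R2 read and write calls are left out.

```python
import hashlib
import json


def tts_cache_key(text: str, model: str, voice: str, audio_format: str) -> str:
    """Derive a deterministic object key from everything that affects the audio."""
    payload = json.dumps(
        {"text": text, "model": model, "voice": voice, "format": audio_format},
        sort_keys=True,          # stable field order
        ensure_ascii=False,
        separators=(",", ":"),   # no whitespace variation
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"tts/{digest}.bin"


key = tts_cache_key(
    "Tegy read aloud quality sample.",
    "@cf/deepgram/aura-2-en",
    "luna",
    "linear16",
)
```

Hashing the full request rather than only the message text keeps audio for different voices or formats from colliding under one key.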

Sources

Measurements file: assets/measurements.json. Generated locally on 2026-06-29. Audio samples are real provider outputs, not mock audio.