This report backs the read-aloud design with concrete provider samples.
Claude.ai uses server-side TTS over an authenticated WebSocket streaming
raw PCM into Web Audio; Tegy's current browser speechSynthesis
path should be replaced.
Recommended MVP Primary
Cloudflare Aura-2
Fast first audio, existing Cloudflare credentials, no new vendor.
Best Claude-Like Alternate
Gemini TTS PCM
OpenRouter + CF AIG gives clean audio/pcm at 24 kHz.
Cheapest Sample
Kokoro
Very low measured cost, but output is MP3 and quality needs human approval.
Quality Prompt
Tegy read aloud quality sample. This is a strategy memo excerpt: the
decision is whether to enter a new market now, or wait for stronger
evidence.
Listen To The Generated Samples
Cloudflare Workers AI / Deepgram Aura-2
@cf/deepgram/aura-2-en, voice luna
status 200, first audio 344 ms, 10.2 s audio, $0.00438 estimated
Fastest viable production path using existing Cloudflare credentials.
Requested linear16/container:none; Cloudflare's
REST response header was audio/mpeg, but bytes were
playable as raw signed 16-bit PCM and converted to WAV for this report.
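The WAV conversion mentioned above can be sketched with Python's standard wave module. This is a minimal sketch, not the script used for this report: the function name pcm_to_wav is hypothetical, the 16 kHz rate comes from the requested format in the results table, and mono little-endian samples are an assumption.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap headerless signed 16-bit PCM in a WAV container.

    Assumes little-endian samples; the response header (audio/mpeg)
    is ignored because the bytes were observed to be raw PCM.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence at 16 kHz mono becomes a playable WAV file.
one_second = pcm_to_wav(b"\x00\x00" * 16000)
```

Because the header cannot be trusted on this route, the caller has to know the requested format and pass the matching sample rate explicitly.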
Cloudflare Workers AI / Deepgram Aura-1
@cf/deepgram/aura-1, voice luna
status 200, first audio 345 ms, 8.057 s audio, $0.00219 estimated
Lower-cost Cloudflare option. Same header mismatch as Aura-2. Useful
fallback if quality is acceptable after listening.
Cloudflare AI Gateway -> OpenRouter Gemini TTS
google/gemini-3.1-flash-tts-preview, voice Kore
status 200, first audio 1108 ms, 5.48 s audio, CF AIG cache MISS
Most Claude-like existing OpenRouter path because it returns
audio/pcm;rate=24000;channels=1 and passes through
Cloudflare AI Gateway logging. Gateway log id:
01KW97Y5H8M68ZGGBWRQ4CE2B6.
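A self-describing content type such as audio/pcm;rate=24000;channels=1 lets the server configure the player without guessing. The sketch below shows one way to read those parameters and derive audio duration from a byte count. The function names are hypothetical, and 16-bit samples are an assumption because the header does not state the sample width.

```python
def parse_pcm_content_type(header: str) -> dict:
    """Split a Content-Type value into media type, sample rate and channels."""
    media_type, *params = [part.strip() for part in header.split(";")]
    fields = {}
    for param in params:
        if "=" in param:
            key, value = param.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return {
        "media_type": media_type.lower(),
        "rate": int(fields["rate"]) if "rate" in fields else None,
        "channels": int(fields["channels"]) if "channels" in fields else None,
    }


def pcm_duration_seconds(num_bytes: int, rate: int, channels: int,
                         bytes_per_sample: int = 2) -> float:
    """Duration of raw PCM; bytes_per_sample=2 assumes 16-bit samples."""
    return num_bytes / (rate * channels * bytes_per_sample)


info = parse_pcm_content_type("audio/pcm;rate=24000;channels=1")
seconds = pcm_duration_seconds(263040, info["rate"], info["channels"])
```

Under these assumptions, 263040 bytes at 24 kHz mono correspond to 5.48 s of audio.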
OpenRouter Gemini TTS Direct
google/gemini-3.1-flash-tts-preview, voice Kore
status 200, first audio 1625 ms, 7.92 s audio, $0.003991 measured
Direct OpenRouter sample with generation metadata available after a
short delay. Useful for cost accounting; production should prefer the
CF AIG route for Gateway observability if we use Gemini.
OpenRouter Kokoro
hexgrad/kokoro-82m, voice af_heart
status 200, first audio 1273 ms, 10.223 s audio, $0.00009052 measured
Cost winner in this bakeoff, but it is MP3 rather than raw PCM. It
may be a cheap fallback if subjective quality is good enough.
OpenRouter MAI-Voice-2
microsoft/mai-voice-2, voice en-US-Harper:MAI-Voice-2
status 200, first audio 1631 ms, 10.56 s audio, $0.003212 measured
MP3 output with measured cost close to Gemini direct on this sample.
Less aligned with Claude's PCM path and not faster in this run.
Measured Results
| Provider route | Model / voice | Status | First audio | Total | Audio duration | Cost | Format |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cloudflare Workers AI REST | @cf/deepgram/aura-2-en / luna | 200 | 344 ms | 4065 ms | 10.2 s | $0.00438 estimated | Requested PCM 16 kHz; header said audio/mpeg |
| Cloudflare Workers AI REST | @cf/deepgram/aura-1 / luna | 200 | 345 ms | 463 ms | 8.057 s | $0.00219 estimated | Requested PCM 16 kHz; header said audio/mpeg |
| CF AI Gateway -> OpenRouter | google/gemini-3.1-flash-tts-preview / Kore | 200 | 1108 ms | 4080 ms | 5.48 s | Gateway route did not return OpenRouter cost metadata | audio/pcm;rate=24000;channels=1 |
| OpenRouter direct | google/gemini-3.1-flash-tts-preview / Kore | 200 | 1625 ms | 3915 ms | 7.92 s | $0.003991 measured | audio/pcm;rate=24000;channels=1 |
| OpenRouter direct | hexgrad/kokoro-82m / af_heart | 200 | 1273 ms | 1922 ms | 10.223 s | $0.00009052 measured | audio/mpeg |
| OpenRouter direct | microsoft/mai-voice-2 / en-US-Harper:MAI-Voice-2 | 200 | 1631 ms | 2311 ms | 10.56 s | $0.003212 measured | audio/mpeg |
| OpenRouter direct | mistralai/voxtral-mini-tts-2603 / nova | 404 | 391 ms to error | 392 ms | n/a | n/a | Provider returned 404 |
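The per-sample costs above cover clips of different lengths, so a per-minute figure makes them easier to compare. The sketch below normalizes the cost and duration values from the results table. It is a rough comparison only: the Cloudflare costs are estimates, the Gateway Gemini row has no cost and is left out, and the direct Gemini clip is shorter than the others, which raises its per-minute figure.

```python
# (cost in USD, audio duration in seconds), copied from the results table.
SAMPLES = {
    "aura-2": (0.00438, 10.2),
    "aura-1": (0.00219, 8.057),
    "gemini-direct": (0.003991, 7.92),
    "kokoro": (0.00009052, 10.223),
    "mai-voice-2": (0.003212, 10.56),
}


def cost_per_minute(cost_usd: float, audio_seconds: float) -> float:
    """Scale a single-sample cost to one minute of generated audio."""
    return cost_usd / audio_seconds * 60.0


per_minute = {name: cost_per_minute(*vals) for name, vals in SAMPLES.items()}
for name, usd in sorted(per_minute.items(), key=lambda item: item[1]):
    print(f"{name:14s} ${usd:.5f} per audio minute")
```

A single short sample per provider is a small basis for pricing decisions, so these figures are best treated as a ranking aid rather than a forecast.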
Intelligibility Proxy
Each successful audio sample was converted to 16 kHz mono MP3 and
transcribed with Cloudflare Workers AI @cf/deepgram/nova-3.
This is not a replacement for human voice-quality review, but it catches
missing or garbled spoken content.
| Sample | STT status | STT latency | Word-error proxy | Transcript |
| --- | --- | --- | --- | --- |
| Cloudflare Aura-2 | 200 | 572 ms | 0.077 | take a read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| Cloudflare Aura-1 | 200 | 340 ms | 0.077 | tachy read aloud quality sample this is a strategy memo search the decision is whether to enter a new market now or wait for stronger evidence |
| CF AIG -> OpenRouter Gemini TTS | 200 | 264 ms | 0.423 | the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter Gemini TTS Direct | 200 | 292 ms | 0.308 | the strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter Kokoro | 200 | 300 ms | 0.115 | teddy red allowed quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter MAI-Voice-2 | 200 | 429 ms | 0.038 | peggy read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
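The report does not state how the word-error proxy was computed. The sketch below assumes it is the word-level edit distance between the quality prompt and the transcript, divided by the number of prompt words (26), after lowercasing and dropping punctuation. Under that assumption it gives 1/26 (about 0.038) for the MAI-Voice-2 transcript and 11/26 (about 0.423) for the Gateway Gemini transcript, in line with the values above. The function names are hypothetical.

```python
import re

REFERENCE = (
    "Tegy read aloud quality sample. This is a strategy memo excerpt: the "
    "decision is whether to enter a new market now, or wait for stronger "
    "evidence."
)


def words(text: str) -> list[str]:
    """Lowercase and keep only word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_proxy(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference word count."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # word missing from transcript
                cur[j - 1] + 1,                        # extra word in transcript
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

The high Gemini scores come almost entirely from dropped opening words rather than misheard ones, which is why the proxy is useful for catching omissions but says little about voice quality.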
Implementation Decision
Recommended next implementation
Build a Tegy-authenticated WebSocket that fetches the stored assistant
message server-side, streams provider audio, and feeds a Web Audio PCM
player. Start with Cloudflare Aura-2 as primary. Keep OpenRouter
Gemini TTS as a PCM alternate only after resolving the observed clipped
opening / text-omission behavior. Do not keep browser
speechSynthesis as the normal product path.
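One detail a streaming PCM player has to handle is that WebSocket frames need not end on a sample boundary. The sketch below illustrates the buffering logic in Python, since the browser player itself would be written against Web Audio. The class name PcmChunkDecoder is hypothetical, and signed 16-bit little-endian mono input is an assumption.

```python
import struct


class PcmChunkDecoder:
    """Turn arbitrary byte chunks of 16-bit PCM into float samples.

    A trailing odd byte is carried over to the next chunk so that a
    sample split across two frames is never decoded incorrectly.
    """

    def __init__(self) -> None:
        self._carry = b""

    def feed(self, chunk: bytes) -> list[float]:
        data = self._carry + chunk
        usable = len(data) - (len(data) % 2)  # whole samples only
        self._carry = data[usable:]
        if usable == 0:
            return []
        ints = struct.unpack(f"<{usable // 2}h", data[:usable])
        # Scale to the [-1.0, 1.0) range that audio buffers expect.
        return [value / 32768.0 for value in ints]


decoder = PcmChunkDecoder()
first = decoder.feed(b"\x00\x40\x00")  # one full sample plus half of the next
second = decoder.feed(b"\xc0")         # completes the split sample
```

The same carry-over idea applies in the browser, where the decoded floats would be copied into an audio buffer and scheduled back to back.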
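Use Tegy-managed R2 caching if replay cost matters. Cloudflare AI
Gateway successfully logged OpenRouter TTS requests, but cache probes
that repeated an identical request stayed MISS for the binary TTS
responses, so the Gateway cache cannot be relied on for replays.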
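A Tegy-managed cache needs a key that changes whenever the audio would change. The sketch below derives a deterministic object key from the text, model, voice and output format. The key layout and the function name tts_cache_key are hypothetical, not an existing Tegy or R2 convention.

```python
import hashlib
import json


def tts_cache_key(text: str, model: str, voice: str, audio_format: str) -> str:
    """Build a stable object key for one synthesized message.

    Any change to the text, model, voice or format yields a new key,
    so stale audio is never served for edited content.
    """
    payload = json.dumps(
        {"text": text, "model": model, "voice": voice, "format": audio_format},
        sort_keys=True,
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"tts/{digest}"


key = tts_cache_key(
    "The decision is whether to enter a new market now.",
    "@cf/deepgram/aura-2-en",
    "luna",
    "linear16",
)
```

Hashing the full request rather than the message id means an edited message is synthesized again instead of replaying outdated audio.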