Tegy Read-Aloud TTS Evidence

This report backs the read-aloud design with concrete provider samples. Claude.ai uses server-side TTS over an authenticated WebSocket streaming raw PCM into Web Audio; Tegy's current browser speechSynthesis path should be replaced.

Recommended MVP Primary

Cloudflare Aura-2

Fast first audio, existing Cloudflare credentials, no new vendor.

Best Claude-Like Alternate

Gemini TTS PCM

OpenRouter routed through Cloudflare AI Gateway (CF AIG) returns clean audio/pcm at 24 kHz.

Cheapest Sample

Kokoro

Very low measured cost, but the output is MP3 and its quality needs human approval.

Quality Prompt

Tegy read aloud quality sample. This is a strategy memo excerpt: the decision is whether to enter a new market now, or wait for stronger evidence.

Listen To The Generated Samples

Cloudflare Workers AI / Deepgram Aura-2

@cf/deepgram/aura-2-en, voice luna

status 200; first audio 344 ms; 10.2 s audio; $0.00438 estimated

Fastest viable production path using existing Cloudflare credentials. The request asked for linear16 encoding with container none. Cloudflare's REST response header said audio/mpeg, but the bytes played correctly as raw signed 16-bit PCM and were converted to WAV for this report.
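The conversion to WAV mentioned above amounts to prepending a 44-byte RIFF header to the raw samples. A minimal TypeScript sketch, assuming mono 16-bit little-endian PCM at the 16 kHz requested in this run; the function name wrapPcm16AsWav is hypothetical and not taken from the report's tooling:

```typescript
// Wrap raw signed 16-bit little-endian PCM in a minimal 44-byte WAV header.
// Assumption: mono audio at 16 kHz, matching the linear16 request in this report.
function wrapPcm16AsWav(pcm: Uint8Array, sampleRate = 16000, channels = 1): Uint8Array {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true);          // RIFF chunk size
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);                      // fmt chunk size for PCM
  view.setUint16(20, 1, true);                       // format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 8 * bytesPerSample, true);      // bits per sample
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true);              // data chunk size
  out.set(pcm, 44);                                  // samples follow the header unchanged
  return out;
}
```

Because the response header cannot be trusted here, a caller would need to pass the sample rate it requested rather than infer it from the response. At 16 kHz mono, 10.2 s of audio is roughly 326,400 bytes of PCM.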

Cloudflare Workers AI / Deepgram Aura-1

@cf/deepgram/aura-1, voice luna

status 200; first audio 345 ms; 8.057 s audio; $0.00219 estimated

Lower-cost Cloudflare option. Same header mismatch as Aura-2. Useful fallback if quality is acceptable after listening.

Cloudflare AI Gateway -> OpenRouter Gemini TTS

google/gemini-3.1-flash-tts-preview, voice Kore

status 200; first audio 1108 ms; 5.48 s audio; CF AIG cache MISS

Most Claude-like existing OpenRouter path because it returns audio/pcm;rate=24000;channels=1 and passes through Cloudflare AI Gateway logging. Gateway log id: 01KW97Y5H8M68ZGGBWRQ4CE2B6.

OpenRouter Gemini TTS Direct

google/gemini-3.1-flash-tts-preview, voice Kore

status 200; first audio 1625 ms; 7.92 s audio; $0.003991 measured

Direct OpenRouter sample with generation metadata available after a short delay. Useful for cost accounting; production should prefer the CF AIG route for Gateway observability if we use Gemini.

OpenRouter Kokoro

hexgrad/kokoro-82m, voice af_heart

status 200; first audio 1273 ms; 10.223 s audio; $0.00009052 measured

Cost winner in this bakeoff, but it is MP3 rather than raw PCM. It may be a cheap fallback if subjective quality is good enough.

OpenRouter MAI-Voice-2

microsoft/mai-voice-2, voice en-US-Harper:MAI-Voice-2

status 200; first audio 1631 ms; 10.56 s audio; $0.003212 measured

MP3 output with measured cost close to Gemini direct on this sample. Less aligned with Claude's PCM path and not faster in this run.

Measured Results

| Provider route | Model / voice | Status | First audio | Total | Audio duration | Cost | Format |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cloudflare Workers AI REST | @cf/deepgram/aura-2-en / luna | 200 | 344 ms | 4065 ms | 10.2 s | $0.00438 estimated | Requested PCM 16 kHz; header said audio/mpeg |
| Cloudflare Workers AI REST | @cf/deepgram/aura-1 / luna | 200 | 345 ms | 463 ms | 8.057 s | $0.00219 estimated | Requested PCM 16 kHz; header said audio/mpeg |
| CF AI Gateway -> OpenRouter | google/gemini-3.1-flash-tts-preview / Kore | 200 | 1108 ms | 4080 ms | 5.48 s | Gateway route did not return OpenRouter cost metadata | audio/pcm;rate=24000;channels=1 |
| OpenRouter direct | google/gemini-3.1-flash-tts-preview / Kore | 200 | 1625 ms | 3915 ms | 7.92 s | $0.003991 measured | audio/pcm;rate=24000;channels=1 |
| OpenRouter direct | hexgrad/kokoro-82m / af_heart | 200 | 1273 ms | 1922 ms | 10.223 s | $0.00009052 measured | audio/mpeg |
| OpenRouter direct | microsoft/mai-voice-2 / en-US-Harper:MAI-Voice-2 | 200 | 1631 ms | 2311 ms | 10.56 s | $0.003212 measured | audio/mpeg |
| OpenRouter direct | mistralai/voxtral-mini-tts-2603 / nova | 404 | 391 ms to error | 392 ms | n/a | n/a | Provider returned 404 |
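The rows above are easier to compare once they are normalized by audio duration. The sketch below derives a real-time factor (total time divided by audio duration) and a cost per minute of generated audio. Both derived metrics and the TtsSample shape are illustrative assumptions, not values taken from the measurements file; the input numbers are copied from the table.

```typescript
// Hypothetical record shape; the numbers below are copied from the table in this report.
interface TtsSample {
  label: string;
  totalMs: number;        // total request time
  audioSeconds: number;   // duration of the returned audio
  costUsd: number | null; // null when the route reported no cost
}

// Real-time factor: below 1 means the audio was produced faster than it plays back.
function realTimeFactor(s: TtsSample): number {
  return s.totalMs / 1000 / s.audioSeconds;
}

// Cost normalized to one minute of generated audio, or null when cost is unknown.
function costPerAudioMinute(s: TtsSample): number | null {
  return s.costUsd === null ? null : (s.costUsd / s.audioSeconds) * 60;
}

const ttsSamples: TtsSample[] = [
  { label: "Aura-2", totalMs: 4065, audioSeconds: 10.2, costUsd: 0.00438 },
  { label: "Gemini via CF AIG", totalMs: 4080, audioSeconds: 5.48, costUsd: null },
  { label: "Kokoro", totalMs: 1922, audioSeconds: 10.223, costUsd: 0.00009052 },
];

for (const s of ttsSamples) {
  const cost = costPerAudioMinute(s);
  console.log(s.label, realTimeFactor(s).toFixed(2), cost === null ? "n/a" : cost.toFixed(5));
}
```

These normalized figures need care: the Gemini samples omitted part of the prompt, so their audio durations are shorter than a complete reading would be, and the Cloudflare costs are estimates rather than measured charges.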

Intelligibility Proxy

Each successful audio sample was converted to 16 kHz mono MP3 and transcribed with Cloudflare Workers AI @cf/deepgram/nova-3. This is not a replacement for human voice-quality review, but it catches missing or garbled spoken content.

| Sample | STT status | STT latency | Word-error proxy | Transcript |
| --- | --- | --- | --- | --- |
| Cloudflare Aura-2 | 200 | 572 ms | 0.077 | take a read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| Cloudflare Aura-1 | 200 | 340 ms | 0.077 | tachy read aloud quality sample this is a strategy memo search the decision is whether to enter a new market now or wait for stronger evidence |
| CF AIG -> OpenRouter Gemini TTS | 200 | 264 ms | 0.423 | the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter Gemini TTS Direct | 200 | 292 ms | 0.308 | the strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter Kokoro | 200 | 300 ms | 0.115 | teddy red allowed quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
| OpenRouter MAI-Voice-2 | 200 | 429 ms | 0.038 | peggy read aloud quality sample this is a strategy memo excerpt the decision is whether to enter a new market now or wait for stronger evidence |
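The word-error proxy values above are consistent with a word-level edit distance divided by the 26 words of the quality prompt. For example, the 11 words missing from the CF AIG Gemini transcript give 11/26 ≈ 0.423. A minimal sketch of that calculation follows, assuming lowercase normalization with punctuation stripped. The report's actual scoring script is not shown, so the function names here are hypothetical.

```typescript
// Lowercase, replace punctuation with spaces, and split into words.
function tokenizeWords(text: string): string[] {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").split(/\s+/).filter((w) => w.length > 0);
}

// Word-level Levenshtein distance divided by the number of reference words.
function wordErrorProxy(reference: string, hypothesis: string): number {
  const ref = tokenizeWords(reference);
  const hyp = tokenizeWords(hypothesis);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  // prev[j] holds the edit distance between the reference words processed so far
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1]! + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j]! + 1;
      const insertion = curr[j - 1]! + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length]! / ref.length;
}

const qualityPrompt =
  "Tegy read aloud quality sample. This is a strategy memo excerpt: the decision is " +
  "whether to enter a new market now, or wait for stronger evidence.";
const geminiViaGateway =
  "the decision is whether to enter a new market now or wait for stronger evidence";
console.log(wordErrorProxy(qualityPrompt, geminiViaGateway).toFixed(3)); // 0.423
```

A product name such as "Tegy" is likely to be mis-transcribed by any STT model, so one error on the first word says little about the TTS voice itself. The large scores on the Gemini rows are different in kind: they come from missing opening words.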

Implementation Decision

Recommended next implementation

Build a Tegy-authenticated WebSocket that fetches the stored assistant message server-side, streams provider audio, and feeds a Web Audio PCM player. Start with Cloudflare Aura-2 as primary. Adopt OpenRouter Gemini TTS as a PCM alternate only after resolving the observed clipped opening and text omission: both Gemini transcripts dropped the opening words of the prompt. Do not keep browser speechSynthesis as the normal product path.
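On the client side, the PCM player has to turn binary WebSocket frames into float samples before handing them to Web Audio. Below is a minimal sketch of that decoding step, assuming signed 16-bit little-endian mono PCM and allowing for frames that split a sample across a chunk boundary. The PcmChunkDecoder class is hypothetical, and the WebSocket and AudioContext wiring is left out.

```typescript
// Decodes a stream of signed 16-bit little-endian PCM chunks into float samples in [-1, 1).
// Assumption: a chunk may end mid-sample, so one leftover byte is carried between calls.
class PcmChunkDecoder {
  private leftover: number | null = null;

  decode(chunk: Uint8Array): Float32Array {
    // Prepend the byte held back from the previous chunk, if any.
    let bytes = chunk;
    if (this.leftover !== null) {
      bytes = new Uint8Array(chunk.length + 1);
      bytes[0] = this.leftover;
      bytes.set(chunk, 1);
      this.leftover = null;
    }
    // Hold back a trailing odd byte until the next chunk completes the sample.
    if (bytes.length % 2 === 1) {
      this.leftover = bytes[bytes.length - 1]!;
    }
    const sampleCount = Math.floor(bytes.length / 2);
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const samples = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      samples[i] = view.getInt16(i * 2, true) / 32768;
    }
    return samples;
  }
}
```

Each decoded Float32Array could then be copied into an AudioBuffer with copyToChannel and scheduled back to back with AudioBufferSourceNode.start(when). The buffer would use the sample rate the provider actually delivers: 16 kHz as requested from Aura-2 in this run, or 24 kHz for the Gemini route.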

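Use Tegy-managed R2 caching if replay cost matters. Cloudflare AI Gateway successfully logged the OpenRouter TTS requests, but repeated identical requests with binary TTS responses still returned cache MISS, so Gateway caching did not reduce replay cost in this test.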

Sources

Measurements file: assets/measurements.json. Generated locally on 2026-06-29. Audio samples are real provider outputs, not mock audio.