Tegy Model Cost & Intelligence Delta

Executive comparison of Qwen (the current Tegy runtime), Gemini Flash-Lite, GLM 5.2, the latest Mistral, and the latest Anthropic Sonnet for Tegy's StrategyOS-heavy agent workload.

Date: 2026-06-22
Current runtime: qwen/qwen3.7-plus
Provider route: OpenRouter via Cloudflare AI Gateway
Turn cap observed locally: $30

Executive Read

Tegy currently runs on qwen/qwen3.7-plus, not Gemini Flash-Lite. Qwen and Gemini Flash-Lite are effectively the low-cost tier. Gemini Flash-Lite is slightly cheaper on input ($0.25 vs $0.32 per 1M) but slightly more expensive on output ($1.50 vs $1.28 per 1M), so Qwen becomes the cheaper of the two once output exceeds roughly 32% of input tokens, which output-heavy StrategyOS turns can reach. GLM 5.2 sits between the cheap tier and the premium tier: roughly 3.1x Qwen's token price, about 1M context, but text-only in the current OpenRouter catalog.

Mistral Medium 3.5 is meaningfully more expensive (roughly 4.7-5.9x Qwen's token price) and has a smaller 256K context window, but it is positioned as stronger for agentic, coding, and multi-tool workflows. Sonnet 4.6 remains the quality benchmark and the likely best behavioral fit, but it costs roughly 9-12x Qwen's token price. Unless we accept a major gross-margin hit, it should be treated as a benchmark/rescue model, not the default.

Recommendation: keep Qwen as the production default for now, run a controlled bakeoff against Gemini 3.1 Flash-Lite, GLM 5.2, and Mistral Medium 3.5 on real Tegy tasks, and use Sonnet 4.6 only as the gold-standard comparator or explicit premium path.

Model Cards

Qwen3.7 Plus

current

qwen/qwen3.7-plus

Price: $0.32 / $1.28 per 1M input/output tokens.
Context: 1M tokens.
Vision: yes, text+image input.
OpenRouter modalities: text, image → text.

Gemini 3.1 Flash-Lite

cheapest input

google/gemini-3.1-flash-lite

Price: $0.25 / $1.50 per 1M input/output tokens.
Context: 1.05M tokens.
Vision: yes, plus file/audio/video input.
Best use: high-volume, latency/cost-sensitive work.

Mistral Medium 3.5

agentic contender

mistralai/mistral-medium-3-5

Price: $1.50 / $7.50 per 1M input/output tokens.
Context: 256K tokens.
Vision: yes, text+image+file input.
Best use: focused agentic/tool tasks where context fits.

GLM 5.2

text-only challenger

z-ai/glm-5.2

Price: $1.00 / $4.00 per 1M input/output tokens.
Context: 1.05M tokens.
Vision: no in the current catalog; text-only input.
Best use: text-heavy long-context strategy runs if behavior is strong enough.

Claude Sonnet 4.6

premium

anthropic/claude-sonnet-4.6

Price: $3.00 / $15.00 per 1M input/output tokens.
Context: 1M tokens on current catalog/API surfaces.
Vision: yes, text+image+file input.
Best use: quality benchmark, premium/rescue mode, hard reasoning.

Cost Deltas

Model	Input / 1M	Output / 1M	Input vs Qwen	Output vs Qwen	1M in + 200K out	1M in + 500K out
Qwen3.7 Plus	$0.32	$1.28	1.00x	1.00x	$0.576	$0.960
Gemini 3.1 Flash-Lite	$0.25	$1.50	0.78x	1.17x	$0.550	$1.000
Mistral Medium 3.5	$1.50	$7.50	4.69x	5.86x	$3.000	$5.250
GLM 5.2	$1.00	$4.00	3.13x	3.13x	$1.800	$3.000
Claude Sonnet 4.6	$3.00	$15.00	9.38x	11.72x	$6.000	$10.500

Tegy is token-hungry. On long StrategyOS turns, output tokens and repeated context dominate. Qwen and Gemini Flash-Lite are in the same economic band; GLM 5.2 costs about 3.1x Qwen, and Mistral and Sonnet change the unit economics materially.
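The scenario columns in the table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the prices are copied from the table, the function names are hypothetical, and it ignores prompt caching, gateway fees, and retries, all of which would change real invoices.

```python
# Prices in USD per 1M tokens (input, output), copied from the Cost Deltas table.
PRICES = {
    "Qwen3.7 Plus": (0.32, 1.28),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Mistral Medium 3.5": (1.50, 7.50),
    "GLM 5.2": (1.00, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Raw token cost in USD for one turn (no caching, fees, or retries)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def break_even_output_ratio(cheaper_input: str, cheaper_output: str) -> float:
    """Output/input token ratio at which the two models cost the same.

    `cheaper_input` has the lower input price, `cheaper_output` the lower
    output price. Above the returned ratio, `cheaper_output` wins.
    """
    in_a, out_a = PRICES[cheaper_input]
    in_b, out_b = PRICES[cheaper_output]
    return (in_b - in_a) / (out_a - out_b)


if __name__ == "__main__":
    for name in PRICES:
        light = turn_cost(name, 1_000_000, 200_000)
        heavy = turn_cost(name, 1_000_000, 500_000)
        print(f"{name:24s} ${light:.3f}  ${heavy:.3f}")
    ratio = break_even_output_ratio("Gemini 3.1 Flash-Lite", "Qwen3.7 Plus")
    print(f"Qwen is cheaper than Gemini above {ratio:.2f} output tokens per input token")
```

The break-even ratio works out to about 0.32, which is why Gemini Flash-Lite is cheaper in the 200K-output column and Qwen is cheaper in the 500K-output column.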

Capabilities & Vision

Model	Catalog modality	Vision	File / PDF suitability	Context risk
Qwen3.7 Plus	text+image → text	Yes	Images yes; file/PDF handling depends on Tegy's extraction/attachment path.	Low: 1M context.
Gemini 3.1 Flash-Lite	text+image+file+audio+video → text	Yes	Best broad multimodal coverage in this comparison.	Low: ~1.05M context.
Mistral Medium 3.5	text+image+file → text	Yes	Good fit for screenshots/docs if the turn fits 256K context.	Medium/high: 256K context may be tight for StrategyOS-heavy runs.
GLM 5.2	text → text	No	Not suitable for image/screenshot turns unless Tegy extracts all content to text first.	Low: ~1.05M context.
Claude Sonnet 4.6	text+image+file → text	Yes	Strong fit for docs, images, and nuanced strategy reasoning.	Low: 1M context.
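The table above amounts to an eligibility check per turn: does the model accept the attachment types, and does the turn fit its context window? A minimal sketch of that check follows. The `ModelSpec` type, the `eligible_models` function, and the 80% context headroom are assumptions for illustration, not part of Tegy's runtime; 256K is approximated as 256,000 tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    slug: str
    context_tokens: int
    inputs: frozenset


# Context sizes and input modalities as listed in the comparison tables.
CATALOG = [
    ModelSpec("qwen/qwen3.7-plus", 1_000_000, frozenset({"text", "image"})),
    ModelSpec("google/gemini-3.1-flash-lite", 1_050_000,
              frozenset({"text", "image", "file", "audio", "video"})),
    ModelSpec("mistralai/mistral-medium-3-5", 256_000,
              frozenset({"text", "image", "file"})),
    ModelSpec("z-ai/glm-5.2", 1_050_000, frozenset({"text"})),
    ModelSpec("anthropic/claude-sonnet-4.6", 1_000_000,
              frozenset({"text", "image", "file"})),
]


def eligible_models(turn_tokens, needed_inputs, headroom=0.8):
    """Return slugs that accept the needed inputs and fit the turn.

    `headroom` is an assumed safety margin: only that fraction of the
    context window is treated as usable, leaving room for output.
    """
    needed = frozenset(needed_inputs) | {"text"}
    return [
        m.slug
        for m in CATALOG
        if needed <= m.inputs and turn_tokens <= m.context_tokens * headroom
    ]


if __name__ == "__main__":
    # A 300K-token turn with a screenshot rules out Mistral (context)
    # and GLM 5.2 (no image input).
    print(eligible_models(300_000, {"image"}))
```

Whether native file input is actually needed depends on Tegy's extraction/attachment path, so a real check would treat "file" as optional wherever text extraction already covers the document.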

Intelligence / Behavior Delta

Practical ranking for Tegy

Claude Sonnet 4.6: best expected instruction following, tool-use behavior, and consultant-quality narrative; too expensive as default.
Mistral Medium 3.5: plausible agentic/tool-use upgrade over Qwen/Gemini; context window is the main concern.
GLM 5.2: plausible long-context text-only challenger; useful only if attachment extraction covers the workflow because it lacks vision.
Qwen3.7 Plus: current baseline; strong economics and 1M context; acceptable default if behavior is good enough.
Gemini 3.1 Flash-Lite: best broad multimodal/cost candidate; likely weaker for nuanced strategy judgment than Sonnet and probably weaker than Mistral for long-horizon agentic work.

What this means operationally

If we optimize for gross margin: keep Qwen or test Gemini Flash-Lite.
If we optimize for text-only long-context runs: test GLM 5.2, but do not use it for image/screenshot workflows.
If we optimize for capability at moderate cost: test Mistral Medium 3.5, but only on tasks under 256K context.
If we optimize for best possible output: Sonnet 4.6 wins, but should be premium/rescue/benchmark because of price.
If vision/file support matters broadly: Gemini Flash-Lite has the widest listed input modalities.

Recommendation

Do not switch the default directly to Sonnet. Use it as the benchmark and possibly as a paid premium/rescue tier.
Keep Qwen as current default until a live bakeoff says otherwise. It is cheap, image-capable, and has 1M context.
Run Gemini Flash-Lite as the first challenger. It is economically close to Qwen, has broader multimodal support, and could improve PDF/file-heavy flows if Claude SDK/OpenRouter compatibility is stable.
Run GLM 5.2 as the text-only long-context challenger. It is cheaper than Mistral/Sonnet and has roughly 1M context, but it cannot cover vision turns.
Run Mistral Medium 3.5 as the quality challenger. It may improve tool-use/agentic reliability, but its 256K context makes it risky for Tegy's longest strategy runs.
Measure on real Tegy tasks, not generic benchmarks. Cover Sprinta PDF/DOCX analysis, questionnaire emergence, StrategyOS tool use, multi-agent behavior, and artifact generation, and track cost, latency, and Cloudflare AI Gateway (CF AIG) cache behavior.

Proposed Bakeoff Criteria

Criterion	Why it matters	Pass signal
StrategyOS usage	Tegy is not a generic chatbot.	Model naturally invokes/uses StrategyOS structures where appropriate.
Questionnaire behavior	Consultant-like intake is core UX.	Asks useful questions without fake/forced UI responses.
Document analysis	PDF/DOCX workflows are central.	Correctly handles Sprinta PDF and DOCX tests.
Vision	User screenshots and deck images matter.	Accepts image attachments through current route without provider errors.
Long-turn stability	StrategyOS can run for many minutes and many tokens.	No premature empty-output, budget-cap, or context-overflow failures under a realistic turn cap.
Cost per successful answer	Raw token price is not enough.	Cheaper model is only better if success rate and retries stay acceptable.
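The last criterion can be made concrete with a simple retry model. The sketch below assumes each attempt costs the same and succeeds independently, and that failed turns are retried until one succeeds, so the expected cost per successful answer is the attempt cost divided by the success rate. The function names and the success rates in the example are hypothetical placeholders, not bakeoff results.

```python
def cost_per_successful_answer(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful answer under retry-until-success.

    Assumes independent attempts with identical cost, so the expected
    number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def min_success_rate_to_stay_cheaper(
    cheap_cost: float, expensive_cost: float, expensive_success_rate: float
) -> float:
    """Success rate the cheap model needs to match the expensive one on cost."""
    return expensive_success_rate * cheap_cost / expensive_cost


if __name__ == "__main__":
    # Attempt costs use the 1M in + 500K out column; success rates are made up.
    qwen = cost_per_successful_answer(0.96, success_rate=0.60)
    mistral = cost_per_successful_answer(5.25, success_rate=0.90)
    print(f"Qwen ${qwen:.2f} vs Mistral ${mistral:.2f} per successful answer")
    floor = min_success_rate_to_stay_cheaper(0.96, 5.25, 0.90)
    print(f"Qwen stays cheaper while its success rate is above {floor:.0%}")
```

On pure token cost the cheap tier tolerates a low success rate before it loses, so the bakeoff should also weigh latency and the user-facing cost of failed or retried turns, which this model leaves out.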

Sources

Local runtime check: CLAUDE_AGENT_MODEL=qwen/qwen3.7-plus; TEGY_PROVIDER_TURN_PRICE_CAP_USD=30.
OpenRouter public model catalog API: https://openrouter.ai/api/v1/models
Qwen3.7 Plus OpenRouter listing: https://openrouter.ai/qwen/qwen3.7-plus
Gemini 3.1 Flash-Lite OpenRouter listing: https://openrouter.ai/google/gemini-3.1-flash-lite
Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing
GLM 5.2 OpenRouter listing: https://openrouter.ai/z-ai/glm-5.2
Mistral models overview: https://docs.mistral.ai/models/overview
Mistral Medium 3.5 OpenRouter listing: https://openrouter.ai/mistralai/mistral-medium-3-5
Claude pricing and model overview: https://docs.anthropic.com/en/docs/about-claude/pricing, https://docs.anthropic.com/en/docs/about-claude/models/overview
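Prices and context limits in this comparison can drift, so the figures are worth re-checking against the OpenRouter catalog API before the bakeoff. The sketch below shows one way to do that. The response field names (`data`, `id`, `pricing.prompt`, `pricing.completion`, `context_length`, `architecture.input_modalities`) are assumptions about the response shape and should be confirmed against a live response; `fetch_catalog` and `summarize` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

CATALOG_URL = "https://openrouter.ai/api/v1/models"


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download the public model catalog as parsed JSON."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize(payload: dict, slugs) -> dict:
    """Extract per-1M-token prices, context, and input modalities.

    Assumes per-token USD prices are stored as strings under `pricing`.
    """
    wanted = set(slugs)
    summary = {}
    for entry in payload.get("data", []):
        if entry.get("id") not in wanted:
            continue
        pricing = entry.get("pricing", {})
        summary[entry["id"]] = {
            "input_per_1m": float(pricing.get("prompt", "nan")) * 1_000_000,
            "output_per_1m": float(pricing.get("completion", "nan")) * 1_000_000,
            "context": entry.get("context_length"),
            "inputs": entry.get("architecture", {}).get("input_modalities", []),
        }
    return summary


if __name__ == "__main__":
    slugs = ["qwen/qwen3.7-plus", "google/gemini-3.1-flash-lite"]
    for slug, info in summarize(fetch_catalog(), slugs).items():
        print(slug, info)
```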