Tegy Model Cost & Intelligence Delta
Executive comparison of Qwen (the current Tegy runtime), Gemini Flash-Lite, GLM 5.2, the latest Mistral, and the latest Anthropic Sonnet for Tegy's StrategyOS-heavy agent workload.
Executive Read
Tegy is currently on qwen/qwen3.7-plus, not Gemini Flash-Lite.
Qwen and Gemini Flash-Lite are effectively the low-cost tier. Gemini Flash-Lite is about 22% cheaper on input but about 17% more expensive on output, so for output-heavy StrategyOS turns Qwen can still be a little cheaper: at 1M input + 200K output Gemini wins ($0.550 vs $0.576), while at 1M input + 500K output Qwen wins ($0.960 vs $1.000). GLM 5.2 sits between the cheap tier and the premium tier: roughly 3.1x Qwen's token price, ~1.05M context, but text-only in the current OpenRouter catalog.
Mistral Medium 3.5 is meaningfully more expensive, with a smaller 256K context window, but is positioned as stronger for agentic, coding, and multi-tool workflows. Sonnet 4.6 remains the quality benchmark and likely best behavioral fit, but it is roughly 9-12x Qwen's token price and should be treated as a benchmark/rescue model, not the default, unless we accept a major gross-margin hit.
Recommendation: keep Qwen as the production default for now, run a controlled bakeoff against Gemini 3.1 Flash-Lite and Mistral Medium 3.5 on real Tegy tasks, and use Sonnet 4.6 only as the gold-standard comparator or explicit premium path.
Model Cards
Qwen3.7 Plus
- Tag: current. Model ID: qwen/qwen3.7-plus
- Price: $0.32 / $1.28 per 1M input/output tokens.
- Context: 1M tokens.
- Vision: yes, text+image input.
- OpenRouter modalities: text, image → text.
Gemini 3.1 Flash-Lite
- Tag: cheapest input. Model ID: google/gemini-3.1-flash-lite
- Price: $0.25 / $1.50 per 1M input/output tokens.
- Context: 1.05M tokens.
- Vision: yes, plus file/audio/video input.
- Best use: high-volume, latency/cost-sensitive work.
Mistral Medium 3.5
- Tag: agentic contender. Model ID: mistralai/mistral-medium-3-5
- Price: $1.50 / $7.50 per 1M input/output tokens.
- Context: 256K tokens.
- Vision: yes, text+image+file input.
- Best use: focused agentic/tool tasks where context fits.
GLM 5.2
- Tag: text-only challenger. Model ID: z-ai/glm-5.2
- Price: $1.00 / $4.00 per 1M input/output tokens.
- Context: 1.05M tokens.
- Vision: no in the current catalog; text-only input.
- Best use: text-heavy long-context strategy runs if behavior is strong enough.
Claude Sonnet 4.6
- Tag: premium. Model ID: anthropic/claude-sonnet-4.6
- Price: $3.00 / $15.00 per 1M input/output tokens.
- Context: 1M tokens on current catalog/API surfaces.
- Vision: yes, text+image+file input.
- Best use: quality benchmark, premium/rescue mode, hard reasoning.
Cost Deltas
| Model | Input / 1M | Output / 1M | Input vs Qwen | Output vs Qwen | 1M in + 200K out | 1M in + 500K out |
|---|---|---|---|---|---|---|
| Qwen3.7 Plus | $0.32 | $1.28 | 1.00x | 1.00x | $0.576 | $0.960 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 0.78x | 1.17x | $0.550 | $1.000 |
| Mistral Medium 3.5 | $1.50 | $7.50 | 4.69x | 5.86x | $3.000 | $5.250 |
| GLM 5.2 | $1.00 | $4.00 | 3.13x | 3.13x | $1.800 | $3.000 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 9.38x | 11.72x | $6.000 | $10.500 |
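
The arithmetic behind the table can be reproduced with a short script. This is a minimal sketch that assumes flat list prices with no caching, batching, or volume discounts; the function names are illustrative and not part of Tegy's codebase.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Qwen3.7 Plus": (0.32, 1.28),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Mistral Medium 3.5": (1.50, 7.50),
    "GLM 5.2": (1.00, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}
BASELINE = "Qwen3.7 Plus"


def turn_cost(model, input_tokens, output_tokens):
    """USD cost of a workload at list price (assumes no caching or discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def price_ratio(model):
    """(input, output) price multiples relative to the Qwen baseline."""
    base_in, base_out = PRICES[BASELINE]
    price_in, price_out = PRICES[model]
    return price_in / base_in, price_out / base_out


for model in PRICES:
    in_x, out_x = price_ratio(model)
    light = turn_cost(model, 1_000_000, 200_000)
    heavy = turn_cost(model, 1_000_000, 500_000)
    print(f"{model:<22} {in_x:6.3f}x {out_x:6.3f}x  ${light:.3f}  ${heavy:.3f}")
```

The printed multiples carry three decimals, so they may differ from the table's two-decimal figures in the last digit. Swapping in other token mixes is a quick way to see how each model behaves on a specific Tegy turn profile.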
Tegy is token-hungry. On long StrategyOS turns, output tokens and repeated context dominate. Qwen and Gemini Flash-Lite are in the same economic band; GLM 5.2 sits above them at roughly 3.1x Qwen, and Mistral and Sonnet change the unit economics materially.
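
Because Gemini Flash-Lite is cheaper on input and Qwen is cheaper on output, there is a crossover point in the output-to-input ratio where the two cost the same. The sketch below computes it from the list prices in the table; it assumes flat per-token pricing and ignores caching, and the function name is hypothetical.

```python
def break_even_output_ratio(cheap_input_model, cheap_output_model):
    """Output tokens per input token at which two models cost the same.

    Each argument is an (input_price, output_price) pair in USD per 1M tokens.
    The first model must be cheaper on input, the second cheaper on output.
    """
    in_a, out_a = cheap_input_model
    in_b, out_b = cheap_output_model
    input_saving = in_b - in_a        # what the first model saves per input token
    output_penalty = out_a - out_b    # what the first model adds per output token
    if input_saving <= 0 or output_penalty <= 0:
        raise ValueError("no crossover: one model is cheaper on both sides")
    return input_saving / output_penalty


GEMINI_FLASH_LITE = (0.25, 1.50)
QWEN = (0.32, 1.28)

ratio = break_even_output_ratio(GEMINI_FLASH_LITE, QWEN)
print(f"break-even at {ratio:.3f} output tokens per input token")
```

At list prices the crossover is roughly 0.32 output tokens per input token: turns that emit less than that favor Gemini Flash-Lite, and turns that emit more favor Qwen. This is consistent with the 200K and 500K output columns in the table.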
Capabilities & Vision
| Model | Catalog modality | Vision | File / PDF suitability | Context risk |
|---|---|---|---|---|
| Qwen3.7 Plus | text+image → text | Yes | Images yes; file/PDF handling depends on Tegy's extraction/attachment path. | Low: 1M context. |
| Gemini 3.1 Flash-Lite | text+image+file+audio+video → text | Yes | Best broad multimodal coverage in this comparison. | Low: ~1.05M context. |
| Mistral Medium 3.5 | text+image+file → text | Yes | Good fit for screenshots/docs if the turn fits 256K context. | Medium/high: 256K context may be tight for StrategyOS-heavy runs. |
| GLM 5.2 | text → text | No | Not suitable for image/screenshot turns unless Tegy extracts all content to text first. | Low: ~1.05M context. |
| Claude Sonnet 4.6 | text+image+file → text | Yes | Strong fit for docs, images, and nuanced strategy reasoning. | Low: 1M context. |
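
The table can be read as a filter: a model is a candidate for a turn only if it accepts every attached input type and the turn fits its context window. The following is a minimal sketch of that filter. The context sizes are rounded from the figures above, and the `headroom` safety factor, the class, and the function are assumptions for illustration, not Tegy's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_tokens: int        # rounded from the catalog figures above
    modalities: frozenset      # accepted input modalities


CATALOG = [
    ModelSpec("Qwen3.7 Plus", 1_000_000, frozenset({"text", "image"})),
    ModelSpec("Gemini 3.1 Flash-Lite", 1_050_000,
              frozenset({"text", "image", "file", "audio", "video"})),
    ModelSpec("Mistral Medium 3.5", 256_000, frozenset({"text", "image", "file"})),
    ModelSpec("GLM 5.2", 1_050_000, frozenset({"text"})),
    ModelSpec("Claude Sonnet 4.6", 1_000_000, frozenset({"text", "image", "file"})),
]


def eligible_models(required_modalities, estimated_tokens, headroom=0.8):
    """Names of models that accept all required inputs and fit the turn.

    headroom is an assumed safety factor: only that fraction of the context
    window is treated as usable, leaving room for tool output and replies.
    """
    needed = frozenset(required_modalities)
    return [
        m.name
        for m in CATALOG
        if needed <= m.modalities
        and estimated_tokens <= m.context_tokens * headroom
    ]


print(eligible_models({"text", "image"}, 300_000))
print(eligible_models({"text"}, 150_000))
```

With these assumptions, a 300K-token screenshot turn excludes both Mistral Medium 3.5 (context) and GLM 5.2 (no vision), while a 150K-token text-only turn leaves all five models in play.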
Intelligence / Behavior Delta
Practical ranking for Tegy
- Claude Sonnet 4.6: best expected instruction following, tool-use behavior, and consultant-quality narrative; too expensive as default.
- Mistral Medium 3.5: plausible agentic/tool-use upgrade over Qwen/Gemini; context window is the main concern.
- GLM 5.2: plausible long-context text-only challenger; useful only if attachment extraction covers the workflow because it lacks vision.
- Qwen3.7 Plus: current baseline; strong economics and 1M context; acceptable default if behavior is good enough.
- Gemini 3.1 Flash-Lite: best broad multimodal/cost candidate; likely weaker for nuanced strategy judgment than Sonnet and probably weaker than Mistral for long-horizon agentic work.
What this means operationally
- If we optimize for gross margin: keep Qwen or test Gemini Flash-Lite.
- If we optimize for text-only long-context runs: test GLM 5.2, but do not use it for image/screenshot workflows.
- If we optimize for capability at moderate cost: test Mistral Medium 3.5, but only on tasks under 256K context.
- If we optimize for best possible output: Sonnet 4.6 wins, but should be premium/rescue/benchmark because of price.
- If vision/file support matters broadly: Gemini Flash-Lite has the widest listed input modalities.
Recommendation
- Do not switch default directly to Sonnet. Use it as the benchmark and maybe a paid premium/rescue tier.
- Keep Qwen as current default until a live bakeoff says otherwise. It is cheap, image-capable, and has 1M context.
- Run Gemini Flash-Lite as the first challenger. It is economically close to Qwen, has broader multimodal support, and could improve PDF/file-heavy flows if Claude SDK/OpenRouter compatibility is stable.
- Run GLM 5.2 as the text-only long-context challenger. It is cheaper than Mistral/Sonnet and has ~1.05M context, but it cannot cover vision turns.
- Run Mistral Medium 3.5 as the quality challenger. It may improve tool-use/agentic reliability, but its 256K context makes it risky for Tegy's longest strategy runs.
- Measure on real Tegy tasks, not generic benchmarks. Use Sprinta PDF/DOCX, questionnaire emergence, StrategyOS tool use, multi-agent behavior, artifact generation, cost, latency, and CF AIG cache behavior.
Proposed Bakeoff Criteria
| Criterion | Why it matters | Pass signal |
|---|---|---|
| StrategyOS usage | Tegy is not a generic chatbot. | Model naturally invokes/uses StrategyOS structures where appropriate. |
| Questionnaire behavior | Consultant-like intake is core UX. | Asks useful questions without fake/forced UI responses. |
| Document analysis | PDF/DOCX workflows are central. | Correctly handles Sprinta PDF and DOCX tests. |
| Vision | User screenshots and deck images matter. | Accepts image attachments through current route without provider errors. |
| Long-turn stability | StrategyOS can run for many minutes and consume many tokens. | No premature no-output, budget, or context failures under a realistic cap. |
| Cost per successful answer | Raw token price is not enough. | Cheaper model is only better if success rate and retries stay acceptable. |
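
The last criterion can be made concrete with a simple expected-cost model. The sketch below assumes failed attempts are retried until one succeeds, that attempts are independent, and that each attempt costs the same; under those assumptions the expected cost per successful answer is the per-attempt cost divided by the success rate. The success rates in the example are placeholders, not measured Tegy results.

```python
def cost_per_success(cost_per_attempt, success_rate):
    """Expected USD spent per successful answer, retrying until success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(cheap_cost, premium_cost, premium_success_rate):
    """Success rate the cheaper model needs to match the premium model's
    expected cost per successful answer."""
    return premium_success_rate * cheap_cost / premium_cost


# Per-attempt costs for 1M input + 500K output, from the Cost Deltas table.
QWEN_COST = 0.960
MISTRAL_COST = 5.250

# Placeholder success rates for illustration only; replace with bakeoff data.
qwen_expected = cost_per_success(QWEN_COST, 0.60)
mistral_expected = cost_per_success(MISTRAL_COST, 0.95)
threshold = break_even_success_rate(QWEN_COST, MISTRAL_COST, 0.95)

print(f"Qwen: ${qwen_expected:.2f}, Mistral: ${mistral_expected:.2f}")
print(f"Qwen matches Mistral on cost at a {threshold:.1%} success rate")
```

This model prices only tokens. It leaves out latency, user-visible failures, and the cost of a wrong answer that looks successful, so the bakeoff should treat it as a lower bound on what a low success rate costs.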
Sources
- Local runtime check: CLAUDE_AGENT_MODEL=qwen/qwen3.7-plus; TEGY_PROVIDER_TURN_PRICE_CAP_USD=30.
- OpenRouter public model catalog API: https://openrouter.ai/api/v1/models
- Qwen3.7 Plus OpenRouter listing: https://openrouter.ai/qwen/qwen3.7-plus
- Gemini 3.1 Flash-Lite OpenRouter listing: https://openrouter.ai/google/gemini-3.1-flash-lite
- Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing
- GLM 5.2 OpenRouter listing: https://openrouter.ai/z-ai/glm-5.2
- Mistral models overview: https://docs.mistral.ai/models/overview
- Mistral Medium 3.5 OpenRouter listing: https://openrouter.ai/mistralai/mistral-medium-3-5
- Claude pricing and model overview: https://docs.anthropic.com/en/docs/about-claude/pricing, https://docs.anthropic.com/en/docs/about-claude/models/overview