---
title: "Model Selection Guide"
sidebarTitle: "Choose the Right Model"
description:
  "Select the optimal model for your agent based on your goals and use case."
---

Choosing the right model is essential to building effective agents. This guide
helps you evaluate trade-offs, pick the right model for your use case, and
iterate quickly.

## Key considerations

- **Accuracy and output quality:** Advanced logic, mathematical problem-solving,
  and multi-step analysis may require high-capability models.
- **Domain expertise:** Performance varies by domain (for example, creative
  writing, code, scientific analysis). Review model benchmarks or test with your
  own examples.
- **Context window:** Long documents, extensive conversations, or large
  codebases require models with longer context windows.
- **Embeddings:** For semantic search or similarity, consider embedding models.
  These aren't for text generation.
- **Latency:** Real-time apps may need low-latency responses. Smaller models (or
  “Mini,” “Nano,” and “Flash” variants) typically respond faster than larger
  models.

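The considerations above can be sketched as a simple selection heuristic. This is an illustrative sketch, not a Hypermode API: the function, thresholds, and model identifiers are all assumptions for the example.

```python
# Illustrative only -- model names and thresholds are placeholders,
# not Hypermode identifiers or recommendations.
def pick_model(needs_deep_reasoning=False, context_tokens=4_000,
               latency_sensitive=False):
    """Return an example model tier for a rough set of requirements."""
    if needs_deep_reasoning or context_tokens > 100_000:
        return "claude-4-opus"   # highest accuracy, longest context
    if latency_sensitive:
        return "gpt-4.1-mini"    # fast, cost-effective
    return "gpt-4.1"             # balanced general-purpose default
```

In practice you would encode your own requirements and pricing constraints; the point is that model choice is a function of the task, not a global constant.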
## Models by task / use case at a glance

| Task / use case                         | Example models                                     | Key strengths                                  | Considerations                             |
| --------------------------------------- | -------------------------------------------------- | ---------------------------------------------- | ------------------------------------------ |
| General-purpose conversation            | Claude 4 Sonnet, GPT-4.1, Gemini Pro               | Balanced, reliable, creative                   | May trail larger models on edge cases      |
| Complex reasoning and research          | Claude 4 Opus, OpenAI o3, Gemini 2.5 Pro           | Highest accuracy, multi-step analysis          | Higher cost; use when quality is critical  |
| Creative writing and content            | Claude 4 Opus, GPT-4.1, Gemini 2.5 Pro             | High-quality output, creativity, style control | Higher cost for premium content            |
| Document analysis and summarization     | Claude 4 Opus, Gemini 2.5 Pro, Llama 3.3           | Handles long inputs, strong comprehension      | Higher cost, slower                        |
| Real-time apps                          | Claude 3.5 Haiku, GPT-4o Mini, Gemini 1.5 Flash 8B | Low latency, high throughput                   | Less nuanced, shorter context              |
| Semantic search and embeddings          | OpenAI Embedding 3, Nomic AI, Hugging Face         | Vector search, similarity, retrieval           | Not for text generation                    |
| Custom model training & experimentation | Llama 4 Scout, Llama 3.3, DeepSeek, Mistral        | Open source, customizable                      | Requires setup, variable performance       |

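Embedding models map text to vectors that you compare with a similarity metric such as cosine similarity; closer vectors mean more related text. A minimal sketch with toy three-dimensional vectors (real embeddings come from a model API and have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
doc = [0.1, 0.8, 0.3]
related_query = [0.2, 0.7, 0.4]
unrelated_query = [0.9, 0.1, 0.0]

# The related query scores higher, so it would rank first in retrieval.
assert cosine_similarity(doc, related_query) > cosine_similarity(doc, unrelated_query)
```

Semantic search is this comparison at scale: embed your documents once, embed each query, and return the documents with the highest similarity scores.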
<Note>
  Hypermode provides access to the most popular open source and commercial
  models through the [Model Router](/model-router). We're constantly evaluating
  model usage and adding new models to our catalog based on demand.
</Note>

## Get started

You can change models at any time in your agent settings. Start with a
general-purpose model, then iterate and optimize as you learn more about your
agent's needs.

1. [**Create an agent**](/create-agent) with GPT-4.1 (default).
2. **Define clear instructions and [connections](/connections)** for the agent's
   role.
3. **Test with real examples** from your workflow.
4. **Refine and iterate** based on results.
5. **Evaluate alternatives** once you understand patterns and outcomes.

<Tip>
  **Value first, optimize second.** Clarify the task requirements before tuning
  for specialized capabilities or cost.
</Tip>

## Comparison of select large language models

| Model                | Best For                            | Considerations                          | Context Window+      | Speed     | Cost++ |
| -------------------- | ----------------------------------- | --------------------------------------- | -------------------- | --------- | ------ |
| **Claude 4 Opus**    | Complex reasoning, long docs        | Higher cost, slower than lighter models | Very long (200K+)    | Moderate  | $$$$   |
| **Claude 4 Sonnet**  | General-purpose, balanced workloads | Less capable than Opus for edge cases   | Long (100K+)         | Fast      | $$$    |
| **GPT-4.1**          | Most tasks, nuanced output          | Higher cost, moderate speed             | Long (128K)          | Moderate  | $$$    |
| **GPT-4.1 Mini**     | High-volume, cost-sensitive         | Less nuanced, shorter context           | Medium (32K-64K)     | Very Fast | $$     |
| **OpenAI o3**        | Multi-step reasoning                | Reasoning steps add latency             | Medium (32K-64K)     | Moderate  | $$     |
| **Gemini 2.5 Pro**   | Up-to-date info                     | Limited access, higher cost             | Long (128K+)         | Moderate  | $$$    |
| **Gemini 2.5 Flash** | Real-time, rapid responses          | Shorter context, less nuanced           | Medium (32K-64K)     | Very Fast | $$     |
| **Llama 4 Scout**    | Privacy, customization, open source | Variable performance                    | Medium-Long (varies) | Fast      | $      |

<sup>
  \+ Context window sizes are approximate and may vary by deployment/version.
</sup>
<sup>++ Relative cost per 1K tokens ($ = lowest, $$$$ = highest)</sup>
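
To make the relative cost tiers concrete, here is a back-of-envelope estimate. The per-1K-token prices below are placeholders invented for illustration, not actual provider rates; always check current pricing.

```python
def estimate_cost(tokens, price_per_1k):
    """Rough spend in USD for a token volume at a per-1K-token rate."""
    return tokens / 1_000 * price_per_1k

# Hypothetical rates for illustration only -- not real prices.
budget = estimate_cost(2_000_000, 0.0005)   # a "$" tier model
premium = estimate_cost(2_000_000, 0.015)   # a "$$$$" tier model
print(f"budget ${budget:.2f} vs premium ${premium:.2f}")
```

Even with made-up numbers, the shape of the trade-off is clear: at high volumes, a one-tier difference in cost compounds quickly, which is why high-volume workloads usually start on a lighter model.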