hypermodeinc · johnymontana · Jul 11, 2025 · Jun 17, 2025 · Jun 17, 2025 · Jun 17, 2025
@@ -0,0 +1,82 @@
+---
+title: "Model Selection Guide"
+sidebarTitle: "Choose the Right Model"
+description:
+  "Select the optimal model for your agent based on your goals and use case."
+---
+
+Choosing the right model is essential to building effective agents. This guide
+helps you evaluate trade-offs, pick the right model for your use case, and
+iterate quickly.
+
+![Select your model](/images/agents/model-selection.png)
+
+## Key considerations
+
+- **Accuracy and output quality:** Advanced logic, mathematical problem-solving,
+  and multi-step analysis may require high-capability models.
+- **Domain expertise:** Performance varies by domain (for example, creative
+  writing, code, scientific analysis). Review model benchmarks or test with your
+  own examples.
+- **Context window:** Long documents, extensive conversations, or large
+  codebases require models with longer context windows.
+- **Embeddings:** For semantic search or similarity, consider embedding models.
+  These aren't for text generation.
+- **Latency:** Real-time apps may need low-latency responses. Smaller models (or
+  “Mini,” “Nano,” and “Flash” variants) typically respond faster than larger
+  models.
+
+## Models by task / use case at a glance
+
+| Task / use case                         | Example models                                     | Key strengths                                  | Considerations                       |
+| --------------------------------------- | -------------------------------------------------- | ---------------------------------------------- | ------------------------------------ |
+| General-purpose conversation            | Claude 4 Sonnet, GPT-4.1, Gemini Pro               | Balanced, reliable, creative                   | May not handle edge cases as well    |
+| Complex reasoning and research          | Claude 4 Opus, O3, Gemini 2.5 Pro                  | Highest accuracy, multi-step analysis          | Higher cost, quality critical        |
+| Creative writing and content            | Claude 4 Opus, GPT-4.1, Gemini 2.5 Pro             | High-quality output, creativity, style control | High cost for premium content        |
+| Document analysis and summarization     | Claude 4 Opus, Gemini 2.5 Pro, Llama 3.3           | Handles long inputs, comprehension             | Higher cost, slower                  |
+| Real-time apps                          | Claude 3.5 Haiku, GPT-4o Mini, Gemini 1.5 Flash 8B | Low latency, high throughput                   | Less nuanced, shorter context        |
+| Semantic search and embeddings          | OpenAI Embedding 3, Nomic AI, Hugging Face         | Vector search, similarity, retrieval           | Not for text generation              |
+| Custom model training & experimentation | Llama 4 Scout, Llama 3.3, DeepSeek, Mistral        | Open source, customizable                      | Requires setup, variable performance |
+
+<Note>
+  Hypermode provides access to the most popular open source and commercial
+  models through [Hypermode Model Router documentation](/model-router). We're
+  constantly evaluating model usage and adding new models to our catalog based
+  on demand.
+</Note>
+
+## Get started
+
+You can change models at any time in your agent settings. Start with a
+general-purpose model, then iterate and optimize as you learn more about your
+agent's needs.
+
+1. [**Create an agent**](/create-agent) with GPT-4.1 (default).
+2. **Define clear instructions and [connections](/connections)** for the agent's
+   role.
+3. **Test with real examples** from your workflow.
+4. **Refine and iterate** based on results.
+5. **Evaluate alternatives** once you understand patterns and outcomes.
+
+<Tip>
+  **Value first, optimize second.** Clarify the task requirements before tuning
+  for specialized capabilities or cost.
+</Tip>
+
+## Comparison of select large language models
+
+| Model                | Best For                            | Considerations                          | Context Window+      | Speed     | Cost++ |
+| -------------------- | ----------------------------------- | --------------------------------------- | -------------------- | --------- | ------ |
+| **Claude 4 Opus**    | Complex reasoning, long docs        | Higher cost, slower than lighter models | Very long (200K+)    | Moderate  | $$$$   |
+| **Claude 4 Sonnet**  | General-purpose, balanced workloads | Less capable than Opus for edge cases   | Long (100K+)         | Fast      | $$$    |
+| **GPT-4.1**          | Most tasks, nuanced output          | Higher cost, moderate speed             | Long (128K)          | Moderate  | $$$    |
+| **GPT-4.1 Mini**     | High-volume, cost-sensitive         | Less nuanced, shorter context           | Medium (32K-64K)     | Very Fast | $$     |
+| **GPT o3**           | General chat, broad compatibility   | May lack latest features/capabilities   | Medium (32K-64K)     | Fast      | $$     |
+| **Gemini 2.5 Pro**   | Up-to-date info                     | Limited access, higher cost             | Long (128K+)         | Moderate  | $$$    |
+| **Gemini 2.5 Flash** | Real-time, rapid responses          | Shorter context, less nuanced           | Medium (32K-64K)     | Very Fast | $$     |
+| **Llama 4 Scout**    | Privacy, customization, open source | Variable performance                    | Medium-Long (varies) | Fast      | $      |
+
+<sup>
+  \+ Context window sizes are approximate and may vary by deployment/version.
+</sup>
+<sup>++ Relative cost per 1K tokens ($ = lowest, $$$$ = highest)</sup>