# AI Model Configuration

To access model configuration, open your agent in Agent Builder, click **Configuration**, then select the **AI Model** sub-tab. You can also configure the model conversationally via the Agent Configurator chat panel (e.g., "change the model to Opus").

When you build an agent, the Agent Configurator selects the model family and specific version based on task requirements. Higher-reasoning models suit multi-step logic, structured analysis, and complex drafting. Lighter models work well for summarization, classification, and monitoring tasks. Match the configuration to the complexity of the task.

***

### Model Selection

The AI Model sub-tab uses a card-based selection interface. You choose in three steps:

1. **Choose Model Family** — Select Anthropic or Google.
2. **Choose Infrastructure** — Select Amazon Web Services or Google.
3. **Choose Model** — Pick a specific model from the available cards.

**Available models:**

#### Claude Family

* **Claude Haiku 4.5** — Fast, efficient for high-volume simple tasks
* **Claude Sonnet 4.5** — Balanced performance for most business workflows
* **Claude Opus 4.5** — Advanced reasoning for complex analysis and multi-step tasks
* **Claude Opus 4.6** — Latest generation advanced reasoning

#### Gemini Family

* **Gemini 2.5 Flash Lite** — Fastest, most cost-effective for simple tasks
* **Gemini 2.5 Flash** — Fast processing with good quality
* **Gemini 2.5 Pro** — Strong reasoning for complex workflows
* **Gemini 3 Flash Preview** — Latest generation fast model (preview)
* **Gemini 3 Pro Preview** — Latest generation advanced reasoning (preview)

**Model characteristics:**

| Model                 | Best For                                                    | Speed    | Cost     |
| --------------------- | ----------------------------------------------------------- | -------- | -------- |
| Claude Opus 4.6       | Latest generation complex reasoning and multi-step tasks    | Slower   | Higher   |
| Claude Opus 4.5       | Complex reasoning, technical analysis, multi-step workflows | Slower   | Higher   |
| Gemini 2.5 Pro        | Advanced analysis, strategic reasoning                      | Balanced | Higher   |
| Claude Sonnet 4.5     | Research, drafting, general business tasks                  | Balanced | Moderate |
| Gemini 2.5 Flash      | High-volume processing with quality                         | Fast     | Moderate |
| Claude Haiku 4.5      | Summarization, classification, monitoring                   | Fastest  | Lower    |
| Gemini 2.5 Flash Lite | Simple, high-volume tasks                                   | Fastest  | Lowest   |

**When to use each:**

**Advanced reasoning (Opus 4.6, Opus 4.5, Gemini 2.5 Pro)** — Legal document analysis, investment memo generation, complex data synthesis, strategic analysis

**Balanced (Sonnet, Gemini 2.5 Flash)** — Company research, email drafting, meeting summaries, report generation (default for most agents)

**Fast processing (Haiku, Gemini Flash Lite)** — News monitoring, simple classification, short summaries, high-volume processing

**Preview models (Gemini 3 Flash/Pro)** — Testing latest capabilities, early access to new features (may have different behavior than stable models)
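The groupings above can be written down as a small lookup, which is handy if you keep agent settings in a script or spreadsheet export. This is only a sketch: the tier names (`advanced`, `balanced`, `fast`, `preview`) and the `candidate_models` helper are hypothetical and not part of any Alludium API. The model names are taken from this page.

```python
# Hypothetical lookup built from the "When to use each" guidance.
# Tier names and this helper are illustrative only, not a platform API.
TIER_MODELS = {
    "advanced": ["Claude Opus 4.6", "Claude Opus 4.5", "Gemini 2.5 Pro"],
    "balanced": ["Claude Sonnet 4.5", "Gemini 2.5 Flash"],
    "fast": ["Claude Haiku 4.5", "Gemini 2.5 Flash Lite"],
    "preview": ["Gemini 3 Flash Preview", "Gemini 3 Pro Preview"],
}


def candidate_models(tier, family=None):
    """Return the models listed for a tier, optionally filtered by family name."""
    if tier not in TIER_MODELS:
        raise ValueError(f"unknown tier: {tier!r}")
    models = TIER_MODELS[tier]
    if family is None:
        return list(models)
    # Family filter is a simple prefix match on "Claude" or "Gemini".
    return [m for m in models if m.startswith(family)]


print(candidate_models("fast"))
print(candidate_models("advanced", family="Gemini"))
```

The final choice is still made on the cards in the AI Model sub-tab; the lookup only narrows the shortlist.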

***

### Reasoning Configuration

Some models allocate a reasoning budget: tokens dedicated to working through a problem before producing a response. The **Reasoning Scale (1-10)** slider controls this budget, with endpoints labeled **Minimal** and **Maximal**. Moving the slider toward **Maximal** gives the model more capacity for complex, multi-step reasoning.

**Reasoning levels:**

**Low (1-3)** — Straightforward tasks with clear answers\
Use for: Summaries, classifications, simple drafts

**Medium (4-6)** — Standard business workflows (default)\
Use for: Research, analysis, structured drafting

**High (7-10)** — Complex multi-step reasoning\
Use for: Strategic analysis, technical problem-solving, complex synthesis
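As a quick reference, the three bands can be expressed as a function of the slider value. The function name `reasoning_band` is hypothetical; only the 1-10 scale and the band boundaries come from this page.

```python
def reasoning_band(scale: int) -> str:
    """Map a Reasoning Scale value (1-10) to the band described on this page."""
    if not 1 <= scale <= 10:
        raise ValueError("Reasoning Scale must be between 1 and 10")
    if scale <= 3:
        return "Low"      # summaries, classifications, simple drafts
    if scale <= 6:
        return "Medium"   # research, analysis, structured drafting
    return "High"         # strategic analysis, complex synthesis


for value in (2, 5, 9):
    print(value, reasoning_band(value))
```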

**When to increase reasoning:**

* Agent outputs lack depth or miss nuances
* Task requires multi-step logic
* Outputs need to consider multiple perspectives
* Complex synthesis from many sources

**When to decrease reasoning:**

* Task is straightforward
* Speed is more important than depth
* High-volume simple processing
* Cost optimization for simple tasks

**Trade-off:** A higher reasoning setting can improve quality on complex tasks, but it increases response time and cost per invocation.

***

### Temperature

Temperature controls output randomness and creativity. Lower values produce more consistent, predictable outputs. Higher values increase variety but reduce consistency.

**Scale:** 0.0 (deterministic) to 1.0 (creative)
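To see why lower values are more predictable, it helps to look at the standard temperature-scaled softmax used in language-model sampling. This is a generic illustration with made-up scores, not a description of how this platform or any provider implements sampling. The function name and the example logits are assumptions.

```python
import math


def token_probabilities(logits, temperature):
    """Temperature-scaled softmax over raw token scores (generic illustration)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Treat 0 as greedy selection: all probability on the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(scores, t)])
```

At low temperature almost all probability sits on the top-scoring token, so repeated runs tend to agree. At higher temperature the alternatives receive more weight, which is where the added variety comes from.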

**Recommended settings:**

**0.0-0.3** — Factual tasks requiring consistency\
Use for: Data extraction, classification, structured reports

**0.4-0.7** — Balanced tasks (default: 0.5)\
Use for: Research, analysis, business drafting

**0.8-1.0** — Creative tasks\
Use for: Marketing copy, brainstorming, varied phrasing

**Most agents should use the default (0.5)** unless you have specific requirements for consistency or creativity.
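If you track agent settings outside the UI, a small check against the recommended ranges can catch mismatches early. This is a sketch only: the task labels (`factual`, `balanced`, `creative`) and the helper `is_recommended` are hypothetical names, while the ranges and the 0.5 default are the ones listed above.

```python
# Recommended ranges from this page, keyed by a hypothetical task label.
TEMPERATURE_RANGES = {
    "factual": (0.0, 0.3),   # data extraction, classification, structured reports
    "balanced": (0.4, 0.7),  # research, analysis, business drafting
    "creative": (0.8, 1.0),  # marketing copy, brainstorming, varied phrasing
}
DEFAULT_TEMPERATURE = 0.5


def is_recommended(task_kind: str, temperature: float) -> bool:
    """Return True if the temperature falls in the recommended range for the task."""
    if task_kind not in TEMPERATURE_RANGES:
        raise ValueError(f"unknown task kind: {task_kind!r}")
    low, high = TEMPERATURE_RANGES[task_kind]
    return low <= temperature <= high


print(is_recommended("factual", 0.2))
print(is_recommended("factual", 1.0))
print(is_recommended("balanced", DEFAULT_TEMPERATURE))
```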

***

### Configuration Strategy

#### Start Conservative

1. Use Claude Sonnet 4.5 or Gemini 2.5 Flash with default settings
2. Deploy and run 10-20 tasks
3. Identify quality or performance issues
4. Adjust one parameter at a time
5. Test changes before full deployment
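The "one parameter at a time" rule in step 4 is easy to verify if you record each configuration as a simple dictionary. The sketch below assumes hypothetical keys (`model`, `reasoning`, `temperature`). It does not read from or write to the platform.

```python
def changed_parameters(before: dict, after: dict) -> list:
    """List the configuration keys whose values differ between two snapshots."""
    keys = sorted(set(before) | set(after))
    return [k for k in keys if before.get(k) != after.get(k)]


def is_single_change(before: dict, after: dict) -> bool:
    """True when exactly one parameter differs, so its effect can be isolated."""
    return len(changed_parameters(before, after)) == 1


# Hypothetical snapshots of an agent's settings before and after an adjustment.
baseline = {"model": "Claude Sonnet 4.5", "reasoning": 5, "temperature": 0.5}
trial = {"model": "Claude Sonnet 4.5", "reasoning": 7, "temperature": 0.5}

print(changed_parameters(baseline, trial))
print(is_single_change(baseline, trial))
```

Changing a single setting between test runs makes it possible to attribute any difference in output quality, speed, or cost to that setting.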

#### Optimize Based on Performance

**If outputs lack depth:**\
Increase the reasoning scale or switch to Opus or Gemini 2.5 Pro

**If outputs are inconsistent:**\
Lower temperature, tighten prompt constraints

**If outputs are too verbose:**\
Reduce response length limit, refine prompt for conciseness

**If agent is too slow:**\
Reduce the reasoning scale or switch to Flash or Haiku for simple tasks

**If costs are too high:**\
Use Flash Lite/Haiku for high-volume tasks, reduce response length, lower reasoning
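For teams that triage agent issues in a runbook, the guidance above fits naturally in a symptom-to-action table. The symptom keys and the `suggest` helper below are hypothetical. The actions paraphrase the recommendations in this section.

```python
# Hypothetical symptom keys mapped to the adjustments recommended in this section.
ADJUSTMENTS = {
    "lacks_depth": ["Increase reasoning", "Upgrade to Opus or Gemini 2.5 Pro"],
    "inconsistent": ["Lower temperature", "Tighten prompt constraints"],
    "too_verbose": ["Reduce response length limit", "Refine prompt for conciseness"],
    "too_slow": ["Reduce reasoning", "Switch to Flash or Haiku for simple tasks"],
    "too_expensive": [
        "Use Flash Lite or Haiku for high-volume tasks",
        "Reduce response length",
        "Lower reasoning",
    ],
}


def suggest(symptom: str) -> list:
    """Return the candidate adjustments for an observed symptom."""
    if symptom not in ADJUSTMENTS:
        raise ValueError(f"unknown symptom: {symptom!r}")
    return list(ADJUSTMENTS[symptom])


for action in suggest("too_slow"):
    print("-", action)
```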

***

### Choosing Between Model Families

**Use Claude models when:**

* You need proven, stable performance
* Task requires nuanced understanding
* Quality is more important than speed
* You're familiar with Claude's capabilities
* Standard context window (200K tokens) is sufficient

**Use Gemini models when:**

* Speed and cost optimization are priorities
* High-volume processing is required
* Very large context windows are needed (up to 2M tokens for Gemini 2.5 Pro, 1M+ for Flash models)
* You are processing extremely long documents or multiple files
* You are testing the latest preview features
* Multimodal capabilities are needed and enabled for the selected workflow

**Context window comparison:**

* **Claude models:** 200K tokens
* **Gemini 2.5 Flash Lite/Flash:** 1M+ tokens
* **Gemini 2.5 Pro:** Up to 2M tokens
* **Gemini 3 Preview models:** 1M+ tokens

**When large context matters:** Processing entire codebases, analyzing multiple lengthy documents simultaneously, maintaining extended conversation history, working with comprehensive datasets.
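A rough size check can indicate whether a document set is likely to fit before you pick a model. The sketch below makes several assumptions: the four-characters-per-token ratio is a common rule of thumb for English text rather than an exact tokenizer count, "1M+" is treated conservatively as 1,000,000 tokens, and the helper names are hypothetical. The window sizes are the figures listed on this page.

```python
import math

# Context window sizes as listed on this page ("1M+" treated as 1,000,000).
CONTEXT_WINDOW_TOKENS = {
    "Claude Haiku 4.5": 200_000,
    "Claude Sonnet 4.5": 200_000,
    "Claude Opus 4.5": 200_000,
    "Claude Opus 4.6": 200_000,
    "Gemini 2.5 Flash Lite": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Gemini 2.5 Pro": 2_000_000,
    "Gemini 3 Flash Preview": 1_000_000,
    "Gemini 3 Pro Preview": 1_000_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def models_that_fit(token_count: int, reserve: int = 0) -> list:
    """Models whose listed window covers the input plus a reserve for the reply."""
    needed = token_count + reserve
    return [name for name, window in CONTEXT_WINDOW_TOKENS.items() if window >= needed]


print(models_that_fit(150_000))
print(models_that_fit(1_500_000))
```

The `reserve` argument leaves headroom for instructions and the model's response, since those may also count against the window.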

***

### Common Mistakes

❌ **Using Opus/Pro for everything:** Wastes budget on simple tasks\
✅ **Match model to complexity:** Flash Lite/Haiku for monitoring, Sonnet/Flash for research, Opus/Pro for complex analysis

❌ **Maximum reasoning for all agents:** Slow and expensive\
✅ **Tune per agent:** Simple tasks need less reasoning

❌ **Temperature 1.0 for consistency tasks:** Creates unreliable outputs\
✅ **Low temperature for facts:** 0.2-0.3 for data extraction and classification

❌ **Extended response length by default:** Wastes tokens\
✅ **Minimum viable length:** Start short, increase only if needed

❌ **Using preview models in production:** Unpredictable behavior changes\
✅ **Stable models for production:** Use previews for testing only
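Several of these mistakes can be caught with a simple pre-deployment check. The sketch below is illustrative only: the function, its parameters, and the exact thresholds (reasoning at 10, temperature at 1.0) are assumptions chosen to mirror the list above, not platform rules.

```python
def lint_config(model, reasoning, temperature, task_complexity,
                needs_consistency=False, production=False):
    """Return warnings for the common mistakes listed above (hypothetical helper)."""
    warnings = []
    heavy = "Opus" in model or " Pro" in model
    if heavy and task_complexity == "simple":
        warnings.append("Opus/Pro on a simple task wastes budget")
    if reasoning >= 10 and task_complexity != "complex":
        warnings.append("Maximum reasoning is slow and expensive for this task")
    if needs_consistency and temperature >= 1.0:
        warnings.append("Temperature 1.0 makes consistency tasks unreliable")
    if production and "Preview" in model:
        warnings.append("Preview models can change behavior; use for testing only")
    return warnings


for message in lint_config("Claude Opus 4.6", reasoning=10, temperature=1.0,
                           task_complexity="simple", needs_consistency=True):
    print("warning:", message)
```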

***

### Next Steps

With model configuration complete, continue to **Integrations** to connect your agent to external tools and applications for live data access.

