Call AI Model
Learn how to use the Call AI Model node to interact with various AI providers and models, with support for structured output.
The Call AI Model node is a powerful component that allows you to make calls to various AI providers and their models. This guide explains how to configure and use AI model calls effectively in your Neurons.
Functionality
The Call AI Model node allows you to:
- Select from configured AI providers
- Choose specific models for each provider
- Configure model parameters
- Define structured output schemas (when supported)
- Chain multiple model calls for advanced workflows
Node Properties
AI Provider Configuration
- AI Provider: Select from your configured providers (OpenAI, Google, Anthropic, Cohere, Mistral, or Cloudflare Workers AI)
Providers must be configured with valid API keys in the Providers section before they appear in the list. You can manage multiple API keys per provider for different environments or projects.
(Screenshots: MCP Server tools selection, and using an MCP Server in a Neuron workflow.)
Model Selection
- Model: Choose from available models for the selected provider
- Model availability depends on the selected provider
- Cloudflare Workers AI provides access to 50+ models, including Llama and Mistral
- Some models support structured output, while others don’t
Model Parameters
Available parameters and their valid ranges vary depending on the selected model. The interface will automatically update to only show the parameters supported by your chosen model.
Common parameters across many models include:
| Parameter | Description | Notes |
|---|---|---|
| Max Output Tokens | Maximum number of tokens in the model’s response | Higher values allow longer responses but increase costs. Each model has its own maximum limit. |
| Temperature | Controls response randomness | Lower values (0.1-0.3): more focused, deterministic responses. Higher values (0.7-1.0): more creative, varied responses. Recommended: 0.1-0.3 for structured output. |
| Top P (Nucleus Sampling) | Controls response diversity | Works alongside temperature. Lower values: more focused on likely tokens. Higher values: more diverse vocabulary. Not available in all models. |
| Top K | Limits token selection to the K most likely tokens | Helps prevent unlikely token selections. Only available in specific models (e.g., Google’s Gemini). |
| Frequency Penalty | Reduces repetition based on token frequency | Higher values discourage repeating information. Useful for diverse content. Primarily in OpenAI models. |
| Presence Penalty | Penalizes tokens that have already appeared | Higher values encourage new topics. Helps prevent theme repetition. Primarily in OpenAI models. |
Some models may expose additional parameters not listed here. Always check the provider’s documentation for model-specific parameter details.
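To make these concrete, here is a minimal sketch of a parameter set tuned for structured output. The object shape and field names are illustrative assumptions, not the node's actual configuration format; in the editor you set these values through the node's form fields rather than in code.

```typescript
// Illustrative only: field names are assumptions, not the node's real
// configuration format. Values follow the recommendations in the table above.
const modelParameters = {
  maxOutputTokens: 1024, // cap response length; each model has its own ceiling
  temperature: 0.2,      // low randomness for consistent, structured results
  topP: 0.9,             // nucleus sampling; works alongside temperature
  // topK, frequencyPenalty, and presencePenalty appear only when the
  // selected model supports them.
};
```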
Output Configuration
- Output Format:
- Text (default): Regular text output
- Structured Output: JSON-formatted output following a schema
- Schema Configuration (when using Structured Output):
- Inline JSON: Define the schema directly
- URL: Reference an external JSON Schema
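As an example, an inline schema for extracting contact details might look like the sketch below. It is standard JSON Schema written as a TypeScript constant for readability; the exact schema dialect and validation behavior depend on the selected provider and model.

```typescript
// A simple JSON Schema for structured output. Models that support
// structured output return JSON conforming to this shape.
const contactSchema = {
  type: "object",
  properties: {
    name:   { type: "string" },
    email:  { type: "string" },
    topics: { type: "array", items: { type: "string" } },
  },
  required: ["name", "email"],
} as const;
```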
Advanced Usage: Chaining Models
You can create powerful workflows by chaining multiple Call AI Model nodes:
- Use a more capable model for initial processing
- Connect to a Use Output node
- Change the second model’s system instructions so that it structures the output into a specific format, for example: “Extract useful information from the following text”
- Feed the result to a fast (and cheap!) model with structured output support
This pattern allows you to:
- Leverage the strengths of different models
- Enforce structured output even with models that don’t natively support it
- Optimize for both quality and cost
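Conceptually, the chain behaves like the sketch below. The `callModel` helper and the model names are hypothetical stand-ins for the visual editor's nodes, shown only to make the data flow explicit.

```typescript
// Hypothetical helper standing in for a Call AI Model node (illustration only).
declare function callModel(opts: {
  model: string;
  prompt: string;
  system?: string;
  outputSchema?: unknown;
}): Promise<string>;

async function extractStructured(text: string) {
  // Step 1: a capable (slower, costlier) model does the heavy reasoning.
  const draft = await callModel({ model: "capable-model", prompt: text });

  // Step 2: a fast, cheap model reshapes the draft into the schema,
  // using the system instructions from the steps above.
  return callModel({
    model: "fast-model",
    system: "Extract useful information from the following text",
    prompt: draft,
    outputSchema: contactSchema, // the schema from the earlier example
  });
}
```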
Tips and Best Practices
- Start with lower temperatures (0.1-0.3) when using structured output to get more consistent results
- Use Top K and Top P carefully as they can significantly impact output quality
- When using structured output:
- Ensure your schema is valid and well-defined (see the validation sketch after this list)
- Test with simple schemas before moving to complex ones
- Monitor token usage and costs through execution logs when chaining multiple model calls
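One way to follow the schema advice above is to validate your schema locally before pasting it into the node, for example with the Ajv library (an assumption about your tooling; any JSON Schema validator works):

```typescript
import Ajv from "ajv";

const ajv = new Ajv();

// validateSchema checks that the schema itself is well-formed JSON Schema,
// before you rely on it for structured output.
const ok = ajv.validateSchema({
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
});
if (!ok) console.error(ajv.errors);
```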
Testing Undeployed Versions
You can test undeployed versions of your Neurons using the Snippets button in the editor. This feature:
- Generates code snippets that can be used to call your Neuron in its current state
- Allows you to test revisions before deploying
- Provides example code in various programming languages (an illustrative sketch appears at the end of this section)
- Helps verify that your changes work as expected
This is particularly useful when:
- Making changes to model parameters
- Testing new structured output schemas
- Verifying model chaining behavior
- Debugging issues with specific configurations
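As an illustration, a generated snippet typically amounts to an HTTP call to your Neuron. Everything below, including the endpoint, headers, and payload shape, is an assumption made for illustration; use the exact snippet the editor generates for you.

```typescript
// Illustrative only: the real endpoint, auth header, and payload shape
// come from the Snippets button in the editor.
const response = await fetch("https://example.com/neurons/NEURON_ID/run", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer YOUR_API_TOKEN", // placeholder
  },
  body: JSON.stringify({ input: "Extract contacts from this email" }),
});
console.log(await response.json());
```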