Call AI Model
Learn how to use the Call AI Model node to interact with various AI providers and models, with support for structured output.
The Call AI Model node is a powerful component that allows you to make calls to various AI providers and their models. This guide explains how to configure and use AI model calls effectively in your Neurons.
Functionality
The Call AI Model node allows you to:
- Select from configured AI providers
- Choose specific models for each provider
- Configure model parameters
- Define structured output schemas (when supported)
- Chain multiple model calls for advanced workflows
Node Properties
AI Provider Configuration
- AI Provider: Select from your configured providers (OpenAI, Google, Anthropic, Cohere, Mistral, or Cloudflare Workers AI)
Providers must be configured with valid API keys in the Providers section before they appear in the list. You can manage multiple API keys per provider for different environments or projects.
(Screenshots: MCP Server tools selection, and using an MCP Server in a Neuron workflow.)
Model Selection
- Model: Choose from available models for the selected provider
- Model availability depends on the selected provider
- Cloudflare Workers AI provides access to 50+ models, including Llama and Mistral
- Some models support structured output, while others don’t
Model Parameters
Available parameters and their valid ranges vary depending on the selected model. The interface will automatically update to only show the parameters supported by your chosen model.
Common parameters across many models include:
| Parameter | Description | Notes |
|---|---|---|
| Max Output Tokens | Maximum number of tokens in the model’s response | Higher values allow longer responses but increase costs. Each model has its own maximum limit. |
| Temperature | Controls response randomness | Lower values (0.1-0.3): more focused, deterministic responses. Higher values (0.7-1.0): more creative, varied responses. Recommended: 0.1-0.3 for structured output. |
| Top P (Nucleus Sampling) | Controls response diversity | Works alongside temperature. Lower values: more focused on likely tokens. Higher values: more diverse vocabulary. Not available in all models. |
| Top K | Limits token selection to the K most likely tokens | Helps prevent unlikely token selections. Only available in specific models (e.g., Google’s Gemini). |
| Frequency Penalty | Reduces repetition based on token frequency | Higher values discourage repeating information. Useful for diverse content. Primarily in OpenAI models. |
| Presence Penalty | Penalizes tokens that have already appeared | Higher values encourage new topics. Helps prevent theme repetition. Primarily in OpenAI models. |
Some models may expose additional parameters not listed here. Always check the provider’s documentation for model-specific parameter details.
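To make these concrete, here is a minimal sketch of a parameter set tuned for structured output. The object shape and field names are illustrative assumptions, not the node's actual configuration format; in the editor you set these values through the node's form fields rather than in code.

```typescript
// Illustrative only: field names are assumptions, not the node's real
// configuration format. Values follow the recommendations in the table above.
const modelParameters = {
  maxOutputTokens: 1024, // cap response length; each model has its own ceiling
  temperature: 0.2,      // low randomness for consistent, structured results
  topP: 0.9,             // nucleus sampling; works alongside temperature
  // topK, frequencyPenalty, and presencePenalty appear only when the
  // selected model supports them.
};
```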
Output Configuration
- Output Format:
- Text (default): Regular text output
- Structured Output: JSON-formatted output following a schema
- Schema Configuration (when using Structured Output):
- Inline JSON: Define the schema directly
- URL: Reference an external JSON Schema
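As an example, an inline schema for extracting contact details might look like the sketch below. It is standard JSON Schema written as a TypeScript constant for readability; the exact schema dialect and validation behavior depend on the selected provider and model.

```typescript
// A simple JSON Schema for structured output. Models that support
// structured output return JSON conforming to this shape.
const contactSchema = {
  type: "object",
  properties: {
    name:   { type: "string" },
    email:  { type: "string" },
    topics: { type: "array", items: { type: "string" } },
  },
  required: ["name", "email"],
} as const;
```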
Advanced Usage: Chaining Models
You can create powerful workflows by chaining multiple Call AI Model nodes:
- Use a more capable model for initial processing
- Connect to a Use Output node
- Change the second model’s system instructions so that it structures the output into a specific format, for example: “Extract useful information from the following text”
- Feed the result to a fast (and cheap!) model with structured output support
This pattern allows you to:
- Leverage the strengths of different models
- Enforce structured output even with models that don’t natively support it
- Optimize for both quality and cost
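Conceptually, the chain behaves like the sketch below. The `callModel` helper and the model names are hypothetical stand-ins for the visual editor's nodes, shown only to make the data flow explicit.

```typescript
// Hypothetical helper standing in for a Call AI Model node (illustration only).
declare function callModel(opts: {
  model: string;
  prompt: string;
  system?: string;
  outputSchema?: unknown;
}): Promise<string>;

async function extractStructured(text: string) {
  // Step 1: a capable (slower, costlier) model does the heavy reasoning.
  const draft = await callModel({ model: "capable-model", prompt: text });

  // Step 2: a fast, cheap model reshapes the draft into the schema,
  // using the system instructions from the steps above.
  return callModel({
    model: "fast-model",
    system: "Extract useful information from the following text",
    prompt: draft,
    outputSchema: contactSchema, // the schema from the earlier example
  });
}
```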
Tips and Best Practices
- Start with lower temperatures (0.1-0.3) when using structured output to get more consistent results
- Use Top K and Top P carefully as they can significantly impact output quality
- When using structured output:
- Ensure your schema is valid and well-defined (see the validation sketch after this list)
- Test with simple schemas before moving to complex ones
- Monitor token usage and costs through execution logs when chaining multiple model calls
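One way to follow the schema advice above is to validate your schema locally before pasting it into the node, for example with the Ajv library (an assumption about your tooling; any JSON Schema validator works):

```typescript
import Ajv from "ajv";

const ajv = new Ajv();

// validateSchema checks that the schema itself is well-formed JSON Schema,
// before you rely on it for structured output.
const ok = ajv.validateSchema({
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
});
if (!ok) console.error(ajv.errors);
```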
Testing Undeployed Versions
You can test undeployed versions of your Neurons using the Snippets button in the editor. This feature:
- Generates code snippets that can be used to call your Neuron in its current state
- Allows you to test revisions before deploying
- Provides example code in various programming languages (an illustrative sketch appears at the end of this section)
- Helps verify that your changes work as expected
This is particularly useful when:
- Making changes to model parameters
- Testing new structured output schemas
- Verifying model chaining behavior
- Debugging issues with specific configurations
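As an illustration, a generated snippet typically amounts to an HTTP call to your Neuron. Everything below, including the endpoint, headers, and payload shape, is an assumption made for illustration; use the exact snippet the editor generates for you.

```typescript
// Illustrative only: the real endpoint, auth header, and payload shape
// come from the Snippets button in the editor.
const response = await fetch("https://example.com/neurons/NEURON_ID/run", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer YOUR_API_TOKEN", // placeholder
  },
  body: JSON.stringify({ input: "Extract contacts from this email" }),
});
console.log(await response.json());
```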