The Call AI Model node is a powerful component that allows you to make calls to various AI providers and their models. This guide explains how to configure and use AI model calls effectively in your Neurons.

Functionality

The Call AI Model node allows you to:

  • Select from configured AI providers
  • Choose specific models for each provider
  • Configure model parameters
  • Define structured output schemas (when supported)
  • Chain multiple model calls for advanced workflows

Node Properties

AI Provider Configuration

  • AI Provider: Select from your configured providers (OpenAI, Google, Anthropic, Cohere, Mistral, or Cloudflare Workers AI)

    Providers must be configured with valid API keys in the Providers section before they appear in the list. You can manage multiple API keys per provider for different environments or projects.

(Screenshot: MCP Server tools selection)

(Screenshot: using an MCP Server in a Neuron workflow)

Model Selection

  • Model: Choose from available models for the selected provider
    • Model availability depends on the selected provider
    • Cloudflare Workers AI provides access to 50+ models, including Llama and Mistral variants
    • Some models support structured output, while others don’t

Model Parameters

Available parameters and their valid ranges vary depending on the selected model. The interface will automatically update to only show the parameters supported by your chosen model.

Common parameters across many models include:

| Parameter | Description | Notes |
| --- | --- | --- |
| Max Output Tokens | Maximum number of tokens in the model’s response | Higher values allow longer responses but increase costs. Each model has its own maximum limit. |
| Temperature | Controls response randomness | Lower values (0.1-0.3): more focused, deterministic responses. Higher values (0.7-1.0): more creative, varied responses. Recommended: 0.1-0.3 for structured output. |
| Top P (Nucleus Sampling) | Controls response diversity | Works alongside temperature. Lower values: more focused on likely tokens. Higher values: more diverse vocabulary. Not available in all models. |
| Top K | Limits token selection to the K most likely tokens | Helps prevent unlikely token selections. Only available in specific models (e.g., Google’s Gemini). |
| Frequency Penalty | Reduces repetition based on token frequency | Higher values discourage repeating information. Useful for diverse content. Primarily in OpenAI models. |
| Presence Penalty | Penalizes tokens that have already appeared | Higher values encourage new topics. Helps prevent theme repetition. Primarily in OpenAI models. |

Some models may expose additional parameters not listed here. Always check the provider’s documentation for model-specific parameter details.

Output Configuration

  • Output Format:
    • Text (default): Regular text output
    • Structured Output: JSON-formatted output following a schema
  • Schema Configuration: (When using Structured Output)
    • Inline JSON: Define the schema directly
    • URL: Reference an external JSON Schema
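As an example of an inline schema, the sketch below defines a small JSON Schema and spot-checks a structured-output response against its required properties. The property names are illustrative, and the checker covers only a subset of the JSON Schema spec.

```python
import json

# A minimal inline JSON Schema you might paste into the node's Schema
# Configuration. Property names here are illustrative examples.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email"],
}

def matches_required(raw: str, schema: dict) -> bool:
    """Spot-check: the response parses as JSON and has every required key.

    A full validator would also check types and nested structure.
    """
    data = json.loads(raw)
    return isinstance(data, dict) and all(k in data for k in schema["required"])

response = '{"name": "Ada", "email": "ada@example.com", "topics": ["ai"]}'
```

Testing a simple schema like this first, before adding nested objects, makes it much easier to tell whether a failure comes from the schema or the model.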

Advanced Usage: Chaining Models

You can create powerful workflows by chaining different models using nodes:

  1. Use a more capable model for initial processing
  2. Connect it to a Use Output node
  3. Feed the result to a fast (and cheap!) model with structured output support
  4. Set the second model’s system instructions so it restructures the output into a specific format, for example: “Extract useful information from the following text”
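The steps above can be sketched as two chained calls. Here `call_model` is a deterministic stand-in for the Call AI Model node (the real node calls your configured provider), and the model names are placeholders.

```python
# Sketch of the chaining pattern. `call_model` is a placeholder stub
# standing in for the Call AI Model node, not a real client.

def call_model(model: str, system: str, user_input: str) -> str:
    # Stubbed behavior so the example runs without a provider.
    if "JSON" in system:
        return '{"summary": "' + user_input[:20] + '"}'
    return f"Summary of: {user_input}"

# Step 1: a capable model does the heavy lifting (free-form text out).
draft = call_model("capable-model", "You are a careful analyst.", "Q3 sales rose 12%.")

# Steps 2-4: its output feeds a fast model whose system instructions
# force the result into a specific (here, JSON) format.
structured = call_model(
    "fast-model",
    "Extract useful information from the following text. Reply as JSON.",
    draft,
)
```

The second call is where structure gets enforced: even if the first model has no structured-output support, the cheaper downstream model can still emit schema-conforming JSON.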

This pattern allows you to:

  • Leverage the strengths of different models
  • Enforce structured output even with models that don’t natively support it
  • Optimize for both quality and cost

Tips and Best Practices

  • Start with lower temperatures (0.1-0.3) when using structured output to get more consistent results
  • Use Top K and Top P carefully as they can significantly impact output quality
  • When using structured output:
    • Ensure your schema is valid and well-defined
    • Test with simple schemas before moving to complex ones
  • Monitor token usage and costs through execution logs when chaining multiple model calls
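When chaining calls, a simple per-model tally of the token counts from your execution logs makes cost drift visible early. The prices below are made-up placeholders, not real provider rates.

```python
# Rough cost bookkeeping for a chained workflow. Prices are placeholder
# values per 1K tokens; check your provider's current pricing.
PRICE_PER_1K = {"capable-model": 0.01, "fast-model": 0.001}

def estimate_cost(calls: list[tuple[str, int]]) -> float:
    """calls: (model, input + output tokens) pairs from your execution logs."""
    return sum(tokens / 1000 * PRICE_PER_1K[model] for model, tokens in calls)

cost = estimate_cost([("capable-model", 1500), ("fast-model", 400)])
```

A tally like this makes the trade-off in the chaining pattern explicit: the cheap structured-output step adds little cost relative to the capable model's pass.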

Testing Undeployed Versions

You can test undeployed versions of your Neurons using the Snippets button in the editor. This feature:

  • Generates code snippets that can be used to call your Neuron in its current state
  • Allows you to test revisions before deploying
  • Provides example code in various programming languages
  • Helps verify that your changes work as expected

This is particularly useful when:

  • Making changes to model parameters
  • Testing new structured output schemas
  • Verifying model chaining behavior
  • Debugging issues with specific configurations