Rate limiting helps you control how often your Neurons can be called.

Rate Limiting settings in the Neuron Settings page

Rate Limiting Rules

You can define as many rate limiting rules as you want, each with its own limit and time period. Rules are evaluated in order, and the first rule whose limit has been reached blocks the request.

To speed up execution, rate limiting rules are cached for a few minutes. If you need a change to take effect immediately, we recommend creating and deploying a new version of your Neuron.

Metrics and Scopes

Rate limiting rules can be defined using the following metrics:

Metric    Description
Requests  The number of requests made to the Neuron.
Tokens    The number of tokens processed by the Neuron, across all AI provider calls.

Token counts are combined across all AI provider calls and are not tracked per provider. Each provider counts tokens slightly differently, and the counts are not normalized.

Rate limiting rules are also defined by a scope:

Scope     Description
Total     The total amount of the chosen metric.
Per IP    The amount of the chosen metric per IP address.
Per User  The amount of the chosen metric per user.

The Per User scope is only enforced if you are using JWT Authentication with a username property path defined. Otherwise, the rule is ignored. See Access Control for more information.
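To make the interaction between metrics, scopes and rule order concrete, here is a minimal conceptual sketch in Python. It is not the platform's implementation: the Rule class, the counter_key and first_blocking_rule functions, and the usage dictionary are all hypothetical names invented for illustration, and the sketch assumes usage within the current time period has already been counted elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of one rate limiting rule."""
    metric: str          # "requests" or "tokens"
    scope: str           # "total", "per_ip" or "per_user"
    limit: int           # maximum amount allowed per time period
    period_seconds: int  # length of the time period


def counter_key(rule: Rule, ip: str, username: Optional[str]) -> Optional[str]:
    """Return the counter a rule applies to, or None if the rule is ignored."""
    if rule.scope == "total":
        return "total"
    if rule.scope == "per_ip":
        return f"ip:{ip}"
    if rule.scope == "per_user":
        # Without a username (for example, no JWT username property path),
        # a Per User rule cannot be enforced and is skipped.
        return f"user:{username}" if username else None
    raise ValueError(f"unknown scope: {rule.scope}")


def first_blocking_rule(
    rules: List[Rule],
    usage: Dict[Tuple[Rule, str], int],
    ip: str,
    username: Optional[str] = None,
) -> Optional[Rule]:
    """Check rules in order and return the first one whose limit is reached.

    `usage` maps (rule, counter key) to the amount already consumed in the
    rule's current time period. Returns None when no rule blocks the request.
    """
    for rule in rules:
        key = counter_key(rule, ip, username)
        if key is None:
            continue  # rule ignored for this request
        if usage.get((rule, key), 0) >= rule.limit:
            return rule
    return None
```

In this model, a request from an unauthenticated caller skips every Per User rule, so it is only constrained by the Total and Per IP rules that follow.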

What happens when a request is being rate limited?

When a request is rate limited, the Neuron returns an HTTP 429 status code and an error message. Make sure your application handles this response, for example by waiting before it retries.
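One common way to handle a 429 response is to retry with exponential backoff. The Python sketch below shows the idea under stated assumptions: call_with_retry and its parameters are hypothetical names, and send stands in for whatever function your application uses to call the Neuron and return a (status code, body) pair. The sketch does not rely on any particular response header, since this page does not specify which headers the Neuron returns.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff while it returns 429.

    `send` is a placeholder for your own request function. It must return
    a (status_code, body) tuple. The last response is returned, even if it
    is still a 429 after all attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait base_delay, then 2x, 4x, ... before each new attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

The delays and attempt count are arbitrary starting points. Choose values that fit the time periods configured in your own rules, since retrying faster than a rule's period resets is unlikely to succeed.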

You can monitor rate limit hits and usage patterns through the execution logs to fine-tune your rate limiting rules.