For all models compatible with the free tier, the following limits apply:
1 request per second
500,000 tokens per minute
1 billion tokens per month
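To get a feel for how these three limits interact, the short calculation below works out how quickly sustained maximum throughput would use up the monthly budget, and at what average request size the token limit becomes the bottleneck instead of the request limit. It is only a back-of-the-envelope sketch: the constant names are illustrative, and it assumes a "minute" of traffic can actually reach the full per-minute token limit.

```python
# Free-tier limits as listed above.
REQUESTS_PER_SECOND = 1
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000

# Minutes of sustained maximum token throughput before the monthly budget is spent.
minutes_to_exhaust = TOKENS_PER_MONTH / TOKENS_PER_MINUTE
hours_to_exhaust = minutes_to_exhaust / 60

# Average tokens per request above which the per-minute token limit,
# rather than the request limit, is reached first.
requests_per_minute = REQUESTS_PER_SECOND * 60
tokens_per_request_ceiling = TOKENS_PER_MINUTE / requests_per_minute

print(f"{minutes_to_exhaust:.0f} minutes (~{hours_to_exhaust:.1f} hours)")
print(f"{tokens_per_request_ceiling:.0f} tokens per request")
```

In other words, running flat out at the per-minute token limit would consume the monthly allowance in about 33 hours, so the monthly limit is the one that matters for any sustained workload.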
How do rate limits work?
To prevent misuse and manage API capacity, we limit how much each workspace can use the Mistral API.
We offer two types of rate limits:
Requests per second (RPS)
Tokens per minute/month
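One way to stay under both kinds of limit is to throttle on the client side before sending each request. The sketch below is a hypothetical helper, not part of any Mistral SDK: `ClientSideLimiter`, its parameters, and the rolling 60-second window are all assumptions made for illustration. It also assumes the caller can estimate the token count of each request in advance, and it does not track the monthly limit.

```python
import time
from collections import deque


class ClientSideLimiter:
    """Hypothetical helper: tracks request rate and tokens over a rolling minute."""

    def __init__(self, max_rps=1.0, max_tokens_per_minute=500_000,
                 clock=time.monotonic):
        self.min_interval = 1.0 / max_rps
        self.max_tokens_per_minute = max_tokens_per_minute
        self.clock = clock
        self.last_request = None
        self.window = deque()  # (timestamp, tokens) for the last 60 seconds
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop entries that are at least 60 seconds old.
        while self.window and now - self.window[0][0] >= 60.0:
            _, tokens = self.window.popleft()
            self.tokens_in_window -= tokens

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` tokens fits both limits."""
        if tokens > self.max_tokens_per_minute:
            raise ValueError("request exceeds the per-minute token limit on its own")
        now = self.clock()
        self._evict(now)
        wait = 0.0
        if self.last_request is not None:
            wait = max(wait, self.last_request + self.min_interval - now)
        # Walk the window oldest-first until enough tokens would have expired.
        excess = self.tokens_in_window + tokens - self.max_tokens_per_minute
        for stamp, used in self.window:
            if excess <= 0:
                break
            wait = max(wait, stamp + 60.0 - now)
            excess -= used
        return wait

    def record(self, tokens):
        """Register a request that has just been sent."""
        now = self.clock()
        self._evict(now)
        self.last_request = now
        self.window.append((now, tokens))
        self.tokens_in_window += tokens
```

A caller would typically do `time.sleep(limiter.wait_time(n))`, send the request, then call `limiter.record(n)`. Because limits apply to the whole workspace, a limiter like this only helps if all clients sharing the workspace coordinate through it or are given a share of the budget each.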
Key points to note:
Rate limits are set at the workspace level.
Limits are defined by usage tier; each tier has its own set of rate limits. If you need higher limits, contact us through the support button and describe your use case.
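Until a higher tier is in place, a client that occasionally exceeds its limits can retry with exponential backoff. The sketch below is a generic pattern, not a documented Mistral behaviour: this page does not say how the API signals a rejected request (an HTTP 429 status is the common convention, but that is an assumption here), so `RateLimitError` is a hypothetical exception that your own request wrapper would raise, and `call_with_backoff` and its parameters are illustrative names.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception raised by your own wrapper when a request is rejected."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()` and retry with exponential backoff plus jitter on RateLimitError."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the capped delay.
            sleep(delay * rng())
```

The jitter spreads retries from several clients in the same workspace so they do not all hit the shared limit again at the same instant.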
Usage tiers
You can view the rate and usage limits for your workspace under the limits section on la Plateforme.
We offer various tiers on the platform, including a free API tier with restrictive rate limits. The free API tier is designed to allow you to try and explore our API. For actual projects and production use, we recommend upgrading to a higher tier.