
Rate limits & TPM

Each key is provisioned with a tokens-per-minute (TPM) allowance and a requests-per-minute (RPM) allowance, both sized to your usage forecast. Standard allowances range from 100 to 5,000 RPM; higher limits are available on request.
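If you want to stay inside both allowances on the client side, one simple approach is a sliding 60-second window that counts requests and tokens together. The sketch below is illustrative only. MinuteBudget and try_acquire are hypothetical names and not part of any SDK. This page also does not say exactly how tokens are counted against TPM (prompt only, or prompt plus max_tokens), so the caller passes its own estimate.

```python
import collections
import time
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Hypothetical client-side RPM + TPM limiter over a sliding 60 s window."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # injectable so the window can be tested without sleeping
        # Each entry is (timestamp, estimated_tokens) for a request already sent.
        self.sent: Deque[Tuple[float, int]] = collections.deque()

    def _prune(self, now: float) -> None:
        # Drop requests older than 60 seconds; they no longer count toward either limit.
        while self.sent and now - self.sent[0][0] >= 60.0:
            self.sent.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True only if it fits both RPM and TPM."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.sent)
        if len(self.sent) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.sent.append((now, tokens))
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    budget = MinuteBudget(rpm=2, tpm=1000, clock=lambda: fake_now[0])
    print(budget.try_acquire(400))  # True
    print(budget.try_acquire(700))  # False: 400 + 700 would exceed 1000 TPM
    print(budget.try_acquire(500))  # True
    print(budget.try_acquire(10))   # False: a third request would exceed 2 RPM
    fake_now[0] = 61.0
    print(budget.try_acquire(10))   # True: the earlier requests left the window
```

A request that try_acquire rejects can be queued or delayed locally rather than sent and answered with a 429.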

Because every model is backed by many upstream keys, your effective throughput is the sum across channels — you can burst higher than any single official provider tier allows.

  • Every model is fronted by a pool of upstream keys.
  • Each channel is health-checked continuously; failing keys are quarantined within seconds.
  • Requests automatically fail over to a healthy channel, so a single bad key never breaks your traffic.
  • If a request returns HTTP 429 (Too Many Requests), back off and retry with jitter.
  • If you consistently hit limits, contact us to raise your allowance — we size to your workload.
  • Avoid setting an extreme max_tokens on a single request; some pooled channels clamp it to a lower, safe ceiling.
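The 429 guidance above can be sketched as a retry loop with exponential backoff and "full jitter", meaning a random sleep between zero and an exponentially growing cap. This is a sketch under stated assumptions, not the service's own client. RateLimited and call_with_backoff are hypothetical names, and send stands in for whatever function issues your request and raises RateLimited on HTTP 429. The retry count and delays are placeholder values you would tune to your allowance.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical: raised by your send function when the API answers HTTP 429."""


def call_with_backoff(send: Callable[[], str],
                      max_retries: int = 5,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> str:
    """Call send(), retrying on RateLimited with capped exponential backoff + full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # Cap grows as base_delay * 2^attempt, but never beyond max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a uniformly random duration in [0, cap].
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_send() -> str:
        # Simulates two 429 responses followed by a success.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RateLimited()
        return "ok"

    delays = []
    print(call_with_backoff(flaky_send, sleep=delays.append))  # ok
    print(len(delays))  # 2
```

Injecting sleep keeps the loop testable; in production the default time.sleep applies. Randomizing the whole delay spreads retries from many clients, so they do not return to the pool in synchronized waves.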