Rate limits & TPM

TPM / RPM

Each key is provisioned with a tokens-per-minute (TPM) and requests-per-minute (RPM) allowance based on your forecast. Standard ranges run from 100 to 5,000 RPM; higher is available on request.

Because every model is backed by many upstream keys, your effective throughput is the sum across channels, so you can burst above what any single official provider tier allows.

Multi-channel high availability

Every model is fronted by a pool of upstream keys.
Each channel is health-checked continuously; failing keys are quarantined within seconds.
Requests automatically fail over to a healthy channel, so a single bad key never breaks your traffic.

Handling limits

On an HTTP 429 (Too Many Requests) response, back off and retry with jitter so that concurrent clients do not retry in lockstep.
If you consistently hit limits, contact us to raise your allowance; we size to your workload.
Avoid setting an extreme max_tokens on a single request; some pooled channels clamp the value to a safe ceiling.
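
The retry advice above can be sketched as a small client-side helper. This is a minimal illustration, not an official client: `RateLimitError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the retry count, base delay, and cap are assumed defaults that you should tune to your own allowance. It uses exponential backoff with "full jitter", one common choice among several.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def backoff_delays(max_retries=5, base=0.5, cap=30.0, rng=random.random):
    """Yield one "full jitter" delay per retry.

    Each delay is rng() * min(cap, base * 2**attempt), so the upper
    bound doubles per attempt until it reaches the cap.
    """
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, max_retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send(); on RateLimitError, wait a jittered delay and try again.

    `send` is any zero-argument callable that performs the request and
    raises RateLimitError when the server answers 429 (an assumption;
    adapt this to however your HTTP client reports status codes).
    """
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return send()
        except RateLimitError:
            sleep(delay)
    # Final attempt: if this one is also rate limited, the error propagates.
    return send()
```

Wrap your request in a zero-argument function and pass it in, for example `call_with_retry(lambda: make_request(payload))`, where `make_request` is your own code. Randomizing the whole delay, rather than adding a small random offset, spreads retries from many clients more evenly after a shared 429.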