# Rate Limits

Rate limits apply to all API requests.

## How It Works

The API rate-limits requests with a token bucket algorithm. When you start making requests, you are granted a bucket of tokens, and each request consumes one token. If no tokens remain, the request is rejected with a `429 Too Many Requests` response whose problem type is [Rate Limited](/problems#rate-limited).

Tokens are continuously replenished over time. If your bucket is empty, you will need to wait for tokens to be replenished before making further requests.
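
To make this behaviour concrete, here is a minimal client-side sketch of a token bucket in Python. It is an illustration, not the API's server-side implementation: the `TokenBucket` class, its parameter names, and the assumption that tokens refill at a steady rate of capacity divided by window are all hypothetical.

```python
import time


class TokenBucket:
    """Illustrative model of a token bucket (not the API's actual code)."""

    def __init__(self, capacity, window_seconds, clock=time.monotonic):
        self.capacity = capacity
        # Assumption: a full bucket's worth of tokens is restored per window.
        self.refill_rate = capacity / window_seconds  # tokens per second
        self.tokens = float(capacity)  # the bucket starts full
        self._clock = clock
        self._last = clock()

    def _refill(self):
        """Add the tokens earned since the last check, capped at capacity."""
        now = self._clock()
        elapsed = now - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self._last = now

    def try_consume(self):
        """Return True if a request may proceed, False if it would be a 429."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Under these assumptions, `TokenBucket(10, 10.0)` would allow a burst of 10 requests and then roughly one more request per second.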

Rate limits are subject to change without notice.

## Response Headers

Every API response includes rate limit headers so you can monitor your current usage:

| Header | Description |
| --- | --- |
| `X-Rate-Limit-Remaining` | The number of tokens remaining in your bucket. |
| `X-Rate-Limit-Policy` | The bucket capacity and refill window (e.g. `10 / 10s` for a capacity of 10 tokens and a 10-second refill window). |
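
As a rough sketch, the headers above could be read in Python as follows. The function names `parse_policy` and `read_rate_limit` are hypothetical, and the parser assumes the policy value always follows the `10 / 10s` shape shown in the table (an integer capacity, a slash, and a window in whole seconds); any other format is treated as unknown rather than guessed.

```python
import re

# Assumption: the policy looks like "<capacity> / <seconds>s", e.g. "10 / 10s".
_POLICY_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*s\s*$")


def parse_policy(value):
    """Return (capacity, window_seconds), or None if the format is unexpected."""
    match = _POLICY_RE.match(value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def read_rate_limit(headers):
    """Extract (remaining, policy) from a mapping of response headers.

    Header names are compared case-insensitively. Missing or malformed
    values are returned as None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    remaining = None
    raw_remaining = lowered.get("x-rate-limit-remaining")
    if raw_remaining is not None and raw_remaining.strip().isdigit():
        remaining = int(raw_remaining)

    policy = None
    raw_policy = lowered.get("x-rate-limit-policy")
    if raw_policy is not None:
        policy = parse_policy(raw_policy)

    return remaining, policy
```

Returning `None` for anything unexpected keeps the caller from acting on a misread value if the header format differs from the example.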


## Handling Rate Limits

When you receive a `429` response:

1. Stop making requests immediately.
2. Retry with exponential backoff: wait before retrying, and double the wait after each consecutive `429`.
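
The steps above can be sketched in Python as follows. This is one possible approach, not a prescribed client: `call_with_backoff`, `backoff_delay`, the default delays, and the retry cap are all illustrative choices, and `send` is a hypothetical callable that performs one request and returns a `(status, body)` tuple. Random jitter is added so that several clients do not retry in lockstep.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Upper bound in seconds on the wait before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `send()` and retry on 429, doubling the wait bound each time.

    `send` is a hypothetical callable returning (status, body).
    `sleep` and `rng` are injectable so the function can be tested
    without real waiting.
    """
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Full jitter: wait a random fraction of the exponential bound.
        sleep(rng() * backoff_delay(attempt, base, cap))
        status, body = send()
    # After max_retries the last response is returned, even if it is a 429.
    return status, body
```

Capping the delay and the number of retries keeps a client from waiting indefinitely; the caller decides what to do if the final response is still a `429`.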


To avoid hitting rate limits:

- Monitor the `X-Rate-Limit-Remaining` header and slow down as it approaches zero.
- Spread requests out over time rather than making them in bursts.
- Cache responses where possible to reduce the number of requests.
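
One way to slow down as `X-Rate-Limit-Remaining` approaches zero is to compute a short pause before each request. The sketch below is an assumption-laden example rather than a recommended policy: `pacing_delay` and its `low_water` threshold are hypothetical, and it assumes one token is regained every `window_seconds / capacity` seconds.

```python
def pacing_delay(remaining, capacity, window_seconds, low_water=0.2):
    """Return how many seconds to pause before the next request.

    No pause while more than `low_water` (a fraction of capacity) remains.
    Below that, the pause grows linearly up to the time needed to regain
    one token when the bucket is empty.
    """
    if capacity <= 0 or window_seconds <= 0:
        raise ValueError("capacity and window_seconds must be positive")

    # Assumption: tokens refill evenly across the window.
    refill_interval = window_seconds / capacity
    threshold = capacity * low_water

    if remaining > threshold:
        return 0.0
    if remaining <= 0:
        return refill_interval
    return refill_interval * (1 - remaining / threshold)
```

For example, a client could call `time.sleep(pacing_delay(remaining, capacity, window_seconds))` before each request, using the values read from the most recent response headers.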