API rate limits are implemented to ensure fair and efficient resource usage, maintain service stability, and protect against abuse or overuse. Controlling the number of requests within a defined time period prevents server overloads, maintains consistent performance for all customers, and safeguards the system from malicious activity.
Rate limit headers
Rate limits use a token bucket model. Each response includes headers that indicate your current usage and limits:
| Header | Description |
|---|---|
| X-RateLimit-Remaining | Number of tokens currently available. This is how many requests you can make immediately before requests are refused. |
| X-RateLimit-Requested-Tokens | Number of tokens consumed by the request. Each request costs one token. |
| X-RateLimit-Burst-Capacity | Maximum number of tokens the bucket can hold. This defines the largest burst of requests you can make at once when the bucket is full. |
| X-RateLimit-Replenish-Rate | Number of tokens refilled into the bucket per second. This defines the sustained request rate you can maintain over time. |
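A client can inspect these headers on every response to decide whether to slow down. The following is a minimal sketch using Python's requests library; the endpoint URL and API key are placeholders, not real values:

```python
import requests

# Hypothetical endpoint and credentials -- substitute your own.
API_URL = "https://api.example.com/v1/orders"
API_KEY = "your-api-key"

response = requests.get(API_URL, headers={"Authorization": f"Bearer {API_KEY}"})

# Each response carries the current state of your token bucket.
remaining = int(response.headers["X-RateLimit-Remaining"])
burst_capacity = int(response.headers["X-RateLimit-Burst-Capacity"])
replenish_rate = int(response.headers["X-RateLimit-Replenish-Rate"])

print(f"{remaining}/{burst_capacity} tokens left, refilling at {replenish_rate}/s")
```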
How it works
- Each request deducts one token from your bucket (X-RateLimit-Requested-Tokens).
- When the bucket is empty (X-RateLimit-Remaining = 0), further requests are refused until tokens are replenished.
- Tokens refill automatically at the X-RateLimit-Replenish-Rate, up to the X-RateLimit-Burst-Capacity.
This model allows short bursts of traffic up to the burst capacity, while enforcing a steady average rate over time.
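To make the model concrete, the sketch below simulates a client-side estimate of the bucket, assuming the default values described under Limits (a replenish rate of 50 tokens per second and a burst capacity of 100). The class name and usage are illustrative only:

```python
import time

class TokenBucketEstimate:
    """Client-side estimate of the server's token bucket state."""

    def __init__(self, replenish_rate: float, burst_capacity: float):
        self.replenish_rate = replenish_rate  # tokens added per second
        self.burst_capacity = burst_capacity  # maximum tokens the bucket holds
        self.tokens = burst_capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.burst_capacity, self.tokens + elapsed * self.replenish_rate)
        self.last_refill = now

    def try_consume(self, tokens: float = 1.0) -> bool:
        """Deduct one token per request; return False when the bucket is empty."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

# A full bucket allows a burst of 100 requests at once, then roughly 50 per second.
bucket = TokenBucketEstimate(replenish_rate=50, burst_capacity=100)
allowed = sum(bucket.try_consume() for _ in range(150))
print(f"{allowed} of 150 immediate requests would be accepted")
```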
Limits
By default, each customer is assigned an X-RateLimit-Replenish-Rate of 50 reads and 50 writes per second, with an X-RateLimit-Burst-Capacity of double that value. Increased limits are configurable upon request for high-traffic customers or for peak events (Black Friday, Singles’ Day).
Read and write limits are reduced to 5 requests per second in the sandbox environment.
Exceeding the limit results in an HTTP 429 Too Many Requests response.
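When a 429 is returned, back off and retry once tokens have had time to replenish. Below is a minimal sketch; the retry count and exponential backoff schedule are illustrative choices, not values prescribed by the API:

```python
import time
import requests

def request_with_backoff(url: str, headers: dict, max_retries: int = 5) -> requests.Response:
    """Retry on HTTP 429 with exponential backoff between attempts."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Wait 1s, 2s, 4s, ... before retrying so the bucket can refill.
        time.sleep(2 ** attempt)
    return response  # give up and return the last 429 response
```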