API rate limits are implemented to ensure fair and efficient resource usage, maintain service stability, and protect against abuse or overuse. Controlling the number of requests within a defined time period prevents server overloads, maintains consistent performance for all customers, and safeguards the system from malicious activity.
Rate limit headers
Rate limits use a token bucket model. Each response includes headers that indicate your current usage and limits:
| Header | Description |
|---|---|
| X-RateLimit-Remaining | Number of tokens currently available. This is how many requests you can make immediately before requests are refused. |
| X-RateLimit-Requested-Tokens | Number of tokens consumed by the request. Each request costs one token. |
| X-RateLimit-Burst-Capacity | Maximum number of tokens the bucket can hold. This defines the largest burst of requests you can make at once when the bucket is full. |
| X-RateLimit-Replenish-Rate | Number of tokens refilled into the bucket per second. This defines the sustained request rate you can maintain over time. |
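A client can inspect these headers on every response to decide whether to slow down. The following is a minimal sketch using Python's requests library; the endpoint URL and API key are placeholders, not real values:

```python
import requests

# Hypothetical endpoint and credentials -- substitute your own.
API_URL = "https://api.example.com/v1/orders"
API_KEY = "your-api-key"

response = requests.get(API_URL, headers={"Authorization": f"Bearer {API_KEY}"})

# Each response carries the current state of your token bucket.
remaining = int(response.headers["X-RateLimit-Remaining"])
burst_capacity = int(response.headers["X-RateLimit-Burst-Capacity"])
replenish_rate = int(response.headers["X-RateLimit-Replenish-Rate"])

print(f"{remaining}/{burst_capacity} tokens left, refilling at {replenish_rate}/s")
```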
How it works
- Each request deducts one token from your bucket (X-RateLimit-Requested-Tokens).
- When the bucket is empty (X-RateLimit-Remaining = 0), further requests are refused until tokens are replenished.
- Tokens refill automatically at the X-RateLimit-Replenish-Rate, up to the X-RateLimit-Burst-Capacity.
This model allows short bursts of traffic up to the burst capacity, while enforcing a steady average rate over time.
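To make the model concrete, the sketch below simulates a client-side estimate of the bucket, assuming the default values described under Limits (a replenish rate of 50 tokens per second and a burst capacity of 100). The class name and usage are illustrative only:

```python
import time

class TokenBucketEstimate:
    """Client-side estimate of the server's token bucket state."""

    def __init__(self, replenish_rate: float, burst_capacity: float):
        self.replenish_rate = replenish_rate  # tokens added per second
        self.burst_capacity = burst_capacity  # maximum tokens the bucket holds
        self.tokens = burst_capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.burst_capacity, self.tokens + elapsed * self.replenish_rate)
        self.last_refill = now

    def try_consume(self, tokens: float = 1.0) -> bool:
        """Deduct one token per request; return False when the bucket is empty."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

# A full bucket allows a burst of 100 requests at once, then roughly 50 per second.
bucket = TokenBucketEstimate(replenish_rate=50, burst_capacity=100)
allowed = sum(bucket.try_consume() for _ in range(150))
print(f"{allowed} of 150 immediate requests would be accepted")
```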
Limits
By default, each customer is assigned an X-RateLimit-Replenish-Rate of 50 reads and 50 writes per second, with an X-RateLimit-Burst-Capacity of double that value. Increased limits are configurable upon request for high-traffic customers or for peak events (Black Friday, Singles’ Day).
Read and write limits are reduced to 5 requests per second in the sandbox environment.
Exceeding the limit results in an HTTP 429 Too Many Requests response.
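When a 429 is returned, back off and retry once tokens have had time to replenish. Below is a minimal sketch; the retry count and exponential backoff schedule are illustrative choices, not values prescribed by the API:

```python
import time
import requests

def request_with_backoff(url: str, headers: dict, max_retries: int = 5) -> requests.Response:
    """Retry on HTTP 429 with exponential backoff between attempts."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Wait 1s, 2s, 4s, ... before retrying so the bucket can refill.
        time.sleep(2 ** attempt)
    return response  # give up and return the last 429 response
```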