API rate limiting

API rate limits ensure fair and efficient resource usage, maintain service stability, and protect against abuse. Capping the number of requests within a defined time period prevents server overload, keeps performance consistent for all customers, and safeguards the system from malicious activity.

Rate limit headers

Rate limits use a token bucket model. Each response includes headers that indicate your current usage and limits:

| Header | Description |
| --- | --- |
| X-RateLimit-Remaining | Number of tokens currently available. This is how many requests you can make immediately before requests are refused. |
| X-RateLimit-Requested-Tokens | Number of tokens consumed by the request. Each request costs one token. |
| X-RateLimit-Burst-Capacity | Maximum number of tokens the bucket can hold. This defines the largest burst of requests you can make at once when the bucket is full. |
| X-RateLimit-Replenish-Rate | Number of tokens refilled into the bucket per second. This defines the sustained request rate you can maintain over time. |
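
As an illustration, a client could read these headers into a small structure after each response. This is a minimal sketch: the names `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical, and it assumes every header value is a plain integer string.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    remaining: int
    requested_tokens: int
    burst_capacity: int
    replenish_rate: int


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract the four rate limit headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: each value is an integer string such as "50".
    return RateLimitStatus(
        remaining=int(lowered["x-ratelimit-remaining"]),
        requested_tokens=int(lowered["x-ratelimit-requested-tokens"]),
        burst_capacity=int(lowered["x-ratelimit-burst-capacity"]),
        replenish_rate=int(lowered["x-ratelimit-replenish-rate"]),
    )
```

A client can then check `status.remaining` before sending a batch of requests and slow down as it approaches zero.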

How it works

  • Each request deducts one token from your bucket (X-RateLimit-Requested-Tokens).
  • When the bucket is empty (X-RateLimit-Remaining = 0), further requests are refused until tokens are replenished.
  • Tokens refill automatically at the X-RateLimit-Replenish-Rate, up to the X-RateLimit-Burst-Capacity.

This model allows short bursts of traffic up to the burst capacity, while enforcing a steady average rate over time.
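
The behaviour described above can be modelled in a few lines. The following is a client-side sketch for building intuition, not the service's actual implementation, which is not documented here. The class name `TokenBucket` and the method `try_consume` are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Illustrative token bucket: refills continuously, capped at burst capacity."""

    def __init__(self, replenish_rate: float, burst_capacity: float) -> None:
        self.replenish_rate = replenish_rate    # tokens added per second
        self.burst_capacity = burst_capacity    # maximum tokens the bucket holds
        self.tokens = burst_capacity            # assumption: the bucket starts full
        self.last_refill = 0.0                  # timestamp of the last refill, in seconds

    def try_consume(self, now: float, requested_tokens: float = 1) -> bool:
        """Return True if the request is accepted at time `now`, False if refused."""
        # Refill for the time elapsed since the last call, never above capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.burst_capacity, self.tokens + elapsed * self.replenish_rate)
        self.last_refill = now
        if self.tokens >= requested_tokens:
            self.tokens -= requested_tokens
            return True
        return False


bucket = TokenBucket(replenish_rate=50, burst_capacity=100)
burst = sum(bucket.try_consume(now=0.0) for _ in range(150))
print(burst)   # 100: the full bucket absorbs the burst, the other 50 are refused
later = sum(bucket.try_consume(now=1.0) for _ in range(150))
print(later)   # 50: one second of refill at 50 tokens per second
```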

Limits

By default, each customer is assigned an X-RateLimit-Replenish-Rate of 50 reads and 50 writes per second, with an X-RateLimit-Burst-Capacity of double that value (100 for reads and 100 for writes). Higher limits can be configured on request for high-traffic customers or for peak events such as Black Friday or Singles’ Day.
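
To make the default numbers concrete, the sketch below works through two quantities that follow from the token bucket model: the most requests that can be accepted in a time window when starting from a full bucket, and how long an empty bucket takes to refill. The function names are hypothetical, and the figures use the default read limits (replenish rate 50, burst capacity 100).

```python
def max_requests_in_window(replenish_rate: float, burst_capacity: float,
                           window_seconds: float) -> float:
    """Upper bound on accepted requests in a window, starting with a full bucket."""
    # The initial burst plus everything refilled during the window.
    return burst_capacity + replenish_rate * window_seconds


def seconds_to_refill(replenish_rate: float, burst_capacity: float,
                      remaining: float = 0) -> float:
    """Time for the bucket to go from `remaining` tokens back to full capacity."""
    return (burst_capacity - remaining) / replenish_rate


print(max_requests_in_window(50, 100, window_seconds=10))  # 600
print(seconds_to_refill(50, 100))                          # 2.0
```

Over longer windows the burst allowance matters less and less, so the sustained rate approaches the replenish rate of 50 requests per second.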

Read and write limits are reduced to 5 requests per second in the sandbox environment.

Requests that exceed the limit are refused with an HTTP 429 Too Many Requests error.
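
One way a client might handle a 429 is to wait until enough tokens have refilled and then retry. The sketch below is an assumption-laden illustration, not an official client: `send_with_retry` and its parameters are hypothetical, `send` stands for any callable that performs the request and returns a `(status, headers)` pair, and the code assumes the 429 response carries the rate limit headers. If they are missing, it falls back to exponential backoff.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]


def send_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying after a pause while the response status is 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(1, max_attempts):
        status, headers = response
        if status != 429:
            return response
        lowered = {name.lower(): value for name, value in headers.items()}
        # Assumption: header values are numeric strings when present.
        rate = float(lowered.get("x-ratelimit-replenish-rate", 0) or 0)
        cost = float(lowered.get("x-ratelimit-requested-tokens", 1) or 1)
        if rate > 0:
            # Time needed for `cost` tokens to refill at the advertised rate.
            delay = cost / rate
        else:
            # No usable rate header: back off 0.1 s, 0.2 s, 0.4 s, ...
            delay = 0.1 * 2 ** (attempt - 1)
        sleep(delay)
        response = send()
    # Either a success or the last 429 after exhausting all attempts.
    return response
```

Passing `sleep` as a parameter keeps the function easy to test, and returning the final response lets the caller decide how to report a 429 that persists after all attempts.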