Rate Limits

Understand API rate limits and how to handle them gracefully

Rate Limit Headers

Every API response includes rate limit headers so you can monitor your usage and avoid hitting limits.

Response Headers

X-RateLimit-Limit

Maximum requests allowed in the current window

X-RateLimit-Remaining

Requests remaining in the current window

X-RateLimit-Reset

Unix timestamp when the rate limit resets

Retry-After

Seconds to wait before retrying (only on 429 responses)
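The four headers above can be read from any response's header mapping. A minimal sketch, using a plain dict with illustrative values in place of a real response:

```python
import time

# Example header values as they might appear on a 429 response (illustrative).
headers = {
    'X-RateLimit-Limit': '100',
    'X-RateLimit-Remaining': '0',
    'X-RateLimit-Reset': str(int(time.time()) + 30),
    'Retry-After': '30',
}

limit = int(headers['X-RateLimit-Limit'])
remaining = int(headers['X-RateLimit-Remaining'])
# Seconds until the window resets, derived from the Unix timestamp.
seconds_to_reset = max(0, int(headers['X-RateLimit-Reset']) - int(time.time()))

print(f"{remaining}/{limit} requests left; window resets in ~{seconds_to_reset}s")
```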

Limits by Subscription Tier

| Limit Type | FREE | STARTER | PROFESSIONAL | ENTERPRISE |
| --- | --- | --- | --- | --- |
| API Requests/minute | 10 | 30 | 100 | 500 |
| Concurrent uploads | 10 | 20 | 30 | 50 |
| Batch upload size | 10 files | 20 files | 20 files | 50 files |
| Max file size (images) | 25MB | 50MB | 100MB | 200MB |
| Max file size (documents) | 50MB | 50MB | 50MB | 50MB |
| Max file size (videos) | 1GB | 2GB | 5GB | 10GB |
| Storage quota | 1GB | 25GB | 100GB | Unlimited |
| AI descriptions/month | 15 | 300 | 2,500 | 10,000 |
| Chat messages/month | 10 | 200 | 2,000 | 5,000 |

FREE Tier API Access

The FREE tier does not include API key access. API keys are available from the STARTER tier and above. FREE tier users can only access the platform through the web dashboard.

Session Authentication

Rate limits apply to requests authenticated with an API key. Users authenticated via a browser session (web dashboard) are not subject to per-minute rate limits.

Upload Limits

Uploads are subject to concurrency and size limits to ensure reliability.

Per-File Limit

100MB per file for streaming uploads (use a presigned URL for larger files)

Concurrent Uploads

See the tier table above for concurrent upload limits per plan

429 Too Many Requests

When upload limits are exceeded, the API returns 429 Too Many Requests. Wait for the Retry-After duration before retrying.

Handling Rate Limits

1. Monitor Headers Proactively

```python
import time

def check_rate_limits(response):
    """Warn when the remaining request budget is running low."""
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
    reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
    if remaining < 10:
        wait_time = reset_time - time.time()
        print(f"Warning: Only {remaining} requests left. Resets in {wait_time:.0f}s")
    return remaining > 0
```

2. Implement Exponential Backoff

```python
import time
import random

def exponential_backoff(attempt, base_delay=1, max_delay=60):
    """Calculate delay with jitter for retry."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter

def request_with_backoff(func, max_retries=5):
    """Call func(), retrying on 429 using Retry-After when present."""
    for attempt in range(max_retries):
        response = func()
        if response.status_code == 429:
            retry_after = response.headers.get('Retry-After')
            if retry_after:
                time.sleep(int(retry_after))
            else:
                time.sleep(exponential_backoff(attempt))
            continue
        return response
    raise Exception("Max retries exceeded")
```

3. Use Request Queuing

```python
import asyncio
from asyncio import Semaphore

class RateLimitedClient:
    def __init__(self, requests_per_second=10):
        self.semaphore = Semaphore(requests_per_second)
        self.delay = 1.0 / requests_per_second

    async def request(self, func):
        async with self.semaphore:
            result = await func()
            await asyncio.sleep(self.delay)
            return result

# Usage
client = RateLimitedClient(requests_per_second=10)
results = await asyncio.gather(*[
    client.request(lambda f=f: upload_file(f))  # bind f now, not at call time
    for f in files
])
```

Best Practices

Use Batch Endpoints

Instead of 50 individual uploads, use /upload/stream-batch to upload multiple files in one request. This counts as one API call.
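Splitting a large file set into batch-sized groups can be sketched as follows; the `batch` helper and file names are illustrative, and the batch size comes from the tier table above:

```python
def batch(items, size):
    """Split items into lists of at most `size` (the tier's batch upload limit)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

files = [f"photo_{i}.jpg" for i in range(50)]
batches = batch(files, 20)  # STARTER tier allows 20 files per batch request
# 50 files become 3 batch requests instead of 50 individual upload calls
```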

Implement Client-Side Throttling

Don't wait for 429 errors. Track your usage and slow down proactively when approaching limits.
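One common way to throttle proactively is a token bucket sized to your tier's per-minute limit. A minimal sketch (the `TokenBucket` class is illustrative, not part of the API):

```python
import time

class TokenBucket:
    """Proactive client-side limiter: refuse to send when the budget is spent."""
    def __init__(self, rate_per_minute=100):
        self.capacity = rate_per_minute
        self.tokens = float(rate_per_minute)
        self.refill_rate = rate_per_minute / 60.0  # tokens per second
        self.last = time.monotonic()

    def try_acquire(self):
        # Refill based on elapsed time, capped at the bucket capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should wait before sending

bucket = TokenBucket(rate_per_minute=100)  # PROFESSIONAL tier limit
allowed = bucket.try_acquire()
```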

Cache Responses

Cache API responses locally to avoid repeated calls for the same data. File listings and search results are good candidates.
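A small time-to-live cache is usually enough for this. A minimal in-memory sketch (the `TTLCache` class and cache keys are illustrative):

```python
import time

class TTLCache:
    """Tiny in-memory cache so repeated listings/searches skip the API."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        return None  # missing or expired

    def set(self, key, value):
        self.store[key] = (time.time() + self.ttl, value)

cache = TTLCache(ttl_seconds=60)
cache.set('/files?page=1', ['a.jpg', 'b.png'])
listing = cache.get('/files?page=1')  # served locally, no API call made
```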

Spread Requests Over Time

If you have batch jobs, spread them across the minute rather than sending all requests at once.
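Spreading a batch job evenly over the window can be sketched by computing staggered send offsets; the `paced_schedule` helper is illustrative:

```python
def paced_schedule(n_requests, window_seconds=60):
    """Evenly spaced send offsets across the window instead of a burst at t=0."""
    interval = window_seconds / n_requests
    return [round(i * interval, 2) for i in range(n_requests)]

offsets = paced_schedule(30)  # STARTER tier: 30 requests/minute
# One request every 2 seconds: sleep until each offset before sending
```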

Enterprise Rate Limits

Enterprise customers can request custom rate limits based on their needs. Contact your account manager or email enterprise@aionvision.tech.

Need Higher Limits?

If you're consistently hitting rate limits, consider upgrading your plan or contact us to discuss enterprise options with custom limits.