# Rate Limiting
Primatomic enforces per-tenant rate limits to ensure fair resource allocation. This document specifies the rate limiting behavior and client requirements.
## Rate Limit Specifications

| Parameter | Value |
|---|---|
| Sustained rate | 100 requests per second |
| Burst capacity | 50 requests |
| Scope | Per tenant (shared across all API keys) |
All API keys belonging to a tenant share the same rate limit bucket.
## Token Bucket Algorithm

The service uses a token bucket algorithm with these properties:
| Property | Specification |
|---|---|
| Refill rate | 100 tokens per second |
| Bucket capacity | 150 tokens (sustained + burst) |
| Request cost | 1 token per request |
| Rejection | Requests are rejected with HTTP 429 when the bucket is empty |
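The refill-and-reject behavior above can be sketched as a small token bucket. This is an illustrative model, not part of the Primatomic API; the class and method names are hypothetical, and the defaults mirror the documented parameters (100 tokens/s refill, 150-token capacity):

```typescript
// Illustrative token bucket with the documented parameters; names are
// hypothetical and not part of the Primatomic API.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private refillRate: number = 100, // tokens per second
    private capacity: number = 150,   // sustained + burst
    now: number = Date.now()
  ) {
    this.tokens = capacity; // bucket starts full
    this.lastRefill = now;
  }

  // Returns true if a request is allowed, false if it would be rejected (429)
  tryAcquire(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1; // each request costs 1 token
      return true;
    }
    return false;
  }
}
```

Because the bucket starts full, a client can burst 150 requests immediately, then proceeds at the 100/s sustained rate.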
## 429 Response Format

When rate limited, the service responds:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: text/plain

Rate limit exceeded
```

The `Retry-After` header indicates the minimum number of seconds to wait before retrying.
## Client Requirements

### Handling 429 Responses

| Requirement | Level |
|---|---|
| Respect Retry-After header | Required |
| Implement exponential backoff | Required |
| Add jitter to backoff | Recommended |
| Implement client-side throttling | Recommended |
### Retry Implementation

Clients should implement retry logic for 429 responses:

```typescript
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status !== 429) {
        throw error;
      }

      // Respect Retry-After header
      const retryAfter = parseInt(error.headers.get('Retry-After') || '1', 10);

      // Use exponential backoff
      const baseDelay = retryAfter * 1000 * Math.pow(2, attempt);

      // Add jitter
      const jitter = Math.random() * 1000;

      await sleep(baseDelay + jitter);
    }
  }
  throw new Error('Rate limit: max retries exceeded');
}
```

## Best Practices
### Batch Appends (Recommended)

The `/logs/{log_name}/append_batch` endpoint accepts multiple events in a single request, reducing overhead and rate limit consumption:

```typescript
// Single batch request with raw bytes
const response = await api.appendBatch(logName, events);

// response.sequences contains sequence numbers for all events
const lastSequence = response.sequences[response.sequences.length - 1];
```

Batch requests:
- Accept up to 10 MB total request size
- Return sequence numbers for all events in order
- Count as a single request against the rate limit
- Fail atomically on validation errors
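Given the constraints above, a client can pack events into size-bounded batches before sending. The helper below is a hypothetical sketch, assuming each event is already encoded as raw bytes:

```typescript
// Documented per-request size cap
const MAX_BATCH_BYTES = 10 * 1024 * 1024;

// Split events into batches whose total byte size stays under maxBytes.
// Note: a single event larger than maxBytes still forms its own batch
// and would be rejected server-side.
function chunkBySize(
  events: Uint8Array[],
  maxBytes: number = MAX_BATCH_BYTES
): Uint8Array[][] {
  const batches: Uint8Array[][] = [];
  let current: Uint8Array[] = [];
  let currentSize = 0;

  for (const event of events) {
    // Start a new batch when adding this event would exceed the cap
    if (current.length > 0 && currentSize + event.length > maxBytes) {
      batches.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(event);
    currentSize += event.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each resulting batch can then be sent as one batch-append request, consuming one rate-limit token per batch rather than one per event.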
### Concurrent Appends (Alternative)

For single-event appends, use controlled concurrency:

```typescript
import pLimit from 'p-limit';

// Limit to 50 concurrent requests (within burst capacity)
const limit = pLimit(50);

const results = await Promise.all(
  events.map(event => limit(() => api.appendLog(logName, event)))
);
```

### Client-Side Rate Limiting (Recommended)
Implement client-side throttling to avoid hitting limits:

```typescript
import pLimit from 'p-limit';

// Limit concurrent requests
const limit = pLimit(50);

const results = await Promise.all(
  items.map(item => limit(() => api.process(item)))
);
```

### Response Caching (Recommended)
Cache responses when freshness is not critical:

```typescript
const cache = new Map<string, { data: any; expires: number }>();

async function cachedQuery(
  logName: string,
  viewName: string
): Promise<Uint8Array> {
  const key = `${logName}:${viewName}`;
  const cached = cache.get(key);

  if (cached && cached.expires > Date.now()) {
    return cached.data; // Cache hit
  }

  const data = await api.queryView(logName, viewName, new Uint8Array());
  cache.set(key, {
    data,
    expires: Date.now() + 5000 // 5 second TTL
  });
  return data;
}
```

### Stale Reads (Optional)
Use stale reads to reduce request volume when consistency is not required:

```typescript
// With sequence: waits for consistency, counts toward rate limit
await api.queryView(logName, viewName, input, sequence);

// Without sequence: immediate response, may be stale
await api.queryView(logName, viewName, input);
```

## Monitoring Usage
Monitor usage to anticipate rate limiting:

```shell
curl https://api.primatomic.com/billing/usage \
  -H "Authorization: Bearer $TOKEN"
```

```json
{
  "storage_gb_seconds": 1234.56,
  "compute_seconds": 789.01
}
```

Storage is metered in GB-seconds. Divide by 3,600 to convert to GB-hours.
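As a worked example of that conversion, using the field names from the response shown above (the helper name is ours, not part of the API):

```typescript
interface UsageResponse {
  storage_gb_seconds: number;
  compute_seconds: number;
}

// GB-seconds divided by 3,600 gives GB-hours,
// e.g. 7,200 GB-seconds === 2 GB-hours
function storageGbHours(usage: UsageResponse): number {
  return usage.storage_gb_seconds / 3600;
}
```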
## Higher Limits

Contact support if you require higher limits. Available options may include:
- Increased per-tenant rate limits
- Dedicated infrastructure
- Batch API endpoints
Rate limit increases are evaluated based on use case and tenant history.