# Rate Limiting
Primatomic enforces per-tenant rate limits to ensure fair resource allocation. This document specifies the rate limiting behavior and client requirements.
## Rate Limit Specifications

| Parameter | Value |
|---|---|
| Sustained rate | 100 requests per second |
| Burst capacity | 50 requests |
| Scope | Per tenant (shared across all API keys) |
All API keys belonging to a tenant share the same rate limit bucket.
## Token Bucket Algorithm

The service uses a token bucket algorithm with these properties:
| Property | Specification |
|---|---|
| Refill rate | 100 tokens per second |
| Bucket capacity | 150 tokens (sustained + burst) |
| Request cost | 1 token per request |
| Rejection | Requests are rejected with HTTP 429 when the bucket is empty |
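The refill-and-reject behavior above can be sketched as a small token bucket. This is an illustrative model, not part of the Primatomic API; the class and method names are hypothetical, and the defaults mirror the documented parameters (100 tokens/s refill, 150-token capacity):

```typescript
// Illustrative token bucket with the documented parameters; names are
// hypothetical and not part of the Primatomic API.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private refillRate: number = 100, // tokens per second
    private capacity: number = 150,   // sustained + burst
    now: number = Date.now()
  ) {
    this.tokens = capacity; // bucket starts full
    this.lastRefill = now;
  }

  // Returns true if a request is allowed, false if it would be rejected (429)
  tryAcquire(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1; // each request costs 1 token
      return true;
    }
    return false;
  }
}
```

Because the bucket starts full, a client can burst 150 requests immediately, then proceeds at the 100/s sustained rate.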
## 429 Response Format

When rate limited, the service responds:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: text/plain

Rate limit exceeded
```

The `Retry-After` header indicates the minimum number of seconds to wait before retrying.
## Client Requirements

### Handling 429 Responses

| Requirement | Level |
|---|---|
| Respect Retry-After header | Required |
| Implement exponential backoff | Required |
| Add jitter to backoff | Recommended |
| Implement client-side throttling | Recommended |
### Retry Implementation

Clients should implement retry logic for 429 responses:

```typescript
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error.status !== 429) {
        throw error;
      }

      // Respect Retry-After header
      const retryAfter = parseInt(error.headers.get('Retry-After') || '1', 10);

      // Use exponential backoff
      const baseDelay = retryAfter * 1000 * Math.pow(2, attempt);

      // Add jitter
      const jitter = Math.random() * 1000;

      await sleep(baseDelay + jitter);
    }
  }
  throw new Error('Rate limit: max retries exceeded');
}
```

## Best Practices
### Batch Appends (Recommended)

The `/logs/{log_name}/append_batch` endpoint accepts multiple events in a single request, reducing overhead and rate limit consumption:

```typescript
// Single batch request with raw bytes
const response = await api.appendBatch(logName, events);

// response.sequences contains sequence numbers for all events
const lastSequence = response.sequences[response.sequences.length - 1];
```

Batch requests:
- Accept up to 10 MB total request size
- Return sequence numbers for all events in order
- Count as a single request against the rate limit
- Fail atomically on validation errors
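Given the constraints above, a client can pack events into size-bounded batches before sending. The helper below is a hypothetical sketch, assuming each event is already encoded as raw bytes:

```typescript
// Documented per-request size cap
const MAX_BATCH_BYTES = 10 * 1024 * 1024;

// Split events into batches whose total byte size stays under maxBytes.
// Note: a single event larger than maxBytes still forms its own batch
// and would be rejected server-side.
function chunkBySize(
  events: Uint8Array[],
  maxBytes: number = MAX_BATCH_BYTES
): Uint8Array[][] {
  const batches: Uint8Array[][] = [];
  let current: Uint8Array[] = [];
  let currentSize = 0;

  for (const event of events) {
    // Start a new batch when adding this event would exceed the cap
    if (current.length > 0 && currentSize + event.length > maxBytes) {
      batches.push(current);
      current = [];
      currentSize = 0;
    }
    current.push(event);
    currentSize += event.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each resulting batch can then be sent as one batch-append request, consuming one rate-limit token per batch rather than one per event.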
### Concurrent Appends (Alternative)

For single-event appends, use controlled concurrency:

```typescript
import pLimit from 'p-limit';

// Limit to 50 concurrent requests (within burst capacity)
const limit = pLimit(50);

const results = await Promise.all(
  events.map(event => limit(() => api.appendLog(logName, event)))
);
```

### Client-Side Rate Limiting (Recommended)
Implement client-side throttling to avoid hitting limits:

```typescript
import pLimit from 'p-limit';

// Limit concurrent requests
const limit = pLimit(50);

const results = await Promise.all(
  items.map(item => limit(() => api.process(item)))
);
```

### Response Caching (Recommended)
Cache responses when freshness is not critical:

```typescript
const cache = new Map<string, { data: any; expires: number }>();

async function cachedQuery(
  logName: string,
  viewName: string
): Promise<Uint8Array> {
  const key = `${logName}:${viewName}`;
  const cached = cache.get(key);

  if (cached && cached.expires > Date.now()) {
    return cached.data; // Cache hit
  }

  const data = await api.queryView(logName, viewName, new Uint8Array());
  cache.set(key, {
    data,
    expires: Date.now() + 5000 // 5 second TTL
  });
  return data;
}
```

### Stale Reads (Optional)
Use stale reads to reduce request volume when consistency is not required:

```typescript
// With sequence: waits for consistency, counts toward rate limit
await api.queryView(logName, viewName, input, sequence);

// Without sequence: immediate response, may be stale
await api.queryView(logName, viewName, input);
```

## Monitoring Usage
Monitor usage to anticipate rate limiting:

```shell
curl https://api.primatomic.com/billing/usage \
  -H "Authorization: Bearer $TOKEN"
```

```json
{
  "storage_gb_seconds": 1234.56,
  "compute_seconds": 789.01
}
```

Storage is metered in GB-seconds. Divide by 3,600 to convert to GB-hours.
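As a worked example of that conversion, using the field names from the response shown above (the helper name is ours, not part of the API):

```typescript
interface UsageResponse {
  storage_gb_seconds: number;
  compute_seconds: number;
}

// GB-seconds divided by 3,600 gives GB-hours,
// e.g. 7,200 GB-seconds === 2 GB-hours
function storageGbHours(usage: UsageResponse): number {
  return usage.storage_gb_seconds / 3600;
}
```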
## Higher Limits

Contact support if you require higher limits. Available options may include:
- Increased per-tenant rate limits
- Dedicated infrastructure
- Batch API endpoints
Rate limit increases are evaluated based on use case and tenant history.