Rate Limits - AJ STUDIOZ Cloud Infra

Overview

AJ STUDIOZ Cloud Infra enforces rate limits to ensure fair access and platform stability. Limits are applied per API key, and usage resets on either a rolling window or a monthly cycle, depending on your plan.

Rate Limit Headers

Every API response includes headers showing your current usage:

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed in the window
`X-RateLimit-Remaining`	Requests remaining in the current window
`X-RateLimit-Reset`	Unix timestamp when the window resets
`Retry-After`	Seconds to wait when rate limited (429 responses only)
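
As a minimal sketch of how these headers might be read on the client, the helper below turns a response's header mapping into parsed values plus the number of seconds left until the window resets. The function name `parse_rate_limit_headers` and its `now` parameter are illustrative and not part of the API; the sketch assumes the header values are plain integers, as the table describes.

```python
import time
from typing import Mapping, Optional


def parse_rate_limit_headers(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Extract rate limit information from response headers (hypothetical helper)."""
    now = time.time() if now is None else now

    def as_int(name: str) -> Optional[int]:
        # Missing or non-numeric headers are reported as None instead of raising.
        value = headers.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None

    reset_at = as_int("X-RateLimit-Reset")
    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "reset_at": reset_at,
        # X-RateLimit-Reset is a Unix timestamp, so subtract the current time.
        "seconds_until_reset": None if reset_at is None else max(0.0, reset_at - now),
        "retry_after": as_int("Retry-After"),
    }
```

With the `requests` library, `response.headers` can be passed in directly because it is a case-insensitive mapping; a plain `dict` needs the exact header casing shown in the table.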

Plan Limits

Plan	Requests / min	Tokens / day	Concurrent Requests
Free	10	100K	2
Developer	60	1M	10
Pro	200	10M	30
Enterprise	Custom	Custom	Custom

Token limits apply across all models. Larger models consume more tokens per request.
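
One way to stay under a requests-per-minute limit is to throttle on the client before sending. The sketch below is a simple sliding-window limiter; the class name `SlidingWindowLimiter` and its parameters are hypothetical, and the server's own accounting may differ from this client-side estimate, so it complements retry handling rather than replacing it.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self.sent[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a slot is free, then record the request."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.sent.append(self.clock())
```

For example, on the Free plan you might create `SlidingWindowLimiter(10)` and call `acquire()` before each request.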

Rate Limit Errors

When you exceed your rate limit, the API returns a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please wait before retrying.",
    "retry_after": 30
  }
}
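
The error body above carries a `retry_after` value alongside the `Retry-After` header. A minimal sketch of choosing a wait time from a 429 response might look like the following; the function name `retry_delay_from_429`, the header-first precedence, and the 10-second default are assumptions made for illustration, not documented behavior.

```python
import json
from typing import Mapping


def retry_delay_from_429(body: str, headers: Mapping[str, str], default: float = 10.0) -> float:
    """Pick a wait time from a 429 response: header first, then body, then default."""
    header_value = headers.get("Retry-After")
    if header_value is not None and header_value.strip().isdigit():
        return float(header_value)

    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return default

    retry_after = error.get("retry_after") if isinstance(error, dict) else None
    is_number = isinstance(retry_after, (int, float)) and not isinstance(retry_after, bool)
    if is_number and retry_after >= 0:
        return float(retry_after)
    return default
```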

Handling Rate Limits

Python (with retry)

import time
import requests

def chat_with_retry(payload, max_retries=3):
    url = "https://api.ajstudioz.co.in/api/chat"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}

    for attempt in range(max_retries):
        # The 30-second timeout is an arbitrary choice; adjust it for your workload.
        response = requests.post(url, json=payload, headers=headers, timeout=30)

        if response.status_code == 429:
            # Honor the server's Retry-After hint; fall back to 10 seconds.
            retry_after = int(response.headers.get("Retry-After", 10))
            print(f"Rate limited. Retrying in {retry_after}s...")
            time.sleep(retry_after)
            continue

        # Raise on any other 4xx/5xx error; otherwise return the parsed body.
        response.raise_for_status()
        return response.json()

    raise RuntimeError("Max retries exceeded")

Python (with tenacity)

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.ajstudioz.co.in/v1",
    api_key="YOUR_API_KEY"
)

# Retry only on rate limit errors, backing off exponentially between 2s and 60s.
@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(5)
)
def get_completion(prompt: str) -> str:
    return client.chat.completions.create(
        model="gemma3:27b",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

Best Practices

Batch requests — combine multiple prompts where possible instead of making individual calls
Stream responses — use streaming to get faster first tokens without increasing rate limit usage
Cache results — cache identical prompts/responses to avoid redundant API calls
Use smaller models for dev — use gemma3:4b or gemma3:12b during development to save quota
Monitor headers — track X-RateLimit-Remaining to proactively back off before hitting limits
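
The caching practice above could be sketched as a small in-memory wrapper keyed by model and messages. `PromptCache` and the `fetch` callable are hypothetical names for illustration; `fetch` stands in for whatever function actually calls the API, and a real deployment would likely add expiry and size limits.

```python
import hashlib
import json
from typing import Callable, Dict


class PromptCache:
    """In-memory cache keyed by model and messages (illustrative only)."""

    def __init__(self, fetch: Callable[[str, list], str]):
        self.fetch = fetch  # function that actually calls the API
        self.store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Canonical JSON so that identical requests hash to the same key.
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list) -> str:
        k = self.key(model, messages)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        # Cache miss: spend one API call and remember the result.
        result = self.fetch(model, messages)
        self.store[k] = result
        return result
```

Caching like this only makes sense when identical prompts should produce identical answers, for example with deterministic sampling settings.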

Upgrade Your Plan

Need higher limits? Upgrade at cloud.ajstudioz.com or contact us for Enterprise pricing.
