Overview

AJ STUDIOZ Cloud Infra enforces rate limits to ensure fair access and platform stability. Limits apply per API key and reset on a rolling or monthly basis, depending on your plan.

Rate Limit Headers

Every API response includes headers showing your current usage:
| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait when rate limited (429 responses only) |
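
As a rough illustration of reading these headers, the sketch below turns them into integers and computes how long remains until the window resets. The function name `parse_rate_limit_headers` and the returned dictionary keys are hypothetical; only the header names come from the table above. It assumes the values are plain integers and treats anything else as missing.

```python
import time
from typing import Mapping, Optional


def parse_rate_limit_headers(headers: Mapping[str, str],
                             now: Optional[float] = None) -> dict:
    """Extract rate-limit state from response headers (hypothetical helper)."""
    now = time.time() if now is None else now

    def as_int(name: str) -> Optional[int]:
        # Missing or non-numeric header values are reported as None.
        value = headers.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None

    reset_at = as_int("X-RateLimit-Reset")
    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "reset_at": reset_at,
        # X-RateLimit-Reset is a Unix timestamp, so subtract the current time.
        "seconds_until_reset": (
            max(0.0, reset_at - now) if reset_at is not None else None
        ),
        "retry_after": as_int("Retry-After"),
    }
```

With the `requests` library, `response.headers` can be passed in directly, since it is a case-insensitive mapping. A plain `dict` must use the exact header capitalization shown in the table.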

Plan Limits

| Plan | Requests / min | Tokens / day | Concurrent Requests |
| --- | --- | --- | --- |
| Free | 10 | 100K | 2 |
| Developer | 60 | 1M | 10 |
| Pro | 200 | 10M | 30 |
| Enterprise | Custom | Custom | Custom |
Token limits apply across all models. Larger models consume more tokens per request.
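
One way to stay under a requests-per-minute limit is to throttle on the client before sending. The sketch below is a minimal sliding-window limiter. The class `SlidingWindowLimiter` and its methods are hypothetical, and it assumes the server counts requests over a rolling 60-second window, which this page does not confirm for every plan. It tracks only request counts, not tokens or concurrency, and is not thread-safe.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds until the next request fits in the window (0 if it fits now)."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self._sent.append(self._clock())

    def acquire(self) -> None:
        """Block until a slot is free, then claim it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.record()
```

For the Free plan in the table above, this would be `limiter = SlidingWindowLimiter(max_requests=10)`, with `limiter.acquire()` called before each API request. The server remains the source of truth, so 429 responses still need handling.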

Rate Limit Errors

When you exceed your rate limit, the API returns a 429 Too Many Requests response:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please wait before retrying.",
    "retry_after": 30
  }
}
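
The error body and the `Retry-After` header both carry a wait time, so a client can read whichever is present. The sketch below is one possible approach: the helper name `retry_delay_from_429`, the 10-second default, and the choice to prefer the body over the header are all assumptions, not documented behavior.

```python
from typing import Any, Mapping


def retry_delay_from_429(body: Any, headers: Mapping[str, str],
                         default: float = 10.0) -> float:
    """Pick a wait time in seconds for a 429 response (hypothetical helper).

    Order tried: `error.retry_after` in the JSON body, then the
    `Retry-After` header, then `default`.
    """
    candidates = []
    if isinstance(body, dict) and isinstance(body.get("error"), dict):
        candidates.append(body["error"].get("retry_after"))
    candidates.append(headers.get("Retry-After"))

    for value in candidates:
        try:
            delay = float(value)
        except (TypeError, ValueError):
            continue  # missing or non-numeric; try the next source
        if delay >= 0:
            return delay
    return default
```

Called as `retry_delay_from_429(response.json(), response.headers)`, it returns `30.0` for the example body above.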

Handling Rate Limits

Python (with retry)

import time
import requests

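def chat_with_retry(payload, max_retries=3):
    """POST a chat request, waiting and retrying when the API returns 429."""
    url = "https://api.ajstudioz.co.in/api/chat"
    headers = {"Authorization": "Bearer YOUR_API_KEY"}

    for attempt in range(max_retries):
        response = requests.post(url, json=payload, headers=headers, timeout=60)

        if response.status_code != 429:
            # Raise on any other 4xx/5xx error; otherwise return the parsed body.
            response.raise_for_status()
            return response.json()

        if attempt == max_retries - 1:
            break  # out of attempts; don't sleep before giving up

        # Retry-After is in seconds; fall back to 10s if the header is missing.
        retry_after = int(response.headers.get("Retry-After", 10))
        print(f"Rate limited. Retrying in {retry_after}s...")
        time.sleep(retry_after)

    raise RuntimeError(f"Still rate limited after {max_retries} attempts")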

Python (with tenacity)

from openai import OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = OpenAI(
    base_url="https://api.ajstudioz.co.in/v1",
    api_key="YOUR_API_KEY"
)

# Retry only on rate-limit errors, backing off exponentially between 2s and 60s,
# and give up after 5 attempts.
@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(5)
)
def get_completion(prompt: str) -> str:
    return client.chat.completions.create(
        model="gemma3:27b",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

Best Practices

  • Batch requests — combine multiple prompts where possible instead of making individual calls
  • Stream responses — use streaming to get faster first tokens without increasing rate limit usage
  • Cache results — cache identical prompts/responses to avoid redundant API calls
  • Use smaller models for dev — use gemma3:4b or gemma3:12b during development to save quota
  • Monitor headers — track X-RateLimit-Remaining to proactively back off before hitting limits
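
The caching advice above can be sketched as a small in-memory wrapper. The class `ResponseCache` and its `fetch` callback are hypothetical, and the cache has no expiry or size bound. It assumes that identical payloads should yield reusable responses, which generally only holds when sampling is deterministic.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class ResponseCache:
    """In-memory cache keyed on the request payload (hypothetical helper)."""

    def __init__(self, fetch: Callable[[dict], Any]) -> None:
        self._fetch = fetch  # function that actually calls the API
        self._store: Dict[str, Any] = {}

    @staticmethod
    def _key(payload: dict) -> str:
        # Serialize with sorted keys so key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, payload: dict) -> Any:
        """Return the cached response, calling `fetch` only on a miss."""
        key = self._key(payload)
        if key not in self._store:
            self._store[key] = self._fetch(payload)
        return self._store[key]
```

Wrapping a request function, for example `cache = ResponseCache(chat_with_retry)`, means repeated identical payloads use only one request of quota.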

Upgrade Your Plan

Need higher limits? Upgrade at cloud.ajstudioz.com or contact us for Enterprise pricing.