Overview

All models on AJ STUDIOZ Cloud Infra support streaming responses. With streaming, tokens are returned incrementally as they are generated, so users start seeing output before the full response is complete and perceive a much faster response time.

Ollama-Compatible Streaming

Set "stream": true in the request body. The -N flag tells curl not to buffer its output, so chunks print as they arrive:
curl -N https://api.ajstudioz.co.in/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:27b",
    "messages": [{ "role": "user", "content": "Tell me about black holes." }],
    "stream": true
  }'
Each streamed chunk is a JSON object on its own line, carrying the next piece of the assistant message:
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":"Black"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":" holes"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:02Z","message":{"role":"assistant","content":" are"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:05Z","message":{"role":"assistant","content":""},"done":true,"total_duration":5000000000}
The final chunk has "done": true and includes timing metrics.
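
As a minimal sketch of how a client might consume these chunks, the following parses one JSON object per line, joins the content pieces, and stops at the chunk marked "done": true. The helper name collect_stream is hypothetical, and the input is the sample output shown above rather than a live response. With a real connection you would pass the HTTP client's line iterator instead, which is an assumption not shown here.

```python
import json


def collect_stream(lines):
    """Join the content of line-delimited chat chunks.

    Returns (full_text, final_chunk). final_chunk is None if the
    stream ended before a chunk with "done": true arrived.
    """
    parts = []
    final_chunk = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final_chunk = chunk
            break
    return "".join(parts), final_chunk


# Sample chunks copied from the example output above.
sample = [
    '{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":"Black"},"done":false}',
    '{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":" holes"},"done":false}',
    '{"model":"gemma3:27b","created_at":"2026-03-07T12:00:02Z","message":{"role":"assistant","content":" are"},"done":false}',
    '{"model":"gemma3:27b","created_at":"2026-03-07T12:00:05Z","message":{"role":"assistant","content":""},"done":true,"total_duration":5000000000}',
]

text, final_chunk = collect_stream(sample)
print(text)  # Black holes are
```

Returning the final chunk alongside the text keeps the timing metrics available to the caller without a second pass over the stream.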

OpenAI-Compatible Streaming

Pass stream=True to chat.completions.create in the OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ajstudioz.co.in/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="kimi-k2:1t",
    messages=[{"role": "user", "content": "Write a poem about the cosmos."}],
    stream=True
)

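for chunk in stream:
    # Some servers may send a chunk with no choices (for example, usage only).
    if not chunk.choices:
        continue
    # delta.content can be None, e.g. on a role-only or final chunk.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)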
print()  # final newline
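
If you also need the complete reply once streaming finishes (for logging or storing chat history), the deltas can be collected as they arrive. The sketch below is one way to do that; accumulate_deltas and fake_chunk are hypothetical helpers, and the stand-in objects only mimic the chunk.choices[0].delta.content shape used in the loop above so the example runs without a network call.

```python
from types import SimpleNamespace


def accumulate_deltas(stream):
    """Return the full text from an iterable of chat completion chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # chunk without choices carries no text
        content = chunk.choices[0].delta.content
        if content:  # skips None and empty strings
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in stream: includes a None delta and a chunk with no choices.
fake_stream = [
    fake_chunk("Stars"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(" wheel"),
]

poem = accumulate_deltas(fake_stream)
print(poem)  # Stars wheel
```

In real use you would pass the stream object returned by client.chat.completions.create(..., stream=True) in place of fake_stream.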

Ollama Python SDK Streaming

from ollama import Client

client = Client(
    host="https://api.ajstudioz.co.in",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

stream = client.chat(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Explain reinforcement learning."}],
    stream=True
)

for chunk in stream:
    content = chunk["message"]["content"]
    print(content, end="", flush=True)
print()

Node.js Streaming

// Run as an ES module (.mjs file or "type": "module") so top-level await works.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.ajstudioz.co.in/v1",
  apiKey: "YOUR_API_KEY",
});

const stream = await openai.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "What is quantum entanglement?" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
console.log();

Streaming Metrics

The final message in an Ollama-compatible stream includes performance metrics:
{
  "model": "gemma3:27b",
  "created_at": "2026-03-07T12:00:10Z",
  "message": { "role": "assistant", "content": "" },
  "done": true,
  "done_reason": "stop",
  "total_duration": 8432156789,
  "load_duration": 123456789,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 456789012,
  "eval_count": 298,
  "eval_duration": 7851234567
}
| Field | Description |
| --- | --- |
| total_duration | Total time in nanoseconds |
| load_duration | Model load time in nanoseconds |
| prompt_eval_count | Tokens in the prompt |
| prompt_eval_duration | Prompt evaluation time in nanoseconds |
| eval_count | Tokens generated |
| eval_duration | Generation time in nanoseconds |

Tokens per second = eval_count / (eval_duration / 1e9)
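
The formula can be applied directly to the final chunk. This is a small sketch using the sample metrics shown above; the function name tokens_per_second is hypothetical, and the guard against a zero duration is an added precaution rather than documented behavior.

```python
def tokens_per_second(final_chunk):
    """Compute generation speed from the final chunk's metrics."""
    eval_seconds = final_chunk["eval_duration"] / 1e9  # nanoseconds to seconds
    if eval_seconds <= 0:
        raise ValueError("eval_duration must be positive")
    return final_chunk["eval_count"] / eval_seconds


# Values copied from the sample final message above.
final_metrics = {
    "total_duration": 8432156789,
    "load_duration": 123456789,
    "prompt_eval_count": 26,
    "prompt_eval_duration": 456789012,
    "eval_count": 298,
    "eval_duration": 7851234567,
}

rate = tokens_per_second(final_metrics)
print(f"{rate:.2f} tokens/s")  # 37.96 tokens/s
```

Note that eval_duration covers generation only, so this rate excludes model load and prompt evaluation time.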