Overview
All models on AJ STUDIOZ Cloud Infra support streaming responses. With streaming, tokens are returned incrementally as they are generated, so users start seeing output right away instead of waiting for the full reply.
Ollama-Compatible Streaming
Set "stream": true in the request body:
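```bash
curl -N https://api.ajstudioz.co.in/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:27b",
    "messages": [{ "role": "user", "content": "Tell me about black holes." }],
    "stream": true
  }'
```

The `-N` (`--no-buffer`) flag tells curl to print each chunk as soon as it arrives instead of buffering output.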
Each streamed chunk is a JSON object on its own line (newline-delimited JSON):
```json
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":"Black"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":" holes"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:02Z","message":{"role":"assistant","content":" are"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:05Z","message":{"role":"assistant","content":""},"done":true,"total_duration":5000000000}
```
The final chunk has `"done": true`, carries no new text, and includes timing metrics.
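Because the Ollama-compatible endpoint returns newline-delimited JSON, a client without an SDK can read the body line by line, parse each object, and stop at the chunk where `done` is true. The sketch below uses only the Python standard library and assumes each chunk arrives on its own line, as in the sample above; the helper names `iter_chunks`, `collect_reply`, and `open_chat_stream` are hypothetical, not part of any official client.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional, Tuple, Union

Line = Union[bytes, str]


def iter_chunks(lines: Iterable[Line]) -> Iterator[dict]:
    """Parse newline-delimited JSON: one object per non-empty line."""
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        text = text.strip()
        if text:  # tolerate blank lines between objects
            yield json.loads(text)


def collect_reply(
    lines: Iterable[Line],
    on_delta: Optional[Callable[[str], None]] = None,
) -> Tuple[str, Optional[dict]]:
    """Join message.content pieces; return (full_text, final_chunk or None)."""
    parts = []
    for chunk in iter_chunks(lines):
        content = chunk.get("message", {}).get("content", "")
        if content:
            parts.append(content)
            if on_delta is not None:
                on_delta(content)
        if chunk.get("done"):
            return "".join(parts), chunk  # this chunk carries the metrics
    return "".join(parts), None  # stream ended before a done=true chunk


def open_chat_stream(api_key: str, model: str, prompt: str):
    """Hypothetical helper: POST a streaming request to /api/chat.

    Iterating over the returned response yields the body one line at a time.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        "https://api.ajstudioz.co.in/api/chat",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(request)


# Usage:
# with open_chat_stream("YOUR_API_KEY", "gemma3:27b", "Tell me about black holes.") as resp:
#     text, final = collect_reply(resp, on_delta=lambda s: print(s, end="", flush=True))
```

Passing a printing callback gives the same incremental display as the SDK examples, while the returned `final` dictionary holds the timing fields.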
OpenAI-Compatible Streaming
Pass `stream=True` to the OpenAI Python SDK and read `delta.content` from each chunk:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ajstudioz.co.in/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="kimi-k2:1t",
    messages=[{"role": "user", "content": "Write a poem about the cosmos."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # final newline
```
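If you also need the complete reply after streaming it, for example to store it, the loop above can echo and accumulate at the same time. The sketch below is a minimal illustration, not part of the OpenAI SDK: `stream_to_text` is a hypothetical helper that accepts any iterable of chat-completion chunks. It skips chunks with an empty `choices` list (OpenAI sends one when usage reporting is enabled; whether this endpoint does is an assumption) and records the last non-null `finish_reason`.

```python
import sys
from typing import Iterable, Optional, TextIO, Tuple


def stream_to_text(
    stream: Iterable, out: Optional[TextIO] = None
) -> Tuple[str, Optional[str]]:
    """Echo each content delta to `out`; return (full_text, finish_reason)."""
    if out is None:
        out = sys.stdout
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        choice = chunk.choices[0]
        content = choice.delta.content
        if content:
            out.write(content)
            out.flush()
            parts.append(content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    out.write("\n")  # final newline, as in the loop above
    return "".join(parts), finish_reason


# Usage with the `stream` created above:
# text, reason = stream_to_text(stream)
# if reason == "length":
#     print("Reply was truncated by the token limit.")
```

Tracking `finish_reason` lets the caller tell a natural stop (`"stop"`) from a reply cut off at the token limit (`"length"`).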
Ollama Python SDK Streaming
```python
from ollama import Client

client = Client(
    host="https://api.ajstudioz.co.in",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

stream = client.chat(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Explain reinforcement learning."}],
    stream=True
)

for chunk in stream:
    content = chunk["message"]["content"]
    print(content, end="", flush=True)
print()
```
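To continue a multi-turn conversation, the streamed pieces can be joined and appended to `messages` as an assistant turn before the next request. This is a minimal sketch assuming the chunk shape used above (`chunk["message"]["content"]`, with `chunk["done"]` true on the last chunk); `stream_and_record` is a hypothetical helper, and the test feeds it plain dictionaries rather than a live stream.

```python
from typing import Iterable, List


def stream_and_record(stream: Iterable, messages: List[dict]) -> str:
    """Print each delta, then append the full reply to `messages` in place."""
    parts = []
    for chunk in stream:
        content = chunk["message"]["content"]
        print(content, end="", flush=True)
        parts.append(content)
        if chunk["done"]:
            break
    print()
    reply = "".join(parts)
    messages.append({"role": "assistant", "content": reply})
    return reply


# Usage with the `client` from the example above:
# messages = [{"role": "user", "content": "Explain reinforcement learning."}]
# stream_and_record(client.chat(model="deepseek-v3.2", messages=messages, stream=True), messages)
# messages.append({"role": "user", "content": "Now give a concrete example."})
# next_stream = client.chat(model="deepseek-v3.2", messages=messages, stream=True)
```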
Node.js Streaming
```javascript
// Save as an ES module (e.g. stream.mjs): `import` and top-level await require it.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.ajstudioz.co.in/v1",
  apiKey: "YOUR_API_KEY",
});

const stream = await openai.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "What is quantum entanglement?" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
console.log();
```
Streaming Metrics
The final message in an Ollama-compatible stream includes performance metrics:
```json
{
  "model": "gemma3:27b",
  "created_at": "2026-03-07T12:00:10Z",
  "message": { "role": "assistant", "content": "" },
  "done": true,
  "done_reason": "stop",
  "total_duration": 8432156789,
  "load_duration": 123456789,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 456789012,
  "eval_count": 298,
  "eval_duration": 7851234567
}
```
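| Field | Description |
|---|---|
| `done_reason` | Why generation stopped (`stop` in this example) |
| `total_duration` | Total time in nanoseconds |
| `load_duration` | Model load time in nanoseconds |
| `prompt_eval_count` | Tokens in the prompt |
| `prompt_eval_duration` | Prompt processing time in nanoseconds |
| `eval_count` | Tokens generated |
| `eval_duration` | Generation time in nanoseconds |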
Generation speed in tokens per second is `eval_count / (eval_duration / 1e9)`; dividing by 1e9 converts nanoseconds to seconds.
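Applying the formula to the example payload gives 298 tokens over about 7.85 s, roughly 38 tokens per second. The sketch below derives that figure, plus total time in seconds and prompt-processing speed, from a final chunk parsed into a dictionary. `summarize_metrics` and `rate` are hypothetical helpers; a rate is reported as `None` when its count or duration is missing or the duration is zero.

```python
from typing import Optional

NS_PER_SECOND = 1e9


def rate(count: Optional[int], duration_ns: Optional[int]) -> Optional[float]:
    """Tokens per second from a token count and a duration in nanoseconds."""
    if count is None or not duration_ns:
        return None
    return count / (duration_ns / NS_PER_SECOND)


def summarize_metrics(final_chunk: dict) -> dict:
    """Convert the nanosecond metrics of a done=true chunk into readable units."""
    total_ns = final_chunk.get("total_duration")
    return {
        "total_seconds": total_ns / NS_PER_SECOND if total_ns else None,
        "prompt_tokens_per_second": rate(
            final_chunk.get("prompt_eval_count"),
            final_chunk.get("prompt_eval_duration"),
        ),
        "tokens_per_second": rate(
            final_chunk.get("eval_count"), final_chunk.get("eval_duration")
        ),
    }


# Values copied from the example final chunk above.
final = {
    "total_duration": 8432156789,
    "load_duration": 123456789,
    "prompt_eval_count": 26,
    "prompt_eval_duration": 456789012,
    "eval_count": 298,
    "eval_duration": 7851234567,
}
stats = summarize_metrics(final)
print(f"{stats['tokens_per_second']:.1f} tokens/s")  # prints "38.0 tokens/s"
```

Prompt processing is usually much faster per token than generation (about 57 tokens/s here versus 38), so reporting the two rates separately avoids a misleading blended figure.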