Streaming - AJ STUDIOZ Cloud Infra

Overview

All models on AJ STUDIOZ Cloud Infra support streaming responses. With streaming, tokens are returned incrementally as they are generated, so users see the first words almost immediately instead of waiting for the full completion.

Ollama-Compatible Streaming

Set "stream": true in the request body. The -N flag turns off curl's output buffering so chunks print as soon as they arrive:

curl -N https://api.ajstudioz.co.in/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:27b",
    "messages": [{ "role": "user", "content": "Tell me about black holes." }],
    "stream": true
  }'

The response is newline-delimited JSON: each streamed chunk is a JSON object on its own line.

{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":"Black"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:01Z","message":{"role":"assistant","content":" holes"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:02Z","message":{"role":"assistant","content":" are"},"done":false}
{"model":"gemma3:27b","created_at":"2026-03-07T12:00:05Z","message":{"role":"assistant","content":""},"done":true,"total_duration":5000000000}

The final chunk has "done": true, an empty content string, and timing metrics (see Streaming Metrics below).
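Without an SDK, you can read the stream line by line. The following is a minimal sketch using only Python's standard library, assuming the body is newline-delimited JSON exactly as in the chunks above. The names iter_chat_content and stream_chat are hypothetical helpers, not part of any SDK.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Union

API_URL = "https://api.ajstudioz.co.in/api/chat"


def iter_chat_content(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield message text from newline-delimited JSON chunks until "done" is true."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        content = chunk.get("message", {}).get("content", "")
        if content:
            yield content
        if chunk.get("done"):
            break  # final chunk: metrics only, no more text


def stream_chat(prompt: str, api_key: str, model: str = "gemma3:27b") -> Iterator[str]:
    """POST a streaming /api/chat request and yield text pieces (network call)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields one line (bytes) at a time.
        yield from iter_chat_content(response)


# Usage (not executed here):
#   for piece in stream_chat("Tell me about black holes.", "YOUR_API_KEY"):
#       print(piece, end="", flush=True)
```

Keeping the parser separate from the HTTP call makes it easy to test against recorded chunks. This parser stops at "done": true and discards that chunk; if you need the metrics, keep the last chunk instead.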

OpenAI-Compatible Streaming

Use stream=True with the OpenAI Python SDK, pointing base_url at the /v1 endpoint:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ajstudioz.co.in/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="kimi-k2:1t",
    messages=[{"role": "user", "content": "Write a poem about the cosmos."}],
    stream=True
)

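for chunk in stream:
    # A chunk can arrive with an empty choices list (for example, a usage-only
    # chunk), so guard before indexing.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content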
    if content:
        print(content, end="", flush=True)
print()  # final newline
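Under the hood, the OpenAI SDK reads a Server-Sent Events stream. Assuming the /v1/chat/completions endpoint here follows OpenAI's wire format (each event is a "data: {json}" line, and the stream ends with "data: [DONE]"), a standard-library sketch of the same loop could look like this. The names iter_sse_deltas and stream_completion are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Union

API_URL = "https://api.ajstudioz.co.in/v1/chat/completions"


def iter_sse_deltas(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield delta.content strings from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators, ": comment" lines, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # choices may be empty (e.g. a usage-only chunk); n=1 is assumed.
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


def stream_completion(prompt: str, api_key: str, model: str = "kimi-k2:1t") -> Iterator[str]:
    """POST a streaming request and yield text deltas (network call)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_deltas(response)


# Usage (not executed here):
#   for piece in stream_completion("Write a poem about the cosmos.", "YOUR_API_KEY"):
#       print(piece, end="", flush=True)
```

Note the difference from the Ollama-compatible endpoint: text lives in choices[0].delta.content rather than message.content, and the end of the stream is signaled by a sentinel line rather than a "done" field.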

Ollama Python SDK Streaming

from ollama import Client

client = Client(
    host="https://api.ajstudioz.co.in",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

stream = client.chat(
    model="deepseek-v3.2",
    messages=[{"role": "user", "content": "Explain reinforcement learning."}],
    stream=True
)

for chunk in stream:
    content = chunk["message"]["content"]
    print(content, end="", flush=True)
print()
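A common pattern is to print tokens as they arrive while also keeping the full reply, so it can be appended to the conversation for the next turn. This sketch relies only on the chunk["message"]["content"] field used above plus the "done" flag from the final chunk; stream_and_collect and chat_turn are hypothetical helper names, and client is assumed to be the Client created in the example above.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def stream_and_collect(
    stream: Iterable[Any],
    on_token: Callable[[str], None] = lambda text: print(text, end="", flush=True),
) -> Tuple[str, Optional[Any]]:
    """Pass each streamed token to on_token and return (full_text, final_chunk)."""
    parts: List[str] = []
    final_chunk = None
    for chunk in stream:
        content = chunk["message"]["content"] or ""
        if content:
            on_token(content)
            parts.append(content)
        if chunk["done"]:
            final_chunk = chunk  # last chunk: done_reason and timing metrics
    return "".join(parts), final_chunk


def chat_turn(client: Any, model: str, messages: List[Dict[str, str]]) -> str:
    """Stream one reply and append it to messages so the next turn has context."""
    stream = client.chat(model=model, messages=messages, stream=True)
    reply, _ = stream_and_collect(stream)
    print()  # final newline after the streamed text
    messages.append({"role": "assistant", "content": reply})
    return reply


# Usage with the client from the example above (not executed here):
#   history = [{"role": "user", "content": "Explain reinforcement learning."}]
#   chat_turn(client, "deepseek-v3.2", history)
#   history.append({"role": "user", "content": "Now give a one-line summary."})
#   chat_turn(client, "deepseek-v3.2", history)
```

Passing on_token as a parameter keeps printing separate from accumulation, so the same helper can feed a UI callback instead of stdout.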

Node.js Streaming

// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.ajstudioz.co.in/v1",
  apiKey: "YOUR_API_KEY",
});

const stream = await openai.chat.completions.create({
  model: "glm-5",
  messages: [{ role: "user", content: "What is quantum entanglement?" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
console.log();

Streaming Metrics

The final message in an Ollama-compatible stream includes performance metrics:

{
  "model": "gemma3:27b",
  "created_at": "2026-03-07T12:00:10Z",
  "message": { "role": "assistant", "content": "" },
  "done": true,
  "done_reason": "stop",
  "total_duration": 8432156789,
  "load_duration": 123456789,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 456789012,
  "eval_count": 298,
  "eval_duration": 7851234567
}

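| Field | Description |
| --- | --- |
| `done_reason` | Why generation stopped (for example, `stop`) |
| `total_duration` | Total time for the request, in nanoseconds |
| `load_duration` | Time spent loading the model, in nanoseconds |
| `prompt_eval_count` | Number of tokens in the prompt |
| `prompt_eval_duration` | Time spent evaluating the prompt, in nanoseconds |
| `eval_count` | Number of tokens generated |
| `eval_duration` | Time spent generating the response, in nanoseconds |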

Tokens per second = eval_count / (eval_duration / 1e9)
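The same nanosecond-to-second conversion applies to every duration field. A small sketch that derives throughput from a final chunk; with the sample metrics above, generation runs at 298 / 7.851234567 s, roughly 37.96 tokens per second. The names stream_stats and _per_second are illustrative, not part of any SDK.

```python
from typing import Dict, Mapping

NS_PER_SECOND = 1e9  # all *_duration fields are reported in nanoseconds


def _per_second(count: int, duration_ns: int) -> float:
    """Tokens per second; 0.0 if the duration is missing or zero."""
    seconds = duration_ns / NS_PER_SECOND
    return count / seconds if seconds > 0 else 0.0


def stream_stats(final_chunk: Mapping[str, int]) -> Dict[str, float]:
    """Derive throughput figures from the final chunk of an Ollama-compatible stream."""
    return {
        "generation_tokens_per_s": _per_second(
            final_chunk.get("eval_count", 0), final_chunk.get("eval_duration", 0)
        ),
        "prompt_tokens_per_s": _per_second(
            final_chunk.get("prompt_eval_count", 0),
            final_chunk.get("prompt_eval_duration", 0),
        ),
        "load_s": final_chunk.get("load_duration", 0) / NS_PER_SECOND,
        "total_s": final_chunk.get("total_duration", 0) / NS_PER_SECOND,
    }


if __name__ == "__main__":
    # Values copied from the sample final chunk above.
    final = {
        "total_duration": 8432156789,
        "load_duration": 123456789,
        "prompt_eval_count": 26,
        "prompt_eval_duration": 456789012,
        "eval_count": 298,
        "eval_duration": 7851234567,
    }
    for name, value in stream_stats(final).items():
        print(f"{name}: {value:.2f}")
```

Prompt evaluation and generation are reported separately because they usually run at very different speeds; eval_count / eval_duration is the figure that matches how fast text appears in a stream.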
