AkashML


Create completion

POST /v1/completions

Creates a completion for the provided prompt. Supports streaming via server-sent events (SSE) when the request body sets stream: true.

Response headers

  • Inference-Id — Unique ID for this request. Include this when contacting support.

Error codes

| Status | Meaning |
|--------|---------|
| `402` | Insufficient credits |
| `429` | Rate limited |
| `504` | No backend available |
| `529` | No healthy backends are available for the requested model |
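Here is a sketch of how a client might handle these responses. The TypeScript below maps each status to its meaning and flags which ones may be worth retrying. This page does not specify a retry policy, so the one used here is an assumption: 429, 504, and 529 are treated as transient, and 402 is treated as requiring a credit top-up. The helper names (`describeError`, `backoffMs`) and the backoff constants are hypothetical. The sketch also reads the `Inference-Id` header from a plain header record, matching the name case-insensitively, so you can quote the ID to support.

```typescript
// Meanings copied from the error-code table on this page.
const ERROR_MEANINGS: Record<number, string> = {
  402: "Insufficient credits",
  429: "Rate limited",
  504: "No backend available",
  529: "No healthy backends are available for the requested model",
};

// Assumption: rate limits and backend availability are transient;
// 402 needs more credits, so retrying it would not help.
const RETRYABLE_STATUSES = new Set<number>([429, 504, 529]);

interface ApiError {
  status: number;
  meaning: string;
  retryable: boolean;
  inferenceId?: string; // Inference-Id response header, if the server sent one
}

// HTTP header names are case-insensitive, so compare them in lower case.
function getHeader(headers: Record<string, string>, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return undefined;
}

function describeError(status: number, headers: Record<string, string>): ApiError {
  return {
    status,
    meaning: status in ERROR_MEANINGS ? ERROR_MEANINGS[status] : "Unexpected status",
    retryable: RETRYABLE_STATUSES.has(status),
    inferenceId: getHeader(headers, "Inference-Id"),
  };
}

// Hypothetical backoff: 500 ms doubled per attempt, capped at 8 s.
function backoffMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000);
}
```

For example, `describeError(429, { "inference-id": "abc123" })` returns a retryable error that carries `abc123` as its inference ID.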

Authorization

BearerAuth

Authorization: Bearer <token>

API key passed as a Bearer token.

In: header
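In code, the header value is the scheme, one space, and then the key. The sketch below assumes the key is already available as a string. `buildHeaders` is a hypothetical helper, and its `Content-Type` value follows the application/json request body listed on this page.

```typescript
// Builds request headers for a JSON body authenticated with a Bearer API key.
function buildHeaders(apiKey: string): Record<string, string> {
  const key = apiKey.trim(); // stray whitespace would corrupt the header value
  if (key.length === 0) {
    throw new Error("API key is empty");
  }
  return {
    Authorization: `Bearer ${key}`, // scheme, one space, then the key
    "Content-Type": "application/json",
  };
}
```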

Request Body

application/json


Example request

```shell
curl -X POST "https://example.com/v1/completions" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "string",
    "prompt": "string"
  }'
```

Response Body

application/json

```json
{
  "id": "string",
  "object": "string",
  "created": 0,
  "model": "string",
  "choices": [
    {
      "index": 0,
      "text": "string",
      "finish_reason": "string"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```
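You can type the response JSON directly in TypeScript. In the sketch below, the interfaces mirror the example's field names and placeholder types, and `parseCompletion` and `choiceTexts` are hypothetical helpers. Sorting by `index` keeps the output order stable when more than one choice comes back (see the `n` parameter).

```typescript
// Shape of the example response on this page.
interface CompletionChoice {
  index: number;
  text: string;
  finish_reason: string;
}

interface CompletionResponse {
  id: string;
  object: string;
  created: number;
  model: string;
  choices: CompletionChoice[];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
}

// Parses a raw response body and performs a minimal shape check.
function parseCompletion(body: string): CompletionResponse {
  const data = JSON.parse(body) as CompletionResponse;
  if (!Array.isArray(data.choices)) {
    throw new Error("response has no choices array");
  }
  return data;
}

// Returns the generated texts ordered by choice index.
function choiceTexts(res: CompletionResponse): string[] {
  return res.choices
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.index - b.index)
    .map((c) => c.text);
}
```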

Request Body parameters

model (string, required)

The model to use for inference.

prompt (string | array<any>, required)

The prompt to complete. Can be a string, array of strings, or array of token IDs.
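The three prompt forms map naturally onto a TypeScript union. The sketch below assumes token IDs are integers, because the schema only says `array<any>`. `Prompt` and `toPrompt` are hypothetical names. An empty array passes the string-array check.

```typescript
// The three accepted prompt shapes: text, a batch of texts, or token IDs.
type Prompt = string | string[] | number[];

// Narrows an unknown value to one of the accepted prompt shapes, or throws.
function toPrompt(value: unknown): Prompt {
  if (typeof value === "string") return value;
  if (Array.isArray(value)) {
    if (value.every((v) => typeof v === "string")) return value as string[];
    // Assumption: token IDs are integers.
    if (value.every((v) => Number.isInteger(v))) return value as number[];
  }
  throw new Error("prompt must be a string, an array of strings, or an array of token IDs");
}
```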

temperature (number, optional)

Sampling temperature. Higher values produce more random output.
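Temperature conventionally rescales the model's logits before the softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T). Values below 1 sharpen the distribution, and values above 1 flatten it. The sketch below illustrates that textbook formulation with made-up logits. It is not taken from this service's implementation, and the function name is hypothetical.

```typescript
// Softmax over logits scaled by 1/temperature (temperature must be > 0).
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) throw new Error("temperature must be positive");
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

With the same logits `[2, 1, 0]`, a temperature of 0.5 puts more probability on the first token than a temperature of 2 does.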

max_tokens (integer, optional)

Maximum number of tokens to generate.

top_p (number, optional)

Nucleus sampling threshold. Only the most likely tokens whose cumulative probability reaches top_p are considered.
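Nucleus (top-p) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches top_p, then samples from that set after renormalizing. The sketch below shows only the filtering step, on a made-up distribution. It illustrates the general technique rather than this server's code, and `nucleusIndices` is a hypothetical name.

```typescript
// Returns the token indices kept by top-p filtering, most probable first.
function nucleusIndices(probs: number[], topP: number): number[] {
  // Sort token indices by descending probability.
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const kept: number[] = [];
  let cumulative = 0;
  for (const i of order) {
    kept.push(i);
    cumulative += probs[i];
    if (cumulative >= topP) break; // smallest prefix reaching the threshold
  }
  return kept;
}
```

For example, `nucleusIndices([0.125, 0.5, 0.25, 0.125], 0.7)` returns `[1, 2]`. The top token alone (0.5) falls short of 0.7, and adding the next one brings the total to 0.75.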

frequency_penalty (number, optional)

Penalizes each token in proportion to how many times it has already appeared in the text so far.

presence_penalty (number, optional)

Applies a fixed penalty to any token that has already appeared in the text so far, regardless of how often it appeared.
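The way the two penalties combine is easiest to see on logits. This sketch assumes the OpenAI-style formulation: a token seen c times has its logit reduced by c * frequency_penalty, plus presence_penalty once if c > 0. This page does not confirm that the service uses exactly this formula, and `applyPenalties` is a hypothetical helper.

```typescript
// Returns penalized logits; logits[t] is the score for token ID t.
function applyPenalties(
  logits: number[],
  generatedTokens: number[],
  frequencyPenalty: number,
  presencePenalty: number,
): number[] {
  // Count how often each token ID has appeared so far.
  const counts = new Map<number, number>();
  for (const t of generatedTokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  return logits.map((logit, token) => {
    const c = counts.get(token) ?? 0;
    // Frequency term scales with c; presence term applies once if c > 0.
    return logit - c * frequencyPenalty - (c > 0 ? presencePenalty : 0);
  });
}
```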

stop (string | array<string>, optional)

Sequence(s) at which generation stops.
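The server applies stop during generation. The client-side sketch below illustrates the effect by truncating a string at the earliest match among one or more stop sequences. It assumes, as OpenAI-compatible APIs do, that the matched sequence itself is not included in the returned text. `applyStop` is a hypothetical helper.

```typescript
// Truncates text at the earliest occurrence of any stop sequence.
// Assumption: the stop sequence itself is excluded from the output.
function applyStop(text: string, stop: string | string[]): string {
  const sequences = typeof stop === "string" ? [stop] : stop;
  let cut = text.length;
  for (const s of sequences) {
    if (s.length === 0) continue; // an empty sequence would match everywhere
    const at = text.indexOf(s);
    if (at !== -1 && at < cut) cut = at;
  }
  return text.slice(0, cut);
}
```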

stream (boolean, optional)

If true, the response is streamed as server-sent events (SSE).
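When stream is true, the body arrives as server-sent events. The parser below is a sketch that relies on two assumptions this page does not document:

- Each event carries one JSON chunk on a single `data:` line.
- The stream ends with `data: [DONE]`, as OpenAI-compatible servers commonly send.

The parser buffers partial lines between network chunks. It is simplified: the SSE specification joins multi-line `data:` fields within one event, and this class does not. `SseDataParser` is a hypothetical name.

```typescript
// Incremental parser: feed raw text as it arrives and get back the payload
// of every complete `data:` line.
class SseDataParser {
  private buffer = "";
  done = false; // set once the assumed [DONE] sentinel is seen

  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    const payloads: string[] = [];
    for (const raw of lines) {
      const line = raw.replace(/\r$/, ""); // tolerate CRLF line endings
      if (!line.startsWith("data:")) continue; // skip blanks, comments, other fields
      const data = line.slice(5).replace(/^ /, ""); // drop one leading space
      if (data === "[DONE]") {
        this.done = true;
        break;
      }
      payloads.push(data);
    }
    return payloads;
  }
}
```

You can pass each payload to `JSON.parse`. The layout of each chunk (for example, `choices[0].text`) is likewise an assumption until you confirm it against a real stream.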

n (integer, optional)

Number of completions to generate. Defaults to 1.

user (string, optional)

End-user identifier for abuse monitoring.

[key: string] (any, optional)

The schema also permits additional properties that are not listed above.
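You can capture the fields above as a TypeScript type for request bodies. In this sketch, the optional markers follow the list above, and token-ID prompts are assumed to be `number[]`. `your-model-id` is a placeholder, and `CompletionRequest` and `serializeRequest` are hypothetical names, not the page's own generated definitions.

```typescript
// Request body for POST /v1/completions, mirroring the documented fields.
interface CompletionRequest {
  model: string;
  prompt: string | string[] | number[]; // number[] for token IDs is an assumption
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  stop?: string | string[];
  stream?: boolean;
  n?: number; // documented default is 1
  user?: string;
  [key: string]: unknown; // extra properties are allowed by the schema
}

// JSON.stringify omits properties whose value is undefined,
// so optional fields that were never set do not appear in the body.
function serializeRequest(req: CompletionRequest): string {
  return JSON.stringify(req);
}

const exampleRequest: CompletionRequest = {
  model: "your-model-id", // placeholder model ID
  prompt: "Write a haiku about the sea.",
  max_tokens: 64,
  temperature: 0.7,
  stop: ["\n\n"],
};
```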