Playground

The AkashML Playground is a chat interface for interacting with open source AI models in real time. Configure model parameters, compare outputs, and test prompts before integrating them into your application.

Starting a conversation

Use the model selector in the sidebar to browse available models. Each model card shows:

Provider and family (e.g. Meta Llama, DeepSeek)
Parameter count and context length
Pricing per million input and output tokens

Select a model, adjust parameters (temperature, max tokens, system prompt), and type a message. The Playground streams responses token by token using the same API your application would call.

You can switch models mid-session without clearing history, useful for comparing how different models handle the same prompt.

Parameter controls

Common parameters

Temperature — Controls randomness. Lower values (0.1–0.3) produce focused, deterministic outputs. Higher values (0.8–1.5) increase creativity and diversity.
Max Tokens — Caps the length of the generated response. The maximum depends on the model's context window.
Top P — Nucleus sampling. The model considers tokens whose cumulative probability reaches this threshold. An alternative to temperature for controlling diversity.
Top K — Limits token selection to the top K most probable tokens at each step. Some models hide this parameter.
Frequency Penalty — Reduces repetition of tokens proportional to how often they have already appeared.
Presence Penalty — Reduces repetition of any token that has appeared at all, regardless of frequency.
Repetition Penalty — A general multiplier that penalizes repeated tokens. Available on some models as an alternative to frequency/presence penalty.

Some models hide certain parameters that don't apply to their architecture. The Playground automatically shows only the relevant controls.

Reasoning mode

Models that support reasoning (indicated by supportsForceReasoning in their settings) can be switched into thinking mode. When enabled, the model produces a reasoning_content trace alongside the regular response, showing its step-by-step thought process.

Toggle reasoning mode in the sidebar when available. The reasoning trace appears in the chat alongside the final answer.

System prompts

Set a system prompt to define the model's behavior for the entire conversation. The system prompt is sent as the first message with role: "system". Common uses:

Define a persona — "You are a senior Python developer who writes concise, well-tested code."
Set output format — "Always respond in valid JSON with keys: answer, confidence, sources."
Add constraints — "Keep responses under 100 words. Do not use markdown."

Some models have a suggested system prompt placeholder. You can see it as grey text in the system prompt field.

Presets

A preset bundles a model, system prompt, and parameter set into a named configuration. Create presets from the Playground sidebar to quickly load a proven setup.

Click Save Preset to store the current configuration:

Model selection
All parameter values (temperature, max tokens, top P, top K, penalties)
System prompt

Presets are saved to your account and persist across sessions. Load a preset to instantly restore a full configuration.

Inferences

Every inference you run is logged and accessible from Playground → Inferences. Each entry shows the model used, token counts, and the total costs.

Use the inferences log to:

Review token usage per request
Monitor and compare costs

Conversation management

Each conversation maintains a full message history that is sent with every request, so the model has context of the entire chat.
Clear the conversation to start fresh — this resets the message history but keeps your parameter settings.
Messages include the role (user or assistant) and content. When reasoning mode is enabled, assistant messages may also include reasoning_content.

Keyboard shortcuts

Shortcut	Action
`Enter`	Send message
`Shift + Enter`	New line in message

Tips for better results

Be specific — Instead of "tell me about AI", try "explain the difference between supervised and unsupervised learning in 3 bullet points".
Iterate on system prompts — Small wording changes can significantly alter output quality and format.
Use lower temperature for factual tasks — Set temperature to 0.1–0.3 for code generation, data extraction, or factual Q&A.
Use reasoning mode for complex problems — Enable thinking on supported models for math, logic, and multi-step reasoning tasks.
Compare models — Try the same prompt across different models to find the best fit for your use case. Featured models are a good starting point.
Watch your credits — The Dashboard shows real-time token consumption and cost breakdowns per model. Use the usage chart to track spending trends.

Playground

Starting a conversation

Use the model selector in the sidebar to browse available models. Each model card shows:

Provider and family (e.g. Meta Llama, DeepSeek)
Parameter count and context length
Pricing per million input and output tokens

Select a model, adjust parameters (temperature, max tokens, system prompt), and type a message. The Playground streams responses token by token using the same API your application would call.

You can switch models mid-session without clearing history, useful for comparing how different models handle the same prompt.

Parameter controls

Common parameters

Temperature — Controls randomness. Lower values (0.1–0.3) produce focused, deterministic outputs. Higher values (0.8–1.5) increase creativity and diversity.
Max Tokens — Caps the length of the generated response. The maximum depends on the model's context window.
Top P — Nucleus sampling. The model considers tokens whose cumulative probability reaches this threshold. An alternative to temperature for controlling diversity.
Top K — Limits token selection to the top K most probable tokens at each step. Some models hide this parameter.
Frequency Penalty — Reduces repetition of tokens proportional to how often they have already appeared.
Presence Penalty — Reduces repetition of any token that has appeared at all, regardless of frequency.
Repetition Penalty — A general multiplier that penalizes repeated tokens. Available on some models as an alternative to frequency/presence penalty.

Some models hide certain parameters that don't apply to their architecture. The Playground automatically shows only the relevant controls.

Reasoning mode

Toggle reasoning mode in the sidebar when available. The reasoning trace appears in the chat alongside the final answer.

System prompts

Set a system prompt to define the model's behavior for the entire conversation. The system prompt is sent as the first message with role: "system". Common uses:

Define a persona — "You are a senior Python developer who writes concise, well-tested code."
Set output format — "Always respond in valid JSON with keys: answer, confidence, sources."
Add constraints — "Keep responses under 100 words. Do not use markdown."

Some models have a suggested system prompt placeholder. You can see it as grey text in the system prompt field.

Presets

A preset bundles a model, system prompt, and parameter set into a named configuration. Create presets from the Playground sidebar to quickly load a proven setup.

Click Save Preset to store the current configuration:

Model selection
All parameter values (temperature, max tokens, top P, top K, penalties)
System prompt

Presets are saved to your account and persist across sessions. Load a preset to instantly restore a full configuration.

Inferences

Every inference you run is logged and accessible from Playground → Inferences. Each entry shows the model used, token counts, and the total costs.

Use the inferences log to:

Review token usage per request
Monitor and compare costs

Conversation management

Each conversation maintains a full message history that is sent with every request, so the model has context of the entire chat.
Clear the conversation to start fresh — this resets the message history but keeps your parameter settings.
Messages include the role (user or assistant) and content. When reasoning mode is enabled, assistant messages may also include reasoning_content.

Keyboard shortcuts

Shortcut	Action
`Enter`	Send message
`Shift + Enter`	New line in message

Tips for better results

Be specific — Instead of "tell me about AI", try "explain the difference between supervised and unsupervised learning in 3 bullet points".
Iterate on system prompts — Small wording changes can significantly alter output quality and format.
Use lower temperature for factual tasks — Set temperature to 0.1–0.3 for code generation, data extraction, or factual Q&A.
Use reasoning mode for complex problems — Enable thinking on supported models for math, logic, and multi-step reasoning tasks.
Compare models — Try the same prompt across different models to find the best fit for your use case. Featured models are a good starting point.
Watch your credits — The Dashboard shows real-time token consumption and cost breakdowns per model. Use the usage chart to track spending trends.

Documentation

Playground

Starting a conversation

Parameter controls

Common parameters

Reasoning mode

System prompts

Presets

Inferences

Conversation management

Keyboard shortcuts

Tips for better results

On this page

Documentation

Playground

Starting a conversation

Parameter controls

Common parameters

Reasoning mode

System prompts

Presets

Inferences

Conversation management

Keyboard shortcuts

Tips for better results

On this page