DeepSeek V4 - OpenAI-Compatible API
- Call the DeepSeek V4 model using the OpenAI Chat Completions protocol
- Supports two models: `deepseek-v4-flash` (fast general-purpose) and `deepseek-v4-pro` (deep reasoning)
- Plain text conversation: Single- or multi-turn contextual dialogue with 1M ultra-long context
- System prompts: Customize the AI's role and behavior
- Thinking mode: Control deep reasoning via `thinking.type`; `deepseek-v4-pro` returns thinking content through `reasoning_content`
- Streaming output: SSE streaming responses are supported
- Tool calling: Supports Function Calling (up to 128 tools)
- JSON mode: Enabled via `response_format`
- Context caching: Requests with identical prefixes automatically hit the cache, substantially lowering input cost
Authorization
## All APIs require Bearer Token authentication

**Get API Key**: Visit the [API Key Management Page](https://starmagic.ai/app/api-keys) to obtain your API Key

**Add to request header**:
```
Authorization: Bearer YOUR_API_KEY
```
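As a rough sketch, the header above could be attached to a request as follows. The endpoint URL is a placeholder (no base URL is given in this section), the helper name `build_request` is hypothetical, and the `model` and `messages` body fields are assumed to follow the OpenAI Chat Completions convention. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder endpoint: the real base URL is not given in this section.
API_URL = "https://api.example.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(req.get_header("Authorization"))
```

Once a real endpoint is substituted, passing `req` to `urllib.request.urlopen` would send it.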
Request body
application/json
Chat model name
- `deepseek-v4-flash`: Fast general-purpose model, 1M context
- `deepseek-v4-pro`: Deep reasoning model, excels at math, programming, and complex logic

**Tip**: Both models **have `thinking` enabled by default**, and responses include `reasoning_content`. Set `thinking.type="disabled"` to turn it off and reduce output token cost. Both models share identical parameters.
"deepseek-v4-flash"
List of conversation messages; supports multi-turn dialogue. Messages with different roles have different field structures.
[
  null
]
Thinking mode control (new in V4)

**Notes**:
- Controls the deep thinking (Chain of Thought) feature
- **Enabled by default on both models** (`type=enabled`)
- When enabled, the reasoning process is returned through `choices[].message.reasoning_content` and billed as output tokens

⚠️ **Multi-turn / tool-calling caveat**: If the current response includes `reasoning_content`, **the corresponding assistant message in the `messages` history of the next request must echo that field verbatim**, otherwise the API returns 400 `The reasoning_content in the thinking mode must be passed back to the API`. If you would rather not handle it, set `thinking.type="disabled"` explicitly for the whole session.
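The multi-turn caveat can be illustrated with a small helper that copies the assistant message into the history, keeping `reasoning_content` untouched. This is a sketch only: the helper name `append_assistant_turn` is hypothetical, and the response dictionary is a trimmed, hand-written stand-in that follows the `choices[].message` path mentioned above.

```python
def append_assistant_turn(messages: list, response: dict) -> list:
    """Return a new history with the assistant reply appended.

    reasoning_content (and tool_calls, if any) are copied verbatim so the
    next request echoes them back unchanged.
    """
    message = response["choices"][0]["message"]
    turn = {"role": "assistant", "content": message.get("content")}
    for key in ("reasoning_content", "tool_calls"):
        if message.get(key) is not None:
            turn[key] = message[key]
    return messages + [turn]


history = [{"role": "user", "content": "What is 17 * 3?"}]
response = {  # trimmed, hand-written response for illustration
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "51",
                "reasoning_content": "17 * 3 = 51.",
            },
            "finish_reason": "stop",
        }
    ]
}
history = append_assistant_turn(history, response)
history.append({"role": "user", "content": "Now add 9."})
```

The resulting `history` would be sent as `messages` in the next request.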
{
  "type": "enabled",
  "reasoning_effort": "medium"
}
Sampling temperature; controls the randomness of the output

**Notes**:
- Lower values (e.g., 0.2): More deterministic, more focused output
- Higher values (e.g., 1.5): More random, more creative output
- Default: 1
1
Nucleus sampling parameter

**Notes**:
- Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches the given value
- For example, 0.9 means sampling from tokens whose cumulative probability reaches 90%
- Default: 1.0 (considers all tokens)

**Suggestion**: Do not adjust `temperature` and `top_p` simultaneously
1
Limits the maximum number of tokens generated

**Notes**:
- The V4 series can reach up to **384,000 tokens**
- When thinking is enabled, `reasoning_tokens` also count toward the `max_tokens` limit
- If not set, the model decides the generation length on its own
4096
Frequency penalty, used to reduce repetitive content

**Notes**:
- Positive values penalize tokens based on their frequency in the already-generated text
- The higher the value, the less likely repetition becomes
- Default: 0 (no penalty)
0
Presence penalty, used to encourage new topics

**Notes**:
- Positive values penalize tokens based on whether they have already appeared in the text
- The higher the value, the more the model tends to discuss new topics
- Default: 0 (no penalty)
0
Specifies the response format

**Notes**:
- Set to `{"type": "json_object"}` to enable JSON mode
- In JSON mode the model outputs valid JSON content
- For best results, explicitly ask for JSON output in your system or user message
{
  "type": "text"
}
Stop sequences; generation stops when the model encounters any of these strings

**Notes**:
- Can be a single string or an array of strings
- Up to 16 stop sequences are supported
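The request-body options described so far can be combined in a single payload. The sketch below builds a JSON-mode request with thinking turned off and checks the two limits stated above. The function name is hypothetical, and the top-level field names (`model`, `messages`, `thinking`, `temperature`, `max_tokens`, `response_format`, `stop`) are assumed to match the OpenAI Chat Completions convention, since this page does not list them explicitly.

```python
MAX_STOP_SEQUENCES = 16      # limit stated above
MAX_OUTPUT_TOKENS = 384_000  # V4 ceiling stated above


def build_json_mode_payload(question, max_tokens=4096, stop=None):
    """Assemble a request body for JSON mode with thinking turned off."""
    if max_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("max_tokens exceeds the documented V4 ceiling")
    payload = {
        "model": "deepseek-v4-flash",
        "messages": [
            # JSON mode works best when the prompt itself asks for JSON.
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": question},
        ],
        "thinking": {"type": "disabled"},
        # Low temperature for more focused output; top_p is left at its default.
        "temperature": 0.2,
        "max_tokens": max_tokens,
        "response_format": {"type": "json_object"},
    }
    if stop is not None:
        # Accept either a single string or a list of strings.
        stop_list = [stop] if isinstance(stop, str) else list(stop)
        if len(stop_list) > MAX_STOP_SEQUENCES:
            raise ValueError("at most 16 stop sequences are supported")
        payload["stop"] = stop_list
    return payload


payload = build_json_mode_payload("List three prime numbers.", stop="END")
```

Only `temperature` is changed here, in line with the suggestion not to adjust `temperature` and `top_p` together.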
Whether to stream the response
- `true`: Stream response; returns content chunk by chunk in real time via SSE (Server-Sent Events)
- `false`: Wait for the full response and return it at once (default)
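For `stream=true`, a client has to reassemble the text from SSE events. The sketch below assumes the OpenAI-style wire format, which this page does not spell out: each event is a `data: {json}` line carrying `choices[].delta.content`, and the stream ends with `data: [DONE]`. The function name and the sample lines are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        # A usage-only chunk may carry an empty choices list; it yields nothing.
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[],"usage":{"total_tokens":5}}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```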
false
Streaming response options. Only effective when `stream=true`
{
  "include_usage": true
}
List of tool definitions for Function Calling

**Notes**:
- Up to 128 tool definitions are supported
- Each tool must define a name, description, and parameter schema
[
  {
    "type": "function",
    "function": {
      "name": "string",
      "description": "string",
      "parameters": {},
      "strict": false
    }
  }
]
Controls tool-calling behavior

**Options**:
- `none`: Do not call any tool
- `auto`: Let the model decide whether to call a tool (default when tools are provided)
- `required`: Force the model to call one or more tools
- Object form `{"type":"function","function":{"name":"xxx"}}`: Call the specified tool

**Default**: `none` when no tools are provided, `auto` when tools are provided
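The tool definition shape and the tool-choice options above can be put together as in the following sketch. The helper names and the `get_weather` tool are hypothetical, `parameters` is assumed to be a JSON Schema object as in the OpenAI convention, and the request fields are assumed to be named `tools` and `tool_choice`.

```python
from typing import Optional

MAX_TOOLS = 128  # limit stated above


def make_tool(name: str, description: str, parameters: dict,
              strict: bool = False) -> dict:
    """Wrap a parameter schema in the tool-definition envelope shown above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
            "strict": strict,
        },
    }


def attach_tools(payload: dict, tools: list, force: Optional[str] = None) -> dict:
    """Return a copy of payload with tools and a tool-choice setting added."""
    if len(tools) > MAX_TOOLS:
        raise ValueError("at most 128 tool definitions are supported")
    body = dict(payload, tools=tools)
    if force is None:
        body["tool_choice"] = "auto"  # let the model decide
    else:
        # Object form: require a call to one named tool.
        body["tool_choice"] = {"type": "function", "function": {"name": force}}
    return body


weather_tool = make_tool(
    "get_weather",  # hypothetical tool, for illustration only
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
base = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
}
body = attach_tools(base, [weather_tool], force="get_weather")
```

Leaving `force` unset keeps the `auto` behaviour, which is also the documented default when tools are present.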
Whether to return token log probabilities

**Notes**:
- When set to `true`, the response includes log probability information for each token
false
Return log probabilities of the top N tokens

**Notes**:
- Requires `logprobs` to be `true`
- Range: `[0, 20]`
0
Token bias map

**Notes**:
- Keys are token IDs in the tokenizer; values are bias values between -100 and 100
- -100 completely bans the token, 100 forces it to be generated
- Typical values in the range -1 to 1 already produce observable effects
{}
Number of chat completion choices to generate for each input message

**Notes**:
- Default 1; if set to N, N candidates are returned (billed as N × output_tokens)
1
Random seed (Beta)

**Notes**:
- When specified, the model attempts deterministic sampling
- Same seed + same other parameters → same output in most cases (determinism is not guaranteed)
0
Unique identifier representing the end user

**Notes**:
- Helps the platform monitor and detect abuse
- A hashed user ID is recommended
"string"
Response
application/json
Response body
Unique identifier for the chat completion
"53c548dc-ec02-4a2f-bbb6-eca4184630b8"
Model name actually used
"deepseek-v4-flash"
Response type
"chat.completion"
Creation timestamp (Unix seconds)
1777021417
List of completion choices
[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I am DeepSeek V4. I excel at general conversation, code generation, mathematical reasoning, and many other tasks.",
      "reasoning_content": "Let me analyze this question...",
      "tool_calls": [
        {
          "id": null,
          "type": null,
          "function": null
        }
      ]
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]
Token usage statistics (including cache and reasoning breakdowns)
{
  "prompt_tokens": 694,
  "completion_tokens": 20,
  "total_tokens": 714,
  "prompt_cache_hit_tokens": 640,
  "prompt_cache_miss_tokens": 54,
  "prompt_tokens_details": {
    "cached_tokens": 640
  },
  "completion_tokens_details": {
    "reasoning_tokens": 10
  }
}
System fingerprint identifier
"fp_evolink_v4_20260402"

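To show how the usage fields shown above relate to each other, the sketch below derives a cache hit ratio and splits output tokens into reasoning and answer parts. The function name is hypothetical, and it assumes that `reasoning_tokens` are included in `completion_tokens`, which the text implies (reasoning is billed as output tokens) but does not state outright. The input is the example usage object from this page.

```python
def summarize_usage(usage: dict) -> dict:
    """Derive cache and reasoning breakdowns from a usage object."""
    prompt = usage["prompt_tokens"]
    hit = usage.get("prompt_cache_hit_tokens", 0)
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        # Share of input tokens served from the prefix cache.
        "cache_hit_ratio": hit / prompt if prompt else 0.0,
        "reasoning_tokens": reasoning,
        # Assumes reasoning tokens are counted inside completion_tokens.
        "answer_tokens": completion - reasoning,
    }


usage = {
    "prompt_tokens": 694,
    "completion_tokens": 20,
    "total_tokens": 714,
    "prompt_cache_hit_tokens": 640,
    "prompt_cache_miss_tokens": 54,
    "prompt_tokens_details": {"cached_tokens": 640},
    "completion_tokens_details": {"reasoning_tokens": 10},
}
summary = summarize_usage(usage)
print(f"{summary['cache_hit_ratio']:.1%} of prompt tokens were cache hits")
```

In the example, 640 of 694 prompt tokens were served from the cache, and half of the 20 output tokens went to reasoning.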