
DeepSeek V4 - OpenAI-Compatible API

Call the DeepSeek V4 models through the OpenAI Chat Completions protocol.

- Two models: `deepseek-v4-flash` (fast general-purpose) and `deepseek-v4-pro` (deep reasoning)
- Plain text conversation: Single- or multi-turn contextual dialogue with a 1M-token context
- System prompts: Customize the AI's role and behavior
- Thinking mode: Control deep reasoning via `thinking.type`; thinking content is returned through `reasoning_content`
- Streaming output: Responses can be streamed over SSE (Server-Sent Events)
- Tool calling: Supports Function Calling (up to 128 tools)
- JSON mode: Enabled via `response_format`
- Context caching: Requests with identical prefixes automatically hit the cache, substantially lowering input cost

<Note> **BaseURL**: The default BaseURL is `https://api.starmagic.ai`, which has better support for text models and long-lived connections. `https://api.starmagic.ai` is the primary endpoint for multimodal services and serves as a fallback address for text models. </Note>

Authorization

Authorization string header Required

All APIs require Bearer Token authentication.

**Get API Key**: Visit the [API Key Management Page](https://starmagic.ai/app/api-keys) to obtain your API Key.

**Add to request header**:

Authorization: Bearer YOUR_API_KEY
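As a minimal sketch of how the header fits into a request, the snippet below builds (but does not send) a POST to `/v1/chat/completions` using only the Python standard library. The helper name `build_chat_request` is hypothetical; the URL, path, and header names are taken from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.starmagic.ai"  # default BaseURL stated on this page


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, without sending, a POST request to /v1/chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer Token authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_API_KEY",
    {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": "Please introduce yourself"}],
    },
)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the JSON body of the response.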

Request body

application/json

model enum<deepseek-v4-flash | deepseek-v4-pro> Required

Chat model name

- `deepseek-v4-flash`: Fast general-purpose model, 1M context
- `deepseek-v4-pro`: Deep reasoning model, excels at math, programming, and complex logic

**Tip**: Both models **have `thinking` enabled by default**, and responses include `reasoning_content`. Set `thinking.type="disabled"` to turn it off and reduce output token cost. Both models accept identical parameters.

"deepseek-v4-flash"

messages object[] Required

List of conversation messages; supports multi-turn dialogue. Messages with different roles have different field structures.

[
  null
]

thinking object

Thinking mode control (new in V4)

**Notes**:

- Controls the deep thinking (Chain of Thought) feature
- **Enabled by default on both models** (`type=enabled`)
- When enabled, the reasoning process is returned through `choices[].message.reasoning_content` and billed as output tokens

⚠️ **Multi-turn / tool-calling caveat**: If the current response includes `reasoning_content`, **the corresponding assistant message in the `messages` history of the next request must echo that field verbatim**; otherwise the API returns 400 `The reasoning_content in the thinking mode must be passed back to the API`. If you would rather not handle this, set `thinking.type="disabled"` explicitly for the whole session.

{
  "type": "enabled",
  "reasoning_effort": "medium"
}
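The multi-turn caveat above can be handled with a small helper that copies the assistant reply, including `reasoning_content`, into the history before the next request. This is a sketch only: `append_assistant_turn` is a hypothetical name, and the response shape follows the example response on this page.

```python
def append_assistant_turn(messages: list, response: dict) -> list:
    """Return a new history with the assistant reply appended.

    reasoning_content is copied verbatim when present, because thinking
    mode requires it to be passed back on the next request.
    """
    message = response["choices"][0]["message"]
    turn = {"role": "assistant", "content": message.get("content")}
    if message.get("reasoning_content") is not None:
        turn["reasoning_content"] = message["reasoning_content"]
    if message.get("tool_calls"):
        turn["tool_calls"] = message["tool_calls"]
    return messages + [turn]


history = [{"role": "user", "content": "What is 17 * 3?"}]
# Hand-written stand-in for a real API response.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "51",
                "reasoning_content": "17 * 3 = 51",
            }
        }
    ]
}
history = append_assistant_turn(history, response)
history.append({"role": "user", "content": "And plus 9?"})
next_payload = {"model": "deepseek-v4-pro", "messages": history}
```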

temperature number

Sampling temperature; controls the randomness of the output

**Notes**:

- Lower values (e.g., 0.2): More deterministic, more focused output
- Higher values (e.g., 1.5): More random, more creative output
- Default: 1

top_p number

Nucleus sampling parameter

**Notes**:

- Restricts sampling to the most likely tokens whose cumulative probability reaches `top_p`
- For example, 0.9 means sampling from tokens whose cumulative probability reaches 90%
- Default: 1.0 (considers all tokens)

**Suggestion**: Do not adjust `temperature` and `top_p` simultaneously
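To make the cumulative-probability rule concrete, here is an illustrative sketch of nucleus filtering over a toy distribution. It shows the general technique, not this service's internal implementation; the function name `nucleus` and the probabilities are made up for the example.

```python
def nucleus(probs: list[float], top_p: float) -> list[int]:
    """Return indices of the smallest set of most likely tokens
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Toy distribution over four tokens.
kept = nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.9)
# The first two tokens cover only 0.8, so the third is needed to reach 0.9;
# the least likely token is excluded from sampling.
```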

max_tokens integer

Limits the maximum number of tokens generated

**Notes**:

- The V4 series can reach up to **384,000 tokens**
- When thinking is enabled, `reasoning_tokens` also count toward the `max_tokens` limit
- If not set, the model decides the generation length on its own

frequency_penalty number

Frequency penalty, used to reduce repetitive content

**Notes**:

- Positive values penalize tokens based on their frequency in the already-generated text
- The higher the value, the less likely repetition becomes
- Default: 0 (no penalty)

presence_penalty number

Presence penalty, used to encourage new topics

**Notes**:

- Positive values penalize tokens based on whether they have already appeared in the text
- The higher the value, the more the model tends to discuss new topics
- Default: 0 (no penalty)
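The difference between the two penalties can be illustrated with the formulation commonly documented for OpenAI-style APIs: the frequency penalty scales with how often a token has appeared, while the presence penalty is a flat deduction once it has appeared at all. Whether this service applies exactly this formula is an assumption; the sketch below is for intuition only, and `apply_penalties` is a hypothetical name.

```python
from collections import Counter


def apply_penalties(
    logits: dict,
    generated: list,
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict:
    """Adjust logits using the commonly documented penalty formula:
    logit - count * frequency_penalty - (count > 0) * presence_penalty."""
    counts = Counter(generated)
    return {
        token: logit
        - counts[token] * frequency_penalty
        - (1.0 if counts[token] > 0 else 0.0) * presence_penalty
        for token, logit in logits.items()
    }


adjusted = apply_penalties(
    {"cat": 2.0, "dog": 2.0, "fish": 2.0},
    generated=["cat", "cat", "dog"],
    frequency_penalty=0.5,
    presence_penalty=0.25,
)
# "cat" appeared twice, "dog" once, "fish" never, so "cat" is penalized most
# and "fish" is left unchanged.
```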

response_format object

Specifies the response format

**Notes**:

- Set to `{"type": "json_object"}` to enable JSON mode
- In JSON mode the model outputs valid JSON content
- For best results, explicitly ask for JSON output in your system or user message

{
  "type": "text"
}
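A minimal sketch of JSON mode, assuming the reply arrives as a JSON string in `choices[0].message.content` as in the example response on this page. The helper names and the sample reply are hypothetical; note that the system message asks for JSON explicitly, as recommended above.

```python
import json


def json_mode_payload(model: str, instruction: str, user_text: str) -> dict:
    """Build a request body with JSON mode enabled."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": user_text},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(response: dict) -> dict:
    """Decode the JSON document carried in the first choice's content."""
    return json.loads(response["choices"][0]["message"]["content"])


payload = json_mode_payload(
    "deepseek-v4-flash",
    "Reply in JSON with the keys 'city' and 'country'.",
    "Where is the Eiffel Tower?",
)
# Hand-written stand-in for a real API response.
sample_response = {
    "choices": [
        {"message": {"content": '{"city": "Paris", "country": "France"}'}}
    ]
}
parsed = parse_json_reply(sample_response)
```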

stop object

Stop sequences; generation stops when the model encounters any of these strings

**Notes**:

- Can be a single string or an array of strings
- Up to 16 stop sequences are supported

stream boolean

Whether to stream the response

- `true`: Stream the response; content is returned chunk by chunk in real time via SSE (Server-Sent Events)
- `false`: Wait for the full response and return it at once (default)

false

stream_options object

Streaming response options. Only effective when `stream=true`.

{
  "include_usage": true
}
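The sketch below shows one way to consume an SSE stream once `stream=true` is set. This page does not document the chunk format, so the shape used here (`data:` lines, a `delta` object per choice, a final chunk carrying `usage`, and a `[DONE]` sentinel) follows the usual OpenAI streaming convention and should be treated as an assumption. The sample lines are hand-written, not captured output.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE 'data:' lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


def collect_text(events: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Concatenate streamed content and capture usage if a chunk carries it."""
    parts, usage = [], None
    for event in events:
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), usage


sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 7, "completion_tokens": 2, "total_tokens": 9}}',
    "",
    "data: [DONE]",
]
text, usage = collect_text(iter_stream_events(sample_lines))
```

In a real client, `lines` would come from iterating over the decoded lines of the HTTP response body.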

tools object[]

List of tool definitions for Function Calling

**Notes**:

- Up to 128 tool definitions are supported
- Each tool must define a name, description, and parameter schema

[
  {
    "type": "function",
    "function": {
      "name": "string",
      "description": "string",
      "parameters": {},
      "strict": false
    }
  }
]

tool_choice object

Controls tool-calling behavior

**Options**:

- `none`: Do not call any tool
- `auto`: Let the model decide whether to call a tool
- `required`: Force the model to call one or more tools
- Object form `{"type":"function","function":{"name":"xxx"}}`: Call the specified tool

**Default**: `none` when no tools are provided, `auto` when tools are provided
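As a sketch of how `tools` and `tool_choice` fit together, the example below defines one tool, forces the model to call it using the object form, and dispatches a returned tool call to a local function. The `get_weather` function is hypothetical. The assumptions that `function.arguments` arrives as a JSON string and that results go back as `role: "tool"` messages with a `tool_call_id` follow the OpenAI convention and are not confirmed by this page.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local function exposed to the model."""
    return f"Sunny in {city}"


TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": TOOLS,
    # Object form: force a call to one specific tool.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list:
    """Execute each requested tool call and build the follow-up messages."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # assumed JSON string
        output = HANDLERS[function["name"]](**arguments)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results


# Hand-written stand-in for an assistant message that requests a tool call.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}
tool_messages = run_tool_calls(sample_message)
```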

logprobs boolean

Whether to return token log probabilities

**Notes**:

- When set to `true`, the response includes log probability information for each token

false

top_logprobs integer

Return log probabilities of the top N tokens

**Notes**:

- Requires `logprobs` to be `true`
- Range: `[0, 20]`
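Several parameters on this page carry hard limits: at most 16 `stop` sequences, at most 128 `tools`, and `top_logprobs` within `[0, 20]` with `logprobs` enabled. A client-side check along the following lines can catch violations before a request is sent. The function name `check_limits` and the message wording are hypothetical.

```python
def check_limits(payload: dict) -> list[str]:
    """Return a list of limit violations found in a request body."""
    problems = []

    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 16:
        problems.append("stop: more than 16 sequences")

    if len(payload.get("tools") or []) > 128:
        problems.append("tools: more than 128 definitions")

    top_logprobs = payload.get("top_logprobs")
    if top_logprobs is not None:
        if not payload.get("logprobs"):
            problems.append("top_logprobs: requires logprobs=true")
        if not 0 <= top_logprobs <= 20:
            problems.append("top_logprobs: outside [0, 20]")

    return problems


ok = check_limits({"logprobs": True, "top_logprobs": 5, "stop": ["\n\n"]})
bad = check_limits({"top_logprobs": 25})
```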

logit_bias object

Token bias map

**Notes**:

- Keys are token IDs in the tokenizer; values are bias values between -100 and 100
- -100 completely bans the token; 100 forces it to be generated
- Values in the range -1 to 1 already produce observable effects

{}
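A small sketch of preparing a `logit_bias` map: it enforces the documented -100 to 100 range and converts token IDs to string keys, since JSON object keys are always strings. The token IDs below are made up for illustration; real IDs depend on the model's tokenizer.

```python
def validate_logit_bias(bias: dict) -> dict:
    """Check bias values are within [-100, 100] and normalize keys to strings."""
    cleaned = {}
    for token_id, value in bias.items():
        if not -100 <= value <= 100:
            raise ValueError(f"bias for token {token_id} is outside [-100, 100]")
        cleaned[str(int(token_id))] = value
    return cleaned


# Hypothetical token IDs: ban one token, nudge another slightly upward.
logit_bias = validate_logit_bias({1234: -100, "42": 0.5})
```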

n integer

Number of chat completion choices to generate for each input message

**Notes**:

- Default: 1. If set to N, N candidates are returned (billed as N × output_tokens)

seed integer

Random seed (Beta)

**Notes**:

- When specified, the model attempts deterministic sampling
- The same seed with otherwise identical parameters should produce the same output, but this is not guaranteed

user string

Unique identifier representing the end user

**Notes**:

- Helps the platform monitor and detect abuse
- A hashed user ID is recommended

"string"

Response

application/json

Success

Response body

id string

Unique identifier for the chat completion

"53c548dc-ec02-4a2f-bbb6-eca4184630b8"

model string

Model name actually used

"deepseek-v4-flash"

object enum<chat.completion>

Response type

"chat.completion"

created integer

Creation timestamp (Unix seconds)

1777021417

choices object[]

List of completion choices

[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I am DeepSeek V4. I excel at general conversation, code generation, mathematical reasoning, and many other tasks.",
      "reasoning_content": "Let me analyze this question...",
      "tool_calls": [
        {
          "id": null,
          "type": null,
          "function": null
        }
      ]
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

usage object

Token usage statistics (including cache and reasoning breakdowns)

{
  "prompt_tokens": 694,
  "completion_tokens": 20,
  "total_tokens": 714,
  "prompt_cache_hit_tokens": 640,
  "prompt_cache_miss_tokens": 54,
  "prompt_tokens_details": {
    "cached_tokens": 640
  },
  "completion_tokens_details": {
    "reasoning_tokens": 10
  }
}
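The usage breakdown can be turned into two quick metrics: the share of prompt tokens served from the cache, and the number of output tokens left once reasoning is subtracted. The second metric assumes `reasoning_tokens` is a subset of `completion_tokens`, which is consistent with reasoning being billed as output tokens but is not stated outright on this page. The function names are hypothetical.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the context cache."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0


def visible_output_tokens(usage: dict) -> int:
    """Completion tokens excluding reasoning (assumes reasoning is a subset)."""
    details = usage.get("completion_tokens_details") or {}
    return usage["completion_tokens"] - details.get("reasoning_tokens", 0)


# The usage example from this page.
usage = {
    "prompt_tokens": 694,
    "completion_tokens": 20,
    "total_tokens": 714,
    "prompt_cache_hit_tokens": 640,
    "prompt_cache_miss_tokens": 54,
    "prompt_tokens_details": {"cached_tokens": 640},
    "completion_tokens_details": {"reasoning_tokens": 10},
}
ratio = cache_hit_ratio(usage)            # 640 of 694 prompt tokens were cached
visible = visible_output_tokens(usage)    # 20 completion tokens minus 10 reasoning
```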

system_fingerprint string

System fingerprint identifier

"fp_evolink_v4_20260402"

POST /v1/chat/completions

curl --request POST \
  --url https://api.starmagic.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "deepseek-v4-flash",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself"
    }
  ]
}'

Response: Success

{
  "id": "837f529d-00f9-4731-b2e1-4a54fc31790a",
  "object": "chat.completion",
  "created": 1777026806,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am the DeepSeek assistant, always ready to answer your questions and help you out."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 31,
    "total_tokens": 38,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "prompt_cache_hit_tokens": 0,
    "prompt_cache_miss_tokens": 7
  },
  "system_fingerprint": "fp_evolink_v4_20260402"
}
