Dedicated create endpoint
Create a Friendli Dedicated Endpoint deployment for a Hugging Face model via the API. Specify GPU type, replica count, and model configuration.

Example request:

curl --request POST \
  --url https://api.friendli.ai/dedicated/beta/endpoint \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "projectId": "<string>",
  "name": "<string>",
  "instanceOptionId": "<string>",
  "advanced": {
    "tokenizer_skip_special_tokens": true,
    "tokenizer_add_special_tokens": true,
    "max_batch_size": 123,
    "max_token_count": 2560,
    "enable_content_logging": true,
    "max_input_length": 123
  },
  "hfModelRepo": "<string>"
}
'

Example response:

{
  "status": "INITIALIZING",
  "createdAt": "2025-01-01T00:00:00Z",
  "updatedAt": "2025-01-01T00:00:00Z",
  "phase": "DOWNLOADING_MODEL"
}
To make a successful request, you must supply a Personal API Key (e.g. flp_XXX) as the Bearer token. Refer to the authentication section on our introduction page to learn how to obtain and generate your API key.
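The same request can also be issued from Python. The following is a minimal sketch using only the standard library: the URL, headers, and body fields are taken from the curl example, while the helper name `build_create_request` and the way arguments are passed are illustrative assumptions, not part of any Friendli SDK. It builds the request without sending it.

```python
import json
import urllib.request

# URL taken from the curl example on this page.
API_URL = "https://api.friendli.ai/dedicated/beta/endpoint"


def build_create_request(token, project_id, name, instance_option_id,
                         hf_model_repo, advanced=None):
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper for illustration only.
    """
    body = {
        "projectId": project_id,
        "name": name,
        "instanceOptionId": instance_option_id,
        "hfModelRepo": hf_model_repo,
    }
    if advanced is not None:
        # Optional "advanced" object, e.g. {"max_batch_size": 8}.
        body["advanced"] = advanced
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Personal API Key, e.g. flp_XXX
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.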
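Authorizations
Authorization: Bearer token in the request header. Use a Personal API Key (e.g. flp_XXX).
Headers
ID of the team to run requests as. Optional.
Body
Dedicated endpoint create request.
projectId: The ID of the project that owns the endpoint.
name: The name of the endpoint.
instanceOptionId: The ID of the instance option.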
Available options:
- 1x NVIDIA A100 80GB: ShbPuOs4tfGb
- 2x NVIDIA A100 80GB: mrAHuYt7T40o
- 4x NVIDIA A100 80GB: JkNob0NMdoF3
- 8x NVIDIA A100 80GB: sYH4kHmAcA5P
- 1x NVIDIA H100: TwD5AqnBSVN0
- 2x NVIDIA H100: zfTutSiLn0Hq
- 4x NVIDIA H100: lfkRz5G48REc
- 8x NVIDIA H100: GUA4qYFmsYz8
- 1x NVIDIA H200: LnK1wTaKc7WO
- 2x NVIDIA H200: Tu6GjBnfHPe4
- 4x NVIDIA H200: OhTzYtZuomzI
- 8x NVIDIA H200: ahBzWtOuomsI
- 1x NVIDIA B200: 8GiQTLKfJNOr
- 2x NVIDIA B200: brTZGIuYgVrs
- 4x NVIDIA B200: AFoZMFXZnAdD
- 8x NVIDIA B200: drbc6G9FxJWZ
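Because the instance option IDs are opaque strings, it can help to look them up by GPU type and count rather than pasting them by hand. The sketch below copies the IDs from the list above into a dictionary; the names `INSTANCE_OPTION_IDS` and `instance_option_id` are hypothetical, and the IDs should be treated as a snapshot of this page that may change.

```python
# IDs copied from the "Available options" list; treat them as a snapshot.
INSTANCE_OPTION_IDS = {
    ("A100 80GB", 1): "ShbPuOs4tfGb",
    ("A100 80GB", 2): "mrAHuYt7T40o",
    ("A100 80GB", 4): "JkNob0NMdoF3",
    ("A100 80GB", 8): "sYH4kHmAcA5P",
    ("H100", 1): "TwD5AqnBSVN0",
    ("H100", 2): "zfTutSiLn0Hq",
    ("H100", 4): "lfkRz5G48REc",
    ("H100", 8): "GUA4qYFmsYz8",
    ("H200", 1): "LnK1wTaKc7WO",
    ("H200", 2): "Tu6GjBnfHPe4",
    ("H200", 4): "OhTzYtZuomzI",
    ("H200", 8): "ahBzWtOuomsI",
    ("B200", 1): "8GiQTLKfJNOr",
    ("B200", 2): "brTZGIuYgVrs",
    ("B200", 4): "AFoZMFXZnAdD",
    ("B200", 8): "drbc6G9FxJWZ",
}


def instance_option_id(gpu, count):
    """Return the instanceOptionId for e.g. ("H100", 4).

    Hypothetical helper; raises ValueError for combinations not listed.
    """
    try:
        return INSTANCE_OPTION_IDS[(gpu, count)]
    except KeyError:
        raise ValueError(f"no instance option for {count}x NVIDIA {gpu}") from None
```

For example, `instance_option_id("H100", 4)` yields the value to place in the `instanceOptionId` field of the request body.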
advanced: The advanced configuration of the endpoint.
hfModelRepo: The Hugging Face ID of the model.
Autoscaling policy. Child attributes:
- minReplica: Setting minReplica to 0 allows the endpoint to sleep when idle, reducing costs. The minimum value is 0 (x >= 0).
- The maximum number of replicas that the endpoint can scale up to. The maximum value is 10 (x <= 10).
- Determines how long the endpoint waits before scaling down after the last request.
The Hugging Face commit hash of the model.
The comment for the initial version.
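The replica bounds described for the autoscaling policy can be checked on the client before sending a request. This is a minimal sketch: the function name `check_replica_bounds` is hypothetical, it deliberately works on plain integers rather than guessing the JSON field names, and the rule that the minimum must not exceed the maximum is an assumption not stated in this reference.

```python
def check_replica_bounds(min_replica, max_replica):
    """Return a list of problems; an empty list means the values fit the bounds.

    Hypothetical helper. Documented bounds: minimum >= 0, maximum <= 10.
    """
    problems = []
    if min_replica < 0:
        problems.append("minReplica must be >= 0")
    if max_replica > 10:
        problems.append("maximum replicas must be <= 10")
    if min_replica > max_replica:
        # Assumption: not stated in the reference, but a minimum above the
        # maximum could not be satisfied by any replica count.
        problems.append("minReplica must not exceed the maximum replicas")
    return problems
```

A minimum of 0 passes the check, which matches the documented behavior of letting the endpoint sleep when idle.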
Response
Successfully created the endpoint.
Dedicated endpoint status.
status: The current status of the endpoint deployment.
Available options: UNKNOWN, INITIALIZING, RUNNING, UPDATING, SLEEPING, AWAKING, FAILED, STOPPING, TERMINATING, TERMINATED, READY
createdAt: When the endpoint was created.
ErrorCode type.
Available options: WORKLOAD_INIT_UNKNOWN_ERROR, WORKLOAD_INIT_SETTINGS_ERROR, WORKLOAD_INIT_GRPC_ERROR, WORKLOAD_INIT_MANIFEST_NOT_FOUND_ERROR, WORKLOAD_INIT_MANIFEST_TYPE_ERROR, WORKLOAD_INIT_DOWNLOAD_ERROR, WORKLOAD_INIT_INVALID_TOKEN_ERROR, WORKLOAD_INIT_CANNOT_ACCESS_REPO_ERROR, WORKLOAD_INIT_HF_WANDB_API_ERROR, WORKLOAD_INIT_INSUFFICIENT_DISK_ERROR, INFERENCE_ENGINE_UNKNOWN_ERROR, INFERENCE_ENGINE_INVALID_ARGUMENT_ERROR, INFERENCE_ENGINE_MEMORY_ERROR, INFERENCE_ENGINE_METERING_CLIENT_CONFIG_ERROR
updatedAt: When the endpoint was last updated.
phase: The current phase of the endpoint.
Available options: REQUESTING_VIRTUAL_MACHINE, DOWNLOADING_MODEL, ENGINE_INITIALIZING
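A client will typically want to validate the response against the enumerated values and turn the timestamps into datetime objects. The sketch below assumes the response has the shape of the example on this page (`status`, `createdAt`, `updatedAt`, and an optional `phase`); the function name `parse_create_response` is hypothetical, and treating `phase` as optional is an assumption.

```python
import json
from datetime import datetime

# Enumerated values copied from the response reference above.
ENDPOINT_STATUSES = {
    "UNKNOWN", "INITIALIZING", "RUNNING", "UPDATING", "SLEEPING", "AWAKING",
    "FAILED", "STOPPING", "TERMINATING", "TERMINATED", "READY",
}
ENDPOINT_PHASES = {
    "REQUESTING_VIRTUAL_MACHINE", "DOWNLOADING_MODEL", "ENGINE_INITIALIZING",
}


def parse_create_response(text):
    """Parse the JSON response body and validate its enumerated fields.

    Hypothetical helper; returns a dict with datetime timestamps.
    """
    data = json.loads(text)
    if data["status"] not in ENDPOINT_STATUSES:
        raise ValueError(f"unexpected status: {data['status']!r}")
    phase = data.get("phase")  # assumption: phase may be absent
    if phase is not None and phase not in ENDPOINT_PHASES:
        raise ValueError(f"unexpected phase: {phase!r}")
    for key in ("createdAt", "updatedAt"):
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        data[key] = datetime.fromisoformat(data[key].replace("Z", "+00:00"))
    return data
```

Validating against the documented sets makes an unrecognized status or phase fail loudly instead of being silently passed along.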