Dedicated create endpoint

Authorizations

Authorization

string

header

required

When using the Friendli Suite API for inference requests, you need to provide a Friendli Token for authentication and authorization.

For more detailed information, please refer here.

Headers

X-Friendli-Team

string | null

ID of the team to run requests as (optional).
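
As a minimal sketch of how the two headers above might be assembled, the helper below builds a header dictionary. The `Bearer` scheme and the function name `build_headers` are assumptions for illustration; the reference only states that a Friendli Token goes in the `Authorization` header.

```python
from typing import Optional


def build_headers(token: str, team_id: Optional[str] = None) -> dict:
    """Return HTTP headers for a Friendli Suite API request (illustrative helper)."""
    if not token:
        raise ValueError("a Friendli Token is required")
    headers = {
        # Assumption: the token is sent using the Bearer scheme.
        "Authorization": f"Bearer {token}",
        # The request body is documented as application/json.
        "Content-Type": "application/json",
    }
    # X-Friendli-Team is optional, so it is only sent when a team ID is given.
    if team_id is not None:
        headers["X-Friendli-Team"] = team_id
    return headers
```

Omitting `team_id` leaves the `X-Friendli-Team` header out of the request entirely.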

Body

application/json

Dedicated endpoint create request.

projectId

string

required

The ID of the project that owns the endpoint.

name

string

required

The name of the endpoint.

instanceOptionId

string

required

The ID of the instance option.

Available options:

1x NVIDIA A100 80GB: ShbPuOs4tfGb
2x NVIDIA A100 80GB: mrAHuYt7T40o
4x NVIDIA A100 80GB: JkNob0NMdoF3
8x NVIDIA A100 80GB: sYH4kHmAcA5P
1x NVIDIA H100: TwD5AqnBSVN0
2x NVIDIA H100: zfTutSiLn0Hq
4x NVIDIA H100: lfkRz5G48REc
8x NVIDIA H100: GUA4qYFmsYz8
1x NVIDIA H200: LnK1wTaKc7WO
2x NVIDIA H200: Tu6GjBnfHPe4
4x NVIDIA H200: OhTzYtZuomzI
8x NVIDIA H200: ahBzWtOuomsI
1x NVIDIA B200: 8GiQTLKfJNOr
2x NVIDIA B200: brTZGIuYgVrs
4x NVIDIA B200: AFoZMFXZnAdD
8x NVIDIA B200: drbc6G9FxJWZ
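
Because the instance option IDs are opaque strings, a small lookup table can make client code easier to read. The sketch below copies the IDs from the list above; the `(gpu, count)` key format and the function name `instance_option_id` are illustrative choices, not part of the API.

```python
# Instance option IDs keyed by (GPU model, GPU count), copied from the list above.
INSTANCE_OPTION_IDS = {
    ("A100 80GB", 1): "ShbPuOs4tfGb",
    ("A100 80GB", 2): "mrAHuYt7T40o",
    ("A100 80GB", 4): "JkNob0NMdoF3",
    ("A100 80GB", 8): "sYH4kHmAcA5P",
    ("H100", 1): "TwD5AqnBSVN0",
    ("H100", 2): "zfTutSiLn0Hq",
    ("H100", 4): "lfkRz5G48REc",
    ("H100", 8): "GUA4qYFmsYz8",
    ("H200", 1): "LnK1wTaKc7WO",
    ("H200", 2): "Tu6GjBnfHPe4",
    ("H200", 4): "OhTzYtZuomzI",
    ("H200", 8): "ahBzWtOuomsI",
    ("B200", 1): "8GiQTLKfJNOr",
    ("B200", 2): "brTZGIuYgVrs",
    ("B200", 4): "AFoZMFXZnAdD",
    ("B200", 8): "drbc6G9FxJWZ",
}


def instance_option_id(gpu: str, count: int) -> str:
    """Look up the instanceOptionId for a GPU model and count."""
    try:
        return INSTANCE_OPTION_IDS[(gpu, count)]
    except KeyError:
        raise ValueError(f"no instance option for {count}x NVIDIA {gpu}") from None
```

For example, `instance_option_id("H100", 4)` returns the ID listed for 4x NVIDIA H100.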

advanced

Advanced · object

required

The advanced configuration of the endpoint.

advanced.tokenizer_skip_special_tokens

boolean

required

advanced.tokenizer_add_special_tokens

boolean

required

advanced.max_batch_size

integer | null

advanced.max_token_count

integer

default: 2560

advanced.enable_content_logging

boolean | null

advanced.max_input_length

integer | null
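
The `advanced` object mixes two required booleans with several nullable fields. A possible way to build it is sketched below; the helper name `build_advanced` is hypothetical, and leaving unset nullable fields out of the object (rather than sending `null`) is an assumption.

```python
from typing import Optional


def build_advanced(
    skip_special_tokens: bool,
    add_special_tokens: bool,
    max_token_count: int = 2560,  # documented default
    max_batch_size: Optional[int] = None,
    enable_content_logging: Optional[bool] = None,
    max_input_length: Optional[int] = None,
) -> dict:
    """Build the `advanced` object of the create request (illustrative helper)."""
    advanced = {
        # The two tokenizer flags are the only required child attributes.
        "tokenizer_skip_special_tokens": skip_special_tokens,
        "tokenizer_add_special_tokens": add_special_tokens,
        "max_token_count": max_token_count,
    }
    optional = {
        "max_batch_size": max_batch_size,
        "enable_content_logging": enable_content_logging,
        "max_input_length": max_input_length,
    }
    # Assumption: nullable fields that are not set can simply be omitted.
    advanced.update({k: v for k, v in optional.items() if v is not None})
    return advanced
```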

hfModelRepo

string

required

Hugging Face repository ID of the model.

simplescale

EndpointSimplescaleConfig · object

Simple scaling options.

simplescale.replicas

integer

required

Required range: x >= 1

autoscalingPolicy

AutoscalingPolicy · object

Autoscaling policy.

autoscalingPolicy.minReplica

integer

default: 0

Setting minReplica to 0 allows the endpoint to sleep when idle, reducing costs. The minimum value is 0.

Required range: x >= 0

autoscalingPolicy.maxReplica

integer

default: 1

The maximum number of replicas that the endpoint can scale up to. The maximum value is 10.

Required range: x <= 10

autoscalingPolicy.cooldownPeriod

integer

default: 300

Determines how long the endpoint waits before scaling down after the last request.
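
The documented ranges for `autoscalingPolicy` can be checked on the client before sending a request. The following is a sketch under stated assumptions: the helper name is hypothetical, and the check that `minReplica` does not exceed `maxReplica` is an added sanity check that the reference does not state explicitly.

```python
def build_autoscaling_policy(
    min_replica: int = 0,
    max_replica: int = 1,
    cooldown_period: int = 300,
) -> dict:
    """Build an `autoscalingPolicy` object using the documented defaults and ranges."""
    if min_replica < 0:
        raise ValueError("minReplica must be >= 0")
    if max_replica > 10:
        raise ValueError("maxReplica must be <= 10")
    # Assumption: minReplica should not exceed maxReplica (not stated in the reference).
    if min_replica > max_replica:
        raise ValueError("minReplica must not exceed maxReplica")
    return {
        "minReplica": min_replica,  # 0 lets the endpoint sleep when idle
        "maxReplica": max_replica,
        "cooldownPeriod": cooldown_period,
    }
```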

hfModelRepoRevision

string | null

Hugging Face commit hash of the model.

initialVersionComment

string | null

The comment for the initial version.
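
Putting the body fields together, the sketch below assembles a create request with Python's standard library without sending it. `API_URL` is a placeholder, not the real endpoint path, the `Bearer` scheme is an assumption, and the function name and the choice of `simplescale` over `autoscalingPolicy` are illustrative.

```python
import json
import urllib.request
from typing import Optional

# Placeholder only: substitute the real create-endpoint URL from the API reference.
API_URL = "https://api.example.invalid/dedicated/endpoint"


def build_create_request(
    token: str,
    project_id: str,
    name: str,
    instance_option_id: str,
    hf_model_repo: str,
    advanced: dict,
    replicas: int = 1,
    revision: Optional[str] = None,
    comment: Optional[str] = None,
) -> urllib.request.Request:
    """Assemble (but do not send) a dedicated endpoint create request."""
    if replicas < 1:
        raise ValueError("simplescale.replicas must be >= 1")
    body = {
        "projectId": project_id,
        "name": name,
        "instanceOptionId": instance_option_id,
        "advanced": advanced,
        "hfModelRepo": hf_model_repo,
        "simplescale": {"replicas": replicas},
    }
    # Nullable fields are only included when a value is provided.
    if revision is not None:
        body["hfModelRepoRevision"] = revision
    if comment is not None:
        body["initialVersionComment"] = comment
    headers = {
        "Authorization": f"Bearer {token}",  # assumption: Bearer scheme
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` once `API_URL` points at the real endpoint.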

Response

Successfully created the endpoint.

Dedicated endpoint status.

status

enum<string>

required

The current status of the endpoint deployment.

Available options:

UNKNOWN, INITIALIZING, RUNNING, UPDATING, SLEEPING, AWAKING, FAILED, STOPPING, TERMINATING, TERMINATED, READY

createdAt

string<date-time>

required

When the endpoint was created.

errorCode

enum<string> | null

The error code reported for the endpoint, if any.

Available options:

WORKLOAD_INIT_UNKNOWN_ERROR, WORKLOAD_INIT_SETTINGS_ERROR, WORKLOAD_INIT_GRPC_ERROR, WORKLOAD_INIT_MANIFEST_NOT_FOUND_ERROR, WORKLOAD_INIT_MANIFEST_TYPE_ERROR, WORKLOAD_INIT_DOWNLOAD_ERROR, WORKLOAD_INIT_INVALID_TOKEN_ERROR, WORKLOAD_INIT_CANNOT_ACCESS_REPO_ERROR, WORKLOAD_INIT_HF_WANDB_API_ERROR, WORKLOAD_INIT_INSUFFICIENT_DISK_ERROR, INFERENCE_ENGINE_UNKNOWN_ERROR, INFERENCE_ENGINE_INVALID_ARGUMENT_ERROR, INFERENCE_ENGINE_MEMORY_ERROR, INFERENCE_ENGINE_METERING_CLIENT_CONFIG_ERROR

updatedAt

string<date-time> | null

When the endpoint was last updated.

phase

enum<string> | null

The current phase of the endpoint.

Available options:

REQUESTING_VIRTUAL_MACHINE, DOWNLOADING_MODEL, ENGINE_INITIALIZING
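
A response with the fields above could be parsed roughly as follows. This is a sketch: the function names are hypothetical, and it assumes the `date-time` strings are ISO 8601 / RFC 3339 timestamps, possibly ending in `Z`. On Python versions before 3.11, `datetime.fromisoformat` accepts only a limited set of fractional-second formats.

```python
import json
from datetime import datetime
from typing import Optional

# Status values listed in the reference above.
KNOWN_STATUSES = {
    "UNKNOWN", "INITIALIZING", "RUNNING", "UPDATING", "SLEEPING", "AWAKING",
    "FAILED", "STOPPING", "TERMINATING", "TERMINATED", "READY",
}


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a date-time string, treating a trailing 'Z' as UTC."""
    if value is None:
        return None
    # Assumption: timestamps are ISO 8601 / RFC 3339 formatted.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_endpoint_status(raw: str) -> dict:
    """Extract the documented response fields from a JSON response body."""
    data = json.loads(raw)
    status = data["status"]  # required
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status}")
    return {
        "status": status,
        "created_at": parse_timestamp(data["createdAt"]),  # required
        "updated_at": parse_timestamp(data.get("updatedAt")),  # nullable
        "error_code": data.get("errorCode"),  # nullable
        "phase": data.get("phase"),  # nullable
    }
```

Only `status` and `createdAt` are read as required; the nullable fields come back as `None` when they are absent or `null`.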