InferenceEndpointsLLM

InferenceEndpoints LLM implementation running the async API client.

This LLM uses huggingface_hub.AsyncInferenceClient internally.
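Because the client is asynchronous, several requests can be in flight at the same time. The sketch below illustrates that pattern with asyncio alone; `fake_chat_completion` is a hypothetical stand-in for a request made through huggingface_hub.AsyncInferenceClient and does not contact any endpoint.

```python
import asyncio
from typing import Dict, List

# One conversation is a list of {"role": ..., "content": ...} messages.
Conversation = List[Dict[str, str]]


async def fake_chat_completion(conversation: Conversation) -> str:
    """Hypothetical stand-in for an AsyncInferenceClient request."""
    await asyncio.sleep(0.01)  # simulate network latency
    last_message = conversation[-1]["content"]
    return f"echo: {last_message}"


async def generate_all(inputs: List[Conversation]) -> List[str]:
    # Schedule every request at once and wait for all of them;
    # asyncio.gather returns the results in the same order as the inputs.
    return await asyncio.gather(*(fake_chat_completion(c) for c in inputs))


inputs = [
    [{"role": "user", "content": "Hello world!"}],
    [{"role": "user", "content": "How are you?"}],
]
results = asyncio.run(generate_all(inputs))
print(results)  # ['echo: Hello world!', 'echo: How are you?']
```

The total wait is roughly one simulated latency rather than one per request, which is the practical benefit of an async client.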

Attributes

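model_id: the model ID to use for the LLM, as available in the Hugging Face Hub. It is used to resolve the base URL for the serverless Inference Endpoints API requests. Defaults to None.
endpoint_name: the name of the Inference Endpoint to use for the LLM. Defaults to None.
endpoint_namespace: the namespace of the Inference Endpoint to use for the LLM. Defaults to None.
base_url: the base URL to use for the Inference Endpoints API requests.
api_key: the API key used to authenticate the requests to the Inference Endpoints API.
tokenizer_id: the tokenizer ID to use for the LLM, as available in the Hugging Face Hub. Defaults to None, but defining one is recommended so that prompts are formatted correctly.
model_display_name: the model display name to use for the LLM. Defaults to None.
use_magpie_template: a flag to enable or disable applying the Magpie pre-query template. Defaults to False.
magpie_pre_query_template: the pre-query template to apply to the prompt, or to send to the LLM, in order to generate an instruction or a follow-up user message. Valid values are "llama3", "qwen2", or a custom pre-query template string. Defaults to None.
structured_output: a dictionary containing the structured output configuration or, if more fine-grained control is needed, an instance of OutlinesStructuredOutput. Defaults to None.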
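The Magpie attributes work by appending the opening of a user turn to the prompt, so that the model continues by writing the user message itself. The sketch below shows that idea with plain strings. The template values for "llama3" and "qwen2" are assumptions based on those models' chat formats, and `build_magpie_prompt` is a hypothetical helper, not part of distilabel.

```python
from typing import Optional

# Assumed pre-query templates: the opening of a user turn in each chat format.
PRE_QUERY_TEMPLATES = {
    "llama3": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n",
    "qwen2": "<|im_start|>user\n",
}


def build_magpie_prompt(
    formatted_prompt: str,
    use_magpie_template: bool,
    magpie_pre_query_template: Optional[str],
) -> str:
    """Hypothetical helper: append a pre-query template to an already formatted prompt."""
    if not use_magpie_template or magpie_pre_query_template is None:
        return formatted_prompt
    # Named templates resolve to their string; any other value is treated
    # as a custom pre-query template and appended verbatim.
    template = PRE_QUERY_TEMPLATES.get(magpie_pre_query_template, magpie_pre_query_template)
    return formatted_prompt + template


system_turn = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
print(repr(build_magpie_prompt(system_turn, True, "qwen2")))
```

With the flag disabled, the prompt is returned unchanged, which mirrors use_magpie_template defaulting to False.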

Examples

Free serverless Inference API (set the input_batch_size of the Task that uses this LLM to avoid "Model is overloaded" errors)

from distilabel.models.llms.huggingface import InferenceEndpointsLLM

llm = InferenceEndpointsLLM(
    model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
)

# load() sets up the async client before any generation
llm.load()

output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Hello world!"}]])
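A small input_batch_size means fewer inputs are sent to the free API in one go, which lowers the chance of an overload error. The sketch below only illustrates the chunking idea; `chunked` is a hypothetical helper, and inside a pipeline the Task performs its own batching once input_batch_size is set.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], input_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most input_batch_size items."""
    if input_batch_size < 1:
        raise ValueError("input_batch_size must be >= 1")
    for start in range(0, len(items), input_batch_size):
        yield items[start:start + input_batch_size]


conversations = [[{"role": "user", "content": f"Question {i}"}] for i in range(5)]
batch_sizes = [len(batch) for batch in chunked(conversations, input_batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

Each batch could then be passed to llm.generate_outputs(inputs=batch) in turn, so only a handful of requests reach the endpoint at once.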

Dedicated Inference Endpoints

from distilabel.models.llms.huggingface import InferenceEndpointsLLM

llm = InferenceEndpointsLLM(
    endpoint_name="<ENDPOINT_NAME>",
    api_key="<HF_API_KEY>",
    endpoint_namespace="<USER|ORG>",
)

llm.load()

output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Hello world!"}]])

Dedicated Inference Endpoints or TGI

from distilabel.models.llms.huggingface import InferenceEndpointsLLM

llm = InferenceEndpointsLLM(
    api_key="<HF_API_KEY>",
    base_url="<BASE_URL>",
)

llm.load()

output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Hello world!"}]])
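The three examples above each point the LLM at its target in a different way: model_id for the serverless API, endpoint_name (with endpoint_namespace) for a dedicated endpoint, or base_url for any compatible server such as TGI. The sketch below is a hypothetical helper that reports which option a configuration uses, assuming exactly one of the three is given; distilabel's own validation rules may be more permissive.

```python
from typing import Optional


def describe_target(
    model_id: Optional[str] = None,
    endpoint_name: Optional[str] = None,
    base_url: Optional[str] = None,
) -> str:
    """Hypothetical helper: report which of the three connection options is set."""
    provided = {
        "model_id": model_id,
        "endpoint_name": endpoint_name,
        "base_url": base_url,
    }
    set_options = [name for name, value in provided.items() if value is not None]
    # Assumption: exactly one option is set, as in the examples.
    if len(set_options) != 1:
        raise ValueError(f"expected exactly one of {list(provided)}, got {set_options}")
    descriptions = {
        "model_id": "serverless Inference API",
        "endpoint_name": "dedicated Inference Endpoint",
        "base_url": "dedicated Inference Endpoint or TGI server",
    }
    return descriptions[set_options[0]]


print(describe_target(model_id="meta-llama/Meta-Llama-3.1-70B-Instruct"))
```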

Generating structured data

from pydantic import BaseModel
from distilabel.models.llms import InferenceEndpointsLLM

class User(BaseModel):
    name: str
    last_name: str
    id: int

llm = InferenceEndpointsLLM(
    model_id="meta-llama/Meta-Llama-3-70B-Instruct",
    tokenizer_id="meta-llama/Meta-Llama-3-70B-Instruct",
    api_key="<HF_API_KEY>",
    # Constrain the generation to JSON that follows the User schema
    structured_output={"format": "json", "schema": User.model_json_schema()}
)

llm.load()

output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Create a user profile for the Tour De France"}]])
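With structured_output set, each generation is expected to be a JSON string matching User.model_json_schema(). The stdlib-only sketch below checks such a string against the User fields; `parse_user` is a hypothetical helper and `raw` is a made-up string, not real model output. With pydantic installed, User.model_validate_json(raw) performs the equivalent check.

```python
import json

# Expected fields and types, mirroring the User model above.
USER_FIELDS = {"name": str, "last_name": str, "id": int}


def parse_user(raw: str) -> dict:
    """Parse a generated JSON string and check it has the User fields with the right types."""
    data = json.loads(raw)
    for field, expected_type in USER_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data


# Made-up example of a generation; real output depends on the model.
raw = '{"name": "Jane", "last_name": "Doe", "id": 1}'
print(parse_user(raw))  # {'name': 'Jane', 'last_name': 'Doe', 'id': 1}
```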