vLLM

LLM implementation backed by the vLLM library.

Attributes

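model: the model's Hugging Face Hub repo id, or a path to a directory containing the model weights and configuration files.
dtype: the data type to use for the model. Defaults to auto.
trust_remote_code: whether to trust remote code when loading the model. Defaults to False.
quantization: the quantization mode to use for the model. Defaults to None.
revision: the revision of the model to load. Defaults to None.
tokenizer: the tokenizer's Hugging Face Hub repo id, or a path to a directory containing the tokenizer files. If not provided, the tokenizer is loaded from the model directory. Defaults to None.
tokenizer_mode: the mode to use for the tokenizer. Defaults to auto.
tokenizer_revision: the revision of the tokenizer to load. Defaults to None.
skip_tokenizer_init: whether to skip the initialization of the tokenizer. Defaults to False.
chat_template: a chat template used to build the prompt before it is sent to the model. If not provided, the chat template defined in the tokenizer config is used. If not provided and the tokenizer has no chat template, the ChatML template is used. Defaults to None.
structured_output: a dictionary containing the structured output configuration or, if more fine-grained control is needed, an instance of OutlinesStructuredOutput. Defaults to None.
seed: the seed to use for the random number generator. Defaults to 0.
extra_kwargs: an additional dictionary of keyword arguments that is passed to the LLM class of the vllm library. Defaults to {}.
_model: the vLLM model instance. This attribute is meant for internal use and should not be accessed directly. It is set in the load method.
_tokenizer: the tokenizer instance used to format the prompt before passing it to the LLM. This attribute is meant for internal use and should not be accessed directly. It is set in the load method.
use_magpie_template: a flag to enable or disable applying the Magpie pre-query template. Defaults to False.
magpie_pre_query_template: the pre-query template to apply to the prompt or send to the LLM to generate an instruction or a follow-up user message. Valid values are "llama3", "qwen2", or another pre-query template that you provide. Defaults to None.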
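
The fallback order described for chat_template can be sketched in plain Python. This is only an illustration of the documented precedence, not the library's code: the helper name `resolve_chat_template` and the `CHATML_PLACEHOLDER` constant are hypothetical, and the placeholder stands in for the real ChatML template string.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the ChatML template the library falls back to.
CHATML_PLACEHOLDER = "<ChatML template>"


def resolve_chat_template(
    chat_template: Optional[str],
    tokenizer_template: Optional[str],
) -> Tuple[str, str]:
    """Return (source, template) following the documented precedence."""
    # 1. An explicitly provided chat_template always wins.
    if chat_template is not None:
        return "explicit", chat_template
    # 2. Otherwise use the template defined in the tokenizer config, if any.
    if tokenizer_template is not None:
        return "tokenizer", tokenizer_template
    # 3. Otherwise fall back to ChatML.
    return "chatml", CHATML_PLACEHOLDER


source, template = resolve_chat_template(None, None)
```

With neither argument set, `source` is "chatml", which matches the last fallback described for the attribute.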

Runtime parameters

extra_kwargs: an additional dictionary of keyword arguments that is passed to the LLM class of the vllm library.
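
As a rough sketch of what forwarding extra keyword arguments can look like, the snippet below combines a set of named arguments with `extra_kwargs` and rejects keys that appear in both. The helper `build_engine_kwargs` is hypothetical, and the conflict check is an assumption made for the example; this page does not describe how the wrapper handles repeated keys.

```python
from typing import Any, Dict


def build_engine_kwargs(
    named: Dict[str, Any],
    extra_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Combine named arguments with extra_kwargs, refusing repeated keys."""
    # Assumption: a key set both as a named attribute and in extra_kwargs
    # is treated as an error rather than silently overridden.
    clashes = sorted(named.keys() & extra_kwargs.keys())
    if clashes:
        raise ValueError(f"extra_kwargs repeats named arguments: {clashes}")
    return {**named, **extra_kwargs}


named = {"dtype": "auto", "seed": 0}
engine_kwargs = build_engine_kwargs(named, {"max_model_len": 4096})
```

In practice this corresponds to passing something like `extra_kwargs={"max_model_len": 4096}` when constructing the class, while leaving attributes such as dtype and seed to their dedicated parameters.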

Examples

Generate text

from distilabel.models.llms import vLLM

# A custom Jinja chat_template can be passed to the model.
# Single quotes are used inside the template so the Python string stays valid.
llm = vLLM(
    model="prometheus-eval/prometheus-7b-v2.0",
    chat_template="[INST] {{ messages[0]['content'] }}\n{{ messages[1]['content'] }}[/INST]",
)

llm.load()

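# Call the model. Two messages are passed because the template
# reads both messages[0] and messages[1].
output = llm.generate_outputs(
    inputs=[[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello world!"},
    ]]
)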
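
To see what the custom template in this example does to a conversation, the following plain-Python stand-in reproduces its layout with an f-string. The real rendering is done by the tokenizer's template engine; `render_inst_prompt` is a hypothetical helper written only to show the resulting prompt shape.

```python
from typing import Dict, List


def render_inst_prompt(messages: List[Dict[str, str]]) -> str:
    """Plain-Python equivalent of the [INST] ... [/INST] template above."""
    # The template indexes messages[0] and messages[1], so it needs two messages.
    if len(messages) < 2:
        raise ValueError("this template expects at least two messages")
    first = messages[0]["content"]
    second = messages[1]["content"]
    return f"[INST] {first}\n{second}[/INST]"


prompt = render_inst_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello world!"},
    ]
)
```

Note that the template ignores the role field entirely and only places the two contents on separate lines inside the [INST] markers.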

Generate structured data

from pydantic import BaseModel
from distilabel.models.llms import vLLM

# Schema that the generated JSON should follow
class User(BaseModel):
    name: str
    last_name: str
    id: int

llm = vLLM(
    model="prometheus-eval/prometheus-7b-v2.0",
    structured_output={"format": "json", "schema": User},
)

llm.load()

# Call the model
output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Create a user profile for the following marathon"}]])
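
Once a generation comes back, it is usually worth checking that the text really matches the schema. The sketch below does this with the standard library only. It assumes you have already pulled a raw JSON string out of the output (the exact structure of `output` is not described on this page), and both `parse_user` and the sample string are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    last_name: str
    id: int


def parse_user(raw: str) -> User:
    """Parse a JSON string and check it has the three expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"name": str, "last_name": str, "id": int}
    for key, expected_type in expected.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {key} has the wrong type")
    return User(**{key: data[key] for key in expected})


# Hypothetical generation text, used only to exercise the parser.
raw_generation = '{"name": "Jane", "last_name": "Doe", "id": 1}'
user = parse_user(raw_generation)
```

If the schema is already a Pydantic model, as in the example above, validating the string with the model itself is the more natural choice; the dataclass version is used here only to keep the sketch dependency-free.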