LlamaCppLLM¶

A llama.cpp LLM implementation that runs the model through the Python bindings for the C++ code.

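Attributes¶

model_path: path to the GGUF quantized model, which must be compatible with the installed version of the llama.cpp Python bindings.
n_gpu_layers: number of layers to use for the GPU. Defaults to -1, meaning that the available GPU device will be used.
chat_format: chat format to use for the model. Defaults to None, meaning that the Llama format will be used.
n_ctx: context size to use for the model. Defaults to 512.
n_batch: maximum batch size for prompt processing. Defaults to 512.
seed: random seed to use for generation. Defaults to 4294967295.
verbose: whether to print verbose output. Defaults to False.
structured_output: a dictionary containing the structured output configuration or, if more fine-grained control is needed, an instance of OutlinesStructuredOutput. Defaults to None.
extra_kwargs: dictionary of additional keyword arguments passed to the Llama class of the llama_cpp library. Defaults to {}.
tokenizer_id: Hugging Face Hub repository ID of the tokenizer, or path to a directory containing the tokenizer configuration files. If not provided, the one associated with the model is used. Defaults to None.
use_magpie_template: flag that enables or disables applying the Magpie pre-query template. Defaults to False.
magpie_pre_query_template: pre-query template applied to the prompt or sent to the LLM to generate an instruction or a follow-up user message. Valid values are "llama3", "qwen2", or another pre-query template that you provide. Defaults to None.
_model: the Llama model instance. This attribute is meant for internal use and should not be accessed directly. It is set in the load method.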
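The defaults in the attribute list can be collected in one place so that every override is explicit. The following is a minimal sketch in plain Python: `DOCUMENTED_DEFAULTS` and `build_llm_kwargs` are hypothetical helper names that are not part of distilabel, and the values are simply copied from the list above. The resulting dictionary is intended to be unpacked into the constructor.

```python
from pathlib import Path
from typing import Any, Dict

# Defaults copied from the attribute list (hypothetical helper, not distilabel API).
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "n_gpu_layers": -1,
    "chat_format": None,
    "n_ctx": 512,
    "n_batch": 512,
    "seed": 4294967295,  # equal to 2**32 - 1
    "verbose": False,
    "structured_output": None,
    "tokenizer_id": None,
    "use_magpie_template": False,
    "magpie_pre_query_template": None,
}


def build_llm_kwargs(model_path: Path, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS) - {"extra_kwargs"}
    if unknown:
        raise TypeError(f"Unknown attribute(s): {sorted(unknown)}")
    # A fresh dict for extra_kwargs avoids sharing one mutable default.
    kwargs = {"model_path": str(model_path), **DOCUMENTED_DEFAULTS, "extra_kwargs": {}}
    kwargs.update(overrides)
    return kwargs


kwargs = build_llm_kwargs(Path("model.gguf"), n_ctx=1024)
# With distilabel installed: llm = LlamaCppLLM(**kwargs)
```

Rejecting unknown names early catches typos such as `n_context` before a multi-gigabyte model file is loaded.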

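Runtime parameters¶

model_path: path to the GGUF quantized model.
n_gpu_layers: number of layers to use for the GPU. Defaults to -1.
chat_format: chat format to use for the model. Defaults to None.
verbose: whether to print verbose output. Defaults to False.
extra_kwargs: dictionary of additional keyword arguments passed to the Llama class of the llama_cpp library. Defaults to {}.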
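Runtime parameters can be supplied when a pipeline is run rather than when the LLM is constructed. The sketch below builds such an override mapping. The nested `{step_name: {"llm": {...}}}` layout and the step name `text_generation` are assumptions about how distilabel's `Pipeline.run(parameters=...)` expects LLM parameters, so check them against your installed version; `runtime_overrides` is a hypothetical helper.

```python
from typing import Any, Dict

# Names taken from the runtime parameter list.
RUNTIME_PARAMETERS = {"model_path", "n_gpu_layers", "chat_format", "verbose", "extra_kwargs"}


def runtime_overrides(step_name: str, **llm_params: Any) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Build a nested override mapping for a single step's LLM."""
    not_runtime = set(llm_params) - RUNTIME_PARAMETERS
    if not_runtime:
        raise ValueError(f"Not documented as runtime parameters: {sorted(not_runtime)}")
    return {step_name: {"llm": dict(llm_params)}}


params = runtime_overrides("text_generation", n_gpu_layers=0, verbose=True)
# Assumed usage: pipeline.run(parameters=params)
```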

Examples¶

Generate text¶

from pathlib import Path
from distilabel.models.llms import LlamaCppLLM

# To follow along with this example, download the model to the `Downloads` folder
# by running the following command in a terminal:
# curl -L -o ~/Downloads/openhermes-2.5-mistral-7b.Q4_K_M.gguf https://hugging-face.cn/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/resolve/main/openhermes-2.5-mistral-7b.Q4_K_M.gguf

model_path = "Downloads/openhermes-2.5-mistral-7b.Q4_K_M.gguf"

llm = LlamaCppLLM(
    model_path=str(Path.home() / model_path),
    n_gpu_layers=-1,  # To use the GPU if available
    n_ctx=1024,       # Set the context size
)

llm.load()

# Call the model
output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Hello world!"}]])
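In the call above, `inputs` is a list of conversations, and each conversation is a list of `role`/`content` message dictionaries. The sketch below turns plain prompts into that shape; `to_chat_inputs` is a hypothetical helper, and the optional system message assumes that the model's chat format accepts a `system` role.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def to_chat_inputs(prompts: List[str], system: Optional[str] = None) -> List[List[Message]]:
    """Wrap each prompt in a one-turn conversation, optionally with a system message."""
    inputs: List[List[Message]] = []
    for prompt in prompts:
        conversation: List[Message] = []
        if system is not None:
            conversation.append({"role": "system", "content": system})
        conversation.append({"role": "user", "content": prompt})
        inputs.append(conversation)
    return inputs


inputs = to_chat_inputs(["Hello world!"])
# With a loaded model: output = llm.generate_outputs(inputs=inputs)
```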

Generate structured data¶

from pathlib import Path

from pydantic import BaseModel

from distilabel.models.llms import LlamaCppLLM

model_path = "Downloads/openhermes-2.5-mistral-7b.Q4_K_M.gguf"

# Schema that the generated JSON must follow
class User(BaseModel):
    name: str
    last_name: str
    id: int

llm = LlamaCppLLM(
    model_path=str(Path.home() / model_path),  # type: ignore
    n_gpu_layers=-1,
    n_ctx=1024,
    structured_output={"format": "json", "schema": User},
)

llm.load()

# Call the model
output = llm.generate_outputs(inputs=[[{"role": "user", "content": "Create a user profile for the following marathon"}]])
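With a JSON schema configured, the generated text is expected to be a JSON object with the schema's fields. The sketch below checks a raw string against the `User` fields using only the standard library. `UserRecord`, `parse_user`, and the sample string are hypothetical; how the raw text is extracted from `output` depends on the distilabel version, so that step is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Standard-library mirror of the User schema fields."""
    name: str
    last_name: str
    id: int


def parse_user(raw: str) -> UserRecord:
    """Parse a JSON string and verify that it carries the expected fields and types."""
    data = json.loads(raw)
    record = UserRecord(**data)  # raises TypeError on missing or unexpected keys
    if not isinstance(record.name, str) or not isinstance(record.last_name, str):
        raise ValueError("name and last_name must be strings")
    if not isinstance(record.id, int) or isinstance(record.id, bool):
        raise ValueError("id must be an integer")
    return record


sample = '{"name": "Ada", "last_name": "Lovelace", "id": 1}'  # made-up example string
user = parse_user(sample)
```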
