ClientvLLM

A client for the vLLM server that implements the OpenAI API specification.

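Attributes

base_url: the base URL of the vLLM server. Defaults to "http://localhost:8000".
max_retries: the maximum number of times to retry an API request before failing. Defaults to 6.
timeout: the maximum time, in seconds, to wait for an API response. Defaults to 120.
httpx_client_kwargs: extra kwargs passed to the httpx.AsyncClient that is created to communicate with the vLLM server. Defaults to None.
tokenizer: the Hugging Face Hub repo id or path of the tokenizer used to apply the chat template and tokenize the inputs before they are sent to the server. Defaults to None.
tokenizer_revision: the revision of the tokenizer to load. Defaults to None.
_aclient: the httpx.AsyncClient used to communicate with the vLLM server. Defaults to None.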
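
The tokenizer attribute is described as applying a chat template before inputs are sent to the server. As a rough illustration of what that step means, the sketch below flattens a list of chat messages into a single prompt string. The `render_chat` helper and the `<|role|>` / `<|end|>` markers are hypothetical and do not reproduce any real model's template or the client's internal code.

```python
def render_chat(messages, add_generation_prompt=True):
    """Flatten chat messages into one prompt string.

    The <|role|> and <|end|> markers are hypothetical placeholders,
    not the template of any particular model.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = render_chat(messages)
```

With a real tokenizer, Hugging Face `transformers` provides `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` for this purpose. Whether the client calls it with exactly these arguments is not stated here.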

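Runtime parameters

base_url: the base URL of the vLLM server. Defaults to "http://localhost:8000".
max_retries: the maximum number of times to retry an API request before failing. Defaults to 6.
timeout: the maximum time, in seconds, to wait for an API response. Defaults to 120.
httpx_client_kwargs: extra kwargs passed to the httpx.AsyncClient that is created to communicate with the vLLM server. Defaults to None.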
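
To make the interaction between max_retries and timeout concrete, here is a minimal standard-library sketch of a retry loop. It is not the client's implementation. It assumes one initial attempt followed by up to `max_retries` retries, with `timeout` applied to each attempt separately. The `call_with_retries` helper, the `FlakyServer` stand-in, and the backoff delays are all hypothetical.

```python
import asyncio


async def call_with_retries(make_request, max_retries=6, timeout=120.0):
    """Await make_request() until it succeeds or the retry budget is spent.

    Assumption: one initial attempt plus up to `max_retries` retries.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            # Bound each individual attempt by `timeout` seconds.
            return await asyncio.wait_for(make_request(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as error:
            last_error = error
            if attempt < max_retries:
                # Hypothetical backoff: short exponential delay, capped at 0.1 s.
                await asyncio.sleep(min(0.01 * 2**attempt, 0.1))
    raise RuntimeError(f"request failed after {max_retries} retries") from last_error


class FlakyServer:
    """Stand-in for a server that fails a fixed number of times, then answers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def request(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("server not ready")
        return "ok"
```

Calling `asyncio.run(call_with_retries(FlakyServer(failures=2).request))` makes three attempts in total under these assumptions: two failures and one success.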

Examples

Generate text

from distilabel.models.llms import ClientvLLM

llm = ClientvLLM(
    base_url="http://localhost:8000/v1",
    tokenizer="meta-llama/Meta-Llama-3.1-8B-Instruct"
)

llm.load()

results = llm.generate_outputs(
    inputs=[[{"role": "user", "content": "Hello, how are you?"}]],
    temperature=0.7,
    top_p=1.0,
    max_new_tokens=256,
)
# [
#     [
#         "I'm functioning properly, thank you for asking. How can I assist you today?",
#         "I'm doing well, thank you for asking. I'm a large language model, so I don't have feelings or emotions like humans do, but I'm here to help answer any questions or provide information you might need. How can I assist you today?",
#         "I'm just a computer program, so I don't have feelings like humans do, but I'm functioning properly and ready to help you with any questions or tasks you have. What's on your mind?"
#     ]
# ]
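
The call above passes sampling settings such as temperature, top_p, and max_new_tokens. As a sketch of how such settings might map onto an OpenAI-style text completion request body, the snippet below builds the JSON payload by hand. The `build_completion_payload` helper and its `num_generations` parameter are hypothetical, and the mapping of max_new_tokens to `max_tokens` and of num_generations to `n` is an assumption, not something this page confirms.

```python
import json


def build_completion_payload(model, prompt, temperature=0.7, top_p=1.0,
                             max_new_tokens=256, num_generations=1):
    """Build a JSON-serializable body for an OpenAI-style completion request.

    Assumption: max_new_tokens maps to `max_tokens` and
    num_generations maps to `n`.
    """
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_new_tokens,
        "n": num_generations,
    }


payload = build_completion_payload(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    prompt="Hello, how are you?",
    num_generations=3,
)
body = json.dumps(payload)  # string that would be sent as the request body
```

Requesting several completions through `n` is one way a single input could yield multiple generations.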