QualityScorer¶
Score responses based on their quality using an LLM.
QualityScorer is a pre-defined task that defines instruction as the input and score as the output. This task is used to rate the quality of instructions and responses. It is an implementation of the quality score task from the paper "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning". The task follows the same scheme as the Complexity Scorer, but rates the quality of instruction-response pairs, obtaining a quality score for each instruction.
Attributes¶
- _template: a Jinja2 template used to format the input for the LLM.
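The exact template shipped with distilabel is internal, but a prompt of this kind typically enumerates the candidate responses under the instruction. A minimal, hypothetical sketch in plain Python (the real _template is Jinja2 and its wording differs):

```python
def build_quality_prompt(instruction: str, responses: list[str]) -> str:
    """Illustrative only: formats an instruction and its candidate
    responses into a single scoring prompt, one numbered response per line."""
    lines = [
        "Rate the quality of each response to the instruction below.",
        f"Instruction: {instruction}",
        "",
    ]
    for i, response in enumerate(responses, start=1):
        lines.append(f"Response {i}: {response}")
    return "\n".join(lines)

prompt = build_quality_prompt("instruction", ["good response", "bad response"])
```

The LLM then answers with one score per numbered response, which the task parses back into the scores column.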
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph Columns
ICOL0[instruction]
ICOL1[responses]
end
subgraph New columns
OCOL0[scores]
OCOL1[model_name]
end
end
subgraph QualityScorer
StepInput[Input Columns: instruction, responses]
StepOutput[Output Columns: scores, model_name]
end
ICOL0 --> StepInput
ICOL1 --> StepInput
StepOutput --> OCOL0
StepOutput --> OCOL1
StepInput --> StepOutput
Inputs¶
- instruction (str): The instruction that was used to generate the responses.
- responses (List[str]): The responses to be scored. Each response forms a pair with the instruction.
Outputs¶
- scores (List[float]): The score for each instruction.
- model_name (str): The model name used to generate the scores.
Examples¶
Evaluate the quality of your instructions¶
from distilabel.steps.tasks import QualityScorer
from distilabel.models import InferenceEndpointsLLM
# Consider this as a placeholder for your actual LLM.
scorer = QualityScorer(
llm=InferenceEndpointsLLM(
model_id="mistralai/Mistral-7B-Instruct-v0.2",
)
)
scorer.load()
result = next(
scorer.process(
[
{
"instruction": "instruction",
"responses": ["good response", "weird response", "bad response"]
}
]
)
)
# result
[
{
'instruction': 'instruction',
'model_name': 'test',
'scores': [5, 3, 1],
}
]
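Given the output format above, a common follow-up step is to keep only the highest-scored response per record. A minimal sketch, assuming scores[i] corresponds to responses[i]; the record below mirrors the example output and is not a distilabel API:

```python
record = {
    "instruction": "instruction",
    "responses": ["good response", "weird response", "bad response"],
    "scores": [5, 3, 1],
}

# Pair each response with its score and keep the best-scored one.
best_response, best_score = max(
    zip(record["responses"], record["scores"]),
    key=lambda pair: pair[1],
)
# best_response == "good response", best_score == 5
```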
Generate structured output with the default schema¶
from distilabel.steps.tasks import QualityScorer
from distilabel.models import InferenceEndpointsLLM
scorer = QualityScorer(
llm=InferenceEndpointsLLM(
model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
),
use_default_structured_output=True
)
scorer.load()
result = next(
scorer.process(
[
{
"instruction": "instruction",
"responses": ["good response", "weird response", "bad response"]
}
]
)
)
# result
[{'instruction': 'instruction',
'responses': ['good response', 'weird response', 'bad response'],
'scores': [1, 2, 3],
'distilabel_metadata': {'raw_output_quality_scorer_0': '{ "scores": [1, 2, 3] }'},
'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]
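With use_default_structured_output=True, the raw LLM output stored under distilabel_metadata is a JSON object with a scores field, as shown above. A minimal sketch of recovering it by hand (the key name raw_output_quality_scorer_0 follows this example; in a pipeline it depends on the step name):

```python
import json

# Mirrors the structured-output example record above.
record = {
    "distilabel_metadata": {
        "raw_output_quality_scorer_0": '{ "scores": [1, 2, 3] }'
    }
}

raw = record["distilabel_metadata"]["raw_output_quality_scorer_0"]
scores = json.loads(raw)["scores"]
# scores == [1, 2, 3]
```

In practice the task parses this for you and exposes the result in the scores column; the metadata is mainly useful for debugging malformed generations.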