QualityScorer¶
Score responses based on their quality using an LLM.
QualityScorer is a pre-defined task that defines instruction as the input and score as the output. This task is used to rate the quality of instructions and responses. It is an implementation of the quality score task from the paper "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning". The task follows the same scheme as the Complexity Scorer, but rates the quality of instruction-response pairs, obtaining a quality score for each instruction.
Attributes¶
- _template: a Jinja2 template used to format the input for the LLM.
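The exact template shipped with distilabel is internal, but a prompt of this kind typically enumerates the candidate responses under the instruction. A minimal, hypothetical sketch in plain Python (the real _template is Jinja2 and its wording differs):

```python
def build_quality_prompt(instruction: str, responses: list[str]) -> str:
    """Illustrative only: formats an instruction and its candidate
    responses into a single scoring prompt, one numbered response per line."""
    lines = [
        "Rate the quality of each response to the instruction below.",
        f"Instruction: {instruction}",
        "",
    ]
    for i, response in enumerate(responses, start=1):
        lines.append(f"Response {i}: {response}")
    return "\n".join(lines)

prompt = build_quality_prompt("instruction", ["good response", "bad response"])
```

The LLM then answers with one score per numbered response, which the task parses back into the scores column.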
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph Columns
ICOL0[instruction]
ICOL1[responses]
end
subgraph New columns
OCOL0[scores]
OCOL1[model_name]
end
end
subgraph QualityScorer
StepInput[Input Columns: instruction, responses]
StepOutput[Output Columns: scores, model_name]
end
ICOL0 --> StepInput
ICOL1 --> StepInput
StepOutput --> OCOL0
StepOutput --> OCOL1
StepInput --> StepOutput
Inputs¶
- instruction (str): The instruction that was used to generate the responses.
- responses (List[str]): The responses to be scored. Each response forms a pair with the instruction.
Outputs¶
- scores (List[float]): The score for each instruction.
- model_name (str): The model name used to generate the scores.
Examples¶
Evaluate the quality of your instructions¶
from distilabel.steps.tasks import QualityScorer
from distilabel.models import InferenceEndpointsLLM
# Consider this as a placeholder for your actual LLM.
scorer = QualityScorer(
llm=InferenceEndpointsLLM(
model_id="mistralai/Mistral-7B-Instruct-v0.2",
)
)
scorer.load()
result = next(
scorer.process(
[
{
"instruction": "instruction",
"responses": ["good response", "weird response", "bad response"]
}
]
)
)
# result
[
{
'instruction': 'instruction',
'model_name': 'test',
'scores': [5, 3, 1],
}
]
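Given the output format above, a common follow-up step is to keep only the highest-scored response per record. A minimal sketch, assuming scores[i] corresponds to responses[i]; the record below mirrors the example output and is not a distilabel API:

```python
record = {
    "instruction": "instruction",
    "responses": ["good response", "weird response", "bad response"],
    "scores": [5, 3, 1],
}

# Pair each response with its score and keep the best-scored one.
best_response, best_score = max(
    zip(record["responses"], record["scores"]),
    key=lambda pair: pair[1],
)
# best_response == "good response", best_score == 5
```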
Generate structured output with the default schema¶
from distilabel.steps.tasks import QualityScorer
from distilabel.models import InferenceEndpointsLLM
scorer = QualityScorer(
llm=InferenceEndpointsLLM(
model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
),
use_default_structured_output=True
)
scorer.load()
result = next(
scorer.process(
[
{
"instruction": "instruction",
"responses": ["good response", "weird response", "bad response"]
}
]
)
)
# result
[{'instruction': 'instruction',
'responses': ['good response', 'weird response', 'bad response'],
'scores': [1, 2, 3],
'distilabel_metadata': {'raw_output_quality_scorer_0': '{ "scores": [1, 2, 3] }'},
'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]
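With use_default_structured_output=True, the raw LLM output stored under distilabel_metadata is a JSON object with a scores field, as shown above. A minimal sketch of recovering it by hand (the key name raw_output_quality_scorer_0 follows this example; in a pipeline it depends on the step name):

```python
import json

# Mirrors the structured-output example record above.
record = {
    "distilabel_metadata": {
        "raw_output_quality_scorer_0": '{ "scores": [1, 2, 3] }'
    }
}

raw = record["distilabel_metadata"]["raw_output_quality_scorer_0"]
scores = json.loads(raw)["scores"]
# scores == [1, 2, 3]
```

In practice the task parses this for you and exposes the result in the scores column; the metadata is mainly useful for debugging malformed generations.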