
EvolInstruct

Evolve instructions using an LLM.

WizardLM: Empowering Large Language Models to Follow Complex Instructions

Attributes

  • num_evolutions: The number of evolutions to be performed.

  • store_evolutions: Whether to store all the evolutions or just the last one. Defaults to False.

  • generate_answers: Whether to generate answers for the evolved instructions. Defaults to False.

  • include_original_instruction: Whether to include the original instruction in the evolved_instructions output column. Defaults to False.

  • mutation_templates: The mutation templates to be used to evolve the instructions. Defaults to the ones provided in the utils.py file.

  • seed: The seed to be set for numpy in order to randomly pick a mutation method. Defaults to 42.

Runtime Parameters

  • seed: The seed to be set for numpy in order to randomly pick a mutation method.
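The core loop implied by these attributes — pick a mutation template with a seeded RNG, ask the LLM to rewrite the instruction, and repeat `num_evolutions` times — can be sketched without any LLM at all. The templates and the `evolve` helper below are hypothetical simplifications (the real templates are the WizardLM-style prompts in distilabel's utils.py, and distilabel seeds numpy rather than the stdlib), but they show why a fixed seed makes the sequence of mutation methods reproducible:

```python
import random

# Hypothetical, simplified mutation templates for illustration only;
# the real ones are WizardLM-style evolution prompts shipped in utils.py.
MUTATION_TEMPLATES = {
    "CONSTRAINTS": "Add one more constraint to: {instruction}",
    "DEEPENING": "Increase the depth and breadth of: {instruction}",
    "CONCRETIZING": "Replace general concepts with more specific ones in: {instruction}",
}

def evolve(instruction: str, num_evolutions: int, seed: int = 42) -> list[str]:
    """Chain `num_evolutions` mutations, each picked by a seeded RNG."""
    # distilabel seeds numpy for this pick; random.Random keeps the sketch
    # dependency-free while demonstrating the same reproducibility property.
    rng = random.Random(seed)
    evolutions = []
    for _ in range(num_evolutions):
        template = rng.choice(list(MUTATION_TEMPLATES.values()))
        # In the real task, the formatted prompt is sent to the LLM and the
        # response becomes the instruction evolved in the next iteration.
        instruction = template.format(instruction=instruction)
        evolutions.append(instruction)
    return evolutions
```

With store_evolutions=True the whole list is kept; otherwise only the final element is returned as evolved_instruction.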

Input & Output Columns

graph TD
    subgraph Dataset
        subgraph Columns
            ICOL0[instruction]
        end
        subgraph New columns
            OCOL0[evolved_instruction]
            OCOL1[evolved_instructions]
            OCOL2[model_name]
            OCOL3[answer]
            OCOL4[answers]
        end
    end

    subgraph EvolInstruct
        StepInput[Input Columns: instruction]
        StepOutput[Output Columns: evolved_instruction, evolved_instructions, model_name, answer, answers]
    end

    ICOL0 --> StepInput
    StepOutput --> OCOL0
    StepOutput --> OCOL1
    StepOutput --> OCOL2
    StepOutput --> OCOL3
    StepOutput --> OCOL4
    StepInput --> StepOutput

Inputs

  • instruction (str): The instruction to evolve.

Outputs

  • evolved_instruction (str): The evolved instruction if store_evolutions=False.

  • evolved_instructions (List[str]): The evolved instructions if store_evolutions=True.

  • model_name (str): The name of the LLM used to evolve the instructions.

  • answer (str): The answer to the evolved instruction if generate_answers=True and store_evolutions=False.

  • answers (List[str]): The answers to the evolved instructions if generate_answers=True and store_evolutions=True.
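Which of these columns actually appears depends on the two flags. As a quick sanity check, the mapping can be sketched as a small helper (hypothetical, not part of distilabel's API; column order is illustrative):

```python
def output_columns(store_evolutions: bool, generate_answers: bool) -> list[str]:
    """New columns EvolInstruct adds, per the flag combinations above."""
    # store_evolutions switches between the singular and list-valued column.
    cols = ["evolved_instructions" if store_evolutions else "evolved_instruction"]
    if generate_answers:
        cols.append("answers" if store_evolutions else "answer")
    cols.append("model_name")  # always present
    return cols
```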

Examples

Evolve an instruction using an LLM

from distilabel.steps.tasks import EvolInstruct
from distilabel.models import InferenceEndpointsLLM

# Consider this as a placeholder for your actual LLM.
evol_instruct = EvolInstruct(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    num_evolutions=2,
)

evol_instruct.load()

result = next(evol_instruct.process([{"instruction": "common instruction"}]))
# result
# [{'instruction': 'common instruction', 'evolved_instruction': 'evolved instruction', 'model_name': 'model_name'}]

Keep the iterations of the evolutions

from distilabel.steps.tasks import EvolInstruct
from distilabel.models import InferenceEndpointsLLM

# Consider this as a placeholder for your actual LLM.
evol_instruct = EvolInstruct(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    num_evolutions=2,
    store_evolutions=True,
)

evol_instruct.load()

result = next(evol_instruct.process([{"instruction": "common instruction"}]))
# result
# [
#     {
#         'instruction': 'common instruction',
#         'evolved_instructions': ['initial evolution', 'final evolution'],
#         'model_name': 'model_name'
#     }
# ]

Generate answers for the instructions in a single step

from distilabel.steps.tasks import EvolInstruct
from distilabel.models import InferenceEndpointsLLM

# Consider this as a placeholder for your actual LLM.
evol_instruct = EvolInstruct(
    llm=InferenceEndpointsLLM(
        model_id="mistralai/Mistral-7B-Instruct-v0.2",
    ),
    num_evolutions=2,
    generate_answers=True,
)

evol_instruct.load()

result = next(evol_instruct.process([{"instruction": "common instruction"}]))
# result
# [
#     {
#         'instruction': 'common instruction',
#         'evolved_instruction': 'evolved instruction',
#         'answer': 'answer to the instruction',
#         'model_name': 'model_name'
#     }
# ]

References

  • WizardLM: Empowering Large Language Models to Follow Complex Instructions