TextGeneration¶
Text generation with an LLM given a prompt. TextGeneration is a pre-defined task that allows passing a custom prompt using the Jinja2 syntax. By default, an instruction is expected in the inputs, but using the template and columns attributes one can define a custom prompt and the columns expected from the text. This task should be sufficient for tasks that don't need post-processing of the responses generated by the LLM.
Attributes¶
- system_prompt: The system prompt to use in the generation. If not provided, then it will check if the input row has a column named system_prompt and use it. If not, then no system prompt will be used. Defaults to None.
- template: The template to use for the generation. It must follow the Jinja2 template syntax. If not provided, it will assume the text passed is an instruction and construct the appropriate template.
- columns: A string with the column, or a list of columns expected in the template. Take a look at the examples for more information. Defaults to instruction.
- use_system_prompt: DEPRECATED. To be removed in version 1.5.0. Whether to use the system prompt in the generation. Defaults to True, which means that if the column system_prompt is defined within the input batch, then the system_prompt will be used; otherwise, it will be ignored.
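As a sketch of how the template and columns attributes fit together: each name listed in columns becomes a variable available inside the Jinja2 template when the prompt is built. The snippet below renders such a template directly with the jinja2 library to show the resulting prompt text; the document and question names are illustrative, mirroring the custom-template example further down.

```python
from jinja2 import Template

# Each column name passed via `columns` becomes a Jinja2 variable
# that the template can interpolate.
template = Template("Document:\n{{ document }}\n\nQuestion: {{ question }}")

prompt = template.render(
    document="The reef is threatened by warming seas.",
    question="What threatens the reef?",
)
print(prompt)
# Document:
# The reef is threatened by warming seas.
#
# Question: What threatens the reef?
```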
Input & Output Columns¶
graph TD
subgraph Dataset
subgraph Columns
ICOL0[dynamic]
end
subgraph New columns
OCOL0[generation]
OCOL1[model_name]
end
end
subgraph TextGeneration
StepInput[Input Columns: dynamic]
StepOutput[Output Columns: generation, model_name]
end
ICOL0 --> StepInput
StepOutput --> OCOL0
StepOutput --> OCOL1
StepInput --> StepOutput
Inputs¶
- dynamic (determined by the columns attribute): By default will be set to instruction. The columns can point both to a str or a List[str] to be used in the template.
Outputs¶
- generation (str): The generated text.
- model_name (str): The name of the model used to generate the text.
Examples¶
Generate text from an instruction¶
from distilabel.steps.tasks import TextGeneration
from distilabel.models import InferenceEndpointsLLM
# Consider this as a placeholder for your actual LLM.
text_gen = TextGeneration(
llm=InferenceEndpointsLLM(
model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
)
)
text_gen.load()
result = next(
text_gen.process(
[{"instruction": "your instruction"}]
)
)
# result
# [
# {
# 'instruction': 'your instruction',
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct',
# 'generation': 'generation',
# }
# ]
Generate text from a custom template¶
from distilabel.steps.tasks import TextGeneration
from distilabel.models import InferenceEndpointsLLM
CUSTOM_TEMPLATE = '''Document:
{{ document }}
Question: {{ question }}
Please provide a clear and concise answer to the question based on the information in the document and your general knowledge:
'''.rstrip()
text_gen = TextGeneration(
llm=InferenceEndpointsLLM(
model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
),
system_prompt="You are a helpful AI assistant. Your task is to answer the following question based on the provided document. If the answer is not explicitly stated in the document, use your knowledge to provide the most relevant and accurate answer possible. If you cannot answer the question based on the given information, state that clearly.",
template=CUSTOM_TEMPLATE,
columns=["document", "question"],
)
text_gen.load()
result = next(
text_gen.process(
[
{
"document": "The Great Barrier Reef, located off the coast of Australia, is the world's largest coral reef system. It stretches over 2,300 kilometers and is home to a diverse array of marine life, including over 1,500 species of fish. However, in recent years, the reef has faced significant challenges due to climate change, with rising sea temperatures causing coral bleaching events.",
"question": "What is the main threat to the Great Barrier Reef mentioned in the document?"
}
]
)
)
# result
# [
# {
# 'document': 'The Great Barrier Reef, located off the coast of Australia, is the world's largest coral reef system. It stretches over 2,300 kilometers and is home to a diverse array of marine life, including over 1,500 species of fish. However, in recent years, the reef has faced significant challenges due to climate change, with rising sea temperatures causing coral bleaching events.',
# 'question': 'What is the main threat to the Great Barrier Reef mentioned in the document?',
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct',
# 'generation': 'According to the document, the main threat to the Great Barrier Reef is climate change, specifically rising sea temperatures causing coral bleaching events.',
# }
# ]
Few-shot learning with different system prompts¶
from distilabel.steps.tasks import TextGeneration
from distilabel.models import InferenceEndpointsLLM
CUSTOM_TEMPLATE = '''Generate a clear, single-sentence instruction based on the following examples:
{% for example in examples %}
Example {{ loop.index }}:
Instruction: {{ example }}
{% endfor %}
Now, generate a new instruction in a similar style:
'''.rstrip()
text_gen = TextGeneration(
llm=InferenceEndpointsLLM(
model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
),
template=CUSTOM_TEMPLATE,
columns="examples",
)
text_gen.load()
result = next(
text_gen.process(
[
{
"examples": ["This is an example", "Another relevant example"],
"system_prompt": "You are an AI assistant specialised in cybersecurity and computing in general, you make your point clear without any explanations."
}
]
)
)
# result
# [
# {
# 'examples': ['This is an example', 'Another relevant example'],
# 'system_prompt': 'You are an AI assistant specialised in cybersecurity and computing in general, you make your point clear without any explanations.',
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct',
# 'generation': 'Disable the firewall on the router',
# }
# ]