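MergeColumns
Merge columns from a row.
`MergeColumns` is a `Step` that implements the `process` method, which calls the `merge_columns` function to handle and combine the columns in a `StepInput`. `MergeColumns` provides two attributes, `columns` and `output_column`, to specify the columns to merge and the resulting output column.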
This step can be useful if, for example, you have a `Task` that generates instructions and you
want more examples of them. In that case you could use another `Task` to multiply your
instructions synthetically, which would leave the instructions split across two different columns.
With `MergeColumns` you can merge them and use them as a single column in your dataset for
further processing.
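The merge described above can be sketched in plain Python. This is a minimal illustration, not the distilabel implementation: the function name `merge_columns_sketch` and the column names `instruction`, `evolved_instructions` and `instructions` are hypothetical. It also assumes that scalar values are wrapped into a list before concatenation and that the source columns are dropped from the row, which is what the example output on this page suggests.

```python
from typing import Any, Dict, List


def merge_columns_sketch(
    row: Dict[str, Any], columns: List[str], output_column: str
) -> Dict[str, Any]:
    """Illustrative stand-in, not the distilabel implementation."""
    result = dict(row)  # work on a copy so the caller's row is untouched
    merged: List[Any] = []
    for name in columns:
        value = result.pop(name)  # assumption: source columns are dropped
        # Wrap scalars so every value can be concatenated as a list.
        merged.extend(value if isinstance(value, list) else [value])
    result[output_column] = merged
    return result


row = {
    "instruction": "How are you?",
    "evolved_instructions": ["What's up?", "Everything ok?"],
}
merged_row = merge_columns_sketch(
    row,
    columns=["instruction", "evolved_instructions"],
    output_column="instructions",
)
print(merged_row)
```

The original instruction and its synthetic variants end up in one list-valued column, which is the shape a downstream step would consume.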
Attributes
- columns: List of strings with the names of the columns to merge.
- output_column: String name of the output column.
Input & Output Columns
graph TD
    subgraph Dataset
        subgraph Columns
            ICOL0[dynamic]
        end
        subgraph New columns
            OCOL0[dynamic]
        end
    end
    subgraph MergeColumns
        StepInput[Input Columns: dynamic]
        StepOutput[Output Columns: dynamic]
    end
    ICOL0 --> StepInput
    StepOutput --> OCOL0
    StepInput --> StepOutput
Inputs
- dynamic (determined by the `columns` attribute): The columns to merge.
Outputs
- dynamic (determined by the `columns` and `output_column` attributes): The merged columns.
Examples
Combine columns in rows of a dataset
from distilabel.steps import MergeColumns

combiner = MergeColumns(
    columns=["queries", "multiple_queries"],
    output_column="queries",
)
combiner.load()

result = next(
    combiner.process(
        [
            {
                "queries": "How are you?",
                "multiple_queries": ["What's up?", "Everything ok?"]
            }
        ],
    )
)
# >>> result
# [{'queries': ['How are you?', "What's up?", 'Everything ok?']}]
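The example above calls `next(...)` on the value returned by `process`, which suggests that `process` is a generator yielding a list of rows per input batch. The following sketch mimics that call pattern in plain Python so it can be run without distilabel. The name `process_sketch`, the `id` column and the second row are hypothetical, and the merging rule (wrap scalars, concatenate lists, drop the source columns) is an assumption based on the output shown above.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def process_sketch(
    rows: List[Row], columns: List[str], output_column: str
) -> Iterator[List[Row]]:
    """Hypothetical generator that mimics the call pattern of `process`."""
    batch: List[Row] = []
    for row in rows:
        # Keep every column that is not being merged.
        new_row = {k: v for k, v in row.items() if k not in columns}
        merged: List[Any] = []
        for name in columns:
            value = row[name]
            merged.extend(value if isinstance(value, list) else [value])
        new_row[output_column] = merged
        batch.append(new_row)
    yield batch  # one yield per input batch


rows = [
    {"id": 1, "queries": "How are you?", "multiple_queries": ["What's up?"]},
    {"id": 2, "queries": "Where is it?", "multiple_queries": []},
]
result = next(process_sketch(rows, ["queries", "multiple_queries"], "queries"))
print(result)
```

Reusing `queries` as the output column, as the example does, replaces the original scalar column with the merged list, while unrelated columns such as `id` pass through unchanged in this sketch.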