TextClassification

Classifies a text into one or more categories or labels.

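Use this task for text classification problems, where the goal is to assign one or more labels to a given text. By default it uses structured generation, following the reference paper, which can help produce more concise labels. See section 4.1 of the reference.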
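
To make "structured generation" concrete, the sketch below builds a JSON Schema that would constrain the model output to the `{"labels": ...}` shape shown in the examples on this page. It is a minimal illustration, not distilabel's actual implementation: the helper `build_label_schema` is hypothetical, and requiring exactly `n` items for multi-label output is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_label_schema(
    n: int = 1, available_labels: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: JSON Schema for an object with one "labels" field."""
    # A single label is a string, restricted to the known labels when given.
    label: Dict[str, Any] = {"type": "string"}
    if available_labels:
        label["enum"] = list(available_labels)

    if n == 1:
        # Single-label classification: "labels" is one string.
        labels_field: Dict[str, Any] = label
    else:
        # Multi-label classification: "labels" is a list of n strings
        # (exactly n items is an assumption made for this sketch).
        labels_field = {
            "type": "array",
            "items": label,
            "minItems": n,
            "maxItems": n,
        }

    return {
        "type": "object",
        "properties": {"labels": labels_field},
        "required": ["labels"],
    }


single = build_label_schema(n=1, available_labels=["positive", "negative"])
multi = build_label_schema(n=3)
```

A schema like `single` only admits one of the listed labels, while `multi` admits a free-form list of three strings, which mirrors the two output shapes used in the examples.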

Attributes

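system_prompt: The system prompt sent to the model before the task starts. The default message makes the model behave like a classification specialist.
n: Number of labels to generate. With n=1 the task is a single-label classification problem; with n>1 it returns the "n" labels most representative of the text. Defaults to 1.
context: Context to use when generating the labels. Contains a generic message by default, and can be used to customize the context of the task.
examples: List of few-shot examples that help the model understand the task.
available_labels: List of labels to choose from when classifying the text, or a dictionary mapping each label to its description.
default_label: Default label to use when the text is ambiguous or lacks sufficient information for classification. Can be a list when multiple labels are requested (n>1).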
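
The two accepted forms of `available_labels` (a list, or a dictionary with descriptions) can be pictured through how they might be rendered into the prompt. The sketch below reproduces the `available_labels = [...]` layout visible in the raw prompts in the examples on this page. The function `render_available_labels` is a hypothetical helper written for illustration and is not taken from distilabel's source.

```python
from typing import Dict, List, Union


def render_available_labels(
    available_labels: Union[List[str], Dict[str, str]],
) -> str:
    """Hypothetical helper: render labels as a commented Python-style list."""
    lines = ["available_labels = ["]
    if isinstance(available_labels, dict):
        # Dictionary form: each label is followed by its description as a comment.
        for label, description in available_labels.items():
            lines.append(f'    "{label}",  # {description}')
    else:
        # List form: labels only, without descriptions.
        for label in available_labels:
            lines.append(f'    "{label}",')
    lines.append("]")
    return "\n".join(lines)


plain = render_available_labels(["positive", "negative"])
described = render_available_labels(
    {"inquiry": "A question or request for information."}
)
```

Passing descriptions gives the model more to work with than bare label names, at the cost of a longer prompt.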

Input & Output Columns

graph TD
    subgraph Dataset
        subgraph Columns
            ICOL0[text]
        end
        subgraph New columns
            OCOL0[labels]
            OCOL1[model_name]
        end
    end

    subgraph TextClassification
        StepInput[Input Columns: text]
        StepOutput[Output Columns: labels, model_name]
    end

    ICOL0 --> StepInput
    StepOutput --> OCOL0
    StepOutput --> OCOL1
    StepInput --> StepOutput

Inputs

text (str): The reference text to obtain labels for.

Outputs

labels (Union[str, List[str]]): The label or list of labels for the text.
model_name (str): The name of the model used to generate the labels.
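
Because `labels` is either a string or a list of strings, downstream code usually has to handle both shapes. The sketch below parses a raw JSON output like the `raw_output_text_classification_0` values shown in the examples and falls back to a default label when parsing fails. `parse_labels` is a hypothetical helper, not part of distilabel, and using "Unclassified" as the fallback is an assumption borrowed from the wording of the prompts on this page.

```python
import json
from typing import List, Union

Labels = Union[str, List[str]]


def parse_labels(raw_output: str, default_label: Labels = "Unclassified") -> Labels:
    """Hypothetical helper: extract "labels" from a raw JSON model output."""
    try:
        parsed = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string at all): use the fallback.
        return default_label
    if not isinstance(parsed, dict):
        return default_label

    labels = parsed.get("labels")
    if isinstance(labels, str):
        # Single-label case (n=1).
        return labels
    if isinstance(labels, list) and all(isinstance(item, str) for item in labels):
        # Multi-label case (n>1).
        return labels
    return default_label


single = parse_labels('{\n    "labels": "positive"\n}')
multi = parse_labels('{"labels": ["Historical Researcher", "Cultural Specialist"]}')
fallback = parse_labels("not json")
```

Here `single` is a plain string, `multi` is a list, and `fallback` takes the default because the input is not valid JSON.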

Examples

Assigning a sentiment to a text

from distilabel.steps.tasks import TextClassification
from distilabel.models import InferenceEndpointsLLM

llm = InferenceEndpointsLLM(
    model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
    tokenizer_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
)

text_classification = TextClassification(
    llm=llm,
    context="You are an AI system specialized in assigning sentiment to movies.",
    available_labels=["positive", "negative"],
)

text_classification.load()

result = next(
    text_classification.process(
        [{"text": "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."}]
    )
)
# result
# [{'text': 'This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three.',
# 'labels': 'positive',
# 'distilabel_metadata': {'raw_output_text_classification_0': '{\n    "labels": "positive"\n}',
# 'raw_input_text_classification_0': [{'role': 'system',
#     'content': 'You are an AI system specialized in generating labels to classify pieces of text. Your sole purpose is to analyze the given text and provide appropriate classification labels.'},
#     {'role': 'user',
#     'content': '# Instruction\nPlease classify the user query by assigning the most appropriate labels.\nDo not explain your reasoning or provide any additional commentary.\nIf the text is ambiguous or lacks sufficient information for classification, respond with "Unclassified".\nProvide the label that best describes the text.\nYou are an AI system specialized in assigning sentiment to movie the user queries.\n## Labeling the user input\nUse the available labels to classify the user query. Analyze the context of each label specifically:\navailable_labels = [\n    "positive",  # The text shows positive sentiment\n    "negative",  # The text shows negative sentiment\n]\n\n\n## User Query\n```\nThis was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three.\n```\n\n## Output Format\nNow, please give me the labels in JSON format, do not include any other text in your response:\n```\n{\n    "labels": "label"\n}\n```'}]},
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]

Assigning predefined labels with specified descriptions

from distilabel.steps.tasks import TextClassification

# `llm` is the InferenceEndpointsLLM instance created in the first example.

text_classification = TextClassification(
    llm=llm,
    n=1,
    context="Determine the intent of the text.",
    available_labels={
        "complaint": "A statement expressing dissatisfaction or annoyance about a product, service, or experience. It's a negative expression of discontent, often with the intention of seeking a resolution or compensation.",
        "inquiry": "A question or request for information about a product, service, or situation. It's a neutral or curious expression seeking clarification or details.",
        "feedback": "A statement providing evaluation, opinion, or suggestion about a product, service, or experience. It can be positive, negative, or neutral, and is often intended to help improve or inform.",
        "praise": "A statement expressing admiration, approval, or appreciation for a product, service, or experience. It's a positive expression of satisfaction or delight, often with the intention of encouraging or recommending."
    },
    query_title="Customer Query",
)

text_classification.load()

result = next(
    text_classification.process(
        [{"text": "Can you tell me more about your return policy?"}]
    )
)
# result
# [{'text': 'Can you tell me more about your return policy?',
# 'labels': 'inquiry',
# 'distilabel_metadata': {'raw_output_text_classification_0': '{\n    "labels": "inquiry"\n}',
# 'raw_input_text_classification_0': [{'role': 'system',
#     'content': 'You are an AI system specialized in generating labels to classify pieces of text. Your sole purpose is to analyze the given text and provide appropriate classification labels.'},
#     {'role': 'user',
#     'content': '# Instruction\nPlease classify the customer query by assigning the most appropriate labels.\nDo not explain your reasoning or provide any additional commentary.\nIf the text is ambiguous or lacks sufficient information for classification, respond with "Unclassified".\nProvide the label that best describes the text.\nDetermine the intent of the text.\n## Labeling the user input\nUse the available labels to classify the user query. Analyze the context of each label specifically:\navailable_labels = [\n    "complaint",  # A statement expressing dissatisfaction or annoyance about a product, service, or experience. It\'s a negative expression of discontent, often with the intention of seeking a resolution or compensation.\n    "inquiry",  # A question or request for information about a product, service, or situation. It\'s a neutral or curious expression seeking clarification or details.\n    "feedback",  # A statement providing evaluation, opinion, or suggestion about a product, service, or experience. It can be positive, negative, or neutral, and is often intended to help improve or inform.\n    "praise",  # A statement expressing admiration, approval, or appreciation for a product, service, or experience. It\'s a positive expression of satisfaction or delight, often with the intention of encouraging or recommending.\n]\n\n\n## Customer Query\n```\nCan you tell me more about your return policy?\n```\n\n## Output Format\nNow, please give me the labels in JSON format, do not include any other text in your response:\n```\n{\n    "labels": "label"\n}\n```'}]},
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]

Free multi-label classification without predefined labels

from distilabel.steps.tasks import TextClassification

# `llm` is the InferenceEndpointsLLM instance created in the first example.

text_classification = TextClassification(
    llm=llm,
    n=3,
    context=(
        "Describe the main themes, topics, or categories that could describe the "
        "following type of persona."
    ),
    query_title="Example of Persona",
)

text_classification.load()

result = next(
    text_classification.process(
        [{"text": "A historian or curator of Mexican-American history and culture focused on the cultural, social, and historical impact of the Mexican presence in the United States."}]
    )
)
# result
# [{'text': 'A historian or curator of Mexican-American history and culture focused on the cultural, social, and historical impact of the Mexican presence in the United States.',
# 'labels': ['Historical Researcher',
# 'Cultural Specialist',
# 'Ethnic Studies Expert'],
# 'distilabel_metadata': {'raw_output_text_classification_0': '{\n    "labels": ["Historical Researcher", "Cultural Specialist", "Ethnic Studies Expert"]\n}',
# 'raw_input_text_classification_0': [{'role': 'system',
#     'content': 'You are an AI system specialized in generating labels to classify pieces of text. Your sole purpose is to analyze the given text and provide appropriate classification labels.'},
#     {'role': 'user',
#     'content': '# Instruction\nPlease classify the example of persona by assigning the most appropriate labels.\nDo not explain your reasoning or provide any additional commentary.\nIf the text is ambiguous or lacks sufficient information for classification, respond with "Unclassified".\nProvide a list of 3 labels that best describe the text.\nDescribe the main themes, topics, or categories that could describe the following type of persona.\nUse clear, widely understood terms for labels.Avoid overly specific or obscure labels unless the text demands it.\n\n\n## Example of Persona\n```\nA historian or curator of Mexican-American history and culture focused on the cultural, social, and historical impact of the Mexican presence in the United States.\n```\n\n## Output Format\nNow, please give me the labels in JSON format, do not include any other text in your response:\n```\n{\n    "labels": ["label_0", "label_1", "label_2"]\n}\n```'}]},
# 'model_name': 'meta-llama/Meta-Llama-3.1-70B-Instruct'}]

References

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models