类型¶

本节包含 distilabel 代码库中使用的不同类型。

`base` ¶

`ChatType = List[ChatItem]` `module-attribute` ¶

ChatType 是 dict 列表的类型别名，遵循 OpenAI 对话格式。

`ImageUrl` ¶

基类: TypedDict

源代码在 src/distilabel/typing/base.py

class ImageUrl(TypedDict):
    url: Required[str]
    """Either a URL of the image or the base64 encoded image data."""

`url` `instance-attribute` ¶

图像的 URL 或 base64 编码的图像数据。

`ImageContent` ¶

基类: TypedDict

用户在对话中可以包含文本或图像的消息的类型别名。它是视觉语言模型的标准类型：https://platform.openai.com/docs/guides/vision

源代码在 src/distilabel/typing/base.py

class ImageContent(TypedDict, total=False):
    """Type alias for the user's message in a conversation that can include text or an image.
    It's the standard type for vision language models:
    https://platform.openai.com/docs/guides/vision
    """

    type: Required[Literal["image_url"]]
    image_url: Required[ImageUrl]

`steps` ¶

`StepOutput = Iterator[List[Dict[str, Any]]]` `module-attribute` ¶

StepOutput 是类型 Iterator[List[Dict[str, Any]]] 的别名

`GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]` `module-attribute` ¶

GeneratorStepOutput 是类型 Iterator[Tuple[List[Dict[str, Any]], bool]] 的别名

`StepColumns = Union[List[str], Dict[str, bool]]` `module-attribute` ¶

StepColumns 是类型 Union[List[str], Dict[str, bool]] 的别名，由 Step 的 inputs 和 outputs 属性使用。对于 List[str] 的情况，它是一个包含所需列的列表。对于 Dict[str, bool] 的情况，它是一个字典，其中键是列，值是布尔值，指示列是否是必需的。

`models` ¶

`LLMLogprobs = List[List[List[Logprob]]]` `module-attribute` ¶

一种类型别名，表示 LLM 输出的概率分布。

结构

最外层列表：包含采样时的多个生成选择（n 个序列）
中间列表：表示生成序列中的每个位置
最内层列表：包含词汇表中每个 token 在该位置的对数概率

`LLMStatistics = Union[TokenCount, Dict[str, Any]]` `module-attribute` ¶

最初，LLMStatistics 将包含 token 计数，但可以有更多变量。一旦我们为每个 LLM 定义了它们，就可以添加它们。

`StructuredOutputType = Union[OutlinesStructuredOutputType, InstructorStructuredOutputType]` `module-attribute` ¶

StructuredOutputType 是 OutlinesStructuredOutputType 和 InstructorStructuredOutputType 的联合类型的别名。

`StandardInput = ChatType` `module-attribute` ¶

StandardInput 是 ChatType 的别名，它定义了 format_input 生成的默认/标准输入。

`StructuredInput = Tuple[StandardInput, Union[StructuredOutputType, None]]` `module-attribute` ¶

StructuredInput 定义了当使用 StructuredGeneration 或其子类时，format_input 生成的类型。

`FormattedInput = Union[StandardInput, StructuredInput, str]` `module-attribute` ¶

FormattedInput 是 StandardInput 和 StructuredInput 的联合类型的别名，由 format_input 生成并由 LLM 期望，以及视觉语言模型的 ConversationType。

`OutlinesStructuredOutputType` ¶

基类: TypedDict

TypedDict 用于表示来自 outlines 的结构化输出配置。

源代码在 src/distilabel/typing/models.py

class OutlinesStructuredOutputType(TypedDict, total=False):
    """TypedDict to represent the structured output configuration from `outlines`."""

    format: Literal["json", "regex"]
    """One of "json" or "regex"."""
    schema: Union[str, Type[BaseModel], Dict[str, Any]]
    """The schema to use for the structured output. If "json", it
    can be a pydantic.BaseModel class, or the schema as a string,
    as obtained from `model_to_schema(BaseModel)`, if "regex", it
    should be a regex pattern as a string.
    """
    whitespace_pattern: Optional[Union[str, List[str]]]
    """If "json" corresponds to a string or a list of
    strings with a pattern (doesn't impact string literals).
    For example, to allow only a single space or newline with
    `whitespace_pattern=r"[\n ]?"`
    """

`format` `instance-attribute` ¶

“json” 或 “regex” 之一。

`schema` `instance-attribute` ¶

用于结构化输出的 schema。如果为 “json”，则可以是 pydantic.BaseModel 类，或者作为字符串的 schema，从 model_to_schema(BaseModel) 获取；如果为 “regex”，则应为作为字符串的正则表达式模式。

`whitespace_pattern` `instance-attribute` ¶

如果 “json” 对应于带有模式的字符串或字符串列表（不影响字符串字面量）。例如，要仅允许单个空格或换行符，使用 whitespace_pattern=r"[ ]?"

`InstructorStructuredOutputType` ¶

基类: TypedDict

TypedDict 用于表示来自 instructor 的结构化输出配置。

源代码在 src/distilabel/typing/models.py

class InstructorStructuredOutputType(TypedDict, total=False):
    """TypedDict to represent the structured output configuration from `instructor`."""

    format: Optional[Literal["json"]]
    """One of "json"."""
    schema: Union[Type[BaseModel], Dict[str, Any]]
    """The schema to use for the structured output, a `pydantic.BaseModel` class. """
    mode: Optional[str]
    """Generation mode. Take a look at `instructor.Mode` for more information, if not informed it will
    be determined automatically. """
    max_retries: int
    """Number of times to reask the model in case of error, if not set will default to the model's default. """

`format` `instance-attribute` ¶

“json” 之一。

`schema` `instance-attribute` ¶

用于结构化输出的 schema，一个 pydantic.BaseModel 类。

`mode` `instance-attribute` ¶

生成模式。查看 instructor.Mode 以获取更多信息，如果未告知，将自动确定。

`max_retries` `instance-attribute` ¶

在发生错误时重新询问模型的次数，如果未设置，将默认为模型的默认值。

`pipeline` ¶

`DownstreamConnectable = Union['Step', 'GlobalStep']` `module-attribute` ¶

可以作为下游 steps 连接的 Step 类型的别名。

`UpstreamConnectableSteps = TypeVar('UpstreamConnectableSteps', bound=Union['Step', 'GlobalStep', 'GeneratorStep'])` `module-attribute` ¶

可以作为上游 steps 连接的 Step 类型的类型。

`DownstreamConnectableSteps = TypeVar('DownstreamConnectableSteps', bound=DownstreamConnectable, covariant=True)` `module-attribute` ¶

可以作为下游 steps 连接的 Step 类型的类型。

`PipelineRuntimeParametersInfo = Dict[str, Union[List['RuntimeParameterInfo'], Dict[str, 'RuntimeParameterInfo']]]` `module-attribute` ¶

Pipeline 的运行时参数信息的别名。

`InputDataset = Union['Dataset', 'pd.DataFrame', List[Dict[str, str]]]` `module-attribute` ¶

我们可以作为输入数据集处理的类型的别名。

`LoadGroups = Union[List[List[Any]], Literal['sequential_step_execution']]` `module-attribute` ¶

可以用作加载组的类型的别名。

如果 List[List[Any]]，则它是一个列表，其中包含必须隔离加载的 steps 列表。
如果 “sequential_step_execution”，则每个 step 将在不同的阶段加载，即一次只执行一个 step。

`StepLoadStatus` ¶

基类: TypedDict

包含有关是否加载/卸载一个 step 或其加载是否失败的信息的字典

源代码在 src/distilabel/typing/pipeline.py

class StepLoadStatus(TypedDict):
    """Dict containing information about if one step was loaded/unloaded or if it's load
    failed"""

    name: str
    status: Literal["loaded", "unloaded", "load_failed"]

类型¶

base ¶

ChatType = List[ChatItem] module-attribute ¶

ImageUrl ¶

url instance-attribute ¶

ImageContent ¶

steps ¶

StepOutput = Iterator[List[Dict[str, Any]]] module-attribute ¶

GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]] module-attribute ¶

StepColumns = Union[List[str], Dict[str, bool]] module-attribute ¶

models ¶

LLMLogprobs = List[List[List[Logprob]]] module-attribute ¶

LLMStatistics = Union[TokenCount, Dict[str, Any]] module-attribute ¶

StructuredOutputType = Union[OutlinesStructuredOutputType, InstructorStructuredOutputType] module-attribute ¶

StandardInput = ChatType module-attribute ¶

StructuredInput = Tuple[StandardInput, Union[StructuredOutputType, None]] module-attribute ¶

FormattedInput = Union[StandardInput, StructuredInput, str] module-attribute ¶

OutlinesStructuredOutputType ¶

format instance-attribute ¶

schema instance-attribute ¶

whitespace_pattern instance-attribute ¶

InstructorStructuredOutputType ¶

format instance-attribute ¶

schema instance-attribute ¶

mode instance-attribute ¶

max_retries instance-attribute ¶

pipeline ¶

DownstreamConnectable = Union['Step', 'GlobalStep'] module-attribute ¶

UpstreamConnectableSteps = TypeVar('UpstreamConnectableSteps', bound=Union['Step', 'GlobalStep', 'GeneratorStep']) module-attribute ¶

DownstreamConnectableSteps = TypeVar('DownstreamConnectableSteps', bound=DownstreamConnectable, covariant=True) module-attribute ¶

PipelineRuntimeParametersInfo = Dict[str, Union[List['RuntimeParameterInfo'], Dict[str, 'RuntimeParameterInfo']]] module-attribute ¶

InputDataset = Union['Dataset', 'pd.DataFrame', List[Dict[str, str]]] module-attribute ¶

LoadGroups = Union[List[List[Any]], Literal['sequential_step_execution']] module-attribute ¶

StepLoadStatus ¶

`base` ¶

`ChatType = List[ChatItem]` `module-attribute` ¶

`ImageUrl` ¶

`url` `instance-attribute` ¶

`ImageContent` ¶

`steps` ¶

`StepOutput = Iterator[List[Dict[str, Any]]]` `module-attribute` ¶

`GeneratorStepOutput = Iterator[Tuple[List[Dict[str, Any]], bool]]` `module-attribute` ¶

`StepColumns = Union[List[str], Dict[str, bool]]` `module-attribute` ¶

`models` ¶

`LLMLogprobs = List[List[List[Logprob]]]` `module-attribute` ¶

`LLMStatistics = Union[TokenCount, Dict[str, Any]]` `module-attribute` ¶

`StructuredOutputType = Union[OutlinesStructuredOutputType, InstructorStructuredOutputType]` `module-attribute` ¶

`StandardInput = ChatType` `module-attribute` ¶

`StructuredInput = Tuple[StandardInput, Union[StructuredOutputType, None]]` `module-attribute` ¶

`FormattedInput = Union[StandardInput, StructuredInput, str]` `module-attribute` ¶

`OutlinesStructuredOutputType` ¶

`format` `instance-attribute` ¶

`schema` `instance-attribute` ¶

`whitespace_pattern` `instance-attribute` ¶

`InstructorStructuredOutputType` ¶

`format` `instance-attribute` ¶

`schema` `instance-attribute` ¶

`mode` `instance-attribute` ¶

`max_retries` `instance-attribute` ¶

`pipeline` ¶

`DownstreamConnectable = Union['Step', 'GlobalStep']` `module-attribute` ¶

`UpstreamConnectableSteps = TypeVar('UpstreamConnectableSteps', bound=Union['Step', 'GlobalStep', 'GeneratorStep'])` `module-attribute` ¶

`DownstreamConnectableSteps = TypeVar('DownstreamConnectableSteps', bound=DownstreamConnectable, covariant=True)` `module-attribute` ¶

`PipelineRuntimeParametersInfo = Dict[str, Union[List['RuntimeParameterInfo'], Dict[str, 'RuntimeParameterInfo']]]` `module-attribute` ¶

`InputDataset = Union['Dataset', 'pd.DataFrame', List[Dict[str, str]]]` `module-attribute` ¶

`LoadGroups = Union[List[List[Any]], Literal['sequential_step_execution']]` `module-attribute` ¶

`StepLoadStatus` ¶