Embedding Gallery

This section contains the existing Embeddings subclasses implemented in distilabel.

embeddings

LlamaCppEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

LlamaCpp library implementation for embedding generation.

Attributes

  • model_name (str): contains the name of the GGUF quantized model, compatible with the installed version of the llama.cpp Python bindings.
  • model_path (RuntimeParameter[str]): contains the path to the GGUF quantized model, compatible with the installed version of the llama.cpp Python bindings.
  • repo_id (RuntimeParameter[str]): the Hugging Face Hub repository id.
  • verbose (RuntimeParameter[bool]): whether to print verbose output. Defaults to False.
  • n_gpu_layers (RuntimeParameter[int]): the number of layers to run on the GPU. Defaults to -1 (use the GPU if available).
  • disable_cuda_device_placement (RuntimeParameter[bool]): whether to disable CUDA device placement. Defaults to True.
  • normalize_embeddings (RuntimeParameter[bool]): whether to normalize the embeddings. Defaults to False.
  • seed (int): RNG seed, -1 for random.
  • n_ctx (int): text context size, 0 = from model.
  • n_batch (int): maximum batch size for prompt processing.
  • extra_kwargs (Optional[RuntimeParameter[Dict[str, Any]]]): additional dictionary of keyword arguments that will be passed to the Llama class of the llama_cpp library. Defaults to {}.

Runtime parameters
  • n_gpu_layers: the number of layers to use for the GPU. Defaults to -1.
  • verbose: whether to print verbose output. Defaults to False.
  • normalize_embeddings: whether to normalize the embeddings. Defaults to False.
  • extra_kwargs: additional dictionary of keyword arguments that will be passed to the Llama class of the llama_cpp library. Defaults to {} (see the sketch after this list).
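
A minimal sketch of setting these runtime parameters at instantiation (assuming the same GGUF model as in the examples below; the n_threads key is an illustrative llama_cpp.Llama argument forwarded through extra_kwargs, not a distilabel parameter):

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(
    model="all-MiniLM-L6-v2-Q2_K.gguf",
    model_path=str(Path.home() / "Downloads/"),
    n_gpu_layers=-1,                # use the GPU if available
    verbose=False,
    normalize_embeddings=True,      # return unit-length vectors
    extra_kwargs={"n_threads": 4},  # illustrative llama_cpp.Llama option
)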
References
  • Offline inference embeddings: https://llama-cpp-python.readthedocs.io/en/stable/#embeddings

Examples

Generate sentence embeddings using a local model:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()

Generate sentence embeddings using a Hugging Face Hub model:

from distilabel.models.embeddings import LlamaCppEmbeddings
# You need to set an environment variable to download a private model to the local machine

repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]

Generate sentence embeddings on CPU:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
    n_gpu_layers=0,
    disable_cuda_device_placement=True,
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/llamacpp.py
class LlamaCppEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`LlamaCpp` library implementation for embedding generation.

    Attributes:
        model_name: contains the name of the GGUF quantized model, compatible with the
            installed version of the `llama.cpp` Python bindings.
        model_path: contains the path to the GGUF quantized model, compatible with the
            installed version of the `llama.cpp` Python bindings.
        repo_id: the Hugging Face Hub repository id.
        verbose: whether to print verbose output. Defaults to `False`.
        n_gpu_layers: number of layers to run on the GPU. Defaults to `-1` (use the GPU if available).
        disable_cuda_device_placement: whether to disable CUDA device placement. Defaults to `True`.
        normalize_embeddings: whether to normalize the embeddings. Defaults to `False`.
        seed: RNG seed, -1 for random
        n_ctx: Text context, 0 = from model
        n_batch: Prompt processing maximum batch size
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    Runtime parameters:
        - `n_gpu_layers`: the number of layers to use for the GPU. Defaults to `-1`.
        - `verbose`: whether to print verbose output. Defaults to `False`.
        - `normalize_embeddings`: whether to normalize the embeddings. Defaults to `False`.
        - `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    References:
        - [Offline inference embeddings](https://llama-cpp-python.readthedocs.io/en/stable/#embeddings)

    Examples:
        Generate sentence embeddings using a local model:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # You can follow along with this example by downloading the model with the
        # following terminal command, which saves it to the `Downloads` folder:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        ```

        Generate sentence embeddings using a HuggingFace Hub model:

        ```python
        from distilabel.models.embeddings import LlamaCppEmbeddings
        # You need to set an environment variable to download a private model to the local machine

        repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```

        Generate sentence embeddings on CPU:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # You can follow along with this example by downloading the model with the
        # following terminal command, which saves it to the `Downloads` folder:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
            n_gpu_layers=0,
            disable_cuda_device_placement=True,
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```


    """

    model: str = Field(
        description="The name of the model to use for embeddings.",
    )

    model_path: RuntimeParameter[str] = Field(
        default=None,
        description="The path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.",
    )

    repo_id: RuntimeParameter[str] = Field(
        default=None, description="The Hugging Face Hub repository id.", exclude=True
    )

    n_gpu_layers: RuntimeParameter[int] = Field(
        default=-1,
        description="The number of layers that will be loaded in the GPU.",
    )

    n_ctx: int = 512
    n_batch: int = 512
    seed: int = 4294967295

    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to normalize the embeddings.",
    )
    verbose: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to print verbose output from llama.cpp library.",
    )
    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `Llama` class of `llama_cpp` library. See all the supported arguments at: "
        "https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__init__",
    )
    _model: Optional["Llama"] = PrivateAttr(...)

    def load(self) -> None:
        """Loads the `gguf` model using either the path or the Hugging Face Hub repository id."""
        super().load()
        CudaDevicePlacementMixin.load(self)

        try:
            from llama_cpp import Llama
        except ImportError as ie:
            raise ImportError(
                "`llama-cpp-python` package is not installed. Please install it using"
                " `pip install 'distilabel[llama-cpp]'`."
            ) from ie

        if self.repo_id is not None:
            # use repo_id to download the model
            from huggingface_hub.utils import validate_repo_id

            validate_repo_id(self.repo_id)
            self._model = Llama.from_pretrained(
                repo_id=self.repo_id,
                filename=self.model,
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **self.extra_kwargs,  # unpack so each option reaches `Llama`
            )
        elif self.model_path is not None:
            self._model = Llama(
                model_path=str(Path(self.model_path) / self.model),
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **self.extra_kwargs,  # unpack so each option reaches `Llama`
            )
        else:
            raise ValueError("Either 'model_path' or 'repo_id' must be provided")

    def unload(self) -> None:
        """Unloads the `gguf` model."""
        CudaDevicePlacementMixin.unload(self)
        self._model.close()
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.embed(inputs, normalize=self.normalize_embeddings)
model_name property

Returns the name of the model.

load()

Loads the gguf model using either the path or the Hugging Face Hub repository id.

Source code in src/distilabel/models/embeddings/llamacpp.py
def load(self) -> None:
    """Loads the `gguf` model using either the path or the Hugging Face Hub repository id."""
    super().load()
    CudaDevicePlacementMixin.load(self)

    try:
        from llama_cpp import Llama
    except ImportError as ie:
        raise ImportError(
            "`llama-cpp-python` package is not installed. Please install it using"
            " `pip install 'distilabel[llama-cpp]'`."
        ) from ie

    if self.repo_id is not None:
        # use repo_id to download the model
        from huggingface_hub.utils import validate_repo_id

        validate_repo_id(self.repo_id)
        self._model = Llama.from_pretrained(
            repo_id=self.repo_id,
            filename=self.model,
            n_gpu_layers=self.n_gpu_layers,
            seed=self.seed,
            n_ctx=self.n_ctx,
            n_batch=self.n_batch,
            verbose=self.verbose,
            embedding=True,
            **self.extra_kwargs,  # unpack so each option reaches `Llama`
        )
    elif self.model_path is not None:
        self._model = Llama(
            model_path=str(Path(self.model_path) / self.model),
            n_gpu_layers=self.n_gpu_layers,
            seed=self.seed,
            n_ctx=self.n_ctx,
            n_batch=self.n_batch,
            verbose=self.verbose,
            embedding=True,
            **self.extra_kwargs,  # unpack so each option reaches `Llama`
        )
    else:
        raise ValueError("Either 'model_path' or 'repo_id' must be provided")
unload()

Unloads the gguf model.

Source code in src/distilabel/models/embeddings/llamacpp.py
def unload(self) -> None:
    """Unloads the `gguf` model."""
    CudaDevicePlacementMixin.unload(self)
    self._model.close()
    super().unload()
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/llamacpp.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return self._model.embed(inputs, normalize=self.normalize_embeddings)
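
Calling encode on a loaded instance returns one embedding per input; a brief usage sketch (the printed dimensionality is model-dependent, e.g. 384 for the all-MiniLM-L6-v2 GGUF used in the examples above):

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(len(results), len(results[0]))
# 2 384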

SentenceTransformerEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

sentence-transformers library implementation for embedding generation.

Attributes

  • model (str): the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
  • device (Optional[RuntimeParameter[str]]): the name of the device used to load the model, e.g. "cuda", "mps", etc. Defaults to None.
  • prompts (Optional[Dict[str, str]]): a dictionary containing prompts to be used with the model (see the sketch after this list). Defaults to None.
  • default_prompt_name (Optional[str]): the default prompt (in prompts) that will be applied to the inputs. If not provided, no prompt is used. Defaults to None.
  • trust_remote_code (bool): whether to allow fetching and executing remote code fetched from the repository in the Hub. Defaults to False.
  • revision (Optional[str]): if model refers to a Hugging Face Hub repository, the revision (e.g. a branch name or a commit id) to use. Defaults to "main".
  • token (Optional[str]): the Hugging Face Hub token that will be used to authenticate to the Hugging Face Hub. If not provided, the HF_TOKEN environment variable or the local configuration of the huggingface_hub package will be used. Defaults to None.
  • truncate_dim (Optional[int]): the dimension to truncate the sentence embeddings to. Defaults to None.
  • model_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers model class. Defaults to None.
  • tokenizer_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers tokenizer class. Defaults to None.
  • config_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers configuration class. Defaults to None.
  • precision (Optional[Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']]): the dtype of the resulting embeddings. Defaults to "float32".
  • normalize_embeddings (RuntimeParameter[bool]): whether to normalize the embeddings so they have a length of 1. Defaults to True.
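
As a sketch of how the prompt-related attributes fit together (the prompt text and the "query" name are illustrative values chosen for this example, not fixed by the API):

from distilabel.models import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(
    model="mixedbread-ai/mxbai-embed-large-v1",
    # Illustrative named prompt; any mapping of names to prompt strings works.
    prompts={"query": "Represent this sentence for searching relevant passages: "},
    default_prompt_name="query",  # applied to every input unless overridden
    truncate_dim=512,             # keep only the first 512 dimensions
    normalize_embeddings=True,
)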

Examples

Generating sentence embeddings:

from distilabel.models import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/sentence_transformers.py
class SentenceTransformerEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`sentence-transformers` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        device: the name of the device used to load the model e.g. "cuda", "mps", etc.
            Defaults to `None`.
        prompts: a dictionary containing prompts to be used with the model. Defaults to
            `None`.
        default_prompt_name: the default prompt (in `prompts`) that will be applied to the
            inputs. If not provided, then no prompt will be used. Defaults to `None`.
        trust_remote_code: whether to allow fetching and executing remote code fetched
            from the repository in the Hub. Defaults to `False`.
        revision: if `model` refers to a Hugging Face Hub repository, then the revision
            (e.g. a branch name or a commit id) to use. Defaults to `"main"`.
        token: the Hugging Face Hub token that will be used to authenticate to the Hugging
            Face Hub. If not provided, the `HF_TOKEN` environment or `huggingface_hub` package
            local configuration will be used. Defaults to `None`.
        truncate_dim: the dimension to truncate the sentence embeddings. Defaults to `None`.
        model_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            model class. Defaults to `None`.
        tokenizer_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            tokenizer class. Defaults to `None`.
        config_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            configuration class. Defaults to `None`.
        precision: the dtype that will have the resulting embeddings. Defaults to `"float32"`.
        normalize_embeddings: whether to normalize the embeddings so they have a length
            of 1. Defaults to `True`.

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import SentenceTransformerEmbeddings

        embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    device: Optional[RuntimeParameter[str]] = Field(
        default=None,
        description="The device to be used to load the model. If `None`, then it"
        " will check if a GPU can be used.",
    )
    prompts: Optional[Dict[str, str]] = None
    default_prompt_name: Optional[str] = None
    trust_remote_code: bool = False
    revision: Optional[str] = None
    token: Optional[str] = None
    truncate_dim: Optional[int] = None
    model_kwargs: Optional[Dict[str, Any]] = None
    tokenizer_kwargs: Optional[Dict[str, Any]] = None
    config_kwargs: Optional[Dict[str, Any]] = None
    precision: Optional[Literal["float32", "int8", "uint8", "binary", "ubinary"]] = (
        "float32"
    )
    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=True,
        description="Whether to normalize the embeddings so the generated vectors"
        " have a length of 1 or not.",
    )

    _model: Union["SentenceTransformer", None] = PrivateAttr(None)

    def load(self) -> None:
        """Loads the Sentence Transformer model"""
        super().load()

        if self.device == "cuda":
            CudaDevicePlacementMixin.load(self)

        try:
            from sentence_transformers import SentenceTransformer
        except ImportError as e:
            raise ImportError(
                "`sentence-transformers` package is not installed. Please install it using"
                " `pip install 'distilabel[sentence-transformers]'`."
            ) from e

        self._model = SentenceTransformer(
            model_name_or_path=self.model,
            device=self.device,
            prompts=self.prompts,
            default_prompt_name=self.default_prompt_name,
            trust_remote_code=self.trust_remote_code,
            revision=self.revision,
            token=self.token,
            truncate_dim=self.truncate_dim,
            model_kwargs=self.model_kwargs,
            tokenizer_kwargs=self.tokenizer_kwargs,
            config_kwargs=self.config_kwargs,
        )

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.encode(  # type: ignore
            sentences=inputs,
            batch_size=len(inputs),
            convert_to_numpy=True,
            precision=self.precision,  # type: ignore
            normalize_embeddings=self.normalize_embeddings,  # type: ignore
        ).tolist()  # type: ignore

    def unload(self) -> None:
        del self._model
        if self.device == "cuda":
            CudaDevicePlacementMixin.unload(self)
        super().unload()
model_name property

Returns the name of the model.

load()

Loads the Sentence Transformer model.

Source code in src/distilabel/models/embeddings/sentence_transformers.py
def load(self) -> None:
    """Loads the Sentence Transformer model"""
    super().load()

    if self.device == "cuda":
        CudaDevicePlacementMixin.load(self)

    try:
        from sentence_transformers import SentenceTransformer
    except ImportError as e:
        raise ImportError(
            "`sentence-transformers` package is not installed. Please install it using"
            " `pip install 'distilabel[sentence-transformers]'`."
        ) from e

    self._model = SentenceTransformer(
        model_name_or_path=self.model,
        device=self.device,
        prompts=self.prompts,
        default_prompt_name=self.default_prompt_name,
        trust_remote_code=self.trust_remote_code,
        revision=self.revision,
        token=self.token,
        truncate_dim=self.truncate_dim,
        model_kwargs=self.model_kwargs,
        tokenizer_kwargs=self.tokenizer_kwargs,
        config_kwargs=self.config_kwargs,
    )
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/sentence_transformers.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return self._model.encode(  # type: ignore
        sentences=inputs,
        batch_size=len(inputs),
        convert_to_numpy=True,
        precision=self.precision,  # type: ignore
        normalize_embeddings=self.normalize_embeddings,  # type: ignore
    ).tolist()  # type: ignore
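
Note that encode passes batch_size=len(inputs), so the whole input list is embedded in a single batch; for large lists you may want to chunk the inputs yourself. A minimal sketch with a hypothetical helper:

def encode_chunked(embeddings, texts, chunk_size=32):
    # Hypothetical helper: embed `texts` in fixed-size chunks to bound memory use.
    results = []
    for i in range(0, len(texts), chunk_size):
        results.extend(embeddings.encode(inputs=texts[i : i + chunk_size]))
    return results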

vLLMEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

vllm library implementation for embedding generation.

Attributes

  • model (str): the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
  • dtype (str): the data type to use for the model. Defaults to auto.
  • trust_remote_code (bool): whether to trust the remote code when loading the model. Defaults to False.
  • quantization (Optional[str]): the quantization mode to use for the model. Defaults to None.
  • revision (Optional[str]): the revision of the model to load. Defaults to None.
  • enforce_eager (bool): whether to enforce eager execution. Defaults to True.
  • seed (int): the seed to use for the random number generator. Defaults to 0.
  • extra_kwargs (Optional[RuntimeParameter[Dict[str, Any]]]): additional dictionary of keyword arguments that will be passed to the LLM class of the vllm library. Defaults to {} (see the sketch after this list).
  • _model (LLM): the vLLM model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the load method.
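
A brief sketch of forwarding engine options through extra_kwargs (gpu_memory_utilization and max_model_len are illustrative vllm.LLM arguments, not distilabel-specific ones):

from distilabel.models import vLLMEmbeddings

embeddings = vLLMEmbeddings(
    model="intfloat/e5-mistral-7b-instruct",
    extra_kwargs={                      # unpacked into vllm.LLM(...)
        "gpu_memory_utilization": 0.9,  # illustrative engine option
        "max_model_len": 4096,          # illustrative engine option
    },
)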

References
  • Offline inference embeddings: https://docs.vllm.com.cn/en/latest/getting_started/examples/offline_inference_embedding.html

Examples

Generating sentence embeddings:

from distilabel.models import vLLMEmbeddings

embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/vllm.py
class vLLMEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`vllm` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        dtype: the data type to use for the model. Defaults to `auto`.
        trust_remote_code: whether to trust the remote code when loading the model. Defaults
            to `False`.
        quantization: the quantization mode to use for the model. Defaults to `None`.
        revision: the revision of the model to load. Defaults to `None`.
        enforce_eager: whether to enforce eager execution. Defaults to `True`.
        seed: the seed to use for the random number generator. Defaults to `0`.
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `LLM` class of `vllm` library. Defaults to `{}`.
        _model: the `vLLM` model instance. This attribute is meant to be used internally
            and should not be accessed directly. It will be set in the `load` method.

    References:
        - [Offline inference embeddings](https://docs.vllm.com.cn/en/latest/getting_started/examples/offline_inference_embedding.html)

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import vLLMEmbeddings

        embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    dtype: str = "auto"
    trust_remote_code: bool = False
    quantization: Optional[str] = None
    revision: Optional[str] = None

    enforce_eager: bool = True

    seed: int = 0

    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `vLLM` class of `vllm` library. See all the supported arguments at: "
        "https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py",
    )

    _model: "_vLLM" = PrivateAttr(None)

    def load(self) -> None:
        """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
        super().load()

        CudaDevicePlacementMixin.load(self)

        try:
            from vllm import LLM as _vLLM

        except ImportError as ie:
            raise ImportError(
                "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
            ) from ie

        self._model = _vLLM(
            self.model,
            dtype=self.dtype,
            trust_remote_code=self.trust_remote_code,
            quantization=self.quantization,
            revision=self.revision,
            enforce_eager=self.enforce_eager,
            seed=self.seed,
            **self.extra_kwargs,  # type: ignore
        )

    def unload(self) -> None:
        """Unloads the `vLLM` model."""
        CudaDevicePlacementMixin.unload(self)
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return [output.outputs.embedding for output in self._model.encode(inputs)]
model_name property

Returns the name of the model.

load()

Loads the vLLM model using either the path or the Hugging Face Hub repository id.

Source code in src/distilabel/models/embeddings/vllm.py
def load(self) -> None:
    """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
    super().load()

    CudaDevicePlacementMixin.load(self)

    try:
        from vllm import LLM as _vLLM

    except ImportError as ie:
        raise ImportError(
            "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
        ) from ie

    self._model = _vLLM(
        self.model,
        dtype=self.dtype,
        trust_remote_code=self.trust_remote_code,
        quantization=self.quantization,
        revision=self.revision,
        enforce_eager=self.enforce_eager,
        seed=self.seed,
        **self.extra_kwargs,  # type: ignore
    )
unload()

Unloads the vLLM model.

Source code in src/distilabel/models/embeddings/vllm.py
def unload(self) -> None:
    """Unloads the `vLLM` model."""
    CudaDevicePlacementMixin.unload(self)
    super().unload()
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/vllm.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return [output.outputs.embedding for output in self._model.encode(inputs)]