Embedding Gallery

This section contains the existing Embeddings subclasses implemented in distilabel.

embeddings

LlamaCppEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

LlamaCpp library implementation for embedding generation.

Attributes

  • model_name (str): contains the name of the GGUF quantized model, compatible with the installed version of the llama.cpp Python bindings.
  • model_path (RuntimeParameter[str]): contains the path to the GGUF quantized model, compatible with the installed version of the llama.cpp Python bindings.
  • repo_id (RuntimeParameter[str]): the Hugging Face Hub repository id.
  • verbose (RuntimeParameter[bool]): whether to print verbose output. Defaults to False.
  • n_gpu_layers (RuntimeParameter[int]): the number of layers to run on the GPU. Defaults to -1 (use the GPU if available).
  • disable_cuda_device_placement (RuntimeParameter[bool]): whether to disable CUDA device placement. Defaults to True.
  • normalize_embeddings (RuntimeParameter[bool]): whether to normalize the embeddings. Defaults to False.
  • seed (int): RNG seed, -1 for random.
  • n_ctx (int): text context size, 0 = from model.
  • n_batch (int): maximum batch size for prompt processing.
  • extra_kwargs (Optional[RuntimeParameter[Dict[str, Any]]]): additional dictionary of keyword arguments that will be passed to the Llama class of the llama_cpp library. Defaults to {}.

Runtime parameters
  • n_gpu_layers: the number of layers to use for the GPU. Defaults to -1.
  • verbose: whether to print verbose output. Defaults to False.
  • normalize_embeddings: whether to normalize the embeddings. Defaults to False.
  • extra_kwargs: additional dictionary of keyword arguments that will be passed to the Llama class of the llama_cpp library. Defaults to {} (see the sketch after this list).
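
A minimal sketch of setting these runtime parameters at instantiation (assuming the same GGUF model as in the examples below; the n_threads key is an illustrative llama_cpp.Llama argument forwarded through extra_kwargs, not a distilabel parameter):

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(
    model="all-MiniLM-L6-v2-Q2_K.gguf",
    model_path=str(Path.home() / "Downloads/"),
    n_gpu_layers=-1,                # use the GPU if available
    verbose=False,
    normalize_embeddings=True,      # return unit-length vectors
    extra_kwargs={"n_threads": 4},  # illustrative llama_cpp.Llama option
)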
References
  • Offline inference embeddings: https://llama-cpp-python.readthedocs.io/en/stable/#embeddings

Examples

Generate sentence embeddings using a local model:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()

Generate sentence embeddings using a Hugging Face Hub model:

from distilabel.models.embeddings import LlamaCppEmbeddings
# You need to set an environment variable to download a private model to the local machine

repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]

Generate sentence embeddings on CPU:

from pathlib import Path
from distilabel.models.embeddings import LlamaCppEmbeddings

# You can follow along with this example by downloading the model with the
# following terminal command, which saves it to the `Downloads` folder:
# curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

model_path = "Downloads/"
model = "all-MiniLM-L6-v2-Q2_K.gguf"
embeddings = LlamaCppEmbeddings(
    model=model,
    model_path=str(Path.home() / model_path),
    n_gpu_layers=0,
    disable_cuda_device_placement=True,
)

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(results)
embeddings.unload()
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/llamacpp.py
class LlamaCppEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`LlamaCpp` library implementation for embedding generation.

    Attributes:
        model_name: contains the name of the GGUF quantized model, compatible with the
            installed version of the `llama.cpp` Python bindings.
        model_path: contains the path to the GGUF quantized model, compatible with the
            installed version of the `llama.cpp` Python bindings.
        repo_id: the Hugging Face Hub repository id.
        verbose: whether to print verbose output. Defaults to `False`.
        n_gpu_layers: number of layers to run on the GPU. Defaults to `-1` (use the GPU if available).
        disable_cuda_device_placement: whether to disable CUDA device placement. Defaults to `True`.
        normalize_embeddings: whether to normalize the embeddings. Defaults to `False`.
        seed: RNG seed, -1 for random
        n_ctx: Text context, 0 = from model
        n_batch: Prompt processing maximum batch size
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    Runtime parameters:
        - `n_gpu_layers`: the number of layers to use for the GPU. Defaults to `-1`.
        - `verbose`: whether to print verbose output. Defaults to `False`.
        - `normalize_embeddings`: whether to normalize the embeddings. Defaults to `False`.
        - `extra_kwargs`: additional dictionary of keyword arguments that will be passed to the
            `Llama` class of `llama_cpp` library. Defaults to `{}`.

    References:
        - [Offline inference embeddings](https://llama-cpp-python.readthedocs.io/en/stable/#embeddings)

    Examples:
        Generate sentence embeddings using a local model:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # You can follow along with this example by downloading the model with the
        # following terminal command, which saves it to the `Downloads` folder:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        ```

        Generate sentence embeddings using a HuggingFace Hub model:

        ```python
        from distilabel.models.embeddings import LlamaCppEmbeddings
        # You need to set an environment variable to download a private model to the local machine

        repo_id = "second-state/All-MiniLM-L6-v2-Embedding-GGUF"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(model=model, repo_id=repo_id)

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```

        Generate sentence embeddings on CPU:

        ```python
        from pathlib import Path
        from distilabel.models.embeddings import LlamaCppEmbeddings

        # You can follow along with this example by downloading the model with the
        # following terminal command, which saves it to the `Downloads` folder:
        # curl -L -o ~/Downloads/all-MiniLM-L6-v2-Q2_K.gguf https://hugging-face.cn/second-state/All-MiniLM-L6-v2-Embedding-GGUF/resolve/main/all-MiniLM-L6-v2-Q2_K.gguf

        model_path = "Downloads/"
        model = "all-MiniLM-L6-v2-Q2_K.gguf"
        embeddings = LlamaCppEmbeddings(
            model=model,
            model_path=str(Path.home() / model_path),
            n_gpu_layers=0,
            disable_cuda_device_placement=True,
        )

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        print(results)
        embeddings.unload()
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```


    """

    model: str = Field(
        description="The name of the model to use for embeddings.",
    )

    model_path: RuntimeParameter[str] = Field(
        default=None,
        description="The path to the GGUF quantized model, compatible with the installed version of the `llama.cpp` Python bindings.",
    )

    repo_id: RuntimeParameter[str] = Field(
        default=None, description="The Hugging Face Hub repository id.", exclude=True
    )

    n_gpu_layers: RuntimeParameter[int] = Field(
        default=-1,
        description="The number of layers that will be loaded in the GPU.",
    )

    n_ctx: int = 512
    n_batch: int = 512
    seed: int = 4294967295

    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to normalize the embeddings.",
    )
    verbose: RuntimeParameter[bool] = Field(
        default=False,
        description="Whether to print verbose output from llama.cpp library.",
    )
    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `Llama` class of `llama_cpp` library. See all the supported arguments at: "
        "https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__init__",
    )
    _model: Optional["Llama"] = PrivateAttr(...)

    def load(self) -> None:
        """Loads the `gguf` model using either the path or the Hugging Face Hub repository id."""
        super().load()
        CudaDevicePlacementMixin.load(self)

        try:
            from llama_cpp import Llama
        except ImportError as ie:
            raise ImportError(
                "`llama-cpp-python` package is not installed. Please install it using"
                " `pip install 'distilabel[llama-cpp]'`."
            ) from ie

        if self.repo_id is not None:
            # use repo_id to download the model
            from huggingface_hub.utils import validate_repo_id

            validate_repo_id(self.repo_id)
            self._model = Llama.from_pretrained(
                repo_id=self.repo_id,
                filename=self.model,
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **self.extra_kwargs,  # unpack so each option reaches `Llama`
            )
        elif self.model_path is not None:
            self._model = Llama(
                model_path=str(Path(self.model_path) / self.model),
                n_gpu_layers=self.n_gpu_layers,
                seed=self.seed,
                n_ctx=self.n_ctx,
                n_batch=self.n_batch,
                verbose=self.verbose,
                embedding=True,
                **self.extra_kwargs,  # unpack so each option reaches `Llama`
            )
        else:
            raise ValueError("Either 'model_path' or 'repo_id' must be provided")

    def unload(self) -> None:
        """Unloads the `gguf` model."""
        CudaDevicePlacementMixin.unload(self)
        self._model.close()
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.embed(inputs, normalize=self.normalize_embeddings)
model_name property

Returns the name of the model.

load()

Loads the gguf model using either the path or the Hugging Face Hub repository id.

Source code in src/distilabel/models/embeddings/llamacpp.py
def load(self) -> None:
    """Loads the `gguf` model using either the path or the Hugging Face Hub repository id."""
    super().load()
    CudaDevicePlacementMixin.load(self)

    try:
        from llama_cpp import Llama
    except ImportError as ie:
        raise ImportError(
            "`llama-cpp-python` package is not installed. Please install it using"
            " `pip install 'distilabel[llama-cpp]'`."
        ) from ie

    if self.repo_id is not None:
        # use repo_id to download the model
        from huggingface_hub.utils import validate_repo_id

        validate_repo_id(self.repo_id)
        self._model = Llama.from_pretrained(
            repo_id=self.repo_id,
            filename=self.model,
            n_gpu_layers=self.n_gpu_layers,
            seed=self.seed,
            n_ctx=self.n_ctx,
            n_batch=self.n_batch,
            verbose=self.verbose,
            embedding=True,
            **self.extra_kwargs,  # unpack so each option reaches `Llama`
        )
    elif self.model_path is not None:
        self._model = Llama(
            model_path=str(Path(self.model_path) / self.model),
            n_gpu_layers=self.n_gpu_layers,
            seed=self.seed,
            n_ctx=self.n_ctx,
            n_batch=self.n_batch,
            verbose=self.verbose,
            embedding=True,
            **self.extra_kwargs,  # unpack so each option reaches `Llama`
        )
    else:
        raise ValueError("Either 'model_path' or 'repo_id' must be provided")
unload()

Unloads the gguf model.

Source code in src/distilabel/models/embeddings/llamacpp.py
def unload(self) -> None:
    """Unloads the `gguf` model."""
    CudaDevicePlacementMixin.unload(self)
    self._model.close()
    super().unload()
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/llamacpp.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return self._model.embed(inputs, normalize=self.normalize_embeddings)
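
Calling encode on a loaded instance returns one embedding per input; a brief usage sketch (the printed dimensionality is model-dependent, e.g. 384 for the all-MiniLM-L6-v2 GGUF used in the examples above):

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
print(len(results), len(results[0]))
# 2 384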

SentenceTransformerEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

sentence-transformers library implementation for embedding generation.

Attributes

  • model (str): the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
  • device (Optional[RuntimeParameter[str]]): the name of the device used to load the model, e.g. "cuda", "mps", etc. Defaults to None.
  • prompts (Optional[Dict[str, str]]): a dictionary containing prompts to be used with the model (see the sketch after this list). Defaults to None.
  • default_prompt_name (Optional[str]): the default prompt (in prompts) that will be applied to the inputs. If not provided, no prompt is used. Defaults to None.
  • trust_remote_code (bool): whether to allow fetching and executing remote code fetched from the repository in the Hub. Defaults to False.
  • revision (Optional[str]): if model refers to a Hugging Face Hub repository, the revision (e.g. a branch name or a commit id) to use. Defaults to "main".
  • token (Optional[str]): the Hugging Face Hub token that will be used to authenticate to the Hugging Face Hub. If not provided, the HF_TOKEN environment variable or the local configuration of the huggingface_hub package will be used. Defaults to None.
  • truncate_dim (Optional[int]): the dimension to truncate the sentence embeddings to. Defaults to None.
  • model_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers model class. Defaults to None.
  • tokenizer_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers tokenizer class. Defaults to None.
  • config_kwargs (Optional[Dict[str, Any]]): extra kwargs that will be passed to the Hugging Face transformers configuration class. Defaults to None.
  • precision (Optional[Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']]): the dtype of the resulting embeddings. Defaults to "float32".
  • normalize_embeddings (RuntimeParameter[bool]): whether to normalize the embeddings so they have a length of 1. Defaults to True.
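
As a sketch of how the prompt-related attributes fit together (the prompt text and the "query" name are illustrative values chosen for this example, not fixed by the API):

from distilabel.models import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(
    model="mixedbread-ai/mxbai-embed-large-v1",
    # Illustrative named prompt; any mapping of names to prompt strings works.
    prompts={"query": "Represent this sentence for searching relevant passages: "},
    default_prompt_name="query",  # applied to every input unless overridden
    truncate_dim=512,             # keep only the first 512 dimensions
    normalize_embeddings=True,
)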

Examples

Generating sentence embeddings:

from distilabel.models import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/sentence_transformers.py
class SentenceTransformerEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`sentence-transformers` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        device: the name of the device used to load the model e.g. "cuda", "mps", etc.
            Defaults to `None`.
        prompts: a dictionary containing prompts to be used with the model. Defaults to
            `None`.
        default_prompt_name: the default prompt (in `prompts`) that will be applied to the
            inputs. If not provided, then no prompt will be used. Defaults to `None`.
        trust_remote_code: whether to allow fetching and executing remote code fetched
            from the repository in the Hub. Defaults to `False`.
        revision: if `model` refers to a Hugging Face Hub repository, then the revision
            (e.g. a branch name or a commit id) to use. Defaults to `"main"`.
        token: the Hugging Face Hub token that will be used to authenticate to the Hugging
            Face Hub. If not provided, the `HF_TOKEN` environment or `huggingface_hub` package
            local configuration will be used. Defaults to `None`.
        truncate_dim: the dimension to truncate the sentence embeddings. Defaults to `None`.
        model_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            model class. Defaults to `None`.
        tokenizer_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            tokenizer class. Defaults to `None`.
        config_kwargs: extra kwargs that will be passed to the Hugging Face `transformers`
            configuration class. Defaults to `None`.
        precision: the dtype that will have the resulting embeddings. Defaults to `"float32"`.
        normalize_embeddings: whether to normalize the embeddings so they have a length
            of 1. Defaults to `True`.

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import SentenceTransformerEmbeddings

        embeddings = SentenceTransformerEmbeddings(model="mixedbread-ai/mxbai-embed-large-v1")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    device: Optional[RuntimeParameter[str]] = Field(
        default=None,
        description="The device to be used to load the model. If `None`, then it"
        " will check if a GPU can be used.",
    )
    prompts: Optional[Dict[str, str]] = None
    default_prompt_name: Optional[str] = None
    trust_remote_code: bool = False
    revision: Optional[str] = None
    token: Optional[str] = None
    truncate_dim: Optional[int] = None
    model_kwargs: Optional[Dict[str, Any]] = None
    tokenizer_kwargs: Optional[Dict[str, Any]] = None
    config_kwargs: Optional[Dict[str, Any]] = None
    precision: Optional[Literal["float32", "int8", "uint8", "binary", "ubinary"]] = (
        "float32"
    )
    normalize_embeddings: RuntimeParameter[bool] = Field(
        default=True,
        description="Whether to normalize the embeddings so the generated vectors"
        " have a length of 1 or not.",
    )

    _model: Union["SentenceTransformer", None] = PrivateAttr(None)

    def load(self) -> None:
        """Loads the Sentence Transformer model"""
        super().load()

        if self.device == "cuda":
            CudaDevicePlacementMixin.load(self)

        try:
            from sentence_transformers import SentenceTransformer
        except ImportError as e:
            raise ImportError(
                "`sentence-transformers` package is not installed. Please install it using"
                " `pip install 'distilabel[sentence-transformers]'`."
            ) from e

        self._model = SentenceTransformer(
            model_name_or_path=self.model,
            device=self.device,
            prompts=self.prompts,
            default_prompt_name=self.default_prompt_name,
            trust_remote_code=self.trust_remote_code,
            revision=self.revision,
            token=self.token,
            truncate_dim=self.truncate_dim,
            model_kwargs=self.model_kwargs,
            tokenizer_kwargs=self.tokenizer_kwargs,
            config_kwargs=self.config_kwargs,
        )

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return self._model.encode(  # type: ignore
            sentences=inputs,
            batch_size=len(inputs),
            convert_to_numpy=True,
            precision=self.precision,  # type: ignore
            normalize_embeddings=self.normalize_embeddings,  # type: ignore
        ).tolist()  # type: ignore

    def unload(self) -> None:
        del self._model
        if self.device == "cuda":
            CudaDevicePlacementMixin.unload(self)
        super().unload()
model_name property

Returns the name of the model.

load()

Loads the Sentence Transformer model.

Source code in src/distilabel/models/embeddings/sentence_transformers.py
def load(self) -> None:
    """Loads the Sentence Transformer model"""
    super().load()

    if self.device == "cuda":
        CudaDevicePlacementMixin.load(self)

    try:
        from sentence_transformers import SentenceTransformer
    except ImportError as e:
        raise ImportError(
            "`sentence-transformers` package is not installed. Please install it using"
            " `pip install 'distilabel[sentence-transformers]'`."
        ) from e

    self._model = SentenceTransformer(
        model_name_or_path=self.model,
        device=self.device,
        prompts=self.prompts,
        default_prompt_name=self.default_prompt_name,
        trust_remote_code=self.trust_remote_code,
        revision=self.revision,
        token=self.token,
        truncate_dim=self.truncate_dim,
        model_kwargs=self.model_kwargs,
        tokenizer_kwargs=self.tokenizer_kwargs,
        config_kwargs=self.config_kwargs,
    )
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/sentence_transformers.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return self._model.encode(  # type: ignore
        sentences=inputs,
        batch_size=len(inputs),
        convert_to_numpy=True,
        precision=self.precision,  # type: ignore
        normalize_embeddings=self.normalize_embeddings,  # type: ignore
    ).tolist()  # type: ignore
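
Note that encode passes batch_size=len(inputs), so the whole input list is embedded in a single batch; for large lists you may want to chunk the inputs yourself. A minimal sketch with a hypothetical helper:

def encode_chunked(embeddings, texts, chunk_size=32):
    # Hypothetical helper: embed `texts` in fixed-size chunks to bound memory use.
    results = []
    for i in range(0, len(texts), chunk_size):
        results.extend(embeddings.encode(inputs=texts[i : i + chunk_size]))
    return results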

vLLMEmbeddings

Bases: Embeddings, CudaDevicePlacementMixin

vllm library implementation for embedding generation.

Attributes

  • model (str): the model Hugging Face Hub repo id or a path to a directory containing the model weights and configuration files.
  • dtype (str): the data type to use for the model. Defaults to auto.
  • trust_remote_code (bool): whether to trust the remote code when loading the model. Defaults to False.
  • quantization (Optional[str]): the quantization mode to use for the model. Defaults to None.
  • revision (Optional[str]): the revision of the model to load. Defaults to None.
  • enforce_eager (bool): whether to enforce eager execution. Defaults to True.
  • seed (int): the seed to use for the random number generator. Defaults to 0.
  • extra_kwargs (Optional[RuntimeParameter[Dict[str, Any]]]): additional dictionary of keyword arguments that will be passed to the LLM class of the vllm library. Defaults to {} (see the sketch after this list).
  • _model (LLM): the vLLM model instance. This attribute is meant to be used internally and should not be accessed directly. It will be set in the load method.
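
A brief sketch of forwarding engine options through extra_kwargs (gpu_memory_utilization and max_model_len are illustrative vllm.LLM arguments, not distilabel-specific ones):

from distilabel.models import vLLMEmbeddings

embeddings = vLLMEmbeddings(
    model="intfloat/e5-mistral-7b-instruct",
    extra_kwargs={                      # unpacked into vllm.LLM(...)
        "gpu_memory_utilization": 0.9,  # illustrative engine option
        "max_model_len": 4096,          # illustrative engine option
    },
)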

References
  • Offline inference embeddings: https://docs.vllm.com.cn/en/latest/getting_started/examples/offline_inference_embedding.html

Examples

Generating sentence embeddings:

from distilabel.models import vLLMEmbeddings

embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

embeddings.load()

results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
# [
#   [-0.05447685346007347, -0.01623094454407692, ...],
#   [4.4889533455716446e-05, 0.044016145169734955, ...],
# ]
Source code in src/distilabel/models/embeddings/vllm.py
class vLLMEmbeddings(Embeddings, CudaDevicePlacementMixin):
    """`vllm` library implementation for embedding generation.

    Attributes:
        model: the model Hugging Face Hub repo id or a path to a directory containing the
            model weights and configuration files.
        dtype: the data type to use for the model. Defaults to `auto`.
        trust_remote_code: whether to trust the remote code when loading the model. Defaults
            to `False`.
        quantization: the quantization mode to use for the model. Defaults to `None`.
        revision: the revision of the model to load. Defaults to `None`.
        enforce_eager: whether to enforce eager execution. Defaults to `True`.
        seed: the seed to use for the random number generator. Defaults to `0`.
        extra_kwargs: additional dictionary of keyword arguments that will be passed to the
            `LLM` class of `vllm` library. Defaults to `{}`.
        _model: the `vLLM` model instance. This attribute is meant to be used internally
            and should not be accessed directly. It will be set in the `load` method.

    References:
        - [Offline inference embeddings](https://docs.vllm.com.cn/en/latest/getting_started/examples/offline_inference_embedding.html)

    Examples:
        Generating sentence embeddings:

        ```python
        from distilabel.models import vLLMEmbeddings

        embeddings = vLLMEmbeddings(model="intfloat/e5-mistral-7b-instruct")

        embeddings.load()

        results = embeddings.encode(inputs=["distilabel is awesome!", "and Argilla!"])
        # [
        #   [-0.05447685346007347, -0.01623094454407692, ...],
        #   [4.4889533455716446e-05, 0.044016145169734955, ...],
        # ]
        ```
    """

    model: str
    dtype: str = "auto"
    trust_remote_code: bool = False
    quantization: Optional[str] = None
    revision: Optional[str] = None

    enforce_eager: bool = True

    seed: int = 0

    extra_kwargs: Optional[RuntimeParameter[Dict[str, Any]]] = Field(
        default_factory=dict,
        description="Additional dictionary of keyword arguments that will be passed to the"
        " `vLLM` class of `vllm` library. See all the supported arguments at: "
        "https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py",
    )

    _model: "_vLLM" = PrivateAttr(None)

    def load(self) -> None:
        """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
        super().load()

        CudaDevicePlacementMixin.load(self)

        try:
            from vllm import LLM as _vLLM

        except ImportError as ie:
            raise ImportError(
                "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
            ) from ie

        self._model = _vLLM(
            self.model,
            dtype=self.dtype,
            trust_remote_code=self.trust_remote_code,
            quantization=self.quantization,
            revision=self.revision,
            enforce_eager=self.enforce_eager,
            seed=self.seed,
            **self.extra_kwargs,  # type: ignore
        )

    def unload(self) -> None:
        """Unloads the `vLLM` model."""
        CudaDevicePlacementMixin.unload(self)
        super().unload()

    @property
    def model_name(self) -> str:
        """Returns the name of the model."""
        return self.model

    def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
        """Generates embeddings for the provided inputs.

        Args:
            inputs: a list of texts for which an embedding has to be generated.

        Returns:
            The generated embeddings.
        """
        return [output.outputs.embedding for output in self._model.encode(inputs)]
model_name property

Returns the name of the model.

load()

Loads the vLLM model using either the path or the Hugging Face Hub repository id.

Source code in src/distilabel/models/embeddings/vllm.py
def load(self) -> None:
    """Loads the `vLLM` model using either the path or the Hugging Face Hub repository id."""
    super().load()

    CudaDevicePlacementMixin.load(self)

    try:
        from vllm import LLM as _vLLM

    except ImportError as ie:
        raise ImportError(
            "vLLM is not installed. Please install it using `pip install 'distilabel[vllm]'`."
        ) from ie

    self._model = _vLLM(
        self.model,
        dtype=self.dtype,
        trust_remote_code=self.trust_remote_code,
        quantization=self.quantization,
        revision=self.revision,
        enforce_eager=self.enforce_eager,
        seed=self.seed,
        **self.extra_kwargs,  # type: ignore
    )
unload()

Unloads the vLLM model.

Source code in src/distilabel/models/embeddings/vllm.py
def unload(self) -> None:
    """Unloads the `vLLM` model."""
    CudaDevicePlacementMixin.unload(self)
    super().unload()
encode(inputs)

Generates embeddings for the provided inputs.

Parameters

  • inputs (List[str]): a list of texts for which an embedding has to be generated. Required.

Returns

  • List[List[Union[int, float]]]: the generated embeddings.

Source code in src/distilabel/models/embeddings/vllm.py
def encode(self, inputs: List[str]) -> List[List[Union[int, float]]]:
    """Generates embeddings for the provided inputs.

    Args:
        inputs: a list of texts for which an embedding has to be generated.

    Returns:
        The generated embeddings.
    """
    return [output.outputs.embedding for output in self._model.encode(inputs)]