在 `distilabel` 中使用图像进行文本生成¶

使用 distilabel 回答关于图像的问题。

图像-文本到文本模型接收图像和文本提示并输出文本。在本示例中，我们将使用带有 meta-llama/Llama-3.2-11B-Vision-Instruct 的 LLM InferenceEndpointsLLM 来询问关于图像的问题，以及带有 gpt-4o-mini 的 OpenAILLM。我们将提出一个简单的问题来展示 TextGenerationWithImage 任务如何在管道中使用。

Inference Endpoints - meta-llama/Llama-3.2-11B-Vision-InstructOpenAI - gpt-4o-mini

from distilabel.models.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks.text_generation_with_image import TextGenerationWithImage
from distilabel.steps import LoadDataFromDicts


with Pipeline(name="vision_generation_pipeline") as pipeline:
    loader = LoadDataFromDicts(
        data=[
            {
                "instruction": "What’s in this image?",
                "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
        ],
    )

    llm = InferenceEndpointsLLM(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    )

    vision = TextGenerationWithImage(
        name="vision_gen",
        llm=llm,
        image_type="url"  # (1)
    )

    loader >> vision

image_type 可以是指向图像的 URL、base64 字符串表示形式或 PIL 图像，请查看 TextGenerationWithImage 以获取更多信息。

图像

问题

这张图片里有什么？

这张图片描绘了一条木制木板路，蜿蜒穿过一片茂盛的草地，两侧是充满活力的绿色草地，在平静而宜人的天空下延伸至地平线。木板路笔直向前延伸，远离观看者，在茂密、葱郁的绿草、庄稼或其他植物类型或各种小型树木和灌木中形成了一条清晰的小路。这片草地上点缀着树木和灌木，看起来健康而翠绿。上方的天空是美丽的蓝色，散布着白云，为场景增添了一丝宁静感。虽然这张图片看起来像是自然景观，但因为草是...

from distilabel.models.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks.text_generation_with_image import TextGenerationWithImage
from distilabel.steps import LoadDataFromDicts


with Pipeline(name="vision_generation_pipeline") as pipeline:
    loader = LoadDataFromDicts(
        data=[
            {
                "instruction": "What’s in this image?",
                "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
        ],
    )

    llm = OpenAILLM(
        model="gpt-4o-mini",
    )

    vision = TextGenerationWithImage(
        name="vision_gen",
        llm=llm,
        image_type="url"  # (1)
    )

    loader >> vision

image_type 可以是指向图像的 URL、base64 字符串表示形式或 PIL 图像，请查看 VisionGeneration 以获取更多信息。

图像

问题

这张图片里有什么？

这张图片描绘了一个风景优美的景观，其中有一条木制走道或小路穿过一片茂盛的绿色沼泽或田野。该区域周围环绕着高高的草和各种灌木，背景中可能可以看到树木。天空是蓝色的，有一些淡淡的云彩，预示着美好的一天。总的来说，它呈现出一个宁静的自然环境，非常适合散步或自然观察。

完整的管道可以在以下示例中运行

运行完整的管道

python examples/text_generation_with_image.py

text_generation_with_image.py

# Copyright 2023-present, Argilla, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://apache.ac.cn/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from distilabel.models.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks.text_generation_with_image import TextGenerationWithImage

with Pipeline(name="vision_generation_pipeline") as pipeline:
    loader = LoadDataFromDicts(
        data=[
            {
                "instruction": "What’s in this image?",
                "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            }
        ],
    )

    llm = InferenceEndpointsLLM(
        model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    )

    vision = TextGenerationWithImage(name="vision_gen", llm=llm, image_type="url")

    loader >> vision


if __name__ == "__main__":
    distiset = pipeline.run(use_cache=False)
    distiset.push_to_hub("plaguss/test-vision-generation-Llama-3.2-11B-Vision-Instruct")

示例数据集可以在 plaguss/test-vision-generation-Llama-3.2-11B-Vision-Instruct 中查看。

在 distilabel 中使用图像进行文本生成¶

在 `distilabel` 中使用图像进行文本生成¶