Support Lumina-Accessory Instruction Image Editing for Lumina2 #12304

syunar · 2025-09-09T05:30:23Z

What does this PR do?

This PR adds Lumina-Accessory, a multi-task instruction fine-tuning framework for the Lumina series (currently supporting Lumina-Image-2.0). The official repository is Alpha-VLLM/Lumina-Accessory.
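In the inference example, the task appears to be selected through the system prompt. The three system prompts share one template and differ only in the condition they name. The following is a minimal sketch of building them from that template. The names `SYSTEM_PROMPT_TEMPLATE`, `TASK_CONDITIONS` and `build_system_prompt` are hypothetical and not part of this PR, and only the three tasks shown in the example are covered.

```python
# Hypothetical helper (not part of this PR): the system prompts used in the
# inference example share one template and differ only in the condition.
SYSTEM_PROMPT_TEMPLATE = (
    "You are an assistant designed to generate superior images with the highest "
    "degree of image-text alignment based on textual prompts and {condition}."
)

# Condition phrases copied from the inference example; other Lumina-Accessory
# tasks may use different wording.
TASK_CONDITIONS = {
    "Image Infilling": "a partially masked image",
    "Palette Condition": "a palette map condition",
    "Depth Condition": "a depth map condition",
}


def build_system_prompt(task: str) -> str:
    """Return the system prompt for one of the tasks shown in the example."""
    try:
        condition = TASK_CONDITIONS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_CONDITIONS)}"
        ) from None
    return SYSTEM_PROMPT_TEMPLATE.format(condition=condition)
```

For example, `build_system_prompt("Depth Condition")` reproduces the depth-map system prompt from the test cases verbatim.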

Inference

import torch
from diffusers.utils import load_image
from diffusers import Lumina2AccessoryTransformer2DModel, Lumina2AccessoryPipeline


# Lumina-Accessory transformer weights, loaded from the single-file checkpoint
ckpt_path = "https://huggingface.co/Alpha-VLLM/Lumina-Accessory/blob/main/consolidated.00-of-01.pth"
transformer = Lumina2AccessoryTransformer2DModel.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)
# The remaining components (text encoder, tokenizer, VAE, scheduler) come from Lumina-Image-2.0
pipe = Lumina2AccessoryPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", transformer=transformer, torch_dtype=torch.bfloat16
)
device = "cuda"
pipe.to(device)

test_cases = [
    {
        "task": "Image Infilling",
        "input_image": "https://github.com/Alpha-VLLM/Lumina-Accessory/blob/main/examples/case_1_condition.jpg?raw=true",
        "system_prompt": "You are an assistant designed to generate superior images with the highest degree of image-text alignment based on textual prompts and a partially masked image.",
        "prompt": "A classical oil painting of a young woman dressed in a modern DARK BLACK leather jacket.",
        "cond_position_type": "aligned",
    },
    {
        "task": "Palette Condition",
        "input_image": "https://github.com/Alpha-VLLM/Lumina-Accessory/blob/main/examples/case_2_condition.jpg?raw=true",
        "system_prompt": "You are an assistant designed to generate superior images with the highest degree of image-text alignment based on textual prompts and a palette map condition.",
        "prompt": "A still life photograph of a floral arrangement in a rustic, blue ceramic vase, centrally positioned on a round table draped with a delicate, white tablecloth. The bouquet features a mix of vibrant flowers, including large yellow roses, orange carnations, and smaller white blossoms, interspersed with green foliage and sprigs of orange buds. The vase is with the flowers extending upwards and outwards, creating a dynamic composition. In the background, hanging on the textured, beige wallpaper with a subtle floral pattern, is a traditional Chinese scroll featuring elegant calligraphy in classical Wenyanwen (文言文). The presence of the scroll adds a refined, cultural depth to the vintage setting. Soft, natural lighting casts gentle shadows, enhancing the textures of the vase and the lace. The overall atmosphere is serene and nostalgic, with a warm, muted color palette, medium depth of field, and a classic, timeless aesthetic.",
        "cond_position_type": "aligned",
    },
    {
        "task": "Depth Condition",
        "input_image": "https://github.com/Alpha-VLLM/Lumina-Accessory/blob/main/examples/case_3_condition.jpg?raw=true",
        "system_prompt": "You are an assistant designed to generate superior images with the highest degree of image-text alignment based on textual prompts and a depth map condition.",
        "prompt": "A contemplative photograph of a person with short brown hair, wearing a dark jacket, standing in the lower left foreground, facing away towards a field of tall, dried grasses. The grasses dominate the middle ground, their brown and beige tones contrasting with the dark jacket. The background features a cloudy, overcast sky with a soft, diffused light, creating a serene and introspective atmosphere. The composition is balanced with the person anchoring the lower left and the expansive sky occupying the upper half. The image has a muted color palette, emphasizing earthy tones and a sense of solitude. Photographic style, medium depth of field, natural lighting, soft focus, tranquil, introspective mood.",
        "cond_position_type": "aligned",
    },
]

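# One seeded generator, reused for every case, makes the whole run reproducible
generator = torch.Generator(device=device).manual_seed(0)
# Generation resolution: inputs are resized to it and outputs resized back afterwards
W, H = 1024, 1024
for test_case in test_cases:
    img = load_image(test_case["input_image"])
    w, h = img.size  # remember the original input size
    img = img.resize((W, H))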

    output = pipe(
        image=img,
        prompt=test_case["prompt"],
        system_prompt=test_case["system_prompt"],
        negative_prompt="",
        num_inference_steps=25,
        width=img.size[0],
        height=img.size[1],
        num_images_per_prompt=1,
        guidance_scale=4.0,
        cfg_trunc_ratio=1.0,
        cfg_normalization=True,
        cond_position_type=test_case["cond_position_type"],
        generator=generator,
    ).images[0]

    # Restore the original input resolution before saving
    output = output.resize((w, h))
    img = img.resize((w, h))

    # e.g. "Image Infilling" -> "image_infilling"
    task_slug = test_case["task"].strip().replace(" ", "_").lower()
    img.save(f"test_lumina2_accessory_{task_slug}_input.png")
    output.save(f"test_lumina2_accessory_{task_slug}_output.png")
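The pipeline call above passes `guidance_scale`, `cfg_trunc_ratio` and `cfg_normalization`. The following sketch shows one plausible reading of how they interact at a single denoising step. It uses NumPy arrays rather than the pipeline's tensors. Two parts of it are assumptions modeled on the existing Lumina2 pipeline, not code from this PR: the truncation rule (guidance is switched off once `(step + 1) / num_steps` exceeds `cfg_trunc_ratio`) and the per-vector norm rescaling. `guided_prediction` is a hypothetical name.

```python
import numpy as np


def guided_prediction(
    pred_cond: np.ndarray,
    pred_uncond: np.ndarray,
    step: int,
    num_steps: int,
    guidance_scale: float = 4.0,
    cfg_trunc_ratio: float = 1.0,
    cfg_normalization: bool = True,
) -> np.ndarray:
    """Sketch of CFG with step truncation and norm rescaling (assumed semantics)."""
    # Assumption: once the fraction of completed steps exceeds cfg_trunc_ratio,
    # only the conditional prediction is used. With cfg_trunc_ratio=1.0 this
    # branch never fires.
    if (step + 1) / num_steps > cfg_trunc_ratio:
        return pred_cond

    # Standard classifier-free guidance.
    pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)

    if cfg_normalization:
        # Rescale each vector along the last axis so its L2 norm matches the
        # conditional prediction's norm.
        cond_norm = np.linalg.norm(pred_cond, axis=-1, keepdims=True)
        pred_norm = np.linalg.norm(pred, axis=-1, keepdims=True)
        pred = pred * (cond_norm / pred_norm)
    return pred
```

The example sets `cfg_trunc_ratio=1.0`, so under this reading guidance is applied at all 25 steps. `cfg_normalization=True` keeps the magnitude of the guided prediction equal to that of the conditional one while `guidance_scale=4.0` changes its direction.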

Sanity Check

Image Infilling

[Input image | Output image]

Palette Condition

[Input image | Output image]

Depth Condition

[Input image | Output image]

syunar · 2025-09-09T05:56:15Z

Hi @sayakpaul @yiyixuxu @a-r-r-o-w — ready for review. Let me know if I can do anything to make it easier.

syunar · 2025-09-16T04:42:10Z

@yiyixuxu gentle ping: I’ve fixed the failing checks. Could you rerun the tests?

feat: add lumina2 accessory pipeline

9feaa84

sayakpaul requested review from DN6 and yiyixuxu September 9, 2025 07:17

syunar added 4 commits September 9, 2025 20:59

fix: remove debug code

6ea2232

fix: pos ids

84693b7

fix: correct init vae_scale_factor and add latent_channels

2dac79b

fix: docs

096c7fb
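The `vae_scale_factor` / `latent_channels` fix concerns how pixel sizes map to latent sizes. The following is a minimal sketch of that arithmetic. It assumes the usual diffusers convention `2 ** (len(vae.config.block_out_channels) - 1)`. It also uses assumed values for the Lumina-Image-2.0 VAE (four down blocks, 16 latent channels) and an assumed transformer patch size of 2, so check these against the model's config files. `generation_size` is a hypothetical helper, not part of this PR. It shows an aspect-ratio-preserving alternative to the fixed 1024x1024 resize in the inference example.

```python
import math

# Assumed config values; verify against vae/config.json and transformer/config.json.
BLOCK_OUT_CHANNELS = [128, 256, 512, 512]
LATENT_CHANNELS = 16
PATCH_SIZE = 2


def vae_scale_factor(block_out_channels):
    """Spatial downsampling factor: every down block except the last halves H and W."""
    return 2 ** (len(block_out_channels) - 1)


def latent_shape(batch_size, height, width, scale_factor, latent_channels):
    """Shape of the initial noise latents for a given pixel resolution."""
    if height % scale_factor or width % scale_factor:
        raise ValueError(f"height and width must be divisible by {scale_factor}")
    return (batch_size, latent_channels, height // scale_factor, width // scale_factor)


def generation_size(orig_w, orig_h, target_area=1024 * 1024, multiple=16):
    """Hypothetical helper: keep the aspect ratio, aim for about target_area pixels,
    and round each side down to a multiple of `multiple`."""
    scale = math.sqrt(target_area / (orig_w * orig_h))
    w = max(multiple, int(round(orig_w * scale)) // multiple * multiple)
    h = max(multiple, int(round(orig_h * scale)) // multiple * multiple)
    return w, h


factor = vae_scale_factor(BLOCK_OUT_CHANNELS)
print(factor)  # 8
print(latent_shape(1, 1024, 1024, factor, LATENT_CHANNELS))  # (1, 16, 128, 128)
print(generation_size(2000, 1000, multiple=factor * PATCH_SIZE))  # (1440, 720)
```

Rounding to `vae_scale_factor * patch_size` rather than to `vae_scale_factor` alone ensures that the latent grid also divides evenly into transformer patches (under the assumed patch size).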
