What is Lumina-Image 2.0?
Lumina-Image 2.0 is a 2.6B-parameter image generation model that combines a unified Next-DiT backbone with the Flux-based VAE.
While it adopts the Gemma 2B text encoder, the model body is considerably smaller than SD3 or FLUX Pro; like AuraFlow, it aims at the "relatively lightweight, easy-to-use base model" category. It is also notable for strong prompt adherence for its size, which made it one of the candidates discussed as a next-generation base model.
Note, however, that because it uses Gemma 2B (2B parameters) as its text encoder, text-encoder VRAM usage is somewhat higher than with SD1.5 and similar models.
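As a rough sanity check on that VRAM note, a back-of-the-envelope sketch (weights only; actual usage adds activations and framework overhead, and the exact parameter counts are approximations):

```python
# Rough VRAM needed just to hold model weights, ignoring activations
# and framework overhead (real usage will be higher).
def weight_gib(params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params * bytes_per_param / 2**30

# Gemma 2 2B is roughly 2.6B parameters including embeddings;
# SD1.5's CLIP ViT-L text encoder is roughly 123M parameters.
gemma_2b_fp16 = weight_gib(2.6e9, 2)
sd15_clip_fp16 = weight_gib(123e6, 2)

print(f"Gemma 2 2B fp16: ~{gemma_2b_fp16:.1f} GiB")
print(f"SD1.5 CLIP fp16: ~{sd15_clip_fp16:.2f} GiB")
```

The encoder alone is on the order of twenty times heavier than SD1.5's, which is why the difference is noticeable on low-VRAM GPUs.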
Model Download
- diffusion_models
- text_encoders
- vae

```
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── lumina_2_model_bf16.safetensors
    ├── 📂text_encoders/
    │   └── gemma_2_2b_fp16.safetensors
    └── 📂vae/
        └── ae.safetensors
```
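To double-check the placement above, a small helper like the following can report which expected files are missing (a sketch; the base path is an assumption, so point it at your own ComfyUI install):

```python
from pathlib import Path

# Expected layout for the Lumina-Image 2.0 workflow, relative to models/.
EXPECTED = {
    "diffusion_models": "lumina_2_model_bf16.safetensors",
    "text_encoders": "gemma_2_2b_fp16.safetensors",
    "vae": "ae.safetensors",
}

def missing_models(base: Path) -> list[str]:
    """Return the relative paths (under models/) that are not present."""
    return [
        f"{folder}/{name}"
        for folder, name in EXPECTED.items()
        if not (base / "models" / folder / name).is_file()
    ]

print(missing_models(Path("ComfyUI")))  # [] when everything is in place
```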
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"worst quality"
]
},
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A whimsical 3D illustration of flowers with bulbous red petals and smooth green stems. Soft, diffused lighting and a clean, off-white background"
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
777,
"fixed",
25,
4,
"res_multistep",
"normal",
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\lumina_2_model_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-26.957896179097734,
-25.496790894387402
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [lumina_2_model_bf16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/diffusion_models)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── lumina_2_model_bf16.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
393.70000000000005,
455.90000000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917354,
"offset": [
126.95789617909773,
126.7067908943874
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
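The JSON above is the UI export format. To drive generation from a script, you would instead post a workflow saved via "Save (API Format)" to ComfyUI's `/prompt` endpoint. A minimal sketch, assuming a default local server on port 8188 and a hypothetical file `lumina_t2i_api.json` you exported yourself:

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow the way /prompt expects it."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(path: str, server: str = "http://127.0.0.1:8188") -> bytes:
    """POST a workflow saved in API format to a running ComfyUI server."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    req = urllib.request.Request(
        f"{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # JSON response containing the queued prompt_id

# queue_workflow("lumina_t2i_api.json")  # hypothetical filename
```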
Neta Lumina
Neta-Lumina is an anime-focused fine-tune of Lumina-Image 2.0.
As an anime model it supports Danbooru tags, and it is notable for accepting prompts in multiple languages, including Chinese, English, and Japanese.
Model Download
- diffusion_models

```
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        └── neta-lumina-v1.0.safetensors
```
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n\n1girl, solo, red hair, long hair, wet hair, red spider lily, flower in mouth, casual clothes, white blouse, light cardigan, floating on water, water surface, gentle ripples, lying on back, upper body, top-down view, detailed eyes, cinematic anime style, high-end anime, refined lineart, subtle shading, soft glow, morning, sunrise, golden hour lighting, sparkling water, light particles, best quality,\nA cinematic, high-quality anime illustration of a red-haired young woman floating quietly on the surface of calm water at sunrise, viewed from above in a medium shot. She wears simple, modern clothing—a white blouse layered with a light cardigan—that clings slightly to her as the fabric is soaked, giving a natural sense of weight and texture without feeling eroticized. A vivid red spider lily rests gently between her lips, its petals contrasting against her pale skin and soft, wet hair that fans out around her in the water. Warm golden-hour sunlight streams in from one side, scattering fine sparkles across the water surface and creating delicate bokeh-like highlights around her face and shoulders. The shading and coloring are polished like a high-budget anime film, with refined linework, nuanced gradients, and carefully rendered reflections. Subtle ripples expand from her body, and the overall composition focuses on a serene, poetic mood with precise detail in her expression, hair, clothing folds, and the spider lily."
]
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate low-quality images based on textual prompts <Prompt Start>\nblurry, worst quality, low quality, deformed hands, bad anatomy,\nextra limbs, poorly drawn face, mutated, extra eyes, bad proportions"
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
123456,
"fixed",
30,
5.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\neta-lumina-v1.0.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
380.99299999999994,
449.66200000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-54.62550631501746,
-44.09894580183908
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [neta-lumina-v1.0.safetensors](https://huggingface.co/neta-art/Neta-Lumina/blob/main/Unet/neta-lumina-v1.0.safetensors)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── neta-lumina-v1.0.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
154.62550631501745,
144.09894580183908
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- For the sampler, follow the official settings: use res_multistep / linear_quadratic.

Prompts are a bit unusual: you need to write a system prompt before the text you actually want to generate.

```
You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>
1girl, portrait, ...
```
Please refer to the official Prompt Book for details.
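Since the fixed system-prompt prefix is easy to forget, a tiny helper can keep it consistent (a sketch; the prefix is taken from the example above, the helper itself is hypothetical):

```python
# Fixed prefix required by Neta Lumina, taken from the workflow above.
SYSTEM_PROMPT = (
    "You are an assistant designed to generate anime images "
    "based on textual prompts. <Prompt Start>"
)

def neta_prompt(user_prompt: str) -> str:
    """Prefix a Neta Lumina prompt with the required system prompt."""
    return f"{SYSTEM_PROMPT}\n{user_prompt.strip()}"

print(neta_prompt("1girl, portrait, red hair, best quality"))
```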
NetaYume Lumina
There is also a model called NetaYume Lumina, fine-tuned further on top of Neta Lumina, so I will introduce it as well.
Model Download
- diffusion_models

```
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        └── NetaYumev4_unet.safetensors
```
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate low-quality images based on textual prompts <Prompt Start>\nblurry, worst quality, low quality, deformed hands, bad anatomy,\nextra limbs, poorly drawn face, mutated, extra eyes, bad proportions"
]
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
380.99299999999994,
449.66200000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\NetaYumev4_unet.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n\n1girl, solo, white hair, long hair, wet hair, white spider lily, flowers on water, simple white dress, long dress, floating on red water, crimson sea, side view, close-up, face focus, tilted angle, diagonal composition, detailed eyes, cinematic anime style, high-end anime, refined lineart, dramatic lighting, glowing reflections, best quality,\nA cinematic, high-quality anime illustration of a white-haired young woman floating in a calm crimson sea at twilight, shown in a close-up side view along the water’s surface. She wears a modest, simple long white dress that spreads softly in the red water, its wet fabric drifting and folding with gentle motion. Several delicate white spider lilies float on the surface around her, some catching on the hem of her dress and near her shoulder, their pale petals forming a striking contrast against the deep red sea. The composition uses a slightly tilted, diagonal angle so that her face and the waterline create a dynamic, film-like frame, with her serene expression and detailed eyes as the main focus. Dramatic but controlled lighting makes the red water glow with subtle highlights and reflections, while soft specular light traces the contours of her face, hair, and dress. The rendering style resembles a high-budget anime film, with refined linework, nuanced gradients, and carefully painted reflections, emphasizing the interplay of white and crimson and the quiet, otherworldly atmosphere of the scene."
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
7777,
"fixed",
30,
5.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-54.62550631501746,
-44.09894580183908
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [NetaYumev4_unet.safetensors](https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0/blob/main/Unet/v4/NetaYumev4_unet.safetensors)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── NetaYumev4_unet.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917354,
"offset": [
154.62550631501745,
144.09894580183908
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
NewBie image Exp0.1
NewBie-image (Exp0.1) is an anime-focused text-to-image model built on its own NewBie architecture, which is designed on top of Next-DiT and incorporates insights from Lumina architecture research. It uses a more powerful text-encoder stack (Gemma 3 4B plus Jina CLIP v2) and enables more detailed control via XML-formatted prompts (structured tags).
Note that this model is only about 20% trained at this point, so the workflow may change with future updates.
Model Download
- diffusion_models
- text_encoders
- vae

```
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── NewBie-Image-Exp0.1-bf16.safetensors
    ├── 📂text_encoders/
    │   ├── gemma_3_4b_it_bf16.safetensors
    │   └── jina_clip_v2_bf16.safetensors
    └── 📂vae/
        └── ae.safetensors
```
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 49,
"last_link_id": 70,
"nodes": [
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\NewBie-Image-Exp0.1-bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 70
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"<e621_tags>furry</e621_tags>\n\n<danbooru_tags>\n furry, english_text, chinese_text, korean_text, speech_bubble, logo, signature, watermark, web_address,\n artist_name, character_name, copyright_name, twitter_username,\n dated, low_score, worst_quality, low_quality, bad_quality, lowres, blurry, blurred, pixelated,\n compression_artifacts, jpeg_artifacts,\n bad_anatomy, deformed_hands, deformed_fingers, fused_fingers, missing_fingers,\n extra_limbs, extra_arms, extra_legs, extra_fingers, extra_digits,\n wrong_hands, ugly_hands, bad_proportions, poorly_drawn_face, extra_eyes, mutated\n</danbooru_tags>\n\n<resolution>low_resolution</resolution>\n"
]
},
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1536,
1
]
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 49,
"type": "DualCLIPLoader",
"pos": [
177.29417778458955,
290.9165148987406
],
"size": [
279.4214876033057,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
69,
70
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.5.1",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"gemma_3_4b_it_bf16.safetensors",
"jina_clip_v2_bf16.safetensors",
"newbie",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
7777,
"fixed",
30,
4.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
353.4448074477016,
493.73916861441126
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 69
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"<character_1>\n <n>character_1</n>\n <gender>1girl</gender>\n <appearance>\n solo, black_hair, long_hair, wet_hair, floating_hair,\n sharp_eyes, intense_gaze\n </appearance>\n <clothing>\n white_dress, wet_clothes\n </clothing>\n <expression>\n serious, stoic, closed_mouth\n </expression>\n <action>\n underwater, sinking, bubbles, bubble_trail, water_droplets\n </action>\n <position>\n portrait, upper_body, dynamic_angle, dutch_angle, diagonal_composition\n </position>\n</character_1>\n\n<general_tags>\n <style>\n anime_style, key_visual, official_art, illustration,\n refined_lineart, clean_lineart, high_contrast\n </style>\n <background>\n underwater, deep_blue_water, water_surface, waterline,\n caustics, light_rays, reflections\n </background>\n <atmosphere>\n cool, dramatic, cinematic, ethereal\n </atmosphere>\n <quality>\n masterpiece, best_quality, very_aesthetic, no_text\n </quality>\n <resolution>max_high_resolution</resolution>\n</general_tags>\n"
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-77.16495034206478,
-42.42937518851968
],
"size": [
344.97886071896573,
240.85554679796087
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n* [NewBie-Image-Exp0.1-bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/diffusion_models/NewBie-Image-Exp0.1-bf16.safetensors)\n* [gemma_3_4b_it_bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/text_encoders/gemma_3_4b_it_bf16.safetensors)\n* [jina_clip_v2_bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/text_encoders/jina_clip_v2_bf16.safetensors)\n* [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── NewBie-Image-Exp0.1-bf16.safetensors\n ├── 📂text_encoders/\n │ ├── gemma_3_4b_it_bf16.safetensors\n │ └── jina_clip_v2_bf16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
],
[
69,
49,
0,
6,
0,
"CLIP"
],
[
70,
49,
0,
33,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
308.41858706005223,
258.7552199824561
]
},
"frontendVersion": "1.36.7",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Prompts in XML format (structured with tags) are recommended.
```xml
<general_tags>
  <style>
    anime_style, key_visual, official_art, illustration,
    refined_lineart, clean_lineart, high_contrast
  </style>
  <background>
    underwater, deep_blue_water, water_surface, waterline,
    caustics, light_rays, reflections
  </background>
</general_tags>
```
That said, it can generate images without problems even from plain natural-language prompts, so feel free to just try it first.
Please refer to the official prompt guide for details.
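If you script prompt generation, the structured format can also be assembled mechanically. A minimal sketch (tag names are taken from the workflow above; the helper itself is hypothetical):

```python
def xml_block(tag: str, children: dict[str, str]) -> str:
    """Render one level of a NewBie-style structured-tag prompt."""
    inner = "\n".join(
        f"  <{name}>{value}</{name}>" for name, value in children.items()
    )
    return f"<{tag}>\n{inner}\n</{tag}>"

prompt = xml_block("general_tags", {
    "style": "anime_style, key_visual, refined_lineart",
    "background": "underwater, caustics, light_rays",
    "quality": "masterpiece, best_quality",
})
print(prompt)
```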