What is Lumina-Image 2.0?
Lumina-Image 2.0 is a 2.6B-parameter image generation model that combines a Unified Next-DiT backbone with a Flux-family VAE.
It uses a Gemma 2B-based text encoder, and the model itself is considerably smaller than SD3 or FLUX Pro; like AuraFlow, it is positioned as a relatively lightweight base model that is easy to use day to day. Despite its small size it shows strong prompt adherence, and it has drawn attention as one of the candidates for a next-generation base model.
Downloading the models
- diffusion_models
- text_encoders
- vae
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── lumina_2_model_bf16.safetensors
    ├── 📂text_encoders/
    │   └── gemma_2_2b_fp16.safetensors
    └── 📂vae/
        └── ae.safetensors
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"worst quality"
]
},
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"A whimsical 3D illustration of flowers with bulbous red petals and smooth green stems. Soft, diffused lighting and a clean, off-white background"
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
777,
"fixed",
25,
4,
"res_multistep",
"normal",
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\lumina_2_model_bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-26.957896179097734,
-25.496790894387402
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [lumina_2_model_bf16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/diffusion_models)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── lumina_2_model_bf16.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
393.70000000000005,
455.90000000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917354,
"offset": [
126.95789617909773,
126.7067908943874
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
Neta Lumina
Neta-Lumina is an anime-oriented fine-tune built on Lumina-Image 2.0.
It has a distinctly anime-model look, supports Danbooru tags, and notably accepts multilingual prompts in Chinese, English, and Japanese.
Downloading the model
- diffusion_models
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        └── neta-lumina-v1.0.safetensors
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n\n1girl, solo, red hair, long hair, wet hair, red spider lily, flower in mouth, casual clothes, white blouse, light cardigan, floating on water, water surface, gentle ripples, lying on back, upper body, top-down view, detailed eyes, cinematic anime style, high-end anime, refined lineart, subtle shading, soft glow, morning, sunrise, golden hour lighting, sparkling water, light particles, best quality,\nA cinematic, high-quality anime illustration of a red-haired young woman floating quietly on the surface of calm water at sunrise, viewed from above in a medium shot. She wears simple, modern clothing—a white blouse layered with a light cardigan—that clings slightly to her as the fabric is soaked, giving a natural sense of weight and texture without feeling eroticized. A vivid red spider lily rests gently between her lips, its petals contrasting against her pale skin and soft, wet hair that fans out around her in the water. Warm golden-hour sunlight streams in from one side, scattering fine sparkles across the water surface and creating delicate bokeh-like highlights around her face and shoulders. The shading and coloring are polished like a high-budget anime film, with refined linework, nuanced gradients, and carefully rendered reflections. Subtle ripples expand from her body, and the overall composition focuses on a serene, poetic mood with precise detail in her expression, hair, clothing folds, and the spider lily."
]
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate low-quality images based on textual prompts <Prompt Start>\nblurry, worst quality, low quality, deformed hands, bad anatomy,\nextra limbs, poorly drawn face, mutated, extra eyes, bad proportions"
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
123456,
"fixed",
30,
5.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\neta-lumina-v1.0.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
380.99299999999994,
449.66200000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-54.62550631501746,
-44.09894580183908
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [neta-lumina-v1.0.safetensors](https://huggingface.co/neta-art/Neta-Lumina/blob/main/Unet/neta-lumina-v1.0.safetensors)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── neta-lumina-v1.0.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6830134553650705,
"offset": [
154.62550631501745,
144.09894580183908
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
- The sampler follows the official settings: res_multistep / linear_quadratic.
The prompt format is a little unusual: you need to place a system prompt before the text you actually want the model to draw.
You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>
1girl, portrait, ...
For details, see the official Prompt Book.
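The prompt convention above is easy to get wrong by hand, so it can help to build the string programmatically. A minimal sketch — the helper name is our own, and the system-prompt wording is taken verbatim from the example above:

```python
# Prepend the Neta Lumina system prompt to the tags you actually want drawn.
# The "<Prompt Start>" marker and wording follow the official example;
# build_neta_prompt itself is an illustrative helper, not an official API.
SYSTEM_PROMPT = (
    "You are an assistant designed to generate anime images "
    "based on textual prompts."
)

def build_neta_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return the full prompt string in the format Neta Lumina expects."""
    return f"{system_prompt} <Prompt Start>\n{user_prompt}"

print(build_neta_prompt("1girl, portrait, best quality"))
```

Paste the resulting string into the positive-prompt CLIP Text Encode node as one block.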
NetaYume Lumina
There is also a model called NetaYume Lumina, a further fine-tune built on top of Neta Lumina.
While we are at it, let's cover it here as well.
Downloading the model
- diffusion_models
📂ComfyUI/
└── 📂models/
    └── 📂diffusion_models/
        └── NetaYumev4_unet.safetensors
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 47,
"last_link_id": 68,
"nodes": [
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1024,
1
]
},
{
"id": 44,
"type": "CLIPLoader",
"pos": [
188.4966278076172,
274.4528503417969
],
"size": [
270,
106
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
63,
64
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "CLIPLoader"
},
"widgets_values": [
"gemma_2_2b_fp16.safetensors",
"lumina2",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 64
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate low-quality images based on textual prompts <Prompt Start>\nblurry, worst quality, low quality, deformed hands, bad anatomy,\nextra limbs, poorly drawn face, mutated, extra eyes, bad proportions"
]
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
380.99299999999994,
449.66200000000003
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\NetaYumev4_unet.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 63
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n\n1girl, solo, white hair, long hair, wet hair, white spider lily, flowers on water, simple white dress, long dress, floating on red water, crimson sea, side view, close-up, face focus, tilted angle, diagonal composition, detailed eyes, cinematic anime style, high-end anime, refined lineart, dramatic lighting, glowing reflections, best quality,\nA cinematic, high-quality anime illustration of a white-haired young woman floating in a calm crimson sea at twilight, shown in a close-up side view along the water’s surface. She wears a modest, simple long white dress that spreads softly in the red water, its wet fabric drifting and folding with gentle motion. Several delicate white spider lilies float on the surface around her, some catching on the hem of her dress and near her shoulder, their pale petals forming a striking contrast against the deep red sea. The composition uses a slightly tilted, diagonal angle so that her face and the waterline create a dynamic, film-like frame, with her serene expression and detailed eyes as the main focus. Dramatic but controlled lighting makes the red water glow with subtle highlights and reflections, while soft specular light traces the contours of her face, hair, and dress. The rendering style resembles a high-budget anime film, with refined linework, nuanced gradients, and carefully painted reflections, emphasizing the interplay of white and crimson and the quiet, otherworldly atmosphere of the scene."
]
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
7777,
"fixed",
30,
5.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-54.62550631501746,
-44.09894580183908
],
"size": [
309.9175109863281,
228.3336181640625
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n- [NetaYumev4_unet.safetensors](https://huggingface.co/duongve/NetaYume-Lumina-Image-2.0/blob/main/Unet/v4/NetaYumev4_unet.safetensors)\n- [gemma_2_2b_fp16.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/text_encoders)\n- [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/tree/main/split_files/vae)\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── NetaYumev4_unet.safetensors\n ├── 📂text_encoders/\n │ └── gemma_2_2b_fp16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
63,
44,
0,
6,
0,
"CLIP"
],
[
64,
44,
0,
33,
0,
"CLIP"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.8264462809917354,
"offset": [
154.62550631501745,
144.09894580183908
]
},
"frontendVersion": "1.35.0",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
NewBie image Exp0.1
NewBie-image (Exp0.1) is an anime-oriented T2I model with NewBie's own architecture, built on a Next-DiT foundation and informed by research on the Lumina architecture. It uses stronger text encoders and is designed for finer-grained control through XML-style prompts (structured tags).
This model has only completed about 20% of its training, so the workflow may change with future updates.
Downloading the models
- diffusion_models
- text_encoders
- vae
📂ComfyUI/
└── 📂models/
    ├── 📂diffusion_models/
    │   └── NewBie-Image-Exp0.1-bf16.safetensors
    ├── 📂text_encoders/
    │   ├── gemma_3_4b_it_bf16.safetensors
    │   └── jina_clip_v2_bf16.safetensors
    └── 📂vae/
        └── ae.safetensors
text2image
{
"id": "18404b37-92b0-4d11-a39c-ae941838eb83",
"revision": 0,
"last_node_id": 49,
"last_link_id": 70,
"nodes": [
{
"id": 43,
"type": "VAELoader",
"pos": [
985.1763763427734,
88.72033833561756
],
"size": [
234.05543518066406,
58
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
62
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
],
"color": "#322",
"bgcolor": "#533"
},
{
"id": 8,
"type": "VAEDecode",
"pos": [
1246.2337646484375,
211.26541137695312
],
"size": [
170,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 52
},
{
"name": "vae",
"type": "VAE",
"link": 62
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"slot_index": 0,
"links": [
68
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 41,
"type": "UNETLoader",
"pos": [
310.51028121398076,
36.66623591530498
],
"size": [
270,
82
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"Lumina\\NewBie-Image-Exp0.1-bf16.safetensors",
"default"
],
"color": "#323",
"bgcolor": "#535"
},
{
"id": 33,
"type": "CLIPTextEncode",
"pos": [
507,
378
],
"size": [
339.84503173828125,
102.47611236572266
],
"flags": {
"collapsed": false
},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 70
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
55
]
}
],
"title": "CLIP Text Encode (Negative Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"<e621_tags>furry</e621_tags>\n\n<danbooru_tags>\n furry, english_text, chinese_text, korean_text, speech_bubble, logo, signature, watermark, web_address,\n artist_name, character_name, copyright_name, twitter_username,\n dated, low_score, worst_quality, low_quality, bad_quality, lowres, blurry, blurred, pixelated,\n compression_artifacts, jpeg_artifacts,\n bad_anatomy, deformed_hands, deformed_fingers, fused_fingers, missing_fingers,\n extra_limbs, extra_arms, extra_legs, extra_fingers, extra_digits,\n wrong_hands, ugly_hands, bad_proportions, poorly_drawn_face, extra_eyes, mutated\n</danbooru_tags>\n\n<resolution>low_resolution</resolution>\n"
]
},
{
"id": 27,
"type": "EmptySD3LatentImage",
"pos": [
579.1014404296875,
547
],
"size": [
267.74359130859375,
106
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
51
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "EmptySD3LatentImage"
},
"widgets_values": [
1024,
1536,
1
]
},
{
"id": 45,
"type": "ModelSamplingAuraFlow",
"pos": [
613.4235101781512,
36.89046000588115
],
"size": [
233.42152156013003,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 65
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.41",
"Node name for S&R": "ModelSamplingAuraFlow"
},
"widgets_values": [
6.000000000000001
]
},
{
"id": 49,
"type": "DualCLIPLoader",
"pos": [
177.29417778458955,
290.9165148987406
],
"size": [
279.4214876033057,
130
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
69,
70
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.5.1",
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"gemma_3_4b_it_bf16.safetensors",
"jina_clip_v2_bf16.safetensors",
"newbie",
"default"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 31,
"type": "KSampler",
"pos": [
904.2318115234375,
210.53184509277344
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 66
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 67
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 51
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"slot_index": 0,
"links": [
52
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "KSampler"
},
"widgets_values": [
7777,
"fixed",
30,
4.5,
"res_multistep",
"linear_quadratic",
1
]
},
{
"id": 47,
"type": "SaveImage",
"pos": [
1442.129204705959,
211.26541137695312
],
"size": [
353.4448074477016,
493.73916861441126
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 68
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.76"
},
"widgets_values": [
"ComfyUI"
]
},
{
"id": 6,
"type": "CLIPTextEncode",
"pos": [
507,
190
],
"size": [
339.84503173828125,
123.01304626464844
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 69
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"slot_index": 0,
"links": [
67
]
}
],
"title": "CLIP Text Encode (Positive Prompt)",
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.39",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"<character_1>\n <n>character_1</n>\n <gender>1girl</gender>\n <appearance>\n solo, black_hair, long_hair, wet_hair, floating_hair,\n sharp_eyes, intense_gaze\n </appearance>\n <clothing>\n white_dress, wet_clothes\n </clothing>\n <expression>\n serious, stoic, closed_mouth\n </expression>\n <action>\n underwater, sinking, bubbles, bubble_trail, water_droplets\n </action>\n <position>\n portrait, upper_body, dynamic_angle, dutch_angle, diagonal_composition\n </position>\n</character_1>\n\n<general_tags>\n <style>\n anime_style, key_visual, official_art, illustration,\n refined_lineart, clean_lineart, high_contrast\n </style>\n <background>\n underwater, deep_blue_water, water_surface, waterline,\n caustics, light_rays, reflections\n </background>\n <atmosphere>\n cool, dramatic, cinematic, ethereal\n </atmosphere>\n <quality>\n masterpiece, best_quality, very_aesthetic, no_text\n </quality>\n <resolution>max_high_resolution</resolution>\n</general_tags>\n"
]
},
{
"id": 46,
"type": "MarkdownNote",
"pos": [
-77.16495034206478,
-42.42937518851968
],
"size": [
344.97886071896573,
240.85554679796087
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [],
"outputs": [],
"properties": {},
"widgets_values": [
"## models\n* [NewBie-Image-Exp0.1-bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/diffusion_models/NewBie-Image-Exp0.1-bf16.safetensors)\n* [gemma_3_4b_it_bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/text_encoders/gemma_3_4b_it_bf16.safetensors)\n* [jina_clip_v2_bf16.safetensors](https://huggingface.co/Comfy-Org/NewBie-image-Exp0.1_repackaged/blob/main/split_files/text_encoders/jina_clip_v2_bf16.safetensors)\n* [ae.safetensors](https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged/blob/main/split_files/vae/ae.safetensors)\n\n```\n📂ComfyUI/\n└──📂models/\n ├── 📂diffusion_models/\n │ └── NewBie-Image-Exp0.1-bf16.safetensors\n ├── 📂text_encoders/\n │ ├── gemma_3_4b_it_bf16.safetensors\n │ └── jina_clip_v2_bf16.safetensors\n └── 📂vae/\n └── ae.safetensors\n\n```"
],
"color": "#323",
"bgcolor": "#535"
}
],
"links": [
[
51,
27,
0,
31,
3,
"LATENT"
],
[
52,
31,
0,
8,
0,
"LATENT"
],
[
55,
33,
0,
31,
2,
"CONDITIONING"
],
[
62,
43,
0,
8,
1,
"VAE"
],
[
65,
41,
0,
45,
0,
"MODEL"
],
[
66,
45,
0,
31,
0,
"MODEL"
],
[
67,
6,
0,
31,
1,
"CONDITIONING"
],
[
68,
8,
0,
47,
0,
"IMAGE"
],
[
69,
49,
0,
6,
0,
"CLIP"
],
[
70,
49,
0,
33,
0,
"CLIP"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.9090909090909091,
"offset": [
308.41858706005223,
258.7552199824561
]
},
"frontendVersion": "1.36.7",
"VHS_latentpreview": false,
"VHS_latentpreviewrate": 0,
"VHS_MetadataImage": true,
"VHS_KeepIntermediate": true
},
"version": 0.4
}
XML-style prompts (structured with tags) are recommended.
<general_tags>
  <style>
    anime_style, key_visual, official_art, illustration,
    refined_lineart, clean_lineart, high_contrast
  </style>
  <background>
    underwater, deep_blue_water, water_surface, waterline,
    caustics, light_rays, reflections
  </background>
</general_tags>
That said, plain natural-language prompts also generate without any problem, so feel free to start casually with whichever style suits you.
For details, see the official prompt guide.
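The structured-tag format above can also be assembled from plain data, which keeps the tags consistent across prompts. A minimal sketch — the group names (`style`, `background`, …) come from the example above, but the helper itself is illustrative, not an official API:

```python
# Assemble a NewBie-style <general_tags> block from a dict of tag groups.
# Group names mirror the example in the text; the function is our own helper.
def build_general_tags(groups: dict[str, str]) -> str:
    inner = "\n".join(
        f"  <{name}>\n    {tags}\n  </{name}>" for name, tags in groups.items()
    )
    return f"<general_tags>\n{inner}\n</general_tags>"

prompt = build_general_tags({
    "style": "anime_style, key_visual, refined_lineart",
    "background": "underwater, caustics, light_rays",
})
print(prompt)
```

The resulting block can be pasted directly into the positive-prompt node, alongside any `<character_1>` block.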