stabilityai/stable-diffusion-3-medium · Jetpack 6 +fp8 sd3 returns zeroes

Jul 7

I am abroad with my trusty suitcase AGX Orin, attempting to run some tests requested by a client. I am running a fairly simple adaptation of the hello world example on the fp8 safetensors model. Everything appears to run without issue, except that the resulting image is only zeroes.

My code:

import torch
from diffusers import StableDiffusion3Pipeline
import cv2
import numpy as np


model_path = "/usr/src/sd3/sd3_medium_incl_clips_t5xxlfp8.safetensors"
print('loading model')
pipe = StableDiffusion3Pipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
)
print('model loaded')
pipe = pipe.to('cuda')


image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=10,
    guidance_scale=7.0,
).images[0]

image.save('image.jpg')
np_image = np.array(image)
cv_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
cv2.imshow("Generated Image", cv_image)

cv2.waitKey(0)
cv2.destroyAllWindows()
pass  # debug hook
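
To confirm that the pipeline output itself is blank, rather than something going wrong in the OpenCV display step, the image can be checked before it is saved. This is a minimal sketch, assuming the pipeline returns a PIL image as in the script above; `is_all_black` is a hypothetical helper name, not part of diffusers:

```python
def is_all_black(extrema):
    """Return True if every band's (min, max) pair is (0, 0).

    `extrema` is the value returned by PIL's Image.getextrema():
    a single (min, max) tuple for one-band images, or a tuple of
    (min, max) tuples for multi-band images such as RGB.
    """
    # Normalize the single-band case to a one-element tuple of pairs.
    if len(extrema) == 2 and not isinstance(extrema[0], tuple):
        extrema = (extrema,)
    return all(lo == 0 and hi == 0 for lo, hi in extrema)


# Usage with the script above (needs the generated PIL image):
# if is_all_black(image.getextrema()):
#     print("pipeline output is entirely zero")
```

If this reports a blank image, the problem is upstream of `cv2.imshow` and the JPEG save.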

My environment:

python:

absl-py==2.1.0
accelerate==0.32.1
AHRS==0.3.1
appdirs==1.4.4
apturl==0.5.2
astunparse==1.6.3
attrs==21.2.0
bcrypt==3.2.0
beniget==0.4.1
blinker==1.4
blobconverter==1.4.3
Brlapi==0.8.3
Brotli==1.0.9
ccsm==0.9.14.1
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
click==8.0.3
colorama==0.4.4
coloredlogs==15.0.1
compizconfig-python==0.9.14.1
cpuset==1.6
cryptography==3.4.8
cupshelpers==1.0
cupy==13.2.0
cycler==0.11.0
dbus-python==1.2.18
decorator==4.4.2
defer==1.0.6
depthai==2.27.0.0
depthai-pipeline-graph==0.0.5
depthai-sdk==1.15.0
diffusers==0.29.2
distlib==0.3.4
distro==1.9.0
distro-info==1.1+ubuntu0.2
duplicity==0.8.21
evdev==1.4.0
fasteners==0.14.1
fastrlock==0.8.2
filelock==3.6.0
flatbuffers==24.3.25
fonttools==4.29.1
fs==2.4.12
fsspec==2024.6.0
future==0.18.2
gast==0.6.0
google-pasta==0.2.0
graphsurgeon==0.4.6
grpcio==1.64.1
h5py==3.11.0
httplib2==0.20.2
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.3
imageio==2.34.2
importlib-metadata==4.6.4
jeepney==0.7.1
jetson-stats @ file:///usr/src/jetson_stats
Jetson.GPIO==2.1.7
Jinja2==3.1.4
keras==3.4.1
keyring==23.5.0
kiwisolver==1.3.2
language-selector==0.1
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy_loader==0.4
libclang==18.1.1
lockfile==0.12.2
louis==3.20.0
lxml==4.8.0
lz4==3.1.3+dfsg
macaroonbakery==1.3.1
Mako==1.1.3
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.17.0
matplotlib==3.5.1
mdurl==0.1.2
ml-dtypes==0.3.2
monotonic==1.6
more-itertools==8.10.0
mpmath==0.0.0
namex==0.0.8
networkx==3.3
numpy==1.23.5
oauthlib==3.2.0
olefile==0.46
onboard==1.4.1
onnx==1.16.0
onnx-graphsurgeon==0.3.12
onnxruntime-gpu @ file:///usr/src/onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl
opencv-contrib-python==4.10.0.84
opt-einsum==3.3.0
optree==0.11.0
packaging==21.3
pandas==1.3.5
paramiko==2.9.3
pexpect==4.8.0
pillow==10.3.0
platformdirs==2.5.1
ply==3.11
protobuf==4.25.3
psutil==6.0.0
ptyprocess==0.7.0
pycairo==1.20.1
pycups==2.0.1
Pygments==2.18.0
PyGObject==3.42.1
PyJWT==2.3.0
pymacaroons==0.13.0
PyNaCl==1.5.0
PyOpenGL==3.1.5
pyparsing==2.4.7
PyQt5==5.15.6
PyQt5-sip==12.9.1
pyRFC3339==1.1
python-apt==2.4.0+ubuntu3
python-dateutil==2.8.1
python-dbusmock==0.27.5
python-debian==0.1.43+ubuntu1.1
pythran==0.10.0
pytube==15.0.0
PyTurboJPEG==1.6.4
pytz==2022.1
pyxdg==0.27
PyYAML==5.4.1
Qt.py==1.4.1
quickdl==0.0.2
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.14.0
SecretStorage==3.3.1
sentencepiece==0.2.0
sentry-sdk==1.21.0
six==1.16.0
smbus2==0.4.3
sympy==1.9
systemd-python==234
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow @ file:///usr/src/wheels/tensorflow-2.16.1%2Bnv24.06-cp310-cp310-linux_aarch64.whl
tensorflow-io-gcs-filesystem==0.37.0
tensorrt==8.6.2
tensorrt-dispatch==8.6.2
tensorrt-lean==8.6.2
termcolor==2.4.0
tifffile==2024.6.18
tokenizers==0.19.1
torch @ file:///usr/src/wheels/torch-2.3.0-cp310-cp310-linux_aarch64.whl
torchaudio @ file:///usr/src/wheels/torchaudio-2.3.0%2B952ea74-cp310-cp310-linux_aarch64.whl
torchvision @ file:///usr/src/wheels/torchvision-0.18.0a0%2B6043bc2-cp310-cp310-linux_aarch64.whl
tqdm==4.66.4
transformers==4.42.3
types-pyside2==5.15.2.1.7
typing_extensions==4.12.2
ubuntu-drivers-common==0.0.0
ubuntu-pro-client==8001
uff==0.6.9
ufoLib2==0.13.1
ufw==0.36.1
unicodedata2==14.0.0
urllib3==2.2.2
urwid==2.1.2
virtualenv==20.13.0+ds
wadllib==1.3.6
Werkzeug==3.0.3
wrapt==1.16.0
xdg==5
xkit==0.0.0
xmltodict==0.13.0
zipp==1.0.0

torch, tensorflow and onnx are running from the official wheels supplied by NVIDIA for JetPack 6.

I have experimented with inference step values from 8 to 30 with no change. I suspect that something goes wrong when diffusers loads the fp8 quantized model as fp16. Some guidance would be helpful.
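
One way to narrow this down is to find the denoising step at which NaN values first appear in the latents. The sketch below assumes the pipeline's `callback_on_step_end` hook, which is called after each step with a dict containing the current `latents` and must return that dict. `make_nan_recorder` and `first_nan_step` are hypothetical helper names:

```python
def make_nan_recorder(log):
    """Return a step-end callback that appends (step, has_nan) to `log`."""

    def on_step_end(pipeline, step, timestep, callback_kwargs):
        latents = callback_kwargs["latents"]
        # Tensor.isnan().any() yields a 0-dim bool tensor; .item() makes it a Python bool.
        has_nan = bool(latents.isnan().any().item())
        log.append((step, has_nan))
        # The pipeline expects the (possibly modified) kwargs back.
        return callback_kwargs

    return on_step_end


def first_nan_step(log):
    """Return the first step index whose latents contained NaN, or None."""
    for step, has_nan in log:
        if has_nan:
            return step
    return None


# Usage with an already loaded pipeline (assumption: `pipe` as in the script above):
# log = []
# pipe("A cat holding a sign that says hello world",
#      num_inference_steps=10,
#      callback_on_step_end=make_nan_recorder(log))
# print(first_nan_step(log))
```

If step 0 already reports NaN, the transformer or the text encoders are the more likely suspects. If no step reports NaN but the image is still blank, the VAE decode would be the next place to look.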

Manbehindthemadness

Jul 7

Update: I have now tried using both the fp8 and the fp16 safetensors with the same result mentioned above.
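
Since both checkpoints were loaded with `torch_dtype=torch.float16`, the compute dtype is one factor the two runs have in common. As a diagnostic, not a confirmed fix, the same checkpoint could be loaded with other dtypes to see whether the blank output follows fp16. This is a minimal sketch; `load_sd3` and `SUPPORTED_DTYPES` are hypothetical names, and whether bfloat16 or float32 fits in memory on a given Orin is an assumption to verify:

```python
SUPPORTED_DTYPES = ("float16", "bfloat16", "float32")


def load_sd3(model_path, dtype_name):
    """Load the single-file SD3 checkpoint with the requested compute dtype."""
    if dtype_name not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype_name!r}")
    # Imported lazily so the name check above works without the GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_single_file(
        model_path,
        torch_dtype=getattr(torch, dtype_name),
    )
    return pipe.to("cuda")


# Usage (one dtype at a time, to avoid holding several pipelines in memory):
# pipe = load_sd3("/usr/src/sd3/sd3_medium_incl_clips_t5xxlfp8.safetensors", "bfloat16")
```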

edgechangfu

Jul 8

I got the same result! Full black in the image.

Manbehindthemadness

Jul 8

I am now attempting a full and clean pull from the hub to see if the problem is related to opening the model as a single file, or some other compatibility element that hasn’t become clear.

edgechangfu

Jul 8

> I am now attempting a full and clean pull from the hub to see if the problem is related to opening the model as a single file, or some other compatibility element that hasn’t become clear.

Ok. Thank you for your sharing. If you solve it, please let me know.

Manbehindthemadness

Jul 8

Same result: an image matrix of only zeroes (this time using the exact demo code).
Output:

Fetching 26 files: 100%|██████████| 26/26 [56:36<00:00, 130.62s/it]
Loading pipeline components...: 22%|██▏ | 2/9 [00:02<00:07, 1.04s/it]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 33%|███▎ | 3/9 [00:02<00:04, 1.48it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:00<00:00, 2.02it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.97it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:04<00:00, 2.01it/s]
100%|██████████| 1/1 [00:01<00:00, 1.37s/it]
100%|██████████| 50/50 [00:47<00:00, 1.04it/s]

Manbehindthemadness

Jul 8

@edgechangfu Can you downgrade to diffusers==0.29.0 and try again?

Output:

/usr/src/sd3/venv/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Loading pipeline components...: 56%|█████▌ | 5/9 [00:02<00:01, 2.10it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 67%|██████▋ | 6/9 [00:02<00:01, 2.48it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:00<00:00, 1.71it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.72it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:03<00:00, 2.26it/s]
0%| | 0/10 [00:00<?, ?it/s]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
10%|█ | 1/10 [00:01<00:12, 1.36s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
20%|██ | 2/10 [00:02<00:09, 1.19s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
30%|███ | 3/10 [00:03<00:07, 1.13s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
40%|████ | 4/10 [00:04<00:06, 1.10s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
50%|█████ | 5/10 [00:05<00:05, 1.08s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
60%|██████ | 6/10 [00:06<00:04, 1.06s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
70%|███████ | 7/10 [00:07<00:03, 1.05s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
80%|████████ | 8/10 [00:08<00:02, 1.04s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
90%|█████████ | 9/10 [00:09<00:01, 1.04s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
100%|██████████| 10/10 [00:10<00:00, 1.07s/it]

Manbehindthemadness

Jul 8

@edgechangfu please get back to me on this when you can. If that solution works, we need to move this over to: https://github.com/huggingface/diffusers

Manbehindthemadness

Jul 9

After some testing, the results are actually weirder than I thought. I can "fiddle" with the settings for a horribly long time, and then suddenly it works, works well, and works consistently.
Then I unload the model, go about my business, and load it back for more testing, and it no longer works. I then have to repeat the whole process. Really quite strange.
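
To tell apart a failure that depends on the random seed from one that depends on process or device state, the same seeds can be run several times and the outcomes compared. This is a minimal sketch; `run_trials` and `classify` are hypothetical helpers, and the commented usage assumes the pipeline accepts a `generator` argument for seeding:

```python
def run_trials(generate_is_blank, seeds, repeats=2):
    """Call generate_is_blank(seed) `repeats` times per seed and collect the results."""
    results = {}
    for seed in seeds:
        results[seed] = [bool(generate_is_blank(seed)) for _ in range(repeats)]
    return results


def classify(results):
    """Label each seed 'ok', 'blank', or 'flaky' from its list of outcomes."""
    labels = {}
    for seed, outcomes in results.items():
        if all(outcomes):
            labels[seed] = "blank"
        elif not any(outcomes):
            labels[seed] = "ok"
        else:
            # Same seed, different outcome: the seed alone does not explain it.
            labels[seed] = "flaky"
    return labels


# Usage with an already loaded pipeline (assumption: `pipe` is on CUDA):
# import torch
# def generate_is_blank(seed):
#     gen = torch.Generator(device="cuda").manual_seed(seed)
#     image = pipe("A cat holding a sign that says hello world",
#                  num_inference_steps=10, generator=gen).images[0]
#     return image.getextrema() == ((0, 0), (0, 0), (0, 0))
# print(classify(run_trials(generate_is_blank, seeds=[0, 1, 2])))
```

A seed labeled "flaky" within one process, or one that flips between "ok" and "blank" across model reloads, would point away from the prompt and sampler settings and toward the runtime state.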

hnhparitosh

Jul 21

Hi, @Manbehindthemadness @edgechangfu I am also facing the same issue.
Were you able to find a fix/workaround for this?

Manbehindthemadness

Jul 21

> Hi, @Manbehindthemadness @edgechangfu I am also facing the same issue.
> Were you able to find a fix/workaround for this?

I am sorry, I have not. I’m still getting an output of NaN, and I have had to put my project on hold for now.