JetPack 6 + fp8 SD3 returns zeroes

#161
by Manbehindthemadness - opened

I am abroad with my trusty suitcase AGX Orin, attempting to run some tests requested by a client. I am running a fairly simple adaptation of the hello-world example on the fp8 safetensors model. Everything appears to run without issue, except that the resulting image contains only zeroes.

My code:

import torch
from diffusers import StableDiffusion3Pipeline
import cv2
import numpy as np


# Single-file SD3 Medium checkpoint that bundles the CLIP and T5-XXL (fp8) text encoders
model_path = "/usr/src/sd3/sd3_medium_incl_clips_t5xxlfp8.safetensors"
print('loading model')
pipe = StableDiffusion3Pipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,  # load the pipeline components in fp16
)
print('model loaded')
pipe = pipe.to('cuda')


# Run the text-to-image pipeline and keep the first generated PIL image
image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=10,
    guidance_scale=7.0,
).images[0]

image.save('image.jpg')
# PIL images are RGB; OpenCV expects BGR for display
np_image = np.array(image)
cv_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
cv2.imshow("Generated Image", cv_image)

cv2.waitKey(0)  # wait for a key press before closing the window
cv2.destroyAllWindows()
pass  # debug hook
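To confirm that the saved image really is all zeroes (rather than a very dark but otherwise valid image), a quick check on the NumPy array can help. `summarize_image` below is a hypothetical helper, not part of diffusers; it only assumes the image has already been converted with `np.array(image)` as in the script above, and it also counts NaNs in case a float array is passed instead.

```python
import numpy as np


def summarize_image(arr) -> dict:
    """Summarise an image array: dtype, shape, whether every value is zero, NaN count, max."""
    arr = np.asarray(arr)
    # Only floating-point arrays can hold NaN; integer images (e.g. uint8) cannot
    nan_count = int(np.isnan(arr).sum()) if np.issubdtype(arr.dtype, np.floating) else 0
    return {
        "dtype": str(arr.dtype),
        "shape": tuple(arr.shape),
        "all_zero": bool(np.all(arr == 0)),
        "nan_count": nan_count,
        # nanmax ignores NaNs; fall back to NaN for empty or all-NaN arrays
        "max": float(np.nanmax(arr)) if nan_count < arr.size else float("nan"),
    }


if __name__ == "__main__":
    black = np.zeros((4, 4, 3), dtype=np.uint8)  # stand-in for np.array(image)
    print(summarize_image(black))
```

In the original script, `print(summarize_image(np_image))` right after `np_image` is created would show whether the pipeline output is uniformly zero.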

My environment:

[Screenshot: Screenshot from 2024-07-06 12-07-49.png]

Python packages:

absl-py==2.1.0
accelerate==0.32.1
AHRS==0.3.1
appdirs==1.4.4
apturl==0.5.2
astunparse==1.6.3
attrs==21.2.0
bcrypt==3.2.0
beniget==0.4.1
blinker==1.4
blobconverter==1.4.3
Brlapi==0.8.3
Brotli==1.0.9
ccsm==0.9.14.1
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
click==8.0.3
colorama==0.4.4
coloredlogs==15.0.1
compizconfig-python==0.9.14.1
cpuset==1.6
cryptography==3.4.8
cupshelpers==1.0
cupy==13.2.0
cycler==0.11.0
dbus-python==1.2.18
decorator==4.4.2
defer==1.0.6
depthai==2.27.0.0
depthai-pipeline-graph==0.0.5
depthai-sdk==1.15.0
diffusers==0.29.2
distlib==0.3.4
distro==1.9.0
distro-info==1.1+ubuntu0.2
duplicity==0.8.21
evdev==1.4.0
fasteners==0.14.1
fastrlock==0.8.2
filelock==3.6.0
flatbuffers==24.3.25
fonttools==4.29.1
fs==2.4.12
fsspec==2024.6.0
future==0.18.2
gast==0.6.0
google-pasta==0.2.0
graphsurgeon==0.4.6
grpcio==1.64.1
h5py==3.11.0
httplib2==0.20.2
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.3
imageio==2.34.2
importlib-metadata==4.6.4
jeepney==0.7.1
jetson-stats @ file:///usr/src/jetson_stats
Jetson.GPIO==2.1.7
Jinja2==3.1.4
keras==3.4.1
keyring==23.5.0
kiwisolver==1.3.2
language-selector==0.1
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy_loader==0.4
libclang==18.1.1
lockfile==0.12.2
louis==3.20.0
lxml==4.8.0
lz4==3.1.3+dfsg
macaroonbakery==1.3.1
Mako==1.1.3
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.17.0
matplotlib==3.5.1
mdurl==0.1.2
ml-dtypes==0.3.2
monotonic==1.6
more-itertools==8.10.0
mpmath==0.0.0
namex==0.0.8
networkx==3.3
numpy==1.23.5
oauthlib==3.2.0
olefile==0.46
onboard==1.4.1
onnx==1.16.0
onnx-graphsurgeon==0.3.12
onnxruntime-gpu @ file:///usr/src/onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl
opencv-contrib-python==4.10.0.84
opt-einsum==3.3.0
optree==0.11.0
packaging==21.3
pandas==1.3.5
paramiko==2.9.3
pexpect==4.8.0
pillow==10.3.0
platformdirs==2.5.1
ply==3.11
protobuf==4.25.3
psutil==6.0.0
ptyprocess==0.7.0
pycairo==1.20.1
pycups==2.0.1
Pygments==2.18.0
PyGObject==3.42.1
PyJWT==2.3.0
pymacaroons==0.13.0
PyNaCl==1.5.0
PyOpenGL==3.1.5
pyparsing==2.4.7
PyQt5==5.15.6
PyQt5-sip==12.9.1
pyRFC3339==1.1
python-apt==2.4.0+ubuntu3
python-dateutil==2.8.1
python-dbusmock==0.27.5
python-debian==0.1.43+ubuntu1.1
pythran==0.10.0
pytube==15.0.0
PyTurboJPEG==1.6.4
pytz==2022.1
pyxdg==0.27
PyYAML==5.4.1
Qt.py==1.4.1
quickdl==0.0.2
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.14.0
SecretStorage==3.3.1
sentencepiece==0.2.0
sentry-sdk==1.21.0
six==1.16.0
smbus2==0.4.3
sympy==1.9
systemd-python==234
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow @ file:///usr/src/wheels/tensorflow-2.16.1%2Bnv24.06-cp310-cp310-linux_aarch64.whl
tensorflow-io-gcs-filesystem==0.37.0
tensorrt==8.6.2
tensorrt-dispatch==8.6.2
tensorrt-lean==8.6.2
termcolor==2.4.0
tifffile==2024.6.18
tokenizers==0.19.1
torch @ file:///usr/src/wheels/torch-2.3.0-cp310-cp310-linux_aarch64.whl
torchaudio @ file:///usr/src/wheels/torchaudio-2.3.0%2B952ea74-cp310-cp310-linux_aarch64.whl
torchvision @ file:///usr/src/wheels/torchvision-0.18.0a0%2B6043bc2-cp310-cp310-linux_aarch64.whl
tqdm==4.66.4
transformers==4.42.3
types-pyside2==5.15.2.1.7
typing_extensions==4.12.2
ubuntu-drivers-common==0.0.0
ubuntu-pro-client==8001
uff==0.6.9
ufoLib2==0.13.1
ufw==0.36.1
unicodedata2==14.0.0
urllib3==2.2.2
urwid==2.1.2
virtualenv==20.13.0+ds
wadllib==1.3.6
Werkzeug==3.0.3
wrapt==1.16.0
xdg==5
xkit==0.0.0
xmltodict==0.13.0
zipp==1.0.0

torch, tensorflow, and onnx are installed from the official wheels NVIDIA supplies for JetPack 6.
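A full package list is long; when comparing environments (for example before and after changing the diffusers version), it can help to print only the packages most relevant to this pipeline. This sketch uses only the standard library's `importlib.metadata`; the choice of packages is an assumption about what matters here.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages) -> dict:
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Assumed-relevant packages for an SD3 diffusers pipeline
    relevant = ["torch", "diffusers", "transformers", "accelerate", "safetensors"]
    for pkg, ver in report_versions(relevant).items():
        print(f"{pkg}: {ver}")
```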

I have experimented with inference step values from 8 to 30, with no change. I suspect there is something wonky with diffusers loading the model into fp16 from the fp8 quant. Some guidance would be helpful.
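If the problem turns out to be numerical rather than a loading bug, one possibility (an assumption, not something confirmed in this thread) is that intermediate values exceed the fp16 range. Anything beyond 65504 becomes `inf` in fp16, and arithmetic such as `inf - inf` yields `NaN`, which can end up rendered as a black image. The sketch below only illustrates that range limit with NumPy; it does not inspect the SD3 model, and the activation values are made up.

```python
import numpy as np

# Largest finite value representable in IEEE half precision (fp16)
FP16_MAX = float(np.finfo(np.float16).max)


def overflows_fp16(values) -> np.ndarray:
    """Boolean mask of entries whose magnitude is beyond the finite fp16 range."""
    arr = np.asarray(values, dtype=np.float32)
    return np.abs(arr) > FP16_MAX


if __name__ == "__main__":
    # Hypothetical activation values; the last one does not fit in fp16
    acts = np.array([1.0, 3.0e4, 7.0e4], dtype=np.float32)
    print("FP16_MAX:", FP16_MAX)
    print("overflow mask:", overflows_fp16(acts))
    with np.errstate(over="ignore", invalid="ignore"):
        half = acts.astype(np.float16)  # 7e4 turns into inf when cast to fp16
        print("as fp16:", half)
        print("inf - inf:", half[2] - half[2])  # inf - inf is nan
```

One way to test this hypothesis would be to load the pipeline with `torch_dtype=torch.float32` instead, memory permitting.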

Output:

[Attached image: image.jpg]

Update: I have now tried both the fp8 and the fp16 safetensors files, with the same result as above.

I got the same result! The image is completely black.

I am now attempting a full, clean pull from the Hub to see whether the problem is related to opening the model as a single file or to some other compatibility issue that hasn't become clear yet.

OK, thank you for sharing. If you solve it, please let me know.

Same result: an image array of only zeroes (this time using the exact demo code).
Output:

Fetching 26 files: 100%|██████████| 26/26 [56:36<00:00, 130.62s/it]
Loading pipeline components...: 22%|██▏ | 2/9 [00:02<00:07, 1.04s/it]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 33%|███▎ | 3/9 [00:02<00:04, 1.48it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:00<00:00, 2.02it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.97it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:04<00:00, 2.01it/s]
100%|██████████| 1/1 [00:01<00:00, 1.37s/it]
100%|██████████| 50/50 [00:47<00:00, 1.04it/s]

@edgechangfu Can you downgrade to diffusers==0.29.0 and try again?

Output:

/usr/src/sd3/venv/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Loading pipeline components...: 56%|█████▌ | 5/9 [00:02<00:01, 2.10it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 67%|██████▋ | 6/9 [00:02<00:01, 2.48it/s]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:00<00:00, 1.71it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.72it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:03<00:00, 2.26it/s]
0%| | 0/10 [00:00<?, ?it/s]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
10%|█ | 1/10 [00:01<00:12, 1.36s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
20%|██ | 2/10 [00:02<00:09, 1.19s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
30%|███ | 3/10 [00:03<00:07, 1.13s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
40%|████ | 4/10 [00:04<00:06, 1.10s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
50%|█████ | 5/10 [00:05<00:05, 1.08s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
60%|██████ | 6/10 [00:06<00:04, 1.06s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
70%|███████ | 7/10 [00:07<00:03, 1.05s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
80%|████████ | 8/10 [00:08<00:02, 1.04s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
90%|█████████ | 9/10 [00:09<00:01, 1.04s/it]Passing scale via joint_attention_kwargs when not using the PEFT backend is ineffective.
100%|██████████| 10/10 [00:10<00:00, 1.07s/it]

[Attached screenshots: Screenshot from 2024-07-06 12-07-50.png, Generated Image_screenshot_08.07.20243.png]

@edgechangfu Please get back to me on this when you can. If that solution works, we need to move this issue over to https://github.com/huggingface/diffusers

After some testing, the results are actually weirder than I thought. I can basically "fiddle" with the settings for a horribly long time, and then it suddenly works, works well, and works consistently.
Then I unload the model, go about my business, load it back for more testing, and it no longer works, so I have to repeat the process. Really quite strange.
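Because the failure seems to depend on when the model was loaded rather than on the settings alone, one way to narrow it down would be to fix the seed (diffusers pipelines accept a `generator` argument, e.g. `generator=torch.Generator("cuda").manual_seed(0)`) and compare fingerprints of the outputs across separate model loads. The helpers below are a hypothetical sketch using `hashlib` and NumPy. Bit-exact equality is not guaranteed on GPU even with a fixed seed, so a mismatch is only a hint, while a zero image on one load and a normal image on another with identical settings would point away from the settings.

```python
import hashlib

import numpy as np


def fingerprint(img) -> str:
    """Short, stable hash of an image array's dtype, shape, and raw bytes."""
    arr = np.ascontiguousarray(img)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()[:16]


def compare_runs(fingerprints: dict) -> bool:
    """True if every labelled run produced the same fingerprint."""
    return len(set(fingerprints.values())) <= 1


if __name__ == "__main__":
    # Stand-ins for np.array(image) from two separate model loads
    run_a = np.zeros((2, 2, 3), dtype=np.uint8)
    run_b = run_a.copy()
    runs = {"load 1": fingerprint(run_a), "load 2": fingerprint(run_b)}
    print(runs, "identical:", compare_runs(runs))
```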

Hi, @Manbehindthemadness @edgechangfu I am also facing the same issue.
Were you able to find a fix/workaround for this?

Sorry, I have not. I'm still getting NaN output, and I have had to put my project on hold for now.
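Since the output is reported as NaN, it may help to stop NaN from being silently turned into a black image. Diffusers pipelines accept `output_type="np"`, which (as far as I know) returns float arrays in [0, 1] instead of PIL images, so the conversion to uint8 can be done by hand with an explicit check. `to_uint8_checked` is a hypothetical helper; it raises instead of casting because converting NaN to an integer type in NumPy is platform-dependent.

```python
import numpy as np


def to_uint8_checked(img) -> np.ndarray:
    """Convert a float image in [0, 1] to uint8, raising if any value is NaN or inf."""
    img = np.asarray(img, dtype=np.float32)
    bad = ~np.isfinite(img)
    if bad.any():
        raise ValueError(f"{int(bad.sum())} of {img.size} values are NaN or inf")
    # Clip to the valid range, scale to 0..255, and round to the nearest integer
    return (np.clip(img, 0.0, 1.0) * 255).round().astype(np.uint8)


if __name__ == "__main__":
    ok = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)
    print(to_uint8_checked(ok))
    try:
        to_uint8_checked(np.array([[0.2, np.nan]], dtype=np.float32))
    except ValueError as err:
        print("rejected:", err)
```

With a pipeline, this would look like `images = pipe(prompt, output_type="np").images` followed by `to_uint8_checked(images[0])`.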