save_pretrained showing larger files than the ones in the repo

#23
by adityakad - opened

# Hi, I ran the steps below.
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
base_model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto")

#I saved the model after this.
base_model.save_pretrained("/home/ec2-user/SageMaker/models/dolly-v2-3b", from_pt=True)

When I look at the saved files, they are different from the ones in the repo.
For instance, the repo has a single 5.68GB bin file, but saving produced two bin files: one is 10.1GB and the other is 1.15GB. This does not match the files in this repo.

Any idea why this is happening? What are the implications of this large model size?
Here is what I get after saving the pretrained model.

Screenshot 2023-05-30 at 7.48.33 PM.png

Databricks org

It's because you did not load in 16-bit, I'd imagine. You're saving weights in 2x the precision and storage space.
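A quick back-of-envelope check makes the "2x" concrete. Assuming dolly-v2-3b has roughly 2.8 billion parameters (the exact count is an assumption here), each parameter takes 2 bytes in 16-bit (fp16/bf16) and 4 bytes in 32-bit:

```python
# Rough size estimate for a ~2.8B-parameter model (parameter count assumed).
params = 2.8e9

fp16_gb = params * 2 / 1e9  # ~5.6 GB, close to the single 5.68GB bin in the repo
fp32_gb = params * 4 / 1e9  # ~11.2 GB, close to the 10.1GB + 1.15GB shards you saved

print(f"16-bit: ~{fp16_gb:.1f} GB, 32-bit: ~{fp32_gb:.1f} GB")
```

So two shards totaling about 11GB is exactly what a 32-bit copy of a ~5.6GB 16-bit checkpoint looks like.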

I see. When I ran inference with the saved model, the latency was 3-4 times higher than with the model loaded via from_pretrained. Shouldn't the latency be the same in both cases?

Databricks org

No, because you are doing more than twice the work in 32-bit math. I don't see why you are doing it this way.

Based on what you say, I am loading it from HuggingFace in 32-bit originally as well, is that right? But the latency is really low on that one. How is that happening?

Databricks org

Ah, OK, I mistook the setup: you're benchmarking a model loaded this way too, without saving. Yeah, it should be the same thing. Check the torch_dtype in both cases to confirm. Otherwise I'm not sure why, or maybe I'm wrong about precision being the issue.
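One easy place to check the precision is the config.json that save_pretrained writes next to the weights: its "torch_dtype" field records the dtype the weights were serialized in. A minimal sketch with a stand-in config file (the values here are illustrative, not your actual output):

```python
import json
import tempfile
from pathlib import Path

# save_pretrained writes a config.json whose "torch_dtype" field records the
# precision of the serialized weights. Write and read a stand-in config here
# to show the check (contents are illustrative):
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(json.dumps({"model_type": "gpt_neox", "torch_dtype": "float32"}))

    saved_dtype = json.loads(cfg_path.read_text())["torch_dtype"]
    print(saved_dtype)  # a 32-bit re-save would show "float32" here
```

Comparing that field between the hub snapshot and your saved directory would show whether the re-save silently widened to float32.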

Are you sure you are unloading the first model before loading the second? Otherwise you might load the second one only partly on the GPU.
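The "unloading" suggested above is just dropping the Python reference and forcing collection before the second load; with a real model on GPU you would also clear the CUDA cache. A runnable stand-in sketch (the torch calls are shown only as comments, since this snippet avoids the dependency):

```python
import gc

# With a real model (assuming torch and a CUDA device) the pattern is:
#   del base_model             # drop the reference to the first model
#   gc.collect()               # reclaim the Python object
#   torch.cuda.empty_cache()   # release cached GPU blocks back to the driver
# before calling from_pretrained again.

# Stand-in demonstration with a large plain object:
first_model = bytearray(100_000_000)   # pretend this is the first loaded model
del first_model                        # "unload" it
gc.collect()                           # ensure the memory is actually reclaimed
second_model = bytearray(100_000_000)  # now "load" the second model
print(len(second_model))               # prints 100000000
```

Without this, the second from_pretrained call can find the GPU mostly full and spill layers to CPU, which would explain a large latency gap.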

When I load the model first from HuggingFace, it does show downloading 5.68GB like in the repo.

I am saving this very same model.

What do you mean by unloading the first model? How do I do that?

These are the exact steps:

Screenshot 2023-05-30 at 8.50.36 PM.png

srowen changed discussion status to closed
