Add TF weights
Validated by the pt_to_tf CLI. Max crossload hidden state difference=7.629e-06; Max converted hidden state difference=7.629e-06.
Hi there
I'm a TF maintainer at Hugging Face, and this is your most downloaded model whose weights can be automatically converted into TensorFlow, using our tools. We believe that having TF weights would be of interest to the community, and will further boost the visibility of the model.
I also don't want to be a source of spam! Let me know if you are interested in merging these TF weights, and if you would like me to open PRs with TF weights for other models that you own 🤗
Sounds good to me. I guess this does not break anything for other users, right? Should this become part of the conversion from MarianNMT?
Hi @tiedeman -- no, there are no files being overwritten, so nothing changes for existing users. On the upside, users would be able to call TFMarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-zh-en") and get a native TF model with the same functionality, to integrate into their TF ecosystem 🔥
Our weight conversion tool also validates whether the hidden states are the same for the PT and TF models. You can check the maximum error in the first commit -- anything below 1e-5 means the two models will behave the same way, even for autoregressive tasks like text generation. We decided to open exactly one PR with a TF weight conversion per organization and to start a conversation with you because, although we believe it will be positive for the TF community, a) we are not the owners of the models and b) it can be a massive source of spam for you.
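To make the hidden-state check above concrete, here is a minimal sketch of that kind of comparison. It is not the actual pt_to_tf code: the function names are hypothetical and the hidden states are plain nested lists of floats standing in for real tensors. The only value taken from this thread is the 1e-5 threshold.

```python
def flatten(values):
    """Yield every number in an arbitrarily nested list as a float."""
    if isinstance(values, (int, float)):
        yield float(values)
    else:
        for item in values:
            yield from flatten(item)


def max_abs_difference(pt_hidden, tf_hidden):
    """Largest element-wise absolute difference between two hidden states."""
    pt_flat = list(flatten(pt_hidden))
    tf_flat = list(flatten(tf_hidden))
    if len(pt_flat) != len(tf_flat):
        raise ValueError("hidden states must have the same number of elements")
    return max((abs(p - t) for p, t in zip(pt_flat, tf_flat)), default=0.0)


def outputs_match(pt_hidden, tf_hidden, tolerance=1e-5):
    """True when the maximum difference is below the tolerance (assumed 1e-5)."""
    return max_abs_difference(pt_hidden, tf_hidden) < tolerance


# Toy hidden states (made-up values, not real model outputs).
pt_states = [[0.5, -1.25], [2.0, 0.0]]
tf_states = [[0.5, -1.25], [2.0, 7.629e-06]]

print(max_abs_difference(pt_states, tf_states))
print(outputs_match(pt_states, tf_states))
```

With real models the same idea would apply to the framework tensors directly; the nested lists here only keep the sketch free of dependencies.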
Depending on your interest, there are 4 ways I can proceed:
1. Nothing gets merged, and no more PRs like this are opened;
2. We merge this one, but no more PRs like this are opened;
3. We merge this one, and I will open more PRs like this over time;
4. (Option 3, but with no hub notifications) We merge this one, and you give me the authorization to use admin privileges to push validated TF conversions directly into Helsinki-NLP repositories.
Let me know how you would like to proceed!
Actually, my apologies -- I've designed a stricter set of checks (https://github.com/huggingface/transformers/pull/17588), to ensure TF users enjoy the same model experience as PT users, and there is something wrong with this weight conversion. The most likely cause is some code issue in the TF implementation of this architecture.
I will close this PR for the time being.