Fix error in config.json

#9
by pere - opened

@patrickvonplaten @Sanchit

The decoder_start_token_id should refer to the <|startoftranscript|> token in the vocabulary.
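A minimal sketch of the check this change implies, assuming the config and the vocabulary are available as plain dicts. The helper name `check_decoder_start_token` and the token ids below are hypothetical toy values, not the real Whisper ids.

```python
def check_decoder_start_token(config, vocab, token="<|startoftranscript|>"):
    """Return True when config['decoder_start_token_id'] is the vocab id of `token`."""
    expected = vocab.get(token)
    if expected is None:
        raise KeyError(f"{token!r} not found in vocabulary")
    return config.get("decoder_start_token_id") == expected


# Toy values for illustration only (assumption: not the real Whisper ids).
toy_vocab = {"<|endoftext|>": 7, "<|startoftranscript|>": 8}
broken_config = {"decoder_start_token_id": 7}  # points at the wrong token
fixed_config = {"decoder_start_token_id": 8}   # points at <|startoftranscript|>

print(check_decoder_start_token(broken_config, toy_vocab))  # False
print(check_decoder_start_token(fixed_config, toy_vocab))   # True
```

With a real checkpoint, the two dicts would presumably come from the repo's config.json and the tokenizer's vocabulary rather than being written by hand.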

Thanks for the fix. I agree that this needs to be corrected, since it should match v2 in its generation config: https://huggingface.co/openai/whisper-large-v2/blob/696465c62215e36a9ab3f9b7672fe7749f1a1df5/config.json#L19

patrickvonplaten changed pull request status to merged

Good catch @pere! We converted the generation_config on its own but missed the generation attributes in the config. The bos_token_id and eos_token_id also need updating: https://huggingface.co/openai/whisper-large-v3/discussions/25#6555f5d2ef6e96329fd5db2f
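One way to spot this kind of drift is to compare the generation-related keys in the two files. This is a rough sketch, assuming both files have been parsed into dicts; the helper name `mismatched_generation_keys` and the ids are hypothetical toy values, not the real Whisper ids.

```python
# Keys named in this thread that appear in both config and generation_config.
GENERATION_KEYS = ("decoder_start_token_id", "bos_token_id", "eos_token_id")


def mismatched_generation_keys(config, generation_config, keys=GENERATION_KEYS):
    """Map each differing key to its (config value, generation_config value) pair."""
    return {
        key: (config.get(key), generation_config.get(key))
        for key in keys
        if config.get(key) != generation_config.get(key)
    }


# Toy values for illustration only (assumption: not the real Whisper ids).
toy_generation_config = {"decoder_start_token_id": 8, "bos_token_id": 7, "eos_token_id": 7}
toy_config = {"decoder_start_token_id": 8, "bos_token_id": 6, "eos_token_id": 6}

print(mismatched_generation_keys(toy_config, toy_generation_config))
# {'bos_token_id': (6, 7), 'eos_token_id': (6, 7)}
```

An empty result would mean the two files agree on these three keys. In practice the dicts could be read with `json.load` from local copies of config.json and generation_config.json.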
