zero embedding for token_id == `1` (<BOS>)

#7
by poedator - opened

WARNING: This model and its 160M sibling have an ALL-ZEROS EMBEDDING for token_id == 1 (`<BOS>`). This creates confusion when measuring the StaticCache sequence length: the transformers maintainers chose to detect the used length from non-zero cache values, so the all-zeros embedding distorts `get_seq_length()`. No blame here, just a combination of design decisions with unpredictable results.
See the relevant transformers line here https://github.com/huggingface/transformers/blob/8c12690cecbb97e187861e386f7a0ac790e4236c/src/transformers/cache_utils.py#L414
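A minimal sketch of why a zero-valued cached position breaks this kind of heuristic (the tensor shapes and names here are illustrative, not the exact transformers internals, but the counting logic mirrors the linked line):

```python
import torch

# Hypothetical cache slice of shape (seq_len, head_dim): 4 cached
# positions, where position 0 (the <BOS> with an all-zeros embedding)
# is assumed to have produced all-zero cached states.
cache = torch.randn(4, 8)
cache[0, :] = 0.0  # position 0 is all zeros, like the <BOS> case

# Heuristic in the spirit of StaticCache.get_seq_length():
# count positions that contain at least one non-zero value.
seq_length = cache.any(dim=-1).sum().item()

print(seq_length)  # counts 3, not 4: the zero <BOS> position is skipped
```

Any position whose cached values happen to be exactly zero is invisible to this check, which is exactly the collision described above.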
