zero embedding for token_id == `1` (<BOS>)
#7 · opened by poedator
WARNING: this model and its 160M sibling have an ALL-ZEROS EMBEDDING for token_id == 1 (<BOS>). This creates confusion when measuring StaticCache sequence length: the transformers maintainers chose to detect the used cache length from non-zero cache values, and the all-zeros embedding distorts get_seq_length(). No blame here, just a combination of design decisions with unpredictable results.
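
To verify the embedding claim, one can inspect the embedding row directly. A minimal check; the model id is a placeholder, since this discussion is attached to the model page and the post doesn't spell out the checkpoint name:

```python
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "..."  # placeholder: use this repo's checkpoint (or its 160M sibling)

model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
emb = model.get_input_embeddings().weight  # (vocab_size, hidden_size)

# Per the post, row 1 (<BOS>) is exactly zero in both checkpoints.
print(torch.all(emb[1] == 0).item())
```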
See the relevant transformers line here: https://github.com/huggingface/transformers/blob/8c12690cecbb97e187861e386f7a0ac790e4236c/src/transformers/cache_utils.py#L414
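
For illustration, here is a minimal sketch of how non-zero-based detection can undercount. The `get_seq_length` helper paraphrases the detection logic at the linked line (a position counts as occupied if any value along the head dimension is non-zero); the cache shape and the assumption that the zero <BOS> embedding propagates to an exactly-zero cached key state are simplifications for the demo, not claims about the model's internals:

```python
import torch

def get_seq_length(key_cache: torch.Tensor) -> int:
    # Paraphrase of the linked detection logic: a position counts as
    # "occupied" if any value along the head dimension is non-zero.
    # key_cache shape: (batch, num_heads, max_seq_len, head_dim)
    return int(key_cache[0, 0].any(dim=-1).sum())

# StaticCache pre-allocates the whole buffer with zeros.
cache = torch.zeros(1, 1, 8, 4)  # max_seq_len = 8

# Write key states for three tokens. Position 0 holds <BOS>; assume its
# all-zeros embedding yields an exactly-zero key state (whether that
# actually happens depends on the model's norms and projection biases).
cache[0, 0, 1] = torch.randn(4)
cache[0, 0, 2] = torch.randn(4)
# cache[0, 0, 0] stays all zeros -- indistinguishable from an empty slot.

print(get_seq_length(cache))  # prints 2, although 3 tokens were cached
```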