Maximum input+output tokens ??

#1
by ha1772007 - opened

Maximum input+output tokens ??

CoolSpring/Qwen2-0.5B-Abyme was trained with a sequence_len of 4096, while Qwen/Qwen2-0.5B-Instruct handles a 32768-token context in the Needle in a Haystack task, according to the Qwen team's release blog post. So I would guess a number somewhere in between, leaning toward the low side.
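One way to check the declared limit is the `max_position_embeddings` field in a model's `config.json` on the Hub. Below is a minimal, self-contained sketch; the JSON excerpt is illustrative (not the actual config of this model), and the "practical limit" heuristic reflects the guess above, not a guarantee:

```python
import json

# Illustrative excerpt of a Hugging Face config.json; the field
# max_position_embeddings declares the maximum context window
# (input + output tokens combined).
config_json = '{"model_type": "qwen2", "max_position_embeddings": 32768}'

config = json.loads(config_json)
declared_ctx = config["max_position_embeddings"]
print(declared_ctx)  # the context window the config advertises

# A fine-tune trained with a shorter sequence_len may degrade beyond
# that length even if the config still advertises the base model's
# window, so the training length is a more conservative practical limit.
train_len = 4096
practical_limit = min(train_len, declared_ctx)
print(practical_limit)
```

In practice you could load the real values with `transformers.AutoConfig.from_pretrained(...)` and read `.max_position_embeddings`, then compare against the fine-tune's training sequence length.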

However, this is still a guess, and I haven't personally used the model much since it was made for experimental purposes. I'm happy to see you're interested in it. Take care!
