arcee-ai/Llama-3.1-SuperNova-Lite · Would love a 70B version

I mostly ran this as fp16 since it fit on my GPU and I found it to be better than just about all other 8B models, only to be outdone by models in the 20B neighborhood.
It handles tool calling much better than L3.1, and advanced system prompts like using 'thinking', 'reflection', and 'output' tags from the Reflection-Llama-3.1- 70B suggested system prompt.
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B

I can run Llama-3.1-70B IQ4 (40Gb) split across my CPU and GPU and get 1.4 tokens/sec which is slow but worth it when I need it, and I'd love to compare it to a 70B Supernova model. Maybe think about incorporating the tag system from Reflection during fine tuning as well.

Great work, looking forward to the next release.