---
library_name: transformers
language:
- en
base_model:
- RozGrov/NemoDori-v0.2-12B-MN-BT
datasets:
- Inv/c2-logs-cleaned-deslopped
tags:
- unsloth
- trl
- sft
- merge
- mergekit
- lazymergekit
- RozGrov/NemoDori-v0.2-12B-MN-BT
---

# NemoDori-v0.2-Frankend.2-v1-16.6B

_Experimental!_

A further upscaled version of [**NemoDori-v0.2-12B-MN-BT**](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT), now at **16.6B**. This is also my first _successful_(?) fine-tuned model, trained for 70 steps on **500 random rows** from the dataset [Inv/c2-logs-cleaned-deslopped](https://huggingface.co/datasets/Inv/c2-logs-cleaned-deslopped). I picked that dataset... just for testing. My thinking was that if I can fill in those duplicated layers by training them, maybe the model gets better. NemoDori v0.2 is my best merge so far, but it's still only 12B, and there isn't much left to improve by merging in more models.

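For anyone who wants to try a similar small-sample fine-tune, here is a minimal sketch of drawing 500 random rows reproducibly. The helper name `pick_random_rows`, the seed, the placeholder row count, and the `train` split name are all assumptions for illustration; the exact sampling used for this model is not recorded here.

```python
import random


def pick_random_rows(n_total, k=500, seed=0):
    """Return k distinct row indices drawn uniformly from range(n_total), sorted."""
    if k > n_total:
        raise ValueError("cannot sample more rows than the dataset contains")
    rng = random.Random(seed)  # fixed seed so the same subset can be re-drawn
    return sorted(rng.sample(range(n_total), k))


# 10_000 is a placeholder row count, not the real size of the dataset.
indices = pick_random_rows(n_total=10_000)
print(len(indices))

# With the Hugging Face `datasets` library the indices could be applied like this
# (split name "train" is an assumption):
#   from datasets import load_dataset
#   ds = load_dataset("Inv/c2-logs-cleaned-deslopped", split="train")
#   subset = ds.select(pick_random_rows(len(ds)))
```

Fixing the seed keeps the subset stable between runs, which makes it easier to compare later fine-tuning attempts against each other.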
Again, I'm just interested in playing with this LLM stuff for a while. Maybe more versions of this will come out later. From my short testing, this model has become a little more strict than the parent model (v0.2). I haven't noticed anything major yet.

You can use ST with [this preset](https://huggingface.co/RozGrov/NemoDori-v0.2-Frankend.2-v1-16.6B/resolve/main/NemoDori-v0.2-Frankend.2-v1-16.6B%20-%20ST%20Preset.json). Unfortunately, you can't go wild with this model (from my short tests): sometimes it makes little sense, and sometimes... you will get a Reddit link (I'm not kidding). I didn't have enough time to test it, because it's pricier to run without quantization.

I trust @mradermacher to make the quantized versions of this model. (Thank you so much for making those GGUFs of my models ^_^) And... yeah... Your feedback is always welcome. Let me know what your experience with this model is like; that would be really appreciated.

Take care everyone.

### Merge Method

This model was merged from the following models using the `passthrough` merge method:

* [RozGrov/NemoDori-v0.2-12B-MN-BT](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [0, 8]
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: q_proj
              value: 0.919
            - filter: k_proj
              value: 0.919
            - value: 1.0
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [16, 24]
        parameters:
          scale:
            - filter: q_proj
              value: 0.7
            - filter: k_proj
              value: 0.7
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [16, 32]
        parameters:
          scale:
            - filter: q_proj
              value: 0.919
            - filter: k_proj
              value: 0.919
            - value: 1.0
  - sources:
      - model: RozGrov/NemoDori-v0.2-12B-MN-BT
        layer_range: [32, 40]
merge_method: passthrough
dtype: bfloat16
```

## 💻 Usage

```python
# In a notebook, install the dependencies first:
# !pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "RozGrov/NemoDori-v0.2-Frankend.2-pre"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the generated text.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
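
To see where the 16.6B figure comes from, the layer ranges in the slice configuration above can be tallied with a few lines of Python. This is only a rough sketch: the model dimensions below are assumed to match the usual Mistral-Nemo-style 12B config (they are not stated in this card), and the parameter estimate ignores anything beyond attention, MLP, norm, and embedding weights.

```python
# Layer ranges copied from the `slices` config above (end index is exclusive).
SLICES = [(0, 8), (8, 24), (16, 24), (16, 32), (32, 40)]

# ASSUMPTION: Mistral-Nemo-style dimensions; these are not stated in this card.
HIDDEN = 5120
INTERMEDIATE = 14336
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 131072


def stacked_layers(slices):
    """Total number of layers in the stacked (passthrough) model."""
    return sum(end - start for start, end in slices)


def source_layer_counts(slices):
    """Map each source layer index to how many times it appears in the stack."""
    counts = {}
    for start, end in slices:
        for layer in range(start, end):
            counts[layer] = counts.get(layer, 0) + 1
    return counts


def estimate_params(n_layers):
    """Rough parameter count for a decoder with n_layers blocks (untied embeddings assumed)."""
    q_and_o = 2 * HIDDEN * N_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * N_KV_HEADS * HEAD_DIM
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN               # two norm weight vectors per block
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings plus output head
    return n_layers * (q_and_o + k_and_v + mlp + norms) + embeddings + HIDDEN


total_layers = stacked_layers(SLICES)
counts = source_layer_counts(SLICES)
repeated = sorted(layer for layer, n in counts.items() if n > 1)

print(total_layers)                                          # 56
print(repeated[0], repeated[-1], max(counts.values()))       # 16 23 3
print(round(estimate_params(total_layers) / 1e9, 1))         # 16.6
```

Under these assumptions, source layers 16 to 23 each appear three times, so the 40-layer parent grows to 56 layers, which lines up with the 16.6B in the model name.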