
Accounting for the future will be anything but easy, but accounting for the present will be anything but difficult.


Description

DSCNTR-Alpha is a deep QLoRA fine-tune (rank 256) of the base Llama-3.1-8B that specializes primarily in Turkish, along with a dozen other languages. The aim is a thoroughly un-slopped, human-sounding generalist/reasoner/coder/role-player/"o1-thinker" model, and it also supports custom system prompts.

I, tomar753, developed it entirely on my own, including the data creation phase - yes, the dataset is >95% hand-written by me, with help from several top-league models whenever factual knowledge was a concern - and the hyperparameter tinkering phase.

Despite being only one day away from turning 18, I have devoted the last 1.5 years of my life to "Project Descentral" (which will continue indefinitely): building a local mental stimulator that is clever and always unpredictable as a chat partner, but without the intelligence regression that creative training usually brings. In fact, my endeavors were never just about mitigating that loss, but about advancing every stat of the model, covering even unstructured text generation.

More than 2,000 hours have been spent on research since the early Pygmalion AI era to understand how the LLM landscape would take shape on the end-user side, so that I could figure out what people like and dislike in language models.

Although this version of the model is still far from ideal in my view, I figured it was competitive enough in its size class and worth releasing as a glimpse of the actual releases in the coming months.

This repository includes only a lightly quantized (5-bit GGUF) version of the model, as its goal is simply to provide a usable demo for those who would like to experiment.

System Prompt / Usage

The model is mostly trained on an Alpaca-like format with the following baked-in default prompt:

### System:
You are a language model and AI assistant.

### User:
{{user_message}}

### Answer:
{{assistant_message}}

### User:
{{user_message_2}}

### Answer:
{{assistant_message_2}}

.
.
.

The model supports multi-turn conversations and (partially) multi-character RP.

Also note that the BOS token must be prepended before all other tokens in every case.
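
To make the template concrete, below is a minimal sketch (in Python) of how a prompt could be assembled. The build_prompt helper and the example message are my own illustration, not something shipped with the model; how the BOS token gets prepended depends on your inference stack (llama.cpp-based runners typically add it automatically, while a raw tokenizer call may need special tokens enabled explicitly).

# Illustrative helper for assembling a prompt in the model's Alpaca-like format.
# DEFAULT_SYSTEM is the baked-in default from this card; replace it with a
# custom system prompt if desired.
DEFAULT_SYSTEM = "You are a language model and AI assistant."

def build_prompt(turns, system=DEFAULT_SYSTEM):
    """turns: list of (user_message, assistant_message or None) tuples."""
    parts = [f"### System:\n{system}\n"]
    for user_msg, assistant_msg in turns:
        parts.append(f"### User:\n{user_msg}\n")
        if assistant_msg is None:
            # Leave the final answer block open so the model completes it.
            parts.append("### Answer:\n")
        else:
            parts.append(f"### Answer:\n{assistant_msg}\n")
    return "\n".join(parts)

# Single-turn example; remember that the BOS token must still precede this text.
print(build_prompt([("Hello! Could you introduce yourself?", None)]))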

Data Preparation (Plus Some Musings)

Closely resembling the LIMA methodology, the dataset consists of just 816 examples that are high-quality and, above all, diverse. Considering that the data is the product of a single person and roughly the size of four 300-page books, the sheer amount of time and mental work that went into the project is hard to overstate - not to mention the writer's block that struck every now and then.

While I'm a strong supporter of a fully open-source community, I have to respect and prioritize my own efforts over the collective chain for now: I have no personal income yet and must keep the original work as it is to preserve its value.

Training Parameters

  • r = 256

  • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head", "embed_tokens",]

  • lora_alpha = 32

  • lora_dropout = 0

  • bias = "none"

  • use_gradient_checkpointing = "unsloth"

  • random_state = 3407

  • use_rslora = True

  • use_dora = False

  • loftq_config = None

  • per_device_train_batch_size = 1

  • gradient_accumulation_steps = 16

  • warmup_ratio = 0.1

  • num_train_epochs = 3

  • learning_rate = 5e-5

  • embedding_learning_rate = 5e-6

  • max_steps = 0

  • group_by_length = False

  • bf16 = True

  • weight_decay = 0.01

  • max_grad_norm = 8.0

  • lr_scheduler_type = "cosine"

  • optim = "paged_adamw_8bit"

  • seed = 3407
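
For orientation, the list above maps fairly directly onto an Unsloth-style QLoRA setup. The sketch below is my reconstruction under that assumption - the author's actual training script is not published - so the dataset contents, max_seq_length, and output_dir are placeholders, and exact keyword arguments vary across Unsloth/TRL versions. Parameters left at their defaults (use_dora = False, max_steps = 0, group_by_length = False) are omitted.

from datasets import Dataset
from unsloth import FastLanguageModel, UnslothTrainer, UnslothTrainingArguments

# Placeholder dataset; the real 816 hand-written examples are private.
dataset = Dataset.from_list(
    [{"text": "### System:\n...\n\n### User:\n...\n\n### Answer:\n...\n"}]
)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B",
    max_seq_length=8192,   # assumption: not stated on the card
    load_in_4bit=True,     # QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",
                    "lm_head", "embed_tokens"],
    lora_alpha=32,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=True,
    loftq_config=None,
)

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=UnslothTrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        warmup_ratio=0.1,
        num_train_epochs=3,
        learning_rate=5e-5,
        embedding_learning_rate=5e-6,  # lower LR for embed_tokens / lm_head
        bf16=True,
        weight_decay=0.01,
        max_grad_norm=8.0,
        lr_scheduler_type="cosine",
        optim="paged_adamw_8bit",
        seed=3407,
        output_dir="outputs",          # placeholder
    ),
)
trainer.train()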

Recommended Hyperparameters

All samplers neutralised, with min_p set to 0.1. Make sure temperature is applied last in the sampler order.
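
As a concrete example, here is a minimal sketch of those settings, assuming the GGUF file is run through llama-cpp-python; the filename is a placeholder and the parameter names follow that library's completion API. llama.cpp's default sampler chain already applies temperature after min_p, which should satisfy the "temperature last" recommendation.

from llama_cpp import Llama

# llama-cpp-python prepends the BOS token by default.
llm = Llama(model_path="DSCNTR-Alpha-8B-Q5_K_M.gguf")  # placeholder filename

prompt = (
    "### System:\nYou are a language model and AI assistant.\n\n"
    "### User:\nHello! Could you introduce yourself?\n\n"
    "### Answer:\n"
)

out = llm(
    prompt,
    max_tokens=512,
    temperature=1.0,      # neutral
    top_k=0,              # disabled
    top_p=1.0,            # disabled
    min_p=0.1,            # the only active sampler
    repeat_penalty=1.0,   # disabled
)
print(out["choices"][0]["text"])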

Limitations

Being only an 8B model, this iteration probably isn't the best of the best at factual knowledge or very complex reasoning over long stretches of context, but it will surprise you with its distinctive approach to different situations and open-ended tasks - there is a reason I was the data preparation process itself.

Disclaimer

The model might occasionally display unprompted bias in casual contexts. Any harm resulting from its behavior is not my responsibility. ALWAYS double-check its outputs.

Appreciations

Special thanks to my younger brother, kolar628, for helping me overcome creative blocks.

Shoutout to the precious T3 AI'LE family for their crystal-clear vision and hard work in developing Turkish language models!

