aashish1904 committed
Commit
56a11f6
•
1 Parent(s): f8c3f60

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +144 -0
README.md ADDED
@@ -0,0 +1,144 @@
---
license: apache-2.0
tags:
- alignment-handbook
- generated_from_trainer
- trl
- sft
datasets:
- jan-hq/bagel_sft_binarized
- jan-hq/dolphin_binarized
- jan-hq/openhermes_binarized
- jan-hq/bagel_dpo_binarized
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 0.7
    max_new_tokens: 40
widget:
- messages:
  - role: user
    content: Tell me about NVIDIA in 20 words
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/LlamaCorn-1.1B-Chat-GGUF

This is a quantized version of [jan-hq/LlamaCorn-1.1B-Chat](https://huggingface.co/jan-hq/LlamaCorn-1.1B-Chat), created using llama.cpp.
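
Outside of Jan, a GGUF file from this repo can be run locally with `llama-cpp-python`. A minimal sketch: the `.gguf` filename below is an assumption, so substitute whichever quantization level you actually download; the prompt and sampling values are taken from this card's widget and inference settings.

```python
# Minimal sketch: run a GGUF quant of LlamaCorn with llama-cpp-python.
# The exact .gguf filename is an assumption; pick one from this repo's files.
from llama_cpp import Llama

llm = Llama(
    model_path="LlamaCorn-1.1B-Chat.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=2048,                                    # context window size
)

# The model expects the ChatML template (see "Prompt template" below);
# create_chat_completion applies the template from the GGUF metadata.
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Tell me about NVIDIA in 20 words"}],
    max_tokens=40,
    temperature=0.7,
)
print(output["choices"][0]["message"]["content"])
```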

# Original Model Card

<!-- header start -->
<!-- 200823 -->

<div style="width: auto; margin-left: auto; margin-right: auto">
<img src="https://github.com/janhq/jan/assets/89722390/35daac7d-b895-487c-a6ac-6663daaad78e" alt="Jan banner" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>

<p align="center">
<a href="https://jan.ai/">Jan</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a>
</p>
<!-- header end -->

# Model description

- Finetuned from [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) to handle simple tasks with acceptable conversational quality
- Trained on high-quality open-source datasets
- Can be run with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) on consumer devices
- Fits on laptop dGPUs with as little as 6 GB of VRAM

# Prompt template

ChatML:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
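
If you use the original (non-GGUF) model with 🤗 Transformers, the tokenizer's chat template should produce this ChatML layout for you. A minimal sketch, assuming the jan-hq/LlamaCorn-1.1B-Chat tokenizer ships a ChatML chat template (model id and sampling values come from this card; the messages are illustrative):

```python
# Minimal sketch: format a ChatML conversation and generate with Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jan-hq/LlamaCorn-1.1B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about NVIDIA in 20 words"},
]

# add_generation_prompt appends the trailing "<|im_start|>assistant" turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=40, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```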

# Run this model

You can run this model using [Jan Desktop](https://jan.ai/) on Mac, Windows, or Linux.

Jan is an open-source ChatGPT alternative that is:

- 💻 **100% offline on your machine**: Your conversations remain confidential, and visible only to you.
- 🗂️ **An Open File Format**: Conversations and model settings stay on your computer and can be exported or deleted at any time.
- 🌐 **OpenAI Compatible**: Local server on port `1337` with OpenAI-compatible endpoints (see the sketch after this list)
- 🌍 **Open Source & Free**: We build in public; check out our [GitHub](https://github.com/janhq)
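
Because Jan exposes OpenAI-compatible endpoints on port `1337`, the standard `openai` Python client can talk to it. A sketch under two assumptions: Jan's local server is running, and the model identifier matches whatever the Jan UI lists for this model.

```python
# Minimal sketch: call Jan's local OpenAI-compatible server.
# The base_url port comes from the README; the model name is an assumption,
# so use whatever identifier the Jan UI lists for this model.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1337/v1",
    api_key="not-needed",  # local server; any placeholder string (assumption)
)

response = client.chat.completions.create(
    model="llamacorn-1.1b-chat",  # hypothetical local model id
    messages=[{"role": "user", "content": "Tell me about NVIDIA in 20 words"}],
    max_tokens=40,
    temperature=0.7,
)
print(response.choices[0].message.content)
```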

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/r7VmEBLGXpPLTu2MImM7S.png)

# About Jan

Jan believes in the need for an open-source AI ecosystem and is building the infrastructure and tooling to let open-source AIs compete on a level playing field with proprietary ones.

Jan's long-term vision is to build a cognitive framework for future robots that are practical, useful assistants for humans and businesses in everyday life.

# LlamaCorn-1.1B-Chat

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
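
The effective batch size follows from the list above: 2 per device × 2 GPUs × 16 accumulation steps = 64. As a hedged illustration only, here is how those values map onto 🤗 `TrainingArguments`; the actual run used the TRL/alignment-handbook stack (per the tags above), so this is not the original training script, and the output path is hypothetical.

```python
# Illustrative sketch only: the hyperparameters above expressed as
# transformers.TrainingArguments. Not the original training configuration.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llamacorn-1.1b-chat",  # hypothetical output path
    learning_rate=5e-7,
    per_device_train_batch_size=2,     # "train_batch_size" above
    per_device_eval_batch_size=4,      # "eval_batch_size" above
    gradient_accumulation_steps=16,    # 2 per device x 2 GPUs x 16 = 64 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers
    # defaults, so no explicit override is needed here.
)
```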

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.9958        | 0.03  | 100   | 1.0003          | -0.0002        | -0.0002          | 0.4930             | -0.0001         | -180.9232      | -195.6078    | -2.6876         | -2.6924       |
| 0.9299        | 1.02  | 3500  | 0.9439          | -0.1570        | -0.2195          | 0.5770             | 0.0625          | -183.1160      | -197.1755    | -2.6612         | -2.6663       |
| 0.9328        | 2.01  | 6900  | 0.9313          | -0.2127        | -0.2924          | 0.5884             | 0.0798          | -183.8456      | -197.7321    | -2.6296         | -2.6352       |
| 0.9321        | 2.98  | 10200 | 0.9305          | -0.2149        | -0.2955          | 0.5824             | 0.0805          | -183.8759      | -197.7545    | -2.6439         | -2.6493       |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jan-hq__LlamaCorn-1.1B).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 36.94 |
| AI2 Reasoning Challenge (25-shot) | 34.13 |
| HellaSwag (10-shot)               | 59.33 |
| MMLU (5-shot)                     | 29.01 |
| TruthfulQA (0-shot)               | 36.78 |
| Winogrande (5-shot)               | 61.96 |
| GSM8k (5-shot)                    |  0.45 |
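
These benchmarks come from the Open LLM Leaderboard, which runs EleutherAI's lm-evaluation-harness. A rough sketch of scoring one task locally follows; the task name and few-shot count track the table above, but the leaderboard's exact harness version and task configuration may differ, so treat any local numbers as approximate.

```python
# Rough sketch: evaluate one leaderboard task with lm-evaluation-harness
# (pip install lm-eval). Results may not exactly match the leaderboard.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jan-hq/LlamaCorn-1.1B-Chat",
    tasks=["arc_challenge"],  # AI2 Reasoning Challenge
    num_fewshot=25,           # 25-shot, per the table above
)
print(results["results"]["arc_challenge"])
```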