Upload 6 files

Files changed (6)

README.md +197 -0
config.json +43 -0
generation_config.json +9 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +970 -0

README.md ADDED

	@@ -0,0 +1,197 @@

+---
+license: llama3
+language:
+- en
+pipeline_tag: text-generation
+tags:
+- nvidia
+- chatqa-1.5
+- chatqa
+- llama-3
+- pytorch
+---
+## Model Details
+We introduce Llama3-ChatQA-1.5, which excels at conversational question answering (QA) and retrieval-augmented generation (RAG). Llama3-ChatQA-1.5 is developed using an improved training recipe from [ChatQA (1.0)](https://arxiv.org/abs/2401.10225), and it is built on top of the [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-8B). Specifically, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capabilities. Llama3-ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM); we converted the checkpoints to Hugging Face format. **For more information about ChatQA, check the [website](https://chatqa-project.github.io/)!**
+## Other Resources
+[Llama3-ChatQA-1.5-70B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-70B) &ensp; [Evaluation Data](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) &ensp; [Training Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) &ensp; [Retriever](https://huggingface.co/nvidia/dragon-multiturn-query-encoder) &ensp; [Website](https://chatqa-project.github.io/) &ensp; [Paper](https://arxiv.org/abs/2401.10225)
+## Benchmark Results
+Results in [ChatRAG Bench](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) are as follows:
+| | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
+| -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| Doc2Dial | 37.88 | 33.51 | 37.88 | 34.16 | 38.9 | 39.33 | 41.26 |
+| QuAC | 29.69 | 34.16 | 36.96 | 40.29 | 41.82 | 39.73 | 38.82 |
+| QReCC | 46.97 | 49.77 | 51.34 | 52.01 | 48.05 | 49.03 | 51.40 |
+| CoQA | 76.61 | 69.71 | 76.98 | 77.42 | 78.57 | 76.46 | 78.44 |
+| DoQA | 41.57 | 40.67 | 41.24 | 43.39 | 51.94 | 49.6 | 50.67 |
+| ConvFinQA | 51.61 | 71.21 | 76.6 | 81.28 | 73.69 | 78.46 | 81.88 |
+| SQA | 61.87 | 74.07 | 69.61 | 79.21 | 69.14 | 73.28 | 83.82 |
+| TopiOCQA | 45.45 | 53.77 | 49.72 | 45.09 | 50.98 | 49.96 | 55.63 |
+| HybriDial* | 54.51 | 46.7 | 48.59 | 49.81 | 56.44 | 65.76 | 68.27 |
+| INSCIT | 30.96 | 35.76 | 36.23 | 36.34 | 31.9 | 30.1 | 32.31 |
+| Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
+| Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |
+Note that ChatQA-1.5 is built on the Llama-3 base model, and ChatQA-1.0 is built on the Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure a fair comparison, we also report average scores excluding HybriDial. The data and evaluation scripts for ChatRAG Bench can be found [here](https://huggingface.co/datasets/nvidia/ChatRAG-Bench).
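The two average rows can be reproduced from the per-dataset scores. The following is only an arithmetic sketch for the ChatQA-1.5-8B column; the names `chatqa_15_8b` and `average` are illustrative and are not part of the evaluation scripts.

```python
# Per-dataset ChatQA-1.5-8B scores copied from the table above.
chatqa_15_8b = {
    "Doc2Dial": 39.33, "QuAC": 39.73, "QReCC": 49.03, "CoQA": 76.46,
    "DoQA": 49.6, "ConvFinQA": 78.46, "SQA": 73.28, "TopiOCQA": 49.96,
    "HybriDial": 65.76, "INSCIT": 30.1,
}

def average(scores, exclude=()):
    """Mean of the scores, skipping any dataset named in `exclude`."""
    kept = [value for name, value in scores.items() if name not in exclude]
    return round(sum(kept) / len(kept), 2)

avg_all = average(chatqa_15_8b)                               # "Average (all)" row
avg_no_hybridial = average(chatqa_15_8b, exclude=("HybriDial",))  # "Average (exclude HybriDial)" row
print(avg_all, avg_no_hybridial)
```

Both values match the corresponding cells in the ChatQA-1.5-8B column.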
+## Prompt Format
+**We highly recommend that you use the prompt format we provide, as follows:**
+### when context is available
+<pre>
+System: {System}
+{Context}
+User: {Question}
+Assistant: {Response}
+User: {Question}
+Assistant:
+</pre>
+### when context is not available
+<pre>
+System: {System}
+User: {Question}
+Assistant: {Response}
+User: {Question}
+Assistant:
+</pre>
+**The content of the system's turn (i.e., {System}) for both scenarios is as follows:**
+<pre>
+This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.
+</pre>
+**Note that our ChatQA-1.5 models are optimized for the capability with context, e.g., over documents or retrieved context.**
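The two templates differ only in whether a context block follows the system turn. A minimal sketch of a single builder covering both cases is given here; the function name `build_prompt` and the `(role, text)` tuple layout are assumptions made for illustration, and the blank-line separator mirrors the `"\n\n"` joins used in this README's own usage example.

```python
SYSTEM = (
    "This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions based on the context. The assistant should also indicate when "
    "the answer cannot be found in the context."
)

def build_prompt(turns, context=None):
    """Assemble a ChatQA-style prompt.

    turns:   list of (role, text) pairs, role being "User" or "Assistant"
             (hypothetical layout chosen for this sketch).
    context: optional document or retrieved text; omitted when None or empty.
    """
    parts = ["System: " + SYSTEM]
    if context:
        parts.append(context)
    parts.extend(f"{role}: {text}" for role, text in turns)
    parts.append("Assistant:")  # open turn for the model to complete
    return "\n\n".join(parts)

history = [
    ("User", "What was Q4 FY24 revenue?"),
    ("Assistant", "$22.1 billion."),
    ("User", "And the quarter before?"),
]
print(build_prompt(history, context="Revenue | $22,103 | $18,120 |"))
```

Passing `context=None` yields the second template, with the first user turn directly after the system turn.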
+## How to use
+### take the whole document as context
+This applies when the whole document fits within the model's context window, so there is no need to run retrieval over the document.
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "nvidia/Llama3-ChatQA-1.5-8B"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+messages = [
+    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
+]
+document = """NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"""
+def get_formatted_input(messages, context):
+    system = "System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context."
+    instruction = "Please give a full and complete answer for the question."
+    for item in messages:
+        if item['role'] == "user":
+            ## only apply this instruction for the first user turn
+            item['content'] = instruction + " " + item['content']
+            break
+    conversation = '\n\n'.join(["User: " + item["content"] if item["role"] == "user" else "Assistant: " + item["content"] for item in messages]) + "\n\nAssistant:"
+    formatted_input = system + "\n\n" + context + "\n\n" + conversation
+    return formatted_input
+formatted_input = get_formatted_input(messages, document)
+tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)
+terminators = [
+    tokenizer.eos_token_id,
+    tokenizer.convert_tokens_to_ids("<|eot_id|>")
+]
+outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)
+response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
+### run retrieval to get top-n chunks as context
+This applies when the document is too long to use as context directly, so retrieval is necessary. Here, we use our [Dragon-multiturn](https://huggingface.co/nvidia/dragon-multiturn-query-encoder) retriever, which can handle conversational queries. In addition, we provide a few [documents](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B/tree/main/docs) for users to play with.
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
+import torch
+import json
+## load ChatQA-1.5 tokenizer and model
+model_id = "nvidia/Llama3-ChatQA-1.5-8B"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+## load retriever tokenizer and model
+retriever_tokenizer = AutoTokenizer.from_pretrained('nvidia/dragon-multiturn-query-encoder')
+query_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-query-encoder')
+context_encoder = AutoModel.from_pretrained('nvidia/dragon-multiturn-context-encoder')
+## prepare documents, we take landrover car manual document that we provide as an example
+with open("docs.json") as f:
+    chunk_list = json.load(f)['landrover']
+messages = [
+    {"role": "user", "content": "how to connect the bluetooth in the car?"}
+]
+### running retrieval
+## convert query into a format as follows:
+## user: {user}\nagent: {agent}\nuser: {user}
+formatted_query_for_retriever = '\n'.join([turn['role'] + ": " + turn['content'] for turn in messages]).strip()
+query_input = retriever_tokenizer(formatted_query_for_retriever, return_tensors='pt')
+ctx_input = retriever_tokenizer(chunk_list, padding=True, truncation=True, max_length=512, return_tensors='pt')
+query_emb = query_encoder(**query_input).last_hidden_state[:, 0, :]
+ctx_emb = context_encoder(**ctx_input).last_hidden_state[:, 0, :]
+## Compute similarity scores using dot product and rank the similarity
+similarities = query_emb.matmul(ctx_emb.transpose(0, 1)) # (1, num_ctx)
+ranked_results = torch.argsort(similarities, dim=-1, descending=True) # (1, num_ctx)
+## get top-n chunks (n=5)
+retrieved_chunks = [chunk_list[idx] for idx in ranked_results.tolist()[0][:5]]
+context = "\n\n".join(retrieved_chunks)
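+### running text generation
+## get_formatted_input is the helper defined in the previous example; define it before running this block
+formatted_input = get_formatted_input(messages, context)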
+tokenized_prompt = tokenizer(tokenizer.bos_token + formatted_input, return_tensors="pt").to(model.device)
+terminators = [
+    tokenizer.eos_token_id,
+    tokenizer.convert_tokens_to_ids("<|eot_id|>")
+]
+outputs = model.generate(input_ids=tokenized_prompt.input_ids, attention_mask=tokenized_prompt.attention_mask, max_new_tokens=128, eos_token_id=terminators)
+response = outputs[0][tokenized_prompt.input_ids.shape[-1]:]
+print(tokenizer.decode(response, skip_special_tokens=True))
+```
+## Correspondence to
+Zihan Liu (zihanl@nvidia.com), Wei Ping (wping@nvidia.com)
+## Citation
+<pre>
+@article{liu2024chatqa,
+  title={ChatQA: Building GPT-4 Level Conversational QA Models},
+  author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan},
+  journal={arXiv preprint arXiv:2401.10225},
+  year={2024}}
+</pre>
+## License
+The use of this model is governed by the [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](https://llama.meta.com/llama3/license/).

config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "_name_or_path": "nvidia/Llama3-ChatQA-1.5-8B",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": 128001,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 8192,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "quantization_config": {
+    "_load_in_4bit": true,
+    "_load_in_8bit": false,
+    "bnb_4bit_compute_dtype": "float32",
+    "bnb_4bit_quant_storage": "uint8",
+    "bnb_4bit_quant_type": "fp4",
+    "bnb_4bit_use_double_quant": false,
+    "llm_int8_enable_fp32_cpu_offload": false,
+    "llm_int8_has_fp16_weight": false,
+    "llm_int8_skip_modules": null,
+    "llm_int8_threshold": 6.0,
+    "load_in_4bit": true,
+    "load_in_8bit": false,
+    "quant_method": "bitsandbytes"
+  },
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float16",
+  "transformers_version": "4.40.2",
+  "use_cache": true,
+  "vocab_size": 128256
+}
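A few quantities follow directly from the values in this config. The sketch below derives the per-head dimension, the grouped-query sharing ratio, and a rough KV-cache footprint; it assumes a 2-byte (float16) cache and batch size 1, and the variable names are illustrative only.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
    "max_position_embeddings": 8192,
}

# Dimension of each attention head.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: number of query heads sharing one key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# KV cache per token: one K and one V vector per layer and per KV head.
BYTES_PER_VALUE = 2  # assumption: float16 cache
kv_bytes_per_token = (
    2 * config["num_hidden_layers"] * config["num_key_value_heads"] * head_dim * BYTES_PER_VALUE
)

# Cache size for one sequence at the maximum position count.
kv_bytes_full_context = kv_bytes_per_token * config["max_position_embeddings"]

print(head_dim, queries_per_kv_head, kv_bytes_per_token, kv_bytes_full_context / 2**30)
```

Under these assumptions the cache costs 128 KiB per token, or 1 GiB for a single 8192-token sequence, independent of how the weights themselves are quantized.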

generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 128000,
+  "eos_token_id": [
+    128001,
+    128009
+  ],
+  "transformers_version": "4.40.2"
+}

model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e6590c5432cece32a15caabd1b02103c8846064e2c9b061091a56a321cd8926
+size 4977222696

model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1e13d82dd95b22c2bb3ebc8d9948820043c7ee9a416a51c193ba1c8d87ee853
+size 1050673280
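The two LFS pointer sizes can be compared against the `total_size` recorded in model.safetensors.index.json. This is a small arithmetic sketch; the interpretation of the gap as per-file header overhead (a length prefix plus JSON metadata at the start of each safetensors file) is an assumption, since the files themselves were not inspected.

```python
# Sizes in bytes, copied from the two LFS pointers and from the index metadata.
shard_sizes = [4977222696, 1050673280]
index_total_size = 6027779904

shard_total = sum(shard_sizes)

# Bytes on disk not accounted for by tensor data in the index
# (assumed to be the safetensors headers of the two shards).
overhead = shard_total - index_total_size

print(f"on disk: {shard_total / 2**30:.2f} GiB, non-tensor overhead: {overhead} bytes")
```

A checkpoint of roughly 5.6 GiB for an 8B-parameter model is consistent with the 4-bit `quantization_config` in config.json rather than with full float16 weights.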

model.safetensors.index.json ADDED

	@@ -0,0 +1,970 @@

+{
+  "metadata": {
+    "total_size": 6027779904
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00002-of-00002.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.24.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.25.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.26.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.27.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.28.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.29.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.30.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.31.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight.absmax": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight.quant_map": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight.quant_state.bitsandbytes__fp4": "model-00001-of-00002.safetensors",
+    "model.norm.weight": "model-00001-of-00002.safetensors"
+  }
+}