---
language:
- pl
license: apache-2.0
library_name: transformers
tags:
- finetuned
- gguf
inference: false
pipeline_tag: text-generation
base_model: speakleash/Bielik-11B-v2.3-Instruct
---

# Bielik-11B-v2.3-Instruct-4bit-ov

This repo contains OpenVINO 4-bit format model files for [SpeakLeash](https://speakleash.org/)'s [Bielik-11B-v2.3-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct).

DISCLAIMER: Be aware that quantized models show reduced response quality and possible hallucinations!
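For reference, 4-bit OpenVINO weights of this kind can be produced with the weight-only quantization support in optimum-intel. The sketch below is illustrative only and does not reproduce the exact calibration setup used for this repository; the `group_size` and `ratio` values are assumptions.

```python
# Minimal sketch: exporting the base model to a 4-bit OpenVINO IR with optimum-intel.
# The quantization parameters below are illustrative assumptions, not the exact
# settings used to build this repository.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

q_config = OVWeightQuantizationConfig(
    bits=4,          # 4-bit weight-only quantization
    group_size=128,  # assumed group size
    ratio=1.0,       # assumed: quantize all eligible layers to 4 bits
)

model = OVModelForCausalLM.from_pretrained(
    "speakleash/Bielik-11B-v2.3-Instruct",
    export=True,                  # convert from the original Transformers checkpoint
    quantization_config=q_config,
)
model.save_pretrained("Bielik-11B-v2.3-Instruct-4bit-ov")
```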
### Model usage with OpenVINO

This model can be deployed efficiently using [OpenVINO](https://docs.openvino.ai/2024/index.html). Below you can find two ways of running inference: with the optimum-intel library, and with the pure OpenVINO runtime.

The simplest LLM inference code uses OpenVINO together with the optimum-intel library:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "speakleash/Bielik-11B-v2.3-Instruct-4bit-ov"
model = OVModelForCausalLM.from_pretrained(model_id, use_cache=False)

question = "Dlaczego ryby nie potrafią fruwać?"  # "Why can't fish fly?"

prompt_text_bielik = f"""<|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
<|im_start|> user
{question}<|im_end|>
<|im_start|> assistant
"""

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(prompt_text_bielik, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running the model with only OpenVINO (additionally, the code below uses greedy decoding instead of sampling):

```python
import openvino as ov
import numpy as np
from transformers import AutoTokenizer

model_path = "speakleash/Bielik-11B-v2.3-Instruct-4bit-ov/openvino_model.xml"
tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-11B-v2.3-Instruct-4bit-ov")

# Load and compile the OpenVINO IR model for CPU inference
ov_model = ov.Core().read_model(model_path)
compiled_model = ov.compile_model(ov_model, "CPU")
infer_request = compiled_model.create_infer_request()

question = "Dlaczego ryby nie potrafią fruwać?"  # "Why can't fish fly?"
prompt_text_bielik = f"""<|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
<|im_start|> user
{question}<|im_end|>
<|im_start|> assistant
"""

tokens = tokenizer.encode(prompt_text_bielik, return_tensors="np")
input_ids = tokens
attention_mask = np.ones_like(input_ids)
position_ids = np.arange(len(tokens[0])).reshape(1, -1)
beam_idx = np.array([0], dtype=np.int32)

infer_request.reset_state()
prev_output = ''
generated_text_ids = np.array([], dtype=np.int32)
num_max_token_for_generation = 500

print(f'Pytanie: {question}')              # "Question: ..."
print("Odpowiedź:", end=' ', flush=True)   # "Answer:"

for _ in range(num_max_token_for_generation):
    # Single forward pass; the stateful model keeps the KV cache internally
    response = infer_request.infer(inputs={
        'input_ids': input_ids,
        'attention_mask': attention_mask,
        'position_ids': position_ids,
        'beam_idx': beam_idx
    })
    next_token_logits = response['logits'][0, -1, :]
    sampled_id = np.argmax(next_token_logits)  # Greedy decoding
    generated_text_ids = np.append(generated_text_ids, sampled_id)

    # Print only the newly generated fragment
    output_text = tokenizer.decode(generated_text_ids)
    print(output_text[len(prev_output):], end='', flush=True)
    prev_output = output_text

    # Feed only the new token back in; the position advances by one
    input_ids = np.array([[sampled_id]], dtype=np.int64)
    attention_mask = np.array([[1]], dtype=np.int64)
    position_ids = np.array([[position_ids[0, -1] + 1]], dtype=np.int64)

    if sampled_id == tokenizer.eos_token_id:
        print('\n\n*** Zakończono generowanie.')  # "Generation finished."
        break

print(f'\n\n*** Wygenerowano {len(generated_text_ids)} tokenów.')  # "Generated N tokens."
```
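If you prefer sampling over greedy decoding, the `np.argmax` step above can be replaced with temperature and top-p (nucleus) sampling over the same logits. A minimal sketch, where the `temperature` and `top_p` values are illustrative assumptions:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature + top-p (nucleus) sampling over a 1-D vector of logits."""
    logits = logits / temperature
    probs = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs /= probs.sum()

    # Keep the smallest set of tokens whose cumulative probability exceeds top_p
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]

    kept_probs = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=kept_probs))

# In the generation loop above, replace:
#   sampled_id = np.argmax(next_token_logits)
# with:
#   sampled_id = sample_next_token(next_token_logits)
```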
### Model description:

* **Developed by:** [SpeakLeash](https://speakleash.org/) & [ACK Cyfronet AGH](https://www.cyfronet.pl/)
* **Language:** Polish
* **Model type:** causal decoder-only
* **Quant from:** [Bielik-11B-v2.3-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct)
* **Finetuned from:** [Bielik-11B-v2](https://huggingface.co/speakleash/Bielik-11B-v2)
* **License:** Apache 2.0 and [Terms of Use](https://bielik.ai/terms/)

### Responsible for model quantization

* [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/) (SpeakLeash) - team leadership, conceptualizing, calibration data preparation, process creation and quantized model delivery.

## Contact Us

If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our [Discord SpeakLeash](https://discord.gg/CPBxPce4).