Update README.md

1f276b8 verified 8 months ago

4.04 kB

	---
	license: apache-2.0
	language:
	- el
	- en
	tags:
	- finetuned
	- quantized
	- GGUF
	model_creator: ilsp
	inference: true
	base_model: ilsp/Meltemi-7B-Instruct-v1
	library_name: gguf
	quantized_by: ilsp
	---

	# Meltemi 7B Instruct Quantized models

	![image/png](https://miro.medium.com/v2/resize:fit:720/format:webp/1*IaE7RJk6JffW8og-MOnYCA.png)

	## Description

	In this repository you can find quantised GGUF variants of [Meltemi-7B-Instruct-v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1) model, created using [llama.cpp](https://github.com/ggerganov/llama.cpp) at the [Institute for Language and Speech Processing](https://www.athenarc.gr/en/ilsp) of [Athena Research & Innovation Center](https://www.athenarc.gr/en).

	## Provided files (Use case column taken from the llama.cpp documentation)

	Based on the information

	\| Name \| Quant method \| Bits \| Size \| Appr. RAM required \| Use case \|
	\| ---- \| ---- \| ---- \| ---- \| ---- \| ----- \|
	\| [meltemi-instruct-v1_q3_K_M.bin](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-instruct-v1_q3_K_M.bin) \| Q3_K_M \| 3 \| 3.67 GB\| 6.45 GB \| small, high quality loss \|
	\| [meltemi-instruct-v1_q5_K_M.bin](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-instruct-v1_q5_K_M.bin) \| Q5_K_M \| 5 \| 5.31 GB\| 8.1 GB \| large, low quality loss - recommended \|

	# Instruction format
	The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) format:

	```
	<s><\|system\|>
	Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
	<\|user\|>
	Πες μου αν έχεις συνείδηση.</s>
	<\|assistant\|>
	```

	# Loading the model with llama_cpp

	Install llama-cpp-python (set -DLLAMA_CUBLAS=on if you want to use your GPU for inference)

	```
	$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
	pip install llama-cpp-python
	```

	```python
	from llama_cpp import Llama

	llm = Llama(
	model_path="./meltemi-instruct-v1_q5_K_M.bin", # Download the model file first
	n_ctx=8192, # The max sequence length to use - note that longer sequence lengths require much more resources
	n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
	n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
	)
	system = "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."
	input_text = "Πες μου αν έχεις συνείδηση."

	prompt = f"""
	<\|system\|>
	{system}
	</s>
	<\|user\|>
	{input_text}
	</s>
	<\|assistant\|>
	"""

	output = llm(
	prompt,
	max_tokens=1024,
	stop=["</s>"],
	echo=True
	)

	output_text = output['choices'][0]['text'][len(prompt):].strip()
	```

	# Ethical Considerations

	This model has not been aligned with human preferences, and therefore might generate misleading, harmful, or toxic content.


	# Acknowledgements

	The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.