---
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- LLM
- Universal-NER
- NER
- 4bit
inference: false
---
![image](qunatized_lama_color_letters_4bit_512px.png)
# Quantized version of Universal-NER/UniNER-7B-definition
[Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) quantized to 4-bit with GPTQ and stored with a shard size of 1GB.
## Model Description
The model [Universal-NER/UniNER-7B-definition](https://huggingface.co/Universal-NER/UniNER-7B-definition) was quantized to 4-bit with group_size=128 and act-order=True, using the [auto-gptq integration in transformers](https://huggingface.co/blog/gptq-integration).
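The settings above map onto the `GPTQConfig` class in transformers roughly as shown below. This is a minimal sketch, not the exact script used for this checkpoint: the calibration dataset (`"c4"`) and the output directory name are assumptions, since neither is stated in this card. Running `quantize_and_save` requires a GPU with `optimum`, `auto-gptq` and `accelerate` installed.

```python
# Quantization settings stated in this model card.
# desc_act is the auto-gptq name for the act-order option.
GPTQ_SETTINGS = {"bits": 4, "group_size": 128, "desc_act": True}


def quantize_and_save(
    source_model_id="Universal-NER/UniNER-7B-definition",
    output_dir="uniner-7b-definition-gptq-4bit",  # hypothetical local path
    calibration_dataset="c4",  # assumption: the card does not name the calibration data
):
    """Quantize the full-precision model with GPTQ and save it in 1GB shards."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained(source_model_id)
    config = GPTQConfig(dataset=calibration_dataset, tokenizer=tokenizer, **GPTQ_SETTINGS)
    # Passing a GPTQConfig triggers quantization while the model is loaded.
    model = AutoModelForCausalLM.from_pretrained(
        source_model_id, quantization_config=config, device_map="auto"
    )
    model.save_pretrained(output_dir, max_shard_size="1GB")
    tokenizer.save_pretrained(output_dir)
    return output_dir
```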
## Evaluation
TODO
## Prompt template
The prompt template is the same as for the full-precision model:
```python
prompt_template = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""
```
## Usage
For best results, format the input according to the prompt template above during inference.
```python
prompt = prompt_template.format_map({"input_text": "Cologne is a great city in Germany - maybe even the greatest ;)", "entity_name": "city"})
```
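Because the template asks about one entity type at a time, extracting several types from the same text means building one prompt per type. The sketch below shows one way to do that and to parse the answer. The helper names `build_prompts` and `parse_entities` are hypothetical, and the parser assumes the model replies with a JSON list of strings such as `["Cologne"]`; anything else is treated as "no entities found".

```python
import json

# Same template as in the "Prompt template" section, repeated so this snippet runs on its own.
PROMPT_TEMPLATE = """A virtual assistant answers questions from a user based on the provided text.
USER: Text: {input_text}
ASSISTANT: I’ve read this text.
USER: What describes {entity_name} in the text?
ASSISTANT:
"""


def build_prompts(input_text, entity_names):
    """Return a dict mapping each entity type to its formatted prompt."""
    return {
        name: PROMPT_TEMPLATE.format_map({"input_text": input_text, "entity_name": name})
        for name in entity_names
    }


def parse_entities(generated_text):
    """Parse a completion into a list of strings; return [] if it is not a JSON list."""
    try:
        parsed = json.loads(generated_text.strip())
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [str(item) for item in parsed]
```

For example, `build_prompts(text, ["city", "country"])` yields two prompts that can be sent to the model separately.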
The model is small enough to be loaded on a free-tier Colab instance with a T4 GPU; see this gist for an example: https://gist.github.com/sebastianschramm/b849c06676c6601d9a87270e83f5a157
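Loading and querying the quantized checkpoint with transformers could look roughly like the sketch below. It is not taken from the gist above: the function names are hypothetical, `model_id` must be set to this repository's id on the Hub (not filled in here), and the generation settings (`max_new_tokens=64`, greedy decoding) are assumptions. Loading requires a GPU with `optimum`, `auto-gptq` and `accelerate` installed.

```python
def load_model(model_id):
    """Load the tokenizer and the GPTQ-quantized model onto the available GPU."""
    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The quantization config stored with the checkpoint is picked up automatically.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


def generate_answer(tokenizer, model, prompt, max_new_tokens=64):
    """Generate a completion for the prompt and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so that only the model's answer is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With a formatted `prompt` in hand, `generate_answer(tokenizer, model, prompt)` returns the text the model produced after `ASSISTANT:`.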
## License
The original full-precision model and its associated data are released under the CC BY-NC 4.0 license. Hence, the same license applies to the 4-bit version.