--- language: - en pipeline_tag: text-generation library_name: transformers tags: - LLM - Universal-NER - NER inference: false --- # Quantized version of Universal-NER/UniNER-7B-all [Universal-NER/UniNER-7B-all](https://huggingface.co/Universal-NER/UniNER-7B-all) quantized to 4bit with GPTQ and stored with 1GB shard size. ## Model Description The model [Universal-NER/UniNER-7B-all](https://huggingface.co/Universal-NER/UniNER-7B-all) was quantized to 4bit, group_size 128, and ascending_order=True with auto-gptq integration in transformers (https://huggingface.co/blog/gptq-integration). ## Evaluation TODO ## Prompt template Prompt template is the same as for the full precision model: ```python prompt_template = """A virtual assistant answers questions from a user based on the provided text. USER: Text: {input_text} ASSISTANT: I’ve read this text. USER: What describes {entity_name} in the text? ASSISTANT: """ ``` ## Usage It is recommended to format input according to the prompt template mentioned above during inference for best results. ```python prompt = prompt_template.format_map({"input_text": "Cologne is a great city in Germany - maybe even the greatest ;)", "entity_name": "city"}) ```