---
license: cc-by-nc-4.0
datasets:
- EryriLabs/uk_legislation_alpaca_style_cleaned
- santoshtyss/uk_legislation
language:
- en
base_model:
- EryriLabs/llama-3.2-uk-legislation-3b
tags:
- legal
---

# Llama 3.2 UK Legislation 3B

<figure>
<img src="UKlegislation.png" alt="Llama 3.2 UK Legislation 3B" width="300">
</figure>

This model is a fine-tuned version of the Llama 3.2 UK Legislation 3B base model, instruction-tuned for question answering (Q&A) on UK legislation. It was trained as part of a blog series; see the accompanying article [here](https://www.gpt-labs.ai/post/making-a-domain-specific-uk-legislation-llm-part-1-pretraining).

## Model Details

### Model Description

- **Developed by:** GPT-LABS.AI
- **Model type:** Transformer-based language model
- **Language:** English
- **License:** [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
- **Base model:** [EryriLabs/llama-3.2-uk-legislation-3b](https://huggingface.co/EryriLabs/llama-3.2-uk-legislation-3b)

### Model Sources

- **Repository:** [EryriLabs/llama-3.2-uk-legislation-instruct-3b](https://huggingface.co/EryriLabs/llama-3.2-uk-legislation-instruct-3b)
- **Blog Post:** [Making a Domain-Specific UK Legislation LLM: Part 1 - Pretraining](https://www.gpt-labs.ai/post/making-a-domain-specific-uk-legislation-llm-part-1-pretraining)

## Uses

### Intended Use

This model is designed to serve as a question-answering (Q&A) assistant for UK legislation and as a starting point for further development on tasks such as the following (a quick pipeline sketch follows the list):

- Domain-specific applications in law or other fields
- Research and experimentation in natural language processing
- General-purpose natural language understanding and generation
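
For quick experimentation, the model can also be loaded through the transformers text-generation pipeline. This is a minimal sketch, not the card's canonical usage (see How to Get Started below); `device_map="auto"` assumes the `accelerate` package is installed, and the generation settings are illustrative.

```python
from transformers import pipeline

# Minimal sketch: quick Q&A experimentation via the text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="EryriLabs/llama-3.2-uk-legislation-instruct-3b",
    device_map="auto",  # place the model on GPU if available, otherwise CPU
)

result = generator("What are the main principles of UK legislation?", max_new_tokens=100)
print(result[0]["generated_text"])
```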

### Out-of-Scope Use

This model is **not suitable** for:

- Providing authoritative legal advice or domain expertise
- Applications requiring high accuracy or nuanced understanding of UK legislation
- Tasks involving sensitive or critical real-world applications without rigorous evaluation

## Bias, Risks, and Limitations

- **Bias:** The model may reflect biases inherent in the pretraining data. Outputs should be critically evaluated for accuracy and fairness.
- **Risks:** The model may generate responses that are overly general, inaccurate, or contextually inappropriate for specific tasks.
- **Limitations:** The model's knowledge is limited to its training data; it does not cover the most recent legislative developments and is not a substitute for professional legal advice.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("EryriLabs/llama-3.2-uk-legislation-instruct-3b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("EryriLabs/llama-3.2-uk-legislation-instruct-3b")

# Sample question
input_text = "What are the main principles of UK legislation?"

# Tokenize and generate a response
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
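
Because this is an instruction-tuned model, prompts wrapped in the tokenizer's chat template are likely to work better than raw text. The snippet below is a minimal sketch that assumes the repository ships the standard Llama 3.2 chat template; it reuses the `model` and `tokenizer` loaded above.

```python
# Assumes the tokenizer defines a chat template (standard for Llama 3.2 instruct models).
messages = [
    {"role": "user", "content": "What are the main principles of UK legislation?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```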

## Technical Specifications

- **Model Architecture:** Llama 3.2 3B, a transformer-based model designed for natural language processing tasks (see the config sketch below).
- **Training Data:** Instruction-tuned on UK legislation Q&A data (EryriLabs/uk_legislation_alpaca_style_cleaned), on top of a base model further pretrained on UK legislation.
- **Compute Infrastructure:** Training conducted on high-performance GPUs (e.g., NVIDIA A100).
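
For reference, the architecture parameters can be inspected straight from the hosted config without downloading the weights. This is a small sketch using the standard transformers `AutoConfig` API; the printed fields are the usual Llama config attributes.

```python
from transformers import AutoConfig

# Read architecture details from the model config (no weight download needed).
config = AutoConfig.from_pretrained("EryriLabs/llama-3.2-uk-legislation-instruct-3b")
print(config.model_type)           # expected: "llama"
print(config.hidden_size)          # transformer hidden dimension
print(config.num_hidden_layers)    # number of decoder layers
print(config.num_attention_heads)  # attention heads per layer
```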

## Citation

If you use this model, please cite:

```
@misc{llama3.2-uk-legislation-instruct-3b,
  author = {GPT-LABS.AI},
  title = {Llama 3.2 UK Legislation Instruct 3B},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/EryriLabs/llama-3.2-uk-legislation-instruct-3b}
}
```

## Model Card Authors

- GPT-LABS.AI

## Contact

For questions or feedback, please visit [gpt-labs.ai](https://www.gpt-labs.ai).