---
language:
- multilingual
- ar
- bg
- ca
- cs
- da
- de
- el
- en
- es
- et
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- hu
- hy
- id
- it
- ja
- ka
- ko
- ku
- lt
- lv
- mk
- mn
- mr
- ms
- my
- nb
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sq
- sr
- sv
- th
- tr
- uk
- ur
- vi
- ha
license: mit
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
language_bcp47:
- fr-ca
- pt-br
- zh-cn
- zh-tw
pipeline_tag: sentence-similarity
inference: false
---

## 0xnu/pmmlv2-fine-tuned-hausa

A sentence-embedding model for Hausa, fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).

[Hausa](https://en.wikipedia.org/wiki/Hausa_language) has a rich phonetic inventory of twenty-three consonants, five vowels, and two diphthongs. Hausa words vary in length and complexity, but they generally follow regular patterns of syllable structure and pronunciation. Written Hausa may also use the apostrophe to mark glottal stops and the macron to mark long vowels.

### Usage (Sentence-Transformers)

The model is easiest to use with [sentence-transformers](https://www.SBERT.net) installed:

```sh
pip install -U sentence-transformers
```

### Embeddings

```python
from sentence_transformers import SentenceTransformer
sentences = ["Tambarin talaka cikinsa", "Gwanin dokin wanda yake kansa"]

model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-hausa')
embeddings = model.encode(sentences)
print(embeddings)
```
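
Each row of `embeddings` is a fixed-length vector, and two sentences are usually compared by the cosine of the angle between their vectors. The sketch below shows that computation in plain Python on made-up three-dimensional vectors, so it runs without downloading the model. The function name `cosine_similarity` and the sample values are illustrative assumptions, not part of the sentence-transformers API.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up stand-ins for rows of `embeddings` (real vectors are much longer).
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]   # points in the same direction as vec_a
vec_c = [2.0, -1.0, 0.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

With real output, the same function can be called on `embeddings[0]` and `embeddings[1]`. For whole batches, `util.cos_sim` from sentence-transformers computes the same quantity.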

### Advanced Usage

```python
from sentence_transformers import SentenceTransformer, util
import torch

# Define sentences in Hausa
sentences = [
    "Menene sunan babban birnin Ingila?",
    "Wanne dabba ne mafi zafi a duniya?",
    "Ta yaya zan iya koyon harshen Hausa?",
    "Wanne abinci ne mafi shahara a Najeriya?",
    "Wane irin kaya ake sawa don bikin Hausa?"
]

# Load the Hausa fine-tuned model from the Hugging Face Hub
model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-hausa')

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Function to find the closest sentence
def find_closest_sentence(query_embedding, sentence_embeddings, sentences):
    # Compute cosine similarities
    cosine_scores = util.pytorch_cos_sim(query_embedding, sentence_embeddings)[0]
    # Find the position of the highest score
    best_match_index = torch.argmax(cosine_scores).item()
    return sentences[best_match_index], cosine_scores[best_match_index].item()

query = "Menene sunan babban birnin Ingila?"
query_embedding = model.encode(query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(query_embedding, embeddings, sentences)

print(f"Tambaya: {query}")
print(f"Jimla mafi kusa: {closest_sentence}")
print(f"Alamar kama: {similarity_score:.4f}")

# You can also try with a new sentence not in the original list
new_query = "Wanne sarki ne yake mulkin Kano a yanzu?"
new_query_embedding = model.encode(new_query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(new_query_embedding, embeddings, sentences)

print(f"\nSabuwar Tambaya: {new_query}")
print(f"Jimla mafi kusa: {closest_sentence}")
print(f"Alamar kama: {similarity_score:.4f}")
```
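
`find_closest_sentence` returns only the single best match. When several candidates are worth inspecting, the scores can be sorted instead. The sketch below assumes the similarity scores are already available as a plain list of floats (for example via `cosine_scores.tolist()`). The helper `top_k_matches` and the sample scores are hypothetical and chosen only for illustration; they are not real model output.

```python
def top_k_matches(scores, sentences, k=3):
    """Return the k (sentence, score) pairs with the highest scores, best first."""
    if len(scores) != len(sentences):
        raise ValueError("scores and sentences must have the same length")
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for three candidate sentences, not real model output.
candidates = [
    "Menene sunan babban birnin Ingila?",
    "Ta yaya zan iya koyon harshen Hausa?",
    "Wanne abinci ne mafi shahara a Najeriya?",
]
example_scores = [0.21, 0.87, 0.45]

for sentence, score in top_k_matches(example_scores, candidates, k=2):
    print(f"{score:.4f}  {sentence}")
```

Sorting the full list is simple and adequate for a handful of candidates. For large collections, a dedicated nearest-neighbour search is likely a better fit.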

### License

This project is licensed under the [MIT License](./LICENSE).

### Copyright

(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).