---
language:
- multilingual
- ar
- bg
- ca
- cs
- da
- de
- el
- en
- es
- et
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- hu
- hy
- id
- it
- ja
- ka
- ko
- ku
- lt
- lv
- mk
- mn
- mr
- ms
- my
- nb
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sq
- sr
- sv
- th
- tr
- uk
- ur
- vi
- ha
license: mit
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
language_bcp47:
- fr-ca
- pt-br
- zh-cn
- zh-tw
pipeline_tag: sentence-similarity
inference: false
---

## 0xnu/pmmlv2-fine-tuned-hausa

A sentence-embedding model for Hausa, fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).

[Hausa](https://en.wikipedia.org/wiki/Hausa_language) has a rich sound system with twenty-three consonants, five vowels, and two diphthongs. Hausa words vary in length and complexity, but they generally follow regular patterns of syllable structure and pronunciation. Hausa words are also often written with marks such as the apostrophe and the macron, which indicate glottal stops and long vowels respectively.

### Usage (Sentence-Transformers)

Using this model is straightforward once [sentence-transformers](https://www.SBERT.net) is installed:

```
pip install -U sentence-transformers
```

### Embeddings

```python
from sentence_transformers import SentenceTransformer

sentences = ["Tambarin talaka cikinsa", "Gwanin dokin wanda yake kansa"]

model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-hausa')
embeddings = model.encode(sentences)
print(embeddings)
```

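Embeddings such as those returned by `encode` are usually compared with cosine similarity. As a minimal sketch of that computation in plain Python, the example below uses short made-up vectors in place of real embeddings. The function name `cosine_similarity` and the toy values are illustrative assumptions, not part of the model or of the sentence-transformers API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]    # points in the same direction as vec_a
vec_c = [-3.0, 0.0, 1.0]   # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # close to 0: unrelated direction
```

Scores near 1 indicate vectors pointing the same way, and scores near 0 indicate unrelated directions. In practice the library's own utilities do this on whole batches of tensors, so a hand-written version like this is only useful for understanding the score.
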
### Advanced Usage

```python
from sentence_transformers import SentenceTransformer, util
import torch

# Define sentences in Hausa
sentences = [
    "Menene sunan babban birnin Ingila?",
    "Wanne dabba ne mafi zafi a duniya?",
    "Ta yaya zan iya koyon harshen Hausa?",
    "Wanne abinci ne mafi shahara a Najeriya?",
    "Wane irin kaya ake sawa don bikin Hausa?"
]

# Load the Hausa-trained model (a local path to a downloaded copy also works)
model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-hausa')

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Function to find the closest sentence
def find_closest_sentence(query_embedding, sentence_embeddings, sentences):
    # Compute cosine similarities between the query and every sentence
    cosine_scores = util.pytorch_cos_sim(query_embedding, sentence_embeddings)[0]
    # Find the position of the highest score
    best_match_index = torch.argmax(cosine_scores).item()
    return sentences[best_match_index], cosine_scores[best_match_index].item()

# Query with a sentence that is already in the list
query = "Menene sunan babban birnin Ingila?"
query_embedding = model.encode(query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(query_embedding, embeddings, sentences)

# Labels are in Hausa: question, closest sentence, similarity score
print(f"Tambaya: {query}")
print(f"Jimla mafi kusa: {closest_sentence}")
print(f"Alamar kama: {similarity_score:.4f}")

# You can also try with a new sentence not in the original list
new_query = "Wanne sarki ne yake mulkin Kano a yanzu?"
new_query_embedding = model.encode(new_query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(new_query_embedding, embeddings, sentences)

print(f"\nSabuwar Tambaya: {new_query}")
print(f"Jimla mafi kusa: {closest_sentence}")
print(f"Alamar kama: {similarity_score:.4f}")
```

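`find_closest_sentence` returns only the single best match. Where several candidates are wanted, the same scores can be sorted instead of passed to `argmax`. The following is a minimal sketch, assuming the similarity scores have already been computed and converted to a plain list of floats. The helper `top_k_matches` and the scores below are hypothetical and are not real model output.

```python
def top_k_matches(scores, sentences, k=3):
    """Return the k (sentence, score) pairs with the highest scores."""
    if len(scores) != len(sentences):
        raise ValueError("scores and sentences must have the same length")
    # Pair each sentence with its score, then sort from highest to lowest score.
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical similarity scores, one per candidate sentence.
candidate_sentences = [
    "Menene sunan babban birnin Ingila?",
    "Ta yaya zan iya koyon harshen Hausa?",
    "Wanne abinci ne mafi shahara a Najeriya?",
]
candidate_scores = [0.12, 0.87, 0.45]

for sentence, score in top_k_matches(candidate_scores, candidate_sentences, k=2):
    print(f"{score:.4f}  {sentence}")
```

With the tensor of scores from the example above, calling `.tolist()` on it should give a list in the form this helper expects.
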
### License

This project is licensed under the [MIT License](./LICENSE).

### Copyright

(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).