vrashad committed
Commit 5d00664
1 Parent(s): 41e10f4

Update README.md

Files changed (1): README.md (+92 -3)
---
license: cc-by-nc-4.0
language:
- ar
- az
- bg
- de
- el
- en
- es
- fr
- hi
- it
- ja
- nl
- pl
- pt
- ru
- sw
- th
- tr
- ur
- vi
- zh
pipeline_tag: text-classification
tags:
- language detect
---

# Multilingual Language Detection Model

## Model Description

This repository contains a multilingual language detection model based on the XLM-RoBERTa base architecture. The model distinguishes between 21 languages: Arabic, Azerbaijani, Bulgarian, German, Greek, English, Spanish, French, Hindi, Italian, Japanese, Dutch, Polish, Portuguese, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, and Chinese.

## How to Use

You can use this model directly with a text-classification pipeline, or with the `transformers` library for more custom usage, as shown in the example below.

### Quick Start

First, install the `transformers` library (and `torch`, which the example uses) if you haven't already:

```bash
pip install transformers torch
```

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("LocalDoc/language_detection")
model = AutoModelForSequenceClassification.from_pretrained("LocalDoc/language_detection")

# Prepare text
text = "Əlqasım oğulları vorzakondu"
encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)

# Prediction
model.eval()
with torch.no_grad():
    outputs = model(**encoded_input)

# Process the outputs
logits = outputs.logits
probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class_index = probabilities.argmax().item()
labels = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]
predicted_label = labels[predicted_class_index]
print(f"Predicted Language: {predicted_label}")
```
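The post-processing step above (softmax over the logits, then argmax into the label list) can be sketched in plain Python. The logits below are made-up illustrative numbers, not real model outputs:

```python
import math

# The model's 21 class labels, in the same order as in the example above.
labels = ["az", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
          "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"]

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits: a large value at index 0 stands in for a confident prediction.
logits = [8.0] + [0.1] * 20
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
print(labels[predicted_index])  # -> az
```

The probabilities always sum to 1, so thresholding the top probability is a simple way to reject low-confidence inputs.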

## Training Performance

The model was trained for three epochs, with validation loss and accuracy improving at each epoch:

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|-------|---------------|-----------------|----------|----------|
| 1     | 0.0127        | 0.0174          | 0.9966   | 0.9966   |
| 2     | 0.0149        | 0.0141          | 0.9973   | 0.9973   |
| 3     | 0.0001        | 0.0109          | 0.9984   | 0.9984   |

## Test Results

The model achieved the following results on the test set:

- Loss: 0.0133
- Accuracy: 0.9975
- F1 Score: 0.9975
- Precision: 0.9975
- Recall: 0.9975
- Evaluation Time: 17.5 seconds
- Samples per Second: 599.685
- Steps per Second: 9.424
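It is no coincidence that accuracy, F1, precision, and recall all equal 0.9975: for single-label multiclass classification with micro-averaging, every wrong prediction counts as exactly one false positive and one false negative, so all four metrics collapse to accuracy. A toy check with made-up labels:

```python
# Toy data: 8 predictions, 1 error. These are illustrative labels,
# not samples from the actual test set.
true = ["az", "en", "ru", "tr", "en", "az", "zh", "de"]
pred = ["az", "en", "ru", "tr", "de", "az", "zh", "de"]

correct = sum(t == p for t, p in zip(true, pred))
accuracy = correct / len(true)

# Micro-averaging pools TP/FP/FN over all classes. Each sample gets
# exactly one predicted label, so total FP == total FN == error count.
tp = correct
fp = fn = len(true) - correct
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # all 0.875
```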

## Licensing

This model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You may use, modify, and distribute this model for non-commercial purposes, provided you credit the original creator.