---
pipeline_tag: text-classification
language:
- uz
- en
- ru
license: apache-2.0
metrics:
- accuracy
---

<p><b>Til identifikatori.</b>

Tabiiy tilni qayta ishlash (NLP) sohasida tilni aniqlash vazifasi maʼlum matn yoki hujjat tilini aniqlashni oʻz ichiga oladi,
ammo koʻplab tillarni aniqlash qobiliyati qiyinlashadi. Ushbu model matndan 21 tilni tanib oladi, xususan, oʻzbek tilida
qoʻllaniladigan lotin-kirill yozuviga eʼtibor qaratadi. Bu boradagi tadqiqotlar kamligini hisobga olib, mos transformator
arxitekturasiga asoslangan, oʻzbek lotin-kirill yozuvini aniqlik darajasi yuqori boʻlgan tilni aniqlash modelini taqdim etamiz.
Modelimiz biz yaratgan oʻzbek tili korpusidan foydalangan holda baholandi, bu ham kelajakda oʻzbek tilini aniqlash vazifalarini
baholash uchun qimmatli manba boʻlib xizmat qilishi mumkin. Ushbu model 21 ta tilni, jumladan, ikkita alifboda (lotin va kirill)
ifodalangan oʻzbek tilini qamrab oladi.

<p><b>Language identifier.</b>

Language identification is the Natural Language Processing (NLP) task of determining the language of a given text or document.
The task becomes harder as the number of candidate languages grows. This model recognizes 21 languages from text, with a
particular focus on Uzbek, which is written in two scripts (Latin and Cyrillic). Given the scarcity of research in this area,
we present a transformer-based language identification model that achieves high accuracy on Uzbek text in both scripts.
The model was evaluated on an Uzbek corpus that we created, which may also serve as a valuable resource for evaluating
Uzbek language identification in the future.
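As a rough illustration of how a text-classification model like this one is typically queried, the sketch below uses the `pipeline` helper from the `transformers` library. The repository id in `MODEL_ID` is a hypothetical placeholder and must be replaced with this model's actual id. The label strings the model returns, and whether Latin and Cyrillic Uzbek get separate labels, are not stated in this card, so the code simply passes through whatever labels come back.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder: replace with the real repository id of this model.
MODEL_ID = "your-namespace/your-language-id-model"


def build_identifier(model_id: str = MODEL_ID):
    """Load a text-classification pipeline for the given model id."""
    # Imported here so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def identify(classifier: Callable, texts: List[str]) -> List[Tuple[str, float]]:
    """Return one (label, score) pair per input text.

    `classifier` is expected to behave like a text-classification pipeline:
    given a list of strings, it returns a list of dicts with "label" and "score".
    """
    results = classifier(texts)
    return [(r["label"], r["score"]) for r in results]


if __name__ == "__main__":
    clf = build_identifier()
    samples = [
        "Bugun havo juda yaxshi.",     # Uzbek, Latin script
        "Бугун ҳаво жуда яхши.",       # Uzbek, Cyrillic script
        "The weather is nice today.",  # English
    ]
    for text, (label, score) in zip(samples, identify(clf, samples)):
        print(f"{label}\t{score:.3f}\t{text}")
```

Passing the inputs as a list keeps the output shape uniform (one result per text), which makes it straightforward to compare predictions for the same sentence written in Latin and in Cyrillic script.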