---
license: apache-2.0
language:
- fa
tags:
- bert
- roberta
- persian
---

# AriaBERT: A Pre-trained Persian BERT Model for Natural Language Understanding

## Introduction
AriaBERT is a pretrained language model for natural language processing (NLP) in Persian. It was developed to address the shortage of efficient pretrained language models for Persian and targets Persian natural language understanding tasks.

## Paper
https://www.researchsquare.com/article/rs-3558473/v1

## Key Features
- **Diverse Training Data:** AriaBERT was trained on over 32 gigabytes of Persian text spanning conversational, formal, and hybrid registers, including tweets, news articles, poems, medical and encyclopedia texts, user opinions, and more.
- **RoBERTa Architecture:** AriaBERT uses the RoBERTa architecture with a Byte-Pair Encoding (BPE) tokenizer, unlike the original BERT, which uses a WordPiece tokenizer.
- **Broad Applicability:** Suited to a range of Persian NLP tasks, including classification, sentiment analysis, and stance detection.
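
To illustrate the Byte-Pair Encoding idea mentioned above, here is a minimal, character-level sketch of how BPE learns merges from a corpus. It is a toy illustration only and not AriaBERT's actual tokenizer: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`) and the three-word corpus are assumptions made for this example, and production tokenizers add details (such as byte-level handling and special tokens) that are omitted here.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Return a new vocabulary where every occurrence of `pair` is one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges by repeatedly fusing the most frequent pair."""
    # Start with each whitespace-separated word split into single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


# Toy corpus (hypothetical): "book" twice and "library" once, in Persian.
BOOK = "کتاب"
LIBRARY = BOOK + "خانه"
corpus = f"{BOOK} {BOOK} {LIBRARY}"

merges, vocab = learn_bpe(corpus, num_merges=3)
print(merges)
print(vocab)
```

In this toy corpus the four characters of the frequent word for "book" are fused into a single symbol after three merges, while the rarer suffix in "library" stays split into characters. This is the general behavior that lets BPE represent frequent words compactly and still cover rare ones.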

## Performance Benchmarks
- **Sentiment Analysis:** Achieves an average improvement of 3% over competing models.
- **Classification Tasks:** Demonstrates a 0.65% improvement in accuracy.
- **Stance Detection:** Shows a 3% enhancement in performance metrics.
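
For tasks like those listed above, the following is a minimal sketch of loading the model for sequence classification with the Hugging Face `transformers` library. It rests on several assumptions: `MODEL_ID` is a placeholder that must be replaced with the model's actual Hub repository ID, the checkpoint is assumed to load through the standard `Auto*` classes, and the helper names `load_for_classification` and `classify` are hypothetical. The classification head is newly initialized on load, so predictions are not meaningful until the model has been fine-tuned on labeled data.

```python
# Placeholder (assumption): replace with the real Hub repository ID of AriaBERT.
MODEL_ID = "your-namespace/AriaBERT"


def load_for_classification(model_id=MODEL_ID, num_labels=2):
    """Load the tokenizer and a sequence-classification model for `model_id`."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model


def classify(texts, tokenizer, model):
    """Return the index of the highest-scoring label for each input text."""
    import torch

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).tolist()
```

A typical flow would be to call `load_for_classification` with the number of labels in the target dataset, fine-tune the returned model, and then call `classify` on new Persian text.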