---
language:
- he
pipeline_tag: fill-mask
datasets:
- HeNLP/HeDC4
---
## Hebrew Language Model for Long Documents
State-of-the-art Longformer language model for Hebrew.
#### How to use
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo')
# Tokenization example
tokenized_string = tokenizer('שלום לכולם')  # "Hello everyone"

# Decoding back to text
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
```
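Since the model card lists `fill-mask` as the pipeline tag, masked-token prediction can also be run through the `pipeline` API. The snippet below is a minimal sketch, assuming the tokenizer's standard mask token; the example sentence and printed fields are illustrative.
```python
from transformers import pipeline

# Masked language modeling with the high-level pipeline API
fill_mask = pipeline('fill-mask', model='HeNLP/LongHeRo')

# Insert the mask token programmatically rather than hardcoding it
masked_sentence = f'שלום {fill_mask.tokenizer.mask_token}'  # "Hello <mask>"

# Print the top predicted tokens and their scores
for prediction in fill_mask(masked_sentence):
    print(prediction['token_str'], prediction['score'])
```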
### Citing
If you use LongHeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077).
```
@article{shalumov2023hero,
title={HeRo: RoBERTa and Longformer Hebrew Language Models},
author={Vitaly Shalumov and Harel Haskey},
year={2023},
journal={arXiv:2304.11077},
}
```