---
language:
- he
datasets:
- HeNLP/HeDC4
---
## Hebrew Language Model
HeRo is a state-of-the-art RoBERTa language model for Hebrew, pretrained on the HeDC4 dataset.
### How to use
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the pretrained tokenizer and masked-language model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('HeNLP/HeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/HeRo')

# Tokenization example
# Tokenizing ("Hello everyone")
tokenized_string = tokenizer('שלום לכולם')

# Decoding back to text, dropping the special tokens added by the tokenizer
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
```
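
Because the model is loaded with `AutoModelForMaskedLM`, a natural next step is masked-token prediction. The following is a minimal sketch, assuming the standard `fill-mask` pipeline from `transformers`. The helper names `top_predictions` and `demo` and the example sentence are illustrative and not part of the original model card.

```python
from typing import Callable, List, Tuple


def top_predictions(unmasker: Callable, text: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the single masked position in `text`.

    `unmasker` is expected to behave like a transformers fill-mask pipeline:
    called on one string with one mask, it returns a list of dicts that
    include the keys "token_str" and "score".
    """
    results = unmasker(text, top_k=top_k)
    return [(r["token_str"].strip(), r["score"]) for r in results]


def demo() -> None:
    # Imported here so the helper above can be reused without loading the model.
    from transformers import pipeline

    # Downloads HeNLP/HeRo from the Hugging Face Hub on first use.
    unmasker = pipeline("fill-mask", model="HeNLP/HeRo")

    # Read the mask token from the tokenizer instead of hard-coding it.
    mask = unmasker.tokenizer.mask_token

    # Illustrative sentence: "Hello <mask>"
    for token, score in top_predictions(unmasker, f"שלום {mask}"):
        print(token, score)
```

Calling `demo()` prints the highest-scoring candidate tokens for the masked position along with their probabilities. The predictions depend on the model weights, so no expected output is shown here.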
### Citing
If you use HeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077).
```
@article{shalumov2023hero,
title={HeRo: RoBERTa and Longformer Hebrew Language Models},
author={Vitaly Shalumov and Harel Haskey},
year={2023},
journal={arXiv:2304.11077},
}
```