---
license: cc-by-nc-sa-4.0
language:
- bn
---
## Description
**Biswabangla-335M-io** is a 335-million-parameter open-source instruction-tuned generative pretrained language model for Bangla/Bengali.
Biswabangla is a monolingual Bangla/Bengali generative language model. Its tokenizer also works for the Assamese language.
The model was pretrained from scratch at a context size of 4096 tokens and then instruction-tuned on 1 million Bangla instructions in the form of (input, output) pairs spanning various Bengali NLP tasks.
Use of this model for commercial purposes is strictly prohibited.
If you use our model, please cite our paper [Niyogi and Bhattacharya, 2024](https://arxiv.org/abs/2401.18034).
The architecture of Biswabangla differs from that of the language models described in [Niyogi and Bhattacharya, 2024](https://arxiv.org/abs/2401.18034).
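### Usage
The card does not document a loading recipe, so the following is a minimal inference sketch with the Hugging Face `transformers` library, assuming the checkpoint is published in a transformers-compatible format. The repository id `bh4/bb335m`, the bare-prompt format, and the generation settings are assumptions, not documented details.
```python
# Minimal inference sketch (assumptions noted below); not an official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bh4/bb335m"  # assumption: substitute the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example Bangla instruction ("Write a poem about the rainy season.").
# The instruction template used during tuning is not documented; a bare
# prompt is assumed here.
prompt = "বর্ষাকাল নিয়ে একটি কবিতা লেখো।"
inputs = tokenizer(prompt, return_tensors="pt")

# The model was pretrained at a 4096-token context, so keep the prompt
# plus generated tokens within that budget.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```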
### Model Architecture
Decoder-only autoregressive Transformer model.
### Limitations
The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet.
Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts.
The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and may produce socially unacceptable or undesirable output even if the prompt itself does not include anything explicitly offensive.
Gyan AI Research owns the output generated from the model.
### Citations
```
@misc{niyogi2024paramanufamilynovelefficient,
      title={Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages},
      author={Mitodru Niyogi and Arnab Bhattacharya},
      year={2024},
      eprint={2401.18034},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2401.18034},
}
```