---
|
license: afl-3.0 |
|
language: |
|
- ru |
|
library_name: transformers |
|
pipeline_tag: text2text-generation |
|
tags: |
|
- humor |
|
- T5 |
|
- jokes-generation |
|
--- |
|
|
|
|
|
## Task

The model was created for the task of joke generation in Russian.

Generating jokes from scratch is a very difficult task. To make it easier, jokes were split into setup and punch pairs.
Each setup can produce an infinite number of punches, so an inspiration was also introduced: the main idea (or key word) of the punch for a given setup.
In the real world, jokes come in different qualities (bad, good, funny, ...). Therefore, for the model to tell them apart, a mark was introduced, ranging from 0 (not a joke) to 5 (golden joke).
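
To make the decomposition concrete, one decomposed joke can be pictured as a record like the sketch below. The field names and placeholder values are purely illustrative assumptions, not the dataset's actual schema.

```python
# Illustrative only: an assumed record layout for one decomposed joke.
# Field names and values are placeholders, not the dataset's real schema.
example = {
    "setup": "<setup text in Russian>",       # first half of the joke
    "inspiration": "<key word / main idea>",  # hints at which punch should follow
    "punch": "<punch text in Russian>",       # second half of the joke
    "mark": 4,                                # quality score from 0 (not a joke) to 5 (golden joke)
}
```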
|
|
|
|
|
## Info

The model was trained with Flax on a large dataset of jokes and anecdotes, covering several tasks (a usage sketch follows the list):

1. Span masking (dataset size: 850K)

2. Conditional generation tasks (trained simultaneously):

   a. Generate an inspiration from a given setup (dataset size: 230K)

   b. Generate a punch from a given setup and inspiration (dataset size: 240K)

   c. Generate a mark from a given setup and punch (dataset size: 200K)
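
A minimal generation sketch with Transformers is shown below. The model identifier, the prompt prefixes, and the field separator are assumptions made for illustration; the exact input format used during training should be checked against the training scripts or dataset files in this repository.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# NOTE: the model id and the prompt prefixes below ("inspiration:", "punch:",
# "mark:") are assumptions for illustration only; check the repository for
# the exact input format used during training.
MODEL_ID = "<this-repository-id>"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
model.eval()

def run(prompt: str, **gen_kwargs) -> str:
    """Encode a prompt, generate a continuation, and decode it back to text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64, **gen_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

setup = "<your setup text in Russian>"

# a. setup -> inspiration
inspiration = run(f"inspiration: {setup}", do_sample=True, top_p=0.95)

# b. setup + inspiration -> punch
punch = run(f"punch: {setup} | {inspiration}", do_sample=True, top_p=0.95)

# c. setup + punch -> mark (expected to fall in the 0-5 range)
mark = run(f"mark: {setup} | {punch}")
```

Sampling is used here for the creative steps and greedy decoding for the mark, but any decoding strategy supported by `generate` can be swapped in.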
|
|
|
## Ethical considerations and risks

The model is fine-tuned on a large corpus of humorous text scraped from websites and Telegram channels with anecdotes, one-liners, and jokes.
The text was not filtered for explicit content or assessed for existing biases.
As a result, the model may generate similarly inappropriate content or replicate biases inherent in the underlying data.

Please don't take its output seriously.
|
|