---
tags:
- fastai
- text-translation

language: ml

widget:
- text: "കേൾക്കുന്ന എല്ലാ കാര്യങ്ങളും എനിക്കു മനസിലായില്ല"
  example_title: "Malayalam Seq2Seq translation"


---

# മലയാളം - English ULMFit translation model (Work in Progress)


[![മലയാളം: kaggle notebook](https://img.shields.io/badge/മലയാളം%20-notebook-green.svg)](https://www.kaggle.com/code/rajeshradhakrishnan/ml-ulmfit-seq2seq-translation)


---

# malayalam-ULMFit-Seq2Seq (Translation model)

The malayalam-ULMFit-Seq2Seq model builds on a Malayalam language model pre-trained with [fastai](https://docs.fast.ai/text.data.html), following [Malyalam_Language_Model_ULMFiT](https://github.com/goru001/nlp-for-malyalam/blob/master/language-model/Malyalam_Language_Model_ULMFiT.ipynb).

The text is tokenized with SentencePiece using a vocabulary size of 10,000, and the pre-trained language model is uploaded to this [Kaggle dataset](https://www.kaggle.com/datasets/rajeshradhakrishnan/ulmfit-fastai).

## Usage

```python
!pip install -Uqq huggingface_hub["fastai"]

from huggingface_hub import from_pretrained_fastai

repo_id = "<user>/malayalam-ULMFit-Seq2Seq"  # replace with this model's Hugging Face repo id
learner = from_pretrained_fastai(repo_id)

original_xtext = 'കേൾക്കുന്ന എല്ലാ കാര്യങ്ങളും എനിക്കു മനസിലായില്ല'
original_ytext = "I didn't understand all this"
predicted_text = learner.predict(original_xtext)

print(f'original text: {original_xtext}')
print(f'original answer: {original_ytext}')
print(f'predicted text: {predicted_text}')
```

## Intended uses & limitations

The model is a work in progress and has not yet been fine-tuned to state-of-the-art accuracy.

## Training and evaluation data

[Malayalam Samanantar dataset (English-Malayalam parallel corpus, uploaded to Kaggle)](https://www.kaggle.com/datasets/rajeshradhakrishnan/ulmfit-fastai)