javi8979 committed on
Commit
a90c2f0
1 Parent(s): 883c1f5

Update README.md

  1. README.md +28 -2
README.md CHANGED
@@ -24,7 +24,7 @@ pipeline_tag: translation
24
 
25
  - [Model description](#model-description)
26
  - [Intended uses and limitations](#intended-uses-and-limitations)
27
- - [How to use](#how-to-use)
28
  - [Training](#training)
29
  - [Evaluation](#evaluation)
30
  - [Citation](#citation)
@@ -44,4 +44,30 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
44
 
45
  In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.
46
 
47
- For more details regarding the model architecture take a look at the paper which is available on [arXiv]().
 
24
 
25
  - [Model description](#model-description)
26
  - [Intended uses and limitations](#intended-uses-and-limitations)
27
+ - [Run the model](#run-the-model)
28
  - [Training](#training)
29
  - [Evaluation](#evaluation)
30
  - [Citation](#citation)
 
44
 
45
  In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.
46
 
47
+ For more details on the model architecture, the dataset, and model interpretability, see the paper, which is available on [arXiv]().
48
+
49
+ ## Intended Uses and Limitations
50
+
51
+ The model is proficient in 16 supervised translation directions that include Catalan, and it can also translate in 56 additional zero-shot directions.
52
+
53
+ At the time of submission, no measures have been taken to estimate the bias and added toxicity embedded in the model. However, we are aware that our models may be biased, since the corpora were collected by crawling multiple web sources. We intend to conduct research in these areas in the future, and if it is completed, this model card will be updated.
54
+
55
+ ## Run the model
56
+
57
+
58
+ ```python
59
+ from transformers import AutoTokenizer, AutoModelForCausalLM
60
+
61
+ model_id = "projecte-aina/Plume32k"
62
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
63
+ model = AutoModelForCausalLM.from_pretrained(model_id)
64
+
65
+ src_lang_code = 'spa_Latn'
66
+ tgt_lang_code = 'cat_Latn'
67
+ sentence = 'Ayer se fue, tomó sus cosas y se puso a navegar.'
68
+ prompt = '<s> [{}] {} \n[{}]'.format(src_lang_code, sentence, tgt_lang_code)
69
+ input_ids = tokenizer(prompt, return_tensors='pt').input_ids
70
+ output_ids = model.generate(input_ids, max_length=200, num_beams=5)
71
+ input_length = input_ids.shape[1]
72
+ generated_text = tokenizer.decode(output_ids[0, input_length:], skip_special_tokens=True).strip()  # decode only the tokens generated after the prompt
73
+ ```
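
The prompt format in the snippet above (source language tag, sentence, newline, target language tag) can be wrapped in a small helper so that it is easy to reuse across language pairs. The following is only a sketch and is not part of the commit: the function names `build_prompt` and `translate` are hypothetical, and the defaults simply mirror the `max_length=200` and `num_beams=5` values used above.

```python
def build_prompt(src_lang_code: str, tgt_lang_code: str, sentence: str) -> str:
    """Build the prompt string in the same layout as the README snippet.

    Hypothetical helper: it only reproduces the format string shown above.
    """
    return '<s> [{}] {} \n[{}]'.format(src_lang_code, sentence, tgt_lang_code)


def translate(model, tokenizer, sentence: str, src_lang_code: str,
              tgt_lang_code: str, max_length: int = 200, num_beams: int = 5) -> str:
    """Translate one sentence with an already loaded model and tokenizer.

    Hypothetical wrapper around the generation steps from the README snippet.
    """
    prompt = build_prompt(src_lang_code, tgt_lang_code, sentence)
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    output_ids = model.generate(input_ids, max_length=max_length, num_beams=num_beams)
    # Keep only the tokens produced after the prompt.
    input_length = input_ids.shape[1]
    return tokenizer.decode(output_ids[0, input_length:], skip_special_tokens=True).strip()
```

With `model` and `tokenizer` loaded as in the snippet above, a call would look like `translate(model, tokenizer, 'Ayer se fue, tomó sus cosas y se puso a navegar.', 'spa_Latn', 'cat_Latn')`.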