# Setup

In [None]:
%pip install -U -q transformers huggingface-hub

In [1]:
from huggingface_hub import InferenceClient, login
from transformers import AutoTokenizer

In [2]:
login()


<div class="alert alert-danger" role="alert" style="display: flex; align-items: center;">
    <div style="text-align: center; padding-right: 10px;">
        <i class="fa fa-exclamation-triangle fa-2x"></i>
    </div>
    <div style="display: flex; align-items: center; margin-top: 4px;"> <!-- Added margin-top to lower the text -->
        <strong>Warning: You will need to point to a model/deployment that is running.</strong>
    </div>
</div>


In [3]:
MODEL = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
client = InferenceClient(MODEL)

# Translation
Our goal is to explore translation between English and Arabic and how prompt engineering can impact it. There has been [some work](https://arxiv.org/pdf/2308.01391), but we didn't find as much as we were hoping, especially for open source models.

We have created a dataset, [arabic-translation-prompt-engineering/TpDwD](https://huggingface.co/datasets/arabic-translation-prompt-engineering/TpDwD), that covers 6 domains, and we want to compare the methods by having human rankers rank their translations. We also have human translations to ground these rankings.

We will evaluate the following methods:
- Baseline
- Manual Purpose Driven
- Automatic Purpose Driven
- Automatic Motivation Driven

## Baseline

For our baseline we will translate with a simple system prompt and instruction.

### System Prompt
This is a pretty basic system prompt. We give a role, and an assumed understanding. We also push for goals like "highly motivated and detail-oriented".

> You are a skilled translator with extensive experience in English and Arabic translations. You possess a deep understanding of the linguistic, cultural, and contextual nuances essential for accurate and effective translation between these languages. Highly motivated and detail-oriented, you are committed to delivering translations that maintain the integrity and intent of the original text. Your role is crucial in ensuring clear and precise communication in our multilingual system.

In [4]:
baseline_system_prompt = """You are a skilled translator with extensive experience in English and Arabic translations. You possess a deep understanding of the linguistic, cultural, and contextual nuances essential for accurate and effective translation between these languages. Highly motivated and detail-oriented, you are committed to delivering translations that maintain the integrity and intent of the original text. Your role is crucial in ensuring clear and precise communication in our multilingual system."""

### Instruction
> Translate this from english to arabic: {translation_input}.
>
> Translation:

We will use a simple instruction to get a translation.

In [5]:
def baseline_chat_completion(translation_input):
    """
    Generates a completion for a chat conversation using a specified system prompt and a user input.
    """
    messages = [
        {"role": "system", "content": baseline_system_prompt},
        {
            "role": "user",
            "content": f"Translate this from english to arabic: {translation_input}.\nTranslation: ",
        },
    ]
    return client.chat_completion(messages, max_tokens=10_000)

In [6]:
translation_input = "Float like a butterfly sting like a bee – his hands can’t hit what his eyes can’t see."
response = baseline_chat_completion(
    translation_input,
)

### Token Cost
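The prompt overhead is small: after subtracting the tokens of the source text itself, the rest of the prompt (system prompt and instruction) costs only 96 tokens.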

In [7]:
f"Baseline Total Prompt tokens: {response.usage.prompt_tokens - len(tokenizer(translation_input)['input_ids'])}"

'Baseline Total Prompt tokens: 96'

In [8]:
print(response.choices[0].message.content)

يسبح في الحلبة كالفراشة ويلسع كالنحلة - لا يمكن ليديه أن تصيبا ما لا تستطيع عيناه رؤيته


## Manual Purpose Driven Translation

[Optimizing Machine Translation through Prompt Engineering](https://arxiv.org/pdf/2308.01391) has done some good exploratory work in examining how prompt engineering can impact translation. They were working between Japanese and English and showed that translations influenced by prompts tailored to **specific purposes** and **target audiences** generally adhered more closely to the translation specifications, suggesting that such prompted translations could be more culturally and contextually appropriate than standard machine translations.

### Prompt
One of the approaches in the paper was to include a specification of the purpose and the target audience. This was motivated by the author's experience as a professional translator, which led to the conclusion that these two parameters are essential even in everyday translation work. You can find the prompt below, adapted for English to Arabic:

> Translate the following English [source text] into Arabic. Please fulfill the following conditions when translating.    
> Purpose of the translation: `<Manual description>`  
> Target audience: `<Manual description>`  
> [source text] `{translation_input}`  
> [translated text]

You can see that we need to provide the Purpose and the Target Audience for each translation. This makes sense as we will be able to steer our model appropriately, but the drawback is that we need to do this for each subject. In the real world this likely won't scale and is rather tedious.
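To make the manual step concrete, here is a minimal sketch of filling the template above with plain string formatting. The names `MANUAL_TEMPLATE` and `build_manual_prompt` are hypothetical helpers for illustration only; the prompt actually sent to the model later in this notebook uses a slightly different, JSON-like layout.

```python
# Hypothetical helper: fills the purpose/audience template quoted above.
MANUAL_TEMPLATE = (
    "Translate the following English [source text] into Arabic. "
    "Please fulfill the following conditions when translating.\n"
    "Purpose of the translation: {purpose}\n"
    "Target audience: {audience}\n"
    "[source text] {source_text}\n"
    "[translated text]"
)


def build_manual_prompt(source_text, purpose, audience):
    """Return the manual purpose-driven prompt for one source text."""
    return MANUAL_TEMPLATE.format(
        purpose=purpose, audience=audience, source_text=source_text
    )


example_prompt = build_manual_prompt(
    source_text="Wash your hands regularly.",
    purpose="Enhancing understanding and knowledge about COVID-19 and health-related topics.",
    audience="Individuals seeking reliable and comprehensible information about COVID-19 and related health topics.",
)
print(example_prompt)
```

Because the purpose and audience are arguments, every new subject needs its own hand-written pair, which is exactly the scaling problem described above.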

Let's go ahead and create these for each of our datasets.

In [9]:
dataset_to_purpose_target = {
    "ELRC-24ss": {
        "purpose": "Enhancing understanding and knowledge about COVID-19 and health-related topics.",
        "audience": "Individuals seeking reliable and comprehensible information about COVID-19 and related health topics.",
    },
    "GNOME-25ss": {
        "purpose": "Facilitating localization and translation of GNOME software.",
        "audience": "Translators and developers working on GNOME projects."
    },
    "HPLT-25ss": {
        "purpose": "Providing multilingual data for high-performance language technologies.",
        "audience": "Researchers and developers working on multilingual NLP applications."
    },
    "OpenSubtitles-25ss": {
        "purpose": "Creating parallel corpora from movie and TV subtitles.",
        "audience": "Researchers and developers in NLP and machine translation. And Movies and TV Shows translators"
    },
    "TED2020-25ss": {
        "purpose": "Generating multilingual sentence embeddings using TED transcripts.",
        "audience": "Researchers and developers working on multilingual sentence embeddings."
    },
    "UNPC-24ss": {
        "purpose": "Offering a parallel corpus of United Nations documents for linguistic research.",
        "audience": "Researchers and linguists studying multilingual and legal texts."
    }
}


In [10]:
# Define the translation tool function
purpose_driven_translation_tools = [
    {
        "type": "function",
        "function": {
            "name": "purpose_driven_translation",
            "description": "Translate given the purpose and the target audience.",
            "parameters": {
                "type": "object",
                "properties": {
                    "translation": {
                        "type": "string",
                        "description": "The translated \"source_text\".",
                    },
                },
                "required": ["translation"],
            },
        },
    }
]

# Create the purpose-driven chat completion function using function calling
def purpose_driven_chat_completion(translation_input, dataset):
    """
    Translates `translation_input` using the manually written purpose and target audience
    for `dataset`. Function calling forces the model to return the translation as
    structured output under the "translation" key.
    """
    
    # Prepare the prompt
    prompt = f"""Translate the English "source text" into Arabic. Please fulfill the "Purpose of the translation" and tailor it to the "target audience". Respond in a json format with just the translation as the key.
{{
    "Purpose of the translation": "{dataset_to_purpose_target[dataset]['purpose']}"
    "Target audience": "{dataset_to_purpose_target[dataset]['audience']}"
    "source text" `{translation_input}`
}} 
Translation json: """

    # Build the system and user messages
    messages = [
        {"role": "system", "content": baseline_system_prompt},
        {
            "role": "user",
            "content": prompt,
        },
    ]

    
    # Call the chat completion API with the function tools and specific tool choice
    return client.chat_completion(messages, max_tokens=10_000, tools=purpose_driven_translation_tools, tool_choice='purpose_driven_translation')

In [11]:
translation_input = "We have observed that when groups of stakeholders work to define … visions, this leads to debate over whether to emphasize ecosystem health or human well-being … Whether the priority is ecosystems or people greatly influences stakeholders' assessment of desirable ecological and social states."
response = purpose_driven_chat_completion(translation_input, "ELRC-24ss")

In [12]:
f"Manual Purpose Driven Total Prompt tokens: {response.usage.prompt_tokens - len(tokenizer(translation_input)['input_ids'])}"

'Manual Purpose Driven Total Prompt tokens: 350'

In [13]:
from pprint import pprint
# The tool call arguments hold the translation itself
translation_json = response.choices[0].message.tool_calls[0].function.arguments
pprint(translation_json)

{'translation': 'لاحظنا أنه عندما تعمل مجموعات أصحاب المصلحة على تحديد ... '
                'الرؤى، فإن هذا يؤدي إلى نقاش حول ما إذا كان ينبغي التركيز على '
                'صحة النظام البيئي أو رفاهية الإنسان ... إن مسألة ما إذا كان '
                'الأولوية للنظم البيئية أو الناس تؤثر بشكل كبير على تقييم '
                'أصحاب المصلحة للحالات الاجتماعية والبيئية المرغوبة.'}


## Automatic Purpose Driven Structured Generation Translation

Manual Purpose Driven Translation is a great step in the right direction, but it's challenging to scale. Instead of having the user submit these purposes and target audiences, what if we use a model to do that? The easiest way to get this input in a convenient format is to use [structured generation](https://huggingface.co/blog/evaluation-structured-outputs) to get a json. We can do this in InferenceClient just by using [tools](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion.tools).

### Instruction

It's usually helpful if we tell the LLM what we want to create when we prompt it.

> I want to translate the following source_text from English into Arabic. But first I want to create a json that includes the following:  
> `{"subject": "", "assumptions relating to content": "", "purpose": "", "target audience": ""}`.  
> Can you fill this out and be specific to how this can help you translate in the next step? No need to translate yet!  
> `{"source_text": {translation_input}}`



### Tool Definition

In [14]:
automatic_purpose_driven_translation_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_translation_audience_purpose",
            "description": "Get the background of a text to assist in translation",
            "parameters": {
                "type": "object",
                "properties": {
                    "subject": {
                        "type": "string",
                        "description": "The topic or central theme that the text revolves around.",
                    },
                    "assumptions relating to the content": {
                        "type": "string",
                        "description": "Write out any assumptions relating to the text.",
                    },
                    "purpose": {
                        "type": "string",
                        "description": "Why the text was written",
                    },
                    "audience": {
                        "type": "string",
                        "description": "The inferred audience that the text is written for.",
                    },
                },
                "required": [
                    "subject",
                    "assumptions relating to the content",
                    "purpose",
                    "audience",
                ],
            },
        },
    }
]
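The schema above marks four fields as required. As a hedged sketch, the helper below normalizes the arguments of a tool call and checks that those fields are present. `parse_tool_arguments` and `REQUIRED_KEYS` are hypothetical names, and the handling of JSON strings is an assumption: the outputs in this notebook show the arguments as a dict, but some client versions may return them as a JSON-encoded string instead.

```python
import json

# Required keys copied from the tool schema defined above.
REQUIRED_KEYS = (
    "subject",
    "assumptions relating to the content",
    "purpose",
    "audience",
)


def parse_tool_arguments(arguments, required_keys=REQUIRED_KEYS):
    """Return tool-call arguments as a dict and verify the required keys exist."""
    # Assumption: arguments is either already a dict or a JSON-encoded string.
    parsed = arguments if isinstance(arguments, dict) else json.loads(arguments)
    missing = [key for key in required_keys if key not in parsed]
    if missing:
        raise ValueError(f"Tool call is missing required keys: {missing}")
    return parsed
```

It could be applied to `response.choices[0].message.tool_calls[0].function.arguments` before the description is passed on to the translation step.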

In [15]:
def tool_call_chat_completion(translation_input):
    """
    Asks the model to describe `translation_input` (subject, assumptions, purpose, audience)
    through a forced tool call, without translating it yet.
    """

    prompt = f"""I want to translate the following source_text from English into Arabic. But first I want to create a json that includes the following:
{{"subject": "", "assumptions relating to content": "", "purpose": "", "target audience": ""}}.
Can you fill this out and be specific to how this can help you translate in the next step? No need to translate yet!
{{
    "source_text": {translation_input}
}}
"""
    messages = [
        {"role": "system", "content": baseline_system_prompt},
        {
            "role": "user",
            "content": prompt,
        },
    ]
    return client.chat_completion(messages, max_tokens=10_000, tools=automatic_purpose_driven_translation_tools, tool_choice='get_translation_audience_purpose')

In [16]:
translation_input = "We have observed that when groups of stakeholders work to define … visions, this leads to debate over whether to emphasize ecosystem health or human well-being … Whether the priority is ecosystems or people greatly influences stakeholders' assessment of desirable ecological and social states."
response = tool_call_chat_completion(translation_input)

In [17]:
f"Function Calling Prompt tokens: {response.usage.prompt_tokens - len(tokenizer(translation_input)['input_ids'])}"

'Function Calling Prompt tokens: 406'

In [18]:
from pprint import pprint
description_json = response.choices[0].message.tool_calls[0].function.arguments
pprint(description_json)

{'assumptions relating to the content': 'The source text assumes that there is '
                                        'a debate between ecological health '
                                        'and human well-being, and that '
                                        'stakeholders have different '
                                        'priorities that influence their '
                                        'assessment of desirable ecological '
                                        'and social outcomes.',
 'audience': 'Individuals interested in environmental policy, ecology, '
             'sustainability, and/or stakeholder engagement.',
 'purpose': 'To communicate observations about the varying priorities of '
            'different stakeholder groups and how these priorities impact '
            'their definition of vision, particularly in the context of '
            'ecosystem health versus human well-being.',
 'subject': 'Stakeholder priorities and their impact on defin

In [19]:
def automatic_purpose_driven_chat_completion(translation_input, description_json):
    """
    Translates `translation_input` from English to Arabic, giving the model the
    automatically generated `description_json` as context.
    """

    prompt = f"""Given the following description translate source_text from English to Arabic
{{
    "description": {description_json},
    "translation": {translation_input}
}}
Translation:
"""
    messages = [
        {"role": "system", "content": baseline_system_prompt},
        {"role": "user", "content": prompt},
    ]
    return client.chat_completion(messages, max_tokens=10_000)

In [20]:
response = automatic_purpose_driven_chat_completion(translation_input, description_json)

In [21]:
f"Automatic Purpose Driven Total Prompt tokens: {response.usage.prompt_tokens - len(tokenizer(translation_input)['input_ids'])}"

'Automatic Purpose Driven Total Prompt tokens: 235'
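To put the token counts reported so far side by side, the short sketch below sums the prompt overhead per method. The numbers are the ones printed in this notebook, with the source text already subtracted. The structure `overhead_per_call` is a hypothetical illustration, and the 235 figure includes the generated description, so it will vary with the input.

```python
# Prompt-token overheads reported in this notebook (source text already subtracted).
overhead_per_call = {
    "Baseline": [96],
    "Manual Purpose Driven": [350],
    # Two calls: one to generate the description, one to translate with it.
    "Automatic Purpose Driven": [406, 235],
}

# Total overhead per method is the sum over its calls.
totals = {method: sum(calls) for method, calls in overhead_per_call.items()}

for method, total in totals.items():
    n_calls = len(overhead_per_call[method])
    print(f"{method}: {total} prompt tokens over {n_calls} call(s)")
```

The automatic method also sends the source text twice, once per call, so its real cost grows faster with input length than the other two methods.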

In [22]:
print(response.choices[0].message.content)

{
    "description": {
        "الافتراضات المتعلقة بالمحتوى": "يفترض النص المصدري وجود نقاش بين الصحة البيئية ورفاهية الإنسان، وأن أصحاب المصلحة لديهم أولويات مختلفة تؤثر على تقييمهم للنتائج البيئية والاجتماعية المرجوة.",
        "الجمهور": "الأفراد المهتمون بالسياسة البيئية، أو علم البيئة، أو الاستدامة، و/أو مشاركة أصحاب المصلحة.",
        "الغرض": "إيصال الملاحظات حول الأولويات المتنوعة لمجموعات أصحاب المصلحة المختلفة، وكيف تؤثر هذه الأولويات على تعريفهم للرؤى، خاصة في سياق صحة الأنظمة البيئية مقابل رفاهية الإنسان.",
        "الموضوع": "أولويات أصحاب المصلحة وتأثيرها على تحديد الرؤى المتعلقة بالنتائج البيئية والاجتماعية."
    },
    "الترجمة": "لاحظنا أنه عندما تعمل مجموعات أصحاب المصلحة على تحديد ... الرؤى، فإن هذا يؤدي إلى نقاش حول ما إذا كان ينبغي التأكيد على صحة النظام البيئي أو رفاهية الإنسان ... سواء كانت الأولوية للنظم البيئية أو للبشر يؤثر بشكل كبير على تقييم أصحاب المصلحة للحالات البيئية والاجتماعية المرغوبة."
}


### Helper Function

In [23]:
def automatic_purpose_driven_chat(translation_input):
    """Runs both steps: generate the description with a tool call, then translate with it."""
    response = tool_call_chat_completion(translation_input)
    description_json = response.choices[0].message.tool_calls[0].function.arguments
    return automatic_purpose_driven_chat_completion(translation_input, description_json)

In [24]:
automatic_purpose_driven_chat("This is a test").choices[0].message.content

'الافتراضات المتعلقة بالمحتوى: لا توجد افتراضات محددة.\n\nالجمهور المستهدف: جمهور عام لا يحتاج إلى معرفة تقنية محددة.\n\nالغرض: نقل رسالة بسيطة لاختبار الترجمة.\n\nالموضوع: اختبار الترجمة\n\nالترجمة: هذا اختبار'
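The two responses above come back in different shapes: one is a JSON object with Arabic keys, the other is plain text with labeled lines, and both repeat the description alongside the translation. As a rough sketch only, the helper below tries to pull out just the translation. `extract_translation` and `TRANSLATION_LABELS` are hypothetical names, and the labels are an assumption based on the two outputs shown here; other responses may use different wording.

```python
import json
import re

# Assumed labels, taken from the two example outputs above.
TRANSLATION_LABELS = ("الترجمة", "translation")


def extract_translation(content):
    """Best-effort extraction of the translation from a free-form model response."""
    # Case 1: the response is a JSON object with a translation key.
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        for label in TRANSLATION_LABELS:
            if isinstance(data.get(label), str):
                return data[label].strip()
    # Case 2: the response contains a line such as "<label>: <text>".
    for label in TRANSLATION_LABELS:
        pattern = rf"^{re.escape(label)}\s*:\s*(.+)$"
        match = re.search(pattern, content, flags=re.MULTILINE | re.IGNORECASE)
        if match:
            return match.group(1).strip()
    # Fallback: return the response unchanged (minus surrounding whitespace).
    return content.strip()
```

A fallback that returns the raw text keeps the pipeline running, but such rows would still need a manual check before human ranking.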

## Dataset Creation

In [25]:
from datasets import load_dataset

subsets = ['ELRC-24ss', 'GNOME-25ss', 'HPLT-25ss', 'OpenSubtitles-25ss', 'TED2020-25ss', 'UNPC-24ss']

# Iterate over each subset
for subset in subsets:
    # Load the dataset for the specific subset
    dataset = load_dataset("arabic-translation-prompt-engineering/TpDwD", subset)

    # Rename the columns
    dataset = dataset.rename_column("ar_text", "human_translation")
    dataset = dataset.rename_column("en_text", "source_text")

    # Apply functions to add new columns
    dataset = dataset.map(lambda example: {
        "baseline_translation": baseline_chat_completion(example['source_text']).choices[0].message.content,
        "purpose_driven_translation": purpose_driven_chat_completion(example['source_text'], subset).choices[0].message.tool_calls[0].function.arguments['translation'],
        "automatic_purpose_driven_translation": automatic_purpose_driven_chat(example['source_text']).choices[0].message.content
    })
    
    # Push the processed dataset to the Hub
    dataset.push_to_hub("arabic-translation-prompt-engineering/TpDwD_translated", subset)





# Push to the hub

In [None]:
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="translate-prompts.ipynb",
    path_in_repo="translate-prompts.ipynb",
    repo_id="arabic-translation-prompt-engineering/atpe-notebooks",
    repo_type="model",
)