Sabiá-3 (Qwen-2) tokenizer v2

This new version improves the chat template by handling custom system prompts and including the new tool role.

Simple usage

messages = [
    {"role": "system", "content": "Você é um assistante inteligente."},
    {"role": "user", "content": "Olá!"},
    {"role": "assistant", "content": "Olá! Como posso ajudar você hoje?"},
    {"role": "user", "content": "Qual é a capital do Brasil?"},
]

print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

Will result in the following string:

Você é um assistante inteligente. USUÁRIO: Olá! ASSISTENTE: Olá! Como posso ajudar você hoje?<|endoftext|>USUÁRIO: Qual é a capital do Brasil? ASSISTENTE:

Tools

messages = [
    {"role": "system", "content": """You are a helpful customer support assistant. Use the supplied tools to assist the user.
Uso de funções: Opcional

Lista de funções disponíveis:

<functions>
[{"type": "function", "function": {"name": "get_delivery_date", "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The customer's order ID."}}, "required": ["order_id"], "additionalProperties": false}}}]
</functions>

Formato esperado de resposta quando houver utilização de funções:
<multiplefunctions>
    <functioncall> {"name": "function_name", "arguments": {"arg_1": "value_1", "arg_2": "value_2", ...}} </functioncall>
    <functioncall> {"name": "function_name", "arguments": {"arg_1": "value_1", "arg_2": "value_2", ...}} </functioncall>
    ...
</multiplefunctions>

Aspectos importantes:
- Chamadas de funções DEVEM seguir o formato especificado
- Parâmetros necessários DEVEM ser especificados"""},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
    {"role": "user", "content": "i think it is order_12345"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "call_62136354",
                "type": "function",
                "function": {
                    "arguments": "{'order_id': 'order_12345'}",
                    "name": "get_delivery_date"
                }
            }
        ]
    },
    {
        "role": "tool",
        "content": json.dumps({
            "order_id": "order_12345",
            "delivery_date": "00-00-2000"
        }),
        "tool_call_id": "call_62136354"
    }
]

The resulting string will be:

You are a helpful customer support assistant. Use the supplied tools to assist the user.
Uso de funções: Opcional

Lista de funções disponíveis:

<functions>
[{"type": "function", "function": {"name": "get_delivery_date", "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The customer's order ID."}}, "required": ["order_id"], "additionalProperties": false}}}]
</functions>

Formato esperado de resposta quando houver utilização de funções:
<multiplefunctions>
    <functioncall> {"name": "function_name", "arguments": {"arg_1": "value_1", "arg_2": "value_2", ...}} </functioncall>
    <functioncall> {"name": "function_name", "arguments": {"arg_1": "value_1", "arg_2": "value_2", ...}} </functioncall>
    ...
</multiplefunctions>

Aspectos importantes:
- Chamadas de funções DEVEM seguir o formato especificado
- Parâmetros necessários DEVEM ser especificados USUÁRIO: Hi, can you tell me the delivery date for my order? ASSISTENTE: Hi there! I can help with that. Can you please provide your order ID?<|endoftext|>USUÁRIO: i think it is order_12345 ASSISTENTE: <multiplefunctions>
    <functioncall> {"name": "get_delivery_date", "arguments": {'order_id': 'order_12345'}} </functioncall>
</multiplefunctions><|endoftext|>FUNCTION_RESPONSE: {"order_id": "order_12345", "delivery_date": "00-00-2000"} ASSISTENTE: