Mind blown
User: You are a decision making agent who must decide on which tool to use to answer the users question: You have access to 3 tools: tool 1 - A calculator that can be used to do numerical calculations precisely. It takes in a numerical calculation and returns the answer; tool 2 - a tool that allows you to ask a question and get an answer from an internet search; tool 3 - a tool that can summarize large amounts of text into less than 100 words. You must decide which tools are needed to answer the user's question. The results from one tool can be used as the input to another tool. If no tool is appropriate then you must say that no tool is appropriate. If a tool is appropriate, you must then respond with the name of the tool you would use and an explanation of why the tool you have chosen is the best one to use. Finally, for each tool you choose, you must also generate a JSON object with the following schema: {"Tool":name of the selected tool, "Operation": one word summary of what the tool does, "input": the query or calculation that to be executed by the selected tool} Based on this tell me the average weight of a male African elephant multiplied by 1.3
Model: To answer the user's question, we need to find the average weight of a male African elephant and then multiply it by 1.3. Let's break down the steps needed to do this:
- Find the average weight of a male African elephant: We need a tool that can search the internet for this information.
- Multiply the average weight by 1.3: We need a tool that can perform numerical calculations.
Based on these steps, we will use both tool 2 (internet search) and tool 1 (calculator) to answer the user's question.
Here is the JSON object for tool 2:
{
"Tool": "tool 2",
"Operation": "Internet search",
"input": "average weight of a male African elephant"
}
Here is the JSON object for tool 1:
{
"Tool": "tool 1",
"Operation": "Calculation",
"input": "average weight of a male African elephant * 1.3"
}
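For anyone who wants to try running a reply like this, here is a rough sketch of how the two JSON objects could be pulled out of the text and executed in order. Everything in it is my own assumption rather than part of any framework: the helper names (extract_tool_calls, calculate, run_chain) are hypothetical, the search tool is left for the caller to supply, and the chaining rule (replace an earlier call's input text with its result) is a simple heuristic that happens to fit this reply, where the calculator input repeats the search query word for word and is not a valid calculation until the search result is substituted in.

```python
import ast
import json
import operator
import re

# Arithmetic operators the hypothetical calculator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def extract_tool_calls(reply: str) -> list[dict]:
    """Collect every flat {...} span in the reply that parses as JSON."""
    calls = []
    for match in re.finditer(r"\{[^{}]*\}", reply):
        try:
            calls.append(json.loads(match.group(0)))
        except json.JSONDecodeError:
            continue  # brace-delimited text that is not valid JSON
    return calls


def calculate(expression: str) -> float:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression: " + expression)
    return ev(ast.parse(expression, mode="eval").body)


def run_chain(calls: list[dict], tools: dict):
    """Run the calls in order, substituting earlier results into later inputs.

    Assumption: a later input refers to an earlier call by repeating that
    call's input text verbatim, as the reply above does.
    """
    results = {}  # earlier input text -> result of that call
    output = None
    for call in calls:
        text = call["input"]
        for earlier_input, earlier_result in results.items():
            text = text.replace(earlier_input, str(earlier_result))
        output = tools[call["Tool"]](text)
        results[call["input"]] = output
    return output


if __name__ == "__main__":
    reply = ('{"Tool": "tool 2", "Operation": "Internet search", '
             '"input": "average weight of a male African elephant"} '
             '{"Tool": "tool 1", "Operation": "Calculation", '
             '"input": "average weight of a male African elephant * 1.3"}')
    # The search tool is a dummy that returns a made-up number, not a real weight.
    tools = {"tool 2": lambda query: "100", "tool 1": calculate}
    print(run_chain(extract_tool_calls(reply), tools))
```

The substitution heuristic is fragile: if the model paraphrased the search query inside the calculator input, nothing would be replaced and calculate would raise an error, so a real setup would need a more explicit way of referring to earlier results.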
What level do you think this 7B model achieves, then? ChatGPT level?
That is the million-dollar question. For this tool-selection task, I ran the model inside an agent framework, www.griptape.ai, which uses a system prompt to guide the LLM to produce CoT reasoning and to output JSON-formatted objects that can be passed to the tools (https://github.com/griptape-ai/griptape/blob/main/griptape/templates/tasks/toolkit_task/system.j2). Impressively, this model nails it on the first turn of the conversation, but then it quickly stops following the guidance. The fact that it can follow it even once is impressive. I think reaching ChatGPT levels across the board is too aspirational, but many tasks don't need a model that can do anything and everything the way those very large models can. For those targeted, deep niches, smaller, specialized models can win in many cases. Will this one be it? No idea, but it's encouraging to see what's possible with these modestly sized models.
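If you want to put a number on how quickly a model stops following the guidance, one option is to check each turn's JSON against the schema from the prompt. Below is a minimal sketch of such a check. It is my own illustration and not something the framework provides: the function name schema_problems and the rule set are assumptions based only on the schema quoted in the prompt (keys "Tool", "Operation", "input", with "Operation" being a single word).

```python
import json

# Keys required by the schema quoted in the prompt.
REQUIRED_KEYS = {"Tool", "Operation", "input"}


def schema_problems(raw: str) -> list[str]:
    """Return a list of ways the text deviates from the prompt's schema.

    An empty list means the object conforms to the checks implemented here.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + exc.msg]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    operation = obj.get("Operation")
    # The prompt asks for a one-word summary of what the tool does.
    if isinstance(operation, str) and len(operation.split()) != 1:
        problems.append("Operation is not a single word")
    return problems


if __name__ == "__main__":
    first = '{"Tool": "tool 2", "Operation": "Internet search", "input": "q"}'
    second = '{"Tool": "tool 1", "Operation": "Calculation", "input": "1 * 1.3"}'
    for raw in (first, second):
        print(schema_problems(raw))
```

Run on the first-turn reply, a check like this flags the tool 2 object, because "Internet search" is two words where the prompt asked for one, so even the impressive first turn is not a perfect match for the schema. Counting conforming objects per turn would give a rough adherence score to compare models on.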