Add tool use template
This PR is a work in progress, please don't merge it yet! It is intended to be used with the PR branch here. It adds proper tool use support to Hermes-2-Pro's chat template. You can test it out with the following code block, after installing from the PR branch with pip install --upgrade git+https://github.com/huggingface/transformers.git@new_chat_template_args
from transformers import AutoTokenizer
from typing import Dict

def get_stock_fundamentals(symbol: str) -> Dict:
    """
    Get fundamental data for a given stock symbol using yfinance API.

    Args:
        symbol: The stock symbol.

    Returns:
        A dictionary containing fundamental data.

        Keys:
            - 'symbol': The stock symbol.
            - 'company_name': The long name of the company.
            - 'sector': The sector to which the company belongs.
            - 'industry': The industry to which the company belongs.
            - 'market_cap': The market capitalization of the company.
            - 'pe_ratio': The forward price-to-earnings ratio.
            - 'pb_ratio': The price-to-book ratio.
            - 'dividend_yield': The dividend yield.
            - 'eps': The trailing earnings per share.
            - 'beta': The beta value of the stock.
            - '52_week_high': The 52-week high price of the stock.
            - '52_week_low': The 52-week low price of the stock.
    """
    pass

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B", revision="pr/13")

test_chat = [{"role": "user", "content": "Fetch the stock fundamentals data for Tesla (TSLA)"}]
tools = [get_stock_fundamentals]

inputs = tokenizer.apply_chat_template(test_chat, tools=tools, chat_template="tool_use", tokenize=False, add_generation_prompt=True)
print(inputs)
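To go one step further than printing the prompt, here's a rough sketch of feeding the rendered chat straight into generate(); the model-loading and generation settings below are assumptions for illustration and are not part of this PR:

import torch
from transformers import AutoModelForCausalLM

# Rough sketch only: dtype/device settings and max_new_tokens are illustrative choices.
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-2-Pro-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Same call as above, but tokenized and returned as tensors for generation.
model_inputs = tokenizer.apply_chat_template(
    test_chat, tools=tools, chat_template="tool_use",
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

out = model.generate(model_inputs, max_new_tokens=256)
# The newly generated portion should contain a <tool_call>...</tool_call> block.
print(tokenizer.decode(out[0][model_inputs.shape[1]:]))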
Thanks for the PR Matt — this is super useful for models that support tool-use like Hermes-2-Pro. Will test it out soon! 👏
@Rocketknight1
Reminder to add | tojson when rendering tool.parameters:
{{- tool.parameters }}
produces inconsistent values across Python, Rust, and JS (e.g., Python produces single quotes, while JS/Rust produce double quotes when converting the object to a string), so
{{- tool.parameters | tojson }}
should fix this.
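To see why this matters outside the template itself, here is a tiny sketch using the jinja2 Python package (Jinja 2.9+ ships the built-in tojson filter); the tool_parameters variable name is just for illustration:

# Tiny sketch of the difference: rendering a dict directly uses Python's str(),
# which produces single quotes (not valid JSON), while | tojson always emits
# double-quoted JSON, consistent across implementations.
from jinja2 import Environment

params = {"type": "object", "properties": {"symbol": {"type": "string"}}}
env = Environment()

print(env.from_string("{{- tool_parameters }}").render(tool_parameters=params))
# {'type': 'object', 'properties': {'symbol': {'type': 'string'}}}

print(env.from_string("{{- tool_parameters | tojson }}").render(tool_parameters=params))
# {"type": "object", "properties": {"symbol": {"type": "string"}}}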
@Xenova done!
Working on implementing tool use in vLLM's OpenAI-compatible API and this would be awesome! Digging into it now.
Hi @Rocketknight1 and @Xenova - just a quick request for an addition both to this and to your PR into Mistral 7B Instruct v0.3.
Currently both tokenizer configs have a bos_token field in their tokenizer_config.json.
Would it be possible to add a similar bot_token field (bot -> Beginning of Tool call), or maybe tool_token, to the tokenizer config that indicates when a tool call is being generated by the model as opposed to a chat completion?
For Hermes 2 Pro this would look like "bot_token": "<tool_call>", and for Mistral 7B Instruct v0.3 it would be "bot_token": "[TOOL_CALLS]".
In some cases there may also need to be an eot_token for end-of-tool-call, e.g. </tool_call> for Hermes 2 Pro; I don't think Mistral has one (although I could be wrong).
Why am I asking for this?
This will provide a standardized way for open-source serving tools & frameworks to determine which token to look for when trying to infer if the response that the model has started generating is a chat response or a tool call, so that the tool/framework can provide an appropriate response to the client that is indicative of that fact.
This has been a sticking point for me when trying to implement OpenAI API-compatible tool calling into vLLM, and in particular trying to implement SSE streaming of said tool calls in an OpenAI-compatible way.
Depending on the model that you're using, you have to know which token to look for that indicates the start of a tool call (and sometimes for the end of one as well!), and right now I'm stuck hard-coding it depending on which model I'm using, since there's nowhere that I can look it up in the tokenizer or tokenizer config. The tokens are in the tokenizer, but I can't "auto-detect" which tokens are the right ones to look for because there's no field that tells me. I need a consistent way, provided in the tokenizer/tokenizer config, to look up which token indicates the start of a tool call so that I can handle sending either a chat completion response or a tool call response normally, but also more importantly for streaming since OpenAI API uses different SSE formats when streaming tool calls vs. chat completions.
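To make the request concrete, here's a rough sketch of how a serving layer could use such fields if they existed; note that bot_token and eot_token are purely hypothetical and are not present in any tokenizer_config.json today:

# Hypothetical sketch: "bot_token" and "eot_token" are the proposed fields,
# not something that exists in tokenizer_config.json today.
import json

def read_tool_call_markers(tokenizer_config_path: str):
    """Look up the proposed start/end-of-tool-call markers, if present."""
    with open(tokenizer_config_path) as f:
        config = json.load(f)
    # e.g. "<tool_call>" for Hermes 2 Pro, "[TOOL_CALLS]" for Mistral 7B Instruct v0.3
    return config.get("bot_token"), config.get("eot_token")

def classify_stream(generated_so_far: str, bot_token: str | None) -> str:
    """Decide whether to stream OpenAI-style chat deltas or tool_call deltas."""
    if bot_token and generated_so_far.lstrip().startswith(bot_token):
        return "tool_call"
    return "chat_completion"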
I anticipate that this will be a problem for maintainers and contributors of other open-source serving frameworks that try to provide an OpenAI API-compatible interface as well, so I think it would be really great to have this!
Please let me know your thoughts on this :)
cc @interstellarninja @teknium per our conversations in the Nous Research discord here
Hi @kmistele, yes! This is a problem we've been thinking a lot about too. The problem, though, is that even if we can identify the markers that surround the tool call region, the actual tool calls can have very different formats between different models. For example, Command-R uses Python tool defs, so its tools look very different from the tools in other models. This means that even if we set a bot_token for tool models, users will still need to hardcode tool parsing for each model.
Instead, what we're thinking of is a "reverse chat template" feature. The plan is:
- Reverse chat templates are Jinja which can be run in a sandbox, so we don't have to worry about remote code execution.
- Reverse chat templates are included with the model tokenizer, like chat templates are.
- Reverse chat templates take a rendered/formatted string as input, and output a list of message dicts (and optionally tool dicts). They are the inverse of a normal chat template.
- This may require some string operations or regexes that Jinja won't natively have support for, but we can write these in Python and add them as Jinja extensions which will be callable inside the template.
Although this creates a bit more work for model authors (and for me, lol), it should hopefully result in a very clean UX once a model has a reverse chat template, and make it very easy to automatically extract things like tool calls in the standard dict format, even in complex chats with models that render tools in very unusual ways. WDYT?
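For illustration, the contract we have in mind would look roughly like this; the apply_reverse_chat_template method is purely hypothetical and does not exist in Transformers:

# Purely hypothetical sketch of the reverse-template contract described above:
# a rendered/formatted string goes in, message dicts (with tool calls) come out.
# Neither the method name nor the template exists in transformers today.
rendered = (
    "<|im_start|>assistant\n"
    "<tool_call>\n"
    '{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n'
    "</tool_call><|im_end|>\n"
)

# messages = tokenizer.apply_reverse_chat_template(rendered)  # hypothetical API
# Expected output shape:
messages = [
    {
        "role": "assistant",
        "tool_calls": [
            {"name": "get_stock_fundamentals", "arguments": {"symbol": "TSLA"}}
        ],
    }
]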
Thanks for the fast response @Rocketknight1! I actually had this exact same conversation with someone this morning, and we identified this as a problem as well. Different models' tool call formats are totally different, and @interstellarninja flagged that the Hermes 2 models actually enclose separate tool calls in separate XML tags, so instead of <tool_call>[{...}]</tool_call> you get <tool_call>{...}</tool_call><tool_call>{...}</tool_call>, and also noted that Hermes plans to extend its tool call responses in the future to include a <scratch_pad> section.
This really challenged my assumption that it was going to be as easy as looking for a bot_token or something similar. Someone in the Nous Discord suggested using regex and/or XML parsers, both of which are non-starters for various reasons, including lack of flexibility for different models' tool call formats, possible security/DoS issues with regex, etc. And, to your point, there is a wide breadth of tool calling formats for different models. It would be difficult to implement and support a dozen different XML/regex parsers for different models in a serving engine, and you would still have to know which one to use for which model.
The idea of a reverse chat template had actually occurred to me, but I didn't think it was a real thing that Jinja supports (I have only used regular Jinja templates and haven't read the docs end-to-end), so I was kind of stuck on this :P
Being able to auto-detect a reverse template from the tokenizer_config.json that you can use to transform tool calls into a uniform schema/protocol like OpenAI's format is (I think) the ideal solution. It would create a much better DX than having to implement custom parsing code for each separate model/tool call format, and since (for better or for worse) OpenAI's API constructs are treated as the default protocol for LLM serving engines, this is an important feature for serving engines to have.
So I think this would be the ideal solution. Happy to help on this in any way that I can!
Well, I wouldn't say that Jinja "supports" it, lol, but I think we can make it work. Jinja is fundamentally a template engine; it doesn't return anything, it always outputs a string using a template, some inputs and the functions and methods it has access to. However, it does have a lot of methods around e.g. JSON support, and internally it can work with objects like lists and dicts. Although we might need to give it access to some string search tools to make it work, I think it should be possible for a template to parse out blocks of the input, add them to dicts or lists, and finally render those dicts/lists as JSON, which the calling program could just load with json.loads()
or similar.
I'll probably have to try writing a couple of examples for this before I get a feel for the extensions it needs, though! And thanks for the offer of help - I'll definitely ping you when I open the draft PR, but in the meantime, if you have any ideas about the functions that would need to be callable inside the template to enable a parser like this, let me know!
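As a rough proof of concept of that idea (the regex_findall filter here is an assumed extension, not something Jinja ships with, and the template is illustrative rather than the planned format):

# Rough proof of concept: run a "reverse template" in Jinja's sandbox, expose a
# regex filter as an extension, and have the template emit JSON for json.loads().
# The regex_findall filter and the template itself are illustrative assumptions.
import json
import re
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment()
env.filters["regex_findall"] = lambda s, pattern: re.findall(pattern, s, flags=re.S)

reverse_template = env.from_string(
    "{{ rendered | regex_findall('<tool_call>(.*?)</tool_call>') | tojson }}"
)

rendered = (
    '<tool_call>\n{"arguments": {"symbol": "RMS.PA"}, "name": "get_stock_price"}\n</tool_call>'
)
raw_blocks = json.loads(reverse_template.render(rendered=rendered))
tool_calls = [json.loads(block) for block in raw_blocks]
print(tool_calls)  # [{'arguments': {'symbol': 'RMS.PA'}, 'name': 'get_stock_price'}]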
Hmm. Looks like this might be helpful, even if only in terms of how to think about framing the template reversing, although obviously it's quite verbose - https://github.com/mudler/LocalAI/pull/2445
Hi @Rocketknight1 and @kmistele, great discussion and initiative regarding a tool-call parsing template as part of tokenizer_config.json.
I reiterate Matt's points about relying on a single bot_token, since Hermes models will have more added_tokens as part of agentic workflows; for example, scratch_pad has already been more or less approved as part of the new function calling format.
In light of that, I would like to share a config from the LocalAI API that supports the Hermes Pro model's multiple tool calls + scratchpad using a YAML config file with templates.
Basically this kind of config file not only allows us to potentially change the tool call tokens like <tool_call> but also to parse additional output such as <scratch_pad> and more!
Here's the config for Hermes Pro:
context_size: 8192
f16: true
mmap: true
name: hermes-2-theta-llama-3-8b
parameters:
  model: huggingface://NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-Instruct-Merged-DPO-Q4_K_M.gguf
stopwords:
  - <|im_end|>
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  return_name_in_function_response: true
  grammar:
    #disable: true
    mixed_mode: true
    parallel_calls: true
    # It might be needed when parallel tools should be processed
    expect_strings_after_json: true
  json_regex_match:
    - "(?s)<tool_call>(.*?)</tool_call>"
    - "(?s)<tool_call>(.*)"
  capture_llm_results:
    - (?s)<scratchpad>(.*?)</scratchpad>
  replace_llm_results:
    - key: (?s)<scratchpad>(.*?)</scratchpad>
      value: ""
template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
    {{- if .FunctionCall }}
    <tool_call>
    {{- else if eq .RoleName "tool" }}
    <tool_response>
    {{- end }}
    {{- if .Content}}
    {{.Content }}
    {{- end }}
    {{- if .FunctionCall}}
    {{toJson .FunctionCall}}
    {{- end }}
    {{- if .FunctionCall }}
    </tool_call>
    {{- else if eq .RoleName "tool" }}
    </tool_response>
    {{- end }}<|im_end|>
  completion: |
    {{.Input}}
  function: |
    <|im_start|>system
    You are a function calling AI model.
    Here are the available tools:
    <tools>
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    </tools>
    You should call the tools provided to you sequentially
    Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
    <scratchpad>
    {step-by-step reasoning and plan in bullet points}
    </scratchpad>
    For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
    <tool_call>
    {"arguments": <args-dict>, "name": <function-name>}
    </tool_call><|im_end|>
    {{.Input -}}
    <|im_start|>assistant
As you can see, we can actually also define the exact regex match, while also having some control over the system prompt for tool inputs.
https://github.com/mudler/LocalAI/pull/2445
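For reference, a small sketch of what those json_regex_match / capture_llm_results patterns extract when applied in Python; the sample model output below is made up for illustration:

# Small sketch of what the regex patterns in the YAML config above extract.
# The sample model output is made up for illustration.
import json
import re

llm_output = (
    "<scratchpad>\n- fetch TSLA fundamentals, then answer\n</scratchpad>\n"
    '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>'
)

# Same pattern as the first json_regex_match entry: pull out each tool call body.
tool_calls = [json.loads(m) for m in re.findall(r"(?s)<tool_call>(.*?)</tool_call>", llm_output)]

# Same pattern as capture_llm_results: recover the scratchpad text as plain content.
scratchpad = re.search(r"(?s)<scratchpad>(.*?)</scratchpad>", llm_output)

print(tool_calls)                   # [{'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}]
print(scratchpad.group(1).strip())  # - fetch TSLA fundamentals, then answer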
Ah yep, and then at this point it should just be necessary to implement the functions in this file from LocalAI in JavaScript, Python, etc.
package functions
import (
"encoding/json"
"regexp"
"strings"
"github.com/go-skynet/LocalAI/pkg/utils"
"github.com/rs/zerolog/log"
)
type GrammarConfig struct {
// ParallelCalls enables the LLM to return multiple function calls in the same response
ParallelCalls bool `yaml:"parallel_calls"`
DisableParallelNewLines bool `yaml:"disable_parallel_new_lines"`
// MixedMode enables the LLM to return strings and not only JSON objects
// This is useful so models are not constrained to returning only JSON and can also send messages back to the user
MixedMode bool `yaml:"mixed_mode"`
// NoMixedFreeString disables the mixed mode for free strings
// In this way, if the LLM selects a free string, it won't necessarily be mixed with JSON objects
NoMixedFreeString bool `yaml:"no_mixed_free_string"`
// NoGrammar disables the grammar parsing and parses the responses directly from the LLM
NoGrammar bool `yaml:"disable"`
// Prefix is the prefix to prepend to the grammar output when being generated
// This is useful when models prepend a tag before returning JSON
Prefix string `yaml:"prefix"`
// ExpectStringsAfterJSON enables mixed string suffix
ExpectStringsAfterJSON bool `yaml:"expect_strings_after_json"`
}
// FunctionsConfig is the configuration for the tool/function call.
// It includes setting to map the function name and arguments from the response
// and, for instance, also if processing the requests with BNF grammars.
type FunctionsConfig struct {
// DisableNoAction disables the "no action" tool
// By default we inject a tool that does nothing and is used to return an answer from the LLM
DisableNoAction bool `yaml:"disable_no_action"`
// Grammar is the configuration for the grammar
GrammarConfig GrammarConfig `yaml:"grammar"`
// NoActionFunctionName is the name of the function that does nothing. It defaults to "answer"
NoActionFunctionName string `yaml:"no_action_function_name"`
// NoActionDescriptionName is the name of the function that returns the description of the no action function
NoActionDescriptionName string `yaml:"no_action_description_name"`
// ResponseRegex is a named regex to extract the function name and arguments from the response
ResponseRegex []string `yaml:"response_regex"`
// JSONRegexMatch is a regex to extract the JSON object from the response
JSONRegexMatch []string `yaml:"json_regex_match"`
// ReplaceFunctionResults allow to replace strings in the results before parsing them
ReplaceFunctionResults []ReplaceResult `yaml:"replace_function_results"`
// ReplaceLLMResult allow to replace strings in the results before parsing them
ReplaceLLMResult []ReplaceResult `yaml:"replace_llm_results"`
// CaptureLLMResult is a regex to extract a string from the LLM response
// that is used as return string when using tools.
// This is useful for e.g. if the LLM outputs a reasoning and we want to get the reasoning as a string back
CaptureLLMResult []string `yaml:"capture_llm_results"`
// FunctionName enable the LLM to return { "name": "function_name", "arguments": { "arg1": "value1", "arg2": "value2" } }
// instead of { "function": "function_name", "arguments": { "arg1": "value1", "arg2": "value2" } }.
// This might be useful for certain models trained with the function name as the first token.
FunctionName bool `yaml:"return_name_in_function_response"`
}
type ReplaceResult struct {
Key string `yaml:"key"`
Value string `yaml:"value"`
}
type FuncCallResults struct {
Name string
Arguments string
}
func (g GrammarConfig) Options() []func(o *GrammarOption) {
opts := []func(o *GrammarOption){}
if g.MixedMode {
opts = append(opts, EnableMaybeString)
}
if g.ParallelCalls {
opts = append(opts, EnableMaybeArray)
}
if g.DisableParallelNewLines {
opts = append(opts, DisableParallelNewLines)
}
if g.Prefix != "" {
opts = append(opts, SetPrefix(g.Prefix))
}
if g.NoMixedFreeString {
opts = append(opts, NoMixedFreeString)
}
if g.ExpectStringsAfterJSON {
opts = append(opts, ExpectStringsAfterJSON)
}
return opts
}
func CleanupLLMResult(llmresult string, functionConfig FunctionsConfig) string {
log.Debug().Msgf("LLM result: %s", llmresult)
for _, item := range functionConfig.ReplaceLLMResult {
k, v := item.Key, item.Value
log.Debug().Msgf("Replacing %s with %s", k, v)
re := regexp.MustCompile(k)
llmresult = re.ReplaceAllString(llmresult, v)
}
log.Debug().Msgf("LLM result(processed): %s", llmresult)
return llmresult
}
func ParseTextContent(llmresult string, functionConfig FunctionsConfig) string {
log.Debug().Msgf("ParseTextContent: %s", llmresult)
log.Debug().Msgf("CaptureLLMResult: %s", functionConfig.CaptureLLMResult)
for _, r := range functionConfig.CaptureLLMResult {
// We use a regex to extract the JSON object from the response
var respRegex = regexp.MustCompile(r)
match := respRegex.FindStringSubmatch(llmresult)
if len(match) >= 1 {
m := strings.TrimSpace(match[1])
return m
}
}
return ""
}
func ParseFunctionCall(llmresult string, functionConfig FunctionsConfig) []FuncCallResults {
log.Debug().Msgf("LLM result: %s", llmresult)
for _, item := range functionConfig.ReplaceFunctionResults {
k, v := item.Key, item.Value
log.Debug().Msgf("Replacing %s with %s", k, v)
re := regexp.MustCompile(k)
llmresult = re.ReplaceAllString(llmresult, v)
}
log.Debug().Msgf("LLM result(function cleanup): %s", llmresult)
functionNameKey := "function"
if functionConfig.FunctionName {
functionNameKey = "name"
}
results := []FuncCallResults{}
llmResults := []string{}
returnResult := func(results []string) (result []FuncCallResults, e error) {
// As we have to change the result before processing, we can't stream the answer token-by-token (yet?)
result = make([]FuncCallResults, 0)
for _, s := range results {
var ss []map[string]interface{}
s = utils.EscapeNewLines(s)
err := json.Unmarshal([]byte(s), &ss)
if err != nil {
// If the LLM result is a single object, try unmarshaling it into a single map
var singleObj map[string]interface{}
err = json.Unmarshal([]byte(s), &singleObj)
if err != nil {
log.Debug().Err(err).Str("escapedLLMResult", s).Msg("unable to unmarshal llm result in a single object or an array of JSON objects")
} else {
ss = []map[string]interface{}{singleObj}
}
}
log.Debug().Msgf("Function return: %s %+v", s, ss)
for _, s := range ss {
// The grammar defines the function name as "function", while OpenAI returns "name"
func_name, ok := s[functionNameKey]
if !ok {
continue
//return result, fmt.Errorf("unable to find function name in result")
}
// Similarly, while here arguments is a map[string]interface{}, OpenAI actually want a stringified object
args, ok := s["arguments"] // arguments needs to be a string, but we return an object from the grammar result (TODO: fix)
if !ok {
continue
//return result, fmt.Errorf("unable to find arguments in result")
}
d, _ := json.Marshal(args)
funcName, ok := func_name.(string)
if !ok {
continue
//return result, fmt.Errorf("unable to cast function name to string")
}
result = append(result, FuncCallResults{Name: funcName, Arguments: string(d)})
}
}
return result, nil
}
// the response is a string that we have to parse
result := make(map[string]string)
if len(functionConfig.JSONRegexMatch) != 0 {
for _, r := range functionConfig.JSONRegexMatch {
// We use a regex to extract the JSON object from the response
var respRegex = regexp.MustCompile(r)
match := respRegex.FindAllStringSubmatch(llmresult, -1)
var allMatches []string
for _, m := range match {
if len(m) > 1 {
// we match the first group
allMatches = append(allMatches, m[1])
}
}
if len(allMatches) > 0 {
llmResults = append(llmResults, allMatches...)
break
}
}
}
if len(functionConfig.ResponseRegex) > 0 {
// We use named regexes here to extract the function name and arguments
// obviously, this expects the LLM to be stable and return correctly formatted JSON
// TODO: optimize this and pre-compile it
for _, r := range functionConfig.ResponseRegex {
var respRegex = regexp.MustCompile(r)
matches := respRegex.FindAllStringSubmatch(llmresult, -1)
for _, match := range matches {
for i, name := range respRegex.SubexpNames() {
if i != 0 && name != "" && len(match) > i {
result[name] = match[i]
}
}
functionName := result[functionNameKey]
if functionName == "" {
return results
}
results = append(results, FuncCallResults{Name: result[functionNameKey], Arguments: result["arguments"]})
}
}
} else {
if len(llmResults) == 0 {
llmResults = append(llmResults, llmresult)
}
results, _ = returnResult(llmResults)
}
return results
}
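Picking up the suggestion above about porting this, here is a rough Python sketch of the core of ParseFunctionCall (regex extraction plus JSON decoding); it follows the field names from the Go code but is illustrative rather than a drop-in port:

# Rough Python sketch of the core of LocalAI's ParseFunctionCall above:
# apply the json_regex_match patterns, then decode each match as either a JSON
# object or an array of objects. Illustrative only, not a drop-in port.
import json
import re
from dataclasses import dataclass

@dataclass
class FuncCallResult:
    name: str
    arguments: str  # OpenAI-style: arguments kept as a JSON-encoded string

def parse_function_calls(llm_result: str, json_regex_match: list[str],
                         name_key: str = "name") -> list[FuncCallResult]:
    # Extract candidate JSON blobs using the first pattern that matches anything.
    blobs: list[str] = []
    for pattern in json_regex_match:
        matches = re.findall(pattern, llm_result)
        if matches:
            blobs.extend(matches)
            break
    if not blobs:
        blobs = [llm_result]

    results: list[FuncCallResult] = []
    for blob in blobs:
        try:
            decoded = json.loads(blob)
        except json.JSONDecodeError:
            continue
        calls = decoded if isinstance(decoded, list) else [decoded]
        for call in calls:
            if isinstance(call, dict) and name_key in call and "arguments" in call:
                results.append(FuncCallResult(call[name_key], json.dumps(call["arguments"])))
    return results

# Example with the Hermes-style patterns from the YAML config above:
print(parse_function_calls(
    '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>',
    [r"(?s)<tool_call>(.*?)</tool_call>", r"(?s)<tool_call>(.*)"],
))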
I'd like to add a suggestion, which is to add a returns/output field to tool calls. So the following tool call format laid out by @interstellarninja:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call>
would be:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>, "returns": <output-variable-list>}
</tool_call>
which enables chained tool calling. Example user query (full prompt can be found here):
<|user_query|>
Can you send my friend (jasper3131@gmail.com) an email about the current sentiment about MMA in twitter and the stats for Sean Strickland please?
<|end_user_query|>
which results in the response (full response found here)
... (Previous part of the response including thoughts)
<|function_calls|>
[
{ "name": "get_tweets_by_hashtag", "arguments": {"hashtag": "MMA", "count": 100}, "returns": ["tweets"]},
{ "name": "get_sentiment_analysis", "arguments": {"text": "$tweets$"}, "returns": ["sentiment"]},
{ "name": "get_fighter_stats", "arguments": {"fighter_name": "Sean Strickland"}, "returns": ["stats"]},
{ "name": "construct_dict", "arguments": {"keys": ["wins", "losses", "knockouts"], "values": ["$stats$['wins']", "$stats$['losses']", "$stats$['knockouts']"]}, "returns": ["stats_dict"]},
{ "name": "turn_into_string", "arguments": {"value": "$stats_dict$"}, "returns": ["stats_string"]},
{ "name": "concatanate_strings", "arguments": {"list_of_strings": ["The current sentiment about MMA in Twitter is: $sentiment$", "The stats for Sean Strickland are: $stats_string$"]}, "returns": ["email_body"]},
{ "name": "send_email", "arguments": {"subject": "MMA Sentiment and Sean Strickland Stats", "body": "$email_body$", "recipient": "jasper3131@gmail.com"}, "returns": []}
]
<|end_function_calls|>
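To make the chaining semantics concrete, here is a small sketch of how a runtime could resolve those $variable$ placeholders using the proposed returns field; the runtime, the registry, and the restriction to simple $name$ placeholders are all illustrative assumptions:

# Hypothetical runtime sketch for the chained format above: each call's result is
# stored under the names listed in "returns", and $name$ placeholders in later
# arguments are substituted before dispatch. Only simple $name$ placeholders are
# handled here (not indexed ones like $stats$['wins']); everything is illustrative.
import re

def resolve_placeholders(value, variables):
    """Recursively replace $name$ placeholders with previously returned values."""
    if isinstance(value, str):
        exact = re.fullmatch(r"\$(\w+)\$", value)
        if exact:
            return variables[exact.group(1)]  # preserve the original type
        return re.sub(r"\$(\w+)\$", lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, list):
        return [resolve_placeholders(v, variables) for v in value]
    if isinstance(value, dict):
        return {k: resolve_placeholders(v, variables) for k, v in value.items()}
    return value

def run_chained_calls(calls, registry):
    """Execute calls in order, binding each result to the names in 'returns'."""
    variables = {}
    for call in calls:
        output = registry[call["name"]](**resolve_placeholders(call["arguments"], variables))
        for return_name in call.get("returns", []):
            variables[return_name] = output
    return variables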
@mudler any thoughts on standardizing the YAML config file and Go parser script so they generalize across open-source function calling models and can be made part of the Hugging Face ecosystem?
This should probably move into another issue at this point, by the way - if we have a big debate about reverse templates and YAML here it might make it hard to actually review/merge the tool use templates, and that should be fairly soon because support for it in Transformers is coming next week!
We were trying to implement the Hermes format with llama-cpp-python using a chatml-function-calling template; when we chimed in a while back it was still a WIP.
Today I just tried Hermes-Theta-8B using the same template and it just seems to work. There are still slight differences in the tool-call format, which I think could be fixed.
I'm sharing their template in case it could be useful for the templating work:
https://github.com/abetlen/llama-cpp-python/blob/04959f1884c8ef93bd5a4aa40ff0accb8438c0c1/llama_cpp/llama_chat_format.py#L3165
Here's the function calling template they are using:
function_calling_template = (
"{% for message in messages %}"
"<|im_start|>{{ message.role }}\n"
# System message
"{% if message.role == 'system' %}"
"{{ message.content }}"
"{% if tool_calls %}"
"\n\nYou have access to the following functions:\n"
"{% for tool in tools %}"
"\nfunctions.{{ tool.function.name }}:\n"
"{{ tool.function.parameters | tojson }}"
"\n{% endfor %}"
"\n\nYou can respond to users messages with either a single message or one or more function calls."
"\n\nTo respond with a message begin the message with 'message:', use the following format:"
"\n\nmessage:"
"\n<message>"
"\n\nTo respond with one or more function calls begin the message with 'functions.<function_name>:', use the following format:"
"\n\nfunctions.<function_name>:"
'\n{ "arg1": "value1", "arg2": "value2" }'
"\nfunctions.<function_name>:"
'\n{ "arg1": "value1", "arg2": "value2" }'
"{% endif %}"
"<|im_end|>\n"
"{% endif %}"
# User message
"{% if message.role == 'user' %}"
"{{ message.content }}"
"<|im_end|>\n"
"{% endif %}"
# Assistant message
"{% if message.role == 'assistant' %}"
## Regular message
"{% if message.content and message.content | length > 0 %}"
"{% if tool_calls %}"
"message:\n"
"{% endif %}"
"{{ message.content }}"
"<|im_end|>\n"
"{% endif %}"
## Function calls
"{% if 'tool_calls' in message %}"
"{% for tool_call in message.tool_calls %}"
"functions.{{ tool_call.function.name }}:\n"
"{{ tool_call.function.arguments }}"
"{% endfor %}"
"<|im_end|>\n"
"{% endif %}"
"{% endif %}"
"{% endfor %}"
"{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)
Please check the example implementation of Hermes function calling with llama.cpp here:
https://github.com/NousResearch/Hermes-Function-Calling/blob/main/examples/lllama-cpp-multiple-fn.ipynb
You can try it on Colab here:
https://colab.research.google.com/drive/10UPSepK_cp-pvJLChW8GihM-ETJaDn1l#scrollTo=TPJ-J9TDeYIw
Hi @interstellarninja, looks interesting! Is there a reason to prefer that template to the one in this PR?
Ah, I shared this as a reference for building the reverse template for parsing the tool calls. The parsing logic is actually buried in the code — this template isn't the same as the Hermes one, btw.
Ah, got it, I'm sorry!
btw, we've now merged the tool call API update to Transformers, so in theory we could merge this PR (+ related PRs to other Nous models) - is there anything you want to change or review, or do you think it's good to go?
Looks good so far — ideally a YAML file to update the system prompt would be nice for when we add additional structure, say, down the line.
@interstellarninja Hm, I'm not sure the templates can easily read from YAML, but that's actually a good idea - it would be nice to have some simpler tokenizer variables that can be inserted into the template.
Can we start with merging the templates here, to add support for the new API, and I'll ping you on a Transformers PR later when we have support for exposing other variables to the template renderer?
@Rocketknight1 Hey Matt, I've tested the new tokenizer.apply_chat_template() method with the tool_use chat template, using both a list of callable functions and a list of JSON function signatures.
I can confirm that the new method works reliably with Hermes Pro models. I will be merging this PR now.
<|im_start|>system
You are a bot that responds to weather and stock market queries.<|im_end|>
<|im_start|>user
Hey, what's the temperature in Paris in celsius right now? And fetch the stock price of Hermes International (RMS.PA).<|im_end|>
<|im_start|>assistant
<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call>
<tool_call>
{"arguments": {"symbol": "RMS.PA"}, "name": "get_stock_price"}
</tool_call><|im_end|>
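For reference, a rendered prompt like the one above can also be produced by passing JSON-schema tool definitions instead of callables; a minimal sketch, where the get_stock_price schema is an assumption for illustration:

# Minimal sketch: passing a JSON-schema tool definition instead of a callable.
# The get_stock_price schema below is made up for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

get_stock_price = {
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string", "description": "The ticker symbol."}},
            "required": ["symbol"],
        },
    },
}

messages = [
    {"role": "system", "content": "You are a bot that responds to weather and stock market queries."},
    {"role": "user", "content": "Fetch the stock price of Hermes International (RMS.PA)."},
]

prompt = tokenizer.apply_chat_template(
    messages, tools=[get_stock_price], chat_template="tool_use",
    tokenize=False, add_generation_prompt=True,
)
print(prompt)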
Nice! I'll see about making similar PRs to the other models and adding README code snippets.
Hey @interstellarninja, other PRs are open!
NousResearch/Hermes-2-Theta-Llama-3-70B
NousResearch/Hermes-2-Pro-Llama-3-70B
NousResearch/Hermes-2-Theta-Llama-3-8B
NousResearch/Hermes-2-Pro-Mistral-7B
In all cases, I just added the new chat template and pushed the tokenizer. In some cases, this caused minor formatting changes to other config files, or added the missing tokenizer.json
to the repo - this shouldn't affect things for users, except for making it a bit faster to load those models!