All responses come back as "!!!!!..." repeated like 100 times
#10 · opened by jamie-de
I've got this model deployed on an Inference Endpoint, and I've tried it with a curl command:
curl "https://jdx35ariibmj1auz.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
-X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "tgi",
"messages": [
{
"role": "user",
"content": "What is deep learning?"
}
],
"max_tokens": 150,
"stream": false
}'
{"object":"chat.completion","id":"","created":1729893704,"model":"/repository","system_fingerprint":"2.3.1-sha-a094729","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":205,"completion_tokens":150,"total_tokens":355}}
I'm running on a GPU · Nvidia T4 instance.
It also seems EXTREMELY slow.