All responses come back as "!!!!!..." repeated like 100 times

#10
by jamie-de - opened

I've got this model up on an inference endpoint, and I've tried querying it with a curl command:

curl "https://jdx35ariibmj1auz.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
-X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{
    "model": "tgi",
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning?"
        }
    ],
    "max_tokens": 150,
    "stream": false
}'

This is the response I get back:

{"object":"chat.completion","id":"","created":1729893704,"model":"/repository","system_fingerprint":"2.3.1-sha-a094729","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":205,"completion_tokens":150,"total_tokens":355}}

I'm running on the "GPU · Nvidia T4" instance type.

It also seems EXTREMELY slow.
