automatedstockminingorg posted an update 26 days ago
Hi everyone, I have just uploaded my first fine-tuned model, but the serverless inference client isn't available. It's built on the Transformer architecture and is just a fine-tuned Llama 8B Instruct. Does anyone know how to make serverless inference available on a model?

The Serverless Inference API was significantly degraded a few months ago, making it almost unusable unless the model is labeled Warm.
The conditions under which a model becomes Warm are unknown, so it's safe to say it can't reliably be aimed for.
If you create a Space with Gradio, it may work, so you can try that:
https://www.gradio.app/guides/using-hugging-face-integrations
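
For example, a minimal `app.py` for such a Space might look like the sketch below. The repo id is a placeholder for your fine-tuned model, and `gr.load` with a `models/` prefix is the integration described in the guide above:

```python
# app.py — minimal Gradio Space sketch (the repo id below is a placeholder, not a real model)
import gradio as gr

# gr.load("models/<repo-id>") wraps a Hub model in a ready-made demo UI,
# as shown in the Gradio guide on Hugging Face integrations.
demo = gr.load("models/your-username/llama-3-8b-instruct-finetune")

if __name__ == "__main__":
    demo.launch()
```

Note that `gr.load("models/...")` still calls the Hub's inference backend under the hood, so availability may vary; the same guide also shows how to wrap a local `transformers` pipeline in the Space instead.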


You can use Modal Labs to run inference. GPUs take ~30 seconds to provision. Here's a quickstart on the matter: https://modal.com/docs/examples/vllm_inference
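
The linked quickstart sets up a full vLLM server; as a rough sketch of the idea, assuming Modal's current Python SDK, a placeholder repo id and GPU type, and reloading the model on every call (which the real example avoids with caching):

```python
# Minimal Modal + vLLM sketch (repo id, GPU type, and app name are placeholders).
import modal

# Container image with vLLM installed.
image = modal.Image.debian_slim().pip_install("vllm")
app = modal.App("llama-finetune-inference", image=image)

@app.function(gpu="A10G", timeout=600)
def generate(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    # Loads the fine-tuned model inside the GPU container on each call
    # (the quickstart caches weights instead of re-downloading them every time).
    llm = LLM(model="your-username/llama-3-8b-instruct-finetune")
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text

@app.local_entrypoint()
def main():
    # Invoke with: modal run this_file.py
    print(generate.remote("Hello! What can you do?"))
```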
