Inference Endpoints Changelog 🚀

Community Article Published October 11, 2024

Week 47, Nov 18 - Nov 24

Unfortunately, a wave of flu hit our team, and we needed some time to recover 🤒 No updates this week, but stay tuned for next week: we have a lot of exciting things coming up! 🔥

Week 46, Nov 11 - Nov 17

No changes this week as the team was on an off-site in Martinique! But we cooked up a lot of ideas and energy for the coming week 🙌

Week 45, Nov 04 - Nov 10

This week, we have some awesome updates that are finally out 🙌

Scaling replicas based on pending requests is now in beta 🔥 Since it's in beta, things might change, but you can try it out and read more about it here
Improved analytics with a graph of the replica history
Updates to the widgets
- Fixed bug in streaming
- Conversations can now be cleared
- Submit message with cmd+enter
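
To give a feel for what scaling on pending requests can mean, here is a minimal sketch. It is not the actual Inference Endpoints autoscaler: the function name, the `target_pending_per_replica` parameter, and the rounding rule are all assumptions made for illustration, and since the feature is in beta the real behaviour may differ.

```python
import math


def desired_replicas(
    pending_requests: int,
    target_pending_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Hypothetical rule: run enough replicas that each one has at most
    `target_pending_per_replica` pending requests, within the allowed range."""
    if target_pending_per_replica <= 0:
        raise ValueError("target_pending_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    # Round up so a partially filled replica still counts as a full one.
    wanted = math.ceil(pending_requests / target_pending_per_replica)

    # Clamp to the configured bounds (an assumption about how limits apply).
    return max(min_replicas, min(max_replicas, wanted))
```

With 25 pending requests, a target of 10 per replica, and bounds of 1 to 5, this sketch asks for 3 replicas. With no pending requests and a minimum of 0, it scales down to 0.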

Week 44, Oct 28 - Nov 03

Probably the biggest update this week was a revamp of the Inference Catalogue 🔥 You can now find a one-click-deploy model based on:

license
price range
inference server
accelerator
and the previously existing task and search filters

Additionally:

we fixed the config for MoritzLaurer/deberta-v3-large-zeroshot-v2.0 so that you can run it on CPU as well
and also thanks to @ngxson for fixing a bug in the llama.cpp snippet

Week 43, Oct 21-27

This week you'll get a sneak peek of the upcoming autoscaling, in the form of analytics 👀

We have:

Added pending HTTP requests to the analytics
Added support for Image-Text-To-Text, aka vision language models 🔥 (Llama Vision has some good jokes 😅)
Improved the log pagination and added some nice visual touches
Fixed a bug related to total request count in the analytics

Week 42, Oct 14-20

This week was unfortunately slower on the user-facing updates.

Behind the scenes, we:

fixed several recommendation values for LLaMA and Qwen 2,
improved our internal analytics,
debugged issues where weight downloads were getting HTTP 429 (Too Many Requests) responses,
and hopefully squashed the last bugs so we can soon release the new autoscaling 🔥
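
For readers who haven't run into 429s before: a 429 means the server is rate limiting the client, and a common client-side response is to retry with exponential backoff. The sketch below shows that generic pattern only. It is not a description of what we changed internally, and `download_with_backoff`, its `fetch` callback, and the default values are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Tuple


def download_with_backoff(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `fetch` until it succeeds, backing off on HTTP 429.

    `fetch` is a hypothetical callback returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 429:
            # Only rate limiting is retried in this sketch.
            raise RuntimeError(f"unexpected status {status}")
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("still rate limited after all attempts")
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting. A production client would usually also honour a `Retry-After` header when the server sends one.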

Week 41, Oct 7-13

This week we had a lot of nice UI/UX improvements:

clearer error for models that are too large for any instance type, such as Llama 405B 😅
better logs loading message if the endpoint isn't ready