@singhsidhukuldeep on Hugging Face: "Good folks from @Microsoft Research have just released bitnet.cpp, a…"

singhsidhukuldeep

posted an update Oct 26

Good folks from @Microsoft Research have just released bitnet.cpp, an inference framework for 1-bit (ternary-weight) LLMs that reports large speed and energy gains on CPUs.

Key Technical Highlights:
- Reported speedups of up to 6.17x on x86 CPUs and up to 5.07x on ARM CPUs
- Reported energy savings of 55.4–82.2%
- Can run a 100B-parameter model on a single CPU at roughly human reading speed (5–7 tokens/second)

Three Optimized Kernels:
1. I2_S: Stores each weight in 2 bits
2. TL1: Packs every two weights into a 4-bit lookup-table index
3. TL2: Packs every three weights into a 5-bit lookup-table index
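
To make the three encodings concrete, here is a minimal Python sketch of the ideas behind them. It is not bitnet.cpp's actual code or memory layout: the function names (pack_i2s, group_index, build_lut, lut_dot) and the base-3 index ordering are assumptions made for illustration. It shows why two ternary weights fit in 4 bits (9 patterns, at most 16 slots), why three fit in 5 bits (27 patterns, at most 32 slots), and how a lookup table of precomputed partial sums can replace the multiplications.

```python
from itertools import product

TERNARY = (-1, 0, 1)

def pack_i2s(weights):
    """I2_S-style idea: 2 bits per ternary weight, four weights per byte."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)  # map -1/0/1 to codes 0/1/2
        out.append(byte)
    return bytes(out)

def unpack_i2s(packed, n):
    """Recover the first n ternary weights from the packed bytes."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(n)]

def group_index(group):
    """Base-3 index of a weight group (hypothetical ordering).

    Two weights give 0..8 (fits in 4 bits); three give 0..26 (fits in 5 bits).
    """
    idx = 0
    for w in group:
        idx = idx * 3 + (w + 1)
    return idx

def build_lut(activations):
    """Partial dot product of an activation group with every ternary pattern.

    itertools.product enumerates patterns in the same order as group_index.
    """
    return [sum(w * a for w, a in zip(pattern, activations))
            for pattern in product(TERNARY, repeat=len(activations))]

def lut_dot(weights, activations, group_size):
    """Dot product using table lookups instead of multiplications."""
    total = 0
    for i in range(0, len(weights), group_size):
        lut = build_lut(activations[i:i + group_size])
        total += lut[group_index(weights[i:i + group_size])]
    return total

weights = [1, -1, 0, 1, 1, 0]
activations = [3, 5, 7, 2, 4, 6]
reference = sum(w * a for w, a in zip(weights, activations))
print(unpack_i2s(pack_i2s(weights), len(weights)) == weights)
print(lut_dot(weights, activations, 2) == reference)  # TL1-style pairs
print(lut_dot(weights, activations, 3) == reference)  # TL2-style triples
```

In a real kernel the table for an activation group would be built once and reused across many weight rows, which is where the savings come from; the sketch rebuilds it on every call only to stay short. The storage cost works out to 2 bits per weight for the first two schemes and about 1.67 bits per weight for the third.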

Performance Metrics:
- Lossless inference: reported 100% output agreement with a full-precision kernel running the same model
- Tested across model sizes from 125M to 100B parameters
- Evaluated on both Apple M2 Ultra and Intel i7-13700H processors

This breakthrough makes running large language models locally more accessible than ever, opening new possibilities for edge computing and resource-constrained environments.

m-conrad-202

Oct 27

A proper link to the Microsoft page would be appreciated.

m-conrad-202

Oct 27

https://github.com/microsoft/BitNet

SerialKicked

Oct 27

edited Oct 27

It's kinda misleading to say they have the same accuracy as full precision. It was only tested on one very specific 0.7B parameter model over 1000 undisclosed prompts. That's kinda weak sauce for a testing environment, and wholly insufficient to make such a statement. I doubt those results will scale up this flawlessly for models on which this feature would actually be useful.
