Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
singhsidhukuldeep 
posted an update Oct 26
Post
1169
Good folks from @Microsoft Research have just released bitnet.cpp, a game-changing inference framework that achieves remarkable performance gains.

Key Technical Highlights:
- Achieves speedups of up to 6.17x on x86 CPUs and 5.07x on ARM CPUs
- Reduces energy consumption by 55.4–82.2%
- Enables running 100B parameter models at human reading speed (5–7 tokens/second) on a single CPU

Features Three Optimized Kernels:
1. I2_S: Uses 2-bit weight representation
2. TL1: Implements 4-bit index lookup tables for every two weights
3. TL2: Employs 5-bit compression for every three weights

Performance Metrics:
- Lossless inference with 100% accuracy compared to full-precision models
- Tested across model sizes from 125M to 100B parameters
- Evaluated on both Apple M2 Ultra and Intel i7-13700H processors

This breakthrough makes running large language models locally more accessible than ever, opening new possibilities for edge computing and resource-constrained environments.
deleted
This comment has been hidden

A proper link to the Microsoft page would be appreciated.

It's kinda misleading to say they have the same accuracy as full precision. It was only tested on one very specific 0.7B parameter model over 1000 undisclosed prompts. That's kinda weak sauce for a testing environment, and wholly insufficient to make such a statement. I doubt those results will scale up this flawlessly for models on which this feature would actually be useful.