Titus von Koeller

Titus-von-Koeller

AI & ML interests

NN Quantization, Generative AI, LLMs, alignment, algorithms for social justice, ethical humanism, mitigating gender bias, audio compression, AGI

Recent Activity

liked a model about 1 month ago

TheBloke/Qwen-7B-Chat-AWQ

Articles

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Mar 20

• 25

Posts 3

Post

🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware!

⬇️ Up to 82.5% smaller optimizer-state memory footprint, without performance degradation, by projecting the gradients of the weight matrices into a low-rank subspace.

👩🏿‍💻 Install via pip install "transformers>=4.39.0" galore-torch (keep the quotes, otherwise the shell treats >= as a redirect). #ProudlyGpuPoor

Integrating GaLore into the training of large language models (LLMs) is a significant step for memory efficiency and for the democratization of AI research. By projecting gradients into a low-rank subspace, GaLore shrinks the memory footprint of the optimizer states enough to train billion-parameter models on consumer-grade hardware, which opens new horizons for researchers and practitioners with limited access to high-end computational resources.
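To get a feel for where the optimizer-state savings come from, here is a rough back-of-the-envelope sketch for a single weight matrix. It assumes Adam-style optimizer states (two moment estimates per parameter) and a left projection of the gradient to rank r; the layer shape and the rank are hypothetical values picked for illustration, and the function names are not part of any library. It counts elements for one layer only, so it is not meant to reproduce the 82.5% figure, which was measured on whole models under specific settings.

```python
def adam_state_elems(m: int, n: int) -> int:
    """Full-rank Adam: two moment estimates, each the size of the m x n weight."""
    return 2 * m * n


def low_rank_state_elems(m: int, n: int, r: int) -> int:
    """Low-rank projected Adam: one m x r projector plus two r x n moments.

    Assumes m <= n, so the gradient is projected from the left.
    """
    return m * r + 2 * r * n


# Hypothetical layer shape and projection rank, for illustration only.
m, n, r = 4096, 4096, 128

full = adam_state_elems(m, n)
low_rank = low_rank_state_elems(m, n, r)
saving = 1 - low_rank / full

print(f"full-rank optimizer states: {full:,} elements")
print(f"low-rank optimizer states:  {low_rank:,} elements")
print(f"per-layer reduction:        {saving:.1%}")
```

The per-layer reduction depends strongly on the chosen rank, and layers that are not projected (embeddings, norms, and so on) keep their full optimizer states, so whole-model savings are lower than this single-matrix number.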

🔬 Find out more about GaLore and investigate lots of juicy technical details: https://huggingface.co/blog/galore

🤗 Huge thanks to everyone involved ❤️:

• authors: @jiaweizhao @Kyriection @beidic Zhangyang Wang @animakumar @tydsh
• community contributors: @hiyouga @mdouglas and others!
• @ybelkada for taking such swift action in composing and coordinating necessary PRs to get this live at ⚡ speed!

🏗️📈 Super rewarding to see how @timdettmers's work on optimizers is being built upon to reach even greater heights!

🚧 In fact, work is ongoing to integrate GaLore into bitsandbytes and optimize memory efficiency even further 💪. We'll keep you posted!

Post

We just released bitsandbytes==0.43.0 📦, with these significant new additions:

‣ 🛫 FSDP+QLoRA support (alpha release)
◦ now anyone with 2 powerful gaming GPUs can fine-tune 70B param models at home!
◦ in collab with Jeremy Howard + team @ answer.ai
◦ answer.ai blogpost: https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html
◦ example repo: https://github.com/AnswerDotAI/fsdp_qlora/
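As a rough sanity check on the "70B on two gaming GPUs" claim, the sketch below estimates the size of the weights alone. The 70B parameter count and the two GPUs come from the post; the 24 GiB per card is an assumption, and the helper name is made up for illustration. It ignores quantization constants, LoRA adapter weights, activations, gradients, and optimizer states, so real memory use is higher than these numbers.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Size of the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 70e9  # 70B parameters, from the post
N_GPUS = 2       # two GPUs, from the post
GPU_GIB = 24     # assumption: 24 GiB of memory per card

fp16_total = weight_gib(N_PARAMS, 16)
q4_total = weight_gib(N_PARAMS, 4)
# FSDP shards the weights across GPUs, so each card holds roughly 1/N_GPUS.
q4_per_gpu = q4_total / N_GPUS

print(f"16-bit weights:          {fp16_total:6.1f} GiB total")
print(f"4-bit weights:           {q4_total:6.1f} GiB total")
print(f"4-bit weights, sharded:  {q4_per_gpu:6.1f} GiB per GPU")
print(f"fits in {GPU_GIB} GiB per GPU: {q4_per_gpu < GPU_GIB}")
```

Under these assumptions, 4-bit quantization alone is not enough for a single card, and sharding alone is not enough at 16 bits; it is the combination that brings the per-GPU weight footprint under the assumed limit.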

‣ 🌈⊞ Official Windows support
◦ now via a simple pip install "bitsandbytes>=0.43.0" (quoted so the shell does not treat >= as a redirect)

‣ 📄 Huge docs update:
◦ https://huggingface.co/docs/bitsandbytes/main
◦ Be sure to check out the optimizers and the API docs
◦ ... even more upcoming ...

Under the hood, there are many other improvements, thanks to extensive maintenance work, community contributions by super active + knowledgeable volunteers ✨ 🚀 and the official sponsorship by Hugging Face that makes all this possible 🤗 ❤️ 🌍

We would greatly appreciate any further community contributions, be it help with refactoring, exterminating flaky tests, writing docstrings and tutorials, or adding new features. Don't be shy, just contact us and we'll see where it leads:
https://github.com/TimDettmers/bitsandbytes/discussions

Have a great weekend everyone!

models

None public yet

datasets

None public yet