Let's add system requirements to models.
Itโd be super helpful to require or at least suggest that AI models include system requirements, just like other software. Minimum and recommended specs, especially for inference with Hugging Face libraries, would make things easier. Hardware info is often hard to find, and not everyone has access to H100 clusters. Setting this as a standard would make models way more accessible.
discussion here
ลael Al-Halawani
ljhwild
AI & ML interests
None yet
Recent Activity
New activity
about 1 month ago
jainr3/diffusiondb-pixelart:Poor quality
updated
a model
about 2 months ago
ljhwild/madlad400-3b-mt-Q6_K-GGUF
Organizations
None yet
ljhwild's activity
Poor quality
#1 opened about 1 month ago
by
ljhwild
I used to be so pro open source AI, until I saw what china is doing with open source AI. I'm sorry but the risk is just too great. If we can't cut them out, we shouldn't share things out in the open.
reacted to
m-ric's
post with โค๏ธ
about 2 months ago
Post
2268
๐ฅ ๐-๐๐ฎ๐ฅ: ๐๐๐๐ข๐ญ๐ข๐จ๐ง-๐๐ง๐ฅ๐ฒ ๐๐ฎ๐ฅ๐ญ๐ข๐ฉ๐ฅ๐ข๐๐๐ญ๐ข๐จ๐ง ๐๐๐ง ๐ฌ๐ฅ๐๐ฌ๐ก ๐๐จ๐ฆ๐ฉ๐ฎ๐ญ๐๐ญ๐ข๐จ๐ง๐๐ฅ ๐๐จ๐ฌ๐ญ๐ฌ ๐๐ฒ ๐๐%!
Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications.
๐ก Quick reminder on how floats are coded on 8 bits (FP8):
In the e4m3 FP8 standard, you encode a number as:
Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits)
Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625)
Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3):
โก๏ธย You get (1 + 0.625) ร 2^(8-7) = 3.25
Now back to the paper. ๐๐ฒ๐ ๐ถ๐ป๐๐ถ๐ด๐ต๐๐:
โก๏ธ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!
๐งฎ Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) ยท 2^xe ยท (1 + ym) ยท 2^ye = (1 + xm + ym + xm ยท ym) ยท 2^(xe+ye)
๐ก L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) ยท 2^(xe+ye), eliminating the costly xm ยท ym term
๐ง l(m) term is adaptively set based on mantissa size for optimal accuracy
๐ Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision
๐ฌ Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."
This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but itโs a really exciting path for more sustainable AI! ๐ฑ
Read the paper here ๐ย Addition is All You Need for Energy-efficient Language Models (2410.00907)
Microsoft researchers dropped a groundbreaking technique that could slash the energy use in transformer computations : their novel "linear-complexity multiplication" (L-Mul) algorithm approximates floating-point multiplication using energy-efficient integer addition instead of costly multiplications.
๐ก Quick reminder on how floats are coded on 8 bits (FP8):
In the e4m3 FP8 standard, you encode a number as:
Sign (1 bit) | Exponent (4 bits) | Mantissa (3 bits)
Example: 0 (positive) | 1000 (8) | 101 (1/2 + 1/8 = 0.625)
Calculation: you add one to the mantissa, and multiply it by 2 power (the exponent - a bias term which is 7 for e4m3):
โก๏ธย You get (1 + 0.625) ร 2^(8-7) = 3.25
Now back to the paper. ๐๐ฒ๐ ๐ถ๐ป๐๐ถ๐ด๐ต๐๐:
โก๏ธ Multiplication is extremely energy-intensive compared to addition. For 32-bit operations, multiplication (3.7 pJ) uses 37x more energy than addition (0.1 pJ)!
๐งฎ Traditional floating-point multiplication go like (noting xm the mantissa and xe the exponent): Mul(x,y) = (1 + xm) ยท 2^xe ยท (1 + ym) ยท 2^ye = (1 + xm + ym + xm ยท ym) ยท 2^(xe+ye)
๐ก L-Mul cleverly approximates this as: L-Mul(x,y) = (1 + xm + ym + 2^-l(m)) ยท 2^(xe+ye), eliminating the costly xm ยท ym term
๐ง l(m) term is adaptively set based on mantissa size for optimal accuracy
๐ Benchmarks on the Llama-3.1-8B-Instruct model show L-Mul preserves precision across various NLP tasks, with performance nearly identical to full BFloat16 precision
๐ฌ Authors claim: "We can achieve the same model inference performance while reducing the energy cost of attention computations by 80%."
This breakthrough is still theoretical and would need implementation on dedicated hardware to confirm real-world gains, but itโs a really exciting path for more sustainable AI! ๐ฑ
Read the paper here ๐ย Addition is All You Need for Energy-efficient Language Models (2410.00907)
upvoted
a
collection
2 months ago
Update README.md
#10 opened 3 months ago
by
ljhwild
Support for longrope implementation in llama.cpp
2
#88 opened 5 months ago
by
ManniX-ITA
Updated the base model url to the correct one
#5 opened 3 months ago
by
ljhwild
upvoted
an
article
5 months ago
Article
Fine Tuning TinyLlama for Text Generation with TRL
By
โข
โข
10upvoted
a
paper
5 months ago
Can we run this in FP16 instead of FP32 ?
6
#3 opened 11 months ago
by
vince62s
IndexError: index out of range in self
4
#2 opened 5 months ago
by
ljhwild
Half precision
1
#1 opened 6 months ago
by
ljhwild