KnutJaegersberg (Knut Jägersberg)

posted an update 2 months ago

Post

1085

appvoid/arco

arco consistently outperforms every SOTA model below 600M parameters on average

posted an update 3 months ago

Post

2160

Wrote a blog post with some ideas about prompt engineering

https://huggingface.co/blog/KnutJaegersberg/first-principles-prompt-engineering

posted an update 3 months ago

Post

2308

mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq

99% of the performance across various benchmarks!

posted an update 3 months ago

Post

922

neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8

Requant of the big llama, using 20% less memory

posted an update 4 months ago

Post

1362

Decensored Gemma2-27b

TheDrummer/Big-Tiger-Gemma-27B-v1

reacted to merve's post with 👍 4 months ago

Post

3274

Just shipped: introduction to vision language models (aka image-text-to-text) https://huggingface.co/tasks/image-text-to-text

Learn about more machine learning tasks at https://huggingface.co/tasks

posted an update 5 months ago

Post

637

Unsocial Intelligence: an Investigation of the Assumptions of AGI Discourse

I don't agree with some of the assertions made here, but it is an interesting paper and a good overview.

https://arxiv.org/abs/2401.13142

reacted to merve's post with ❤️ 5 months ago

Post

4319

Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

The difference from previous models is that the authors have compiled a dataset consisting of 126M images with 5.4B annotations labelled with their own data engine pseudolabelled by smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of choice. The authors also released different results on downstream tasks and reported their results when un/freezing the vision encoder 🤓📉
They have released fine-tuned models too, you can find them in the collection above 🤗

3 replies

reacted to merve's post with 🔥 5 months ago

Post

3010

Finally @CVPR2024 is here! 🩷
Have you claimed your papers and linked your models/datasets/demos?
This will increase visibility and impact of your paper 💫

To index your papers, go here
CVPR2024/CVPR2024-papers
Find your paper, click on paper page link, index the paper, then click on your name (workflow is below 👇🏻)
If you'd like to add links to your paper, go here CVPR2024/update-CVPR2024-papers
Log in, find your paper's ID, retrieve the paper, fill in the info and submit!

replied to s3nh's post 5 months ago

Don't burn out! Lighten up again, will you?

posted an update 5 months ago

Post

1551

What We Learned from a Year of Building with LLMs

It's a nice perspective outlined in here.

“When a measure becomes a target, it ceases to be a good measure.”

— Goodhart’s Law

https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/

reacted to s3nh's post with ❤️ 6 months ago

Post

GPU Poor POV: Burnout

Sometimes we do not have the energy to post about AI and new methods.
And that's totally OK, I guess.
Remember to sleep well and drink a lot of water. Have a great day :D <3

2 replies

replied to BramVanroy's post 8 months ago

It mixed up stuff in the output and gave weird answers. I didn't have that problem with other models. Maybe the update they released solved that issue; I just never cared, given the alternatives.

reacted to BramVanroy's post with 👍 8 months ago

Post

2389

Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory heavy than mistral 7B. I know that its vocabulary is much larger (250k) but I'm a bit surprised that the max batch size that I can get in an A100 80GB is only 2 whereas I could fit 4 with mistral 7B - even though Gemma is much smaller except for the embedding layer. Both runs were using FA, same sequence length, same deepspeed zero 3 settings. Oh and yes I'm using the most recent hot fix of transformers that solves a memory issue with Gemma and others.

Any prior experience that you can share or suggestions to improve throughput?
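
One plausible contributor, offered here as an assumption rather than a confirmed diagnosis, is the output logits tensor, whose size grows linearly with vocabulary size. The sketch below estimates it for the two batch sizes mentioned in the post. The 250k vocabulary comes from the post and 32,000 is Mistral 7B's vocabulary size; the sequence length of 2048 and the float32 logits are hypothetical choices, since the post states neither.

```python
def logits_bytes(batch_size, seq_len, vocab_size, bytes_per_element=4):
    # One score per vocabulary entry for every token position in the batch.
    return batch_size * seq_len * vocab_size * bytes_per_element


SEQ_LEN = 2048  # hypothetical; the post does not state the sequence length

# Batch sizes are the ones reported in the post; float32 logits are an assumption.
gemma = logits_bytes(batch_size=2, seq_len=SEQ_LEN, vocab_size=250_000)
mistral = logits_bytes(batch_size=4, seq_len=SEQ_LEN, vocab_size=32_000)

print(f"Gemma logits:   {gemma / 2**30:.2f} GiB")
print(f"Mistral logits: {mistral / 2**30:.2f} GiB")
```

Under these assumptions the Gemma logits are close to four times larger than the Mistral ones despite the smaller batch, and that is before counting the gradient of the same tensor.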

4 replies

replied to BramVanroy's post 8 months ago

I got some weird results. Since there are a lot of other models in that performance-parameter range, I just didn't try anymore.

reacted to clefourrier's post with ❤️ 9 months ago

Post

🔥 New LLM leaderboard blog: Open Ko LLM!

One of the oldest leaderboards on the hub, it has already evaluated more than 1000 models! It uses Korean translations of MMLU, ARC, HellaSwag, TruthfulQA, and a new dataset, Korean CommonGen, about specific common sense alignment.

upstage/open-ko-llm-leaderboard

What's interesting about this leaderboard is how it drove LLM development in Korea, with on average about 4 submissions/models per day since it started!
Really looking forward to seeing similar initiatives in other languages, to help qualitative models emerge outside of "just English" (for the other 2/3rds of the world).

Read more about the leaderboard in the intro blog: https://huggingface.co/blog/leaderboards-on-the-hub-upstage
Congrats to @Chanjun , @hunkim and the Upstage team!

reacted to macadeliccc's post with ❤️ 9 months ago

Post

Reducing perplexity in LLMs through layer-selective rank reduction

Layer-Selective Rank Reduction (LASER) is a denoising method that improves reasoning by strategically removing higher-order components from the weight matrices in the multi-layer perceptron (MLP) layers, without the need for additional parameters or training data. It uses singular value decomposition to identify and eliminate these components. This simple yet effective method has been shown to improve question-answering performance by up to 27.4 percentage points.

LaserRMT implements this by calculating the signal-to-noise ratio (SNR) for each layer and selectively reducing the rank of these layers. The SNR is computed by using singular value decomposition (SVD) to separate the signal (the components with large singular values) from the noise (the higher-order components with small singular values) within the weight matrices of the model's layers. The SNR calculation determines which layers would benefit from rank reduction without compromising the model's integrity.

If a layer is identified that could benefit from rank reduction, it enters an incremental process in which the weight matrices are reduced and reconstructed by retaining only the singular values that surpass the threshold. In the case of laserRMT, the threshold is calculated using the Marchenko-Pastur law.

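import numpy as np

@staticmethod  # defined as a method inside a class in laserRMT
def marchenko_pastur_threshold(sigma, n, m):
    # Aspect ratio of the n x m weight matrix, always <= 1
    beta = n / m if n < m else m / n
    # Upper edge of the Marchenko-Pastur distribution, scaled by the noise level sigma
    threshold = sigma * np.sqrt((1 + np.sqrt(beta))**2)
    return threshold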
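
To make the reduce-and-reconstruct step concrete, here is a minimal sketch in plain NumPy. It is not the laserRMT implementation: `reduce_rank` and `signal_to_noise` are hypothetical helper names, the SNR definition (sum of singular values above the threshold divided by the sum below it) is an assumption, and `sigma` is passed in by hand instead of being estimated from the layer.

```python
import numpy as np


def marchenko_pastur_threshold(sigma, n, m):
    # Same formula as the snippet above; sqrt((1 + sqrt(beta))**2) simplifies to 1 + sqrt(beta).
    beta = n / m if n < m else m / n
    return sigma * (1 + np.sqrt(beta))


def signal_to_noise(singular_values, threshold):
    # Assumed SNR definition: sum of singular values above the threshold over the sum below it.
    signal = singular_values[singular_values > threshold].sum()
    noise = singular_values[singular_values <= threshold].sum()
    return float(signal / noise) if noise > 0 else float("inf")


def reduce_rank(weight, sigma):
    # Decompose, drop singular values at or below the threshold, and rebuild the matrix.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    threshold = marchenko_pastur_threshold(sigma, *weight.shape)
    keep = s > threshold
    reduced = (u[:, keep] * s[keep]) @ vt[keep, :]
    return reduced, int(keep.sum()), signal_to_noise(s, threshold)


# Toy 4x4 "weight matrix" with singular values 10, 5, 0.5 and 0.1.
weight = np.diag([10.0, 5.0, 0.5, 0.1])
reduced, kept, snr = reduce_rank(weight, sigma=1.0)  # threshold = 1 * (1 + sqrt(1)) = 2
print(kept)  # 2: only 10 and 5 exceed the threshold
print(np.round(reduced, 6))
```

The reconstructed matrix has the same shape as the original but a lower rank, which is what allows it to replace the layer's weights in place.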

The two primary benefits of applying this method are reducing computational overhead of large language models and simultaneously improving output quality.

Credit to @ehartford @fernandofernandes @DavidGF for laserRMT

Resources:
☄️ AutoLaser: https://colab.research.google.com/drive/11j0e-w6BfvqeFN1gUrpOqdW0vcKqfVqP?usp=sharing
laserRMT: https://github.com/cognitivecomputations/laserRMT
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction (2312.13558)

8 replies

replied to macadeliccc's post 9 months ago

Want this to run on CPU

replied to bwang0911's post 9 months ago

Exciting!

reacted to bwang0911's post with 👍 9 months ago

Post

We've been busy cooking up some interesting models at @jinaai , with a recent highlight being the release of our first batch of bilingual embedding models.

Internally labeled as X+EN, where X represents the target language and EN stays fixed, these models specialize in both monolingual tasks and cross-lingual retrieval tasks, crossing from X to EN.

You can find these models on Hugging Face:
1. German-English bilingual embedding: jinaai/jina-embeddings-v2-base-de
2. Chinese-English bilingual embedding: jinaai/jina-embeddings-v2-base-zh

We're also excited to announce that a Spanish bilingual embedding will be released in approximately two weeks.

Our evaluation across various MLM tasks has demonstrated that the Bilingual Backbone consistently outperforms state-of-the-art Multilingual Backbones like XLM-Roberta (given its focus on just two languages).

Despite being three times smaller than the leading multilingual models (e5-multilingual-large), our released bilingual embedding models have shown superior performance compared to e5-multilingual-large, excelling in both monolingual and cross-lingual search tasks.

Currently, we're putting the finishing touches on the technical report, which should be available on arXiv by next week.

Looking ahead, the embedding team is gearing up for jina-embeddings-v3 with some initial groundwork already underway. Stay tuned for more updates!

1 reply

Articles

Perspectives for first principles prompt engineering

Towards actively reasoning LLM systems

KnutJaegersberg's activity