Semantic search for models, datasets, etc. would be awesome and is critically lacking! (E.g., if you wanted to find all the legal datasets on Hugging Face, you're better off using Google: you'd have to search "law", "contract", "legal", etc. one by one, and even then you'd miss things like umarbutler/emubert, which doesn't mention law in its title but is definitely legal-related.)
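In the meantime, a rough workaround is to do the semantic search client-side: fetch candidate repo cards, embed them, and rank them against a natural-language query. A minimal sketch, assuming sentence-transformers and huggingface_hub are installed; the repo ids, query, and embedding model are only illustrative:

```python
# Rough client-side "semantic search" over Hugging Face dataset cards.
# Assumes: pip install sentence-transformers huggingface_hub
from huggingface_hub import DatasetCard
from sentence_transformers import SentenceTransformer, util

# Illustrative repo ids only -- in practice you would page through
# list_datasets() and embed every card you can fetch.
candidate_ids = [
    "MoritzLaurer/synthetic_zeroshot_mixtral_v0.1",
    # "some-org/some-legal-dataset",  # placeholder
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

cards = []
for repo_id in candidate_ids:
    try:
        cards.append((repo_id, DatasetCard.load(repo_id).text))
    except Exception:
        continue  # card missing or repo private

query = "legal text corpora (legislation, contracts, case law)"
query_emb = model.encode(query, convert_to_tensor=True)
card_embs = model.encode([text for _, text in cards], convert_to_tensor=True)

# Rank candidate datasets by cosine similarity to the query.
scores = util.cos_sim(query_emb, card_embs)[0]
for (repo_id, _), score in sorted(zip(cards, scores), key=lambda x: -x[1].item()):
    print(f"{score.item():.3f}  {repo_id}")
```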
Umar Butler (umarbutler)
AI & ML interests: Law, technology, AI and everything in between.

Recent activity:
- Liked a dataset about 1 month ago: MoritzLaurer/synthetic_zeroshot_mixtral_v0.1
- New activity about 1 month ago on MoritzLaurer/deberta-v3-large-zeroshot-v2.0: "Why not SNLI?"
umarbutler's activity
Why not SNLI? · 1 · #6 opened about 1 month ago by umarbutler
Premise and hypothesis wrong way around? · 2 · #2 opened 9 months ago by MoritzLaurer
Significant train/test imbalance makes this more tailored to GenAI rather than LLMs in general · 3 · #31 opened 2 months ago by umarbutler
Reacted to MoritzLaurer's post with 👍, 2 months ago:
Why would you fine-tune a model if you can just prompt an LLM? The new paper "What is the Role of Small Models in the LLM Era: A Survey" provides a nice pro/con overview. My go-to approach combines both:
1. Start testing an idea by prompting an LLM/VLM behind an API. It's fast and easy, and I avoid wasting time on tuning a model for a task that might not make it into production anyway. (Sketches of these steps follow after the links below.)
2. The LLM/VLM then needs to be manually validated. Anyone seriously considering putting AI into production has to do at least some manual validation. Setting up a good validation pipeline with a tool like Argilla is crucial, and it can be reused for any future experiments. Note: you can use LLM-as-a-judge to automate some evals, but you always also need to validate the judge!
3. Based on this validation I can then (a) just continue using the prompted LLM if it is accurate enough and it makes sense financially given my load; or (b) if the LLM is not accurate enough or too expensive to run in the long run, I reuse the existing validation pipeline to annotate some additional data for fine-tuning a smaller model. This can be sped up by reusing and correcting synthetic data from the LLM (or just pure distillation).
Paper: https://arxiv.org/pdf/2409.06857
Argilla docs: https://docs.argilla.io/latest/
Argilla is also very easy to deploy with Hugging Face Spaces (or locally): https://huggingface.co/new-space?template=argilla%2Fargilla-template-space
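To make steps 1 and 2 concrete, here is a minimal sketch (not MoritzLaurer's actual setup): prompt a hosted LLM to label documents, then measure agreement against a handful of manually annotated examples before trusting it. The model name, label set, and example texts are placeholders:

```python
# Steps 1-2: prototype a classifier by prompting an LLM, then spot-check it.
# Assumes: pip install openai, with OPENAI_API_KEY set; model and labels are placeholders.
from openai import OpenAI

client = OpenAI()
LABELS = ["contract", "legislation", "case_law", "other"]

def llm_label(text: str) -> str:
    """Ask the hosted LLM to pick exactly one label for a document."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": f"Classify the document into exactly one of: {', '.join(LABELS)}. "
                        "Reply with the label only."},
            {"role": "user", "content": text[:4000]},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Step 2: compare against labels you annotated yourself (in Argilla or any other
# tool) before deciding whether the prompt alone is good enough for production.
validated = [
    ("This agreement is made between the parties...", "contract"),
    ("Section 5 of the Act provides that...", "legislation"),
]
agreement = sum(llm_label(text) == gold for text, gold in validated) / len(validated)
print(f"Agreement with manual labels: {agreement:.0%}")
```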
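And a minimal sketch of step 3, again with placeholder data and hyperparameters: reuse the validated and corrected LLM labels to fine-tune a small classifier with transformers.

```python
# Step 3: distil the validated/corrected LLM labels into a small model.
# Assumes: pip install transformers datasets accelerate; base model and data are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["contract", "legislation", "case_law", "other"]
label2id = {l: i for i, l in enumerate(LABELS)}

# In practice these records come from the validation pipeline (corrected LLM annotations).
records = [
    {"text": "This agreement is made between the parties...", "label": label2id["contract"]},
    {"text": "Section 5 of the Act provides that...", "label": label2id["legislation"]},
]
dataset = Dataset.from_list(records).train_test_split(test_size=0.5, seed=42)

model_name = "distilroberta-base"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(LABELS),
    id2label={i: l for l, i in label2id.items()}, label2id=label2id,
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="small-legal-classifier",
                           num_train_epochs=3, per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables default padding collation
)
trainer.train()
```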
Can this be trained? · #4 opened 2 months ago by umarbutler
Conversion to tiktoken · 3 · #4 opened 6 months ago by koyfman
Model card looks a bit messed up · 1 · #3 opened 2 months ago by umarbutler