How do I test an LLM for my unique needs? If you work in finance, law, or medicine, generic benchmarks are not enough. This blog post uses Argilla, Distilabel, and 🌤️Lighteval to generate an evaluation dataset and evaluate models on it.
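As a rough illustration of the dataset-generation step, here is a minimal Distilabel pipeline that turns domain documents into candidate exam questions. The model id, prompt wording, and document list are placeholders, not the ones used in the blog post; the generated questions would then be reviewed in Argilla and evaluated with Lighteval.

```python
# Minimal sketch: generate domain-specific eval questions with Distilabel.
# All concrete values (model_id, prompt text, documents) are illustrative assumptions.
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

# Each row carries an instruction built from a snippet of domain text.
documents = [
    {"instruction": "Write one exam question about the following passage:\n<domain text here>"},
]

with Pipeline(name="domain-eval-questions") as pipeline:
    load_docs = LoadDataFromDicts(data=documents)
    generate = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
    )
    load_docs >> generate  # connect the loading step to the generation task

if __name__ == "__main__":
    distiset = pipeline.run()  # returns the generated question dataset
```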
- We found that VLMs can self-improve their reasoning performance through a reflection mechanism, and importantly, this approach scales with test-time compute.
- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
The cleaning process consists of the following steps (sketched in code below):
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts
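Here is a minimal sketch of those cleaning steps using the Hugging Face `datasets` library. The repository id and the assumption that the `messages` column holds JSON-encoded strings are illustrative, not the exact details of the original pipeline.

```python
# Minimal sketch of the cleaning steps above; the repo id and column layout are assumptions.
import json

from datasets import concatenate_datasets, load_dataset

raw = load_dataset("org/raw-dataset")  # hypothetical repository id

# 1. Join the separate splits and record the origin of each row in a "split" column.
joined = concatenate_datasets(
    [ds.add_column("split", [name] * len(ds)) for name, ds in raw.items()]
)

# 2. Convert string-encoded messages into a list of structs ({"role": ..., "content": ...}).
joined = joined.map(lambda row: {"messages": json.loads(row["messages"])})

# 3. Remove empty system prompts from each conversation.
def drop_empty_system(row):
    row["messages"] = [
        m for m in row["messages"]
        if not (m["role"] == "system" and not m["content"].strip())
    ]
    return row

cleaned = joined.map(drop_empty_system)
```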