Noob

noobmldude

AI & ML interests

Explainable AI

Recent Activity

liked a model 15 days ago
Qwen/Qwen2.5-Coder-7B-Instruct
liked a Space about 1 month ago
HuggingFaceFW/blogpost-fine-tasks
upvoted a collection about 2 months ago
Code Evaluation

Organizations

None yet

noobmldude's activity

upvoted an article 2 months ago

FineVideo: behind the scenes

Reacted to merve's post with 👍 2 months ago
NVIDIA just dropped NVEagle 🦅

A super impressive vision-language model that comes in 7B, 13B, and a 13B variant fine-tuned for chat 💬
Model repositories: merve/nveagle-66d0705108582d73bb235c26
Try it: NVEagle/Eagle-X5-13B-Chat 💬 (works very well! 🤯)

This model essentially explores using a mixture of experts (MoE) for the image-encoder part of a vision-language model.
How? 🧐
The authors concatenate the output tokens of the vision encoders and apply "pre-alignment": essentially, fine-tuning each expert with a frozen text encoder.

Then they freeze both experts and the decoder and just train the projection layer, and finally, they unfreeze everything for supervised fine-tuning ✨

In the paper, they explore different fusion strategies and vision encoders, extending the basic CLIP encoder, and find that simply concatenating visual tokens works well.
The rest of the architecture is quite similar to LLaVA's.
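The fusion step described above can be sketched in a few lines. All shapes and the `encode` helper below are hypothetical stand-ins for the frozen expert encoders, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(pixels, dim):
    # hypothetical stand-in for a frozen vision expert (e.g. a CLIP-style
    # encoder): returns one token per image patch
    return rng.standard_normal((576, dim))

image = rng.standard_normal((336, 336, 3))  # dummy pixels
clip_tokens = encode(image, 1024)    # base CLIP-style expert
extra_tokens = encode(image, 768)    # a second vision expert

# the fusion the paper lands on: channel-wise concatenation of per-patch tokens
fused = np.concatenate([clip_tokens, extra_tokens], axis=-1)  # (576, 1792)

# trainable projection into the LLM embedding space: the only piece trained
# during the "freeze experts + decoder" stage
W = rng.standard_normal((fused.shape[-1], 4096)) * 0.01
visual_embeds = fused @ W  # (576, 4096), prepended to the text tokens
```

The point of the sketch: because fusion is just concatenation along the channel axis, each expert can be trained (and frozen) independently, and only the small projection matrix needs to change when the expert mix changes.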
Reacted to singhsidhukuldeep's post with 👍 2 months ago
The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained this cutting-edge model:

1️⃣ Architecture:
Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.

2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language model parameters during adapter training
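The cross-attention integration in 3️⃣ can be sketched roughly like this (toy shapes and random weights for illustration; the real model interleaves such layers inside the transformer):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

d = 64
text_h = rng.standard_normal((10, d))   # frozen LM hidden states (queries)
img_h = rng.standard_normal((32, d))    # image-encoder outputs (keys/values)

# adapter weights: the only parameters updated while the LM stays frozen
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.05 for _ in range(3))

q, k, v = text_h @ Wq, img_h @ Wk, img_h @ Wv
attn = softmax(q @ k.T / np.sqrt(d))  # (10, 32): text tokens attend to image tokens
out = text_h + attn @ v               # residual add preserves the text-only path
```

Note the residual connection: if the adapter output is zero (or the input has no image), the hidden states pass through unchanged, which is how text-only capabilities survive adapter training.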

4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward model ranking for high-quality fine-tuning data
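Of the alignment steps in 4️⃣, DPO has a particularly compact objective. A minimal sketch with toy log-probabilities (not Meta's actual recipe or hyperparameters):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probs of the chosen/rejected responses under the policy
    being trained (pi_*) and a frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# toy log-probs: the policy already prefers the chosen response slightly
loss = dpo_loss(pi_chosen=-1.0, pi_rejected=-2.0,
                ref_chosen=-1.2, ref_rejected=-1.8)
```

The loss shrinks as the policy widens the chosen-vs-rejected margin relative to the reference model, so no separate reward model is needed at training time.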

5️⃣ Lightweight Models:
• Used pruning and distillation techniques for 1B and 3B models
• Structured pruning from Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers
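The distillation step in 5️⃣ is typically a KL loss between softened teacher and student distributions. A toy sketch with a 3-token vocabulary (real models use the full vocabulary; the temperature value is an illustrative choice, not Meta's):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

teacher_logits = [2.0, 1.0, 0.1]   # e.g. a large teacher (Llama 3.1 8B/70B)
student_logits = [1.5, 1.2, 0.3]   # e.g. a pruned 1B/3B student

T = 2.0  # temperature softens both distributions
p, q = softmax(teacher_logits, T), softmax(student_logits, T)
kd_loss = float(np.sum(p * (np.log(p) - np.log(q))))  # forward KL(p || q)
```

Minimizing this KL pushes the student's full output distribution toward the teacher's, which carries more signal per token than the hard next-token label alone.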

6️⃣ Context Length:
All models support an impressive 128K token context length.

7️⃣ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B parameters to powerful 90B parameter versions, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
upvoted 2 articles 4 months ago

XetHub is joining Hugging Face!


🪆 Introduction to Matryoshka Embedding Models

liked a Space 4 months ago
upvoted an article 5 months ago

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Reacted to merve's post with ❤️ 7 months ago
Just landed on the Hugging Face Hub: a community-led computer vision course 📖🤍
Learn everything from the fundamentals to the details of bleeding-edge vision transformers!
upvoted an article 7 months ago
Reacted to fdaudens's post with 🔥 7 months ago
5 interesting news stories today:

An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary
- "'I think we might just have to say goodbye to finding out about the truth in a quick way,' says Sandra Wachter, a professor at the Oxford Internet Institute."
- "Synthesia uses both large language models and diffusion models to do this. Sees itself as a platform for businesses. Its bet is this: As people spend more time watching videos on YouTube and TikTok, there will be more demand for video content."
- "Synthesia’s policy is to not create avatars of people without their explicit consent. But it hasn’t been immune from abuse."
https://www.technologyreview.com/2024/04/25/1091772/new-generative-ai-avatar-deepfake-synthesia/

WIRED found thousands of ads running on Meta's social platforms promoting sexually explicit "AI girlfriend" apps.
- "Some human sex workers say the platform unfairly polices their own posts more harshly."
- "Many of the virtual women seen in ads reviewed by WIRED are lifelike—if somewhat uncanny—young, and stereotypically pornographic."
https://www.wired.com/story/ads-for-explicit-ai-girlfriends-swarming-facebook-and-instagram/

Wall Street’s Patience for a Costly A.I. Arms Race Is Waning
- "A sell-off in Meta’s stock after the company disclosed huge investments in the technology may be a sign of investor fears about tech giants’ spending."
- "The company plans to spend $35 billion to $40 billion this year — much of that on the technology."
https://www.nytimes.com/2024/04/25/business/dealbook/meta-artificial-intelligence-spending.html

Saudi Arabia Spends Big to Become an A.I. Superpower
https://www.nytimes.com/2024/04/25/technology/to-the-future-saudi-arabia-spends-big-to-become-an-ai-superpower.html

UK competition watchdog steps up scrutiny of big tech’s role in AI startups
https://www.theguardian.com/technology/2024/apr/24/uk-competition-watchdog-steps-up-scrutiny-of-big-techs-role-in-ai-startups-cma-microsoft-amazon
updated a collection 9 months ago