Thomas Wolf PRO

thomwolf

AI & ML interests

NLP and open-source :-)

thomwolf's activity

reacted to merve's post with ❤️ 15 days ago
Lotus 🪷 is a new foundation model for monocular depth estimation ✨
Compared to previous diffusion-based MDE models, Lotus is modified for dense prediction tasks
The authors also released a model for normal prediction 🤗
Find everything in this collection: merve/lotus-6718fb957dc1c85a47ca1210
reacted to singhsidhukuldeep's post with ❤️ 15 days ago
If you have ~300+ GB of VRAM, you can run Mochi from @genmo

A SOTA model that dramatically closes the gap between closed and open video generation models.

Mochi 1 introduces a revolutionary architecture featuring joint reasoning over 44,520 video tokens with full 3D attention. The model implements extended learnable rotary positional embeddings (RoPE) in three dimensions, with network-learned mixing frequencies for the space and time axes.
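For intuition, here is a minimal numpy sketch of rotary embeddings extended to three axes: the head dimension is split into three groups, and each group is rotated by its own (time, height, width) position. The even three-way split and single frequency base are illustrative assumptions, not Genmo's actual parameterization:

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by angles pos * freq."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, t, h, w):
    """Split the head dimension into three equal groups and rotate each
    group by its own axis position (time, height, width)."""
    d = x.shape[-1]
    assert d % 6 == 0, "need three even-sized groups"
    g = d // 3
    parts = [rope_1d(x[..., i * g:(i + 1) * g], p)
             for i, p in enumerate((t, h, w))]
    return np.concatenate(parts, axis=-1)
```

Because each pair is a pure rotation, the embedding preserves vector norms, and position (0, 0, 0) is the identity.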

The model incorporates cutting-edge improvements, including:
- SwiGLU feedforward layers
- Query-key normalization for enhanced stability
- Sandwich normalization for controlled internal activations
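Two of these components are easy to sketch in isolation. Below is a minimal numpy illustration of a SwiGLU feedforward and query-key RMS normalization; the shapes and the RMS variant of the norm are assumptions for illustration, not Mochi's exact implementation:

```python
import numpy as np

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feedforward: SiLU(x @ w_gate) gates (x @ w_up), then project down."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))   # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ w_up)) @ w_down

def qk_norm(q, k, eps=1e-6):
    """Scale queries and keys to unit RMS before computing attention logits,
    bounding their magnitude and stabilizing training."""
    q = q / (np.sqrt(np.mean(q ** 2, axis=-1, keepdims=True)) + eps)
    k = k / (np.sqrt(np.mean(k ** 2, axis=-1, keepdims=True)) + eps)
    return q, k
```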

What is currently available?
The base model delivers impressive 480p video generation with exceptional motion quality and prompt adherence. Released under the Apache 2.0 license, it's freely available for both personal and commercial applications.

What's Coming?
Genmo has announced Mochi 1 HD, scheduled for release later this year, which will feature:
- Enhanced 720p resolution
- Improved motion fidelity
- Better handling of complex scene warping
reacted to fdaudens's post with ❤️ 15 days ago
posted an update 15 days ago
Parents in the 1990s: Teach the kids to code
Parents now: Teach the kids to fix the code when it starts walking around 🤖✨
reacted to singhsidhukuldeep's post with 🔥 about 2 months ago
Remember when @Google launched MediaPipe in an effort to create efficient on-device pipelines?

They've just unlocked the ability to run 7B+ parameter language models directly in your browser. This is a game-changer for on-device AI!

Yes, they are streaming 8.6 GB model files!

Currently, they have Gemma 2B/7B running, but imagine Dynamic LoRA, multimodal support, quantization, and never leaving Chrome!

This is a significant technical advancement, especially in Memory Optimization:

- Redesigned the model-loading code to work around WebAssembly's 4 GB memory limit.
- Implemented asynchronous loading of transformer stack layers (28 for Gemma 1.1 7B).
- Reduced peak WebAssembly memory usage to less than 1% of previous requirements.
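The memory win from loading layers one at a time can be illustrated with a toy accounting model (the sizes and MB bookkeeping are hypothetical; the real code streams WebAssembly buffers into WebGPU):

```python
def peak_memory_all_at_once(layer_sizes_mb):
    """Naive loading: every layer buffer is resident simultaneously."""
    return sum(layer_sizes_mb)

def peak_memory_streaming(layer_sizes_mb):
    """Streamed loading: fetch one layer, hand it off to the GPU, free the
    buffer, then fetch the next -- peak CPU-side usage is a single layer."""
    return max(layer_sizes_mb)

# Hypothetical: ~8.6 GB of weights spread over 28 transformer layers.
layers_mb = [8600 // 28 + 1] * 28
```

Under this accounting, peak usage drops from the whole model to one layer, which is how the loader stays well under WebAssembly's 4 GB limit.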

Cross-Platform Compatibility
- Compiled the C++ codebase to WebAssembly for broad browser support.
- Utilized the WebGPU API for native GPU acceleration in browsers.

Here's why this matters:

1. Privacy: No need to send data to remote servers.
2. Cost-Efficiency: Eliminates server expenses.
3. Offline Capabilities: Use powerful AI without an internet connection.

Blog: https://research.google/blog/unlocking-7b-language-models-in-your-browser-a-deep-dive-with-google-ai-edges-mediapipe/
reacted to alex-abb's post with 👍🔥 5 months ago
Hi everyone!
I'm Alex, I'm 16, and I've been doing an internship at Hugging Face for a little over a week. I've already learned a lot about using and prompting LLMs. With @victor as my tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.

alex-abb/LLM_Feeling_Analyzer
reacted to fdaudens's post with ❤️ 5 months ago
A nice improvement for Hugging Face on Sheets: You can now customize your prompt and select the model of your choice directly on the sheet.

Thanks to @louisbrulenaudet for the contribution. Really cool to see the community improving this tool!

Try it here: JournalistsonHF/huggingface-on-sheets
reacted to yunusserhat's post with 🚀 5 months ago
Hello everyone,

I am pleased to announce that I have founded the University of Glasgow organization on Hugging Face. If you are affiliated with the University of Glasgow, or have a relative who is, you can join through the relevant link.

https://huggingface.co/UniversityofGlasgow
reacted to KnutJaegersberg's post with 👍 5 months ago
reacted to frimelle's post with ❤️🤗 5 months ago
Wikimedia and Hugging Face seem naturally complementary: both are community-centred and value openness and consent. That's why I'd love to see more Wikipedia and other Wikimedia projects' datasets on Hugging Face, to advance machine learning with diverse, community-curated data! See my new article on the Hugging Face Hub for why and how to create more Wikimedia datasets on Hugging Face: https://huggingface.co/blog/frimelle/wikipedias-treasure-trove-ml-data
reacted to Salama1429's post with 😎❤️🚀🔥 5 months ago
📺 Introducing the YouTube-Commons Dataset 📺

🌐 Overview: The YouTube Commons Dataset is a comprehensive collection of 30 billion words from 15,112,121 original and automatically translated transcripts, drawn from 2,063,066 videos on YouTube.

🔗 License: All videos are shared under the CC-BY license, with the majority (71%) in English.

🤖 Applications: This dataset is ideal for training speech-to-text (ASR) and translation models.
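As a toy illustration of carving out an ASR training subset, one might filter transcript records by license and language (the field names here are assumptions, not the dataset's actual schema):

```python
def english_ccby(records):
    """Select transcripts usable for English ASR training: CC-BY licensed
    and tagged as English (field names are illustrative)."""
    return [r for r in records
            if r["license"] == "CC-BY" and r["language"] == "en"]

rows = [
    {"id": 1, "license": "CC-BY", "language": "en"},
    {"id": 2, "license": "CC-BY", "language": "fr"},
    {"id": 3, "license": "other", "language": "en"},
]
```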

📊 Utilization: The text can be used for model training and is republishable for reproducibility purposes.

🤝 Collaboration: This dataset is the result of a collaboration between the French state start-up LANGU:IA, the French Ministry of Culture, and DINUM. It will be expanded in the coming months.

🔗 Explore the dataset here: https://lnkd.in/d_paWKFE

#YouTubeCommons #AIResearch #MachineLearning #OpenData #ArtificialIntelligence #NLP #Dataset #TechCollaboration #Innovation #DigitalTransformation
posted an update 5 months ago
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings that went into our recent 15-trillion-token 🍷FineWeb release

Inspired by the distill.pub interactive-graphics papers, we set out to write the most extensive, enjoyable, and in-depth tech report we could, so prepare for a 45-min read with interactive graphics and all.

And that's not all: in this article we also introduce 📚FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA
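Conceptually, building a subset like FineWeb-Edu is score-and-threshold filtering: a classifier rates each page's educational value and only high scorers are kept. Here is a toy sketch with a stand-in keyword scorer (the real pipeline uses a learned classifier, not keyword counting):

```python
EDU_WORDS = {"theorem", "lesson", "definition", "example", "exercise"}

def toy_score(text):
    """Stand-in classifier: fraction of 'teaching' vocabulary, scaled to 0-5."""
    words = text.lower().split()
    return 5.0 * sum(w in EDU_WORDS for w in words) / max(len(words), 1)

def filter_educational(pages, score_fn, threshold=3.0):
    """Keep only pages whose educational score passes the threshold."""
    return [p for p in pages if score_fn(p) >= threshold]
```

Swapping `toy_score` for a real classifier's score leaves the filtering logic unchanged.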

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some of the general assumptions about web data (not saying more, I'll let you draw your own conclusions ;))

HuggingFaceFW/blogpost-fineweb-v1
reacted to zolicsaki's post with 👀 6 months ago
SambaNova just released a revolutionary paper about how the SN40L AI chip can host many LLMs on a single node and run inference efficiently enough to enable a "composition of experts." These experts can be interconnected via a router, resulting in remarkable accuracy. This method allows you to take open-source expert models from Hugging Face and continuously build and integrate them into a composition of experts.
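The composition-of-experts idea is essentially dispatch: a router inspects the query and forwards it to one of several expert models. A minimal sketch with stand-in experts (the expert names and routing rule are hypothetical):

```python
def route(query, experts, router):
    """Composition of experts: the router names an expert; that expert answers."""
    return experts[router(query)](query)

# Stand-in experts and a trivial keyword router (both hypothetical).
experts = {
    "code": lambda q: "code-expert answer",
    "general": lambda q: "general-expert answer",
}
router = lambda q: "code" if "def " in q else "general"
```

In the SN40L setting, the `experts` would be full LLMs resident on one node, with the router choosing which to run per request.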

I am also super excited about the possibilities that SN40Ls unlock for LLM agent workflows and pipelined calls. With the release of GPT4o, it seems that monolithic LLMs are starting to reach a plateau, and I believe that the next wave of AI will be driven by pipelined LLM calls and agent workflows. Most pipelined LLM workflows are bottlenecked by prohibitively expensive compute and high latency, but the SN40L provides a one-stop-shop solution for this. We need to get the word out to the community that this hardware exists, because it will open up a realm of possibilities that developers working with Nvidia hardware did not know existed.

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts (2405.07518)
reacted to dhruvabansal's post with 🤯👍 6 months ago
🚀 Introducing RefuelLLM-2 and RefuelLLM-2-small, the next version of our large language models purpose-built for data labeling, enrichment, and cleaning.

RefuelLLM-2 (83.82%) outperforms all state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%) and Gemini-1.5-Pro (74.59%), across a benchmark of ~30 data labeling tasks.

RefuelLLM-2-small (79.67%), aka Llama-3-Refueled, outperforms all comparable LLMs including Claude3-Sonnet (70.99%), Haiku (69.23%) and GPT-3.5-Turbo (68.13%).

📖 Open-sourcing the model weights: refuelai/Llama-3-Refueled
📝 Detailed blog post: https://www.refuel.ai/blog-posts/announcing-refuel-llm-2
🧪 Try out the model here: https://labs.refuel.ai/playground