Prithiv Sakthi

prithivMLmods

AI & ML interests

multi-modal, computer-vision, adapters & natural language understanding

prithivMLmods's activity

Reacted to dylanebert's post with 🚀 about 3 hours ago
Generate meshes with AI locally in Blender

📒 New open-source release

meshgen, a local Blender integration of LLaMA-Mesh, is open source and available now 🤗

get started here: https://github.com/huggingface/meshgen
Reacted to andito's post with 🔥 about 11 hours ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Reacted to merve's post with 🚀 1 day ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
Reacted to vilarin's post with 🔥 1 day ago
A few days ago, Black Forest Labs released FLUX.1 Tools, which surprised everyone with their quality and effects. Now that diffusers supports these features, you can easily deploy and build your own tools.
Combined with the powerful Gradio and ZeroGPU, you can experience the Tools immediately, which is truly wonderful.
I was impressed by the Flux.1 Fill dev, so here I've built a demo for it, making it easy to use for inpainting and outpainting images.

πŸ„Model: black-forest-labs/FLUX.1-Fill-dev
πŸ¦–Demo: vilarin/Flux.1-Fill-dev
πŸ‘diffusers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/flux
Reacted to davidberenstein1957's post with 🔥 1 day ago
Let's make a generation of amazing image-generation models

The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source, so the community cannot train open models on them. Let's change that!

The community can contribute image preferences to an open-source dataset that could be used for building text-to-image models, like the Flux or Stable Diffusion families. The dataset will be open source so everyone can use it to train models that we can all use.

Blog: https://huggingface.co/blog/burtenshaw/image-preferences
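Best-of-two choices like these form a pairwise preference dataset. As a hypothetical sketch (the record shape and model names below are illustrative assumptions, not the actual dataset schema), such votes could be aggregated into per-model win rates:

```python
# Aggregate pairwise image preferences into win rates per model.
# The record shape here is an assumption for illustration only.
from collections import Counter

def win_rates(preferences):
    wins, games = Counter(), Counter()
    for p in preferences:
        wins[p[p["chosen"]]] += 1      # the image the annotator picked
        for model in (p["a"], p["b"]):
            games[model] += 1          # both sides appeared in this round
    return {m: wins[m] / games[m] for m in games}

prefs = [
    {"a": "flux", "b": "sd3", "chosen": "a"},
    {"a": "flux", "b": "sd3", "chosen": "a"},
    {"a": "sd3", "b": "flux", "chosen": "a"},
]
rates = win_rates(prefs)  # flux wins 2 of 3, sd3 wins 1 of 3
```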
Reacted to maxiw's post with 🤗 1 day ago
You can now try out computer-use models from the Hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open the browser and run a search
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each step can ground its UI target with a different hub model
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are integrated via the Gradio Spaces API. Local inference support is also planned!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
posted an update 2 days ago
HF Posts Receipts 🏆🚀

[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT

🥁 The one thing that needs to be remembered is the 'username'.

🥁 And yeah, thank you, @maxiw , for creating the awesome dataset and sharing it here! 🙌

🥁 [ Dataset ] : maxiw/hf-posts

.
.
.
@prithivMLmods
Reacted to Dref360's post with 🤝 2 days ago
New week, new #cv Gradio app for human understanding (Dref360/human-interaction-demo) 🥳

This demo highlights when a person touches an object. For instance, it is useful to know if someone is touching a wall, a vase or a door. It works for multiple people too!

Still using nielsr/vitpose-base-simple for pose estimation; very excited to see the PR approved!


Reacted to maxiw's post with 🔥 2 days ago
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localization of GUI text and elements. It runs with ~150 ms inference time on an RTX 4080. This means you can now start building fast on-device computer-use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
Reacted to victor's post with 🔥 3 days ago
A perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane:

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command, which can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore. Let's go open weights 🚀.
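The risky step in a design like this is executing whatever command the model emits. A hedged sketch (the parsing rules here are my own assumptions, not the Space's actual code) of extracting an FFmpeg command from a model reply and sanity-checking it before execution:

```python
import re
import shlex

def extract_ffmpeg_command(reply: str):
    """Return the first plausible ffmpeg invocation in `reply`
    as an argv list, or None if nothing safe is found."""
    # Look inside fenced code blocks first, then bare lines.
    fenced = re.findall(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
    for chunk in fenced + reply.splitlines():
        for line in chunk.splitlines():
            line = line.strip()
            if not line.startswith("ffmpeg "):
                continue
            # Reject shell metacharacters so the argv list can be
            # passed to subprocess.run without shell=True.
            if any(c in line for c in ";|&`$"):
                continue
            args = shlex.split(line)
            if len(args) > 2:
                return args
    return None

reply = "Here you go:\n```sh\nffmpeg -i cat.mp4 -vf scale=640:-1 out.mp4\n```"
cmd = extract_ffmpeg_command(reply)  # argv list ending in "out.mp4"
```

Returning an argv list rather than a raw string keeps the eventual execution away from the shell entirely.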
Reacted to akhaliq's post with ❤️ 3 days ago
anychat

Supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app.

Try it out here: akhaliq/anychat
posted an update 3 days ago
CRISP 🔥 [ Isometric-3D-Cinematography / Isometric-3D-Obj / 3D-Kawaii / Long Toons ]

[ Flux DLC ] : prithivMLmods/FLUX-LoRA-DLC

[ Stranger Zone ] : https://huggingface.co/strangerzonehf

🎃[ Isometric 3D Cinematography ] : strangerzonehf/Flux-Isometric-3D-Cinematography
🎃[ Isometric 3D ] : strangerzonehf/Flux-Isometric-3D-LoRA
🎃[ Cute 3D Kawaii ] : strangerzonehf/Flux-Cute-3D-Kawaii-LoRA
🌚[ Long Toon 3D ] : prithivMLmods/Flux-Long-Toon-LoRA

[ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-6737118adcf2cb40d66d0c7e

[ Flux Collection ] : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

[ Flux Mix ] : prithivMLmods/Midjourney-Flux

.
.
.
@prithivMLmods
Reacted to hexgrad's post with 🔥 5 days ago
hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥

Read more and listen to before/after audio samples at https://hf.co/blog/hexgrad/kokoro-short-burst-upgrade

(Probably would have made that Article a Post instead, if audio could be embedded into Posts.)
Reacted to AdinaY's post with 🔥 5 days ago
posted an update 5 days ago
Weekend Dribble πŸ“¦πŸΊ

Adapters for Product Ad Backdrops, Smooth Polaroids, Minimalist Sketch cards, Super Blends!!

🤏Demo on: prithivMLmods/FLUX-LoRA-DLC

Stranger Zones :
πŸ‘‰πŸΌ{ Super Blend } : strangerzonehf/Flux-Super-Blend-LoRA

πŸ‘‰πŸΌ{ Product Concept Ad } : prithivMLmods/Flux-Product-Ad-Backdrop
πŸ‘‰πŸΌ{ Frosted Mock-ups } : prithivMLmods/Flux.1-Dev-Frosted-Container-LoRA
πŸ‘‰πŸΌ{ Polaroid Plus } : prithivMLmods/Flux-Polaroid-Plus
πŸ‘‰πŸΌ{Sketch Cards} : prithivMLmods/Flux.1-Dev-Sketch-Card-LoRA

πŸ‘‰Stranger Zone: https://huggingface.co/strangerzonehf

πŸ‘‰Flux LoRA Collections: prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

.
.
.
@prithivMLmods 🤗
Reacted to lin-tan's post with 🤗 5 days ago
Can language models replace developers? #RepoCod says "Not Yet", because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues
- Come with 2.6x more tests per task (313.5 vs. SWE-Bench's 120.8)

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
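For context, the pass@1 numbers quoted above follow the standard unbiased pass@k estimator used in HumanEval-style evaluation; with one sample per task it reduces to the fraction of tasks whose single generation passes the tests. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 generations per task, 3 passing -> pass@1 = 0.3
rate = pass_at_k(n=10, c=3, k=1)
```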
Reacted to merve's post with ❤️ 6 days ago
your hugging face profile now has your recent activities πŸ€—
Reacted to victor's post with 🚀 6 days ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
replied to their post 6 days ago
Reacted to elliesleightholm's post with 🤗 7 days ago