Prithiv Sakthi

prithivMLmods

AI & ML interests

multi-modal, computer-vision, adapters & natural language understanding

prithivMLmods's activity

Reacted to dylanebert's post with 🚀 about 3 hours ago
Generate meshes with AI locally in Blender

📒 New open-source release

meshgen, a local Blender integration of LLaMA-Mesh, is open source and available now 🤗

get started here: https://github.com/huggingface/meshgen
Reacted to andito's post with 🔥 about 11 hours ago
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on a Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Reacted to merve's post with 🚀 1 day ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
Reacted to vilarin's post with 🔥 1 day ago
A few days ago, Black Forest Labs released FLUX.1 Tools, which surprised everyone with their quality and effects. Now that diffusers supports these features, you can easily deploy and build your own tools.
Combined with the powerful Gradio and ZeroGPU, you can experience the Tools immediately, which is truly wonderful.
I was impressed by the Flux.1 Fill dev, so here I've built a demo for it, making it easy to use for inpainting and outpainting images.

πŸ„Model: black-forest-labs/FLUX.1-Fill-dev
πŸ¦–Demo: vilarin/Flux.1-Fill-dev
πŸ‘diffusers: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/flux
Reacted to davidberenstein1957's post with 🔥 1 day ago
Let's make a generation of amazing image-generation models

The best image generation models are trained on human preference datasets, where annotators have selected the best image from a choice of two. Unfortunately, many of these datasets are closed source, so the community cannot train open models on them. Let's change that!

The community can contribute image preferences to an open-source dataset that could be used for building text-to-image models, like the Flux or Stable Diffusion families. The dataset will be open source so everyone can use it to train models that we can all use.

Blog: https://huggingface.co/blog/burtenshaw/image-preferences
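Best-of-two choices like these form a pairwise preference dataset. As a hypothetical sketch (the record shape and model names below are illustrative assumptions, not the actual dataset schema), such votes could be aggregated into per-model win rates:

```python
# Aggregate pairwise image preferences into win rates per model.
# The record shape here is an assumption for illustration only.
from collections import Counter

def win_rates(preferences):
    wins, games = Counter(), Counter()
    for p in preferences:
        wins[p[p["chosen"]]] += 1      # the image the annotator picked
        for model in (p["a"], p["b"]):
            games[model] += 1          # both sides appeared in this round
    return {m: wins[m] / games[m] for m in games}

prefs = [
    {"a": "flux", "b": "sd3", "chosen": "a"},
    {"a": "flux", "b": "sd3", "chosen": "a"},
    {"a": "sd3", "b": "flux", "chosen": "a"},
]
rates = win_rates(prefs)  # flux wins 2 of 3, sd3 wins 1 of 3
```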
Reacted to maxiw's post with 🤗 1 day ago
You can now try out computer-use models from the Hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open the browser and run a search
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each step can ground its UI target with a different hub model
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are integrated via the Gradio Spaces API. Local inference support is also planned!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
posted an update 2 days ago
HF Posts Receipts 🏆🚀

[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT

🥁 The one thing that needs to be remembered is the 'username'.

🥁 And yeah, thank you, @maxiw , for creating the awesome dataset and sharing it here! 🙌

🥁 [ Dataset ] : maxiw/hf-posts

.
.
.
@prithivMLmods
Reacted to Dref360's post with 🤝 2 days ago
New week, new #cv Gradio app for human understanding (Dref360/human-interaction-demo) 🥳

This demo highlights when a person touches an object. For instance, it is useful to know if someone is touching a wall, a vase or a door. It works for multiple people too!

Still using nielsr/vitpose-base-simple for pose estimation; very excited to see the PR approved!


Reacted to maxiw's post with 🔥 2 days ago
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localization of GUI text and elements. It runs with ~150 ms inference time on an RTX 4080. This means you can now start building fast on-device computer-use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
Reacted to victor's post with 🔥 3 days ago
A perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane:

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command, which can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore. Let's go open weights 🚀.
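The risky step in a design like this is executing whatever command the model emits. A hedged sketch (the parsing rules here are my own assumptions, not the Space's actual code) of extracting an FFmpeg command from a model reply and sanity-checking it before execution:

```python
import re
import shlex

def extract_ffmpeg_command(reply: str):
    """Return the first plausible ffmpeg invocation in `reply`
    as an argv list, or None if nothing safe is found."""
    # Look inside fenced code blocks first, then bare lines.
    fenced = re.findall(r"```(?:\w+)?\n(.*?)```", reply, re.DOTALL)
    for chunk in fenced + reply.splitlines():
        for line in chunk.splitlines():
            line = line.strip()
            if not line.startswith("ffmpeg "):
                continue
            # Reject shell metacharacters so the argv list can be
            # passed to subprocess.run without shell=True.
            if any(c in line for c in ";|&`$"):
                continue
            args = shlex.split(line)
            if len(args) > 2:
                return args
    return None

reply = "Here you go:\n```sh\nffmpeg -i cat.mp4 -vf scale=640:-1 out.mp4\n```"
cmd = extract_ffmpeg_command(reply)  # argv list ending in "out.mp4"
```

Returning an argv list rather than a raw string keeps the eventual execution away from the shell entirely.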
Reacted to akhaliq's post with ❤️ 3 days ago
anychat

Supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app.

Try it out here: akhaliq/anychat
posted an update 3 days ago
CRISP 🔥 [ Isometric-3D-Cinematography / Isometric-3D-Obj / 3D-Kawaii / Long Toons ]

[ Flux DLC ] : prithivMLmods/FLUX-LoRA-DLC

[ Stranger Zone ] : https://huggingface.co/strangerzonehf

🎃[ Isometric 3D Cinematography ] : strangerzonehf/Flux-Isometric-3D-Cinematography
🎃[ Isometric 3D ] : strangerzonehf/Flux-Isometric-3D-LoRA
🎃[ Cute 3D Kawaii ] : strangerzonehf/Flux-Cute-3D-Kawaii-LoRA
🌚[ Long Toon 3D ] : prithivMLmods/Flux-Long-Toon-LoRA

[ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-6737118adcf2cb40d66d0c7e

[ Flux Collection ] : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

[ Flux Mix ] : prithivMLmods/Midjourney-Flux

.
.
.
@prithivMLmods
Reacted to hexgrad's post with 🔥 5 days ago
hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥

Read more and listen to before/after audio samples at https://hf.co/blog/hexgrad/kokoro-short-burst-upgrade

(Probably would have made that Article a Post instead, if audio could be embedded into Posts.)
Reacted to AdinaY's post with 🔥 5 days ago
posted an update 5 days ago
Weekend Dribble πŸ“¦πŸΊ

Adapters for Product Ad Backdrops, Smooth Polaroids, Minimalist Sketch cards, Super Blends!!

🤏Demo on: prithivMLmods/FLUX-LoRA-DLC

Stranger Zones :
πŸ‘‰πŸΌ{ Super Blend } : strangerzonehf/Flux-Super-Blend-LoRA

πŸ‘‰πŸΌ{ Product Concept Ad } : prithivMLmods/Flux-Product-Ad-Backdrop
πŸ‘‰πŸΌ{ Frosted Mock-ups } : prithivMLmods/Flux.1-Dev-Frosted-Container-LoRA
πŸ‘‰πŸΌ{ Polaroid Plus } : prithivMLmods/Flux-Polaroid-Plus
πŸ‘‰πŸΌ{Sketch Cards} : prithivMLmods/Flux.1-Dev-Sketch-Card-LoRA

πŸ‘‰Stranger Zone: https://huggingface.co/strangerzonehf

πŸ‘‰Flux LoRA Collections: prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

.
.
.
@prithivMLmods 🤗
Reacted to lin-tan's post with 🤗 5 days ago
Can language models replace developers? #RepoCod says "Not Yet", because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues
- Come with 2.6x more tests per task (313.5 vs. SWE-Bench's 120.8)

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
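For context, the pass@1 numbers quoted above follow the standard unbiased pass@k estimator used in HumanEval-style evaluation; with one sample per task it reduces to the fraction of tasks whose single generation passes the tests. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 generations per task, 3 passing -> pass@1 = 0.3
rate = pass_at_k(n=10, c=3, k=1)
```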
Reacted to merve's post with ❤️ 6 days ago
your hugging face profile now has your recent activities πŸ€—
Reacted to victor's post with 🚀 6 days ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
replied to their post 6 days ago
Reacted to elliesleightholm's post with 🤗 7 days ago