Nyaribari Reuben

foscraft

AI & ML interests

LLMs, VLMs, Vision

Recent Activity

Reacted to DavidGF's post with 🔥 23 days ago

🎉 Celebrating One Year of #SauerkrautLM with Two Groundbreaking Releases!

We're thrilled to announce the release of SauerkrautLM-v2-14b in two specialized versions: https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-SFT and https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO. Built on the robust Qwen2.5-14B foundation, these models represent a significant leap forward in multilingual AI capabilities.

🔬 Technical Breakthroughs:
💠 Innovative three-phase Fine-Tuning approach
💠 Two-step Spectrum SFT + one-step Spectrum DPO optimization phase for enhanced performance
💠 Balance of German and English language capabilities
💠 Advanced function calling - almost on par with Claude-3.5-Sonnet-20240620

🇩🇪 German Language Excellence:
What sets this release apart is our unique achievement in simultaneously improving both German and English capabilities. Through our specialized training approach with over 1.2B tokens across two phases, we've managed to:
💠 Enhance German language understanding and generation (SFT Version > DPO Version)
💠 Maintain authentic German linguistic nuances
💠 Improve cross-lingual capabilities
💠 Preserve cultural context awareness

📊 Training Innovation:
Our three-phase approach targeted specific layer percentages (15%, 20% and 25%) with carefully curated datasets, including:
💠 Mathematics-focused content (proprietary classifier-selected)
💠 High-quality German training data
💠 Specialized function calling datasets
💠 Premium multilingual content

🎁 Community Contribution:
We're also releasing two new datasets in a few days:
1️⃣ SauerkrautLM-Fermented-GER-DPO: 3,300 high-quality German training samples
2️⃣ SauerkrautLM-Fermented-Irrelevance-GER-DPO: 2,000 specialized samples for optimized function call irrelevance handling

Thank you to our incredible community and partners who have supported us throughout this journey. Here's to another year of AI innovation! 🚀

replied to automatedstockminingorg's post 24 days ago

Hi everyone, I have trained a Qwen 14B model on a smaller dataset, but it's now very tricky because I have nowhere to run inference on it (the paid inference on HF costs quite a lot). Does anyone know of anywhere I can deploy my model and use it via an API for a reasonable cost, or ideally for free? Thanks.


Organizations

foscraft's activity

liked 4 models 3 months ago

liked 2 models 4 months ago

TheBloke/Llama-2-7B-Chat-GGUF

Text Generation • Updated Oct 14, 2023 • 83.1k • 440

AdamCodd/donut-receipts-extract

Image-to-Text • Updated Jun 14 • 74 • 30

liked a Space 4 months ago


🍡

Donut Base Finetuned Kuzushiji

liked a dataset 4 months ago

naver-clova-ix/cord-v1

Viewer • Updated Jul 14, 2022 • 1k • 451 • 11

liked a model 4 months ago

naver-clova-ix/donut-base

Image-to-Text • Updated Aug 13, 2022 • 42.1k • 178

liked a model 5 months ago

microsoft/DialoGPT-medium

Text Generation • Updated Feb 29 • 204k • 327

liked a dataset 7 months ago

HuggingFaceFW/fineweb

Viewer • Updated Jul 16 • 46B • 403k • 1.76k

liked a model about 1 year ago

spacy/en_core_web_trf

Token Classification • Updated Jun 13 • 235 • 46

liked a model over 1 year ago

databricks/dolly-v2-12b

Text Generation • Updated Jun 30, 2023 • 4.41k • 1.95k