What a week! A recap of everything you missed merve/nov-22-releases-673fbbcfc1c97c4f411def07

Multimodal ✨
> Mistral AI released Pixtral Large, a gigantic 124B open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2: 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders
LLMs 🦙
> AllenAI dropped a huge release of models, datasets, and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO, and a new technique they developed called RLVR
> Jina released jina-embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: a synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs
Image Generation 🖼️
> Black Forest Labs released Flux.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations
Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them

$ pip install observers
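A minimal sketch of how the monitoring could look, assuming the library exposes a wrap-style helper around the OpenAI client (the import path and helper name are assumptions based on the announcement, not verified API):

```python
# Hedged sketch: assumes observers ships a wrap_openai() helper and a local default store.
from openai import OpenAI
from observers.observers import wrap_openai  # assumed import path

client = wrap_openai(OpenAI())  # calls made through `client` are now recorded

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Recorded interactions can then be browsed from the configured store.
```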
Apple released AIMv2, a family of state-of-the-art open-set vision encoders apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but adds a decoder and trains with an autoregressive objective 🤯
> 19 open models across 300M, 600M, 1.2B, and 2.7B sizes with resolutions of 224, 336, and 448
> Load and use with 🤗 transformers
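Roughly, loading one of the encoders looks like this (a sketch; the checkpoint id is one example from the collection and remote code is assumed to be required):

```python
# Sketch: extract image features with an AIMv2 encoder via transformers.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "apple/aimv2-large-patch14-224"  # example checkpoint id from the AIMv2 collection
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level image features from the encoder
```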
It's been a while since we shipped native quantization support in diffusers 🧨
We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.
This post is just a reminder of what's possible:
1. Loading a model with a quantization config
2. Saving a model with a quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints
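For a flavor of steps 1-3, here is a sketch using the bitsandbytes backend with a Flux transformer as the example checkpoint (the model id and 4-bit settings are illustrative, not the only option):

```python
# Sketch: 4-bit NF4 quantization of a Flux transformer through diffusers' bitsandbytes backend.
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 1. Load a model with a quantization config
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# 2. Save the quantized weights
transformer.save_pretrained("flux-transformer-nf4")

# 3. Load the pre-quantized checkpoint directly
transformer = FluxTransformer2DModel.from_pretrained(
    "flux-transformer-nf4", torch_dtype=torch.bfloat16
)
```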
For anyone who struggles with NER or information extraction with LLMs:
We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM, and SpanMarker. @argilla
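To give a flavor of the zero-shot suggestion step, a GLiNER model can propose entity spans for an arbitrary label set (a minimal sketch; the checkpoint is one of the public GLiNER models, not necessarily the one used in the workflow):

```python
# Sketch: zero-shot NER suggestions with GLiNER; labels are free-form strings.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")  # example checkpoint

text = "Hugging Face released SmolTalk, a synthetic dataset used to align SmolLM2."
labels = ["organization", "dataset", "model"]

entities = model.predict_entities(text, labels, threshold=0.5)
for ent in entities:
    print(ent["text"], "->", ent["label"])
```

These suggestions can then be logged to Argilla for human review before fine-tuning a dedicated token classifier.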
✨ Unified 3D generation & text understanding.
✨ 3D meshes as plain text for seamless LLM integration.
✨ High-quality 3D outputs rivaling specialized models.
OmniVision-968M: a new local VLM for edge devices, fast & small but performant 💨
a new vision language model with 9x fewer image tokens, super efficient
aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥
Models 💻
Coding: Qwen team released two Qwen2.5-Coder checkpoints, 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models with instruction SFT'd versions and their datasets!
🖼️ Image/Video Gen: Alibaba vision lab released In-context LoRA -- 10 LoRA models on different themes based on Flux. Also, Mochi, the SotA video generation model with an Apache 2.0 license, now comes natively supported in diffusers
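For the diffusers-native Mochi support, usage looks roughly like this (a sketch; the memory-saving flags and generation arguments may need tuning for your hardware):

```python
# Sketch: text-to-video with the Mochi pipeline in diffusers.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Mochi is large; offloading keeps VRAM usage manageable

frames = pipe(
    prompt="A close-up of a hummingbird hovering over a flower, slow motion",
    num_frames=84,
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "mochi.mp4", fps=30)
```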
🖼️ VLMs/Multimodal: NexaAIDev released OmniVision-968M, a new vision language model aligned with DPO for reducing hallucinations; it also comes with GGUF ckpts. Microsoft released LLM2CLIP, a new CLIP-like model with a longer context window, allowing complex text inputs and better search
🎮 AGI?: Etched released Oasis 500M, a diffusion-based open world model that takes keyboard input and outputs gameplay 🤯
Datasets
Common Corpus: a text dataset with 2T tokens under a permissive license for EN/FR, drawn from various sources: code, science, finance, culture