kashif HF staff committed on
Commit 27e944c
1 Parent(s): 7c1915a

Update README.md

Files changed (1)
  1. README.md +22 -22
README.md CHANGED
@@ -1,37 +1,37 @@
  ---
  base_model: HuggingFaceTB/SmolVLM-Instruct
  library_name: peft
+ license: apache-2.0
+ datasets:
+ - HuggingFaceH4/rlaif-v_formatted
+ language:
+ - en
+ pipeline_tag: image-text-to-text
+ tags:
+ - trl
+ - dpo
  ---

- # Model Card for Model ID
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM.png" width="800" height="auto" alt="Image description">

- <!-- Provide a quick summary of what the model is/does. -->
+ # SmolVLM

+ SmolVLM is a compact open multimodal model that accepts arbitrary sequences of image and text inputs to produce text outputs. Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded on multiple images, or function as a pure language model without visual inputs. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks.

+ ## Model Summary

- ## Model Details
+ - **Developed by:** Hugging Face 🤗
+ - **Model type:** Multi-modal model (image+text)
+ - **Language(s) (NLP):** English
+ - **License:** Apache 2.0
+ - **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)

- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
+ ## Resources

+ - **Demo:** [SmolVLM Demo](https://huggingface.co/spaces/HuggingFaceTB/SmolVLM)
+ - **Blog:** [More Information Needed]
+ - **Technical Report:** [More Information Needed]
  - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses