ggcristian
committed on
Create README.md
README.md
ADDED
@@ -0,0 +1,104 @@
---
language:
- en
base_model:
- openai/clip-vit-large-patch14
- microsoft/phi-2
pipeline_tag: image-classification
tags:
- emotion
- visual emotion recognition
- affective computing
- emotional classification
- metric learning
---

# TinyEmo-CLIP-Phi-2

[TinyEmo GitHub repo](https://github.com/ggcr/TinyEmo)

[Metric Projector Card] [TinyEmo MM-LLM Card]

[[Reasoning Pre-training Dataset]](https://huggingface.co/datasets/ggcristian/TinyEmo-Pretrain-525k) [[Reasoning Fine-tuning Dataset]](https://huggingface.co/datasets/ggcristian/TinyEmo-EmoReason-175k) [[Reasoning Claude Dataset]](https://huggingface.co/datasets/ggcristian/TinyEmo-EmoReasonHQ-Claude-1.4k)

TinyEmo is a family of small multi-modal language models for emotional reasoning and classification. Our
approach features: (1) a synthetic emotional instruct dataset for both pre-training and fine-tuning stages, (2) a Metric Projector
that delegates classification from the language model allowing for more efficient training and inference, (3) a multi-modal large
language model (MM-LLM) for emotional reasoning, and (4) a semi-automated framework for bias detection. TinyEmo is able to
perform emotion classification and emotional reasoning, all while using substantially fewer parameters than comparable models.
This efficiency allows us to freely incorporate more diverse emotional datasets, enabling strong performance on classification tasks,
with our smallest model (700M parameters) outperforming larger state-of-the-art models based on general-purpose MM-LLMs
with over 7B parameters. Additionally, the Metric Projector allows for interpretability and indirect bias detection in large models
without additional training, offering an approach to understand and improve AI systems.

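To make the Metric Projector idea more concrete, here is a minimal sketch of classification by metric projection. It is not the TinyEmo implementation: it assumes a generic metric-learning setup in which image features are mapped into the same space as label embeddings and the nearest label (by cosine similarity) is returned. The linear `weights` matrix, the toy vectors, and the function names `project` and `classify` are all hypothetical; the actual projector architecture lives in the GitHub repo.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(features, weights):
    """Apply a linear map; a hypothetical stand-in for a trained projector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def classify(image_features, weights, label_embeddings):
    """Return the label whose embedding is closest to the projected image features."""
    z = project(image_features, weights)
    return max(label_embeddings, key=lambda name: cosine(z, label_embeddings[name]))


# Toy example: 3-d "image features" projected into a 2-d "label space".
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
label_embeddings = {"joy": [1.0, 0.0], "sadness": [0.0, 1.0]}

prediction = classify([0.9, 0.2, 0.5], weights, label_embeddings)
print(prediction)
```

Because the label is chosen by a similarity lookup rather than by text generation, classification in a setup like this does not require running the language model at inference time.
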
## Installation and Requirements

1. Clone this repository and navigate to the root of the project:
```bash
git clone https://github.com/ggcr/TinyEmo.git
cd TinyEmo
```

2. Create an environment and install dependencies:
```bash
conda create -n projector_mps python=3.10 -y
conda activate projector_mps
pip install --upgrade pip  # enable PEP 660 support
pip install -e projector_mps/.
```

## Quickstart

### Metric Projector inference

We provide precomputed CLIP features for the Emotion6 dataset, and you can evaluate them using two methods:

#### Our Projectors from Hugging Face

To evaluate the projectors from Hugging Face, use the [scripts/eval.sh](https://github.com/ggcr/TinyEmo/blob/main/projector_mps/scripts/eval.sh) script:

```bash
conda activate projector_mps
bash projector_mps/scripts/eval.sh
```

Below is a table of the available projectors:

| Model Architecture | Parameters | Zero-shot Accuracy | HuggingFace Link |
|----------------------------------------| ---------- |--------------------|----------------------------------------------------------------------|
| CLIP ViT-L/14 + OpenELM-270M-I | 0.70B | 57.87% | [HF Projector 0.70B Link](https://huggingface.co/ggcristian/TinyEmo-CLIP-OpenELM-270M) |
| CLIP ViT-L/14 + OpenELM-450M-I | 0.88B | 55.24% | [HF Projector 0.88B Link](https://huggingface.co/ggcristian/TinyEmo-CLIP-OpenELM-450M) |
| CLIP ViT-L/14 + TinyLLaMA 1.1 | 1.53B | 56.13% | [HF Projector 1.53B Link](https://huggingface.co/ggcristian/TinyEmo-CLIP-TinyLlama-1_1-Syn) |
| CLIP ViT-L/14 + Microsoft Phi 2 | 3.21B | 56.28% | [HF Projector 3.21B Link](https://huggingface.co/ggcristian/TinyEmo-CLIP-Phi-2) |

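For readers who want to sanity-check a number like those in the accuracy column, the sketch below shows how a top-1 accuracy percentage can be computed from predicted and ground-truth labels. This README does not define the metric, so treating it as plain top-1 accuracy is an assumption; the function name `top1_accuracy` and the toy labels are illustrative and are not taken from the evaluation script.

```python
def top1_accuracy(predicted, gold):
    """Percentage of samples whose predicted label equals the ground-truth label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Toy labels (illustrative only, not real Emotion6 results).
gold = ["joy", "fear", "anger", "sadness"]
predicted = ["joy", "fear", "joy", "sadness"]

accuracy = top1_accuracy(predicted, gold)
print(f"{accuracy:.2f}%")  # prints "75.00%"
```
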
#### Custom Projectors with Local Weights

To use custom local weights or models, run the following:

```bash
conda activate projector_mps
bash projector_mps/scripts/eval_custom.sh
```

This allows you to specify different vision encoders, language models, and loss functions, as well as use your own projector weights.

## Acknowledgement

The Metric Projector builds on the foundations of the [CLIP-E](https://arxiv.org/abs/2310.12062) paper.

Our codebase for the MM-LLM is forked from the [TinyLLaVA](https://github.com/TinyLLaVA/TinyLLaVA_Factory) project.

## Citation

```bibtex
@mastersthesis{gutierrez2024tinyemo,
  title = {TinyEmo: Scaling down Emotional Reasoning via Metric Projection},
  author = {Cristian Gutierrez},
  year = 2024,
  month = {September},
  address = {Barcelona, Spain},
  school = {Universitat Autònoma de Barcelona (UAB)},
  type = {Master's thesis}
}
```