Update Idefics instruct eval table with new merged table
README.md
CHANGED
@@ -288,18 +288,18 @@ Similarly to the base IDEFICS models, we performed checkpoint selection to stop
|
|
288 |
|
289 |
Idefics Instruct Evaluations:
|
290 |
| Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|
291 |
-
|
292 |
-
| 80B
|
293 |
-
|
|
294 |
-
|
|
295 |
-
|
|
296 |
-
|
|
297 |
<br>
|
298 |
-
| 9B
|
299 |
-
|
|
300 |
-
|
|
301 |
-
|
|
302 |
-
|
|
303 |
|
304 |
Fairness Evaluations:
|
305 |
| Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
|
@@ -316,22 +316,6 @@ Fairness Evaluations:
|
|
316 |
| | 16 | 96.1 | 58.9 | 41.7 |
|
317 |
| | 32 | 96.1 | 59.7 | 44.8 |
|
318 |
|
319 |
-
IDEFICS vs IDEFICS-instruct.
|
320 |
-
| Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|
321 |
-
|:----------------------------------------|:--------|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
|
322 |
-
| Difference IDEFICS 80B Instruct vs Base | 0 | -22.7 | -8.2 | 1.9 | -9.8 | 19.7 | 25.4 | 39.5 | 11.7 | 0.4 | -1.7 | 0.5 | 6.8 | 1.2 |
|
323 |
-
| | 4 | 4.0 | 1.7 | 3.5 | -0.7 | - | 6.6 | 4.4 | -6.6 | 0.5 | -0.3 | 1.6 | -1.1 | - |
|
324 |
-
| | 8 | 3.4 | 1.8 | 2.5 | -1.3 | -4.9 | 2.5 | -0.9 | -5.9 | 0.3 | -0.2 | - | 0.8 | - |
|
325 |
-
| | 16 | 3.2 | 1.4 | 2.8 | 0.4 | -4.5 | 4.0 | 0.4 | -4.1 | - | 0.7 | - | 2.4 | - |
|
326 |
-
| | 32 | 2.9 | 1.8 | 2.6 | 1.2 | -3.0 | 6.5 | 1.0 | -2.7 | - | 2.4 | - | 3.2 | - |
|
327 |
-
| Average Difference 80B | | -1.8 | -0.3 | 2.6 | -2.0 | 1.3 | 9.0 | 8.9 | -1.5 | 0.4 | 0.2 | 1.1 | 2.4 | 1.2 |
|
328 |
-
<br>
|
329 |
-
| Difference IDEFICS 9B Instruct vs Base | 0 | 15.0 | 7.6 | 3.3 | 5.6 | 41.7 | 83.0 | 64.3 | 44.6 | 0.5 | 1.8 | 16.4 | 1.0 | 0.8 |
|
330 |
-
| | 4 | 10.8 | 3.3 | 3.4 | 2.1 | 8.2 | 35.1 | 19.6 | 15.0 | 1.0 | 1.1 | 16.4 | -1.8 | - |
|
331 |
-
| | 8 | 10.2 | 3.1 | 3.5 | 1.6 | 6.7 | 31.8 | 14.8 | 13.6 | 0.6 | 0.6 | - | -4.9 | - |
|
332 |
-
| | 16 | 9.8 | 3.3 | 3.7 | 2.3 | 2.7 | 29.1 | 12.2 | 11.4 | - | 0.7 | - | -4.6 | - |
|
333 |
-
| | 32 | 9.0 | 2.7 | 3.7 | 2.2 | 3.6 | 29.8 | 10.5 | 11.9 | - | 1.0 | - | -6.1 | - |
|
334 |
-
| Average Difference 9B | | 10.9 | 4.0 | 3.5 | 2.8 | 12.6 | 41.8 | 24.3 | 19.3 | 0.7 | 1.0 | 16.4 | -3.3 | 0.8 |
|
335 |
|
336 |
# Technical Specifications
|
337 |
|
|
|
288 |
|
289 |
Idefics Instruct Evaluations:
|
290 |
| Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
|
291 |
+
| :--------------------- | --------: | ---------------------: | ---------------------: | -----------------------: | ----------------------: | -------------------: | ---------------: | -----------------: | -----------------: | -----------------: | -------------------------: | -----------------------: | --------------------------: | ----------------------------------: |
|
292 |
+
| IDEFICS 80B Instruct | 0 | 37.4 (-22.7) | 36.9 (-8.2) | 32.9 (1.9) | 26.2 (-9.8) | 76.5 (19.7) | 117.2 (25.4) | 104.5 (39.5) | 65.3 (11.7) | 49.3 (0.4) | 58.9 (-1.7) | 69.5 (0.5) | 67.3 (6.8) | 9.2/20.0/25.0 (1.2/1.2/2.5) |
|
293 |
+
| | 4 | 67.5 (4.0) | 54.0 (1.7) | 37.8 (3.5) | 39.8 (-0.7) | 71.7 (-1.0) | 116.9 (6.6) | 104.0 (4.4) | 67.1 (-6.6) | 48.9 (0.5) | 57.5 (-0.3) | 60.5 (1.6) | 65.5 (-1.1) | - |
|
294 |
+
| | 8 | 68.1 (3.4) | 56.9 (1.8) | 38.2 (2.5) | 44.8 (-1.3) | 72.7 (-4.9) | 116.8 (2.5) | 104.8 (-0.9) | 70.7 (-5.9) | 48.2 (0.3) | 58.0 (-0.2) | - | 68.6 (0.8) | - |
|
295 |
+
| | 16 | 68.6 (3.2) | 58.2 (1.4) | 39.1 (2.8) | 48.7 (0.4) | 77.0 (-4.5) | 120.5 (4.0) | 107.4 (0.4) | 76.0 (-4.1) | - | 56.4 (0.7) | - | 70.1 (2.4) | - |
|
296 |
+
| | 32 | 68.8 (2.9) | 59.5 (1.8) | 39.3 (2.6) | 51.2 (1.2) | 79.7 (-3.0) | 123.2 (6.5) | 108.4 (1.0) | 78.4 (-2.7) | - | 54.9 (2.4) | - | 70.5 (3.2) | - |
|
297 |
<br>
|
298 |
+
| IDEFICS 9B Instruct | 0 | 65.8 (15.0) | 46.1 (7.6) | 29.2 (3.3) | 41.2 (5.6) | 67.1 (41.7) | 129.1 (83.0) | 101.1 (64.3) | 71.9 (44.6) | 49.2 (0.5) | 53.5 (1.8) | 60.6 (16.4) | 62.8 (1.0) | 5.8/20.0/18.0 (0.8/2.2/-2.8) |
|
299 |
+
| | 4 | 66.2 (10.8) | 48.7 (3.3) | 31.0 (3.4) | 39.0 (2.1) | 68.2 (8.2) | 128.2 (35.1) | 100.9 (19.6) | 74.8 (15.0) | 48.9 (1.0) | 51.8 (1.1) | 53.8 (16.4) | 60.6 (-1.8) | - |
|
300 |
+
| | 8 | 66.5 (10.2) | 50.8 (3.1) | 31.0 (3.5) | 41.9 (1.6) | 70.0 (6.7) | 128.8 (31.8) | 101.5 (14.8) | 75.5 (13.6) | 48.2 (0.6) | 51.7 (0.6) | - | 61.3 (-4.9) | - |
|
301 |
+
| | 16 | 66.8 (9.8) | 51.7 (3.3) | 31.6 (3.7) | 44.8 (2.3) | 70.2 (2.7) | 128.8 (29.1) | 101.5 (12.2) | 75.8 (11.4) | - | 51.7 (0.7) | - | 63.3 (-4.6) | - |
|
302 |
+
| | 32 | 66.9 (9.0) | 52.3 (2.7) | 32.0 (3.7) | 46.0 (2.2) | 71.7 (3.6) | 127.8 (29.8) | 101.0 (10.5) | 76.3 (11.9) | - | 50.8 (1.0) | - | 60.9 (-6.1) | - |
|
303 |
|
304 |
Fairness Evaluations:
|
305 |
| Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
|
|
|
316 |
| | 16 | 96.1 | 58.9 | 41.7 |
|
317 |
| | 32 | 96.1 | 59.7 | 44.8 |
|
318 |
|
319 |
|
320 |
# Technical Specifications
|
321 |
|
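
The merged table folds the removed "IDEFICS vs IDEFICS-instruct" difference rows into each score cell as `score (delta)`. A minimal sketch of how such a cell could be split back apart is shown here, assuming the parenthesized value is the instruct score minus the base score (as the removed "Difference ... Instruct vs Base" rows suggest). The helper name `split_cell` and the regular expression are hypothetical, not part of the README, and the Winoground cells with `a/b/c` triples are deliberately not handled.

```python
import re

# Assumed cell shape: "<score> (<delta>)", e.g. "37.4 (-22.7)".
# Assumption: delta = instruct score - base score.
CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def split_cell(cell):
    """Return (instruct_score, delta, implied_base), or None for '-' and other cells."""
    match = CELL.match(cell)
    if match is None:
        return None
    score = float(match.group(1))
    delta = float(match.group(2))
    # The implied base score is rebuilt from rounded values, so it can be
    # off by 0.1 from a separately published base-model number.
    return score, delta, round(score - delta, 1)


if __name__ == "__main__":
    print(split_cell("37.4 (-22.7)"))  # 80B Instruct, VQAv2, 0 shots
    print(split_cell("65.8 (15.0)"))   # 9B Instruct, VQAv2, 0 shots
    print(split_cell(" - "))           # missing entry
```

Cells that do not match the assumed shape return `None` rather than raising, so a whole table row can be mapped over without special-casing the `-` placeholders.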