InterleavedBench (EMNLP'24 Main Conference)

This is the official Hugging Face repository for the paper "Holistic Evaluation for Interleaved Text-and-Image Generation", accepted to the EMNLP 2024 Main Conference.

Paper: https://arxiv.org/abs/2406.14643

Website: https://vt-nlp.github.io/InterleavedEval/

How to use InterleavedBench

Repo hierarchy

  • interleaved_bench.json is the main JSON file of the dataset.
  • zipped_images is the directory of zipped images for each subset. It contains both the context images and the ground-truth images.
  • src/interleavedeval_gpt4o.py is the Python script for running InterleavedEval with GPT-4o. It takes your model's prediction file as input.
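A minimal sketch for loading the dataset file and counting examples per task. It assumes interleaved_bench.json holds a single JSON array of objects shaped like the example entry shown further down in this card. The trailing comma after that entry suggests an array, but this is an assumption. The function names load_bench and count_by_task are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_bench(path):
    """Load interleaved_bench.json (assumed to be a JSON array of examples)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of examples")
    return data


def count_by_task(examples):
    """Count how many examples belong to each task_name."""
    return Counter(ex["task_name"] for ex in examples)
```

For example, `count_by_task(load_bench("interleaved_bench.json")).most_common()` lists the tasks from largest to smallest.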

To get started

  • Unzip the image files under zipped_images.
  • Run inference on interleaved_bench.json with your model to obtain its outputs (both text and images).
  • Run src/interleavedeval_gpt4o.py on your model's prediction file to perform the evaluation.
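The unzip step can be scripted with Python's standard zipfile module. This sketch assumes every .zip archive sits directly under zipped_images, and that extracting them all into one output directory reproduces the relative paths used in the image fields (e.g. wiki_images_test/489157_0_0.png). Check the actual archive layout before relying on this. The function name extract_all and the output directory name are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_all(zip_dir, out_dir):
    """Extract every .zip archive found directly in zip_dir into out_dir.

    Returns the sorted list of archive file names that were extracted.
    """
    zip_dir, out_dir = Path(zip_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # extractall keeps the member paths stored inside each archive
            zf.extractall(out_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_all("zipped_images", "images")` would place the extracted files under a hypothetical images directory.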

Important notes

  • For the image editing and subject-driven generation tasks, the scores on the text-related aspects (text quality and text-image coherence) are set to 0. Skip these scores when you compute the overall performance.
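One way to follow this note is to drop the text-related aspects for the affected tasks before averaging. The sketch below assumes per-example scores have already been collected into dictionaries. The aspect keys in TEXT_ASPECTS and the task names in TEXT_FREE_TASKS are hypothetical placeholders, because the output format of src/interleavedeval_gpt4o.py and the exact task_name strings for these tasks are not given here. Replace them with the real values. It also uses a plain per-aspect mean, which may differ from the aggregation used in the paper.

```python
from statistics import mean

# Hypothetical aspect keys; map these to whatever the evaluation script outputs.
TEXT_ASPECTS = {"text_quality", "text_image_coherence"}

# Hypothetical task names; replace with the actual task_name values for
# image editing and subject-driven generation in interleaved_bench.json.
TEXT_FREE_TASKS = {"image_editing", "subject_driven_generation"}


def usable_scores(task_name, scores):
    """Drop the text-related aspects (fixed at 0) for image-only tasks."""
    if task_name in TEXT_FREE_TASKS:
        return {k: v for k, v in scores.items() if k not in TEXT_ASPECTS}
    return dict(scores)


def aspect_averages(results):
    """Average each aspect over the examples where it applies.

    results: iterable of (task_name, {aspect: score}) pairs.
    """
    per_aspect = {}
    for task_name, scores in results:
        for aspect, value in usable_scores(task_name, scores).items():
            per_aspect.setdefault(aspect, []).append(value)
    return {aspect: mean(values) for aspect, values in per_aspect.items()}
```

Filtering per example, rather than zeroing out a whole column afterwards, keeps the text aspects averaged only over tasks where they were actually scored.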

One example in interleaved_bench.json is as follows:

{
    "id": "wikihow_next_step_0_489157",
    "image": [
        "wiki_images_test/489157_0_0.png",
        "wiki_images_test/489157_0_1.png",
        "wiki_images_test/489157_0_2.png",
        "wiki_images_test/489157_0_3.png",
        "wiki_images_test/489157_0_4.png"
    ],
    "task_name": "wikihow_next_step",
    "conversations": [
        {
            "from": "human",
            "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n  You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n <BEGIN>  Put 2 to 3 bananas in a bowl. <image>\n"
        },
        {
            "from": "gpt",
            "value": "Now put in a liter of kefir and a teaspoon of sugar. <image>\n Put 1 cup milk into the mix. <image>\n Using a blender, blend all ingredients together. <image>\n Relax with your fresh banana smoothie! <image>\n"
        }
    ],
    "goal": "How to Make a Banana Shake",
    "category": [
        "Food and Entertaining",
        "Drinks",
        "Smoothies Shakes and Milk",
        "Fruit Based Shakes"
    ],
    "dataset_id": "wikihow_selected_test_uni"
}
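In this entry, the human turn contains one <image> placeholder and the gpt turn contains four, matching the five paths in image. This suggests the images are listed in order of appearance: context images first, then ground-truth images. The sketch below relies on that reading, an inference from this single example rather than a documented guarantee, to pair each ground-truth step with its image. The function name split_example is hypothetical.

```python
IMAGE_TOKEN = "<image>"


def split_example(example):
    """Split one entry into context images and (step_text, image) targets.

    Assumes the "image" list follows the order in which <image> placeholders
    appear across the turns (context first, then ground truth).
    """
    turns = example["conversations"]
    human = next(t["value"] for t in turns if t["from"] == "human")
    target = next(t["value"] for t in turns if t["from"] == "gpt")

    n_context = human.count(IMAGE_TOKEN)
    context_images = example["image"][:n_context]
    target_images = example["image"][n_context:]

    # Each ground-truth step ends with an <image> placeholder; the piece after
    # the last placeholder is only whitespace, so empty pieces are dropped.
    steps = [s.strip() for s in target.split(IMAGE_TOKEN)]
    steps = [s for s in steps if s]
    if len(steps) != len(target_images):
        raise ValueError("step/image count mismatch")
    return context_images, list(zip(steps, target_images))
```

Applied to the entry above, this yields one context image (wiki_images_test/489157_0_0.png) and four (step, image) pairs. Entries whose steps do not each end with an image would raise the mismatch error and need different handling.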

Reference

If you find our work useful or interesting, please cite:

@article{liu_holistic_2024,
  author       = {Minqian Liu and
                  Zhiyang Xu and
                  Zihao Lin and
                  Trevor Ashby and
                  Joy Rimchala and
                  Jiaxin Zhang and
                  Lifu Huang},
  title        = {Holistic Evaluation for Interleaved Text-and-Image Generation},
  journal      = {CoRR},
  volume       = {abs/2406.14643},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2406.14643},
  doi          = {10.48550/ARXIV.2406.14643},
  eprinttype   = {arXiv},
  eprint       = {2406.14643},
  timestamp    = {Tue, 16 Jul 2024 16:17:50 +0200}
}