John6666's picture
Upload 3 files
c566e3a verified

Documentation for the scripts in the scripts directory, starting with batch-caption.py, which is used to run JoyCaption in bulk. Other scripts might be added in the future.

batch-caption.py

Basic Command

To run the script, use the following command:

./batch-caption.py --glob "path/to/images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone."

This command will caption all the .jpg images in the specified directory using the provided prompt, writing .txt files alongside each image.

Command-Line Arguments

Note: You must specify either --glob or --filelist to provide images, and either --prompt or --prompt-file to provide a prompt for caption generation.

Argument Description Default
--glob Glob pattern to find images N/A
--filelist File containing a list of images N/A
--prompt Prompt to use for caption generation N/A
--prompt-file JSON file containing prompts N/A
--batch-size Batch size for image processing 1
--greedy Use greedy decoding instead of sampling False
--temperature Sampling temperature (used when not using greedy decoding) 0.6
--top-p Top-p sampling value (nucleus sampling) 0.9
--top-k Top-k sampling value None
--max-new-tokens Maximum length of the generated caption (in tokens) 256
--num-workers Number of workers loading images in parallel 4
--model Pre-trained model to use fancyfeast/llama-joycaption-alpha-two-hf-llava

Examples

  1. Caption images with a specific prompt

    ./batch-caption.py --glob "images/*.png" --prompt "Write a descriptive caption for this image in a formal tone."
    
  2. Use a JSON file for prompts

    python batch-caption.py --filelist "image_paths.txt" --prompt-file "prompts.json"
    
  3. Use Greedy Decoding

    python batch-caption.py --glob "images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone." --greedy
    

Prompt Handling

  • For a list of prompts that the model understands, please refer to the project's root README.

  • You can specify a prompt directly using the --prompt argument or use a JSON file containing a list of prompts with weights using --prompt-file.

  • If multiple prompts are specified in the prompt file, the prompt used for each image will be randomly selected.

  • Prompt File Format: The JSON file should contain either strings or objects with prompt and weight fields.

    • Weighting: The weight field indicates the probability of selecting a particular prompt during caption generation. Higher weights make a prompt more likely to be chosen. For example, if one prompt has a weight of 2.0 and another has a weight of 1.0, the first prompt will be twice as likely to be used.

Example prompts.json:

[
  { "prompt": "Describe the scene in detail.", "weight": 2.0 },
  { "prompt": "Summarize the main elements of the image.", "weight": 1.0 }
]

Output

  • Captions are saved as .txt files in the same directory as the corresponding image.
  • If a .txt caption file already exists for an image, the script will skip that image.