Documentation for the scripts in the scripts directory, starting with batch-caption.py, which is used to run JoyCaption in bulk. Other scripts might be added in the future.
batch-caption.py
Basic Command
To run the script, use the following command:
./batch-caption.py --glob "path/to/images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone."
This command will caption all the .jpg images in the specified directory using the provided prompt, writing .txt files alongside each image.
Command-Line Arguments
Note: You must specify either --glob or --filelist to provide images, and either --prompt or --prompt-file to provide a prompt for caption generation.
Argument | Description | Default |
---|---|---|
--glob | Glob pattern to find images | N/A |
--filelist | File containing a list of images | N/A |
--prompt | Prompt to use for caption generation | N/A |
--prompt-file | JSON file containing prompts | N/A |
--batch-size | Batch size for image processing | 1 |
--greedy | Use greedy decoding instead of sampling | False |
--temperature | Sampling temperature (used when not using greedy decoding) | 0.6 |
--top-p | Top-p sampling value (nucleus sampling) | 0.9 |
--top-k | Top-k sampling value | None |
--max-new-tokens | Maximum length of the generated caption (in tokens) | 256 |
--num-workers | Number of workers loading images in parallel | 4 |
--model | Pre-trained model to use | fancyfeast/llama-joycaption-alpha-two-hf-llava |
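The requirement in the note above (one image source and one prompt source) can be sketched with argparse. This is only an illustration of the rule, not the script's actual code: the `parse_args` helper is hypothetical, it covers just four of the flags, and it assumes that supplying both options of a pair is not treated as an error.

```python
import argparse


def parse_args(argv=None):
    """Parse an illustrative subset of the flags and enforce the either/or rule."""
    parser = argparse.ArgumentParser(description="Illustrative subset of the batch-caption.py flags")
    parser.add_argument("--glob", type=str, help="Glob pattern to find images")
    parser.add_argument("--filelist", type=str, help="File containing a list of images")
    parser.add_argument("--prompt", type=str, help="Prompt to use for caption generation")
    parser.add_argument("--prompt-file", type=str, help="JSON file containing prompts")
    args = parser.parse_args(argv)

    # At least one image source is required.
    if args.glob is None and args.filelist is None:
        parser.error("Must specify either --glob or --filelist")
    # At least one prompt source is required.
    if args.prompt is None and args.prompt_file is None:
        parser.error("Must specify either --prompt or --prompt-file")
    return args
```

Calling `parse_args(["--glob", "images/*.jpg"])` without a prompt option exits with an argparse usage error.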
Examples
Caption images with a specific prompt
./batch-caption.py --glob "images/*.png" --prompt "Write a descriptive caption for this image in a formal tone."
Use a JSON file for prompts
python batch-caption.py --filelist "image_paths.txt" --prompt-file "prompts.json"
Use greedy decoding
python batch-caption.py --glob "images/*.jpg" --prompt "Write a descriptive caption for this image in a formal tone." --greedy
Prompt Handling
For a list of prompts that the model understands, please refer to the project's root README.
You can specify a prompt directly using the --prompt argument, or use a JSON file containing a list of prompts with weights using --prompt-file. If multiple prompts are specified in the prompt file, the prompt used for each image will be randomly selected.
- Prompt File Format: The JSON file should contain either strings or objects with prompt and weight fields.
- Weighting: The weight field indicates the probability of selecting a particular prompt during caption generation. Higher weights make a prompt more likely to be chosen. For example, if one prompt has a weight of 2.0 and another has a weight of 1.0, the first prompt will be twice as likely to be used.
Example prompts.json:
[
{ "prompt": "Describe the scene in detail.", "weight": 2.0 },
{ "prompt": "Summarize the main elements of the image.", "weight": 1.0 }
]
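The weighting rule described above can be illustrated with a short Python sketch. It is not the script's own implementation: `load_prompts` and `pick_prompt` are hypothetical helpers, and treating a bare string entry as weight 1.0 is an assumption, since the text above does not say how unweighted strings are handled.

```python
import json
import random


def load_prompts(path):
    """Return parallel lists of prompts and weights read from a prompt file."""
    with open(path, "r", encoding="utf-8") as f:
        entries = json.load(f)

    prompts, weights = [], []
    for entry in entries:
        if isinstance(entry, str):
            # Assumption: a bare string entry gets a weight of 1.0.
            prompts.append(entry)
            weights.append(1.0)
        else:
            prompts.append(entry["prompt"])
            weights.append(float(entry["weight"]))
    return prompts, weights


def pick_prompt(prompts, weights, rng=random):
    """Pick one prompt with probability proportional to its weight."""
    return rng.choices(prompts, weights=weights, k=1)[0]
```

With the example file above, `pick_prompt(*load_prompts("prompts.json"))` would return the first prompt about two times out of three.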
Output
- Captions are saved as .txt files in the same directory as the corresponding image.
- If a .txt caption file already exists for an image, the script will skip that image.
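To preview which images a run would still process, the skip rule above can be approximated as follows. This is a sketch under one assumption: that the caption file shares the image's base name with the extension replaced by .txt (for example, photo.jpg gives photo.txt). Both helper functions are hypothetical and are not part of batch-caption.py.

```python
from pathlib import Path


def caption_path_for(image_path):
    """Return the assumed caption path for an image."""
    # Assumption: same directory, same base name, .txt extension.
    return Path(image_path).with_suffix(".txt")


def images_needing_captions(image_paths):
    """Keep only the images that do not yet have a caption file."""
    return [Path(p) for p in image_paths if not caption_path_for(p).exists()]
```

For example, `images_needing_captions(Path("images").glob("*.jpg"))` lists the .jpg files in an images directory that have no matching .txt file yet.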