Extracts a subset of COCO
The COCO2017 training set contains about 118K images and the validation set about 5K images, which makes it a relatively large dataset. In debugging or quick-verification scenarios, loading the full annotation JSON consumes considerable memory and slows down startup.
The extract_subcoco.py script can extract a specified number of images, optionally restricted to given classes and bbox area sizes. Use the --num-img, --classes, and --area-size parameters to obtain a COCO subset that contains only images meeting those conditions.
For example, the following command extracts a subset:
python tools/misc/extract_subcoco.py \
${ROOT} \
${OUT_DIR} \
--num-img 20 \
--classes cat dog person \
--area-size small
This extracts 20 images and keeps only the annotations that belong to cat, dog, or person and whose bbox area size is small. Images left with no annotations after filtering by class and area size are not selected, so every extracted image is guaranteed to have annotation information.
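The filtering rules described above can be sketched in Python on an in-memory COCO dict. This is a minimal sketch, not the script's actual implementation: the function name filter_coco is hypothetical, and the small/medium/large thresholds (32² and 96² pixels, the conventional COCO split, treated here as half-open ranges) are an assumption about what --area-size uses.

```python
# Assumed area thresholds in pixels^2: the conventional COCO split at
# 32^2 and 96^2. Ranges are half-open [lo, hi) in this sketch.
AREA_RANGES = {
    "small": (0.0, 32.0 ** 2),
    "medium": (32.0 ** 2, 96.0 ** 2),
    "large": (96.0 ** 2, float("inf")),
}


def filter_coco(coco, classes=None, area_size=None):
    """Hypothetical helper: keep annotations matching the class names and
    area size, then drop images left without any annotation."""
    # Category ids whose names were requested (all categories if none given).
    wanted_ids = {
        cat["id"] for cat in coco["categories"]
        if classes is None or cat["name"] in classes
    }
    lo, hi = AREA_RANGES[area_size] if area_size else (0.0, float("inf"))
    anns = [
        ann for ann in coco["annotations"]
        if ann["category_id"] in wanted_ids and lo <= ann["area"] < hi
    ]
    # Only images that still own at least one annotation survive.
    kept_ids = {ann["image_id"] for ann in anns}
    images = [img for img in coco["images"] if img["id"] in kept_ids]
    # Categories are copied unchanged in this sketch.
    return {"images": images, "annotations": anns,
            "categories": list(coco["categories"])}
```

For instance, filter_coco(coco, classes=["cat", "dog", "person"], area_size="small") mirrors the command above. Drawing the --num-img images only from the surviving images is what guarantees that each extracted image carries annotations.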
Currently, only COCO2017 is supported. Support for user-defined datasets in the standard COCO JSON format is planned.
The root folder should be structured as follows:
├── root
│   ├── annotations
│   ├── train2017
│   ├── val2017
│   ├── test2017
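Before running the script, a quick check that ${ROOT} matches the layout above can save a failed run. The helper below is hypothetical and not part of the tool; it simply lists which of the sub-folders shown above are missing. Whether the script actually requires every folder (test2017 in particular) is not stated here, so treat a missing folder as a warning rather than an error.

```python
from pathlib import Path

# Sub-folders shown in the expected root layout.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017", "test2017")


def missing_subdirs(root):
    """Hypothetical helper: return the expected sub-folders absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling missing_subdirs("/path/to/coco") returns an empty list when the layout is complete.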
- Extract 10 training images and 10 validation images, both drawn from the 5K-image validation set only (the default, which loads faster).
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 10
- Extract 20 training images from the training set and 20 validation images from the validation set.
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 20 --use-training-set
- Set the global random seed to 1. By default, no seed is set.
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 20 --use-training-set --seed 1
- Extract images of the specified classes
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --classes cat dog person
- Extract images by the specified bbox area size
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --area-size small
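The split selection and the role of --seed in the options above can be sketched as follows. This is a minimal sketch under stated assumptions: pick_source_splits and sample_image_ids are hypothetical names, and the script may sample differently. In particular, it sets a global seed, whereas this sketch uses a local random.Random instance, which gives the same reproducibility without touching global state.

```python
import random


def pick_source_splits(use_training_set):
    """Hypothetical helper: map each output subset to its COCO2017 source split."""
    # Default: both subsets are drawn from val2017, which is faster to load.
    # With --use-training-set, the training subset comes from train2017.
    return {"train": "train2017" if use_training_set else "val2017",
            "val": "val2017"}


def sample_image_ids(image_ids, num_img, seed=None):
    """Hypothetical helper: draw up to num_img distinct ids.

    A fixed seed makes the draw repeatable; seed=None gives a fresh draw
    each run, matching the "no seed by default" behavior described above.
    """
    rng = random.Random(seed)
    # Sorting first makes the result independent of the input order.
    pool = sorted(image_ids)
    return rng.sample(pool, min(num_img, len(pool)))
```

Running sample_image_ids(ids, 20, seed=1) twice returns the same 20 ids, which is the point of passing --seed 1 when you want to regenerate an identical subset.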