# Dataset preparation and description
## DOTA Dataset
### Download dataset
The DOTA dataset can be downloaded from [DOTA](https://captain-whu.github.io/DOTA/dataset.html)
or [OpenDataLab](https://opendatalab.org.cn/DOTA_V1.0).
We recommend downloading from [OpenDataLab](https://opendatalab.org.cn/DOTA_V1.0), because its archive is already arranged in the required folder structure and can be extracted directly without any rearranging.
Unzip the files and place them in the following structure.
```none
${DATA_ROOT}
├── train
│   ├── images
│   │   ├── P0000.png
│   │   ├── ...
│   ├── labelTxt-v1.0
│   │   ├── labelTxt
│   │   │   ├── P0000.txt
│   │   │   ├── ...
│   │   ├── trainset_reclabelTxt
│   │   │   ├── P0000.txt
│   │   │   ├── ...
├── val
│   ├── images
│   │   ├── P0003.png
│   │   ├── ...
│   ├── labelTxt-v1.0
│   │   ├── labelTxt
│   │   │   ├── P0003.txt
│   │   │   ├── ...
│   │   ├── valset_reclabelTxt
│   │   │   ├── P0003.txt
│   │   │   ├── ...
├── test
│   ├── images
│   │   ├── P0006.png
│   │   ├── ...
```
Folders whose names end with `reclabelTxt` store the labels for the horizontal boxes and are not used when slicing.
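Before slicing, it can help to confirm that the extracted archive matches the layout shown above. The following is a minimal sketch, not part of the repository: the helper name `missing_dota_dirs` and the `EXPECTED_SUBDIRS` tuple are hypothetical, and the paths are copied from the tree for DOTA-v1.0.

```python
from pathlib import Path

# Sub-directories expected under ${DATA_ROOT}, copied from the tree above.
# The reclabelTxt folders are left out because they are not used when slicing.
EXPECTED_SUBDIRS = (
    "train/images",
    "train/labelTxt-v1.0/labelTxt",
    "val/images",
    "val/labelTxt-v1.0/labelTxt",
    "test/images",
)


def missing_dota_dirs(data_root):
    """Return the expected sub-directories that are missing under data_root."""
    root = Path(data_root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

An empty list means every expected folder was found; otherwise the returned entries show which parts of the archive still need to be moved into place.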
### Split DOTA dataset
The script `tools/dataset_converters/dota/dota_split.py` slices the DOTA images and prepares the dataset.
```shell
python tools/dataset_converters/dota/dota_split.py \
[--split-config ${SPLIT_CONFIG}] \
[--data-root ${DATA_ROOT}] \
[--out-dir ${OUT_DIR}] \
[--ann-subdir ${ANN_SUBDIR}] \
[--phase ${DATASET_PHASE}] \
[--nproc ${NPROC}] \
[--save-ext ${SAVE_EXT}] \
[--overwrite]
```
`shapely` is required; install it first with `pip install shapely`.
**Description of all parameters**:
- `--split-config`: The split config for image slicing.
- `--data-root`: Root dir of the DOTA dataset.
- `--out-dir`: Output dir for the split result.
- `--ann-subdir`: The subdir name for annotations. Defaults to `labelTxt-v1.0`.
- `--phase`: Phase(s) of the dataset to be prepared. Defaults to `trainval test`.
- `--nproc`: Number of processes. Defaults to 8.
- `--save-ext`: Extension of the saved images. Defaults to `png`.
- `--overwrite`: Whether to allow overwriting if the annotation folder already exists.
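When the script is driven from Python rather than from a shell, the parameters above can be assembled into an argument list. This is only a sketch: the helper `build_split_command` is hypothetical, and it uses only the flags listed above, leaving out any option that is not set so the script's own defaults apply.

```python
import shlex


def build_split_command(split_config, data_root, out_dir, ann_subdir=None,
                        phases=None, nproc=None, save_ext=None,
                        overwrite=False):
    """Assemble the dota_split.py command line as a list of arguments."""
    cmd = [
        "python", "tools/dataset_converters/dota/dota_split.py",
        "--split-config", split_config,
        "--data-root", data_root,
        "--out-dir", out_dir,
    ]
    if ann_subdir is not None:
        cmd += ["--ann-subdir", ann_subdir]
    if phases:
        # --phase accepts several values, e.g. "train val".
        cmd += ["--phase", *phases]
    if nproc is not None:
        cmd += ["--nproc", str(nproc)]
    if save_ext is not None:
        cmd += ["--save-ext", save_ext]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


# Print the command instead of running it; the paths are placeholders.
print(shlex.join(build_split_command(
    "tools/dataset_converters/dota/split_config/single_scale.json",
    "data/DOTA", "data/split_ss_dota", phases=["train", "val"])))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually run the split.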
Based on the configuration in the DOTA paper, we provide two commonly used split configs.
- `./split_config/single_scale.json` means single-scale split.
- `./split_config/multi_scale.json` means multi-scale split.
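To give an intuition for what image slicing involves, the sketch below computes where overlapping patches start along one image axis with a generic sliding window. It illustrates the general technique only and is not the logic of `dota_split.py`; the function name `window_starts`, the patch size of 1024 and the overlap of 200 are illustrative assumptions rather than values read from the JSON files.

```python
def window_starts(length, size, gap):
    """Start offsets of overlapping windows that cover the range [0, length).

    length: image extent along one axis, in pixels.
    size:   window (patch) extent, in pixels.
    gap:    overlap between neighbouring windows, in pixels.
    """
    if not 0 <= gap < size:
        raise ValueError("gap must satisfy 0 <= gap < size")
    if length <= size:
        # The image is no larger than one window: a single patch suffices.
        return [0]
    step = size - gap
    starts = []
    pos = 0
    while pos + size < length:
        starts.append(pos)
        pos += step
    # Place the last window flush with the image border.
    starts.append(length - size)
    return starts


# Hypothetical 2000-pixel-wide image, 1024-pixel patches, 200-pixel overlap.
print(window_starts(2000, 1024, 200))  # [0, 824, 976]
```

A multi-scale split can be thought of as repeating the same slicing on resized copies of each image, which yields patches showing objects at several scales.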
The DOTA dataset is usually trained on the trainval set and evaluated online on the test set, since most papers
report online evaluation results. If you want to evaluate model performance locally first, split
the train set and val set separately.
Examples:
Split the DOTA trainval and test sets at a single scale:
```shell
python tools/dataset_converters/dota/dota_split.py \
--split-config 'tools/dataset_converters/dota/split_config/single_scale.json' \
--data-root ${DATA_ROOT} \
--out-dir ${OUT_DIR}
```
To split the DOTA-v1.5 dataset, which uses a different annotation directory, `labelTxt-v1.5`:
```shell
python tools/dataset_converters/dota/dota_split.py \
--split-config 'tools/dataset_converters/dota/split_config/single_scale.json' \
--data-root ${DATA_ROOT} \
--out-dir ${OUT_DIR} \
--ann-subdir 'labelTxt-v1.5'
```
To split the DOTA train and val sets separately at a single scale:
```shell
python tools/dataset_converters/dota/dota_split.py \
--split-config 'tools/dataset_converters/dota/split_config/single_scale.json' \
--data-root ${DATA_ROOT} \
--phase train val \
--out-dir ${OUT_DIR}
```
For a multi-scale split:
```shell
python tools/dataset_converters/dota/dota_split.py \
--split-config 'tools/dataset_converters/dota/split_config/multi_scale.json' \
--data-root ${DATA_ROOT} \
--out-dir ${OUT_DIR}
```
The new data structure is as follows:
```none
${OUT_DIR}
├── trainval
│   ├── images
│   │   ├── P0000__1024__0___0.png
│   │   ├── ...
│   ├── annfiles
│   │   ├── P0000__1024__0___0.txt
│   │   ├── ...
├── test
│   ├── images
│   │   ├── P0006__1024__0___0.png
│   │   ├── ...
│   ├── annfiles
│   │   ├── P0006__1024__0___0.txt
│   │   ├── ...
```
Then change `data_root` to `${OUT_DIR}`.
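The patch file names in the output tree, such as `P0000__1024__0___0.png`, appear to follow the pattern `<image id>__<number>__<number>___<number>`. The sketch below splits a name into those fields, which can be useful for grouping patches by source image. Reading the three numbers as patch size, x offset and y offset is an assumption inferred from the example names, and `parse_patch_name` is a hypothetical helper.

```python
import re
from pathlib import Path

# <image id>__<n>__<n>___<n>; the meaning of the three numbers is assumed.
PATCH_NAME = re.compile(r"^(?P<image_id>.+)__(?P<size>\d+)__(?P<x>\d+)___(?P<y>\d+)$")


def parse_patch_name(filename):
    """Split a patch file name into (image_id, size, x, y).

    size, x and y are interpreted as patch size and top-left offset,
    which is an assumption based on the example file names.
    """
    match = PATCH_NAME.match(Path(filename).stem)
    if match is None:
        raise ValueError(f"not a patch file name: {filename!r}")
    return (match["image_id"], int(match["size"]),
            int(match["x"]), int(match["y"]))


print(parse_patch_name("P0000__1024__0___0.png"))  # ('P0000', 1024, 0, 0)
```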