gyrojeff committed on
Commit
911fce3
1 Parent(s): 3daa9d7

doc: add broken detection

  1. README.md +12 -0
README.md CHANGED
@@ -44,6 +44,18 @@ The generated dataset will be saved in the `dataset/font_img` directory.
  Note that `batch_generate_script_cmd_32.bat` and `batch_generate_script_cmd_64.bat` are batch scripts for Windows that can be used to generate the dataset in parallel with 32 partitions and 64 partitions.

+ ### Final Check
+
+ Because the generation task may be terminated unexpectedly or deliberately by the user, the script uses a caching mechanism to avoid regenerating the same image.
+
+ However, the generation script might not detect corruption in the cache (for example, a file left incomplete because the task was terminated while writing it), so we also provide a script that checks the generated dataset and removes corrupted images and labels.
+
+ ```bash
+ python font_ds_detect_broken.py
+ ```
+
+ After running the script, you might want to rerun the generation script to regenerate the files that were removed.
+
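As a rough illustration of the kind of check described above, the sketch below flags images that look truncated and removes them together with their labels. It is not the actual `font_ds_detect_broken.py`: the `.jpg` plus `.bin` file layout, the marker-based truncation heuristic, and the function names `looks_truncated` and `remove_broken_samples` are all assumptions made for this example.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: images are JPEG files. A complete JPEG starts with the SOI
# marker and ends with the EOI marker; a write that was interrupted midway
# usually lacks the end marker.
JPEG_START = b"\xff\xd8"
JPEG_END = b"\xff\xd9"


def looks_truncated(image_path: Path) -> bool:
    """Heuristic: True if the file lacks the JPEG start or end marker."""
    data = image_path.read_bytes()
    return not (data.startswith(JPEG_START) and data.endswith(JPEG_END))


def remove_broken_samples(dataset_dir: Path, label_suffix: str = ".bin") -> list[Path]:
    """Delete truncated or label-less images and return the removed image paths.

    Assumption (hypothetical layout): every image <stem>.jpg has a label
    file <stem><label_suffix> in the same directory.
    """
    removed = []
    for image_path in sorted(dataset_dir.glob("*.jpg")):
        label_path = image_path.with_suffix(label_suffix)
        if looks_truncated(image_path) or not label_path.exists():
            image_path.unlink()
            label_path.unlink(missing_ok=True)  # the label may already be missing
            removed.append(image_path)
    return removed
```

Calling `remove_broken_samples(Path("dataset/font_img"))` would clean the output directory mentioned earlier, after which the generation script can fill in the removed samples. A marker check only catches files that were cut off; fully decoding each image with an imaging library would be a stricter test.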
  ### (Optional) Linux Cluster Generation Walkthrough

  If you would like to run the generation script on Linux clusters, we also provide the environment setup script `linux_venv_setup.sh`.