Update README.md upon finalizing the dataset
README.md CHANGED
@@ -16,7 +16,7 @@ This model represents the first practical implementation under the **Sophon** (S

For more details, refer to the following links: [[Paper]](https://arxiv.org/abs/2405.12972), [[Github]](https://github.com/jet-universe/sophon).

-Try out this [[
+Try out this [[Demo on Colab]](https://colab.research.google.com/github/jet-universe/sophon/blob/main/notebooks/Interacting_with_JetClassII_and_Sophon.ipynb) to get started with the model.


## Model Details
@@ -34,9 +34,11 @@ Key features of the model include:

## Uses and Impact

+### Inferring Sophon model via ONNX
+
The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or integrating this model in C++ workflows to process Delphes files, check [[here]](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc).

-This model also offers insights for the future development of generic and
+This model also offers insights for the future development of generic and foundation AI models for particle physics experiments.


## Training Details
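The new "Inferring Sophon model via ONNX" subsection points to the repository's Python/C++ examples; as a rough orientation only, a minimal `onnxruntime` sketch is shown below. The model file name and the zero-filled dummy inputs are assumptions made for illustration; the actual input names, shapes, and preprocessing are defined by the exported Sophon model and documented in the linked guide.

```python
# Minimal sketch of ONNX inference with onnxruntime (illustration only).
# "sophon.onnx" and the dummy inputs are placeholders; see the Python/C++
# guide linked in the README for the model's real input definition.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("sophon.onnx", providers=["CPUExecutionProvider"])

# Inspect the graph to learn the actual input names, shapes, and dtypes.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Feed zero-filled tensors just to exercise the graph; symbolic (string) dims
# are replaced by 1 here, standing in for a single jet / particle slot.
dummy = {
    inp.name: np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
    for inp in session.get_inputs()
}
outputs = session.run(None, dummy)
print("output shapes:", [o.shape for o in outputs])
```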
@@ -62,13 +64,13 @@ cd sophon

### Download dataset

-Download the JetClass-II dataset from [[
+Download the JetClass-II dataset from [[HuggingFace Dataset]](https://huggingface.co/datasets/jet-universe/jetclass2).
The training and validation files are used in this work, while the test files are not used.

Ensure that all ROOT files are accessible from:

```bash
-./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.
+./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.parquet
```

### Training
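For the download step above, one possible way to fetch the files programmatically is sketched below with `huggingface_hub`. Only the dataset id `jet-universe/jetclass2` and the local directory come from the README; the `allow_patterns` layout is an assumption about how the files are organised inside the dataset repository.

```python
# Sketch: pull the JetClass-II training/validation files from the Hugging Face Hub
# into the directory layout expected by the training script (illustrative only;
# the in-repo file layout assumed by allow_patterns may differ).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="jet-universe/jetclass2",
    repo_type="dataset",
    local_dir="./datasets/JetClassII",
    allow_patterns=["Pythia/Res2P_*", "Pythia/Res34P_*", "Pythia/QCD_*"],
)
```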
@@ -87,7 +89,7 @@ Ensure that all ROOT files are accessible from:

> **Note:** Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:
> - Enable PyTorch's DDP for parallel training, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3` (the learning rate should be scaled according to `DDP_NGPUS`).
-
+> - Configure the number of data loader workers and the number of splits for the entire dataset. The script uses the default configuration `--num-workers 5 --data-split-num 200`, which means there are 5 workers, each responsible for processing 1/5 of the data files and reading the data synchronously; the data assigned to each worker is split into 200 parts, with each worker sequentially reading 1/200 of the total data in order.

**Step 3** (optional): Convert the model to ONNX.

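The two bullets in the note above can be summarised with a short back-of-the-envelope sketch: linear learning-rate scaling with `DDP_NGPUS`, and the default `--num-workers 5 --data-split-num 200` partitioning. The round-robin file assignment and the 5e-4 base learning rate are assumptions used only to make the arithmetic concrete; this is not code from the training script.

```python
# Illustration of the note above (not code from the training script).

def scaled_lr(base_lr: float, ddp_ngpus: int) -> float:
    # Linear LR scaling with the number of DDP processes; e.g. a hypothetical
    # 5e-4 single-GPU start LR becomes 2e-3 with DDP_NGPUS=4, as in the example command.
    return base_lr * ddp_ngpus

def partition(files: list, num_workers: int = 5, data_split_num: int = 200):
    # Each worker takes 1/num_workers of the files (round-robin here, as an
    # assumption), and its share is read sequentially in data_split_num chunks,
    # i.e. roughly 1/data_split_num of the total data per chunk.
    per_worker = [files[w::num_workers] for w in range(num_workers)]
    chunk_fraction = 1.0 / data_split_num
    return per_worker, chunk_fraction

print(scaled_lr(5e-4, 4))                  # -> 0.002
files = [f"QCD_{i}.parquet" for i in range(10)]
shares, frac = partition(files)
print([len(s) for s in shares], frac)      # -> [2, 2, 2, 2, 2] 0.005
```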