colizz committed on Aug 16

Commit 59dc4b7 • 1 Parent(s): 632d7bb

Update README.md upon finalizing the dataset

README.md +7 -5


@@ -16,7 +16,7 @@ This model represents the first practical implementation under the **Sophon** (S
 For more details, refer to the following links: [[Paper]](https://arxiv.org/abs/2405.12972), [[Github]](https://github.com/jet-universe/sophon).
-Try out this [[demo on Colab]](https://colab.research.google.com/github/jet-universe/sophon/blob/main/notebooks/Interacting_with_JetClassII_and_Sophon.ipynb) to get started with the model.
 ## Model Details
@@ -34,9 +34,11 @@ Key features of the model include:
 ## Uses and Impact
 The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or integrating this model in C++ workflows to process Delphes files, check [[here]](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc).
-This model also offers insights for the future development of generic and foundational AI models for particle physics experiments.
 ## Training Details
@@ -62,13 +64,13 @@ cd sophon
 ### Download dataset
-Download the JetClass-II dataset from [[Hugging Face Dataset]]().
 The training and validation files are used in this work, while the test files are not used.
 Ensure that all ROOT files are accessible from:
 ```bash
-./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.root
 ```
 ### Training
@@ -87,7 +89,7 @@ Ensure that all ROOT files are accessible from:
  > **Note:** Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:
  > - Enable PyTorch's DDP for parallel training, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3` (the learning rate should be scaled according to `DDP_NGPUS`).
- > - Configure the number of data loader workers, the fetch step for loading each ROOT file, and the dataset split number to alleviate memory burden. Example command: `./train_sophon.sh train --num-workers 8 --fetch-step 0.02 --data-split-num 4`.
 **Step 3** (optional): Convert the model to ONNX.

 For more details, refer to the following links: [[Paper]](https://arxiv.org/abs/2405.12972), [[Github]](https://github.com/jet-universe/sophon).
+Try out this [[Demo on Colab]](https://colab.research.google.com/github/jet-universe/sophon/blob/main/notebooks/Interacting_with_JetClassII_and_Sophon.ipynb) to get started with the model.
 ## Model Details
 ## Uses and Impact
+### Inferring Sophon model via ONNX
 The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or integrating this model in C++ workflows to process Delphes files, check [[here]](https://github.com/jet-universe/sophon?tab=readme-ov-file#using-sophon-model-pythonc).
+This model also offers insights for the future development of generic and foundation AI models for particle physics experiments.
 ## Training Details
 ### Download dataset
+Download the JetClass-II dataset from [[HuggingFace Dataset]](https://huggingface.co/datasets/jet-universe/jetclass2).
 The training and validation files are used in this work, while the test files are not used.
 Ensure that all ROOT files are accessible from:
 ```bash
+./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.parquet
 ```
 ### Training
  > **Note:** Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:
  > - Enable PyTorch's DDP for parallel training, e.g., `CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3` (the learning rate should be scaled according to `DDP_NGPUS`).
+> - Configure the number of data loader workers and the number of splits for the entire dataset. The script uses the default configuration `--num-workers 5 --data-split-num 200`, which means there are 5 workers, each responsible for processing 1/5 of the data files and reading the data synchronously; the data assigned to each worker is split into 200 parts, with each worker sequentially reading 1/200 of the total data in order.
 **Step 3** (optional): Convert the model to ONNX.
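
The added note on `--num-workers 5 --data-split-num 200` describes a two-level partitioning: files are divided among workers, and each worker then reads its share in sequential parts. The following is a minimal sketch of that idea, not the training script's actual implementation. The helper names `assign_files` and `split_ranges`, the round-robin file assignment, and the example file names are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def assign_files(files: List[str], num_workers: int) -> List[List[str]]:
    """Give each worker roughly 1/num_workers of the files.

    Round-robin assignment is an assumption; the real loader may
    partition the file list differently.
    """
    return [files[w::num_workers] for w in range(num_workers)]


def split_ranges(n_entries: int, data_split_num: int) -> Iterator[Tuple[int, int]]:
    """Yield contiguous (start, stop) ranges that cover n_entries
    in data_split_num sequential parts."""
    for i in range(data_split_num):
        start = i * n_entries // data_split_num
        stop = (i + 1) * n_entries // data_split_num
        yield start, stop


# Hypothetical file names, used only to illustrate the partitioning.
files = [f"QCD_{i:03d}.parquet" for i in range(20)]
per_worker = assign_files(files, num_workers=5)

# Each worker walks through its assigned data one part at a time,
# so only about 1/200 of that data needs to be held in memory at once.
parts = list(split_ranges(n_entries=1000, data_split_num=200))
```

Under this reading, raising `--data-split-num` lowers the peak memory per worker at the cost of more, smaller reads.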