OPI-PG/Qra-1b

@@ -24,7 +24,7 @@ The final distribution of documents by topic is shown in the chart below:
 ## Model details
 The models were trained for one epoch on sequences of 4096 tokens. During training, we used many modern optimizations such as:
-- [torch.compile](pytorch.org/docs/stable/generated/torch.compile.html)
+- [torch.compile](https://pytorch.org/docs/stable/generated/torch.compile.html)
 - [adamw_apex_fused](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#optimizer-choice) optimizer
 - [Flash Attention 2](github.com/Dao-AILab/flash-attention)
 - [Mixed precision](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#bf16) (`--bf16` and `--tf32` options)
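
For readers who want to see how the optimizations listed in this hunk might fit together, here is a minimal sketch. It assumes the Hugging Face `transformers` Trainer was used, which the linked docs and the `--bf16`/`--tf32` options suggest but the model card does not state. The helper `build_training_config` and the placeholders `output_dir="out"` and `model_name` are hypothetical; the actual training script is not shown in the source.

```python
# Illustrative sketch only: the model card does not include the training script.
SEQUENCE_LENGTH = 4096  # tokens per training sequence, as stated in the model card


def build_training_config():
    """Return keyword arguments mirroring the optimizations listed in the model card.

    The keys are assumed to map onto transformers.TrainingArguments fields and
    onto the attn_implementation argument of from_pretrained.
    """
    training_args = {
        "num_train_epochs": 1,        # "trained for one epoch"
        "optim": "adamw_apex_fused",  # fused AdamW; requires NVIDIA Apex
        "bf16": True,                 # bfloat16 mixed precision (--bf16)
        "tf32": True,                 # TF32 matmuls on supported GPUs (--tf32)
        "torch_compile": True,        # wrap the model with torch.compile
    }
    model_kwargs = {
        "attn_implementation": "flash_attention_2",  # needs the flash-attn package
    }
    return training_args, model_kwargs


training_args, model_kwargs = build_training_config()

# With transformers installed and a suitable GPU, these could be passed as:
#   TrainingArguments(output_dir="out", **training_args)
#   AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
```

Keeping the settings in plain dictionaries lets them be inspected without a GPU or any optional packages; whether each option is available depends on the installed hardware and library versions.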