---
datasets:
- Phando/uspto-50k
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- chemistry
license: mit
---

This [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint was fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification. The objective is to predict the reaction class label. The input is either all reactant SMILES or all product SMILES of a reaction (canonicalized, with molecules separated by ".").

- Train/Test split: 0.99/0.01
- Evaluation results:
  - Accuracy: 87.11%
  - Loss: 0.4272
- Fine-tuning hyperparameters:
  - seed = 233
  - batch-size = 128
  - num_epochs = 5 (but early stopped at epoch 4)
  - learning_rate = 5e-4
  - warmup_steps = 64
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
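
To make the learning-rate settings above concrete, here is a minimal sketch of a linear warmup followed by cosine decay, using `learning_rate = 5e-4` and `warmup_steps = 64`. It assumes the schedule has the usual shape of the "cosine" scheduler in Hugging Face `transformers`; the card does not confirm which implementation was used. The function name `cosine_lr_with_warmup` and the default `total_steps=1000` are hypothetical, since the total number of optimizer steps is not stated here.

```python
import math


def cosine_lr_with_warmup(step, base_lr=5e-4, warmup_steps=64, total_steps=1000):
    """Return the learning rate at a given optimizer step.

    base_lr and warmup_steps come from the hyperparameters listed above.
    total_steps is a hypothetical placeholder, not a value from this card.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 32, 64, 532, 1000):
        print(step, f"{cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at `5e-4` right after warmup and decays toward zero at `total_steps`. Because training was early stopped at epoch 4 of 5, the final steps of such a schedule would not have been reached if it was sized for all 5 epochs.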