---
library_name: transformers
tags: []
---

## Model Details

### Model Description

This model was developed as part of the Evolutionary Scale BioML Hackathon.

## Uses

The model predicts ddG for single mutations.

## How to Get Started with the Model

```python
# Make sure `esm` is installed, if not use: `pip install esm`
from transformers import AutoModel
from esm.tokenization.sequence_tokenizer import EsmSequenceTokenizer
import torch

# The model ships custom code on the Hub, hence `trust_remote_code=True`
model = AutoModel.from_pretrained("hazemessam/esm3_ddg_v2", trust_remote_code=True)
tokenizer = EsmSequenceTokenizer()
model.eval()

# `tokenized_seq1`, `tokenized_seq2`, and `mutation_position` are not defined
# in this snippet; prepare them (the two tokenized sequences and the position
# of the mutation) before running the forward pass.
with torch.no_grad():
    output = model(tokenized_seq1, tokenized_seq2, positions=mutation_position)
```

## Training Details

### Training Data

Training Data: https://huggingface.co/datasets/hazemessam/ddg/blob/main/S2648.csv

### Training Procedure

The results listed below are the best results obtained for each evaluation dataset. This checkpoint, however, is the one that performed best on the `Ssym` evaluation dataset.

#### Training Hyperparameters

* Scheduler: Cosine
* Warmup steps: 400
* Seed: 7
* Gradient accumulation steps: 16
* Batch size: 1
* DoRA rank: 16
* DoRA alpha: 32
* Updated Layers: ["layernorm_qkv.1", "ffn.1", "ffn.3"]
* DoRA bias: "none"

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on the following datasets:

* Ssym: https://huggingface.co/datasets/hazemessam/ddg/blob/main/ssym.csv
* Ssym_r: https://huggingface.co/datasets/hazemessam/ddg/blob/main/ssym_r.csv
* P53: https://huggingface.co/datasets/hazemessam/ddg/blob/main/p53.csv
* Myoglobin: https://huggingface.co/datasets/hazemessam/ddg/blob/main/myoglobin.csv
* Myoglobin_r: https://huggingface.co/datasets/hazemessam/ddg/blob/main/myoglobin_r.csv

### Results

* Ssym Pearson correlation: 0.85
* Ssym RMSE: 0.83
* Ssym_r Pearson correlation: 0.85
* Ssym_r RMSE: 0.83
* Myoglobin Pearson correlation: 0.65
* Myoglobin RMSE: 0.83
* Myoglobin_r Pearson correlation: 0.65
* Myoglobin_r RMSE: 0.84
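
For readers who want to reproduce the two metrics reported above on their own predictions, the following is a minimal sketch of how Pearson correlation and RMSE can be computed. The function names and the sample values are illustrative assumptions; this is not the evaluation script used for this model.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Pearson correlation between measured and predicted ddG values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Covariance numerator and the two sums of squared deviations
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted ddG values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical values for illustration only, not taken from any dataset
measured = [1.2, -0.5, 0.3, 2.1, -1.0]
predicted = [1.0, -0.2, 0.5, 1.8, -1.3]

print(f"Pearson correlation: {pearson_correlation(measured, predicted):.2f}")
print(f"RMSE: {rmse(measured, predicted):.2f}")
```

Pearson correlation measures how well the predictions track the ranking and trend of the measured values, while RMSE reports the typical size of the error in the same units as ddG, so the two are usually read together.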