Inquiry on ProtGPT2 Model Performance and Fine-Tuning Evaluation
Hi Noelia,
I'd like to thank you for your work on ProtGPT2 and its application to protein design.
It's a notable contribution to the field.
I am currently working on a project that uses ProtGPT2, and I have a couple of questions:
Could you share the accuracy and loss metrics from both the training and evaluation phases of ProtGPT2? I need this information to benchmark my fine-tuning results against the model's baseline performance.
Following the guidelines provided on the Hugging Face model page, I have fine-tuned ProtGPT2 for a specific task. To ensure the quality of the fine-tuning, could you recommend methods for effectively evaluating the fine-tuned model, in particular for assessing 'catastrophic forgetting', i.e. checking that it retains some of its prior knowledge?
Thank you for your time and consideration. I look forward to your response.
Sincerely,
LW
Hi LW,
Sorry I did not reply sooner; I was on leave.
- The loss was quite high for both training and evaluation, probably due to the large vocabulary size (52k tokens). Below are the train and eval losses for the last epoch.
"epoch": 49.88,
"learning_rate": 2.4673951357067326e-06,
"loss": 6.5147,
"step": 424500
},
{
"epoch": 49.94,
"learning_rate": 1.292445071084479e-06,
"loss": 6.5139,
"step": 425000
},
{
"epoch": 49.94,
"eval_loss": 6.520303726196289,
"eval_runtime": 231.0929,
"eval_samples_per_second": 5275.233,
"eval_steps_per_second": 5.154,
"step": 425000
},
{
"epoch": 49.99,
"learning_rate": 1.1749500646222535e-07,
"loss": 6.513,
"step": 425500
}
- I haven't evaluated this myself, but I'd probably generate a sample of around 1000 sequences and pick the top 1/3 based on perplexity (a sketch of this step is included below). From those, I'd run ESMFold and check their pLDDTs; I'd expect them to have pLDDT values over 70. Beyond this, if you fine-tuned on a specific family, I'd check that the generated sequences indeed look like members of that family. For example, if you fine-tuned on TIM-barrels, I'd check that those sequences are indeed TIM-barrels. In my experience, the model tends to generate other families as well after fine-tuning, as opposed to ZymCTRL, which sticks to the fine-tuned family. Hope this helps!
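For concreteness, here is a minimal sketch of the generate-and-rank step, assuming a fine-tuned checkpoint saved at ./protgpt2-finetuned (a hypothetical path) and the generation and perplexity recipe from the ProtGPT2 model card on Hugging Face; the sample size and generation parameters are illustrative and would need adjusting for a full run of ~1000 sequences.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Hypothetical path to your fine-tuned checkpoint; swap in your own directory,
# or use "nferruz/ProtGPT2" to score the base model instead.
model_dir = "./protgpt2-finetuned"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir).to(device).eval()

# 1) Sample sequences (repeat in batches to reach ~1000 in practice).
generator = pipeline("text-generation", model=model, tokenizer=tokenizer,
                     device=0 if device == "cuda" else -1)
generated = generator("<|endoftext|>", max_length=100, do_sample=True,
                      top_k=950, repetition_penalty=1.2,
                      num_return_sequences=100, eos_token_id=0)

# 2) Rank by perplexity under the same model and keep the best (lowest) third.
def perplexity(sequence):
    ids = tokenizer(sequence, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

scored = sorted((perplexity(g["generated_text"]), g["generated_text"])
                for g in generated)
top_third = [seq for _, seq in scored[: len(scored) // 3]]

# 3) The retained sequences would then go to ESMFold (e.g. facebook/esmfold_v1)
#    to check that their mean pLDDT is above ~70, and, if you fine-tuned on a
#    specific family, to a family-assignment step such as an HMM scan.

Ranking with the same model that generated the sequences mirrors the perplexity-based filtering described on the ProtGPT2 model card; the ESMFold and family-assignment steps are left as comments because the exact folding setup depends on your environment.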