---
library_name: transformers
tags:
- chemistry
- biology
- cheminformatics
- materials science
license: mit
language:
- en
metrics:
- mse
- r_squared
base_model:
- seyonec/ChemBERTa-zinc-base-v1
---

# ChemSolubilityBERTa

## Model Description

ChemSolubilityBERTa is a prototype model that predicts the aqueous solubility of chemical compounds from their SMILES representations. It is based on ChemBERTa, a BERT-like transformer architecture pre-trained on 77M SMILES strings for molecular property prediction. We adapted ChemBERTa to predict solubility by fine-tuning it on ESOL (Estimated SOLubility), a water solubility dataset of 1,128 samples. A user inputs a SMILES string, and the model outputs a log solubility value (log mol/L).

You can read the full paper [here](./01_ChemSolubilityBERTa.pdf).

## Fine-Tuning Details

- Pretrained model: `seyonec/ChemBERTa-zinc-base-v1`
- Dataset: ESOL (delaney-processed)
- Task: Aqueous solubility prediction (log mol/L)
- Number of training epochs: 3
- Batch size: 16

## How to Use

You can use the model to predict solubility for any molecule represented by a SMILES string:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Replace "username" with the account that hosts the model
tokenizer = AutoTokenizer.from_pretrained("username/ChemSolubilityBERTa")
model = AutoModelForSequenceClassification.from_pretrained("username/ChemSolubilityBERTa")
model.eval()  # disable dropout for inference

smiles_string = "CCO"  # Example for ethanol
inputs = tokenizer(smiles_string, return_tensors="pt")
with torch.no_grad():  # no gradients are needed for prediction
    outputs = model(**inputs)

# The single regression output is the predicted log solubility
solubility = outputs.logits.item()
print(f"Predicted solubility: {solubility:.3f} log mol/L")
```

## Citation and Usage

If you use ChemSolubilityBERTa in your research or projects, please cite the following:

```bibtex
@misc{ChemSolubilityBERTa,
  author = {Farooq Khan},
  title  = {ChemSolubilityBERTa: A Transformer-Based Model for Predicting Aqueous Solubility from SMILES},
  year   = {2024},
  url    = {https://huggingface.co/khanfs/ChemSolubilityBERTa}
}
```

## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT).
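
## Evaluation Metrics

The card metadata lists `mse` and `r_squared` as metrics. As a minimal sketch of how these two quantities are conventionally defined, the snippet below computes them in plain Python for a list of reference log solubility values and a list of predictions. The helper names `mse` and `r_squared` and the sample numbers are assumptions made for illustration; they are not taken from the paper's evaluation code and are not results from the model.

```python
def mse(y_true, y_pred):
    """Mean squared error between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values (log mol/L) for illustration only; not model results.
reference = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print(f"MSE: {mse(reference, predicted):.3f}")
print(f"R^2: {r_squared(reference, predicted):.3f}")
```

In practice, `predicted` would hold the model outputs for a held-out portion of ESOL and `reference` the measured values for the same molecules.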