Materials

Welcome to FM4M, IBM's family of multi-modal foundation models for materials, designed to support and advance research in materials science and chemistry.


SMI-TED: a large encoder-decoder chemical foundation model, the SMILES-based Transformer Encoder-Decoder, pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports a range of complex tasks, including quantum property prediction, and our experiments across multiple benchmark datasets demonstrate state-of-the-art performance.
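
As a rough sketch of how embeddings from such an encoder feed downstream property models, the snippet below assumes a `load_smi_ted` loader and an `encode` method along the lines of the FM4M repository (github.com/IBM/materials); the import path, checkpoint name, and return type are assumptions, so check the repo for the actual API.

```python
# A minimal sketch of extracting SMI-TED embeddings for property prediction.
# The import path, loader arguments, and encode() signature are assumptions
# based on the FM4M repo layout (github.com/IBM/materials), not a confirmed API.
import torch

from models.smi_ted.smi_ted_light.load import load_smi_ted  # assumed import path

model = load_smi_ted(
    folder="models/smi_ted/smi_ted_light",   # assumed checkpoint folder
    ckpt_filename="smi-ted-Light_40.pt",      # assumed checkpoint file name
)

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin
with torch.no_grad():
    embeddings = model.encode(smiles)  # assumed: one fixed-size vector per molecule
print(embeddings.shape)  # e.g. (3, embedding_dim); usable as features for any regressor
```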


SELFIES-TED: a transformer trained on SELFIES strings for improved molecular property prediction. SELFIES-TED uses a BART backbone to learn a molecular representation while also being able to generate novel molecules. The model has 354M parameters and was trained on 1 billion molecules from ZINC-22, with SMILES enumeration applied as data augmentation.
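
SELFIES-TED consumes SELFIES strings rather than raw SMILES. The sketch below uses the open-source `selfies` package (not the model itself) to show the round-trip between the two representations; the key property is that any SELFIES string decodes to a structurally valid molecule, which is what makes the representation attractive for generation.

```python
# Round-trip between SMILES and SELFIES using the open-source `selfies`
# package (pip install selfies). This illustrates the model's input format;
# it does not load SELFIES-TED itself.
import selfies as sf

smiles = "CC(=O)Oc1ccccc1C(=O)O"      # aspirin
selfies_str = sf.encoder(smiles)      # SMILES -> SELFIES
recovered = sf.decoder(selfies_str)   # SELFIES -> SMILES

print(selfies_str)  # token sequence like [C][C][=Branch1]...
print(recovered)    # a SMILES string; every SELFIES decodes to a valid molecule
```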


MHG-GED: an autoencoder architecture with an encoder based on a GNN and a decoder based on a sequential model with a Molecular Hypergraph Grammar (MHG). Since the encoder is a GNN variant, MHG-GED can accept any molecule as input and demonstrates high predictive performance on molecular graph data. In addition, the decoder inherits MHG's theoretical guarantee of always generating a structurally valid molecule as output.
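
Since the encoder operates on molecular graphs, any molecule that parses into an atom/bond graph is admissible input. The sketch below uses plain RDKit (not the FM4M code) to illustrate the node/edge structure a GNN encoder of this kind consumes.

```python
# Building the atom/bond graph that a GNN encoder like MHG-GED's consumes.
# RDKit is used only to illustrate the input structure; MHG-GED is not loaded here.
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1O")  # phenol

# Nodes: one entry per atom; the element symbol serves as a toy node feature.
nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]

# Edges: (begin, end, bond type) triples describing connectivity.
edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), str(b.GetBondType()))
         for b in mol.GetBonds()]

print(nodes)  # ['C', 'C', 'C', 'C', 'C', 'C', 'O']
print(edges)  # six AROMATIC ring bonds plus one SINGLE C-O bond
```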


Demo: explore Foundation Models for Materials with an intuitive Gradio app! Test the state-of-the-art models above (SMI-TED, SELFIES-TED, and MHG-GED) on your custom datasets for both classification and regression property-prediction tasks, and get insights into materials science with ease.
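
For a sense of how such a demo is wired together, here is a minimal Gradio sketch with a placeholder predictor; `predict_property` is a toy stand-in, not the actual FM4M inference call.

```python
# A minimal Gradio sketch of a property-prediction demo.
# `predict_property` is a placeholder; a real app would call an FM4M model instead.
import gradio as gr

def predict_property(smiles: str) -> float:
    # Stand-in "prediction": the string length, to keep the sketch self-contained.
    return float(len(smiles))

demo = gr.Interface(
    fn=predict_property,
    inputs=gr.Textbox(label="SMILES"),
    outputs=gr.Number(label="Predicted property"),
    title="FM4M property prediction (toy sketch)",
)

if __name__ == "__main__":
    demo.launch()
```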

