# Grappa - Machine Learned MM Parameterization

_A machine learned molecular mechanics force field based on a graph attentional network_

Paper: [https://pubs.rsc.org/en/content/articlepdf/2025/sc/d4sc05465b](https://pubs.rsc.org/en/content/articlepdf/2025/sc/d4sc05465b)

```bibtex
@article{seute2025grappa,
    author = "Seute, Leif and Hartmann, Eric and Stühmer, Jan and Gräter, Frauke",
    title = "Grappa – a machine learned molecular mechanics force field",
    journal = "Chem. Sci.",
    year = "2025",
    volume = "16",
    issue = "6",
    pages = "2907-2930",
    publisher = "The Royal Society of Chemistry",
    doi = "10.1039/D4SC05465B",
    url = "http://dx.doi.org/10.1039/D4SC05465B",
}
```

Table of contents

- [Tutorials](#tutorials)
- [Using the Grappa force field](#using-the-grappa-force-field)
- [Installation](#installation-for-use-as-force-field)
- [Installation for development](#installation-for-the-development-of-custom-grappa-force-fields)
- [Pretrained models](#pretrained-models)
- [Datasets](#datasets)
- [Training](#training)
- [Common pitfalls](#common-pitfalls)

Grappa Overview

Grappa predicts MM parameters in two steps. First, atom embeddings are predicted from the molecular graph with a graph neural network. Then, transformers with symmetric positional encoding followed by permutation invariant pooling map the embeddings to MM parameters with desired permutation symmetries. Once the MM parameters are predicted, the potential energy surface can be evaluated with MM-efficiency for different spatial conformations, e.g. in GROMACS or OpenMM.
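To make the role of permutation symmetry concrete, here is a minimal toy sketch in plain Python. It is not Grappa's architecture: the hand-written sum pooling stands in for the transformer with symmetric positional encoding, and the names `bond_params` and `harmonic_bond_energy` as well as all numeric constants are hypothetical. It only illustrates that a symmetric readout gives the same bond parameters for atom order (i, j) and (j, i), and that the resulting parameters can then be evaluated cheaply for any conformation.

```python
import math


def bond_params(emb_i, emb_j):
    """Toy readout mapping two atom embeddings to (force constant k, equilibrium length r0)."""
    # Element-wise sum is invariant under swapping the two atoms.
    pooled = [a + b for a, b in zip(emb_i, emb_j)]
    k = math.exp(0.1 * sum(pooled))      # exponential keeps the force constant positive
    r0 = 0.1 + 0.01 * abs(pooled[0])     # arbitrary positive equilibrium length
    return k, r0


def harmonic_bond_energy(k, r0, r):
    """Standard harmonic bond term: 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


# Hypothetical atom embeddings, as a graph neural network might produce them.
emb_a = [0.3, -1.2, 0.8]
emb_b = [1.1, 0.4, -0.5]

k, r0 = bond_params(emb_a, emb_b)
# The parameters are predicted once; the energy is then cheap to evaluate per conformation.
energies = [harmonic_bond_energy(k, r0, r) for r in (0.10, 0.12, 0.15)]
print(bond_params(emb_a, emb_b) == bond_params(emb_b, emb_a), energies)
```

The design point is that the symmetry is built into the readout itself, so it holds for every input rather than having to be learned from data.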

## Tutorials

We provide example scripts for both application and training on custom datasets in the following Google Colab notebooks, which run entirely in the cloud and do not require any local installation:

- [Using Grappa as GROMACS force field](https://colab.research.google.com/drive/1C1ebqR9CnxkMLSR3aJ87zcY2W_nqIMmN?usp=sharing)
- [Using Grappa as OpenMM force field](https://colab.research.google.com/drive/1H6leB4hrgB6MttPokeVntcPNFMtzqZto?usp=sharing)
- [Training Grappa models](https://colab.research.google.com/drive/1HCsFIGh8mQu2F9acWw7YFAabAhEjZRfc?usp=sharing)
- [Creating and training on custom datasets](https://colab.research.google.com/drive/143Ycnof3-9TLO7P8CWLsH7K0TMHMfr6s?usp=sharing)

## Using the Grappa Force Field

The current version of Grappa only predicts bonded parameters; nonbonded parameters such as partial charges and Lennard-Jones parameters are taken from a traditional force field of choice. The input to Grappa is therefore a graph representation of the system of interest that already contains the nonbonded parameters. Currently, Grappa is compatible with GROMACS and OpenMM. For example scripts, see the Google Colab tutorials ([GROMACS](https://colab.research.google.com/drive/1C1ebqR9CnxkMLSR3aJ87zcY2W_nqIMmN?usp=sharing), [OpenMM](https://colab.research.google.com/drive/1H6leB4hrgB6MttPokeVntcPNFMtzqZto?usp=sharing)).

### GROMACS

In GROMACS, Grappa can be used as a command line application that receives the path to a topology file and writes the bonded parameters to a new topology file.

```{bash}
# parametrize the system with a traditional forcefield:
gmx pdb2gmx -f your_protein.pdb -o your_protein.gro -p topology.top -ignh

# create a new topology file with the bonded parameters from Grappa, specifying the tag of the grappa model:
grappa_gmx -f topology.top -o topology_grappa.top -t grappa-1.4 -p

# (you can create a plot of the parameters for inspection using the -p flag)

# continue with usual gromacs workflow (solvation etc.)
```

Also see the Colab Notebook: [Grappa as GROMACS force field](https://colab.research.google.com/drive/1C1ebqR9CnxkMLSR3aJ87zcY2W_nqIMmN?usp=sharing)

### OpenMM

To use Grappa in OpenMM, parametrize your system with a traditional forcefield, from which the nonbonded parameters are taken, and then pass it to Grappa's OpenMM wrapper class:

```{python}
from openmm.app import ForceField, Topology
from grappa import OpenmmGrappa

topology = ... # load your system as openmm.Topology

classical_ff = ForceField('amber99sbildn.xml', 'tip3p.xml')
system = classical_ff.createSystem(topology)

# load the pretrained ML model from a tag. Currently, possible tags are 'grappa-1.4', 'grappa-1.3' and 'latest'
grappa_ff = OpenmmGrappa.from_tag('grappa-1.4')

# parametrize the system using grappa.
system = grappa_ff.parametrize_system(system, topology)
```

There is also the option to obtain an `openmm.app.ForceField` that calls Grappa for bonded parameter prediction behind the scenes:

```{python}
from openmm.app import ForceField, Topology
from grappa import as_openmm

topology = ... # load your system as openmm.Topology

grappa_ff = as_openmm('grappa-1.4', base_forcefield=['amber99sbildn.xml', 'tip3p.xml'])
assert isinstance(grappa_ff, ForceField)

system = grappa_ff.createSystem(topology)
```

Also see the Colab Notebook: [Grappa as OpenMM force field](https://colab.research.google.com/drive/1H6leB4hrgB6MttPokeVntcPNFMtzqZto?usp=sharing)

## Installation for use as force field

For using Grappa in GROMACS or OpenMM, Grappa in CPU mode is sufficient, since the inference runtime of Grappa is usually small compared to the simulation runtime. For training, GPU mode is advised, see below.

Create a conda environment with Python 3.10:

```{bash}
conda create -n grappa python=3.10 -y
conda activate grappa
```

In CPU mode, Grappa is available on PyPI:

```{bash}
pip install grappa-ff
```

Depending on the MD engine used, an installation of OpenMM or GROMACS is needed (see below).

The installation is also part of the Colab Notebooks [Grappa as GROMACS force field](https://colab.research.google.com/drive/1C1ebqR9CnxkMLSR3aJ87zcY2W_nqIMmN?usp=sharing) and [Grappa as OpenMM force field](https://colab.research.google.com/drive/1H6leB4hrgB6MttPokeVntcPNFMtzqZto?usp=sharing).

### GROMACS

The creation of custom GROMACS topology files is handled by [gmxtop](https://github.com/graeter-group/gmxtop). If Grappa was installed from source, verify the Grappa-gmx installation by running

```{bash}
pytest
pytest -m slow
```

### OpenMM

OpenMM has to be installed in the same environment as Grappa. It is advised to install OpenMM via conda:

```{bash}
conda install -y -c conda-forge openmm # optional: cudatoolkit=
```

Since the resolution of package dependencies can be slow in conda, it is recommended to install OpenMM first and then install Grappa.

If Grappa was installed from source, verify the Grappa-OpenMM installation by running

```{bash}
pytest
pytest -m slow
```

### Installation from source (CPU mode)

To install Grappa from source, clone the repository and install the requirements and the package itself with pip:

```{bash}
git clone git@github.com:graeter-group/grappa.git
cd grappa

pip install -r installation_utils/cpu_requirements.txt
pip install -e .
```

Verify the installation by running

```
pytest
```

## Installation from source (GPU mode)

For training Grappa models, neither OpenMM nor GROMACS is needed, only an environment with a working installation of [PyTorch](https://pytorch.org/) and [DGL](https://www.dgl.ai/) for the CUDA version of choice. Note that installing Grappa in GPU mode is only recommended if you intend to train a model. Instructions for installing DGL with CUDA can be found at `installation_utils/README.md`. For torch 2.4 and CUDA 12.4, this can be done by

```{bash}
conda create -n grappa-dev python=3.10 -y
conda activate grappa-dev
pip install torch==2.4 --index-url https://download.pytorch.org/whl/cu124
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/repo.html
```

In this environment, Grappa can be installed by

```{bash}
pip install -r installation_utils/requirements.txt
pip install -e .
```

Verify the installation by running

```
pytest
pytest -m gpu
```

## Pretrained models

Pretrained models can be obtained by using `grappa.utils.run_utils.model_from_tag` with a tag (e.g. `latest`) that points to a version-dependent URL from which the model weights are downloaded. Available tags are listed in `models/published_models.csv`, and an example can be found at `examples/usage/openmm_wrapper.py`.

For full reproducibility, the respective partition of the dataset and the configuration file used for training are also included in the released checkpoints and can be found at `models/tag/config.yaml` and `models/tag/split.json` after downloading the respective model (see `examples/reproducibility`). In the case of `grappa-1.4`, reproducing the training is equivalent to running

```{bash}
python experiments/train.py data=grappa-1.4 model=default experiment=default
```

| Tag | Description |
|-----|-------------|
| grappa-1.4.0 | Covers peptides, small molecules, RNA. Used for protein and peptide simulations reported in the paper. |
| grappa-1.4.1-radical | Covers peptides, small molecules, RNA, peptide radicals. |
| grappa-1.4.1-light | Lightweight model with far fewer parameters for testing. Covers peptides, small molecules, RNA, peptide radicals. |

## Datasets

Datasets of DGL graphs representing molecules can be obtained by using the `grappa.data.Dataset.from_tag` constructor. An example can be found at `examples/usage/dataset.py`; available tags are listed in `data/published_datasets.csv`.

To re-create the benchmark experiment, the train/val/test split from Espaloma is also needed. It can be obtained by running `dataset_creation/get_espaloma_split/save_split.py`, which creates a file `espaloma_split.json` that contains lists of SMILES strings for each of the sub-datasets. These lists are used to classify molecules as train/val/test molecules when the dataset is loaded in the train scripts from `experiments/benchmark`.

The datasets containing radicals and 1000K states were created as described in the GitHub repository [grappa-data-creation](https://github.com/LeifSeute/grappa-data-creation). The evaluation of Grappa on the 3bpa dataset is also described there.

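The classification of molecules by SMILES string described above could look roughly like the following sketch. The JSON layout used here (one list of SMILES strings per partition name) is an assumption made for illustration; the actual structure of `espaloma_split.json` may differ, and `classify_by_split` is a hypothetical helper, not a function from the Grappa package.

```python
import json


def classify_by_split(smiles_list, split):
    """Assign each SMILES string to the partition whose list contains it.

    `split` maps a partition name (e.g. 'train') to a list of SMILES strings.
    Molecules that appear in no list are labelled 'unassigned'.
    """
    lookup = {smi: part for part, members in split.items() for smi in members}
    return {smi: lookup.get(smi, "unassigned") for smi in smiles_list}


# Assumed, simplified stand-in for the contents of a split file.
split = json.loads(
    '{"train": ["CCO", "CC(=O)O"], "val": ["c1ccccc1"], "test": ["CN"]}'
)

labels = classify_by_split(["CCO", "CN", "O"], split)
print(labels)
```

Splitting by molecule identity rather than by conformation keeps all states of one molecule in the same partition, which is what makes the test set a measure of generalization to unseen molecules.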
For the creation of custom datasets, take a look at the Colab notebook [Creating and training on custom datasets](https://colab.research.google.com/drive/143Ycnof3-9TLO7P8CWLsH7K0TMHMfr6s?usp=sharing) and the `examples/` directory at [grappa-data-creation](https://github.com/LeifSeute/grappa-data-creation).

| Tag | Description |
|-----|-------------|
| spice-pubchem | Small molecule dataset from Espaloma. Sampled from MD. |
| rna-nucleoside | Nucleoside dataset from Espaloma. Sampled from MD. |
| gen2 | Small molecule dataset from Espaloma. Sampled from optimization trajectories. |
| spice-des-monomers | Small molecule dataset from Espaloma. Sampled from MD. |
| spice-dipeptide | Dipeptide dataset from Espaloma. Sampled from MD. |
| rna-diverse | RNA dataset from Espaloma. Sampled from MD. |
| gen2-torsion | Small molecule dataset from Espaloma. Sampled from torsion scans. |
| pepconf-dlc | Peptide dataset from Espaloma. Sampled from optimization trajectories. |
| protein-torsion | Peptide dataset from Espaloma. Sampled from torsion scans. |
| rna-trinucleotide | Trinucleotide dataset from Espaloma. Sampled from MD. |
| espaloma_split | Defines the train/val/test split used for training Espaloma 0.3.0. |
| spice-pubchem-filtered | spice-pubchem without molecules with QM forces over 500 kcal/mol/Angstrom. |
| spice-dipeptide-amber99 | spice-dipeptide but with nonbonded parameters from amber99. |
| spice-dipeptide-charmm36 | spice-dipeptide but with nonbonded parameters from charmm36. |
| protein-torsion-amber99 | protein-torsion but with nonbonded parameters from amber99. |
| protein-torsion-charmm36 | protein-torsion but with nonbonded parameters from charmm36. |
| dipeptides-hyp-dop-300K-amber99 | Dataset of dipeptides with HYP and DOP residues at 300K with amber99SB-ILDN* nonbonded parameters. Sampled from MD. |
| uncapped-300K-openff-1.2.0 | Dataset of peptides without capping at 300K with OpenFF 1.2.0/am1-bcc nonbonded parameters. Sampled from MD. |
| peptide-radical-MD | Radical peptides with states sampled from MD. |
| peptide-radical-scan | Radical peptides with states sampled from torsion scans. |
| peptide-radical-opt | Radical peptides with states sampled from optimization trajectories. |

Espaloma datasets from: [https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc00690a](https://pubs.rsc.org/en/content/articlehtml/2024/sc/d4sc00690a)

## Training

Grappa models can be trained with a configuration specified using hydra by running

```{bash}
python experiments/train.py data.data_module.datasets=[spice-dipeptide]
```

With hydra, configuration files can be defined in a modular way. For Grappa, we have the configuration types `model`, `data` and `experiment`, for each of which default values can be overridden on the command line or in a separate configuration file. For example, to train a model with fewer node features:

```{bash}
python experiments/train.py data.data_module.datasets=[spice-dipeptide] model.graph_node_features=32
```

For training on the datasets of grappa-1.4 (defined in `configs/data/grappa-1.4.0`), one can run

```{bash}
python experiments/train.py data=grappa-1.4 model=default experiment=default
```

To start training from pretrained model weights, call e.g.

```{bash}
python experiments/train.py experiment.ckpt_path=models/grappa-1.3.0/checkpoint.ckpt
```

Training is logged in [wandb](https://docs.wandb.ai/quickstart) (for which a free account is required) and can be safely interrupted by pressing `ctrl+c` at any time.
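
As a rough illustration of how dotted command line overrides such as `model.graph_node_features=32` map onto a nested configuration, here is a small standard-library sketch. It is not Hydra's implementation and covers only a tiny subset of its override grammar; `apply_overrides`, `parse_value` and the default value `256` are hypothetical and chosen only for the example.

```python
import copy


def parse_value(raw):
    """Interpret '[a,b]' as a list of strings and digits as int; keep anything else as str."""
    if raw.startswith("[") and raw.endswith("]"):
        inner = raw[1:-1].strip()
        return [part.strip() for part in inner.split(",")] if inner else []
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'a.b.c=value' style overrides applied."""
    result = copy.deepcopy(config)  # leave the defaults untouched
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing intermediate levels
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults standing in for the `model` and `data` configuration groups.
defaults = {
    "model": {"graph_node_features": 256},
    "data": {"data_module": {"datasets": []}},
}

cfg = apply_overrides(
    defaults,
    ["data.data_module.datasets=[spice-dipeptide]", "model.graph_node_features=32"],
)
print(cfg)
```

Working on a deep copy means the same defaults can be reused for several runs that differ only in their overrides.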
