GNN Charge Models



@Lily Wang


@Simon Boothroyd Yuanqing Wang @Joshua Horton


@David Mobley @Michael Shirts @Michael Gilson @Daniel Cole


Develop a graph convolutional charge model that can produce AM1(BCC*) charges to within an accuracy comparable to the original AM1BCC model

*BCCs need not be the original Bayly et al parameters

Due date

Key outcomes


in progress


Problem Statement

Applying charge models to larger molecules, especially biopolymers, is currently intractable due to the cost of the QC calculations required. This leads to force fields often needing to define one charge model for proteins, another for small molecules and so on. Ideally a single, self consistent, charge model could be applied to all molecules within a simulation box. Further, charges for proteins are typically defined only for the standard amino-acids, and non-standard residues or post-translational modifications require a splicing, capping, and re-combination scheme which can be cumbersome to implement. A graph convolutional model on the other hand would simply be able to ingest the full protein, complete with any modifications, and yield a set of charges that are self-consistent with both the other residues and any ligands and solvent.


Must have:


Must have:


Nice to have:


Not in scope:




Open Software

The software required to carry out this project spans most of the OpenFF stack given that changing the charge model will likely have large impacts on everything downstream (i.e. vdW, valence, …). It is expected that this project will require maintenance of and extensions to:


  • Define and train GNN charge models with flexible atom features

OpenFF Recharge

  • Reconstruct ESP and EF data from QCA records

  • Estimate ESP / EF using a FF model.

  • Training BCC parameters on top of the GNN charge model and exporting these into a SMIRNOFF force field

  • Generating RESP charges to serve as a ‘reference model’

  • SMIRKS representation of AM1BCC BCCs


  • Easily visualise data sets of molecules either local or from QCA


  • Rapidly compute / visualise the ESP on the vdW surface of a molecule

OpenFF Evaluator

  • Used to compute the training set of properties while training the vdW parameters


  • Automate the set-up of training the vdW parameters against the phys-prop data

OpenFF Bespokefit

  • Automate the set-up of training the valence parameters against QC data


  • Benchmark FF with v-sites against solvation / hydration free energy data

Open Data

The project will at minimum need a diverse train set of precomputed AM1(BCC) charges to train and test the model against as well as a test set of ESP data that is made publicly available via QCA.

We will also generate a data set of RESP charges using data in the OpenFF ESP Fragment Conformers v1.0 QCA data set.

Selecting the AM1(BCC) train / test set

The GNN charge model will be initially trained on the training sets of molecules assembled by Riniker and Bleiziffer (esp=78):

The test set will be composed of molecules from the OpenFF Industry Benchmark Season 1 Publicset available on the QCA.

A validation set composed of the molecules found in the Enamine 10K diversity set will additionally be used, especially when performing hyperparameter sweeps

All molecule sets are additionally augmented with up to two protomers enumerated using the nagl prepare enumerate command.

See data-set-curation/qc-charges/ and data-set-labelling/ for additional details

ESP Level of theory

It was decided to compute the ESP at the HF/6-31G* level theory as is the current norm in the field. Although not perfect, it is not clear that another candidate that has the right balance of speed to compute and ‘accuracy' (defined in terms of how well do the final charges reproduce properties of interest e.g. Gsolv, Gbind). See for details on this choice.

Generating ESP

The process by which the data is generated is expect to proceed by:

  1. Generate a diverse set of ELF conformers for each molecule in the train (/test) set

  2. Minimize each conformer at HF/6-31G* level theory (no wave functions stored to save storage space on QCA)

  3. Discard any conformers that

    1. Formed h-bonds after minimization

    2. Underwent a connectivity change

  4. Perform a single-point calculation on the minimized conformer to compute and store the wave function

  5. ESP is recomputed from the data on QCA and stored locally using OpenFF recharge

Selecting the ESP test set

The OpenFF Industry Benchmark Season 1 v1.1 set of molecules was used as the ESP test set.

Selecting the RESP train / test set

Train / test RESP charges will be generated using data in the OpenFF ESP Fragment Conformers v1.0 and OpenFF ESP Industry Benchmark Set v1.1 QCA data sets respectively using the OpenFF Recharge framework.

Selecting the physical property test set

The Ghyd test set contained the entries in FreeSolv that would be assigned a v-site parameter as described in the next section.

Open Science

The main focus of this project should be to produce a MVP GNN charge model that performs with a similar accuracy as the popular AM1BCC model. This may either be by learning AM1 charges and training a new set of BCCs on top of this, or be training to AM1BCC directly. Learnining AM1 charges and then correcting the model with BCCs may yield a better charge model as the GNN model does not itself need to be quite so accurate, but rather must just capture the salient features of the electron distribution within a molecule.

A secondary goal for the project would then be to assess whether we can instead train the GNN model to more accurate RESP charges.

The performance of all models will be assessed against deviations of the GNN model vs the original AM1BCC model, the ESP, and a set of physical property measurements likely Ghyd and Gsolv

We opt to test against the ESP given the correlation between improving performance on ESPs and improving the performance on free energies. This metric is likely more meaningful than deviations between original AM1(BCC) charges and model predicted charges as it is not clear how ‘big’ of a deviation should be concerning, and the maginitude of a 'concerning deviation’ will likely depend on the chemical environment the atom is in (i.e. large difference in H charges likely more important than deviations of a similar magnitude of Br given the higher number of Hs in a molecule)

GNN architecture

Building off of the earlier work of Yuanqing and Josh, we intend to use a NN architecture that begins with a series of GCN layers that yield a continuous feature vector for each atom in the molecule, followed by a series of dense readout layers that predict the hardness and softness of each atom in the molecule, which can then be used to compute the charge on each atom in the molecule.

At present we intend to also use GraphSAGE GCN layers given their robustness and widespread use in the field, but consideration for switching to a GCN type that supports edge features such as E-GraphSAGE should be considered in the future.

A hyperparameter sweep should be performed to determine:

  • The number of GCN layers and the number of hidden features for each layer

  • The number of NN layers and the number of hidden features for each layer

  • The batch size

  • The learning rate

where the model with the lowest validation RMSE should likely be preferred. See train-charge-models/gnn-charge-models/ for scripts to run the sweep.

Atom features

The core features should include the total charge on a molecule, and:

  • The connectivity of the atom

  • The element of the atom

  • Whether the atom is in a ring of X size where X in [3, 4, 5, 6] at minimum

An additional feature to consider is the average formal charge on each atom, where the average is computed over the relevant resonance forms. Here we define ‘relevant’ as the stable resonance forms that differ by an electron transfer between two heteroatoms, and don’t consider different kekule forms of e.g. benzene. See the Resonance section for details.

We likely should not include features such as aromaticity which tie the GCN to a likely flawed aromaticity model, and direct formal charge on atoms as this will depend on the input resonance structure.


A GCN will typically not consider atoms that are more than 4-5 atoms away from the atom being considered due to vanishing gradient issues. This means that important phenomenology, such as resonance atoms separated by several bonds

will not be accurately captured. To ensure that resonance information is correctly incorporated into the GCN, we propose applying a modified version of the resonance form enumeration method proposed by Gilson et al (10.1021/ci034148o) to enumerate over all relevant resonance forms of a molecule, and then average the formal charge on each atom across those forms.

The algorithm requires some modification however due to the combinatoric nature of resonance forms. As an exaggerated example, consider a peptide composed of 300 deprotonated glutamic acid residues. In this case there would be 2^300 resonance forms to enumerate!

We propose to instead:

  1. Remove all sp3 carbons and hydrogens from the molecule as these cannot be involved in electron transfers

  2. For every isolated sub-graph

    1. apply the V-Charge resonance enumeration algorithm

    2. compute the average formal charge across sub-resonance forms

This approach reduces the above example then to simply computing average formal charges across 2*300 sub-graphs which is significantly more tractable.

There is some question of how useful it is to include this resonance sensitive feature - i.e. would we have a model with comparable accuracy if we skipped this somewhat cumbersome enumeration step? This should likely be explored as part of this project.

Training procedure

Three different GCN models should be trained once relevant hyperparameters have been selected:

  • a GCN trained to AM1 charges directly and

  • a GCN trained to AM1BCC charges

For the model trained only to AM1 charges a set of BCC parameters (taken from the original AM1BCC model) will then be trained to the ESP training data as described above.

An additional set of BCCs should also be trained on top to the GCN-AM1BCC model with initial values of 0.0. This will again allow the GNN some margin for error rather than needing to produce ‘perfect’ AM1BCC charges as we can correct these against ESP data as was the original intent of the BCCs in the AM1BCC model (i.e. we know AM1 is not perfect but we can greatly improve it’s accuracy through BCCs).

In all above cases:

  1. The vdW parameters of the Sage force field augmented with the new charge model will be trained against the Sage physical property training set

  2. The valence terms of the force field yielded in step 2. will be trained against the Sage QC training set.

Test procedure

The performance of the new force field will be initially assessed against two main metrics:

  • deviation from the ESP of the test set defined above, were the ESP produced by RESP charges will be used as a reference

  • hydration free energies from the FreeSolv data set, and compared using similar metrics as used in 10.26434/chemrxiv-2021-gsgr4-v3 .

Current Status

The files associated with this project can be found at and on lilac under /data/chodera/boothros/gnn-charge-models

Open Software

The software is continually evolving to meet the needs of this project, but at present meets the need of

  • training GNN charge models, and performing parameter sweeps

  • training BCCs on top of a GNN charge model.

  • re-training the vdW parameters of Sage on top of the new charge model

  • re-training the valence parameters of Sage on top of the new charge model + vdW parameters

  • benchmarking against Ghyd

The contains a lot of code that should likely be upstreamed as it is of general utility and can be used to more easily automate the performed fits.

Open Data

The reconstructed ESP data is available on lilac in the data-set-labelling/qc-esp subdirectory while the AM1(BCC) charges are located in data-set-labelling/am1-charges.

Open Science

Training the GNN models

As an initial pass to ensure the fitting infrastructure works correctly we proceed with hyperparameters selected in the original espaloma preprint. Namely, 3 GCN layers with 128 hidden features each, 4 NN layers with 128 hidden features each, a learning rate of 0.001 and a batch size of 256.

Training the GNN model to only AM1 charges
Training the GNN model to AM1BCC charges

Training the BCCs

Training BCCs on top of the AM1 GNN model


A histogram of per molecule average RMSE between model and QC ESP computed over multiple ELF conformers
The shift in per molecule average RMSE between model and QC ESP computed over multiple ELF conformers of two GNN charge models compared to the original AM1BCC charge model*. More negative is better. See 10.26434/chemrxiv-2021-gsgr4-v3 for details on this metric.

* Espaloma appears to assign charges that sum to 0.0 even when the molecule has a non-zero total charge. Here we filter any charged molecules to give a fairer comparison.

 Reference materials