Co-optimisation of AM1BCC and VdW Parameters

Overarching Question

Does co-optimising the non-bonded VdW and AM1BCC electrostatic force field parameters against physical property data of mixtures and electrostatic potentials improve the force field's performance on chemical potentials and related properties (e.g. solvation free energies, log P / log D)?

External Collaborators

XtalPi Inc.

Due date

Oct 31, 2020 

Key outcomes

  • Develop a SMIRNOFF parameter handler which directly encodes the BCCs using SMIRKS patterns.

  • Develop the infrastructure for generating ESP data for a diverse set of molecules, and retraining BCCs against this.

  • Perform the first OpenFF optimisations of both the VdW and Electrostatic non-bonded interactions.

Status

STARTED

Overview

This study will explore whether co-optimising the non-bonded VdW interactions and the electrostatic interactions (implemented using the AM1BCC method) against both physical property data of mixtures and electrostatic potentials (the latter for the electrostatic interactions only) improves a force field's ability to reproduce experimental properties related to chemical potentials, such as solvation free energies, relative solubilities, and partition coefficients. This work is in part inspired by the work of Fennell et al. [1].

While training just the VdW interaction parameters against enthalpies of mixing and mass densities of binary mixtures has already shown promise for improving a force field's performance (even without the inclusion of properties of pure systems), only so much improvement can be gained while keeping the electrostatic model fixed [2]. In fact, keeping the electrostatic model fixed may arguably be detrimental when optimising the VdW interactions, as the optimised VdW parameters may end up compensating for deficiencies in the charges.

This study will be conducted jointly between OpenFF and XtalPi. OpenFF will be responsible for curating the training data sets (both experimental and quantum) and performing the force field optimisations against these. XtalPi will then take responsibility for curating the test sets of the properties mentioned above and assessing force field performance against these using their compute resources.

The compositions of both the training and test sets must be agreed upon by both OpenFF and XtalPi.

Data Set Selection

As was done previously, the training set (and potentially the test set) is expected to be limited to systems composed only of molecules containing carbon, oxygen, and hydrogen. It is as yet unclear whether there is enough data for nitrogen-containing compounds to construct a meaningful, well-balanced training set that includes them. Further, including nitrogen would significantly expand the number of BCCs which need to be re-trained (it would add five additional BCC atom types, which in turn would introduce many more BCC types). Not only would this increase the difficulty of converging the optimisation, but it would also require building a much larger training set for which the data likely isn’t available.

Experimental Training Set Selection

The experimental data component of the training set is likely to be composed of enthalpy of mixing data points measured for binary systems, and density data points measured for both pure and binary systems. It will be selected so as to include a range of chemical functionalities, including alcohols, alkanes, alkenes, aromatics, esters, ketones, and ethers.
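As a reminder of what the enthalpy of mixing target measures, the minimal Python sketch below evaluates the usual definition, H_mix = H_mixture - sum_i x_i * H_i, from per-mole enthalpies. The function name and the numbers in the usage lines are illustrative assumptions, not code or data from this study.

```python
def enthalpy_of_mixing(h_mixture, mole_fractions, h_pure):
    """Return H_mix = H_mixture - sum_i x_i * H_i.

    All enthalpies are per mole and must share the same unit (e.g. kJ/mol).
    """
    if len(mole_fractions) != len(h_pure):
        raise ValueError("one pure-component enthalpy is needed per component")
    if abs(sum(mole_fractions) - 1.0) > 1e-8:
        raise ValueError("mole fractions must sum to 1")
    # Ideal (non-interacting) reference: mole-fraction-weighted pure enthalpies.
    ideal = sum(x * h for x, h in zip(mole_fractions, h_pure))
    return h_mixture - ideal


# Hypothetical equimolar binary mixture; the values are made up for illustration.
h_mix = enthalpy_of_mixing(-40.0, [0.5, 0.5], [-35.0, -43.0])
```

A negative value indicates that mixing is enthalpically favourable relative to the pure components.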

The training set may additionally include aqueous mixtures depending on data availability.

The QM component of the training set will comprise ESP data for about ? compounds that will include all those in the experimental data set along with additional compounds that include the same atom types as in the experimental compounds. Calculation of the ESPs is

More details will be added to this section once the training set selection is in progress.

QM ESP Training Set Selection and Generation

Conformer Selection: The current plan is to employ the OpenEye ELF10 implementation to generate reasonable conformers to include in the training set. This should produce conformers which do not contain strong intramolecular electrostatic interactions. A maximum of 5 conformers will be generated for each molecule in the training set where possible. Smaller or more rigid molecules may have fewer than this.

Level of Theory: Initially the ESPs will be generated using PW6B95 + aug-cc-pV(D+d)Z (the level of theory suggested in the RESP2 publication [2]). This will allow us to explore whether including the physical property data recovers some of the overpolarisation that HF provides, but in a more ‘correct’ way.

ESP / EF Generation: The ESP and electric field data will be generated using the existing QC infrastructure.

The ESPs will be generated on an FCC grid with a spacing of 0.5 Å, within a shell spanning 1.4-2.0 times the VdW radii around the molecule. These are the same settings used in the original AM1BCC paper [3].
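The grid construction described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the spacing is taken to be the nearest-neighbour distance of the FCC lattice in Å, the shell limits are taken to be multiples of each atom's VdW radius, a point is kept if it lies outside the inner limit of every atom and inside the outer limit of at least one atom, and the function name is hypothetical. The production grid will come from the existing infrastructure, not from this code.

```python
import itertools
import math


def fcc_grid_in_shell(coords, radii, spacing=0.5, inner=1.4, outer=2.0):
    """Return FCC lattice points lying in the shell around a molecule.

    coords: list of (x, y, z) atom positions; radii: matching VdW radii.
    Assumption: `spacing` is the FCC nearest-neighbour distance.
    """
    # An FCC lattice is the set of points (i, j, k) * a/2 with i + j + k even,
    # where the cubic cell edge is a = spacing * sqrt(2).
    half_edge = spacing / math.sqrt(2.0)
    pad = outer * max(radii)
    lo = [min(c[d] for c in coords) - pad for d in range(3)]
    hi = [max(c[d] for c in coords) + pad for d in range(3)]
    counts = [int(math.floor((hi[d] - lo[d]) / half_edge)) + 1 for d in range(3)]

    points = []
    for i, j, k in itertools.product(*(range(n) for n in counts)):
        if (i + j + k) % 2:
            continue  # not an FCC lattice site
        p = (lo[0] + i * half_edge, lo[1] + j * half_edge, lo[2] + k * half_edge)
        dists = [math.dist(p, c) for c in coords]
        outside_inner = all(d >= inner * r for d, r in zip(dists, radii))
        inside_outer = any(d <= outer * r for d, r in zip(dists, radii))
        if outside_inner and inside_outer:
            points.append(p)
    return points


# Hypothetical single atom at the origin with an illustrative radius of 1.5.
grid = fcc_grid_in_shell([(0.0, 0.0, 0.0)], [1.5])
```

If the spacing is instead meant as the cubic cell edge, only the definition of `half_edge` would change.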

Test Set Selection

The test set is to be curated by XtalPi, and this section will be updated with more details after further conversations with them.

Parameters to Optimise

The parameters to optimise will depend on the composition of the training set; rough estimates are presented below. This section will be updated after the training set has been selected.

These estimates assume that the training set will contain only systems composed of carbon, hydrogen, and oxygen.

VdW Interaction Parameters

There will likely be 18 VdW parameters to optimise: an epsilon and a sigma parameter for each of the SMIRKS patterns below.

  • [#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]

  • [#1:1]-[#6X3](~[#7,#8,#9,#16,#17,#35])~[#7,#8,#9,#16,#17,#35]

  • [#1:1]-[#8]

  • [#1:1]-[#6X4]

  • [#6:1]

  • [#6X4:1]

  • [#8:1]

  • [#8X2H0+0:1]

  • [#8X2H1+0:1]
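The parameter count follows directly from the list above, as the short Python sketch below shows by pairing each SMIRKS pattern with its two attributes. The variable names are illustrative only; how the parameters are actually flagged for optimisation depends on the optimiser's input format and is not shown here.

```python
# The nine VdW SMIRKS patterns listed above.
VDW_SMIRKS = [
    "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]",
    "[#1:1]-[#6X3](~[#7,#8,#9,#16,#17,#35])~[#7,#8,#9,#16,#17,#35]",
    "[#1:1]-[#8]",
    "[#1:1]-[#6X4]",
    "[#6:1]",
    "[#6X4:1]",
    "[#8:1]",
    "[#8X2H0+0:1]",
    "[#8X2H1+0:1]",
]

# One epsilon and one sigma per pattern gives the trainable VdW parameters.
VDW_PARAMETERS = [
    (smirks, attribute)
    for smirks in VDW_SMIRKS
    for attribute in ("epsilon", "sigma")
]

n_vdw_parameters = len(VDW_PARAMETERS)
```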

BCC Parameters

There will likely be a total of 40 BCC parameters to be optimised based upon the restrictions on the training set proposed above.
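For context, in the AM1BCC scheme each bond charge correction is added to one atom of a bond and subtracted from the other, so the total molecular charge is unchanged. The Python sketch below illustrates that bookkeeping only. The function name, atom indices, and numerical values are hypothetical, and the step that matches BCC types to bonds (e.g. via SMIRKS) is deliberately left out.

```python
def apply_bccs(am1_charges, bond_corrections):
    """Return AM1 charges with bond charge corrections applied.

    bond_corrections: iterable of (i, j, value); `value` is added to atom i
    and subtracted from atom j, which conserves the total charge.
    """
    charges = list(am1_charges)
    for i, j, value in bond_corrections:
        charges[i] += value
        charges[j] -= value
    return charges


# Hypothetical three-atom example; the numbers are made up for illustration.
am1 = [-0.20, 0.10, 0.10]
corrected = apply_bccs(am1, [(0, 1, 0.05), (0, 2, 0.05)])
```

Because every correction enters with equal and opposite signs, retraining the BCC values cannot change a molecule's net charge.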

Optimisation Strategy

This section will outline the general strategies to be employed for the optimisations.

The optimisations will span the BCCs and the LJ parameters. They will start from the latest Parsley force field (currently v1.2.0), which has been augmented with a new section containing the explicit AM1BCC parameters (as opposed to the current handler, which simply calls the OpenEye or RDKit implementation).
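The ESP contribution to such an optimisation can be pictured as a least-squares residual between the reference QM ESP and the ESP produced by the point charges. The Python sketch below is a minimal illustration, assuming atomic units (charges in e, distances in bohr, so the Coulomb constant is 1) and an unweighted sum of squares; the function names are hypothetical and this is not the ForceBalance target itself.

```python
import math


def esp_from_charges(charges, atom_coords, grid_points):
    """Coulomb ESP of point charges at each grid point (atomic units)."""
    return [
        sum(q / math.dist(point, atom) for q, atom in zip(charges, atom_coords))
        for point in grid_points
    ]


def esp_objective(charges, atom_coords, grid_points, reference_esp):
    """Unweighted sum of squared differences to the reference ESP."""
    model_esp = esp_from_charges(charges, atom_coords, grid_points)
    return sum((m - ref) ** 2 for m, ref in zip(model_esp, reference_esp))


# Hypothetical single unit charge at the origin probed at two grid points.
atoms = [(0.0, 0.0, 0.0)]
grid = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
model = esp_from_charges([1.0], atoms, grid)
residual = esp_objective([1.0], atoms, grid, [0.5, 0.25])
```

Since the charges are linear in the BCC values, the model ESP is also linear in them, which keeps this part of the objective inexpensive to evaluate compared with the physical property targets.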

The TIP3P force field parameters will be used for any aqueous mixtures. It is not currently expected that the aqueous parameters will be re-optimised.

Required Infrastructure

Several new pieces of infrastructure will need to be constructed in order to facilitate this study:

Task | Progress | Links | Assignee

The current AM1BCC parameters will need to be ported to the SMIRNOFF specification, and a new ParameterHandler will need to be created to | COMPLETE | | @Simon Boothroyd

An automated way to reproducibly compute ESPs will need to be constructed if this does not already exist. | COMPLETE | | @Simon Boothroyd

ForceBalance will need to be extended with a new target which can train the BCC parameters against the ESP data (possibly consumed from QCArchive?) if this is not already possible. | COMPLETE | | @Simon Boothroyd

The gradient calculations in OpenFF Evaluator need to be validated for gradients computed with respect to BCC parameters. | NOT STARTED | |

References

[1] A Fixed-Charge Model for Alcohol Polarization in the Condensed Phase, and Its Role in Small Molecule Hydration

[2] Non-bonded force field model with advanced restrained electrostatic potential charges (RESP2)

[3] Fast, efficient generation of high‐quality atomic charges. AM1‐BCC model: I. Method