Overarching Question

Does co-optimising the non-bonded VdW and AM1BCC electrostatic force field parameters against physical property data of mixtures and electrostatic potentials improve the force field's performance on chemical potential and similar properties (e.g. solvation free energies, log P / log D)?

External Collaborators

XtalPi Inc.

Due date

 

Key outcomes

  • Develop a SMIRNOFF parameter handler which directly encodes the BCCs using SMIRKS patterns.

  • Develop the infrastructure for generating ESP data for a diverse set of molecules, and retraining BCCs against this.

  • Perform the first OpenFF optimisations of both the VdW and electrostatic non-bonded interactions.
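As a minimal sketch of what a handler that directly encodes the BCCs might compute, the snippet below adds bond charge corrections to a set of AM1 charges. The SMIRKS strings, the increment values, the sign convention (the increment is added to the first tagged atom and subtracted from the second), and the pre-computed list of matches are all illustrative assumptions, not the planned handler or fitted parameters. SMIRKS matching itself is assumed to happen elsewhere.

```python
# Hypothetical BCC table: SMIRKS pattern -> charge increment (elementary charge).
# Patterns and values are placeholders for illustration only.
BCC_TABLE = {
    "[#6X4:1]-[#8X2H1:2]": 0.05,
    "[#8X2H1:1]-[#1:2]": -0.10,
}


def apply_bccs(am1_charges, matches, bcc_table=BCC_TABLE):
    """Return AM1 charges with bond charge corrections applied.

    am1_charges: per-atom AM1 charges.
    matches: (pattern, i, j) tuples; atom i is the atom tagged :1 and
        atom j the atom tagged :2 in the matched pattern (assumed convention).
    """
    charges = list(am1_charges)
    for pattern, i, j in matches:
        increment = bcc_table[pattern]
        # Equal and opposite shifts keep the total molecular charge unchanged.
        charges[i] += increment
        charges[j] -= increment
    return charges


# Toy C-O-H fragment with made-up AM1 charges.
am1 = [0.10, -0.40, 0.30]
matches = [("[#6X4:1]-[#8X2H1:2]", 0, 1), ("[#8X2H1:1]-[#1:2]", 1, 2)]
corrected = apply_bccs(am1, matches)
```

Because each correction moves charge along a bond, the net charge of the molecule is preserved regardless of the increment values, which is the property the optimiser can rely on when the increments are varied.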

Status

STARTED

...

This study will explore whether co-optimising the non-bonded VdW interactions and the electrostatic interactions (implemented using the AM1BCC method) against both physical property data of mixtures and electrostatic potentials (electrostatic interactions only) will improve a force field's ability to reproduce experimental properties related to chemical potentials, such as solvation free energies, relative solubilities, and partition coefficients. This work is in part inspired by the work of Fennell et al. [1].

...

As was done with the Binary Mixture Data Feasibility Study, the training set (and potentially the test set) is expected to be limited to systems measured for molecules composed only of carbon, oxygen, and hydrogen. It is as yet unclear whether there is enough data for nitrogen-containing compounds to construct a meaningful, well-balanced training set that includes them. Further, including nitrogen would significantly expand the number of BCCs which need to be re-trained (it would add five additional BCC atom types, which in turn would introduce many more BCC types). Not only would this increase the difficulty of converging the optimisation, it would also require building a much larger training set for which the data likely isn’t available.
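The element restriction described above could be applied as a simple filter over candidate systems. The sketch below assumes each system is represented as a tuple of molecular formula strings; that representation and the function names are hypothetical and chosen only to keep the example self-contained.

```python
import re

ALLOWED_ELEMENTS = {"C", "H", "O"}


def elements_in_formula(formula):
    """Return the set of element symbols in a simple formula such as 'C2H6O'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def is_cho_only(formula):
    """True if the formula contains only carbon, hydrogen, and oxygen."""
    return elements_in_formula(formula) <= ALLOWED_ELEMENTS


def filter_systems(systems):
    """Keep systems in which every component is composed only of C, H, and O."""
    return [s for s in systems if all(is_cho_only(f) for f in s)]


candidates = [("C2H6O", "H2O"), ("C5H5N", "C6H6"), ("CH3Cl",)]
kept = filter_systems(candidates)
```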

Experimental Training Set Selection

The experimental data component of the training set is likely to be composed of enthalpy of mixing data points measured for binary systems, and density data points measured for both pure and binary systems. It will be selected so as to include a range of chemical functionalities, including alcohols, alkanes, alkenes, aromatics, esters, ketones, and ethers.

The training set may additionally include aqueous mixtures, depending on data availability.

The QM component of the training set will comprise ESP data for about ? compounds, including all those in the experimental data set along with additional compounds that contain the same atom types as the experimental compounds. Calculation of the ESPs is

More details will be added to this section once the training set selection is in progress.

QM ESP Training Set Selection and Generation

In general, the ESP data will be generated by the protocol proposed in the original AM1BCC publication as closely as possible [3].

Conformer Selection: The current plan is to employ the OpenEye ELF10 implementation to generate reasonable conformers to include in the training set. This should produce conformers which do not contain strong intramolecular electrostatic interactions. Roughly 10(???) conformers will be generated for each molecule in the training set where possible.
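The idea behind this kind of selection can be illustrated with a rough sketch: rank conformers by a Coulomb-like energy computed from the absolute values of the partial charges, and keep only the least-interacting ones. This is not OpenEye's implementation. It ignores bonded exclusions and any diversity-based pruning, and the function names, the `keep_fraction` default, and the input layout are assumptions made for illustration.

```python
import math
from itertools import combinations


def abs_charge_coulomb(coords, charges):
    """Coulomb-like sum over atom pairs using |q_i| * |q_j| / r_ij."""
    energy = 0.0
    for (i, ri), (j, rj) in combinations(enumerate(coords), 2):
        energy += abs(charges[i]) * abs(charges[j]) / math.dist(ri, rj)
    return energy


def select_conformers(conformers, charges, keep_fraction=0.02, max_conformers=10):
    """Return the lowest-energy conformers, at least one and at most max_conformers."""
    ranked = sorted(conformers, key=lambda c: abs_charge_coulomb(c, charges))
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:min(n_keep, max_conformers)]


# Toy two-site system: a compact and an extended arrangement.
charges = [0.5, -0.5]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
selected = select_conformers([compact, extended], charges, keep_fraction=0.5)
```

Using absolute charges penalises any close contact between polar groups, so conformers stabilised by strong internal electrostatics rank last rather than first.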

Level of Theory: Initially the ESPs will be generated at both the HF/6-31G* level of theory and at PW6B95/aug-cc-pV(D+d)Z (the level of theory suggested in the RESP2 publication [2]). This will allow us to explore, to some extent, whether including the physical property data begins to reproduce some of the overpolarisation of HF but in a more ‘correct’ way.

ESP Generation: If possible, the ESPs should be generated using the existing QC infrastructure.

The ESPs will be generated on an FCC grid, likely with a spacing close to 0.5 and within a 1.4-2.0 shell around the molecule (although these settings should be checked to see whether a coarser grid could be used instead without a large decrease in accuracy).
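A minimal sketch of such a grid is given below. The text above does not state units, so the sketch assumes the spacing is the FCC nearest-neighbour distance in ångströms and that the 1.4-2.0 shell is expressed in multiples of each atom's VdW radius; both are assumptions, as are the function name and input layout.

```python
import math


def fcc_shell_grid(atoms, spacing=0.5, inner=1.4, outer=2.0):
    """Generate FCC lattice points lying in a shell around a molecule.

    atoms: list of ((x, y, z), vdw_radius) tuples.
    A point is kept if it lies outside inner * radius of every atom and
    within outer * radius of at least one atom.
    """
    # Integer triples with an even index sum, scaled by spacing / sqrt(2),
    # form an FCC lattice whose nearest-neighbour distance equals `spacing`.
    step = spacing / math.sqrt(2.0)
    reach = max(outer * radius for _, radius in atoms)
    lo = [min(pos[d] for pos, _ in atoms) - reach for d in range(3)]
    hi = [max(pos[d] for pos, _ in atoms) + reach for d in range(3)]
    counts = [math.ceil((hi[d] - lo[d]) / step) for d in range(3)]

    points = []
    for i in range(counts[0] + 1):
        for j in range(counts[1] + 1):
            for k in range(counts[2] + 1):
                if (i + j + k) % 2:
                    continue
                point = (lo[0] + i * step, lo[1] + j * step, lo[2] + k * step)
                nearest = min(math.dist(point, pos) / radius for pos, radius in atoms)
                if inner <= nearest <= outer:
                    points.append(point)
    return points


# Single atom of radius 1.0 at the origin (toy input).
grid = fcc_shell_grid([((0.0, 0.0, 0.0), 1.0)])
```

The number of points scales roughly with the inverse cube of the spacing, which is why checking whether a coarser grid loses accuracy is worthwhile before generating data for the full training set.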

Lee-Ping Wang and Hyesu Jang have a large amount of experience here which should be drawn upon.

Test Set Selection

The test set is to be curated by XtalPi, and this section will be updated with more details after further conversations with them.

...

The parameters to optimise will be dependent on the composition of the training set; however, presented below are rough estimates of the parameters which will be optimised. This section will be updated after the training set has been selected.

...

This section will outline the general strategies to be employed for the optimisations.

The optimisations will span the BCCs and the LJ parameters. They will start from the latest Parsley force field (currently v1.2.0), which has been augmented with a new section containing the explicit AM1BCC parameters (as opposed to the current handler, which simply calls the OpenEye or RDKit implementation).

The TIP3P force field parameters will be used for any aqueous mixtures. It is not currently expected that the aqueous parameters will be re-optimised.

Optimisation Against ESP Data

In general, the optimisation against ESP data will follow the protocol proposed in the original AM1BCC publication as closely as possible [3].
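As a rough illustration of what fitting against ESP data involves, the sketch below evaluates the ESP produced by a set of point charges on the grid and the sum of squared deviations from a reference (QM) ESP. It assumes atomic units throughout and an unweighted least-squares objective; any weighting, restraints, or other details of the original protocol are not reproduced, and the function names are hypothetical.

```python
import math


def esp_from_charges(charges, atom_coords, grid_points):
    """ESP at each grid point from point charges: V(r) = sum_i q_i / |r - r_i|."""
    return [
        sum(q / math.dist(point, pos) for q, pos in zip(charges, atom_coords))
        for point in grid_points
    ]


def esp_objective(charges, atom_coords, grid_points, reference_esp):
    """Sum of squared differences between the charge-model ESP and the reference ESP."""
    model_esp = esp_from_charges(charges, atom_coords, grid_points)
    return sum((m - r) ** 2 for m, r in zip(model_esp, reference_esp))


# Toy two-site system with a handful of grid points (made-up coordinates).
atom_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
grid_points = [(0.0, 3.0, 0.0), (2.0, 3.0, 0.0), (1.0, 0.0, 3.0), (-3.0, 0.0, 0.0)]
true_charges = [0.4, -0.4]
reference_esp = esp_from_charges(true_charges, atom_coords, grid_points)

exact = esp_objective(true_charges, atom_coords, grid_points, reference_esp)
perturbed = esp_objective([0.3, -0.3], atom_coords, grid_points, reference_esp)
```

In the study itself the charges would be the AM1 charges plus the BCCs being trained, so the objective becomes a function of the BCC values summed over all molecules and conformers in the QM training set.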

...

Required Infrastructure

Several new pieces of infrastructure will need to be constructed in order to facilitate this study:

...