Electrostatics refitting plan for Rosemary

[NOTE] Dummy plan to kick off discussion

  1. Use openff-recharge to refit BCCs and virtual site charges to a set of small molecules.

    1. Should be whatever data set Simon has prepared that is represented in QM Archive. More molecules might need to be added.

      1. The list of molecules is here: (insert location where stored)

    2. Underlying charge model is ELF10

      1. Potentially graph charges fit to ELF10, depending on progress of the graph charge work.

    3. The BCCs selected (other than virtual sites) will be the original Bayly BCCs; no effort to come up with new BCCs for now.

      1. These are already converted to SMIRKS format.

    4. Use gradient descent to fit to either the ESP or the electric field (which is best, based on @Simon Boothroyd's work so far?).

      1. Gradient descent rather than least squares, in order to change the virtual site geometries (within the constraints of the virtual site type, i.e. halogen sites stay on the bond, but can change distance along the bond).

      2. numpy or pyro?

      3. Data should be relatively constraining, so suggest no regularization.

    5. Which atomic environments have virtual sites will be selected using Cole lab analysis (10.26434/chemrxiv-2021-hsf8l, https://pubs.acs.org/doi/10.1021/acs.jcim.8b00767), with a pass from industry partners.

      1. For halogens, one site.

      2. Lone pair virtual sites for oxygen, nitrogen, and sulfur (1 or 2, depending on hybridization; check with Cole group to verify).

      3. There should be some measure of the performance hit upon introducing virtual sites, which should be small but noticeable (up to 10%?)

      4. Will need to verify before release that the toolkit can handle outputs to GROMACS, AMBER and CHARMM, and that all virtual sites are defined according to the SMIRNOFF spec.

  2. Perform optimizations of the BCCs and virtual site magnitudes, co-optimizing with LJ terms, using ForceBalance to fit condensed phase fluid properties, the same ones used for the Sage refit.

    1. Could potentially optimize virtual site geometries with ForceBalance, but likely to involve some weird redundancies and nonlinearities; thus safer to only reoptimize variables that are the same type/magnitude.

    2. Should consider some weak regularization to avoid having the BCCs change too much from the ones determined during the electrostatic fit.

    3. Could consider using Owen’s surrogate modeling instead of ForceBalance, which appears to be doing better, but almost certainly better to leave that to Thyme, as too much work remains to scale it to larger data sets.
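
To make the gradient descent step in item 1.4 concrete for discussion, here is a minimal NumPy sketch. It is a toy under stated assumptions, not the openff-recharge workflow: the two-atom geometry, the grid, the parameter names (b, v, d), the distance bounds and the "reference" ESP are all hypothetical and synthetic, and the Coulomb constant is dropped. It fits one BCC (b), one virtual site charge increment (v) and the distance of a halogen site along the bond (d) to ESP values, keeping the site on the bond axis by construction and using projected gradient descent with a simple backtracking step size. Because the distance enters the ESP nonlinearly, the fit is not guaranteed to recover the generating parameters.

```python
import numpy as np

# Hypothetical toy geometry (angstrom): atom A at the origin, halogen X on the x axis.
R_A = np.array([0.0, 0.0, 0.0])
R_X = np.array([1.8, 0.0, 0.0])
BOND_DIR = (R_X - R_A) / np.linalg.norm(R_X - R_A)  # the site stays on this axis
D_MIN, D_MAX = 0.1, 2.0  # assumed bounds on the site distance from X


def make_grid(n_points=200, seed=0):
    """Random points on shells 5-7 angstrom from the bond midpoint."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(n_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(5.0, 7.0, size=(n_points, 1))
    return 0.5 * (R_A + R_X) + radii * directions


def inverse_distances(grid, d):
    """1/r from each grid point to A, X and the site, plus site-to-grid vectors."""
    diff_v = grid - (R_X + d * BOND_DIR)
    inv_a = 1.0 / np.linalg.norm(grid - R_A, axis=1)
    inv_x = 1.0 / np.linalg.norm(grid - R_X, axis=1)
    inv_v = 1.0 / np.linalg.norm(diff_v, axis=1)
    return inv_a, inv_x, inv_v, diff_v


def esp(theta, grid):
    b, v, d = theta
    inv_a, inv_x, inv_v, _ = inverse_distances(grid, d)
    # q_A = +b, q_X = -b + v, q_V = -v, so the net charge is always zero.
    return b * (inv_a - inv_x) + v * (inv_x - inv_v)


def loss_and_grad(theta, grid, esp_ref):
    """Mean squared ESP residual and its analytic gradient in (b, v, d)."""
    b, v, d = theta
    inv_a, inv_x, inv_v, diff_v = inverse_distances(grid, d)
    residual = b * (inv_a - inv_x) + v * (inv_x - inv_v) - esp_ref
    jac = np.stack(
        [inv_a - inv_x, inv_x - inv_v, -v * (diff_v @ BOND_DIR) * inv_v**3],
        axis=1,
    )
    return np.mean(residual**2), 2.0 * jac.T @ residual / len(residual)


def project(theta):
    """Clip the site distance to its allowed range; charges are unconstrained."""
    out = theta.copy()
    out[2] = np.clip(out[2], D_MIN, D_MAX)
    return out


def fit(theta0, grid, esp_ref, n_iter=500):
    theta = project(np.asarray(theta0, dtype=float))
    step = 1.0
    loss, grad = loss_and_grad(theta, grid, esp_ref)
    losses = [loss]
    for _ in range(n_iter):
        # Backtracking: halve the step until the projected update lowers the loss.
        while step > 1e-12:
            trial = project(theta - step * grad)
            trial_loss, trial_grad = loss_and_grad(trial, grid, esp_ref)
            if trial_loss < loss:
                break
            step *= 0.5
        else:
            break  # no improving step found
        theta, loss, grad = trial, trial_loss, trial_grad
        losses.append(loss)
        step *= 2.0  # try a larger step next time
    return theta, np.array(losses)


grid = make_grid()
theta_true = np.array([0.15, 0.05, 1.0])  # synthetic parameters, stand-in for QM data
esp_ref = esp(theta_true, grid)
theta_fit, losses = fit([0.05, 0.02, 0.5], grid, esp_ref)
b_fit, v_fit, d_fit = theta_fit
charges = np.array([b_fit, -b_fit + v_fit, -v_fit])
print(f"loss {losses[0]:.3e} -> {losses[-1]:.3e}, site distance d = {d_fit:.3f}")
```

With the geometry fixed, the model is linear in b and v and ordinary least squares would do; the distance d is what makes the problem nonlinear, which is the motivation given in 1.4.1 for gradient descent. A weak regularization term such as the one suggested in 2.2 could be added to the loss as a quadratic penalty on the change in b, but that is left out here.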