2020-09-04 Charge Model Meeting notes

Date

Sep 4, 2020

Participants

  • @Simon Boothroyd

  • @Owen Madin

  • @David Mobley

  • @Christopher Bayly

  • @Michael Schauperl

  • Paul Nerenberg

Discussion topics

Time

Item

Presenter

Notes

5min

Current infrastructure (+ limitations)

Simon

  • Willa Wang will be joining the project to follow up on M. Schauperl’s work

10min

Overview of plan for refitting of LJ with BCC parameters



  • Simon: All infrastructure is in place to optimize BCCs against ESP data/ mean field data

  • John: is the goal to use ESPs as a regularizer, or dual target with phys prop?

  • Simon: Not entirely sure, but hoping to use dual target especially when we co-optimize

  • Bayly: We didn’t regularize BCCs. What prevents overparameterization is a wide train set (in terms of BCCs). When we just fit to ESP, there are some things we just can’t get right. Not regularization, but a posteriori fitting to make it work

  • Gilson: Off-site charges could help

  • Bayly: Strongly agree. Fitting to many targets across one BCC is the regularization, along with some phys prop

  • Simon: We can do the co-optimization in ForceBalance, but the cool thing is that the math is fast enough to use a Bayesian method for BCC calculations (Pyro). Owen will take this on in the future.

  • John: I can put you in touch with a Pyro developer
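A minimal sketch of why the BCC fit against ESP data is cheap, under stated assumptions: charges are modelled as q = base_charges + assignment @ bccs, the ESP at each grid point is a sum of q_j / r_ij in atomic units, and the fit is plain unregularized least squares. The function names, the assignment matrix, and the toy diatomic are hypothetical illustrations, not the recharge API.

```python
import numpy as np


def inverse_distance_matrix(grid, coords):
    """Return A with A[i, j] = 1 / |grid_i - coords_j| (atomic units)."""
    diff = grid[:, None, :] - coords[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=-1)


def fit_bccs(grid, coords, base_charges, assignment, esp_ref):
    """Least-squares BCC values for q = base_charges + assignment @ bccs."""
    a = inverse_distance_matrix(grid, coords)
    design = a @ assignment                 # ESP response to each BCC parameter
    residual = esp_ref - a @ base_charges   # what the base charges leave unexplained
    bccs, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return bccs


# Toy diatomic (hypothetical): one BCC moves charge from atom 1 to atom 0.
rng = np.random.default_rng(0)
coords = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
directions = rng.normal(size=(50, 3))
# Grid points on a sphere of radius 5, well away from both atoms.
grid = 5.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)
base_charges = np.array([0.1, -0.1])
assignment = np.array([[1.0], [-1.0]])
true_bcc = np.array([0.3])

# Synthetic reference ESP generated from the known BCC value.
esp_ref = inverse_distance_matrix(grid, coords) @ (base_charges + assignment @ true_bcc)
fitted_bcc = fit_bccs(grid, coords, base_charges, assignment, esp_ref)
```

Because the model is linear in the BCC values, each fit is a single small linear solve, which is what makes sampling-based (Bayesian) approaches plausible.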

 

Questions?

  • Level of theory + implicit solvent model (same as RESP2?)

  • Include both PCM QM + gas phase QM targets?

  • ESP vs Electric Field (or both? with a split training set?)

  • Strategy for refit

    • Initially refit only BCC parameters to QM data, then refit LJ separately

    • Refit both BCC and LJ at the same time

    • Refit BCC, then co-optimise BCC + LJ (perhaps with heavier priors on BCC parameters).

 

  • Idea for study with XtalPi is to train LJ and BCCs against phys prop (EoM, density) and ESP data, test against solubility data from Minnesota database

  • Simon: Our initial plan for ESP level-of-theory is to copy what was done in RESP2. Reasonable?

  • Schauperl: Looked a lot into level of theory, not as much into implicit solvent model. Not much consensus on what is best. Should take a look. PSI4 has limited options for the PCM model

  • Paul: My view on level of theory is that RESP2 is fine, and a reasonable functional and basis set with diffuse functions is OK

  • Bayly: Second what Paul said. Paper that Schauperl, Gilson, Paul put out shows this.

  • Gilson: That’s for reference ESPs. What about the question of whether we could replace AM1?

  • Chodera: Portability is important!

  • Simon: PM3 not available in OpenEye

  • Bayly: Problem with PM3 is that it has more limited parameters for unusual atoms. OK for strictly regular organic compounds but breaks down for more exotic ones

  • Paul: HF-3c or (sort of) semi-empirical methods?

  • Bayly: I’m excited about v-charge/graph-based methods. Want to suppress conformational variance. We would like a graph-based method that doesn’t depend on bond order

  • Gilson: V-charge can average over alternate resonance forms. There are some systems where there are too many resonance forms

  • Chodera: We’re working on a GCN based version of something similar.

  • Owen: We need to think about what is ready for next release, but also future. Probably AM1 next release

  • Chodera: I think we do AM1 for now but replace with a graph net that can reproduce AM1 (and more in the future)

  • Schauperl: Small corrections allow you to fine tune a lot more. Small BCCs better.

  • Bayly: Charge model needs to be able to come up with charges for 50 pharmaceutical molecules with ~10 rotatable bonds in an hour. AM1 can, HF-3c can’t

  • Gilson: Fragmentation possible for speed-up?

  • Schauperl: You lose info with fragmentation. Would like to come back to question of ESPs vs. electric fields, conformations, etc.

  • Simon: Conformer generation via OpenEye’s OMEGA and ELF10 to generate 10k+ conformers and prune down to the best 5-10.

  • Schauperl: Sounds good unless someone else has a better idea

  • Chodera: OpenEye or RDKit for AM1?

  • Simon: OpenEye, but we can test

  • Wagner: Dave Cerrutti’s MGCX?? is an alternative with nice features

  • Simon: OpenEye is proven, and I don’t think there is an openness issue.

  • Chodera: Should sync up with settings so that we can use the same generation for GCNs

  • Simon: All code available in recharge repo: ADD

  • Simon: ESP vs. electric field. Hyesu saw that EF is better for polar molecules, worse for non-polar. Can split training set so non-polar is trained against ESP, polar against EF

  • Bayly: Never tried splitting out, but found that fitting to only EF was always better than any combo including ESPs. Least squares tends to fit the middle of the data better; with charges you want to fit the extremes well. Fits to ESP tend to be under-polarized for this reason. The more polar areas are better treated in EF. Not sure about separating polar/non-polar; problem is that overall non-polar molecules can have very polar areas. For example pyrazine: total dipole 0, but very polar around the nitrogens. Fitting only to EF is always advantageous in terms of global fit.

  • Simon: If we want to throw these questions in, it’s very simple in terms of human work.

  • Chodera: It may be different fitting BCCs vs a graph in the future, but we’ve seen that fitting to some forces is helpful.

  • Simon: Other thing is whether to include gas phase. RESP2 got best charges from average between gas/continuum charges. Is there any benefit to including gas phase?

  • Bayly: If you’re gonna pick one dielectric constant, use 4 instead of 80. If you fix it to one, you have to repeat the calculation; with RESP2 you can tune. A circumstance where you might want to tune is for some of the systematically under/over polarized groups in my compound. It would only be twice the compute time to

  • Simon: I wonder if you could just add two targets (one target is gas-phase, one solvent) and then weight those effects.

  • Gilson: I’m not sure what the benefit here would be vs Schauperl method.

  • Simon: When including physical properties, thought it might be simpler to put everything in together.

  • Mobley: Idea is make both sets of fields the fitting target and then let the averaging happen as part of fitting

  • Gilson: There’s no chance of getting to 0 objective function with split target

  • Mobley: Why not try all three?

  • Bayly: Why don’t we start with RESP2, and get a good model out there. Then we can address these further scientific questions.

  • Gilson: In discussing this RESP2, is this what we’re fitting the AM1-BCC to or something else within openff?

  • Mobley: We’re always going to have something like AM1-BCC. If we can show in the future that RESP2 has enough benefits over AM1-BCC, maybe we can have a “higher-quality” forcefield for use with RESP2

  • Schauperl: The advantage of the RESP2 method is that I can tune alongside the LJ parameters.

  • Gilson: Paradoxically, RESP2 is a cheap optimization because there’s only the delta parameter to tune.

  • Bayly: We’re using RESP2 to get residual field vs AM1 - then optimize that with BCCs

  • Simon: Idea for co-opt is to do optimization against RESP2 for BCCs, then optimize both BCCs and LJ vs. condensed phase

  • Bayly: Huge problem with linear dependencies. Very interested and afraid to see what happens. When you increase the LJ radius it affects charge, so densities and others could get worse.

  • Paul: Key takeaway from Schauperl paper is that you can readjust LJ to fit all sorts of delta parameters.

  • Gilson: Can you tell with torsions if you’ve botched your LJ?

  • Bayly: Densities should definitely be botched. It will also affect torsions and 1-4s. There’s a tremendous interaction between these factors.

  • Schauperl: Lots of freedom in BCCs

  • Gilson: What’s the plan for # of BCCs?

  • Simon: I think that we just have to try things and see how it goes. I think the biggest blowup of BCC params comes from nitrogen; permutation effect. So nitrogens are where we can cut back on BCC parameters. We could start fresh on the nitrogens or use Bayesian approaches. The other thing that could help is using bond order interpolation to reduce the number of BCCs. For both, we don’t have a straightforward path on how to do this; we probably just need to try something.

  • Bayly: We didn’t start with 300+, we ended up there. Gradients will be helpful for finding bifurcation points. We can algorithmically identify the bifurcation points. Can use ChemPer here. In a least squares world can use gradients to split BCCs

  • Simon: I think that’s what we’re thinking about as well.

  • Bayly: We want to start simple and let the data drive.

  • Simon: Is there some way we can use gradients to drive an RJMC method or something similar?

  • Simon: Should we have a standing meeting for this?

  • Gilson: Let’s do it for a few weeks and see where it goes.
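On the gas-phase vs. implicit-solvent question raised earlier in the discussion, a minimal sketch follows. It assumes a linear charge model and that both targets share the same geometry and grid, so they share one design matrix. Under those assumptions, fitting two weighted targets at once gives the same parameters as fitting a single target mixed with delta = w_solvent / (w_gas + w_solvent), and the combined objective stays above zero whenever the two targets differ. All names and the random data are hypothetical.

```python
import numpy as np


def fit_two_targets(design, target_gas, target_solvent, w_gas, w_solvent):
    """Weighted least squares against both targets, stacking rows scaled by sqrt(weight)."""
    stacked_design = np.vstack([np.sqrt(w_gas) * design, np.sqrt(w_solvent) * design])
    stacked_target = np.concatenate(
        [np.sqrt(w_gas) * target_gas, np.sqrt(w_solvent) * target_solvent]
    )
    params, *_ = np.linalg.lstsq(stacked_design, stacked_target, rcond=None)
    return params


def fit_mixed_target(design, target_gas, target_solvent, delta):
    """Least squares against one target mixed as (1 - delta) * gas + delta * solvent."""
    mixed = (1.0 - delta) * target_gas + delta * target_solvent
    params, *_ = np.linalg.lstsq(design, mixed, rcond=None)
    return params


def two_target_objective(design, params, target_gas, target_solvent, w_gas, w_solvent):
    """Weighted sum of squared residuals over both targets."""
    r_gas = design @ params - target_gas
    r_solvent = design @ params - target_solvent
    return w_gas * (r_gas @ r_gas) + w_solvent * (r_solvent @ r_solvent)


# Synthetic data (hypothetical): 20 grid points, 3 parameters, two distinct targets.
rng = np.random.default_rng(1)
design = rng.normal(size=(20, 3))
target_gas = rng.normal(size=20)
target_solvent = rng.normal(size=20)

stacked = fit_two_targets(design, target_gas, target_solvent, w_gas=0.4, w_solvent=0.6)
mixed = fit_mixed_target(design, target_gas, target_solvent, delta=0.6)
objective = two_target_objective(design, stacked, target_gas, target_solvent, 0.4, 0.6)
```

The equivalence only holds while everything is linear and the design matrices match; once physical property targets or LJ parameters enter the objective, the two formulations can diverge.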

Action items

Set up weekly meetings for the next few weeks @Simon Boothroyd

Decisions