2021-03-04 Force Field Release meeting notes

Date

Mar 4, 2021

Participants

@Hyesu Jang
@Jeffrey Wagner
Alberto Gobbi
@Pavan Behara
@David Mobley
@Christopher Bayly
@Trevor Gokey
@Lee-Ping Wang
@Simon Boothroyd
@Jessica Maat (Deactivated)
@Joshua Horton
@Owen Madin

Discussion topics

Time	Item	Presenter	Notes

Time

Item

Presenter

Notes

WBO update

@Pavan Behara

Slides https://docs.google.com/presentation/d/16rl9A0813CAV4WrzRXdLT3jE0z20RkB7wXt0r0e2CqQ/edit?usp=sharing
TIG5 slide
- AG – The top left molecule (QCP-1762411) has some big discontinuities in energy. What’s happening there?
- CBy – I could see there being both steric and electrostatic complexity/clashes
- AG – It seems like there’s a different family of conformations here.
- LPW – When we generate the QM torsion profiles, those can have multiple conformations (of the surrounding atoms).
- LPW – when we do a fit to the torsion profile, we freeze the 4 atoms involved in the torsion and relax evertthing else. It’s this relaxation that might allow the conformations to move far from the QM. I wonder if the MM torsion profiles should allow more limited relaxation.
  - DM – So, maybe we should use small restraints on all atoms, not just the torsion?
  - LPW – My recommendation would be to constrain the angle of all rotatable bonds. The reason we don’t do that already is because geomeTRIC’s optimizations are expensive relative to OpenMM minimization. But in a case like this, it would be worthwhile to do geomeTRIC optimizations with all rotatable bonds constrained.
  - AG – Agree
  - CBy – Agree. What about constraining invertable nitrogen?
  - LPW – Yes, we should constrain pyramidal nitrogen
- It is worth mentioning that the cusps may or may not be an unphysical feature of the single-reference QM calculations. I know that for alkenes, single reference methods like DFT give a cusp whereas multi-reference methods like CASPT2 give a smooth barrier. Although it might be different for amide / urea pyramidal centers.
- CBy – Interestingly, the top leftmost structure and the second in the right column are exactly the same structure, except for the methyl.
- AG – Could we compare BOTH the energy and the RMSD?
  - PB – Yes
- Column 1, row 3
  - (general) – It’s weird because that substituent shouldn’t affect the WBO of the central bond.
- LPW – I’ve been suspecting that nonbonded interactions (eg 1-6, 1-7 interactions) and intramolecular weak interactions could be giving rise to noise that overwhelms things like cis/trans amide energy differences. So I wonder how much this would improve if we allows nonbonded parameters to vary.
  - CBy – Agree. We want the torsion parameters to capture the conjugation effects. In some cases we see this.
- CBy – LPW asked whether we’re seeing improvement because 1) we added WBO dependence, or if it’s just 2) developing a better torsion
- PB – This is all results on the TEST set, not the TRAINING set.
- CBy – Not sure that these span the “dynamic range” of torsions that could match this pattern
  - DM – Agree. Also we want to test a WBO-interpolated parameter versus the pattern with explicit single, aromatic, and double bonds.
  - LPW – Agree. I’d like to see that data.
- CBy – To review, we should
  - Decision – We should restrain other torsions in fitting
    - LPW – Previously we had used harmonic restraints on atom positions. But JH had shown that there were pathological problems caused by these restraints. I don’t recall if we took action on this finding.
      - TG – I looked into this in the optgeo targets, but I found that things would get stuck in a minimum. So I saw some improvement but I don’t think it’s a global fix.
    - CBy – So, for torsion fitting in Sage, we want to avoid catastrophic problems, what should our general strategy for torsion fitting be?
  - Decision – We will make RMSD plots for torsion fits
  - (CBy) Throw out strong sterics
    - PB – The training and test set for this presentation already DOES throw out strong sterics. If the steric energy in these plots is above 3.4 kcal/mol, I throw them out.
    - AG – It seems like some of these are probably just below the threshold. The plot in col 1, row 2 has some electrostatic interactions that may affect the force field.
    - DM – So we’ll want to revisit this.
  - JW –Our choice of follow-ups may be informed by how we’re treating this study – Is it to inform a “go/no-go” decision (in which case we should have a challenging test set), or to help us better understand the fundamentals of fitting a WBO-based FF (in which case it’s fine to just use “clean” molecules
    - SB – A bit of both. The understanding of the fundamentals will be valuable either way. And in general we’ll want to split our test set between “clean” and “congested” (which will help us find troublesome vdW parameters)
(General) – This fitting seems to imply that TIG8 is unnecessary.
- CBy – Do the bottom-middle two structures motivate the creation of a nitro-specific torsion? This wouldn’t be WBO-based.
JW – One big question is whether we think that “fit to clean data and assume that weird sterics/electrostatics are just uniformly distributed random noise” is a true statement, or if our clean datasets will be systematically offset in one direction.

1 hr?

Release decision

@Simon Boothroyd

Need to make decision on release date/what science to include. Will decide in view of latest WBO data.

Slides can be found here

LPW – One way to investigate steric effects would be to pursue a strategy of automated fragmentation
LPW – When we were looking at the problem for amides, I suggested that it would be itneresting to take the molecule that you’re doing a torsiondrive with, then separate it into two molecule, which you could then use to do a QM calculation of interacton energies. Then you could get a “real” intramolecular interaction energy. However it would be really hard to make a sane fragmentation+capping scheme. I think this is what QUBEKit does
- JH – QUBEKit doesn’t do exactly this, it’s more of an analysis of electron density to get charges/nonbonded parameters.
SB – Slide 6 – I don’t think WBOs are ready for release
- JW – Agree. Even if PB’s talk had shown great results, I’d be thinking that we need data. We can keep working on it and include it in a subsequent release when they’re ready.
- CBy – On point 4 of SB’s last slide, I’d look at titrating in WBO parameters in a very limited scope once they show clear value.
- DM – Agree. We’re getting to the point where we see some signal in limited datasets. But it’s not ready for release. We should keep efforts going on WBO interpolation.
- CBy – On point 2, I wonder if there’s some low-hanging fruit on adding specialized non-interpolated torsions to Sage based on what we’ve found with WBOs.
  - SB + LPW + DM + JW – Agree
- SB – I’m also optimistic that we can clean up nonbonded parameters so that signal here is clearer. Right now we’re mostly guided by condensed-phase properties, but also some fitting of nonbonded terms to intramolecular energies will be helpful
- LPW – Distance-dependent attenuation of INTRAmolecular interactions. Right now we have things like 1-4 scaling, which are “topology-dependent”. Maybe something more based on distance would be helpful.
- CBy – I’m not sure that two nonbonded parameters are sufficient to get INTERmolecular and INTRAmolecular interactions right. We may need to have a third parameter for nonbonded interactions. The condensed phase properties are largely based on rmin and epsilon, but I think that we want to control BOTH well depth and curvature on the attractive and repulsive faces. So we may want to move away from the 12-6 potential. So we may want to consider soft-core potential for both charge penetration and vdW.
  - LPW – Agree
- SB – I wonder if it makes sense to consider ways forward on different functional forms. We have the infrastructure to start fitting these in openMM.
  - DM – Yes, we’ll want to give other simulation communities time to implement our functional forms.

Iodine basis set issue

@Trevor Gokey @Hyesu Jang

Bill Swope reported getting very different energies for iodine-containing molecules when using scf_type=df (our default) as opposed to direct, pk, out_of_core. These energy differences are large (40 hartrees/10,000 kcal/mol) and correspond to somewhat different geometries.
- PB reports that, for single point energies for for tri-iodide using different settings
  - scf_type=pk took about 11 sec
  - df took about 6 sec
  - direct took 60 sec
(This discussion will be open-ended. We’re also planning to consult with Ben Pritchard/Lee Ping/Daniel Smith at our next submission meeting)
BS – The carbon-iodine bond length is off by 0.8A (the angles are also pretty distorted).
LPW – I don’t think there’s an intrinsic problem with the basis set iself. It’s due to the use of density fitting, but dzvp isn’t suitable for this situation. Our current fitting method assumes the existence of effective core potentials for iodine, but they aren’t available. I propose that we make a basis set where we use dzvp up to a certain atomic number (around Ar or Ca), and after that we use def2-sv(p), which is the def2 basis set of the same size, and include effective core potentials. We don’t use this for everything because, in our testing, dzvp gave good accuracy for the molecules we were looking at
- TG – How about def2-tzvp?
- LPW – I’d be comfortable using def2-tzvp instead of def2-sv(p) because it’s generally safer, even if it’s more expensive.
- CBy – What would be the effect on modeling the sigma hole of using tzvp vs. sv(p)?
  - LPW – I don’t know off the top of my head. When we did our conformational studies, sv(p) was LESS accurate than dzvp, but tzvp was MORE accurate than dzvp.
  - CBy – Based on that, I agree
- TG – tzvp would model the diffusivity of heavier atoms.
  - LPW – Agree
- Decision – in the short term, let’s filter out all iodine containing molecules from our fitting/testing sets.
- Decision – In the medium term, let’s redo all our fitting/benchmarking sets with tzvp?
  - TG – I think we’ll want to resubmit everything, but I’m not sure what we’ll want to change. We’ll need to think about this more.
  - JW – Maybe one strategy would be to resubmit ALL of our fitting/benchmarking datasets with the higher level of theory, and if they aren’t done in time, just use the previous datasets and throw out everything with Iodine.
- Decision – We’ll add a few selenium-containing molecules to the QM datasets. (possibly also arsenic)
- CBy – Does the same problem affect bromine?
  - TG – No, I don’t think so
  - BS – I did experiments on this specifically and didn’t see problems with Br or Cl. So the problem is specifically that the basis set assumes effective core potentials for I, which aren’t present. I got good results if I did dzvp but turned off density fitting.
- LPW – The ECPs in our current basis start at rubidum.
- Will the same problem affect Br?
  - CBy – I expect they will. I’d like to check more closely on bromine.
  - BS – I agree in theory but didn’t see this problem in my experiments.
  - CBy – I’m also concerned that selenium will have the same problem as bromine. This will be useful with proteins, since they may have selenomethionines.
    - HJ – I can put selenium-containing molecules in the training/test sets.
      - DM + CBy– Agree
- DM – Does our QM benchmarking include Br- and I-containing molecules?
  - HJ – There are a few, but I can add more specifically to probe iodine and bromine
HJ – I looked into the effects of iodine in our current datasets/FF – Inspection of iodine containing molecules in 2nd gen. training set for valence parameter optimization
- LPW – One could argue that the impact of wrong iodine QM data was actually more widespread because it affected the fitting of the more general parameters.

2021-03-04 Force Field Release meeting notes

Date

Participants

Discussion topics

Action items

Decisions