2021-02-04 Force Field Release meeting notes

Date

Feb 4, 2021

Participants

  • @Hyesu Jang

  • @Christopher Bayly

  • @David Mobley

  • @Trevor Gokey

  • @Pavan Behara

  • @Owen Madin

  • @Jessica Maat (Deactivated)

  • @Joshua Horton

  • @Simon Boothroyd

  • @Jeffrey Wagner

Discussion topics

Time

Item

Presenter

Notes

10 min

hessian fitting

@Hyesu Jang

https://openforcefield.atlassian.net/l/c/PpMg11Wc

  • CBy – How are QM frequencies matched to MM frequencies? When looking at MAE plots, how is that computed for an outlier?

    • HJ – When I make an overlap matrix between QM and MM normal modes, there are cases where the overlap is large, but the frequency difference is really large.

    • CBy – It could be that these large outliers are just due to incorrect assignments.

    • DM – IIRC, the advantage of the hessian procedure is that there’s no matching step (whereas the vibrational frequency matching DOES require matching)

    • HJ – Yes, this is why I do hessian matching (It turns out this was coming up in Hyesu’s page above)

    • CBy – Oh, I see, so you already anticipated my issue above. But vibrational frequencies might be matched using eigenvector overlap

    • HJ – The vibfreq data I showed above does use eigenvectors

  • CBy – In terms of normalization, … When our FF gets a lot better, we should be able to fit directly to those frequencies. Though I wonder whether we’ll be able to get useful info from torsion angles. I wonder if we should use vibrational data only for bonds for now, since terms like torsions will be so convoluted that they’ll probably just add noise.

  • DM - We already use vibrational frequencies heavily.

  • CB - would prefer to rely much more on energies and gradients: if the eigenvectors aren’t related, then the eigenvalues won’t be either. I.e. if energies and gradients are bad, then second derivatives will be much worse. When we get to a point where the energies and gradients are closer, then second derivatives may be useful to help fine-tune.

  • DM - what should we do differently now?

  • CB - not 100% sure.

  • DM - excited about doing additional fitting experiments. 1) replace vib freq with internal coord hessian, 2) try moving away from either of these and only focus on energies + gradients.

  • CB - Hyesu showed that the internal hessian approach can separate and isolate components such as bond lengths, so there may be value in using it to fit bond lengths.
    May help with diagnosing where 1-4 interactions are too far off? The question is how this will help with the objective function, and how to use it to diagnose which valence or non-bonded contribution is causing friction.

  • JW - what experiments can we run to best explore whether we need to be refitting the bond lengths?

  • CB - Historically, been able to resolve issues with respect to bond lengths by not just fitting to single point frequency. Fitting to QM geometry / energies may have allowed bond lengths to artificially extend, but the upside was it made the other valence terms easier to fit.

  • SB – This seems like an interesting direction of study, but at this point we don’t have enough data to make a call.

  • TG – In terms of summarising what the hessian is telling us the vibrational frequencies should be: why should a frequency of 3000 map to a frequency of 1000? Or are the modes mass weighted?

  • HJ - Yes, mass weighted.

  • TG - curious as to why this matching is happening.

  • HJ - Lee-Ping used some trick to compute the overlap by summing the information of frequencies so that it includes differences in frequencies.

  • CB - with your internal coordinate approach, are the highest discrepancies along the bonds? Assumed the bonds will be dominant, but are they? The internal coordinate hessians should say.

  • HJ - will check.

  • CB - internal coordinate hessians may then allow us to tell where the biggest discrepancies are (angles, bonds, torsions, 1-4).

  • HJ - The largest discrepancies are in bonds and angles. Without normalization, the fit will mostly target bond and angle terms.

  • CB - in short it may be important for angles. Bonds matter (ish) but angles matter more! How will hessians contribute to the objective function? I.e. how do hessians map into which parameters need to be changed? At what point does the second derivative information become important?

  • DM - we should try some test fits using the hessians soon. Would suggest making regularization depend on the type of degree of freedom: down-weight bonds more, angles less, and all the other ones less.

  • DM - timeline on this being ready?

  • HJ - one to two weeks maybe

  • SB - let me know if there’s anything I can do to help.

  • DB - If this does show that this fixes something which vibrational frequencies have not performed well on, would be good to get into Sage.
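The eigenvector-overlap matching discussed above (pairing QM and MM normal modes before comparing frequencies) can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the fits: the function names `normal_modes` and `match_modes`, the greedy pairing, and the toy Hessians are all assumptions made for the example, and translational/rotational modes are not projected out.

```python
import numpy as np


def normal_modes(hessian, masses):
    """Diagonalize the mass-weighted Cartesian Hessian.

    hessian: (3N, 3N) array; masses: length-N sequence.
    Returns eigenvalues (proportional to squared frequencies) in ascending
    order and the matching eigenvectors as columns. Unit conversion is
    left to the caller.
    """
    m = np.repeat(np.asarray(masses, dtype=float), 3)
    inv_sqrt = 1.0 / np.sqrt(m)
    mass_weighted = np.asarray(hessian, dtype=float) * np.outer(inv_sqrt, inv_sqrt)
    return np.linalg.eigh(mass_weighted)


def match_modes(qm_vecs, mm_vecs):
    """Greedily pair each QM mode with the MM mode of largest |overlap|.

    Hypothetical helper: a real workflow might use an optimal assignment
    instead of this greedy pass.
    """
    overlap = np.abs(qm_vecs.T @ mm_vecs)
    work = overlap.copy()
    pairs = {}
    for _ in range(overlap.shape[0]):
        i, j = np.unravel_index(np.argmax(work), work.shape)
        pairs[int(i)] = int(j)
        work[i, :] = -1.0  # mask the row and column that were just used
        work[:, j] = -1.0
    return pairs, overlap


# Toy data (made up): both Hessians share eigenvectors, but the "MM" one
# has the stiffness of two modes reordered, so matching by sorted
# frequency would pair the wrong modes.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
masses = [1.0, 16.0]
sqrt_m = np.sqrt(np.repeat(masses, 3))


def toy_hessian(eigenvalues):
    w = q @ np.diag(eigenvalues) @ q.T
    return w * np.outer(sqrt_m, sqrt_m)


qm_vals, qm_vecs = normal_modes(toy_hessian([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), masses)
mm_vals, mm_vecs = normal_modes(toy_hessian([1.0, 2.0, 4.5, 3.5, 5.0, 6.0]), masses)
pairs, overlap = match_modes(qm_vecs, mm_vecs)

for qm_idx in sorted(pairs):
    mm_idx = pairs[qm_idx]
    print(qm_idx, mm_idx, qm_vals[qm_idx], mm_vals[mm_idx])
```

In the toy case the third and fourth modes swap order between the two Hessians, which is the kind of situation where a high overlap can coexist with a large frequency difference.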


20 mins


wbo update


@Pavan Behara

  • PB - to recap, fit 4 had torsion filters, fit 7 did not

    In fit 4 1.3.0 did better in TFD and RMSD, fit 7 moved in the right direction but 1.3.0 still out-performed.

    Benchmarked fit 7 on Lim and Hahn set. Lots of phenols and bi-aryls etc.

    Fit 7 shows potentially marginal improvement in RMSD but not TFD.

  • CB - trying to decide if RMSD + TFD got better overall. Would love to dig more specifically into the chemistry to see what populations are getting better or worse.

  • DM - if we had error bars, would there be statistical significance?

  • JW - also interested in WBO conformer dependence of these.

  • CB - am I right that everything in fit 7 has one fewer parameter per torsion?

  • PB - no - roughly 10 general parameters replaced by ~ 10 interpolated parameters.

  • CB - main question is, if there are fewer parameters, do we get similar or slightly better performance?

  • PB - not the case here with fit seven.

  • CB + DM - most structures in the ddE benchmark are sterically congested.

  • CB - including molecules with strong steric congestion in the benchmark may still skew the results (in addition to them maybe causing issues for optimisations)

  • CB - when we do benchmarking, can easily compute if there is a strong steric / non-bonded interaction. Can we use that to prune out molecules dominated by such to try and de-convolute the analysis.

  • SB - agree we should do this.

  • CB - in general, condensed phase fitting of vdW is mainly looking at the long range portions of the potential. Could we include molecules with strong steric interactions?

  • SB - not a lot of data.

  • CB - worried bond order is not enough to solve the issues we are seeing. VdW + Coulomb will be very important here.

  • SB - we should probably start thinking about the 1-4 scale factors sooner rather than later as an initial way to look into this.

  • CB - red and yellow dots are superimposed. If we gave the barrier a value of ~6, the residual would almost drop to zero. For green dots, would only need a barrier of 3.

  • DM - smaller residual on the green dots.

  • CB - so the difference is how much the local optimization took care of the QM torsional barrier before the fits.

  • DM - I’m confused - so you’re using different geometry and you’re also using different force fields?

  • PB - yes.

  • CB - to SB’s point earlier - there will be a strong NH - O- interaction. You could fit a torsional barrier with a value of 6 or you could just change the 1-4 scaling factor - i.e. make the H-bond stronger and bring the residual down to zero. Nice thing about the scale factor is you don’t need to mess around with the actual parameters, can also perform a global optimisation on it as well.

    What these residuals are showing is, firstly, by getting the residuals you can see the barrier you would need to fit for the different fits. It simplifies and makes evident what you need to fit - a win for residuals!

  • (10) CB - this is the fit I’ve been wanting to see! Clearly some trend for the blue dots. Would like to see the chemistry of the points which are in agreement and those which deviate.

  • CB - are larger priors more restrictive?

  • DM - yes

  • CB - definitely seems to be a trend as a function of changing the priors. Change in TIG5b going negative is strange - change in the phase?

  • HJ - priors greater than 3 don’t really make sense

  • DM - is this trained on steric including set?

  • PB - yes.

  • CB - is some chemistry driving the negative change? Different chemistries driving the parameter from cis to trans for e.g.?

  • HJ - don’t think the k1 parameter determines the cis-trans preference, just the degree of flatness.

  • CB - so where should we set the prior?
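On the question above of whether the RMSD/TFD differences between fits would be statistically significant with error bars, one common option is a paired bootstrap over molecules. The sketch below is only an illustration under stated assumptions: the function name `bootstrap_mean_diff` is hypothetical, the per-molecule RMSD values are made up, and a percentile interval is one of several reasonable choices.

```python
import random
import statistics


def bootstrap_mean_diff(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference a_i - b_i.

    a, b: per-molecule metric values (e.g. RMSD) for two force fields,
    listed in the same molecule order.
    Returns (mean difference, lower bound, upper bound).
    """
    if len(a) != len(b):
        raise ValueError("a and b must contain the same molecules in the same order")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample molecules with replacement and record the mean difference.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


# Made-up per-molecule RMSD values for two hypothetical fits.
rmsd_fit_a = [0.42, 0.35, 0.58, 0.61, 0.29, 0.47, 0.52, 0.38]
rmsd_fit_b = [0.40, 0.37, 0.49, 0.66, 0.30, 0.41, 0.50, 0.39]

mean_diff, lower, upper = bootstrap_mean_diff(rmsd_fit_a, rmsd_fit_b)
print(f"mean difference {mean_diff:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

If the interval comfortably excludes zero, the difference is unlikely to be resampling noise; an interval straddling zero would be consistent with the "marginal improvement" reading.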

10 min

release decisions

@Simon Boothroyd

@David Mobley

When are we going to make a call on the Sage release? Are we optimistic we’re getting to where we need to be on WBOs so we should delay further, or push ahead without?

  • SB - would like to wait until the end of the month to make a call, given the progress on the WBO work.

  • My ideal goal post would be to identify a chemical series which is definitively improved by WBO

  • CB - would be good to show definitive improvement on the series. then also show global improvement (or at least not global degradation).

  • CB - maybe should consider a refit of the 1-4 parameters for Sage?

  • SB - may need some infrastructure changes.

5 mins (if time)

data sets from scaffolds

@Simon Boothroyd

  • Punt until the next meeting.

Action items

@Simon Boothroyd, @Trevor Gokey, and @Hyesu Jang will sync up about refitting the 1-4 parameters.
@Simon Boothroyd and @Pavan Behara will sync about ELF10 WBO.

Decisions