2023-08-24 Protein FF meeting note

Participants

Goals

Update on MM minima refits

Discussion topics

Item: MM minima refits

Presenter: Chapin Cavender

Notes:

Slides will be uploaded
Presentation starts from slide 27/28
Slide 30:
- Taking TD data, re-optimising, and scoring with ff14SB. The question is whether AMBER generates minima lower in energy than the QM method
- MG – this looks a lot better
- CC – nowhere here does the FF score a minimum lower than the QM. No spurious minima
- DLM – could be good to add a line of unity to make it easier to figure out which points are below the diagonal
- DLM – this looks like really good news in the sense that it indicates that if we can fix our FFs to look like the AMBER plot here, then we could fix our issues
- CC – agree
- MG – CC has pointed out to me that AMBER was partly fitted to the QM energy differences between points on the Ramachandran map. I feel like doing everything pairwise could help here.
- CC – concerns that fitting to energy differences is not done for the small molecule torsions. Suggest prioritising a small molecule study here.
- CC – to restate the problem, currently in fitting torsion profiles, we reference QM and MM energies to the QM structure. The alternative suggested by Simmerling is all pairwise differences between energies.
- MS – not sure which conformers they select for those
- DLM – maybe our method over-weights that single QM conformer
- LW – is having a small molecule study a blocker to moving forward with the protein FF re-fit?
- CC – basically if what I’m currently doing doesn’t work, having this pilot study would help me figure out what to do next
- MS – while a pilot study would be helpful, we could also move forward with this and see if the proteins improve and do the small molecules later. That means we don’t need to have the small molecule study be a blocker
- MG – if we use the specific instead of null model, we don’t have to apply the methods back to the small molecule torsions
- CC – My general concern with MG’s suggestion is that it’ll be difficult to get this to run, so figuring out hyperparameters etc with small molecules would be faster and more efficient than seeing if it helps proteins
- MG – a fallback position is to fit to the NMR data. Is that approach consistent with OpenFF philosophy, as opposed to improving our fit to the QM?
- MS – fitting to NMR data involves significant new infrastructure
- MG – unless we just ad-hoc retune a couple of torsions
- CC – the pairwise energy differences would also require updates to ForceBalance
- MG – but there’s a chance we can still get this to work using LPW’s method with the reweights
- DLM – we should make sure we can get a graph like this (slide 30), to ensure that the effort we invest into fitting to NMR data is worth it
- DLM – to restate, given the relative effort level of changing how we fit to QM data vs NMR data, would we want to fit to NMR data before making sure we can end up with a plot that looks like Slide 30?
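The two referencing schemes discussed under Slide 30 can be compared with a minimal sketch. This is an illustration only, not the ForceBalance implementation: the function names and the toy energies are hypothetical, and energies are plain lists in arbitrary units. It shows how an error on the single QM reference conformer propagates to every other residual in the minimum-referenced scheme, but only to the pairs involving that conformer in the pairwise scheme.

```python
from itertools import combinations


def min_referenced_residuals(qm, mm):
    """MM minus QM relative energies, both referenced to the lowest-QM conformer."""
    ref = min(range(len(qm)), key=lambda i: qm[i])  # index of the QM minimum
    return [(mm[i] - mm[ref]) - (qm[i] - qm[ref]) for i in range(len(qm))]


def pairwise_residuals(qm, mm):
    """MM minus QM energy differences over all unique conformer pairs (i < j)."""
    return [
        (mm[i] - mm[j]) - (qm[i] - qm[j])
        for i, j in combinations(range(len(qm)), 2)
    ]


# Toy data (hypothetical): MM matches QM everywhere except the QM minimum.
qm = [0.0, 1.0, 3.0, 2.0]
mm = [0.5, 1.0, 3.0, 2.0]

min_ref = min_referenced_residuals(qm, mm)   # 3 of 4 residuals are nonzero
pairwise = pairwise_residuals(qm, mm)        # 3 of 6 residuals are nonzero
```

In this toy case the single bad point contaminates three of four minimum-referenced residuals but only half of the pairwise ones, which is one way to read the concern that the current method may over-weight the single QM conformer.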
MG – why is the answer to reduce the weighting of the MM minima target if we’re already moving in the wrong direction?
- MG – what if we put more emphasis on lower energy points?
- CC – we’re already doing that, with the keyword energy_asymmetry. Within that target, if the MM energy is lower than the QM energy, the point gets boosted by 100.
- CC – to be clear, the cycle-2 points are not the same as the original cycle-1 points. Training to the cycle-1 minima does reduce generation of new spurious minima.
- DLM – maybe we’re putting too much emphasis on the lowest energy QM geometry. Maybe that messes things up
- MG – what if you have multiple references? The lowest QM conformer, alpha, beta, … on the Ramachandran plot. I think this wouldn’t require re-coding.
- CC – are you suggesting that we copy the training data and reference each copy to a different point? If we do it for all the points we would be approximating the pairwise differences, although we would be doubling up. We could also pick a smaller number of points spaced out in energy
- MG – how costly would this be in computer time?
  - CC – I think there’s a way to implement this that’s not costly, but the naive implementation would scale linearly with number of reference points
  - PB – I’ve done similar work before:
  - CC – The slow part is running the MM calculations; the computation of pairwise differences is negligible
- DLM – on the one hand it seems like fitting to NMR data may be the only way to end up where we want to end up, on the other hand it’s a lot of work. This is one where you should think carefully about what makes most sense to you
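The multiple-reference idea and the asymmetric weighting mentioned above can be sketched together as follows. This is a hedged illustration under stated assumptions, not ForceBalance code: the function name, the `refs` argument, and the exact weighting rule (multiply the squared residual by `asymmetry` whenever the MM relative energy falls below the QM one) are hypothetical stand-ins for what the notes describe as a boost of 100. The MM energies are computed once and reused, so only the cheap residual loop grows linearly with the number of reference points.

```python
def multi_reference_objective(qm, mm, refs, asymmetry=100.0):
    """Mean weighted squared residual over several reference conformers.

    qm, mm    : energies of the same conformers (lists, same units)
    refs      : indices of conformers used as energy references (assumed choice)
    asymmetry : weight applied when MM sits below QM relative to the reference
    """
    total = 0.0
    for r in refs:
        for i in range(len(qm)):
            diff = (mm[i] - mm[r]) - (qm[i] - qm[r])
            # Penalise MM relative energies that undershoot QM more heavily.
            weight = asymmetry if diff < 0 else 1.0
            total += weight * diff ** 2
    return total / (len(refs) * len(qm))


# Toy data (hypothetical): MM is 0.5 too high at the QM minimum only.
qm = [0.0, 1.0, 3.0, 2.0]
mm = [0.5, 1.0, 3.0, 2.0]

single_ref = multi_reference_objective(qm, mm, refs=[0])
two_refs = multi_reference_objective(qm, mm, refs=[0, 1])
unweighted = multi_reference_objective(qm, mm, refs=[0], asymmetry=1.0)
```

Using every conformer as a reference reproduces each pairwise difference twice (once per ordering), which matches the "doubling up" noted in the discussion; a smaller set of references spaced out in energy trades completeness for cost.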
PB – did you try with AbInitio targets?
- CC – currently running, not converged yet
MG – why are 2D TDs so expensive?
- CC – more points
PB – volunteering to do small molecule study and sync up with Chapin on a selection of small molecules
- CC – if it works out, I’ll copy over the hyperparameters and see how it works with the protein FF
CC – in the meantime, I’ll look into the NMR targets
