2023-01-26 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @David Mobley

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Michael Gilson

  • @Matt Thompson

  • @Lily Wang

  • @Trevor Gokey

Goals

  • Modeling amide/peptide torsions

Slides

Discussion topics

Item

Presenter

Notes

Modeling amide/peptide torsions

@Chapin Cavender

  • Slides will be attached

  • PB: did you apply restraints on the adiabatic scan?

    • CC: no, everything was allowed to relax aside from the central torsion

  • ff14sb is used as the reference because we know it performs well, and we have no QM data

  • PB: how many periodicities are present in the torsion parameters in the force fields shown on slide 3?

    • CC: I don’t know with respect to ff14sb; all the SMIRNOFF models have the same periodicities as Sage. I think they are periodicities of 3 and 2, but I’m not sure. There’s also an improper torsion here, in both SMIRNOFF and ff14sb. Improper torsions were not changed in the protein fit, so if the amplitude is different, that’s due to the small molecule data.

    • MG: how come the protein models are better than Sage?

    • CC: hopefully the other torsions are modelling their degrees of freedom better

    • CC: charges are the same between all SMIRNOFF models

    • MG: so something in the protein QM data must have improved the performance on the omega torsions

    • CC: the torsions are coupled, so maybe modelling phi and psi better means we also model omega better

    • MG: so the question is – are we good enough?

  • CC: lays out three possible solutions

    • DM: is generating QC data mostly effort from you, or computational effort?

    • CC: mostly computational

    • DM: could we start generating the QC data in the pipeline while we work on other solutions, just in case we need it? We probably will eventually, even for small molecule force fields. Is the NMR benchmarking working?

    • CC: still in progress, nearly there

    • DM: could you do solution #2 while you’re fixing that then? If points 2 and 3 are mostly independent of your personal time, you could start firing away while trying to get 1 done

    • CC: agree – but is point 2 worth doing without point 3?

    • CC: a re-fit takes 10 days

    • DM: do we have any small molecule TorsionDrives that scan omega?

    • CC: yes, I think it’s already in the training set I’m using, so it should already be in the parameter fits I’ve done

    • MG: I thought the concern was that the small molecule training data doesn’t really have QC data for this amide bond?

    • CC: there’s no full scan of the amide bond; there’s only out-of-plane motion to about ±30°. There’s nothing that moves from trans to cis

    • PB: not sure if we have a full scan of amides. I’ll check

    • MG: was this raised because industry noticed flips in the amide bonds?

    • DM: more because our well depths aren’t accurate. We couldn’t figure out how to fix this without doing more science on it, e.g. at the time we weren’t fitting impropers

    • MG: if we have impropers operational now, how come Sage is getting those wrong and passing the errors onto the protein fit?

    • DM: we haven’t fit impropers yet

    • MG: do we need to fit impropers to fix this?

    • DM: we still have very few of them and, as I recall, they still haven’t been fit

    • MG: we may need more than any of the proposed solutions to get our torsions where we want them

    • CC: I’m not confident these will fix the problem, it’s my intuition that these will help

    • PB: I think you’re already starting from a baseline FF that has re-fit impropers. No improper scans were used in fitting those; they were just optimised against geometries etc.

    • CC: I thought we had data with improper scans

      • PB:

    • DM: I agree with MG that there’s a science issue and we’re not sure if these will solve the problem

    • MG: how does Amber get it right?

    • CC: don’t think they did anything special for proteins there – it probably comes from inherited GAFF parameters

    • PB: Do they have extra terms beyond SMIRNOFF terms?

    • CC: don’t know off the top of my head – will check.

    • MG: given that we don’t have confidence in the other solutions, IMO we should push ahead with the NMR studies and pursue others in the meantime. At least it looks better than Sage. We shouldn’t make a science problem a roadblock if we can avoid it

    • DM: and set up QC data in the pipeline

    • JW: we have basically all our QC compute available for work with any kind of priority

    • CC: do we want to generate scans for peptides, or small molecule amides, and generalise those to proteins?

    • DM: prioritise protein parameters you need for your MVP, if we have to we can inherit small molecule params from those

    • PB: I think we do have scans for small molecule amides. It may be a typing issue

    • MG: so do we need to include impropers as fittable terms?

    • DM: unsure – we can work it out for proteins, and inherit the solution for small molecules

    • MG: what’s the issue with impropers? Why are they the problem?

    • PB: we’re not distinguishing between cis- and trans-configurations properly, and the barrier is pretty low

    • CC: it’s just a hunch. This solution was proposed to do a re-fit without generating new data

    • MG: so is there an issue with omega torsions in small-molecule amides? Were impropers not being fitted there? Were they fit in Sage? Why not?

    • PB: No, they weren’t re-fit in Sage

    • DM: because we have very few impropers that cover a lot of chemistry, so we were worried about messing with that. But when we had issues with torsions and studied those, we held impropers fixed, so now we’re wondering if fitting impropers would help

    • MG: would it help to add another torsional term?

    • DM: it’s possible

    • MG: how many torsions does Amber have?

    • CC: will check

    • CC: agree with MG that if we’re going to generate new QC data, we can directly refit torsions and skip re-fitting impropers

    • PB: can you do bespoke fits (using ForceBalance) on single molecules and check how the force constants vary from those in the protein-specific model? High variance could suggest we need to split torsions for different periodicities etc. You can also check whether any angles are causing issues, i.e. whether torsions are not the cause

    • CC: will do

  • JW (in chat): Just want to point out that this [choosing a solution] is a “decision”, so we should remember that decision making authority is defined as “a majority of (Gilson, Shirts, Cavender).” So let’s make sure that we formally get at least two “yes”es if we think we’ve solved this

    • Must notify M. Shirts to make sure he doesn’t veto it

    • CC: to summarise – will proceed with NMR benchmarks of the current model. Will also start generating 1D torsion scans that we can prioritise in QC pipeline. Will also run single-molecule fits against existing scan data to see if FB fits are giving us high variance in force constants etc., which could indicate we need to split torsions, or whether other valence parameters (not torsions) are an issue. Will try to keep impropers fixed due to concerns DM brought up earlier

      • JW: agree; prefer adding proper torsion periodicities over fitting impropers

      • MG: full parameter re-fit to new QC data, or just some parts?

      • CC: full parameter re-fit

      • JW: looks cleaner to just do a full re-fit

      • MG: what’s the starting guess?

      • CC: takes ~10 days to re-fit. My original starting guess was what will become the 2.1 release. I added the protein LibraryCharges and symmetrising parameters

      • PB: TG has a way of packaging the environment so that fits run faster than when they need to access the filesystem

    • MG: let’s also run QC torsion scans to serve as reference instead of Amber, to verify that we actually have a problem

    • TG: what’s the most challenging peptide? Did you use attenuation in the fits?

    • CC: worst one was probably trialanine.

    • TG: attenuation is based on QM energies, right?

    • PB: yes

    • TG: so maybe the QM minimum is getting attenuated away, and that’s why we’re not fitting to it. I found that around 5 kcal/mol is where the cutoff starts getting too low and we miss barriers. For my double bond scans I used a value of ~40, and would suggest a large number here. But turning attenuation off entirely is really bad, so I suggest keeping it on with a high cutoff.

      • PB (in chat): Lee-Ping used 5 kcal/mol for lower limit and 20 kcal/mol for upper limit beyond which weight is zero for FB15. https://pubs.acs.org/doi/10.1021/acs.jpcb.7b02320

      • TG: as you increase attenuation you make it harder to fit minima, so don’t go too crazy. Sage was fit from 1 to 5 kcal/mol.

      • MG: could this be another solution?

      • DM: for small molecules we mostly care about minima or geometries close to minima, but for protein fits we’re looking at much higher energy barriers. So yes, this could be another solution

      • CC: agree, could be an easy alternative to try

    • TG: will try a bespoke fit to trialanine

  • CC action items:

    • Will do NMR on current model

    • Will re-fit with new attenuation values

    • Will generate new 1D QC scans, do a full re-fit to those when they’re ready

    • MG: suggests retraining on a subset of the data

      • PB: a small re-fit could take less than a day

    • MG: agree with this plan as a minimal plan to start with, but could you please write this down on Slack

      • CC: will do, and will tag M. Shirts as well

  • TG: how do the geometries look when you compare?

    • CC: they look good except for the amide torsion. I only have optimised geometries for trans omega.
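Related to the periodicity and cis/trans discussion above: in the SMIRNOFF ProperTorsions functional form, each torsion contributes E(phi) = sum over n of k_n (1 + cos(n phi - phase_n)). The minimal Python sketch below uses hypothetical force constants and phases (not taken from Sage or ff14sb) to show that any even-periodicity term gives identical energies at cis (omega = 0°) and trans (omega = 180°). Within the torsion term itself, only odd-periodicity terms can separate the two; in a real force field other terms (for example 1-4 nonbonded interactions) also contribute to the cis/trans difference. The function name and parameter values are illustrative assumptions.

```python
import math


def torsion_energy(phi_deg, terms):
    """Proper torsion energy in the SMIRNOFF form: sum of k * (1 + cos(n*phi - phase)).

    `terms` is a list of (periodicity n, force constant k in kcal/mol, phase in degrees).
    """
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(phase)))
               for n, k, phase in terms)


# Hypothetical parameters for an omega-like torsion (NOT taken from Sage or ff14sb)
even_only = [(2, 2.5, 180.0)]
with_odd = [(2, 2.5, 180.0), (1, 1.0, 0.0)]

for label, terms in [("n=2 only", even_only), ("n=2 plus n=1", with_odd)]:
    delta = torsion_energy(0.0, terms) - torsion_energy(180.0, terms)  # cis minus trans
    print(f"{label}: E(cis) - E(trans) = {delta:.3f} kcal/mol")
```

Running the loop gives a cis/trans difference of zero (up to floating-point rounding) for the n=2-only parameters, and a non-zero difference once the n=1 term is added.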

 

 

 
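On the attenuation discussion: PB notes that FB15 used a lower energy limit of 5 kcal/mol and an upper limit of 20 kcal/mol, above which a QM point gets zero weight. The Python sketch below illustrates that kind of scheme. Giving full weight below the lower limit, the smooth 1/sqrt(1 + ((E - lower)/lower)^2) decay between the limits, and the function names are all assumptions for illustration. This is not ForceBalance’s actual implementation or its option names.

```python
import math


def attenuation_weight(rel_energy, lower=5.0, upper=20.0):
    """Weight for one QM point, given its energy (kcal/mol) above the scan minimum.

    Illustrative assumption: full weight up to `lower`, a smooth
    1/sqrt(1 + ((E - lower)/lower)**2) decay between `lower` and `upper`,
    and zero weight above `upper`.
    """
    if rel_energy < 0:
        raise ValueError("energies must be relative to the scan minimum")
    if rel_energy > upper:
        return 0.0  # too high in energy: excluded from the fit
    if rel_energy <= lower:
        return 1.0  # near the minimum: full weight
    return 1.0 / math.sqrt(1.0 + ((rel_energy - lower) / lower) ** 2)


def scan_weights(qm_energies, lower=5.0, upper=20.0):
    """Shift a torsion scan's QM energies to its minimum, then weight each point."""
    e_min = min(qm_energies)
    return [attenuation_weight(e - e_min, lower, upper) for e in qm_energies]


# Hypothetical relative QM energies (kcal/mol) along a torsion scan
scan = [0.0, 3.0, 8.0, 15.0, 25.0]
print(scan_weights(scan))              # FB15-style cutoffs (5 and 20 kcal/mol)
print(scan_weights(scan, upper=40.0))  # higher upper cutoff keeps the 25 kcal/mol point
```

With the default cutoffs the hypothetical 25 kcal/mol point drops out of the fit. Raising the upper cutoff to 40 (roughly the value TG mentioned for his double bond scans) keeps it with a small weight. This mirrors the trade-off TG describes: a higher cutoff retains barrier regions, but minima become relatively harder to fit.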

Action items

Decisions