2023-01-26 Protein FF meeting note

Participants

@Chapin Cavender
@David Mobley
@Pavan Behara
@Jeffrey Wagner
@Michael Gilson
@Matt Thompson
@Lily Wang
@Trevor Gokey

Goals

Modeling amide/peptide torsions

Slides

Discussion topics

Item	Presenter	Notes

Modeling amide/peptide torsions	@Chapin Cavender	Slides will be attached
PB: did you apply restraints on the adiabatic scan?
CC: no, everything was allowed to relax aside from the central torsion
ff14sb used as a reference as we know it performs well, and we have no QM data
PB: how many periodicities are present in the torsion parameters in the force fields shown on slide 3?
CC: don’t know wrt ff14sb – all the SMIRNOFF models have the same periodicities as in Sage. I think it’s a 3 and a 2, but not sure. There’s also an improper torsion here, in SMIRNOFF and ff14sb. Improper torsions were not changed in the protein fit – if the amplitude is different, that’s due to small molecule data.
MG: how come the protein models are better than Sage?
CC: hopefully other torsions are modelling degrees of freedom better
CC: charges are the same between all SMIRNOFF models
MG: so something in the protein QM data must have improved the performance on the omega torsions
CC: the torsions are coupled, so maybe modelling phi and psi better means we also model omega better
MG: so the question is – are we good enough?
CC: lays out three possible solutions
DM: is generating QC data mostly effort from you, or computational effort?
CC: mostly computational
DM: could we start the QC data in the pipeline just in case we need it – we probably will eventually, even for small molecule force fields, while we work on other solutions? Is the NMR benchmarking working?
CC: still in progress, nearly there
DM: could you do solution #2 while you’re fixing that then? If points 2 and 3 are mostly independent of your personal time, you could start firing away while trying to get 1 done
CC: agree – but is point 2 worth doing without point 3?
CC: takes 10 days for re-fit
DM: do we have any small molecule TorsionDrives that scan omega?
CC: yes, think it’s already in the training set I’m using.
Should already be in the parameter fits I’ve done
MG: I thought the concern was that the small molecule training data doesn’t really have QC data for this amide bond?
CC: there’s no full scan of the amide bond – there’s a motion out of plane to move to ±30°. There’s nothing that moves from trans to cis
PB: not sure if we have a full scan of amides. I’ll check
MG: was this raised because industry noticed flips in the amide bonds?
DM: more because our well depths aren’t accurate. We couldn’t figure out how to fix this without doing more science on it, e.g. at the time we weren’t fitting impropers
MG: if we have impropers operational now, how come Sage is getting those wrong and passing the errors onto the protein fit?
DM: we haven’t fit impropers yet
MG: do we need to fit impropers to fix this?
DM: we still have a very minimal number of them and as I recall they still haven’t been fit
MG: we may need more than any of the proposed solutions to get our torsions where we want them
CC: I’m not confident these will fix the problem, but it’s my intuition that these will help
PB: I think you’re already starting from a baseline FF that has re-fit impropers. No improper scans used in fitting those – just optimised to geometries etc
CC: I thought we had data with improper scans
PB:
DM: I agree with MG that there’s a science issue and we’re not sure if these will solve the problem
MG: how does Amber get it right?
CC: don’t think they did anything special for proteins there – it probably comes from inherited GAFF parameters
PB: Do they have extra terms beyond SMIRNOFF terms?
CC: don’t know off the top of my head – will check.
MG: given that we don’t have confidence in the other solutions, IMO we should push ahead with the NMR studies and pursue others in the meantime. At least it looks better than Sage.
We shouldn’t make a science problem a roadblock if we can avoid it
DM: and set up QC data in the pipeline
JW: we have basically all our QC compute available for work with any kind of priority
CC: do we want to generate scans for peptides, or small molecule amides, and generalise those to proteins?
DM: prioritise protein parameters you need for your MVP, if we have to we can inherit small molecule params from those
PB: I think we do have scans for small molecule amides. It may be a typing issue
MG: so do we need to include impropers as fittable terms?
DM: unsure – we can work it out for proteins, and inherit the solution for small molecules
MG: what’s the issue with impropers? Why are they the problem?
PB: we’re not distinguishing between cis- and trans-configurations properly, and the barrier is pretty low
CC: it’s just a hunch. This solution was proposed to do a re-fit without generating new data
MG: so is there an issue with small-molecule omega amides? So were impropers not being fitted there? Were they fit in Sage? Why not?
PB: No, they weren’t re-fit in Sage
DM: because we have very few impropers that cover a lot of chemistry, so we were worried about messing with that. But when we had issues with torsions and studied those, we held impropers fixed, so now we’re wondering if fitting impropers would help
MG: would it help to add another torsional term?
DM: it’s possible
MG: how many torsions does Amber have?
CC: will check
CC: agree with MG that if we’re going to generate new QC data, we can directly refit torsions and skip re-fitting impropers
PB: can you do bespoke fits (using ForceBalance) on single molecules and check how force constants vary from what’s in the protein-specific model? High variance could suggest we need to split torsions for different periodicities/etc. You can also check if any angles are causing any issue, i.e. if torsions are not the cause
CC: will do
JW (in chat): Just want to point out that this [choosing a solution] is a “decision”, so we should remember that decision making authority is defined as “a majority of (Gilson, Shirts, Cavender).” So let’s make sure that we formally get at least two “yes”es if we think we’ve solved this
Must notify M Shirts to make sure he doesn’t veto it
CC: to summarise – will proceed with NMR benchmarks of the current model. Will also start generating 1D torsion scans that we can prioritise in QC pipeline. Will also run single-molecule fits against existing scan data to see if FB fits are giving us high variance in force constants etc., which could indicate we need to split torsions, or whether other valence parameters (not torsions) are an issue. Will try to keep impropers fixed due to concerns DM brought up earlier
JW: agree, prefer adding proper periodicities over fitting impropers
MG: full parameter re-fit to new QC data, or just some parts?
CC: full parameter re-fit
JW: looks cleaner to just do a full re-fit
MG: what’s the starting guess?
CC: takes ~10 days to re-fit. My original starting guess was what will become the 2.1 release. I added the protein LibraryCharges and symmetrising parameters
PB: TG has a way of packaging the environment to make fits run faster than when they need to access the filesystem
MG: let’s also run QC torsion scans to serve as reference instead of Amber, to verify that we actually have a problem
TG: what’s the most challenging peptide? Did you use attenuation in the fits?
CC: worst one was probably trialanine.
TG: attenuation is based on QM energies, right?
PB: yes
TG: so maybe the QM minimum is getting attenuated away, so that’s why we’re not fitting to it. I found that 5 kcal/mol is where we start getting too low, and missing barriers. For my double bond scans I used a value of ~40, would suggest a large number here. But turning attenuation off is really bad, so suggest keeping it on with a high barrier.
PB (in chat): Lee-Ping used 5 kcal/mol for lower limit and 20 kcal/mol for upper limit beyond which weight is zero for FB15. https://pubs.acs.org/doi/10.1021/acs.jpcb.7b02320
TG: as you increase attenuation you make it harder to fit minima, so don’t go too crazy. Sage was fit from 1 to 5 kcal/mol.
MG: could this be another solution?
DM: for small molecules we mostly care about minima or close to minima, but for protein fits we’re looking at much higher energy barriers. Yes, could be another solution
CC: agree, could be an easy alternative to try
TG: will try a bespoke fit to trialanine
CC action items:
  Will do NMR on current model
  Will re-fit with new attenuation values
  Will generate new 1D QC scans, do a full re-fit to those when they’re ready
MG: suggesting to retrain to a subset of data
PB: a small re-fit could take less than a day
MG: agree with this plan as a minimal plan to start with, but could you please write this down on Slack
CC: will do, will tag MShirts as well
TG: how do the geometries look when you compare?
CC: look good except for amide torsion. I only have optimised geometries for trans omega.
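
Note on attenuation: the lower/upper limits discussed above can be illustrated with a small Python sketch. This is not ForceBalance's implementation. The function names `attenuation_weight` and `scan_weights` are hypothetical, and the decay between the two limits is an assumed functional form chosen for illustration. The sketch only encodes what the notes state: energies are taken relative to the QM minimum, points below the lower limit keep full weight, and points beyond the upper limit get zero weight (5 and 20 kcal/mol for FB15 per PB, 1 and 5 kcal/mol for Sage per TG).

```python
import math


def attenuation_weight(rel_energy, lower=5.0, upper=20.0):
    """Hypothetical weight for one scan point.

    rel_energy: QM energy above the scan minimum, in kcal/mol.
    lower/upper: attenuation limits in kcal/mol (defaults follow the
    FB15 values quoted in the notes).
    """
    if rel_energy > upper:
        return 0.0  # beyond the upper limit the point is ignored
    if rel_energy <= lower:
        return 1.0  # near the minimum the point keeps full weight
    # Assumed decay between the limits (illustrative only).
    return 1.0 / math.sqrt(1.0 + ((rel_energy - lower) / lower) ** 2)


def scan_weights(qm_energies, lower=5.0, upper=20.0):
    """Weights for a whole torsion scan, relative to its QM minimum."""
    e_min = min(qm_energies)
    return [attenuation_weight(e - e_min, lower, upper) for e in qm_energies]


# Made-up scan energies in kcal/mol, not real data.
example_scan = [0.0, 3.0, 10.0, 25.0]
fb15_like = scan_weights(example_scan, lower=5.0, upper=20.0)
sage_like = scan_weights(example_scan, lower=1.0, upper=5.0)
```

With these made-up energies, the FB15-style limits keep the 10 kcal/mol point at reduced weight, while the Sage-style limits drop it entirely. That is the concern TG raised about missing barriers when the upper limit is low.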
