2022-01-27 Protein FF meeting notes

 Date

Jan 27, 2022

 Participants

  • @Chapin Cavender

  • @Daniel Cole

  • @David Mobley

  • @Joshua Horton

  • @Matt Thompson

  • @Michael Gilson

  • @Simon Boothroyd

  • @Pavan Behara

  • @Jeffrey Wagner

 Goals

  • Sidechain constraints in TorsionDrives

  • Protein FF fitting strategy

  • Protein FF benchmark strategy

 Discussion topics

Item

Presenter

Notes

Sidechain constraints in TorsionDrives

@Chapin Cavender

Protein FF fitting strategy

@Chapin Cavender

  • CC – Should we fit all valence terms simultaneously, or proper torsions first, then bonds and angles?

    • DM – Would it be significantly easier/harder to do the torsions at the same time as bonds and angles?

    • SB – With 300-400 workers, I was able to do a one-pot simultaneous fit. But it depends a lot on how fast the optimization converges. This would probably be greatly accelerated by resetting the initial parameter values to a good guess using the modified Seminario method. But overall this plan looks good to me.

  • CC – Should improper torsion optimization be part of this plan?

    • SB – I don’t think so.

    • JW – Maybe we should say “if refit impropers/refined improper types make it into Sage before we start fitting Rosemary”

    • SB + DM – Agree

  • JW – Also, for charges, CC lists library charges, but we’re hoping to have neural net charges, right?

    • CC – It would be nice to have neural net charges, but I need to define this plan as “something I can do without blockers”, so I’d like to plan on library charges and switch to neural net charges if they’re introduced into Sage beforehand.

    • SB – Agree

  • CC – QC data strategy/questions

    • DC – I seem to remember chi1 and chi2 being super important for OPLS-AA/M too

    • MG – We’d discussed scanning phi/psi in the context of different sidechain rotamers.

  • (winding discussion, see video starting around 20 minutes for details)

  • MG – Would bespoke fitting tooling help here?

    • CC – Bespoke fitting only fits one dihedral at a time, right JH?

      • JH – Yes

      • CC – So it’s probably not suitable for most of what we want to do.

  • MG – Any estimate of the additional time and effort this will add?

    • CC – These new sets will be about the same as the backbone scans, a couple weeks of compute.

    • PB – Regarding PEPCONF, we only have 50% of the dataset complete.

      • CC – I don’t feel too strongly about having every combination of two sidechains complete. But it’d be strange to say that we only used half of the input set due to compute constraints.

    • JW – PB, would PEPCONF proceed with more compute?

      • PB – Unlikely; it looks like SCF convergence errors, though we could fiddle with the compute spec to make another push and might get around it.

    • (General confusion about the number of sidechains and terminology; general interest in clearer naming)

    • CC – I’ll refer to datasets as capped/uncapped N-mers, where N is the number of sidechains.

  • SB – I think this plan is good. You may be able to save time by pre-optimizing the conformers using a cheaper level of theory. Had you thought about doing that, or is that not worthwhile on a compute time/human time tradeoff?

    • CC – I preoptimize with Sage.

Protein FF benchmark strategy

@Chapin Cavender

  • CC – For chemical shifts, we should use ShiftX - I anticipate that Evaluator could plug right in.

  • CC – For scalar couplings, we could do this by hand as a function of dihedral angle, but there are Karplus parameters that would need to be selected. There are some sets of these parameters that are widely used, and we could select them.
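As context for the scalar-coupling point above, the Karplus relation maps a dihedral angle to a predicted 3J coupling. A minimal sketch, where the coefficients (in Hz) and the 60° phase offset are one commonly cited parametrization for 3J(HN,HA) — shown for illustration only, not a parameter set the group has selected:

```python
import math

# Illustrative Karplus coefficients (A, B, C in Hz) for 3J(HN,HA).
# Published fits differ; selecting a set is exactly the open question above.
KARPLUS_COEFFS = {"3J_HN_HA": (7.09, -1.42, 1.55)}

def karplus_j(phi_deg, coeffs):
    """Karplus relation: J = A*cos^2(theta) + B*cos(theta) + C,
    with theta = phi - 60 deg for 3J(HN,HA) couplings."""
    a, b, c = coeffs
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

# Helical phi (~ -60 deg) gives a small coupling; extended phi gives a larger one.
print(round(karplus_j(-60.0, KARPLUS_COEFFS["3J_HN_HA"]), 2))
print(round(karplus_j(-120.0, KARPLUS_COEFFS["3J_HN_HA"]), 2))
```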

  • CC – NOEs are harder; we could estimate them as a function of r^-6. But there’s more that can be done, and I’d like to leave this whole point as a secondary/optional goal.
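The r^-6 estimate mentioned above is usually applied as an ensemble average: NOE intensity scales roughly as ⟨r^-6⟩, so the effective distance is ⟨r^-6⟩^(-1/6) and is dominated by the shortest distances sampled. A minimal sketch (function name and plain-list input are illustrative; real NOE back-calculation adds calibration and spin-diffusion corrections):

```python
def noe_effective_distance(distances):
    """Effective NOE distance via the <r^-6>^(-1/6) ensemble average.

    The inverse-sixth power means short distances dominate, so the
    effective distance sits close to the minimum over the ensemble.
    """
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

# One short-distance frame pulls the effective distance far below the mean.
print(noe_effective_distance([2.0, 10.0]))  # close to 2, despite a mean of 6
```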

  • MG – Is there a canonical water model that we’ll be using for these benchmarks?

    • CC – TIP3P seems to be the standard, but it’s been shown that this makes proteins too compact. I think OPC or TIP4P-B? would be more accurate, but I don’t know what we’ll recommend. We could benchmark against several without endorsing a particular one.

    • MG – Agree with benchmarking against several. Will our metrics be able to tell us if we’re seeing the “too compact protein” issue? I don’t think they will.

    • DM (maybe mistranscribed) – I agree that it’s really important to test with different water models. If we see big differences, we’ll need to go back and ask how our condensed-phase fitting affected that.

    • SB – Sage condensed-phase fitting used TIP3P, so we could redo the condensed-phase fitting against different water models.

    • CC – Yeah, I could switch out water models in fitting.

  • DC – Are there plans to do benchmarks other than those on this slide? An OPLS paper did several benchmarks of increasing scale on the way up to a released FF.

    • CC – I don’t think that really applies to us - We state that we’re only fitting to QM data, so I don’t see us needing tons of iterations. Our goal is to get a non-inferior protein FF out, and we can work on improvements in further generations.

  • PB – So ff14SB and Rosemary will be compared side by side in these benchmarks?

    • CC – We know we want to test multiple FFs, but I haven’t selected which ones we should use. Could do ff14SB, ff19, 99disp, etc. Currently undecided, can decide based on available resources at that stage.

    • PB – Just thinking that an existing FF can be used in the benchmarking infrastructure as a test.

  • JW – Timing for release? Not necessary to be super detailed.

    • CC – Hoping that we can generate the QC fitting data and run the fits in the next threeish months. Then an unknown amount of time for observable-based benchmarks.

    • SB – There’s a lot of wiggle room other than “release date”, can also say there’s an “alpha date”, “beta date”, etc. Also worth keeping in mind that other goings-on in the consortium will affect this.

 Action items

 Decisions