2023-09-21 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Alexandra McIsaac

  • @David Mobley

  • @Jeffrey Wagner

  • @Brent Westbrook

  • @Anika Friedman

  • @John Chodera

  • @Michael Shirts

  • @Michael Gilson

  • @Trevor Gokey

Goals

  • Update on QM refit targeting MM minima

  • Proposal on fitting to NMR observables by reweighting

Discussion topics

Item

Presenter

Notes

QM refit targeting MM minima

@Chapin Cavender

  • Chapin will put slides here

  • Previous FFs underestimate the favorability of alpha helices. The hypothesis is that this is caused by spurious minima on the MM surface that aren’t captured in the QM training data

  • Solution was to generate MM minima and refit

  • Conclusion: the refit is not addressing the issue; suggests targeting NMR observables as an alternate route

  • JC: How wrong are the minima?

    • CC: errors are on the order of -5 kcal/mol below the minimum

  • MS: Hypothesis was that Amber gets it right because it fits to pairwise energy differences, but that is too expensive, so we don’t want to pursue it?

    • CC: Yes, the goal is to do a pilot on a small-molecule dataset to see if it helps, then expand to proteins. But this will take longer than the time left before the R01 grant deadline, so not prioritizing it right now

  • JC: How long does refit take?

    • CC: The refit takes a full week with ForceBalance, so we can’t iterate on it quickly

    • JC: Espaloma takes a day on large numbers of data points; this is a potential limitation of FB

    • MG: Issue is number of optimizations, not number of data points

    • JC: Could try a small number of optimization steps to speed it up

  • CC: Wants to eventually pursue all of these ideas, but needs a working SMIRNOFF FF for the R01 grant deadline, so wants to focus on NMR fitting to get results more quickly

    • JC: Thinks we have enough for the grant from the Espaloma work; there isn’t much time to make progress either way, so there may not be much time pressure

    • MS: Agrees not enough time

    • DM: PIs can figure out what goes in the grant, but you should follow the best path forward; there isn’t enough time to put this in the grant

Fitting to NMR observables by reweighting

@Chapin Cavender

  • Option 1: Protocol taken from this paper: https://pubs.acs.org/doi/10.1021/acs.jctc.9b00206

  • Alternative: use the BICEPS package, but it doesn’t have a good way of estimating the gradient of the loss function, so it is hard to score

  • JC: Option 1 seems like the right way to go, but is it better to write a one-off tool, or should we add the capability to OpenFF Evaluator so it can be reused and become part of our regular infrastructure, keeping it future-compatible?

    • MS: Evaluator calculates the gradient of the loss with finite differences, which may be better than evaluating it from averages/correlations (d<O>/dk_i). Agrees it would be better to do it in Evaluator to tie into the existing workflow

    • JC: Having this in Evaluator would also be valuable for other projects

    • CC: Agrees in the long term; was planning to write it as a one-off first to see if it works before incorporating it into existing infrastructure

    • DM: Idea was to do it outside first so that it doesn’t clog up the infrastructure if we decide to pursue something else that would also have to go into Evaluator, etc.

    • JW: Seems like it could go into Evaluator, but needs to look; for a while the new toolkit didn’t work with FB, so it didn’t make sense

    • JC: Doesn’t expect the NMR approach to fail, since it has been published by others and validated elsewhere; doesn’t think integrating it is a risk

    • MG: Not sure we want to hold up FF development while we implement it in our existing infrastructure, but it would probably be good to have eventually

    • DM: We don’t have anyone on the infrastructure team with time to do this right now, so either Chapin does it now or it will be a while; should let Chapin decide how he prefers to move forward

    • Some discussion about tradeoffs between the QM path and the NMR path

    • CC: Talked to Matt; it’s not clear that Evaluator can take an existing trajectory and operate on it, so this may require significant work

    • MS: Should check with Simon about it

    • CC: Concern: the alpha hyperparameter is very important and needs to be chosen by cross-validation, but the current Evaluator doesn’t allow optimizing it

  • JC: How are you assessing the NMR observables?

    • CC: They are part of the protein benchmark repo

  • MS: How “bespoke” are the new parameters derived from NMR? Concern about these torsion terms applying to other molecules in the NullFF, where bespoke terms are also applied to small molecules

    • CC: They do hit some molecules in the industry set, which look like amino acids

    • JC: Many drug-like molecules have amide backbones, so this might be a problem

    • MS: Wants the force field to be as “null” as possible, but as long as the molecules that are hit are similar to amino acids, it’s probably OK. Thinks we should add as few parameters as needed, but how do we figure out which ones and how many?

      • CC: Currently adding 6 protein-specific torsions

      • MS: Seems fine as long as we can show that they don’t negatively impact small molecules

      • CC: They are constructed so they don’t apply to terminal residues or single amino acids

  • JC: Need to monitor degradation of old targets while fitting new targets

    • CC: Not fitting jointly, but the goal is to include that in the regularization term of the loss function, to minimize departure of the parameters from their current values

    • Could also go back and do a global optimization

  • CC: Will touch base with Matt again about Evaluator; hopefully will have an update next meeting

  • JC: Thinks this is a good direction; who is the decision maker for this?

    • DM: Chapin will evaluate how much effort is required to derisk the idea in a one-off script versus implementing it in Evaluator, and pursue whichever he thinks is best

    • General agreement

  • MS: In the past we talked about checking whether espaloma parameters give correct secondary structure; ligand binding energies alone won’t be enough.

    • JC: No time to do that before the paper

    • MS: Should go in the grant

    • JC: Disagrees

    • MS: Thinks referees will want evidence that alpha helices are stable/secondary structures are correct and stable

    • MS: Run all of Chapin’s structure tests on espaloma

    • DM: Doesn’t think it will be the reason the grant is funded, and it gives reviewers something to criticize

    • MS: Disagrees, thinks we need to include secondary structure data

    • MG: Mixed feelings: if we try it and it doesn’t work, what do we do? Would we put bad results into the grant?

    • MS: Would want to know that, and we have gotten dinged on this in the past

    • JC: Espaloma paper has data that shows it performs better than OpenFF and Amber

    • MS: Maybe Chapin can do a day of work to get that done, then Anika can run simulations

    • JC: Maybe for the paper, but thinks it’s not practical for the grant timeline

    • Concluded that it should be discussed in a separate proposal call

    • Would it really be a day of Chapin’s time to do these benchmarks on espaloma?

      • CC: Just need to do some refactoring; probably half a day

      • MS: Sounds like getting that done is the #1 priority for moving this forward; can decide later, once we see the results and whether it finishes in time

      • JC: Is there consensus in the literature that these are the right benchmarks?

        • CC: There is a preprint with prominent authors that we could cite to convince grant reviewers

  • MS, MG, and Chapin agree to move forward on espaloma testing
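On MS’s point about computing d<O>/dk_i either by finite differences or from averages/correlations, here is a minimal NumPy sketch of the two routes. It assumes the potential depends linearly on a single torsion force constant k and that the frames are uncorrelated samples drawn with the current parameters. The toy torsion term, the Karplus-like observable, and all function names are hypothetical illustrations, not OpenFF Evaluator’s API. Because both estimates differentiate the same reweighted average, they should agree for a small step size.

```python
import numpy as np

KT = 0.596  # kcal/mol, roughly k_B * T at 300 K
BETA = 1.0 / KT


def reweighted_average(obs, dU_dk, delta_k, beta=BETA):
    """<O> after shifting k by delta_k, estimated by reweighting existing frames.

    Assumes U is linear in k, so frame i changes energy by delta_k * dU_dk[i].
    """
    log_w = -beta * delta_k * dU_dk
    log_w -= log_w.max()  # shift log-weights for numerical stability
    w = np.exp(log_w)
    return float(np.sum(w * obs) / np.sum(w))


def grad_finite_difference(obs, dU_dk, h=1e-4, beta=BETA):
    """Central finite difference of the reweighted average at delta_k = 0."""
    plus = reweighted_average(obs, dU_dk, +h, beta)
    minus = reweighted_average(obs, dU_dk, -h, beta)
    return (plus - minus) / (2.0 * h)


def grad_fluctuation(obs, dU_dk, beta=BETA):
    """Fluctuation formula: d<O>/dk = -beta * (<O dU/dk> - <O><dU/dk>)."""
    return float(-beta * (np.mean(obs * dU_dk) - np.mean(obs) * np.mean(dU_dk)))


# Hypothetical toy data: dihedral samples, a 1-fold torsion term
# U_k = k * (1 + cos(phi)), and a Karplus-like observable with
# illustrative (not fitted) coefficients.
rng = np.random.default_rng(0)
phi = rng.uniform(-np.pi, np.pi, 10_000)
dU_dk = 1.0 + np.cos(phi)
obs = 6.5 * np.cos(phi) ** 2 - 1.8 * np.cos(phi) + 1.6

fd = grad_finite_difference(obs, dU_dk)
analytic = grad_fluctuation(obs, dU_dk)
print(f"finite difference: {fd:.6f}  fluctuation formula: {analytic:.6f}")
```

The finite-difference route only needs the reweighted average, so it also works for parameters that enter U nonlinearly. The fluctuation formula needs dU/dk for every frame but no step size. Both degrade when the shifted parameters move far from the sampled ensemble and only a few frames carry most of the weight.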
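On CC’s concern that the regularization strength alpha must be chosen by cross-validation, here is a minimal sketch. It assumes the reweighted observables are linearized around the current parameters, <O_j>(k + Δk) ≈ <O_j> + Σ_i G_ji Δk_i with G_ji = d<O_j>/dk_i. It also uses an L2 penalty alpha·||Δk||² as a stand-in for the regularization that keeps the parameters close to their current values. Together these turn each fit into ridge regression. This is a simplification swapped in for illustration, not the protocol of the linked paper. The function names and the synthetic data (6 torsions, 40 observables) are hypothetical.

```python
import numpy as np


def fit_delta_k(G, residual, alpha):
    """Minimize ||G @ dk - residual||^2 + alpha * ||dk||^2 (closed-form ridge)."""
    n_params = G.shape[1]
    return np.linalg.solve(G.T @ G + alpha * np.eye(n_params), G.T @ residual)


def cv_error(G, residual, alpha, n_folds=5, seed=0):
    """Mean squared error on held-out observables, averaged over folds."""
    idx = np.random.default_rng(seed).permutation(len(residual))
    errors = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        dk = fit_delta_k(G[train_idx], residual[train_idx], alpha)
        errors.append(np.mean((G[test_idx] @ dk - residual[test_idx]) ** 2))
    return float(np.mean(errors))


def choose_alpha(G, residual, alphas, n_folds=5):
    """Return the alpha with the lowest cross-validation error, plus all scores."""
    scores = [cv_error(G, residual, a, n_folds) for a in alphas]
    return alphas[int(np.argmin(scores))], scores


# Hypothetical synthetic problem: 40 observables, 6 torsion parameters.
rng = np.random.default_rng(1)
G = rng.normal(size=(40, 6))  # stand-in for d<O_j>/dk_i
true_dk = rng.normal(scale=0.5, size=6)
residual = G @ true_dk + rng.normal(scale=0.3, size=40)  # O_exp - <O>_current

alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
best_alpha, scores = choose_alpha(G, residual, alphas)
dk = fit_delta_k(G, residual, best_alpha)
print("best alpha:", best_alpha)
print("parameter shifts:", np.round(dk, 3))
```

A larger alpha keeps Δk small. That is also the regime where the linearization and the reweighting of existing trajectories stay trustworthy. In a real fit the gradients would be recomputed as the parameters move.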

Action items

@Chapin Cavender will work on espaloma benchmark testing

Decisions