2023-09-21 Protein FF meeting note

Participants

@Chapin Cavender
@Pavan Behara
@Alexandra McIsaac
@David Mobley
@Jeffrey Wagner
@Brent Westbrook (Unlicensed)
@Anika Friedman
@John Chodera
@Michael Shirts
@Michael Gilson
@Trevor Gokey

Goals

Update on QM refit targeting MM minima
Proposal on fitting to NMR observables by reweighting

Discussion topics

Item	Presenter	Notes
QM refit targeting MM minima	@Chapin Cavender	Chapin will put slides here
Previous FFs underestimate the favorability of alpha helices. Hypothesis: this is caused by spurious minima on the MM surface that aren’t captured in the QM training data.
Solution was to generate MM minima and refit.
Conclusion: the refit does not address the issue, which suggests targeting NMR observables as an alternate route.
JC: How wrong are the minima?
CC: Errors are on the order of -5 kcal/mol below the minimum.
MS: Hypothesis was that Amber was getting it right due to fitting to pairwise energy differences, but that was too expensive so we don’t want to pursue it?
CC: Yes, goal is to do a pilot on a small molecule dataset to see if it helps, then expand to proteins. But this will take longer than the deadline for the R01 grant, so not prioritizing right now.
JC: How long does the refit take?
CC: Takes a full week to do the refit with ForceBalance, can’t iterate on it quickly.
JC: Espaloma takes a day on large numbers of data points, potential limit with FB.
MG: Issue is the number of optimizations, not the number of data points.
JC: Maybe could try a small number of opt steps to speed it up.
CC: Wants to eventually pursue all of these ideas, but needs a working SMIRNOFF FF for the R01 grant deadline, so wants to focus on NMR fitting to try to get results more quickly.
JC: Thinks we have enough for the grant from the Espaloma work. Not much time to make progress either way, so there may not be much time pressure.
MS: Agrees there is not enough time.
DM: PIs can figure out what goes in the grant, but you should follow the best path forward. Not enough time to put this in the grant.
Fitting to NMR observables by reweighting	@Chapin Cavender	Option 1: Protocol taken from this paper: https://pubs.acs.org/doi/10.1021/acs.jctc.9b00206
Alternative: using the BICEPS package, but they don’t have a good way of estimating the gradient of the loss function, hard to score.
JC: Option 1 seems like the right way to go, but is it better to do a one-off tool, or should we add the capability to OpenFF Evaluator so it can be used again and be part of our regular infrastructure, to be future-compatible?
MS: Evaluator calculates the gradient of the loss with finite differences, which may be better than evaluating with averages/correlations (d<O>/dki). Agrees it would be better to do it in Evaluator to tie into the existing workflow.
JC: Would also be valuable for other projects to have this in Evaluator.
CC: Agrees in the long term. Was planning to write it as a one-off at first to see if it works before incorporating it into existing infrastructure.
DM: Idea was to do it outside first so that it doesn’t clog up infrastructure if we decide to pursue something else that also has to be in Evaluator, etc.
JW: Seems like it could go into Evaluator, needs to look. For a while the new toolkit didn’t work with FB, so it didn’t make sense.
JC: Doesn’t think using NMR data is likely to fail: the approach has been published by others and validated elsewhere. Doesn’t think integrating it is a risk.
MG: Not sure we want to hold up FF development while we get it implemented into our existing infrastructure, but it would probably be good to have eventually.
DM: Don’t have someone on the infrastructure team with time to do this right now. Either Chapin will do it now or it will be a while. Should let Chapin decide how he prefers to move forward.
Some discussion about tradeoffs between QM path vs NMR path.
CC: Talked to Matt, not clear that Evaluator can take an existing trajectory and operate on it, may require significant work.
MS: Should check with Simon about it.
CC: Concern: the alpha hyperparameter is very important and needs to be chosen with cross validation; current Evaluator doesn’t allow that optimization.
JC: How are you assessing the NMR observables?
CC: Part of the protein benchmark repo.
MS: How “bespoke” are we talking with the new parameters derived from NMR? Concern about torsion terms applying to other molecules in the NullFF, where bespoke terms are also applied to small molecules.
CC: Does hit some molecules in the industry set; they look like amino acids.
JC: Many drug-like molecules have amide backbones, might be a problem.
MS: Wants the force field to be as “null” as possible, but as long as the molecules that are hit are similar to amino acids, probably ok. Thinks we should add as few parameters as we need to, but how do we figure out which ones and how many?
CC: Currently adding 6 protein-specific torsions.
MS: Seems fine as long as we can show that it doesn’t negatively impact small molecules.
CC: Constructed so they don’t apply to terminal residues or single amino acids.
JC: Need to monitor degradation of old targets while fitting new targets.
CC: Not jointly fitting, but goal is to include that in the regularization for the loss function, to minimize departure of parameters.
Could also go back and do global opt.
CC: Will touch base with Matt again about Evaluator, hopefully will have an update next meeting.
JC: Thinks this is a good direction. Who is the decision maker for this?
DM: Chapin will evaluate how much effort will be required to derisk the idea in a one-off script versus implementing it in Evaluator, and pursue whichever he thinks is best.
General agreement.
MS: In the past talked about running espaloma parameters through a check of whether they give correct secondary structure, won’t want just ligand binding energies.
JC: No time to do that before the paper.
MS: Should go in the grant.
JC: Disagrees.
MS: Thinks referees will want evidence that alpha helices are stable and that secondary structures are correct and stable.
MS: Run all of Chapin’s structure tests on espaloma.
DM: Doesn’t think it will be the reason the grant is funded, and it gives them something to criticize.
MS: Disagrees, thinks we need to include secondary structure data.
MG: Mixed feelings. If we try it and it doesn’t work, what do we do? Would we put bad results into the grant?
MS: Would want to know that, and have gotten dinged on this in the past.
JC: Espaloma paper has data that shows it performs better than OpenFF and Amber.
MS: Maybe Chapin can do a day of work to get that done, then Anika can run simulations.
JC: Maybe for the paper, but thinks it is not practical for the grant timeline.
Conclusion that it should be discussed in a separate proposal call.
Would it really be a day of Chapin’s time to do these benchmarks on espaloma?
CC: Just have to do some refactoring, probably half a day.
MS: Sounds like getting that done is the #1 priority for moving this forward. Can decide later once we see the results and whether it finishes in time.
JC: Is there consensus in the lit that these are the right benchmarks?
CC: There is a pre-print we could cite with important names to convince grant reviewers.
MS, MG, and Chapin agree to move forward on espaloma testing.
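
Illustration of the reweighting idea discussed above (a minimal sketch, not the protocol from the linked paper and not Evaluator code). It assumes the potential is linear in the fitted torsion force constants, so that dU/dk_i can be tabulated once per frame, and it uses a simple chi-squared loss with an alpha-weighted penalty on parameter changes. The function names, the kT value, and the array layout are all hypothetical. The gradient uses the averages/correlations form of d<O>/dk_i rather than finite differences.

```python
import numpy as np

KT = 0.596  # kcal/mol near 300 K (assumed value for illustration)


def reweight(features, delta_k, kT=KT):
    """Normalized frame weights after changing parameters by delta_k.

    features[j, i] is dU/dk_i on frame j; U is assumed linear in k,
    so the energy change of frame j is features[j] @ delta_k.
    """
    log_w = -(features @ delta_k) / kT
    log_w -= log_w.max()  # avoid overflow before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


def loss_and_grad(delta_k, features, obs, target, sigma, alpha, kT=KT):
    """Chi-squared loss plus alpha * |delta_k|^2, with analytic gradient.

    obs[j, a] is observable a computed on frame j; target[a] and sigma[a]
    are the experimental value and its uncertainty.
    """
    w = reweight(features, delta_k, kT)
    mean_obs = w @ obs        # reweighted <O_a>
    mean_feat = w @ features  # reweighted <dU/dk_i>
    # cov[a, i] = <O_a dU/dk_i> - <O_a><dU/dk_i>
    cov = (obs * w[:, None]).T @ features - np.outer(mean_obs, mean_feat)
    dmean = -cov / kT         # d<O_a>/dk_i
    resid = (mean_obs - target) / sigma
    loss = np.sum(resid**2) + alpha * np.sum(delta_k**2)
    grad = 2.0 * (resid / sigma) @ dmean + 2.0 * alpha * delta_k
    return loss, grad
```

In a sketch like this, alpha controls how far the parameters may depart from their starting values, which is why it would have to be chosen by cross validation rather than fit together with the parameters.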

Action items

@Chapin Cavender will work on espaloma benchmark testing
