2023-06-29 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @John Chodera

  • @Pavan Behara

  • @Anika Friedman

  • @Lily Wang

  • @Jeffrey Wagner

  • @Michael Gilson

  • @Michael Shirts

  • @Matt Thompson

Goals

  • Helical peptide benchmark with Amber RESP charges

  • Strategy for next iteration of parameter fitting

  • Migration of proteinbenchmark repo from Chapin’s GitHub to OpenFF GitHub

Recording

https://drive.google.com/file/d/1ErqYd7DcCBf3pdgq5V1wH8auLhdovml2/view?usp=sharing

Slides (new slides from 19)

Discussion topics

Item | Presenter | Notes

NIH grant feedback

@Michael Shirts

  • There was a lot of worry that it is not possible to make a transferable force field, and that we do not yet have a protein force field. Thus, it is important to figure out a plan that is likely to get us to a useful, validated protein FF by November, including figuring out what other resources we want to bring in to increase the chance of arriving at the endpoint in time.

    • We should discuss the structure of the planned paper(s) so we can line up resources to do various jobs (for example, protein-ligand binding test sets).

  • One major piece of feedback was that the reviewers really wanted to see a protein force field (especially a self-consistent protein / small molecule force field). If we could produce a validated Rosemary release candidate with protein:ligand binding free energies by mid-Oct 2023, it would be extremely helpful. We would need to have a sense of what data we would have available some time before that.

    • It would need to pass basic checks

    • Who would run the protein:ligand binding free energies?

    • At what point can we decide that Rosemary isn’t going to produce a release candidate by mid-Oct?

  • Jeff: The Alchemiscale team can turn around protein:ligand binding free energies in a few weeks. They cannot do this if the ligand part of the force field uses virtual sites.

  • Lily: We might move virtual sites into a release earlier than Rosemary, but we currently aren’t planning to include virtual sites.

  • John: We may not need Rosemary for the grant proposal revision.

  • Michael Shirts: We’re trying to find out what the current Rosemary timeline is so we can decide how to revise the grant.

  • Michael Gilson: Are there ways we can maximize the chances of getting the studies we want done in time?

  • Chapin: My goal is to provide a workflow for parameterizing proteins, which will be handed back to Lily to parameterize proteins. If Lily’s small molecule force field has virtual sites at that point, everything will have virtual sites.

  • Michael Shirts: Can we make a Sage 2.3 force field that includes protein coverage if Rosemary is going to have virtual sites? If Chapin has a working force field parameterization workflow now, why aren’t we building a force field now?

  • Chapin: Currently building on an intermediate between Sage 2.0 and 2.1. We need to do at least some refitting of torsions to correct for helix/sheet balance issues; this would take ~2 weeks. Peptide benchmarks would take ~1 day. If the ACCESS proposal goes through, 1-2 weeks for a benchmark on 4 folded proteins. This is the threshold for handing off to Lily. Lily would start from Sage 2.1 and refit for ~2 weeks, then benchmark on folded proteins for ~1 month. Then we pass off to another team for protein:ligand benchmarks and unfolded protein benchmarks. The RC would be 8-12 weeks from now, then ~1 month for benchmarks.

  • John: We don’t have to release a force field for the purposes of the proposal revision. We just need benchmarks of some kind, even of an experimental version (Espaloma or Rosemary).

  • Michael Gilson: We can try protein:ligand calculations with any kind of intermediate Rosemary force field version.

  • John: Right now, we don’t have an OpenFE free energy workflow where the entire system (protein, ligand, solvent) is parameterized using an OpenFF offxml file via the OpenFF PDB reader; it currently uses openmmforcefields (which only supports OpenFF small molecules). We have to add support for the complete OpenFF PDB → Topology → OpenFF ForceField → OpenMM System path (see the sketch after this list).

  • Michael Shirts: If we don’t need a release, we can check in with Chapin in ~2 weeks and see if the helix/coil equilibrium issue is solved. If so, we can start free energy calculations.

  • Chapin: We can pick an experimental version mid-Sep to benchmark even if the issue is not solved.

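The path John describes can be sketched with the OpenFF Toolkit and Interchange, assuming an offxml with protein coverage is available; the file names and the ligand SMILES below are placeholders, not project files.

```python
# Minimal sketch of the OpenFF PDB → Topology → ForceField → OpenMM System path.
# "complex.pdb", "example-protein-ff.offxml", and the ligand SMILES are placeholders.
from openff.toolkit import ForceField, Molecule, Topology

# The PDB reader needs any non-standard residues (e.g. the ligand) given explicitly
ligand = Molecule.from_smiles("c1ccccc1C(=O)N")  # hypothetical ligand
topology = Topology.from_pdb("complex.pdb", unique_molecules=[ligand])

# Parameterize protein, ligand, and solvent with a single SMIRNOFF offxml
force_field = ForceField("example-protein-ff.offxml")
interchange = force_field.create_interchange(topology)

# Export to OpenMM objects for the free energy workflow
openmm_system = interchange.to_openmm()
openmm_topology = interchange.to_openmm_topology()
```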

Helical peptide benchmark

@Chapin Cavender

  • Slides will be uploaded

  • JC – are these results using the Sage re-fit LJ parameters? Did the re-fit change the 1-4s?

    • CC – yes, using the re-fit LJs. I’ve got some runs queued up with the LJs switched out for AMBER’s.

    • JC – most of the data used in the Sage re-fit were small molecules without a lot of 1-4s

    • MS – so the possibility is that the original ones were better

    • JC – this is one of the limitations of using a fixed 1-4 scaling (see the snippet after this list)

    • MS – but using AMBER 1-4s might not help if Sage torsions were fit to our 1-4s

    • MG – want to see if we still see the same kind of results with transferred 1-4 types. That would free us up to focus on torsions.

  • MG – if we focus on comparison to experiment instead of to AMBER, do we mis-estimate both alpha and beta? We may need to focus on both. Potentially the FF is generating wacky minima where it shouldn’t be
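For reference, the fixed 1-4 scaling discussed above lives on the vdW and Electrostatics handlers of a SMIRNOFF force field and can be inspected with the OpenFF Toolkit; a minimal sketch (the offxml name is just an example release):

```python
# Sketch: inspect the fixed 1-4 scale factors in a SMIRNOFF force field.
# The offxml name is illustrative; any Sage-line release should work.
from openff.toolkit import ForceField

ff = ForceField("openff-2.1.0.offxml")

# 1-4 Lennard-Jones interactions are scaled by this factor
print("vdW scale14:", ff["vdW"].scale14)
# 1-4 electrostatics are scaled by this factor
print("Electrostatics scale14:", ff["Electrostatics"].scale14)
```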

 

Parameter fitting strategy

@Chapin Cavender

  • Single point QM energies to eliminate spurious MM minima

  • Fitting torsion corrections to peptide scalar couplings

Proposed next steps:

  • follow LPW’s suggested strategy

    • add additional ab initio targets … (~43 min in)

    • seems like he was seeing similar behaviour with FB-15, and this seems to be the change that improved behaviour

  • requires ~1000 single-point energies, which is a moderate computational expense

  • Doing only a second-stage refit would probably take on the order of days instead of weeks

  • JW – a potential problem is that we cannot submit datasets to QCA right now

    • CC – can I just run the calculations locally and upload them to QCA later? (see the sketch after this list)

    • JW – yes, could even re-run them later

    • JC – JH has done this before, could be good to contact
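A minimal sketch of running one of these single-point energies locally with QCEngine + Psi4; the geometry, method, and basis below are illustrative stand-ins, not the project's chosen level of theory. Results could later be uploaded to QCA.

```python
# Sketch: a local single-point energy with QCEngine + Psi4.
# The geometry, method, and basis are illustrative stand-ins.
import qcelemental as qcel
import qcengine as qcng

# Water as a stand-in geometry (Angstrom)
molecule = qcel.models.Molecule.from_data(
    """
    O  0.000000  0.000000  0.117300
    H  0.000000  0.757200 -0.469200
    H  0.000000 -0.757200 -0.469200
    """
)

task = qcel.models.AtomicInput(
    molecule=molecule,
    driver="energy",
    model={"method": "b3lyp-d3bj", "basis": "dzvp"},
)

result = qcng.compute(task, program="psi4")
print(result.return_result)  # single-point energy in hartree
```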

Another proposed approach

  • Empirical NMR observables as targets. We could fit to these to bring the torsions in line with condensed-phase targets. A maximum-entropy reweighting approach has been used before by Lindorff-Larsen, Bussi, etc. (~47 min or so)

    • CC/JC discussion – this would be too expensive to add to Evaluator; the idea is to make a one-off correction and apply it to torsions

    • MS – also, if you want to try a whole bunch of torsions, you could take existing configurations, calculate the energy difference from the new torsion parameters, and see how they change the weights (see the sketch after this list)

    • CC – that’s basically my proposal, but to use max entropy to decide how to do the reweighting

    • MS – but you can avoid re-running simulations and calculate dE

    • CC – yes

    • MS – this is the approach used by at least one AMBER version to get the torsions right

    • CC – AMBER ff14SB, targeting Ala5 scalar couplings

    • MG – why isn’t this plan A? We already have the data we want to fit, so we could jump straight to that. I guess it’s a philosophical choice

    • MS – yes, it would be good to see if we can get the correct physical property behaviour from fitting to QM data, although the answer is no

    • JC – if the reweighting approach is so successful here, why can’t we incorporate it into the PropertyCalculator?

    • MS – we can only re-calculate energies for existing conformations; we can’t generate new conformations

    • JC – concerned that this is yet another direction we are pursuing

    • CC – agree. This is the fast route to getting that to work. However, I agree that if this is the long-term good solution then we should incorporate it into Evaluator

    • MS – yes, we don’t know if it works yet

    • CC – this is the pilot study

  • JC – posted papers with a suggested third approach if the first two don’t work
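A minimal sketch of the reweighting idea above (single-state Boltzmann reweighting, not the full maximum-entropy formulation): reuse stored configurations, compute the energy change from trial torsion parameters, and reweight an observable. All parameters, angles, and the observable here are stand-ins.

```python
# Sketch of single-state Boltzmann reweighting for a torsion parameter change.
# Stored configurations are reused; only the torsion energy difference is computed.
# All parameters, angles, and the observable are stand-ins.
import numpy as np

KT = 0.593  # kcal/mol at ~298 K

def torsion_energy(phi, k, n, phase):
    """Periodic torsion term: k * (1 + cos(n*phi - phase))."""
    return k * (1.0 + np.cos(n * phi - phase))

# Torsion angles (radians) extracted from an existing trajectory; random stand-ins here
rng = np.random.default_rng(2023)
phi = rng.uniform(-np.pi, np.pi, size=10_000)

# Energy change per configuration when swapping original -> trial torsion parameters
dE = torsion_energy(phi, k=1.2, n=2, phase=np.pi) - torsion_energy(phi, k=1.0, n=2, phase=np.pi)

# Reweighting factors and the reweighted ensemble average of an observable
weights = np.exp(-dE / KT)
observable = np.cos(phi)  # stand-in for, e.g., a Karplus-predicted scalar coupling
reweighted_average = np.sum(weights * observable) / np.sum(weights)

# Effective sample size: if this collapses, the trial parameters are too far
# from the sampled ensemble and new simulations would be needed
n_eff = np.sum(weights) ** 2 / np.sum(weights ** 2)
print(reweighted_average, n_eff)
```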

 

Github repo migration

@Chapin Cavender

  • Repo is here:

  • Questions for migration

    • Are there minimum standards for documentation/testing?

    • What is the protocol for code changes?

  • JW – 3 options:

    • no one owns it

    • Chapin owns it

    • someone else in the infra team owns it. Not my preferred option

  • CC – I’m happy to own it

  • MT – is this a package that other packages will depend on?

    • CC – yeah, probably; analogous to the sage-2.1.0 etc. repos. It’ll be attached to a publication in case anyone wants to reproduce our benchmarks

    • JW – agree, will try to make it nice for release but won’t maintain it

  • CC – it will be good for future work and direct comparisons

  • CC – I have separate repos for fitting and benchmarking, should I bring them both in?

  • JW – we need a release tarball and a release entry that tells people how to reproduce things. It’s fine if they’re in separate repos, but we need it to be coherent


Benchmarking espaloma

@John Chodera

What changes are required to proteinbenchmark to support espaloma?

  • CC – as long as we can make an Interchange object from Espaloma, it can slot into the rest of the workflow. It’s very modular, and Anika has already picked it up.

  • JC – is this something we can easily run ourselves on the cluster?

  • CC – should be pretty agnostic

    • AF – I’ve used it on the bridges2 cluster, so it’s worked on multiple platforms

    • CC – as long as you can run a gmx simulation it should be good to go

  • JC – I will open an issue asking if everything is, or can be, moved to one place where an FF file goes in and an Interchange object comes out


Action items

Decisions