2023-06-29 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @John Chodera

  • @Pavan Behara

  • @Anika Friedman

  • @Lily Wang

  • @Jeffrey Wagner

  • @Michael Gilson

  • @Michael Shirts

  • @Matt Thompson

Goals

  • Helical peptide benchmark with Amber RESP charges

  • Strategy for next iteration of parameter fitting

  • Migration of proteinbenchmark repo from Chapin’s GitHub to OpenFF GitHub

Recording

https://drive.google.com/file/d/1ErqYd7DcCBf3pdgq5V1wH8auLhdovml2/view?usp=sharing

Slides (new slides from 19)

Discussion topics

Item | Presenter | Notes

NIH grant feedback

@Michael Shirts

  • There was a lot of worry that it is not possible to make a transferable force field, and that we do not yet have a protein force field. Thus, it is important to figure out a plan that is likely to get us to a useful, validated protein FF by November, including figuring out what other resources we want to bring in to increase the chance of arriving at the endpoint in time.

    • We should discuss the structure of the planned paper(s) so we can line up resources to do various jobs (for example, protein-ligand binding test sets).

  • One major piece of feedback was that the reviewers really wanted to see a protein force field (especially a self-consistent protein / small molecule force field). If we could produce a validated Rosemary release candidate with protein:ligand binding free energies by mid-Oct 2023, it would be extremely helpful. We would need to have a sense of what data we would have available some time before that.

    • It would need to pass basic checks

    • Who would run the protein:ligand binding free energies?

    • At what point can we decide that Rosemary isn’t going to produce a release candidate by mid-Oct?

  • Jeff: The Alchemiscale team can turn around protein:ligand binding free energies in a few weeks. They cannot do this if the ligand part of the force field uses virtual sites.

  • Lily: We might move virtual sites into a release earlier than Rosemary, but we currently aren’t planning to include virtual sites.

  • John: We may not need Rosemary for the grant proposal revision.

  • Michael Shirts: We’re trying to find out what the current Rosemary timeline is so we can decide how to revise the grant.

  • Michael Gilson: Are there ways we can maximize the chances of getting the studies we want done in time?

  • Chapin: My goal is to provide a workflow for parameterizing proteins, which will be handed back to Lily to parameterize proteins. If Lily’s small molecule force field has virtual sites at that point, everything will have virtual sites.

  • Michael Shirts: Can we make a Sage 2.3 force field that includes protein coverage if Rosemary is going to have virtual sites? If Chapin has a working force field parameterization workflow now, why aren’t we building a force field now?

  • Chapin: Currently building on an intermediate between Sage 2.0 and 2.1. We need to do at least some refitting of torsions to correct for helix/sheet balance issues; this would take ~2 weeks. Peptide benchmarks would take ~1 day. If the ACCESS proposal goes through, 1-2 weeks for a benchmark on 4 folded proteins. This is the threshold for handing off to Lily. Lily would start from Sage 2.1 and refit for ~2 weeks, then benchmark on folded proteins for ~1 month. Then we pass off to another team for protein:ligand benchmarks and unfolded protein benchmarks. The RC would be 8-12 weeks from now, then ~1 month for benchmarks.

  • John: We don’t have to release a force field for the purposes of the proposal revision. We just need benchmarks of some kind, even of an experimental version (Espaloma or Rosemary).

  • Michael Gilson: We can try protein:ligand calculations with any kind of intermediate Rosemary force field version.

  • John: Right now, we don’t have an OpenFE free energy workflow where the entire system (protein, ligand, solvent) is parameterized using an OpenFF offxml file via the OpenFF PDB reader; it currently uses openmmforcefields (which only supports OpenFF small molecules). We have to add support for the complete OpenFF PDB → Topology → OpenFF ForceField → OpenMM System path (see the sketch after this list).

  • Michael Shirts: If we don’t need a release, we can check in with Chapin in ~2 weeks and see if the helix/coil equilibrium issue is solved. If so, we can start free energy calculations.

  • Chapin: We can pick an experimental version mid-Sep to benchmark even if the issue is not solved.

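The path John describes can be sketched with the OpenFF Toolkit and Interchange, assuming an offxml with protein coverage is available; the file names and the ligand SMILES below are placeholders, not project files.

```python
# Minimal sketch of the OpenFF PDB → Topology → ForceField → OpenMM System path.
# "complex.pdb", "example-protein-ff.offxml", and the ligand SMILES are placeholders.
from openff.toolkit import ForceField, Molecule, Topology

# The PDB reader needs any non-standard residues (e.g. the ligand) given explicitly
ligand = Molecule.from_smiles("c1ccccc1C(=O)N")  # hypothetical ligand
topology = Topology.from_pdb("complex.pdb", unique_molecules=[ligand])

# Parameterize protein, ligand, and solvent with a single SMIRNOFF offxml
force_field = ForceField("example-protein-ff.offxml")
interchange = force_field.create_interchange(topology)

# Export to OpenMM objects for the free energy workflow
openmm_system = interchange.to_openmm()
openmm_topology = interchange.to_openmm_topology()
```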

Helical peptide benchmark

@Chapin Cavender

  • Slides will be uploaded

  • JC – are these results using the Sage re-fit LJ parameters? Did the re-fit change the 1-4s?

    • CC – yes, using the re-fit LJs. I’ve got some runs queued up with the LJs switched out for AMBER’s.

    • JC – most of the data used in the Sage re-fit were small molecules without a lot of 1-4s

    • MS – so the possibility is that the original ones were better

    • JC – this is one of the limitations of using a fixed 1-4 scaling (see the snippet after this list)

    • MS – but using AMBER 1-4s might not help if Sage torsions were fit to our 1-4s

    • MG – want to see if we still see the same kind of results with transferred 1-4 types. That would free us up to focus on torsions.

  • MG – if we focus on comparison to experiment instead of to AMBER, do we mis-estimate both alpha and beta? We may need to focus on both. Potentially the FF is generating wacky minima where it shouldn’t be
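For reference, the fixed 1-4 scaling discussed above lives on the vdW and Electrostatics handlers of a SMIRNOFF force field and can be inspected with the OpenFF Toolkit; a minimal sketch (the offxml name is just an example release):

```python
# Sketch: inspect the fixed 1-4 scale factors in a SMIRNOFF force field.
# The offxml name is illustrative; any Sage-line release should work.
from openff.toolkit import ForceField

ff = ForceField("openff-2.1.0.offxml")

# 1-4 Lennard-Jones interactions are scaled by this factor
print("vdW scale14:", ff["vdW"].scale14)
# 1-4 electrostatics are scaled by this factor
print("Electrostatics scale14:", ff["Electrostatics"].scale14)
```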

 

Parameter fitting strategy

@Chapin Cavender

  • Single point QM energies to eliminate spurious MM minima

  • Fitting torsion corrections to peptide scalar couplings

Proposed next steps:

  • follow LPW’s suggested strategy

    • add additional ab initio targets … (~43 min in)

    • seems like he was seeing similar behaviour with FB-15, and this seems to be the change that improved behaviour

  • requires ~1000 single-point energies, which is a moderate computational expense

  • Doing only a second-stage refit would probably take on the order of days instead of weeks

  • JW – a potential problem is that we cannot submit datasets to QCA right now

    • CC – can I just run the calculations locally and upload them to QCA later? (see the sketch after this list)

    • JW – yes, could even re-run them later

    • JC – JH has done this before, could be good to contact
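A minimal sketch of running one of these single-point energies locally with QCEngine + Psi4; the geometry, method, and basis below are illustrative stand-ins, not the project's chosen level of theory. Results could later be uploaded to QCA.

```python
# Sketch: a local single-point energy with QCEngine + Psi4.
# The geometry, method, and basis are illustrative stand-ins.
import qcelemental as qcel
import qcengine as qcng

# Water as a stand-in geometry (Angstrom)
molecule = qcel.models.Molecule.from_data(
    """
    O  0.000000  0.000000  0.117300
    H  0.000000  0.757200 -0.469200
    H  0.000000 -0.757200 -0.469200
    """
)

task = qcel.models.AtomicInput(
    molecule=molecule,
    driver="energy",
    model={"method": "b3lyp-d3bj", "basis": "dzvp"},
)

result = qcng.compute(task, program="psi4")
print(result.return_result)  # single-point energy in hartree
```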

Another proposed approach

  • Empirical NMR observables as targets. We could fit to these to bring the torsions in line with condensed-phase targets. A maximum-entropy reweighting approach has been used before by Lindorff-Larsen, Bussi, etc. (~47 min or so)

    • CC/JC discussion – this would be too expensive to add to Evaluator; the idea is to make a one-off correction and apply it to torsions

    • MS – also, if you want to try a whole bunch of torsions, you could take existing configurations, calculate the energy difference from the new torsion parameters, and see how they change the weights (see the sketch after this list)

    • CC – that’s basically my proposal, but to use max entropy to decide how to do the reweighting

    • MS – but you can avoid re-running simulations and calculate dE

    • CC – yes

    • MS – this is the approach used by at least one AMBER version to get the torsions right

    • CC – AMBER ff14SB, targeting Ala5 scalar couplings

    • MG – why isn’t this plan A? We already have the data we want to fit, so we could jump straight to that. I guess it’s a philosophical choice

    • MS – yes, it would be good to see if we can get the correct physical property behaviour from fitting to QM data, although the answer is no

    • JC – if the reweighting approach is so successful here, why can’t we incorporate it into the PropertyCalculator?

    • MS – we can only re-calculate energies for existing conformations; we can’t generate new conformations

    • JC – concerned that this is yet another direction we are pursuing

    • CC – agree. This is the fast route to getting that to work. However, I agree that if this is the long-term good solution then we should incorporate it into Evaluator

    • MS – yes, we don’t know if it works yet

    • CC – this is the pilot study

  • JC – posted papers with a suggested third approach if the first two don’t work
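A minimal sketch of the reweighting idea above (single-state Boltzmann reweighting, not the full maximum-entropy formulation): reuse stored configurations, compute the energy change from trial torsion parameters, and reweight an observable. All parameters, angles, and the observable here are stand-ins.

```python
# Sketch of single-state Boltzmann reweighting for a torsion parameter change.
# Stored configurations are reused; only the torsion energy difference is computed.
# All parameters, angles, and the observable are stand-ins.
import numpy as np

KT = 0.593  # kcal/mol at ~298 K

def torsion_energy(phi, k, n, phase):
    """Periodic torsion term: k * (1 + cos(n*phi - phase))."""
    return k * (1.0 + np.cos(n * phi - phase))

# Torsion angles (radians) extracted from an existing trajectory; random stand-ins here
rng = np.random.default_rng(2023)
phi = rng.uniform(-np.pi, np.pi, size=10_000)

# Energy change per configuration when swapping original -> trial torsion parameters
dE = torsion_energy(phi, k=1.2, n=2, phase=np.pi) - torsion_energy(phi, k=1.0, n=2, phase=np.pi)

# Reweighting factors and the reweighted ensemble average of an observable
weights = np.exp(-dE / KT)
observable = np.cos(phi)  # stand-in for, e.g., a Karplus-predicted scalar coupling
reweighted_average = np.sum(weights * observable) / np.sum(weights)

# Effective sample size: if this collapses, the trial parameters are too far
# from the sampled ensemble and new simulations would be needed
n_eff = np.sum(weights) ** 2 / np.sum(weights ** 2)
print(reweighted_average, n_eff)
```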

 

Github repo migration

@Chapin Cavender

  • Repo is here:

  • Questions for migration

    • Are there minimum standards for documentation/testing?

    • What is the protocol for code changes?

  • JW – 3 options:

    • no one owns it

    • Chapin owns it

    • someone else in the infra team owns it. Not my preferred option

  • CC – I’m happy to own it

  • MT – is this a package that other packages will depend on?

    • CC – yeah, probably; analogous to the sage-2.1.0 etc. repos. It’ll be attached to a publication in case anyone wants to reproduce our benchmarks

    • JW – agree, will try to make it nice for release but won’t maintain it

  • CC – it will be good for future work and direct comparisons

  • CC – I have separate repos for fitting and benchmarking, should I bring them both in?

  • JW – we need a release tarball and a release entry that tells people how to reproduce things. It’s fine if they’re in separate repos, but we need it to be coherent


Benchmarking espaloma

@John Chodera

What changes are required to proteinbenchmark to support espaloma?

  • CC – as long as we can make an Interchange object from Espaloma, it can slot into the rest of the workflow. It’s very modular, and Anika has already picked it up.

  • JC – is this something we can easily run ourselves on the cluster?

  • CC – should be pretty agnostic

    • AF – I’ve used it on the bridges2 cluster, so it’s worked on multiple platforms

    • CC – as long as you can run a gmx simulation it should be good to go

  • JC – I will open an issue asking if everything is, or can be, moved to one place where an FF file goes in and an Interchange object comes out


Action items

Decisions