2022-03-10 Protein FF meeting notes

Participants

  • @Simon Boothroyd

  • @Chapin Cavender

  • @Michael Gilson

  • Robert Raddi

  • @Michael Shirts

  • @Jeffrey Wagner

Goals

  • Plans for Rosemary benchmark

    • Small molecules

    • Proteins

Discussion topics

Review fitting strategy

@Chapin Cavender

  • Charges

    • ELF10

    • Library charges for water (TIP3P) and 26 canonical amino acids (AM1BCC-ELF10)

      • MS – Does the fitting involve optimizing against water?

      • CC – Yes, when fitting Sage we used TIP3P water for mixtures. We just need to pick the water model for fitting the binary mixtures.

      • MG – What about LJ refits?

      • MS – We’re not refitting protein LJ

      • CC – Based on our last protein meeting, we decided that since we’re swapping charges from AM1BCC to AM1BCC-ELF10, we should refit everything else (including LJ)

      • MS – So we should clarify that we’re doing ELF10 for everything (both small mols and proteins)

      • SB – We should update the SMIRNOFF spec to indicate that we’re doing AM1BCC ELF10. But AmberTools often has prohibitive runtime if we try to do ELF10 in it.

      • CC – Does the toolkit now default to AM1BCC-ELF10 when the OE backend is available?

      • JW – This would be rough, since the runtime is really large for the AmberTools backend.

      • SB – If we’re asserting that there’s a large enough difference between plain AM1BCC and AM1BCC-ELF10 that we need to refit vdW, then the real question is whether the swap gives us different outputs for pharmaceutically relevant molecules.

        • MS – The ultimate answer for “are the charges significantly different” should be “do they change the values of co-optimized LJ when redoing the Sage fit?”

      • MS – We should determine how different the output charges are between AM1BCC and AM1BCC-ELF10 to see if refits are needed. The Sage fit didn’t include big, floppy molecules, so we shouldn’t see a big difference there.

      • MG – The decision to do ELF10 was made a while ago (CC - September 2021), so it looks like this has more ramifications than we considered before.

        • SB – I agree that we should use AM1BCC-ELF10

        • MG – If the training is on molecules where it doesn’t make a difference whether you use ELF10, then is runtime really an issue?

        • JW – If the spec says “do small molecules using ELF10”, then I can’t choose not to do that at runtime

      • MS – Need to figure out whether ELF10 requires LJ refit. Could do this by taking the Sage training set, then do…

      • MG – If one outcome of this experiment would be that we generate protein library charges using single-conformer AM1BCC, then that would seem to be an unacceptable outcome.

      • CC – We could fit two FFs – one with single-conformer charges, another with ELF10 – and refit LJ for each. Then tell users to choose what runtime allows.

      • MG – Would it even be possible to benchmark ELF10 vs. single-conformer charges on the mostly-rigid Sage training set, to see whether the LJ params need to change for ELF10?

      • CC – Could we use NMR data on small peptides to run optimizations/benchmarks to answer this question?

        • MS – I don’t think so. It’d be a study unto itself to determine whether we can use NMR data in this context at all.

      • MS – If fitting with AM1BCC single conf and ELF10 doesn’t give significantly different LJ, then we’re done. The protein and small molecule parts of the FF are perfectly compatible.

      • So the pathway forward could be:

        • Small mols: AM1BCC (single conf) and Sage LJ

        • Proteins: AM1BCC-ELF10 precomputed library charges and Sage(?) LJ

      • CC – This is different from the prior plan. I don’t think we can change the plan without the people who made the previous decision here.

      • SB – My proposal is to double-check that the Toolkit gives AM1BCC-ELF10 charges for all components when OE is available (see the sketch at the end of this discussion) and proceed with the current fitting. So don’t look at this as “being entirely responsible for Rosemary and implementation”, but look at this as “a study on making a protein FF that uses ELF10 for all components”. If graph charges are available then we’re set when we get there, and if not we can figure it out at the time.

      • (General) – Agree
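
      • (Post-meeting sketch) A minimal version of the check SB proposes, assuming a recent OpenFF Toolkit and a licensed OpenEye backend. The test molecule is illustrative; “am1bcc” and “am1bccelf10” are the Toolkit’s charge-method names:

import numpy as np

from openff.toolkit import Molecule
from openff.toolkit.utils import OpenEyeToolkitWrapper

# Illustrative molecule; in practice this would loop over the Sage training set.
mol = Molecule.from_smiles("CC(=O)Nc1ccc(O)cc1")
mol.generate_conformers(n_conformers=1)
oe = OpenEyeToolkitWrapper()

# Single-conformer AM1BCC charges.
mol.assign_partial_charges("am1bcc", use_conformers=mol.conformers, toolkit_registry=oe)
q_am1bcc = np.array(mol.partial_charges.m)

# AM1BCC-ELF10 charges (the toolkit generates its own conformer ensemble).
mol.assign_partial_charges("am1bccelf10", toolkit_registry=oe)
q_elf10 = np.array(mol.partial_charges.m)

print(f"Max per-atom charge difference: {np.abs(q_am1bcc - q_elf10).max():.4f} e")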

  • Lennard-Jones

    • Fit Sage types (see the inspection sketch after this list)

    • Target Sage physical property training dataset
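
    • A quick sketch (assuming the openff-forcefields package, which ships “openff-2.0.0.offxml”) for inspecting the Sage vdW types in question:

from openff.toolkit import ForceField

# Load Sage and list a few of the vdW types that a refit would target.
sage = ForceField("openff-2.0.0.offxml")
for p in sage.get_parameter_handler("vdW").parameters[:5]:
    print(p.id, p.smirks, p.rmin_half, p.epsilon)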

  • Valence and torsions

    • Fit Sage types (null model) or new protein-specific torsions

    • Target Sage QC training dataset and new QC datasets

      • Optimization dataset for capped 1-mers

      • TorsionDrives on (phi, psi) and (chi1, chi2) for capped 1-mers (see the dihedral-matching sketch after this list)

    • Fit torsions/valence simultaneously or sequentially
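
    • A sketch of how the (phi, psi) scan dihedrals could be located on a capped 1-mer with the OpenFF Toolkit; the ACE-ALA-NME SMILES and the SMARTS patterns are illustrative:

from openff.toolkit import Molecule

# Capped alanine (ACE-ALA-NME), for illustration.
capped_ala = Molecule.from_smiles("CC(=O)N[C@@H](C)C(=O)NC")

# Backbone dihedrals: phi = C-N-CA-C, psi = N-CA-C-N.
phi = capped_ala.chemical_environment_matches("[C:1](=O)[N:2][C:3][C:4]=O")
psi = capped_ala.chemical_environment_matches("[N:1][C:2][C:3](=O)[N:4]")
print("phi atom indices:", phi)
print("psi atom indices:", psi)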

Rosemary benchmark - small molecules

@Chapin Cavender

  • Sage benchmarks

    • QC test dataset

    • Physical property test dataset - solvation and transfer free energies

Rosemary benchmark - proteins

@Chapin Cavender

  • All trajectories in triplicate

  • Force fields

    • Need

      • Rosemary/TIP3P

      • Rosemary/OPC

      • Amber ff14SB/TIP3P

    • Want

      • Amber ff14SB/OPC

      • CHARMM36m/TIP3P

      • Amber ff19SB/OPC

      • a99SB-disp/TIP4P-D

  • Beauchamp/Pande dataset

    • 32 small peptides (2 to 5 residues)

    • Chemical shifts and scalar couplings

    • 500 ns trajectories

    • Use as validation dataset to choose between models for protein torsions

  • Need: Robustelli/Shaw a99SB-disp dataset

    • 4 folded proteins (largest 120 residues)

      • Scalar couplings

      • 10 μs trajectories

    • 10 disordered proteins (largest 140 residues)

      • Chemical shifts and scalar couplings

      • 30 μs trajectories

  • Want: Mao/Montelione dataset

    • 41 folded proteins

    • Chemical shifts and NOEs

    • 10 μs trajectories

      • SB – Does this need to be 10 continuous μs, or can it be parallelized?

      • CC – Literature generally uses continuous.

      • MS – Could do some advanced sampling to get initial configurations.

      • CC – Greg Bowman’s group uses the FAST method to seed initial confs.

      • CC – If we do continuous, then we should do at least triplicate.

      • MS – Which dataset?

      • CC – The Robustelli/Shaw dataset.

      • MS – Can get confs from that set

      • SB – Do we have a way to estimate/benchmark how long a 10 μs trajectory would take on available resources? (See the timing sketch below.)

      • CC – Like, taking the largest protein from the set, and estimating whether it’ll take like a month or something else? Would assume we want to use OpenMM

      • SB – SMIRNOFF in GROMACS and AMBER can also work if needed.
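
      • (Post-meeting sketch) One way to get the timing estimate: run a short OpenMM simulation of the system of interest and extrapolate to 10 μs. “protein.pdb”, the stand-in force field, and the settings below are placeholders, not a prescribed protocol:

import time

from openmm import LangevinMiddleIntegrator
from openmm.app import ForceField, HBonds, PDBFile, PME, Simulation
from openmm.unit import femtoseconds, kelvin, picosecond

pdb = PDBFile("protein.pdb")  # placeholder: a solvated benchmark system
ff = ForceField("amber14-all.xml", "amber14/tip3p.xml")  # stand-in FF for timing only
system = ff.createSystem(pdb.topology, nonbondedMethod=PME, constraints=HBonds)

integrator = LangevinMiddleIntegrator(300 * kelvin, 1 / picosecond, 2 * femtoseconds)
sim = Simulation(pdb.topology, system, integrator)
sim.context.setPositions(pdb.positions)
sim.minimizeEnergy()

n_steps = 25_000  # 50 ps at a 2 fs timestep
start = time.perf_counter()
sim.step(n_steps)
wall = time.perf_counter() - start

ns_per_day = n_steps * 2e-6 / wall * 86_400  # ns of simulation per wall-clock day
print(f"~{ns_per_day:.0f} ns/day, so ~{10_000 / ns_per_day:.0f} days per 10 μs trajectory")

      • For the SMIRNOFF-in-GROMACS/AMBER route, the same system could likewise be exported with openff-interchange and timed in those engines instead.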

  • Aggregate sampling

    • Need: 3 ms

    • Want: 3.7 ms

Compute for protein benchmarks

@Jeffrey Wagner

  • Talked to SB – If the protein observable benchmark sims will be run on F@H, there’s little benefit to having the analysis implemented in Evaluator. So we’ll want to think about which repo should hold the submission, trajectory pulldown, and analysis scripts, and how that will be reproducible (record software versions, etc. – a minimal sketch follows).
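
  • (Post-meeting sketch) A minimal version of the version-recording bookkeeping, to live alongside the submission/analysis scripts; the package list is illustrative:

import json
from importlib.metadata import PackageNotFoundError, version

# Packages to pin in the record; adjust to the actual analysis stack.
packages = ["openff-toolkit", "openff-evaluator", "openmm", "mdtraj"]
versions = {}
for pkg in packages:
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = "not installed"

with open("software_versions.json", "w") as f:
    json.dump(versions, f, indent=2)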

Action items

Decisions