2024-08-08 Protein FF meeting note

Participants

@Chapin Cavender
@Pavan Behara
@Michael Gilson
@Anika Friedman
@Alexandra McIsaac
@Brent Westbrook
@Lily Wang
@Michael Shirts
Louis Smith
@Jeffrey Wagner

Goals

Benchmarks for Specific FF with Sage priors
Update on umbrella sampling with fraction of native contacts
Benchmarks of Null-0.0.3-OPC on folded proteins
Generating new QM training data from PDB survey

Recording

https://drive.google.com/file/d/1lHrLTAUW23DHqO3mQVLDQ-y6cE3sUZ25/view?usp=sharing

Discussion topics

Item	Presenter	Notes

Specific-0.0.3 with Sage priors	@Chapin Cavender	(Slide 4): Pair has initial values set to parameters from Amber. Sage-Pair has initial values set to parameters from Sage. Direct comparison is Null-0.0.3-NAGL.
(Slide 5):
PB: was the width of the prior for torsions set to 5 kcal/mol?
CC: yes, set to the same value as 2.1.
MS: with the pairwise objective fits, are they trained to data corresponding to the alpha-helical part of the Ramachandran plot?
CC: yes.
MS: what's the key difference between what 14SB was doing and what we do?
CC: the objective functions differ. I added weighting so that lower-energy points are weighted higher.
MS: how different is our training data?
CC: we don't have their QM data, but it sounds like it was generated in a similar way. They did 1D scans vs our 2D scans.
MS: would be interesting to compute their objective function vs our objective function for these points.
CC: they're also using MP2 as the quantum method vs our DFT.
MS: do we think that makes a difference?
CC: unclear. Also, the backbone parameters were borrowed from older FFs that were fit to lower-level QM methods. The backbones (BBs) were re-fit to empirical scalar couplings, and the side chains were re-fit to MP2 energies.
CC: found that training to gas-phase data gave worse properties. Tweaking gas-phase backbones to fit NMR scalar couplings resulted in better performance. At that point, AMBER helices were too stable and fitting to unstructured peptides helped; we have the opposite problem. Most recently they fit to implicit-solvent DFT.
LW: do we train to just torsion scans or to a wider array of geometries? Ties into AF's topic below.
CC: yes, we've discussed increasing our training data in a few different directions.
MG: have we considered training to conformational preference data?
LS: is it correct that the torsional profiles you obtain from SMIRNOFF fits are very similar to AMBER? If the torsion profiles are pretty similar across all the experiments you've been running, that would imply we might need fundamentally different inputs, since we keep ending up in the same basin.
MG: we discussed softening priors before.
CC: we started from Sage initial values but haven't run this experiment yet.
LS: have we ruled out that this error is coming from the nonbonded (NB) parameters?
CC: yes, we've run experiments swapping NB parameters with AMBER.
CC: (shows some older slides on torsion profiles, around 30 min into the recording). In general, AMBER has the largest differences from the QM profile, Sage-CC is similar-ish to AMBER, and the Protein SMIRNOFF re-fits match QM most closely.
AF: it almost looks like the closer we fit to QM, the worse we do.
MG: as CC points out, AMBER does fit to a different QM method.
LW: have you ever run benchmarks on Sage-CC?
CC: on short peptides yes, not the longer ones. Could easily run them.
PB (in chat): I think Lee-Ping has the Amber-FB15 training data here, https://github.com/leeping/forcebalance/tree/master/Products/AMBER-FB15, if you want to compare the objective functions from a zeroth-iteration calculation of your FF and Amber-FB15. "All energy and gradient values in the database were respectively computed at the RI-MP2/CBS and RI-MP2/aug-cc-pVTZ levels of theory", from "Building a More Predictive Protein Force Field: A Systematic and Reproducible Route to AMBER-FB15".
MG: would it be possible to manually tweak BBs to get better helices?
MS: reweighting would be the way to do this.
MG: this is similar in spirit to what we're currently doing, which is figuring out how to get the right answer from QM.
LS: agree with all of the above.
MS: the danger is that we get it right for the wrong reasons. We don't want to overstabilise, for example.
LS: we currently have a great IDP force field.
MS: happy to brainstorm reweighting approaches.
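As a rough illustration of the reweighting idea MS raises above, the following is a minimal sketch, assuming frames sampled with the current force field, a per-frame observable (for example a helicity measure), and the per-frame energy change caused by a trial torsion tweak. The function names and the toy numbers are hypothetical and were not discussed in the meeting; no specific reweighting scheme was decided on.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K, in kcal/mol


def _boltzmann_weights(delta_u, kt):
    """Unnormalised weights exp(-(U_new - U_old) / kT), shifted for stability."""
    shift = min(delta_u)
    return [math.exp(-(du - shift) / kt) for du in delta_u]


def reweighted_average(observable, delta_u, kt=KT_KCAL_PER_MOL):
    """Estimate <A> under a perturbed FF from frames sampled with the original FF.

    observable: per-frame values A_i
    delta_u:    per-frame U_new - U_old in kcal/mol (assumed precomputed)
    """
    if len(observable) != len(delta_u) or not observable:
        raise ValueError("observable and delta_u must be non-empty and equal length")
    weights = _boltzmann_weights(delta_u, kt)
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, observable)) / total


def effective_sample_size(delta_u, kt=KT_KCAL_PER_MOL):
    """Kish effective sample size; small values mean the estimate is unreliable."""
    weights = _boltzmann_weights(delta_u, kt)
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Toy example: frames 0 and 2 are "helical" (A = 1) and are stabilised
# by 1 kcal/mol under the hypothetical tweak.
helicity = [1.0, 0.0, 1.0, 0.0]
tweak = [-1.0, 0.0, -1.0, 0.0]
print(reweighted_average(helicity, tweak), effective_sample_size(tweak))
```

Checking the effective sample size alongside the reweighted average is one way to guard against the overstabilisation concern: a tweak large enough to collapse the effective sample size would need fresh simulation rather than reweighting.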
Fraction of native contacts	@Chapin Cavender	MS: can take stable segments of a simulation, e.g. 1 µs, and tweak torsions to stabilize the folded states.
CC: the issue was that we didn't have enough folded states in the SMIRNOFF force fields. The only trajectory long enough was Null with OPC, and we had a constraint that we needed a 3-point water model.
MS: FYI, if you look at densities and heats of mixing, OPC performs the worst, with a systematic issue, and TIP3P is not too bad.
LS: did we ever try OPC3?
CC: have run benchmarks with OPC3.
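For readers unfamiliar with the collective variable, here is a minimal sketch of one common soft-cutoff definition of the fraction of native contacts, Q = mean over native pairs of 1 / (1 + exp(beta * (r - lambda * r0))), together with a harmonic umbrella bias on Q. The notes do not say which definition or bias form was used in the umbrella sampling, so the defaults (beta = 5 per Å, lambda = 1.8) and the function names are assumptions for illustration only.

```python
import math


def fraction_native_contacts(distances, native_distances, beta=5.0, lam=1.8):
    """Soft-cutoff fraction of native contacts Q in [0, 1].

    distances:        current distance for each native contact pair (Å)
    native_distances: distance of the same pairs in the native structure (Å)
    beta, lam:        smoothing and tolerance parameters (assumed defaults)
    """
    if len(distances) != len(native_distances) or not distances:
        raise ValueError("need one current distance per native contact")
    total = 0.0
    for r, r0 in zip(distances, native_distances):
        # Clip the exponent so very large separations cannot overflow exp().
        x = min(beta * (r - lam * r0), 700.0)
        total += 1.0 / (1.0 + math.exp(x))
    return total / len(distances)


def umbrella_bias(q, q0, k):
    """Harmonic restraint energy 0.5 * k * (Q - Q0)^2 for one umbrella window."""
    return 0.5 * k * (q - q0) ** 2


# Toy example with two native contacts: folded-like vs unfolded-like distances.
native = [4.0, 5.0]
print(fraction_native_contacts([4.1, 5.2], native))    # close to 1
print(fraction_native_contacts([30.0, 40.0], native))  # close to 0
```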
Null-0.0.3-OPC	@Anika Friedman	JW: for lysozyme BB, is there no data?
AF: we don't have NMR data for lysozyme BB.
CC: generally side-chain data is less accurate than BB data.
LS: did we just get unlucky picking GB3 as our target? Other targets look to perform within error.
AF: a significant portion of GB3 is α-helix, so that contributes significantly to the error.
LS: so is it because the other targets are less α-helical and so less sensitive? Or is it a convergence phenomenon, so that with more sampling BPTI would also unwind?
AF: BPTI is about the same size as GB3.
MG: BPTI has multiple disulfides; it looks like they are anchoring the helices.
LS: if BPTI is more stable than GB3 in the FF, it might give deceptively good performance.
New QM data from PDB survey	@Anika Friedman	MG: what if we reduce or downweight the side-chain (SC) data to avoid it being used for BB fitting?
AF: CC has tried various weighting schemes. It doesn't sound like the SC data is skewing the BB fits.
MG: so the oversampling in this region is currently not a problem?
CC: I think so.
AF: the problem seems to be more that we're not characterizing what happens between the 15-degree intervals.
CC: we could take 4-mers and do hierarchical clustering to characterize the multiple phi/psi angles present in the peptides.
AF: sounds like a good idea.
AF: do we just want to focus on the α-basin? There are also regions in the β-basin that aren't sampled as thoroughly.
MG, CC: agree.
PB: why do we need to sample closely spaced points in each basin?
AF: we may be missing minima for certain residue configurations. Also, these are 4-mers, which give us more structural information.
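CC's suggestion of hierarchically clustering the phi/psi angles of 4-mers could look roughly like the sketch below. It assumes each conformer is represented as a tuple of backbone dihedrals in degrees, uses a periodic RMS distance, and merges clusters with complete linkage until a cutoff is exceeded. The distance metric, linkage, cutoff, and example angles are all assumptions made for illustration; none of them were specified in the meeting.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def dihedral_distance(x, y):
    """RMS periodic difference between two equal-length tuples of dihedrals."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(x, y)) / len(x))


def agglomerative_clusters(conformers, cutoff):
    """Complete-linkage agglomerative clustering with a distance cutoff.

    Starts with one cluster per conformer and repeatedly merges the closest
    pair of clusters until the closest pair is farther apart than `cutoff`.
    Returns a list of clusters, each a list of conformer indices.
    """
    clusters = [[i] for i in range(len(conformers))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(
            dihedral_distance(conformers[i], conformers[j]) for i in c1 for j in c2
        )

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical (phi, psi, phi, psi) tuples for the two central residues.
conformers = [
    (-60, -45, -62, -43),    # helix-like
    (-58, -47, -65, -40),    # helix-like
    (-120, 130, -118, 135),  # strand-like
    (-122, 128, -121, 133),  # strand-like
    (179, -179, -60, -45),   # wraps across the +/-180 boundary
    (-179, 179, -61, -44),
]
print(agglomerative_clusters(conformers, cutoff=20.0))
```

One representative per cluster (for example the member closest to the others) could then be a candidate for new QM calculations, which is one way to sample between the 15-degree grid points AF mentions.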
