2024-08-08 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Michael Gilson

  • @Anika Friedman

  • @Alexandra McIsaac

  • @Brent Westbrook

  • @Lily Wang

  • @Michael Shirts

  • Louis Smith

  • @Jeffrey Wagner

Goals

  • Benchmarks for Specific FF with Sage priors

  • Update on umbrella sampling with fraction of native contacts

  • Benchmarks of Null-0.0.3-OPC on folded proteins

  • Generating new QM training data from PDB survey

Recording

https://drive.google.com/file/d/1lHrLTAUW23DHqO3mQVLDQ-y6cE3sUZ25/view?usp=sharing

Discussion topics

Item

Presenter

Notes

Specific-0.0.3 with Sage priors

@Chapin Cavender

  • (Slide 4): Pair has its initial values set to the AMBER parameters

    • Sage-Pair has its initial values set to the Sage parameters

    • The direct comparison is Null-0.0.3-NAGL

  • (Slide 5): PB: was the width of the prior on torsions set to 5 kcal/mol?

    • CC: yes, the same value as in Sage 2.1

  • MS: are the pairwise-objective fits trained to data corresponding to the alpha-helical region of the Ramachandran plot?

    • CC: yes

  • MS: what is the key difference between what ff14SB did and what we do?

    • CC: the objective function differs between ff14SB and ours; I added a weighting that gives lower-energy points more weight

    • MS: how different is our training data?

    • CC: we don't have their QM data, but it sounds like it was generated in a similar way. They did 1D scans, whereas we did 2D scans.

    • MS: it would be interesting to evaluate both their objective function and ours on these points.

    • CC: they also use MP2 as the QM method, whereas we use DFT.

    • MS: do we think that makes a difference?

    • CC: unclear. Also, their backbone parameters were borrowed from older FFs that were fit to lower-level QM methods. The backbone parameters were then re-fit to empirical scalar couplings, and the side chains were re-fit to MP2 energies.

    • CC: the AMBER developers found that training to gas-phase data gave worse properties, and that tweaking the gas-phase backbone parameters to fit NMR scalar couplings improved performance. At that point AMBER helices were too stable, and fitting to unstructured peptides helped; we have the opposite problem. Most recently they fit to implicit-solvent DFT.

  • LW: do we train only to torsion scans, or to a wider array of geometries? This ties into AF's topic below.

    • CC: we've discussed expanding our training data in a few different directions

  • MG: have we considered training to conformational preference data?

    • LS: is it correct that the torsion profiles from the SMIRNOFF fits are very similar to AMBER's? If the torsion profiles are this similar across all the experiments you've been running, that would imply we need fundamentally different inputs, since we keep ending up in the same basin

    • MG: we've discussed softening the priors before

    • CC: we started from Sage initial values but haven’t run this experiment yet

    • LS: have we ruled out that this error comes from the nonbonded (NB) parameters?

    • CC: yes, we've run experiments swapping in the AMBER NB parameters

  • CC: (shows some older slides on torsion profiles, around 30 min into the recording). In general, AMBER has the largest differences from the QM profiles, Sage-CC is similar to AMBER, and the protein SMIRNOFF re-fits match QM most closely

    • AF: it almost looks like the closer we fit to QM, the worse we do

    • MG: as CC points out, AMBER was fit to a different QM method

    • LW: have you ever run benchmarks on Sage-CC?

    • CC: on short peptides, yes, but not on the longer ones. Could easily run them

  • PB (in chat): I think Lee-Ping has the Amber-FB15 training data here, https://github.com/leeping/forcebalance/tree/master/Products/AMBER-FB15, if you want to compare the objective functions of your FF and Amber-FB15 from a zeroth-iteration calculation

  • MG: would it be possible to manually tweak the backbone parameters to get better helices?

    • MS: reweighting would be the way to do this

    • MG: this is similar in spirit to what we’re currently doing, which is figuring out how to get the right answer from QM

    • LS: agree with all of above

    • MS: the danger is that we get it right for the wrong reasons. For example, we don't want to overstabilize

    • LS: we currently have a great IDP force field

    • MS: happy to brainstorm reweighting approaches

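A minimal sketch of the kind of objective discussed above, to make the weighting and the prior concrete: a least-squares fit of a SMIRNOFF-style periodic torsion, k(1 + cos(n*phi - phase)), to QM scan energies, with per-point weights that favor lower-energy points and a Gaussian prior of width 5 kcal/mol on each force constant, centered on the initial (e.g. Sage) values. The Boltzmann-like weight form, the `kt_weight` and `energy_denom` parameters, and all function names are hypothetical illustrations; this is not the weighting actually used for Specific-0.0.3 and not ff14SB's objective. Evaluating the same points with uniform and with energy-based weights gives a rough version of the side-by-side objective comparison MS and PB suggested.

```python
import math

def torsion_energy(phi_deg, terms):
    """SMIRNOFF-style periodic torsion: sum of k * (1 + cos(n*phi - phase)).

    `terms` is a list of (k [kcal/mol], periodicity n, phase [deg]) tuples.
    """
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(phase)))
               for k, n, phase in terms)

def energy_weights(qm_energies, kt_weight=None):
    """Per-point weights; uniform if kt_weight is None.

    ASSUMPTION (illustrative only): exp(-(E - E_min) / kt_weight), so the
    lowest-energy point gets weight 1 and higher-energy points count less.
    """
    if kt_weight is None:
        return [1.0] * len(qm_energies)
    e_min = min(qm_energies)
    return [math.exp(-(e - e_min) / kt_weight) for e in qm_energies]

def objective(terms, prior_terms, angles, qm_energies, kt_weight=None,
              prior_width=5.0, energy_denom=1.0):
    """Weighted squared residuals (MM vs QM) plus a Gaussian prior on each k."""
    mm = [torsion_energy(a, terms) for a in angles]
    diffs = [m - q for m, q in zip(mm, qm_energies)]
    w = energy_weights(qm_energies, kt_weight)
    w_sum = sum(w)
    # Best constant offset between the MM and QM profiles under these weights.
    offset = sum(wi * d for wi, d in zip(w, diffs)) / w_sum
    # Dividing by a hypothetical energy denominator makes the data term unitless.
    data = sum(wi * ((d - offset) / energy_denom) ** 2
               for wi, d in zip(w, diffs)) / w_sum
    # Gaussian prior: penalize force constants that drift from their initial values.
    prior = sum(((k - k0) / prior_width) ** 2
                for (k, _, _), (k0, _, _) in zip(terms, prior_terms))
    return data + prior
```

Comparing `objective(..., kt_weight=None)` with `objective(..., kt_weight=1.0)` on the same scan shows how much of the score comes from the weighting alone; note that the offset is refit under each weighting.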

Fraction of native contacts

@Chapin Cavender

  • MS: we could take stable segments of a simulation, e.g. 1 µs, and tweak the torsions to stabilize the folded states.

    • CC: the issue was that we didn't have enough folded states with the SMIRNOFF force fields. The only trajectory long enough was Null with OPC, and we had a constraint that we needed a 3-point water model

  • MS: FYI, for densities and heats of mixing, OPC performs worst and has a systematic error, while TIP3P is not too bad

    • LS: did we ever try OPC3?

    • CC: we have run benchmarks with OPC3

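MS's suggestion (and the reweighting idea under the previous item) could be prototyped roughly as below: compute the fraction of native contacts Q for each frame of an existing trajectory, then reweight the frames to estimate how the folded population would change if the torsions were perturbed. This is a sketch under several assumptions: Q uses the smooth switching function of Best, Hummer, and Eaton (2013) with beta = 5 Å^-1 and lambda = 1.8; `delta_u` is the per-frame change in torsion energy, computed separately; treating Q > 0.8 as "folded" is an arbitrary placeholder; and kT = 0.596 kcal/mol corresponds to roughly 300 K. Single-state exponential (Zwanzig-style) reweighting is only trustworthy when the perturbation stays small relative to kT across the sampled frames.

```python
import math

KT_300K = 0.596  # kcal/mol, approximately k_B * 300 K

def _switch(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def fraction_native_contacts(coords, native_coords, native_pairs,
                             beta=5.0, lam=1.8):
    """Q for one frame: smooth count of native contacts that are still formed.

    coords, native_coords: sequences of (x, y, z) positions in angstroms.
    native_pairs: (i, j) atom-index pairs in contact in the native structure;
    choosing them (e.g. heavy atoms within a cutoff) is left to the caller.
    ASSUMPTION: beta [1/angstrom] and lam follow Best, Hummer & Eaton (2013).
    """
    total = 0.0
    for i, j in native_pairs:
        r = math.dist(coords[i], coords[j])
        r0 = math.dist(native_coords[i], native_coords[j])
        total += _switch(beta * (r - lam * r0))
    return total / len(native_pairs)

def reweighted_folded_fraction(q_values, delta_u, kt=KT_300K, q_folded=0.8):
    """Estimate P(Q > q_folded) under the perturbed potential U + delta_u.

    Frames were sampled with the original potential U; each gets weight
    exp(-delta_u / kT). delta_u is in kcal/mol per frame.
    """
    # Shift by the minimum so the largest weight is 1 (avoids overflow).
    du_min = min(delta_u)
    weights = [math.exp(-(du - du_min) / kt) for du in delta_u]
    folded = sum(w for w, q in zip(weights, q_values) if q > q_folded)
    return folded / sum(weights)
```

Before trusting a reweighted estimate, it would be prudent to compare the effective sample size of the weights, (Σw)² / Σw², with the number of frames to see whether a handful of frames dominate.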

Null-0.0.3-OPC

@Anika Friedman

  • JW: is there no backbone (BB) data for lysozyme?

    • AF: we don't have NMR data for the lysozyme backbone

    • CC: side-chain data is generally less accurate than backbone data

  • LS: did we just get unlucky in picking GB3 as our target? The other targets seem to perform within error

    • AF: a significant portion of GB3 is alpha-helical, so the helix contributes significantly to the error

    • LS: so is it because the other targets are less alpha-helical and therefore less sensitive? Or is it a convergence issue, so that with more sampling BPTI would also unwind?

    • AF: BPTI is about same size as GB3

    • MG: BPTI has multiple disulfides, which appear to anchor its helices

    • LS: if BPTI is more stable than GB3 in the FF, it might give deceptively good performance

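One common way to score backbone NMR benchmarks like these is to back-calculate scalar couplings from simulated dihedrals and compare them with experiment in units of the uncertainty, so that a reduced chi-squared near 1 loosely means "within error". A minimal sketch, assuming a Karplus relation for 3J(HN,HA) with the Vuister and Bax (1993) coefficients (A = 6.51 Hz, B = -1.76 Hz, C = 1.60 Hz, theta = phi - 60°) and per-observable uncertainties `sigmas` that would combine experimental and Karplus error. The coefficient set and the function names are assumptions, not necessarily what the benchmark suite uses.

```python
import math

# ASSUMPTION: Vuister & Bax (1993) Karplus coefficients (A, B, C) for 3J(HN,HA), in Hz.
KARPLUS_HN_HA = (6.51, -1.76, 1.60)

def karplus_j(phi_deg, coeffs=KARPLUS_HN_HA, offset_deg=-60.0):
    """3J(HN,HA) from backbone phi: J = A cos^2(theta) + B cos(theta) + C."""
    a, b, c = coeffs
    theta = math.radians(phi_deg + offset_deg)  # theta = phi - 60 degrees
    cos_t = math.cos(theta)
    return a * cos_t ** 2 + b * cos_t + c

def ensemble_j(phi_samples):
    """Average J over simulation frames (average of J, not J of the average phi)."""
    return sum(karplus_j(phi) for phi in phi_samples) / len(phi_samples)

def reduced_chi2(computed, experimental, sigmas):
    """Mean squared deviation between computed and experimental values, in units of sigma."""
    terms = [((c - e) / s) ** 2 for c, e, s in zip(computed, experimental, sigmas)]
    return sum(terms) / len(terms)
```

Averaging J over frames rather than evaluating J at the mean phi matters when a residue samples more than one basin, as an unwinding helix would.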

New QM data from PDB survey

@Anika Friedman

  • MG: what if we reduce or downweight the side-chain data so that it isn't used for backbone fitting?

    • AF: CC has tried various weighting schemes. It doesn't sound like the side-chain (SC) data is skewing the backbone fits.

    • MG: so the oversampling in this region currently is not a problem?

    • CC: I think so

    • AF: the problem seems to be more that we're not characterizing the regions between the 15-degree intervals

  • CC: we could take 4-mers and do hierarchical clustering to characterize the multiple phi/psi angles present in the peptides

    • AF: sounds like a good idea

  • AF: do we just want to focus on the alpha basin? There are also regions in the beta basin that aren't sampled as thoroughly.

    • MG, CC: agree.

  • PB: why do we need to sample closely-spaced points in each basin?

    • AF: we may be missing minima for certain residue configurations. Also, these are 4-mers, which give us more structural information
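CC's clustering suggestion could look roughly like the following: represent each 4-mer by a vector of backbone dihedrals (e.g. phi/psi for each residue; the exact feature set is an assumption), measure distances with angle differences wrapped to [-180, 180) so that -179° and 179° count as neighbors, and merge clusters agglomeratively with average linkage until the closest pair of clusters is farther apart than a cutoff. The cutoff value and all function names are hypothetical. Cluster representatives could then seed new QM calculations in regions that the 15-degree grid does not resolve. The naive pure-Python loop keeps the sketch dependency-free; for a full PDB survey, `scipy.cluster.hierarchy.linkage` on a precomputed condensed distance matrix would be the practical choice.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def dihedral_distance(x, y):
    """Euclidean distance between two dihedral vectors, respecting periodicity."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(x, y)))

def average_linkage_clusters(points, cutoff):
    """Agglomerative clustering with average linkage.

    Merging stops once the closest pair of clusters is farther apart than
    `cutoff` (degrees). Returns a list of clusters of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    dist = [[dihedral_distance(p, q) for q in points] for p in points]

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break
        # b > a, so deleting clusters[b] leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For example, with a 30° cutoff, two near-helical 4-mers, two near-beta 4-mers, and two 4-mers whose first angle sits at 179° and -179° fall into three clusters, the last pair grouped together because of the periodic distance.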

Action items

Decisions