2024-09-19 Protein FF meeting note

Participants

  • @Pavan Behara

  • @Chapin Cavender

  • @Anika Friedman

  • @Michael Gilson

  • @Alexandra McIsaac

  • @David Mobley

  • @Michael Shirts

  • Louis Smith

  • @Lily Wang

  • @Brent Westbrook

Goals

  • Update on torsion fits targeting GB3 NMR data

Recording

https://drive.google.com/file/d/1jX9b48B_Py9HgUNLJJLSGoUbm-JIG__g/view?usp=sharing

Discussion topics

Item

Presenter

Notes

GB3 NMR torsion fits

@Chapin Cavender

  • LS: to check my understanding of the table (slide 8): what does chi2 represent?

    • CC: the chi2 measures the deviation between scalar couplings computed from simulation with the Karplus model and the experimental values. The chi2 is always with respect to the experiment.

    • LS: could you calculate a delta chi2 to make the comparison easier?

    • MS: an error estimate would also be helpful. Could bootstrap for the single samples.

    • MS: to check my understanding, if there’s good overlap between reference and query, the chi2 should always be similar to the grey bars. So the similar values for ff14sbonlysc vs ff14sb are good, even despite the much lower number of effective samples. And when there’s only about 1 effective sample, it’s not great: 1 is the lowest possible value and indicates essentially no overlap.

      • CC: right

    • MS: what we’ve seen is if Neff >= 50, then the uncertainty estimates are good. If Neff < 50, uncertainties get pretty untrustworthy

    • MG: I thought Neff of 50 was borderline.

    • MS: 50 means uncertainty estimate of the observable will be reasonable, not necessarily lower uncertainty.

    • MG: I was pushing for Neff of 1000. How high do you need to go?

    • MS: 1000 is more than you need for uncertainty estimation, which you can use to determine quality.

    • MS: simplest method to estimate uncertainty would be to bootstrap samples from the trajectory. You want to bootstrap the free energies as well, so the weights will be different as well.

    • CC: I would plan to bootstrap inside MBAR to write out Z (slide 5).

    • MS: can’t think through this clearly right now, but bootstrapping should occur at the very beginning where you collect the samples.

  • LS: My takeaway from slide 8 is that Specific-0.0.3 is not well-sampled.

    • MG: the goal is not to reweight our force fields, but to reweight to something better.

    • MS: my interpretation is that the low Neff is because the process didn’t generate that many uncorrelated samples, e.g. it was slower for some reason.

    • CC: the free energy minimum for this FF is probably lower than 0.4, which is where I stopped (slide 4). So reweighting to the unbiased state probably means there’s not too much overlap with >0.4. This is a worry to me so I’ve continued work with the Null FF, which has a much higher Neff.

  • LS: (slide 9): did you do the same thing with Null?

    • CC: not yet, would be a good check to do.

    • LS: would expect lack of native structures would make it harder to converge.

  • MG (slide 13): I’m seeing a tradeoff between predicted chi2 and Neff. The Neff around +2 is quite low. I would think around 3-4 is more optimal.

    • CC: I wanted to give this a bracket and look for a minimum. Above +2 it becomes a monotonic slope. I’ve found a local minimum around 1.7.

    • MG: I’m worried around that value you risk the Neff being too low and results being affected by noise. My intuition is that we need ~1000 Neff.

    • MS: my instinct is no lower than 2.0, maybe 3.0 is better. To check: bigger alpha is smaller step?

    • MG: yes.

    • MS: I would start optimization a bit more aggressively, but not go below 2.0

    • LS: could you bootstrap to see how accurate the chi2 is?

    • MS: I would not overthink but start some optimizations around 2.0, 3.0, 4.0. Depends on expense. I’d let it run for a few steps.

    • CC: only takes a few minutes to do an optimization with a specified value of alpha. Benchmarking simulations is more expensive.

    • MG: because we are taking multiple steps, we don’t need to jump to optimal chi2 in first step.

  • MG (slide 15): this chi2 is predicted, not from simulation, right?

    • CC: correct. The chi2 on the previous slides was the leave-one-out cross-validation error as a measure of transferability. This is the actual value from doing an actual optimization on all the data. Expected to be lower.

  • LS (slide 15): is k in kcal/mol?

    • CC: yes. It’s the RMSE

    • LS: Are these very small steps? What’s the absolute value of the mean?

    • CC: our largest torsions are around 1-2 kcal/mol. Most are smaller than half. I’m fitting 24 torsions. 0.11 is about a 10% change.

    • MG: from Neff it seems like you can’t take too large a step in k.

    • LS: could a Neff of 50 be a stopping criterion?

    • CC: I’m thinking of it being a function of chi2

    • LS: there’s likely multiple solutions for a given minimum, especially if you have parameters that overstabilize the native structure

    • MS: we could check the free energy profiles

    • MG: could mix in unfolded peptide data, but don’t want to overcomplicate. CC, do you think we risk overstabilizing with fitting to these folded NMR targets?

    • CC: I’d rather have that problem than unfold an alpha-helix. Once solved we can progress from there.

    • LS: agree, assume MVP is something Sage-y that doesn’t unfold proteins

    • MS: maybe a different water model would improve properties

    • LS: do we know the experimental free energy profiles of GB3?

    • CC: just aiming for qualitative funnel shape at this point.

    • (some discussion on desired behaviour)

    • MS: our unfolded peptide benchmarks help select for behaviour in unfolded states too

  • MS (slide 16): maybe you’ll want to use two previous states to re-weight, not just most recent. Not clear it’s worth the effort to do that given the additional recalculations required.

    • CC: good idea.

  • CC: my stopping criterion currently is convergence of the chi2 value.

  • LS: how are you thinking about convergence in the umbrella sampling simulation? Noticing that the region between 0.9 and 1.0 got worse. Do you think that’s significant?

    • CC: not necessarily.

    • LS: do you think it’s partially a sampling issue? Is it noisy?

    • CC: (… recording around 62 min)

    • MG: is magenta run shorter than green?

    • CC: roughness comes from the NMR refit (pink) having only 1 replicate, original FF has 3 (green)

  • General: results look very encouraging
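
Illustrative sketch (not CC's actual code) of two quantities discussed above: the effective sample number Neff of a set of reweighting weights, and a bootstrap error estimate for a reweighted chi2. The Kish formula is assumed for Neff; it gives N for uniform weights and 1 when a single frame dominates, matching the "1 is the lowest possible" comment. The function names, array shapes, and the chi2 definition (mean squared deviation scaled by an uncertainty sigma) are hypothetical. The bootstrap here resamples frames with fixed log weights and assumes uncorrelated frames; it does not re-solve MBAR for each resample, which is what MS suggested for a full treatment.

```python
import numpy as np


def normalized_weights(log_weights):
    """Convert unnormalized log weights to weights that sum to 1."""
    w = np.exp(log_weights - np.max(log_weights))  # shift for numerical stability
    return w / w.sum()


def effective_sample_size(log_weights):
    """Kish effective sample size: 1 / sum(w_i^2) for normalized weights."""
    w = normalized_weights(log_weights)
    return 1.0 / np.sum(w ** 2)


def reweighted_chi2(observables, log_weights, experiment, sigma):
    """chi2 of weighted-average observables against experiment.

    observables: (n_frames, n_obs) per-frame values (e.g. Karplus couplings)
    experiment, sigma: (n_obs,) experimental values and uncertainties
    """
    w = normalized_weights(log_weights)
    predicted = w @ observables
    return np.mean(((predicted - experiment) / sigma) ** 2)


def bootstrap_chi2(observables, log_weights, experiment, sigma,
                   n_boot=200, seed=0):
    """Mean and standard deviation of chi2 over bootstrap resamples of frames.

    Simplification: weights are renormalized per resample, but the
    underlying free energies are NOT re-estimated.
    """
    rng = np.random.default_rng(seed)
    n = len(log_weights)
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample frames with replacement
        samples[b] = reweighted_chi2(observables[idx], log_weights[idx],
                                     experiment, sigma)
    return samples.mean(), samples.std(ddof=1)
```

With this definition, a candidate step in the torsion parameters could be checked by computing `effective_sample_size` on the weights for the new parameters and comparing it to a threshold such as the Neff of 50 mentioned above.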

Action items

Decisions