2025-05-01 Protein FF meeting note

Participants

@Pavan Behara
@Chapin Cavender
@Jennifer Clark
Alan Grossfield
@Julianne Hoeflich
@Josh Mitchell
@David Mobley
@Michael Shirts
Louis Smith
@Jeffrey Wagner
@Lily Wang

Goals

GB3 NMR fits
PDB 4-mer QM fits

Recording

https://drive.google.com/file/d/1_YivyUmdOL57YwHk3qOcN06OJ7xs7lWh/view?usp=sharing

Discussion topics

Item	Presenter	Notes

GB3 NMR fits

@Chapin Cavender

(slide 3) MS – Could be good to mix in more traj data, but would be complex/require workflow changes.
- CC – Right, that’s what I want to do as the next step, but given the amount of work involved I’d like to see if this works first.
LS – So this is just fit to sim from previous iterations, and no data is being carried forward except parameter values?
- CC – Yes
- CC – There’s a barrier that we see in umbrella sampling that isn’t reflected in unbiased MD (very detailed, see recording ~8 mins). MG and I think there may be a bug or sampling issue in the process.
- LS – Thinking about how similar the successful and unsuccessful FFs are, that doesn’t mean that if you …. (it may be the case that an important state/conformation isn’t being sampled)
- MS – Umbrella sampling with repex might help.
- LS – Agree. Talked with Greg Bowman, and if you really want lots of sampling, could run on F@H.
- CC – I think that’d be helpful to generate more samples from unbiased runs… For the use case I’m imagining, users will be running unbiased MD (where they care about maintaining the structure of the protein and sampling some local fluctuations) or understanding how unfolded states contribute to the protein’s conformational ensemble (in which case umbrella sampling is the right thing to do).
- MS – I think it will eventually make sense to use F@H, but maybe not in the next few months.
- CC – Right, and I’m also thinking that making F@H part of the process could make it less reproducible.
- AG – If you did one really complete set of sampling, then couldn’t you use that over and over again?
- MS – If they’re close in phase space, then maybe. But not sure that’d be the case.
- CC – … (see recording ~15 mins)
- MS – So we started with umbrella sampling, but that wasn’t working because we weren’t getting the folded state. Then we ran with … and got more data in folded state. But now it seems like there are some unfolded states that are being stabilized, probably because they aren’t being sampled enough for the FF to learn to destabilize them.
MS – Maybe fitting to 4-mer QM sets would improve this. How are those looking?
- CC – I’m running MD of these FFs right now, but the data isn’t ready yet… If we look at FFs trained to our previous QM datasets (just with 3-mers) vs FFs trained to that plus 4-mer data, their RMSE on a held-out 3-mer torsiondrive test set is nearly identical.
MS – How could we help with repex workflow?
- CC – I got an early prototype working on the UCSD cluster, but there were very few exchanges between umbrellas.
- MS – Generally to get good replica spacing, you need the umbrellas to overlap 25-30%
- AG – What spring constant for umbrellas? Could probably loosen the spring constant to improve overlap.
- CC – I experimented with some different values in previous work, and didn’t find a clearly optimal value.
- LS – Right, I remember some settings leaving gaps in the sampling.
- CC – Consensus seems to be that my GPU allocations are better spent getting repex to run than running more unbiased samples.
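
A rough way to relate the spring constant to the overlap guideline mentioned above is sketched below. This is only an illustration under strong assumptions that are not from the meeting: the bias is taken as U = 0.5 * k * (x - x0)^2, the underlying free energy is treated as flat (so each window samples a Gaussian with sigma = sqrt(kB*T / k)), and overlap is measured as the shared area of two neighboring Gaussians. The function names, units (kcal/mol per squared collective-variable unit), and the 300 K default are hypothetical. Real overlap should be checked from the sampled histograms.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def window_sigma(spring_k, temperature=300.0):
    """Width of one umbrella window, assuming a flat underlying free energy
    and a bias of the form 0.5 * spring_k * (x - x0)**2 (assumption)."""
    return math.sqrt(KB_KCAL * temperature / spring_k)


def neighbor_overlap(spring_k, spacing, temperature=300.0):
    """Shared area (0 to 1) of two equal-width Gaussians whose centers are
    `spacing` apart; equals erfc(spacing / (2 * sqrt(2) * sigma))."""
    sigma = window_sigma(spring_k, temperature)
    return math.erfc(spacing / (2.0 * math.sqrt(2.0) * sigma))


def max_spring_constant(spacing, target_overlap, temperature=300.0):
    """Largest spring constant that still reaches `target_overlap` between
    neighboring windows, found by bisection on a log scale."""
    lo, hi = 1e-6, 1e6  # search range for spring_k (hypothetical bounds)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if neighbor_overlap(mid, spacing, temperature) > target_overlap:
            lo = mid  # overlap is still above target, so k can be stiffer
        else:
            hi = mid  # overlap is too small, so k must be looser
    return lo


if __name__ == "__main__":
    spacing = 1.0  # distance between window centers, in CV units (hypothetical)
    for k in (0.5, 1.0, 2.0, 5.0):
        print(f"k={k:4.1f}  overlap={neighbor_overlap(k, spacing):.2f}")
    print("k for 25% overlap:", max_spring_constant(spacing, 0.25))
```

In this simplified picture, loosening the spring constant widens each window and raises the overlap, at the cost of each window constraining the collective variable less tightly.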
LS – I’d really like to try the differentially-seeded “swarm” simulation, and see if a huge amount of sampling would be helpful. Then the dataset could be reused for future work.
- CC – I could send over some confs from my sims. I can’t guarantee that this will slot back into the workflow, but it will be interesting to see.
- MS – Which sampling to send/from which FF? This may determine the usability of the data, since FFs that are too different wouldn’t benefit.
- LS – Could do ff14SB and FFs from some of CC’s iterations.
- CC – Yeah, I’d probably send over the two FF candidates that keep the helix folded the longest.

PDB 4-mer QM fits

@Chapin Cavender

(sims not completed by the time of this meeting)
