Topic | Presenter | Notes
---|---|---
NMR FF benchmarks | @Chapin Cavender | |
QM fits | @Chapin Cavender | [starts at slide 27]
CC: What are the correct software versions to use? Issues were brought up yesterday about the fitting stack.
LW: Changes in both the OpenEye backend and the OpenFF software, but the OpenFF one is worse: with OpenFF Interchange, if conformers don't have the same atom ordering, they are all assigned charges from the first conformer. Don't use the most recent OpenFF stack; ForceBalance 1.9.3 is the last version to use, or wait until a patch is released.
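For concreteness, a loose regression-style check in the spirit of the ordering issue LW describes: assigned partial charges should be consistent under a permutation of atom order. This is a minimal sketch against the Toolkit only (not the Interchange code path where the bug actually lives), assuming the pinned stack CC mentions below (ForceBalance 1.9.3, Toolkit 0.10 with simtk units); the helper name is hypothetical.

```python
# Sketch of a permutation-consistency check for partial charges, assuming
# the pinned stack discussed here, e.g.:
#   pip install forcebalance==1.9.3 "openff-toolkit==0.10.*"
import numpy as np
from openff.toolkit.topology import Molecule
from simtk import unit  # Toolkit 0.10 still uses simtk units

def charges_invariant_under_reordering(smiles: str, atol: float = 0.01) -> bool:
    """Assign AM1-BCC charges to a molecule and to an atom-reversed copy,
    then compare after mapping the permuted charges back."""
    base = Molecule.from_smiles(smiles)
    n = base.n_atoms
    # Reverse the atom order; Molecule.remap takes old_index -> new_index.
    flipped = base.remap({i: n - 1 - i for i in range(n)}, current_to_new=True)

    for mol in (base, flipped):
        mol.generate_conformers(n_conformers=1)
        mol.assign_partial_charges(partial_charge_method="am1bcc")

    q_base = base.partial_charges.value_in_unit(unit.elementary_charge)
    q_flip = flipped.partial_charges.value_in_unit(unit.elementary_charge)
    # Undo the permutation; tolerance is loose because AM1-BCC charges can
    # vary slightly with the generated conformer.
    return bool(np.allclose(np.asarray(q_base), np.asarray(q_flip)[::-1], atol=atol))

print(charges_invariant_under_reordering("CC(=O)Nc1ccccc1"))
```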
MT: Hopes to have the patch out in a few hours to days. Would be good to have people check to make sure it's actually working before relying on it.
CC: Wants to start fits now; will keep using ForceBalance 1.9.3 and Toolkit 0.10.
MG: Does this affect results so far?
PB: [Slide 28] Didn't see any difference when testing pairwise conformer energies for small molecules (see the sketch below); used all of the Sage 2.1.0 training set.
DM: Small molecules sometimes have weird things going on; proteins may not behave the same way. Would be worth testing separately.
CC: The protein dataset also has 2D torsion drives for coupled torsions; may see things we couldn't see on a 1D scan of a small molecule.
PB: Suggests using a lower search tolerance; it'll reduce the number of Hessian diagonalizations.
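An illustrative version of PB's pairwise-conformer-energy check: compare relative conformer energies from two software stacks, since absolute energy offsets cancel in pairwise differences. A minimal numpy sketch; the function name and energy arrays are placeholders for whatever each stack produces.

```python
import numpy as np

def pairwise_energy_rmsd(energies_old, energies_new):
    """RMSD between all pairwise conformer energy differences E_i - E_j
    computed with two stacks; only relative energies (the quantity that
    matters for fitting) are compared."""
    e_old = np.asarray(energies_old, dtype=float)
    e_new = np.asarray(energies_new, dtype=float)
    d_old = e_old[:, None] - e_old[None, :]  # matrix of E_i - E_j
    d_new = e_new[:, None] - e_new[None, :]
    iu = np.triu_indices(len(e_old), k=1)    # unique pairs only
    return np.sqrt(np.mean((d_old[iu] - d_new[iu]) ** 2))

# Example with made-up per-conformer energies (kcal/mol):
print(pairwise_energy_rmsd([0.0, 1.2, 3.4], [0.1, 1.3, 3.4]))
```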
MS: Is the idea with these experiments to get better alpha-helix behavior?
CC: No, we want a better starting point, so we want a better QM fitting procedure; other FFs have gotten a good fit without having to do NMR fits.
CC: We're scoring the alpha/beta basins on the Ramachandran map differently than other FFs; even though our QM is good, we're still seeing that.
|
Espaloma Protein Benchmark | @Anika Friedman | Slides here
JC: Really cool work. Is it a defect with the QM training data in that region, e.g. have we de-emphasized fitting in the region so the QM energies aren't represented well, or is there an emergent property that causes deviation?
MG: Do you have an intuitive sense of what's happening? Does GB3 look like it's unraveling? What are we doing with torsions? [should be around 33 mins in the recording]
Ken: Protein torsions are the OpenFF peptide torsion 2D torsion drive and a few others: 3-mers, 3-mer omega datasets, 3-mer capped backbone, released 2022 and 2023.
MG: Do these have overlap with Chapin's datasets?
CC: Training on the ones called dipeptide 2D torsion drives, backbones and side chains of capped 1-mers; used 3-mers as validation.
CC: Issues with converging torsion drives in the 3-mer datasets.
MG: So Espaloma and Chapin's FF are trained on some of the same data and some different?
CC: Yes.
KT: Slide 2, for side chains chi2 is >100; is that meaningful?
AF: Trying to figure out why it's so high. Both don't have much data [30 scalar couplings for lysozyme vs many more for ubiquitin]; trying to figure out if it's something to do with the protein target or what.
CC: Need an estimate of the systematic error in the Karplus model to compute chi2; trying to see whether this error is underestimated, which would inflate chi2.
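For reference (not stated in the meeting), a standard form of the reduced chi2 for scalar couplings and the Karplus relation it depends on; an underestimated systematic term $\sigma_{\mathrm{sys}}$ in the denominator directly inflates chi2:

$$
\chi^2 = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(J^{\mathrm{calc}}_i - J^{\mathrm{exp}}_i\right)^2}{\sigma_{\mathrm{exp},i}^{2} + \sigma_{\mathrm{sys}}^{2}},
\qquad
J(\phi) = A\cos^{2}(\phi+\Delta) + B\cos(\phi+\Delta) + C
$$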
JC: At what pH and temperature were the experimental measurements done? Sometimes extremes are used to keep it protonated.
CC: pH 6.5, T ~25 C ish.
JC: GB3 is relatively unstable at that temp/pH; are they sure it's mostly folded at those conditions? These simulations would only sample the folded state.
CC: Scalar couplings are taken at different pHs than the backbone in some cases, same pH in others. pH could also affect the Karplus models.
CC: Should also look at other observables besides scalar couplings. Has chemical shifts implemented now, or order parameters should be available.
KT: Where are the histidines in GB3?
KT: What is the timescale of unfolding?
AF: Gradual process, still trying to quantify; 3 residues gradually disorder over 1.5-2 microseconds. Trying to quantify a cutoff point.
KT: May want to plot RMSD to see how the structure changes over time with different FFs/runs.
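A minimal sketch of the RMSD-vs-time plot KT suggests, using MDTraj; the file names and the choice of GB3 trajectory are placeholders.

```python
import mdtraj as md
import matplotlib.pyplot as plt

# Load a trajectory (placeholder file names) and compute backbone RMSD
# to the first frame; md.rmsd returns values in nanometers.
traj = md.load("gb3_traj.xtc", top="gb3.pdb")
backbone = traj.topology.select("backbone")
rmsd_nm = md.rmsd(traj, traj, frame=0, atom_indices=backbone)

plt.plot(traj.time / 1000.0, rmsd_nm * 10.0)  # ps -> ns, nm -> Angstrom
plt.xlabel("Time (ns)")
plt.ylabel("Backbone RMSD (A)")
plt.savefig("gb3_backbone_rmsd.png")
```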
AF: Wondering whether we should include this in the Espaloma manuscript? [around 45 mins in the recording; went a bit too quickly for good notes]
KT: Thinking of resubmitting end of November; wants more analysis like the RMSD plot, etc. to understand the data, but would love to add it once we understand it better. Could add during peer review? For resubmission, wants Chapin's peptide results from 1-2 meetings ago. What do others think?
JC: Would want a v2 of the peptide results that better compares.
AF: OK, will start doing more analysis and will share
|
Espaloma manuscript | @Chapin Cavender, Ken Takaba |
CC: Anika will do more analysis on proteins; CC will share "v1" results and methods section.
JC: Want to avoid making choices informed by too small a set of benchmarks, but this is a good starting point. Is there a way we can avoid being biased by looking at too few assessments, and make sure we're not over-optimizing for one particular benchmark?
CC: What do you mean?
JC: A lot of optimizations have been very targeted on a few benchmarks, tweaking things that affect a small bias between alpha helix vs not; do we have enough breadth to make sure we're not neglecting other metrics?
CC: The protein FF plan page on Confluence has tiers of observables to benchmark, based on how easy/long they will take.
JC: Looks like these benchmarks also have a bias toward helical structures, but there is also a beta one.
CC: The idea was to pick one alpha and one beta.
JC: Are there longer peptides?
CC: Could add up to 7-mers but decided not to due to weird behavior. K19 is about 40% helical at room temp; to go beyond that we would have to look at more unusual/engineered structures.
MG: Would be interesting to try simulating another protein and see if it has the same helix problem.
CC: Wasn't sure if that was how we wanted to spend our CPU time, but could do it.
MG: 5-mers have some helicity, but when trained on those we didn't get better results; maybe don't do that experiment then.
|