2022-01-13 Protein FF meeting notes

Date

Jan 13, 2022

Participants

@Chapin Cavender
@Michael Gilson
@David Mobley
@John Chodera
@Pavan Behara
Robert Raddi
@Daniel Cole
@Simon Boothroyd
@Michael Shirts
@Jeffrey Wagner

Goals

Results from dipeptide 2-D TorsionDrives with sidechain dihedral constraints
Update on LiveCoMS review

Slides and recording

A recording of this meeting is available here: https://zenodo.org/record/5846958

Slides are attached below.

Discussion topics

Item	Presenter	Notes

Dipeptide 2-D TorsionDrives with sidechain dihedral constraints	@Chapin Cavender	JC – Are sidechains CONstrained or REstrained? If CONstrained, we could get unnaturally high energies.
CC – CONstrained, I think. It’s the same style of constraint as the driven torsions.
DM – What do other people do here?
CC – In most other existing protein FFs, they don’t do anything to the sidechains. They have all their degrees of freedom free to minimize.
JC – (In tryptophan rotamer 2 slide) Doesn’t this show that energies are highly sensitive to sidechain position? I think this means we shouldn’t totally constrain sidechain rotamers.
MG – But this is expected; the question is what’s the best way forward?
JW – The discussion above seems to assume that high energies are bad. But isn’t that fine if the actual molecular energetics SHOULD give a high energy? Is the implication that we know that sterics will be incorrectly modeled, so we want to avoid large steric terms?
MG – We’ll show “torsion contribution” plots later, which subtract out MM sterics.
JC – In the MM constraints, are the torsion ANGLES constrained, or are the ATOM POSITIONS constrained?
(General) – (Several people vaguely recall that ForceBalance would constrain angle values. CC is using the same scripts that SB used for comparisons at the time of the Sage release, which freeze atom positions.)
JC – On the Alanine MM target slide, it looks like things are coupled diagonally, which would imply that we need a CMAP that depends on both phi and psi.
MG + MS – Agree.
MS – Could include coupling terms to become more accurate.
DM – Would be better to do conventional torsions in the MVP.
JC – May be able to do custom 1-4 scaling to compensate for this as well.
JC – The energy difference plot: does this equally weight high-energy/low-probability regions? Or is the MM difference already Boltzmann-weighted?
CC – It’s not weighted.
JC – KL divergence could help here, which can take into account the similarity between different bins.
JC – Do we know whether the moving minima are due to the constrained sidechain position, or (something else)? We could look at the forces on the constrained atoms to measure the effect of the constraints.
CC – I could look into this. A previous MacKerell paper said that common sidechain rotamers nearly always correspond to QM minima.
JC – It could be good to double check this. Maybe we could run some quick ANI calcs?
DM – So, you’re taking a single set of sidechain confs and constraining them to stay there throughout the whole torsiondrive. This could mean that there’s a total clash somewhere in the landscape on a single sidechain conf. Thinking out loud, we’re trying to fix this phenomenon where the sidechain confs jump throughout the torsiondrive.
MG – This is why we checked and found that there aren’t big steric bumps in these results.
DM – Another way we could have done this is with REstraints instead of CONstraints.
DM – Thinking about the minimum viable product: we were trying to fix a problem that we observed without any -straints.
JC – The data that we have available can help us here. The sidechain forces can tell us how unhappy they are. We have an optimization dataset for each of these, and they should be at a minimum for each of these, so we can see whether those went far from the torsiondrive results.
MG – Again, we didn’t see huge steric energies, so I don’t get the sense that any of these are invalid. They’re all achievable by thermal sampling.
DM – …
MG – This is coming close to asking for a 4D torsiondrive to ensure we find the best sidechain position for everything.
JC + DM – We’re more advocating that we use REstraints instead of CONstraints.
DC – In previous work, we had not restrained sidechains at all; we just seeded a few different sidechain rotamers in the scan points and then selected the lowest energy for each grid point.
MG – So maybe a plan of action would be to run a coarser 2D scan, with sidechain REstraints seeded at a few different positions.
JC – People generally use three sidechain rotamers, and you could come up with a force constant based on some Boltzmann stats.
SB – I could take a look at which angular restraint strengths I used.
(General) – CAN we use REstraints in QCA?
SB – I don’t think so. GeomeTRIC may allow this using “optking”. Maybe a PR to torsiondrive or geomeTRIC would enable this.
JC – We may have the data we need here. We can compare the results of the optimizationdataset to the results of the torsiondrive. Then, if the energy differences are large, we know that REstraints should be used.
CC – I don’t think we have that info.
JC – I thought we’d generated optimizationdatasets for all the torsiondrive endpoints. That’s how we’d get bond lengths and angles.
(General) – The plan for Rosemary hadn’t been to make protein-specific bonds or angles. Just torsions. But new torsions would be added, and all the numerical values (JW edit on 2022_01_18: Apologies, I think that what was discussed was actually “all `k` values for these new parameters”) will be refit.
JC – So, we could do a new optimizationdataset seeded with the protein SMILES, and then for each optimized geometry, compare it to the closest grid point from this study, and see if the energy difference is large. This will tell us whether the sidechain-constrained TD gets close enough to the minima.
PB – pepconf may have some of the optimization data you’re looking for. This set has a lot of errored calculations, though: qca-dataset-submission/submissions/2020-10-26-PEPCONF-Optimization at master · openforcefield/qca-dataset-submission
(Next steps slide)
MG – A third option would be what Danny said: not constraining sidechain rotamers and seeding multiple starting points could work.
JC – What about seeding from all the scan points?
CC – I only have multiple sidechain rotamers for proline and trp.
JW – I’d be skeptical of the Cerutti set. There are several chemistry deficiencies and a high error rate.
CC – I’ve also seen irregularities in that set; it may be worth resubmitting.
MG – We could save some time by increasing the scan step to something like 30 degrees. Could experiment by doing a fit to the current data, then doing a fit to the data as if we had only submitted a 30 degree increment (so slice the existing data).
SB – The sooner we can identify the benchmarking observables, the sooner we can get those implemented.
CC – Agree, I’ll work on this.
JC – Could resurrect KBeauchamp’s tool to compare to protein NOEs.
LiveCoMS review	@Chapin Cavender	CC – The first xtallization section is just about ready as a first draft. Who should review it here?
MS – I don’t think we need to do a deep feedback cycle internally.
CC – There isn’t a lot of consensus on how to actually do things, which puts us in a hard spot.
MG – There are some differences in perspective in the writing.
MS – We should be careful with how editorially forceful we are. If it doesn’t come down to “which things do you actually match”, then journal reviewers will have a problem. It’s not necessary to agree. We can end up writing “one idea is this, the other idea is that” …
CC – Do the OpenFF PIs want to go through a round of review before this gets sent to coauthors?
MS – The quality is the most important; if it’s not ready by May, that’s OK.
DM – I could go either way. Happy to see this when it goes out for coauthor review.
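
Editor's sketch (not from the meeting): the TorsionDrive discussion asked whether the QM-vs-MM energy difference plot weights high-energy regions equally, and mentioned KL divergence as an alternative. A minimal illustration of both ideas follows, assuming the QM and MM surfaces are arrays of energies in kcal/mol on the same phi/psi grid. The function names and the kT value (roughly 298 K) are assumptions made for illustration. The KL divergence here is the plain discrete form, which treats each grid point as an independent bin.

```python
import numpy as np

KT_KCAL_PER_MOL = 0.593  # assumed: approximate kT near 298 K, in kcal/mol


def boltzmann_weights(energies, kT=KT_KCAL_PER_MOL):
    """Normalized Boltzmann populations for a grid of energies."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / kT)  # shift by the minimum for stability
    return w / w.sum()


def _relative_difference(qm, mm):
    """MM minus QM after shifting each surface so its minimum is zero."""
    qm = np.asarray(qm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    return (mm - mm.min()) - (qm - qm.min())


def unweighted_rmse(qm, mm):
    """RMSE that counts every grid point equally."""
    diff = _relative_difference(qm, mm)
    return float(np.sqrt(np.mean(diff ** 2)))


def weighted_rmse(qm, mm, kT=KT_KCAL_PER_MOL):
    """RMSE weighted by the QM Boltzmann population of each grid point."""
    diff = _relative_difference(qm, mm)
    w = boltzmann_weights(qm, kT)
    return float(np.sqrt(np.sum(w * diff ** 2)))


def kl_divergence(qm, mm, kT=KT_KCAL_PER_MOL):
    """Discrete KL(P_QM || P_MM) between Boltzmann populations."""
    p = boltzmann_weights(qm, kT)
    q = boltzmann_weights(mm, kT)
    mask = p > 0  # bins with zero QM population contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

With this weighting, a large MM error at a grid point that is 10 kcal/mol above the QM minimum barely changes the weighted RMSE, while it dominates the unweighted one.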

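Editor's sketch (not from the meeting): two of the options raised in the TorsionDrive discussion are easy to prototype on existing data. One is DC's approach of seeding several sidechain rotamers and keeping the lowest energy at each grid point. The other is MG's suggestion to slice the existing scan down to a 30 degree increment before refitting. The array layout (seed, phi, psi), the NaN convention for failed calculations, and the 15 degree starting step are all assumptions made for illustration.

```python
import numpy as np


def lowest_energy_per_grid_point(energies_by_seed):
    """Pick the lowest energy over rotamer seeds at each (phi, psi) point.

    energies_by_seed: array of shape (n_seeds, n_phi, n_psi); NaN marks a
    failed or missing calculation (assumed convention).
    Returns (best_energy, seed_index); points where every seed failed get
    NaN and -1 respectively.
    """
    e = np.asarray(energies_by_seed, dtype=float)
    filled = np.where(np.isnan(e), np.inf, e)  # failed points never win
    best = filled.min(axis=0)
    seed = filled.argmin(axis=0)
    all_failed = np.isinf(best)
    best = np.where(all_failed, np.nan, best)
    seed = np.where(all_failed, -1, seed)
    return best, seed


def coarsen_grid(surface, fine_step=15, coarse_step=30):
    """Keep every k-th point along the last two (phi, psi) axes."""
    if coarse_step % fine_step != 0:
        raise ValueError("coarse_step must be a multiple of fine_step")
    stride = coarse_step // fine_step
    return np.asarray(surface)[..., ::stride, ::stride]
```

Fitting once to the full surface and once to `coarsen_grid(surface)` would give the comparison MG described without submitting any new calculations.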
Action items

Decisions