2022-11-03 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @Michael Gilson

  • @David Mobley

  • @Diego Nolasco

  • @Lily Wang

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Matt Thompson

Goals

  • Modeling neutral C termini in NMR benchmarks

  • Progress on null model parameter fit

  • SMIRKS for protein-specific model

Slides

Discussion topics

Item

Presenter

Notes

Modeling neutral C termini in NMR benchmarks

@Chapin Cavender

  • CC – Many NMR measurements were taken at pH 2, where C termini would be neutral. This is no problem for OpenFF, but there are no published library charges for protonated C termini in ff14SB. I've been emailing with CSimmerling, and he said that he has some unpublished charges for protonated CALA, but some other amino acids aren't covered. Alternatively, PNerenberg and THead-Gordon have another set of RESP charges that could be used.

    • MG – What fraction of the benchmark set does this affect?

    • CC – For small peptides (2-5 residues), there's a set of 2-mers that are all capped, so they don't have this problem, but there are some 3-mers that aren't capped, as well as 4- and 5-mers. So roughly 1/3 to 1/2 of the dataset is affected.

    • CC – For something like an entire protein, it’s not a big deal if we mistreat the termini. But for small peptides it’s more difficult.

    • MG – Has anyone run these benchmarks with AMBER?

    • CC – Yes, other studies have run them with charged C termini, as they would be at pH 7.

    • CC – One exception is CSimmerling's publication on an alanine tetramer using ff19SB, where there ARE neutral C-terminus charges.

    • MG –

    • CC – We could just try running with different protonation states and see how that affects the results.

    • MG – I'm worried that this will multiply the number of calculations we need to do.

    • CC – I think the right way to do it is to match the pH of the experiment regardless.

    • JW – Could we make the OH its own residue with a net charge of 0? It's a bad answer, but it may be the least bad one.

    • CC – I want to avoid easily attackable ideas, so we don't get criticism like that in the Gapsys paper.

    • DM – But if there’s no “correct” solution, we can just write that in the papers and do something else.

    • MG –

    • CC – So, what's the decision on whether to use the parameters that CSimmerling sends me?

    • MG + CC – …

    • CC – It would take some manual work to copy over the parameters that CSimmerling used for the ff19SB paper.

    • MG – CS may not have charges for all the termini that we need. So then we’d need to make stuff up, right?

    • CC – Yeah, but we could take the values from ALA, and use the same approach for GLY and VAL.

    • MG – Maybe use the Nerenberg parameters, then?

    • CC – They're not too different, except for the charges on the amide nitrogen.

    • MG – Do those look similar to mainchain/nonterminal amide nitrogens?

    • CC – The difference between the restraints from Paul and Carlos is that Carlos explicitly treated it like a charged terminal residue, whereas Paul treated it like a mainchain residue.

    • MG – So the Carlos solution is like a cap, whereas Paul let changes propagate deeper into the residue?

    • CC – Yes

    • MG – Did Paul publish this?

    • CC – Yes.

    • JW – PN's charges sound better. They keep us from needing to run "a method like what CS did", because we can plug in the exact values from PN's work.

    • DM – Agree

    • MG – Agree

    • CC – So we can move ahead with Paul’s, but if we hear from CS that he has charges for all 3, does that take precedence over Paul's?

    • MG – Given that CSimmerling's charges would still be unpublished even if we get them, it may be good to let Carlos know that this is the plan.

    • CC – Agree, that’s a good plan.

    • JW – I think the incentives work best if we use Paul's.

    • MG – Also PN’s parameters are published. So I think the best plan is to inform CSimmerling and see if he strongly objects.

    • PB (chat) – Is this Paul's work that's being referred to here? "Optimizing Protein−Solvent Force Fields to Reproduce Intrinsic Conformational Preferences of Model Peptides", https://pubs.acs.org/doi/full/10.1021/ct2000183

      • CC – yes

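The claim that C termini are neutral at pH 2 can be sanity-checked with the Henderson-Hasselbalch relation. This is a minimal sketch, not part of the benchmark pipeline: `fraction_protonated` is a hypothetical helper, and the pKa of 3.3 is an assumed, illustrative value for a peptide C-terminal carboxyl group (real values vary with sequence and environment).

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of an acid group in its neutral (protonated) form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# ASSUMPTION: illustrative C-terminal carboxyl pKa; not taken from the benchmark data.
ASSUMED_CTERM_PKA = 3.3

for ph in (2.0, 7.0):
    frac = fraction_protonated(ph, ASSUMED_CTERM_PKA)
    print(f"pH {ph}: {frac:.3f} of C termini protonated")
```

Under this assumed pKa, about 95% of C termini are protonated at pH 2 and essentially none at pH 7, which is why matching the experimental pH changes the charge state of the uncapped peptides.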

Null model fit

@Chapin Cavender

  • Chapin will post slides here

  • CC – Null model has converged!

    • Took about 15 days of walltime on TSCC

    • JW – Is the improvement in optimized geometries (opt geo) coming from small molecules?

      • CC – Not sure, but of about 4500 opt geo targets, 70 are protein.

  • CC – I'll run this on the QC validation set and run our benchmarks.

Protein-specific SMIRKS

@Chapin Cavender

  • JW – Should we craft SMIRKS to cover weird resonance structures, like the "zwitterionic" form of amide bonds?

    • (General) – We’ll cover this on the infrastructure side, maybe just as simply as an entry in the FAQ: “If you enter a molecule in an unusually high-energy resonance form, then you may get different parameters assigned.”

  • PB (chat) – Going forward, how will small-molecule FFs be trained? Do we fix the protein parameters and train the small-molecule ones? Then iterations would take 15 days.

    • LW –

    • CC – This would make exploratory work harder, since it would be hard to get rapid feedback. So one idea is to leave out the protein data and only fit and test improvements to the small-molecule parameters. Would that work for you, PB?

    • PB – Yes.

    • MG – How long do iterations take without the protein data? Why would it be so much quicker?

      • PB – Around 28 hours with a set similar to Sage's. Protein TorsionDrives are 2D, with 576 points.

      • CC – So each protein TorsionDrive has 24 times as much data as a small-molecule (1D) scan. And each point requires MM minimization.

      • MG – That makes sense. This may be worth testing formally at some point: can we drop some of these grid points, or skip the minimizations?

      • CC – I think some minimization is important.

      • MG – Do we let surrounding torsions also move around?

      • CC – Yes, to some extent.

      • MG – Maybe we should restrain all torsions, but let bonds and angles relax.

      • CC – That's a good idea. I don't know if that's implemented in ForceBalance or how hard it would be to add, but if it's already in geomeTRIC, then it should be straightforward. Also, now that we have fits done with a large amount of data, we could run experiments on slicing the set.

      • MG – Going forward, this might be a side project worth doing.

    • CC – Protein-specific backbone parameters only get applied to mainchain residues. Uncapped termini will get small-molecule parameters.
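The cost comparison between protein and small-molecule TorsionDrives is simple grid arithmetic. This is a minimal sketch that assumes a 15-degree grid spacing over 360 degrees (consistent with 576 = 24 × 24, but not stated in these notes) and that the 2D protein scans cover a pair of backbone dihedrals; `n_grid_points` is a hypothetical helper.

```python
# ASSUMPTION: 15-degree grid spacing over a full 360-degree range per dihedral.
GRID_SPACING_DEG = 15


def n_grid_points(n_dihedrals: int, spacing_deg: int = GRID_SPACING_DEG) -> int:
    """Number of constrained geometries in a TorsionDrive over n_dihedrals."""
    points_per_dihedral = 360 // spacing_deg  # 24 at 15-degree spacing
    return points_per_dihedral ** n_dihedrals


one_d = n_grid_points(1)  # small-molecule (1D) scan
two_d = n_grid_points(2)  # protein (2D) scan, assumed to be a backbone dihedral pair
print(one_d, two_d, two_d // one_d)  # 24 576 24
```

Since each grid point needs an MM minimization, a 2D protein scan costs roughly 24 times as many minimizations per optimizer iteration as a 1D scan. For scale, 15 days of walltime is 360 hours, about 13 times PB's 28-hour estimate, although the two fits used different data and may not be directly comparable.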

Action items

Decisions