2024-02-22 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @Brent Westbrook

  • @Lily Wang

  • @Alexandra McIsaac

  • @Michael Shirts

  • @David Mobley

  • @Jeffrey Wagner

  • @Pavan Behara

Goals

  • Update on protein QM parameter fits

  • Debrief on 2024 Biophysical Society Annual Meeting

Recording

https://drive.google.com/file/d/13MDfHFbtIAA8BvlyKs54J4BdHNpB2XEJ/view?usp=sharing

Discussion topics

Item

Presenter

Notes

Protein QM fits

@Chapin Cavender

  • MS (slide 3): can we compare objective functions between fits?

    • CC: can compare 0.0.3 vs 0.0.3-dw; their objective functions are a factor of 10 off

  • LW (slide 4): how big are the peptides you’re fitting to vs benchmarking here?

    • 3- to 7-residue peptides in the J coupling set, 1- to 3-residue peptides in the training set

  • LW (slide 4): worth adding larger peptide chains for benchmarking?

    • CC: computing in progress

    • JW: worth submitting to qca-dataset-submission?

    • CC: do we think we need more QC data for proteins beyond 1-mers and 3-mers?

    • DM: looks like we’re still evaluating whether this QC data is helpful, but if it is, you may find that you suddenly need lots more data, so potentially worth queuing up now

    • CC – Maybe, but current FFs didn’t fit to gas phase QM. So we’d probably need solution phase.

    • DM: won’t hurt to have nucleic acid data too

    • CC – Ok, so I’ll run these benchmarks for larger peptides and will run new fits. I’ll also think about whether it’d be worthwhile to do new fits, either with larger peptides or the same peptides with implicit solvent.

  • MS: if something doesn’t work, do we need next steps or a backup plan?

    • CC – I presented on this at BPS. People were very excited and engaged. Two groups of people gave feedback.

      • People who wanted to use it right now

      • People who wanted to know more about the science of it, often with experience in FF fitting. They wondered if the problems were coming from nonbonded parameters, not torsions. Could be vdW, could be charge model. Charge model seems unlikely since we’re so close to AMBER. One criticism was that the FF was tuned in the presence of our 1-4 nonbonded interactions.

    • CC – AF expressed interest in helping with this project in parallel. I’ll meet with her tomorrow. Maybe some way to split off nonbonded params.

      • MS – Ok, we were talking about looking at water models.

      • CC – Cool

      • MS: TIP3P is not the right water model long-term, but seems to stabilize proteins – just posted paper on how OPC seems to destabilize them

      • CC: not sure I completely agree with the paper, as TIP3P seems to favour keeping structures compact. IME, proteins with disordered regions behave better when modelled with better water models

BPS debrief

@Chapin Cavender

  • CC –

    • K Lindorff-Larsen got a BPS fellowship for comparing sim to expt. Gave a talk on a CG FF (CALVADOS?) designed to model disordered proteins. Surprised folks by getting good behavior by training to single-protein properties and paramagnetic relaxation. Training on single chains, they were able to get good performance on condensates with multiple molecules. Also, they didn’t place beads on alpha carbons; instead they used the COM of the whole residue, so it’s closer to the side chain.

    • CHARMM folks had a big presence. Some MacKerell folks who were working on CGenFF. Working on making nonbonded params for mols by doing away with combining rules and just making specific params for all 150 atom types. This involved making a big QM dataset of these different atoms interacting in different geometries. They’re beginning parameter fits now.

      • LW – What QM data? Dimer energies?

      • CC – Yes, dimer energies. They found MP2 worked well but they looked for a cheaper method. Ended up finding a cheaper one but I don’t recall what it was.

      • MS – We’d discussed this on Slack the other day as well.

      • PB: did they use the Sherrill SAPT datasets?

      • CC: think they’re using it as a benchmark set, don’t recall if it’s on the list

    • Folks working on Drude. One problem they found is that it causes monovalent ions to aggregate on the charged protein atoms. So there was a postdoc from the MacKerell group working on re-tuning ions. They found that they could re-tune the ions so that didn’t happen.

    • Talk from postdoc in GBowman’s group where they ran long sims of proteins where they have single-molecule FRET. This let them compare to experimental distances. They’re finding that they can’t do this, since it requires knowing how and where the dyes are attached to the protein. They’re interested in using OpenFF for this.

      • MS – CDavel’s work might help with this.

      • JW – We have a not-blessed way to do PTMs, if they’re OK with using not-blessed implementation feel free to put them in touch.

    • CC – (Something about xtals in e- fields). They’re running comparisons to experiment. This could be an interesting benchmark set for us.

    • Interesting talk from Rommie Amaro about lessons learned from trying to run very large sims. They recently simulated a SARS-CoV-2 virion in an aerosol droplet. Two big take-homes:

      • Have a good relationship with software devs, both folks doing the MD engines and supercomputers

      • Initializing sims can be really difficult. For small sims a small fluctuation in density will just get worked out, but in big ones those can easily lead to crashes.

    • Work from VVoelz’s group. One of his students found a problem with tertiary amines, I’ve invited them to speak at our upcoming FF fitting meeting.

    • Also, VV’s group made the BICEPS method, which uses a Bayesian approach to incorporate uncertainty from various sources in sims. Previously this was just available for scoring FFs against expt, but now it returns gradients so you can use it as an optimizer. This is pretty early work, not clear if it will scale up to a general FF on our scale. Preprint ETA in the next few weeks.

  • …

  • PB – Was the MacKerell group using the SAPT dataset?

    • CC – I don’t think so… they had a list of datasets they were using but I can’t recall if that was there.

Action items

Decisions