2024-05-30 Protein FF meeting note

Participants

  • @Chapin Cavender

  • @Michael Gilson

  • @Anika Friedman

  • @Brent Westbrook

  • @David Mobley

  • @Alexandra McIsaac

  • @Lily Wang

  • @Pavan Behara

  • @Jeffrey Wagner

Goals

  • QM parameter fits with alternate nonbonded parameters

    • Amber ff99 RESP charges and Lennard-Jones parameters

    • NAGL charges

  • Supplementing training dataset with PDB structures

Recording

https://drive.google.com/file/d/1BEpjW19UWwaNFWF127vb5SmIo-RYkCUl/view?usp=sharing

Discussion topics

Item

Presenter

Notes

Fits with alternate nonbonded parameters

 

@Chapin Cavender

  • CC will link slides here

  • Slide 3: can compare objective functions with Null-0.0.3-SP

  • JW: is ff99sb the same as 14sb NB parameters?

    • CC: Yes, they haven’t been changed since ff99

  • JW – Kinda a bummer that the SP line of fits didn’t work

  • LW – Might have missed this, but did Null-0.0.3-SP also maintain GB3 structure in benchmarks? Could the failure of the QAmb/NBAmb/NAGL NBs be more related to the SP protocol than the NB parameters?

    • CC – Yes. Null-0.0.3-SP shows unfolding of the alpha helix in all water models. So it’s possible that the SP protocol is insufficient in all cases.

    • CC – So one idea is to redo all these fits using the full minimization routine. But that would take weeks for each FF. Instead, I could use a modified objective function that compares pairwise differences.

  • JW – Why not just use the FF14SB objective function?

    • CC – That objective function was just used for sidechain optimization. Also it makes sense here to use RMSE instead of MAE, and use weights based on barrier height.

    • MG – Could do multiple things at the same time? But start with the one at the bottom of slide 15

    • CC – Yeah, would just take more of my time.

  • JW – Possible that NAGL is doing something funny with larger molecules?

    • CC – Not sure if there’s a good way to check that other than what I’m doing.

    • DM – Could look at a by-residue charge difference as a function of sequence length

    • LW – I looked at 5-15 AA, but very few 15-AA-long chains due to the need to compare to OpenEye/AmberTools! We could do some more comparisons of larger proteins just comparing to LibraryCharges too.

    • CC – I can look into this. Based on these charts it’s not clear that NAGL is doing dramatically worse than anything else.

  • CC – General consensus that next steps are to try modified objective function?

    • MG + JW – Yes

  • LW – Just going back to the human vs computer time of redoing a NB fit with the minimisation protocol, is this something more compute would help with? e.g. would it be helpful to try doing a minimisation run on UCI resources in the background while Chapin focuses on other directions first?

    • CC – Yes, this would help, I could run the expensive minimization fits on UCI if that makes sense for everyone
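
The modified objective function discussed above (pairwise differences, RMSE rather than MAE, weights tied to barrier height) could be sketched roughly as follows. This is an illustration only, not the function from the slides: the names `pairwise_rmse` and `barrier_weight`, the exponential weight form, and the `scale` value are all assumptions made for this example.

```python
import math
from itertools import combinations


def barrier_weight(qm_gap, scale=5.0):
    """Hypothetical weight: down-weight pairs separated by a large QM energy gap.

    The exponential form and the default scale are assumptions for illustration.
    """
    return math.exp(-qm_gap / scale)


def pairwise_rmse(qm, mm, weight=None):
    """Weighted RMSE of (MM energy difference - QM energy difference) over all conformer pairs.

    qm, mm: sequences of energies for the same conformers, in the same units.
    weight: optional callable taking the absolute QM gap of a pair.
    """
    if len(qm) != len(mm):
        raise ValueError("qm and mm must have the same length")
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(qm)), 2):
        d_qm = qm[i] - qm[j]
        d_mm = mm[i] - mm[j]
        w = 1.0 if weight is None else weight(abs(d_qm))
        num += w * (d_mm - d_qm) ** 2
        den += w
    # With fewer than two conformers there are no pairs to compare.
    return math.sqrt(num / den) if den > 0 else 0.0
```

Because only differences between conformers enter the sum, a constant offset between the QM and MM energies does not change the result, so no reference conformer has to be chosen.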

Supplementing training dataset with PDB structures

@Anika Friedman

  • AF will link slides here

  • MG – How many 4-mers are there in our original dataset?

    • AF – 30kish

    • CC – It’s 70 torsion scans, most of which are 2D with 24 points in each dimension, so 24 x 24 = 576 points each. 70 x 576 is about 40k if every scan were 2D, so 35kish overall.

    • MG – Another thing we could do is to take away some of the other ones.

    • CC – I think the overweighted bins in delta and beta are where I fixed the backbone to do sidechain scans.

    • MG –

    • CC – For the sidechain torsion scans, only the sidechain torsions should be allowed to vary, since the selection of parameters is based on which dihedral was driven.

    • MG – Do we know, if you’re fitting a sidechain torsion

    • CC – Roughly 1/3rd of the 35kish confs are from backbone scan

    • AF – Maybe we should remove sidechain torsiondrives from this analysis if they’re not being used to inform backbone parameters.

    • MG – Another thing is, does it matter if we’re optimizing SC params in the presence of an alpha/delta vs. beta backbone?

    • CC – We did the sidechain scans in the context of two backbone confs to avoid biasing toward one, but also without needing to do a 4D scan.

  • CC – I think we don’t want to have the sidechain scans in there, and we just want to look at the prevalence of the backbone scans.

  • MG – It’s also worth considering how to mix in the new peptide structures.

  • JW – How confident are we that the sidechain scans only affect the sidechain fits, and vice versa with backbones?

    • CC – About 50%, still need to look into this.

    • JW – Might be worth throwing out the sidechain scans and fits altogether, and just doing backbone fits to backbone scans.

    • LW – I hope I was getting the gist of the previous discussion correctly… but I don’t think there’s anything preventing side chain torsion drives from affecting backbone parameters and vice versa in ForceBalance — would have to check by looking at the code.

  • AF – If the backbone torsions are just being optimized to backbone scans, then … the weighting of the fit could be thrown off by having lots of points in a certain region of backbone angle space (the sidechain scans)

    • …

    • MG – Maybe the sidechain scans should be redone sampling alpha helical backbone angles?

  • MG – Could … compare more to backbone population distribution.

  • AF – CC and MG, let’s schedule a call to discuss how to handle this in the future.
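
One way to check for the over-represented regions of backbone angle space raised above is to bin the (phi, psi) angles of every training conformer and compare counts per bin. The sketch below is only an illustration under assumptions: the function names, the 24 bins per dimension (chosen to mirror the 24-point scan grid), and the inverse-count reweighting are not an agreed protocol.

```python
from collections import Counter


def bin_index(angle_deg, n_bins=24):
    """Map an angle in degrees onto one of n_bins equal bins covering [-180, 180)."""
    width = 360.0 / n_bins
    wrapped = (angle_deg + 180.0) % 360.0  # 180 wraps around to the same bin as -180
    return int(wrapped // width) % n_bins


def bin_counts(phi_psi, n_bins=24):
    """Count conformers per (phi bin, psi bin)."""
    return Counter(
        (bin_index(phi, n_bins), bin_index(psi, n_bins)) for phi, psi in phi_psi
    )


def inverse_count_weights(phi_psi, n_bins=24):
    """Hypothetical reweighting: every occupied bin gets the same total weight.

    The returned weights sum to 1 across all conformers.
    """
    counts = bin_counts(phi_psi, n_bins)
    n_occupied = len(counts)
    return [
        1.0 / (n_occupied * counts[(bin_index(phi, n_bins), bin_index(psi, n_bins))])
        for phi, psi in phi_psi
    ]
```

The same `bin_counts` output could also be compared against a backbone population distribution from another source, along the lines MG suggests, instead of flattening it.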

Action items

Decisions