2024-04-10 FF Fitting Meeting

Participants

  • @Brent Westbrook

  • @David Mobley

  • @Chapin Cavender

  • @Jeffrey Wagner

  • @Alexandra McIsaac

  • @Pavan Behara

  • @Lily Wang

  • Andrea Bortolato

  • Chris Bayly

  • @Matt Thompson

Goals

Discussion topics

Item

Presenter

Notes

Torsion multiplicity update

Brent Westbrook

  • Slides:

  • Recording: https://us06web.zoom.us/rec/share/uM9Ga5GIAaG3tnYQRM0zbTSCdqLiiqlZgDIkp8gQGcihsd0rCBFskXTj87TreGDL.Q0t5T6CwjCdJADVM?startTime=1712779339000
    Passcode: 7YQF0?%O

  • AMI: with aggregate statistics, I found medians more helpful than means due to large outliers

    • DM: agree this would be interesting

  • JW: to check, this is with explicitly enumerated multiplicities

    • BW: yes, this is separate from idivf stuff

  • PB: how many of the new terms aren’t covered in the benchmark set?

    • BW: 29 torsions out of 220 aren’t covered. Some are older parameters carried over from Sage 2.1.

  • CB: with the valence angle deviations, it’s encouraging to see where most of the data lies, but are we spending effort on the large outliers > 5?

    • LMI: I’ve taken a bit of a look at this. Some of this is related to hypervalent sulfur parameters, where a lot of angles are grouped together. Also, some of this is that there are huge outliers in a 7-membered ring with weird QM configurations. Not sure if it’s filtered out here.

    • CB: perhaps next is this hypervalent sulfur with lone pair

  • CB: perhaps as we’re splitting out the torsions, we can do an assessment of whether multiplicities need to change at that level, earlier in the pipeline

    • BW: agree
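AMI's point about medians versus means can be illustrated with a minimal sketch. The deviation values below are made up for illustration and are not taken from the benchmark discussed here; the variable names are hypothetical.

```python
import statistics

# Hypothetical per-molecule deviations (arbitrary units), illustrative only.
# One large outlier stands in for cases like the 7-membered ring mentioned above.
deviations = [0.4, 0.5, 0.6, 0.7, 0.8, 25.0]

mean_dev = statistics.mean(deviations)      # pulled upward by the outlier
median_dev = statistics.median(deviations)  # stays near the bulk of the data

print(f"mean   = {mean_dev:.2f}")
print(f"median = {median_dev:.2f}")
```

With a single large outlier, the mean lands well above every typical value, while the median stays within the bulk of the distribution.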

Andrea Bortolato

  • (~14 min into recording)

  • What’s the direction of BespokeFit? Is there a way to compare the torsion in the molecule vs the torsion in the training set to recommend that torsions are re-parameterized, the same way Schrodinger does?

    • DM: we think that’s a good idea. We’ve not had the bandwidth to do that yet. We don’t think it should be very hard.

  • AB: I recall BF creates very specific torsions that will only apply to that molecule, because it keeps appending. Is there a way to make it a bit more general? Would it be worth putting a whole library through BF at once, if there’s infinite compute?

    • JW: this would be a great idea, but it is hard to “soften” SMIRKS. We did just put out a dataset of torsiondrives of common drug cores, so if you can pull that down, you can use one of those torsiondrives instead of running your own. That’s not an answer to your question, but it’s an answer to a question I’d rather answer.

    • DM: what kinds of things do we prune out before doing torsion fits for the general force field?

    • PB: we prune out strongly H-bonded molecules which have steric clashes

    • DM: we try to prune out some data that incorporates forces that shouldn’t be covered by the torsion for a good general torsion. BF gives up on this and includes it

    • AB: could I use a FEP-type test where I compare normal Sage against Sage+ for every molecule where I run BespokeFit?

    • DM: our perspective on the question, “can I take a specific force field and generalize it”, is that you shouldn’t, but that would be an interesting experiment

    • DM: we’ve done some work showing that looking at molecule-specific parameters can improve our general FF

  • CB: the general OpenFF force fields have very few parameters compared to other FFs. They suffer from being too general. In the extreme, with unlimited compute, I would BespokeFit every molecule. In between these 2 extremes are cheminformatics questions and useful scenarios, e.g. congeneric series. The scenario DM mentioned with steric clashes is why I would go with BespokeFit parameters every time. I would also prioritise better vdW parameters, which would lead to a better general FF. The balance between generality and specificity depends on how much compute you have.

    • AB: it’d be nice to have a tool to decide this

    • DM: Roi/Psivant decided they’ll just custom-fit all torsions

    • CB: the decision between those two extremes boils down to a science decision: can you generalise these torsions? BF fragments molecules, so there should be common fragments, and therefore common torsions, in a congeneric series

    • CB: can BespokeFit check previously created torsions to see if new common torsions already exist?

    • JW: yes, but it has to be the same Executor. There may also be a way to stop it and start it back up. The answer can be found either in the docs or from Josh Horton.

  • AB: wondering about the Schrodinger approach for saturated ring pucker. Are you exploring reparameterizing ring pucker based on that approach?

    • LW: not immediately clear how we would deal with this so we haven’t looked at it specifically

  • CB: wrt our training and benchmark sets, could we take these saturated rings in our QCArchive between different ring conformers and look at ddE?

    • PB: we have in-ring torsions to tackle some of the heterocycle conformations

    • DM: when we generate conformers for rings, we do generate conformers

  • AB: if there is a new type of ring that’s not in the training set, the FF may not be very accurate for it. Same idea as torsions

    • CB: SMARTS parameters tend to generalise well. I would expect SMIRNOFF FFs to be doing quite well comparatively. I’d worry about unsaturated rings more.

  • CB: do we have an easy tool to plot QM vs MM minima?

    • JW: no, but it should be a straightforward script

    • CB: could this be a useful tool to decide whether I want to run BespokeFit?

    • AB: the high-level question is can I predict how reliable this FF is for the system I’m interested in? At the end of the day my aim is not to get a general FF, it’s to get the best results possible with the time I have. I want to estimate the probability of error.
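As a rough illustration of the "straightforward script" JW mentions for comparing QM and MM minima, here is a minimal sketch under stated assumptions. It assumes a periodic 1D torsion scan on a shared angle grid, with each energy profile given relative to its own lowest point. All function names and the example energies are hypothetical; nothing here is an OpenFF or BespokeFit API, and plotting is left out.

```python
def local_minima(angles, energies):
    """Return (angle, energy) pairs at local minima of a periodic scan."""
    n = len(energies)
    minima = []
    for i in range(n):
        prev_e = energies[(i - 1) % n]  # wrap around: the scan is periodic
        next_e = energies[(i + 1) % n]
        if energies[i] < prev_e and energies[i] <= next_e:
            minima.append((angles[i], energies[i]))
    return minima


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def match_minima(qm_minima, mm_minima):
    """Pair each QM minimum with the nearest MM minimum by angle.

    Assumes mm_minima is non-empty. Returns tuples of
    (qm_angle, mm_angle, angle_shift, mm_energy - qm_energy).
    """
    pairs = []
    for qm_angle, qm_e in qm_minima:
        mm_angle, mm_e = min(
            mm_minima, key=lambda m: angular_difference(qm_angle, m[0])
        )
        shift = angular_difference(qm_angle, mm_angle)
        pairs.append((qm_angle, mm_angle, shift, mm_e - qm_e))
    return pairs


# Hypothetical scan data (degrees, kcal/mol); illustrative only.
angles = list(range(-180, 180, 30))
qm = [0.0, 1.0, 2.5, 1.2, 0.6, 1.5, 3.0, 1.5, 0.6, 1.2, 2.5, 1.0]
mm = [0.0, 0.8, 2.0, 1.5, 1.0, 0.9, 2.8, 0.9, 1.0, 1.5, 2.0, 0.8]

pairs = match_minima(local_minima(angles, qm), local_minima(angles, mm))
for qm_angle, mm_angle, shift, d_e in pairs:
    print(f"QM min at {qm_angle:5d}, MM min at {mm_angle:5d}, "
          f"shift {shift:5.1f} deg, dE {d_e:+.2f}")
```

Large angle shifts or energy differences between matched minima could serve as one possible signal that a torsion is a candidate for bespoke fitting, though any threshold would need to be chosen and validated separately.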

Action items

Decisions