2020-08-06 Force Field Release meeting notes

Date

Aug 6, 2020

Participants

  • @Hyesu Jang

Goals


Discussion topics

Item

Presenter

Notes

Sage release timeline

 

  • OM – Nonbonded parameters are ready for refit.

  • OM – If we wait longer, we may be ready for BCC refit. Unsure whether BCC refit will lead to substantial improvement.

  • DM – JM will do better with more time for the Wiberg bond order (WBO) refits, so we’re fine delaying on that.

  • HJ – I haven’t settled on which improvements I’ll put into Sage. I’m also not sure how to address the gauche effect.

  • CB – In the big picture, FF development follows a “low hanging fruit” approach, where we get 90% of the improvement from the first 10% of the work. WBO torsions will be the most important, LJ corrections the second most important, and VSites the third most important. Until we have those, parameters like BCCs will try to compensate for deficiencies in cases like pyridines.

  • CB – SB had shown property calculations where the LJ parameters on H adjacent to a carbonyl group wanted to be different. I think there’s a good basis for this, since electron withdrawal is happening. I hope that is captured in the new LJ types.

  • CB – In the BCC refit, I had 305 independent parameters. Within our valence FF, bonds and angles are manageable in size, but torsions are huge and hard to wrangle. The BCC refit is a large gap that will take a long time to cover; it’ll be a real stretch to do that for Sage.

  • DM – We might adopt the rule “only refit BCCs that have clear errors”.

  • CB – I have a list of those, will send my poster on this.

  • DM – Agree that it’d be good to deal with vsites in Sage, but almost none of our partners could use them because of ParmEd limitations.

  • JW – The System object work isn’t focused on better interconversion of virtual sites, so I doubt it will solve the conversion problem in the next 6–12 months.

  • CB – I’m OK with dropping it from Sage. Let’s take this offline.

  • DM – Agree that we paint ourselves into a corner by having other FF terms pick up the slack that vsites could resolve.

  • CB – OM and SB, are you comfortable/confident with adding and removing patterns?

  • SB – That would be good to explore. My big question is “how do we know when we’re winning?”. My main concern is that improvement on mixture properties may come with decreased performance on protein-ligand benchmarks, so I’d like to have automated protein-ligand benchmarking better prepared.

  • DM – DH and HBM are the people most focused on automation for this. DH will have a lot of compute time available in a few weeks.

  • SB – That’s great. I’ll contact them.

  • DM – If you’re concerned, you can be more conservative (for example, don’t vary parameters that aren’t well covered by the training set).

  • SB – Then our answer is “we would like to play with chemical perception”. With respect to BCCs, I agree with CB that we only want to fit the ones that we know are really bad. CB identified a big win by taking some of the BCC parameters and applying WBO interpolation.

  • CB+DM – WBO interpolation for BCCs would be great.

  • JW – Implementation in a stable release could happen in 0.8.1 or 0.8.2 (2–3 months).

  • SB – Now that we support external ParameterHandlers, we could prototype this quickly.

  • CB – Do we have infrastructure that says “first we refit LJ, then we refit the torsions that may be affected”? If we are fitting both for Sage, we’ll want that cycle to be well optimized.

  • SB – We did this previously – basically iterating between fitting nonbonded and fitting valence terms until they converged. Benoit Roux found that, in this sort of cycle, the torsions rarely changed much.

  • CB – In the AMBER world, you have to fit the electrostatics first, then the LJ to macroscopic properties, then the torsions. This would have to change if LJ were fit by some bespoke method (which would put it in the same category as electrostatics).

  • SB – We initially started the cycle with a valence refit because it was ready in time. In future iterations, maybe we should start with the LJ refit.

  • OM – So, which date?

    • OM – Basically, we’ll know whether BCC refits play out in 2-3 weeks.

    • DM – Let’s decide then. If it doesn’t look production-ready, then we’ll plan sage without it.

  • JW – The lesson from the 1.2.0 release is to budget much more than the base compute time: computations will fail and people will make mistakes. We should set the timeline at around 3.5x the anticipated computational time, so that we can fail everything twice and still have time left over for human intervention.

  • CB – Will we use the same QM data or different QM data?

  • HJ – Will we do a full theory benchmark? And rerun everything at higher level of theory?

  • JW – We can just do whatever’s ready

  • CB – Do we have solvent models available in our QM calculations?

  • DM – OM and SB, are you fitting to QM ESPs?

  • OM – Yes. We’re unsure which level of theory to use, and whether to do something like Schauperl’s RESP2. So we want to run the BCC refit against both a higher-quality gas-phase level of theory and HF/6-31G*. We haven’t decided yet which results would lead us to which choice.

  • DM – It would be good for you to reach out to Schauperl and Nerenberg. Both of them came to similar conclusions.

  • CB – Relevant paper is https://pubs.acs.org/doi/abs/10.1021/acs.jcim.9b00962

  • JW – Do we have the ability to do solvent QM?

  • CB – There’s a PCM model in psi4, but it’s not great.

  • JW – This could be a big blocker. We don’t have the expertise to implement this, so if QCF won’t do it for us, it may be very difficult.

  • SB – We could do this in experimental infrastructure to see whether it’s worth pursuing.

  • CB – Simon, does your XtalPi affiliation give us access to more capable QM tools to get ESPs in the short term?

  • DM – Problem isn’t running it locally – SB can do it already.

  • SB – XtalPi is collaborating with us on this, so we can get their support for this project.

  • DM – I’d love to see more of XtalPi’s work.

  • OM – Which molecules do we use?

    • OM – Greedy search over chemical diversity + the Parsley fitting set. I would also love to see CB’s poster.

    • CB – So, you’re looking to do ESP fitting of BCCs over the space of molecules in the training set. But it’s hard to know whether the results will generalize to all of chemical space. How are you going to scientifically bridge the gap of not having macroscopic property data for everything?

    • OM – Good question. For Sage, we’d do a proof-of-concept targeted fitting of “known deficient” BCCs (from CB’s poster). There’s an open question of how we’d bridge the gap to all of chemical space, and I don’t have an answer to that for all physical properties. The advantage of fitting BCCs to QM data is that we’d hope to “paper over the cracks”. It’s not a fully solved problem. If we end up just doing LJ for Sage, we’ll encounter the same issue.

    • CB – So you’re saying that you have to carry forward two paradigms. One is “stick to tradition and use QM properties”. The other is “wouldn’t it be a better world if we had a model where we could simultaneously fit BCCs and LJ, including macroscopic properties in the optimization”. What you’re saying is that the second is an alternative approach, and that we need a plan before we embark on it. The first paradigm is useful in the short term; the second is aspirational, and we can gradually move there. There’s a question of whether we need a plan for all of chemical space to get anything done.

    • OM – Trying to do a “full chemical space refit” previously didn’t get us where we wanted. There seems to be a philosophy of “dump in more data and it will have to show improvement”, but Sage will need a lot of targeting, even if it just does LJ.

    • OM – If we decide that any of these will require more data, we need to start collecting it now (even for Rosemary). It’s obviously easier to seek out new data, but, for example, when looking into Hvap we found that data “ages” poorly: references disappear or become scrambled.

  • SB – It’s probably best to assume that BCCs won’t be included in Sage. We can aim for WBO torsions and an LJ refit. These additions can be decided at a later stage.
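The WBO interpolation of BCCs raised by CB, DM and SB can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a charge increment is linearly interpolated between two reference values defined at two reference Wiberg bond orders, mirroring the linear scheme SMIRNOFF uses for WBO-interpolated torsions. The function name, argument names and numbers are illustrative only; they are not the OpenFF toolkit API or the WBO ChargeIncrementModelHandler that SB plans to write.

```python
def interpolate_bcc(wbo: float, wbo_1: float, bcc_1: float,
                    wbo_2: float, bcc_2: float) -> float:
    """Linearly interpolate a bond charge correction at Wiberg bond order `wbo`.

    (wbo_1, bcc_1) and (wbo_2, bcc_2) are hypothetical reference points,
    e.g. a single-bond-like and a double-bond-like environment.
    Values outside [wbo_1, wbo_2] are linearly extrapolated.
    """
    if wbo_1 == wbo_2:
        raise ValueError("reference bond orders must differ")
    slope = (bcc_2 - bcc_1) / (wbo_2 - wbo_1)
    return bcc_1 + slope * (wbo - wbo_1)


# Hypothetical reference values in elementary charge units, not fitted parameters.
aromatic_like = interpolate_bcc(1.5, wbo_1=1.0, bcc_1=0.10, wbo_2=2.0, bcc_2=0.30)
print(aromatic_like)  # roughly halfway between the two reference BCCs
```

The appeal of this design is that one pair of reference parameters covers a continuum of bond environments (such as the pyridine-like cases CB mentioned), instead of needing a separate BCC for each.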
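JW’s 3.5x scheduling rule of thumb can be written out as a short calculation. The breakdown here is an assumed reading of the figure, not one stated in the meeting: one nominal run, plus two complete reruns (“fail everything twice”), plus half the nominal compute time as a buffer for human intervention. The function name and parameters are hypothetical.

```python
def planned_timeline_weeks(compute_weeks: float,
                           full_reruns: int = 2,
                           human_buffer: float = 0.5) -> float:
    """Rule-of-thumb schedule length (assumed breakdown of the 3.5x rule).

    compute_weeks: anticipated compute time for a single clean run.
    full_reruns:   number of complete reruns we budget for after failures.
    human_buffer:  extra time for human intervention, as a fraction of
                   the nominal compute time.
    """
    return compute_weeks * (1 + full_reruns + human_buffer)


# 4 weeks of anticipated compute -> 4 * 3.5 = 14 weeks on the schedule
print(planned_timeline_weeks(4))
```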

BCC refit ESP dataset selection

Owen/Simon



Action items

@Jeffrey Wagner will look into the availability of QM solvent models
@Simon Boothroyd will make WBO ChargeIncrementModelHandler, with @Jeffrey Wagner support
@Owen Madin and @Simon Boothroyd will reach out to Schauperl and Nerenberg to catch up on RESP2 conclusions.
@Christopher Bayly will send @Owen Madin and @Simon Boothroyd the BCC poster
@Simon Boothroyd and @Owen Madin will report on BCC refitting in 2-3 weeks (late August)
@Jeffrey Wagner and @Hyesu Jang (and anyone else involved) will write up a “fitting process” postmortem, with recommendations for how to run subsequent rounds.

Decisions