2022-03-02 BespokeFit Meeting notes

Participants

@Joshua Horton
@Mateusz Bieniek
@David Mobley
@Daniel Cole
@Chapin Cavender
@Pavan Behara
Ventaka
@Lee-Ping Wang
@Simon Boothroyd

Discussion topics

Item	Notes

Binding free energy experiments with OpenFF 1.3 + GFN2-xTB

(Will upload slides here)
JH – Tried to visualize trajectories to catch sampling issues.
- DM – Rotamer sampling issues for R-groups?
- (Needs to do more edges? How to do this?)
- MB – Uses RDKit to make the MCSes.
- DM – Could switch which mapping you get, and/or switch the coordinates of atoms.
DM – One thing I’m thinking about is that TYK2 is “the system where everything works”, so I’m wondering what we’d see on the other systems.
LPW – An RMSE of 0.64 kcal/mol is really good. The Parsley paper got around 1.0 kcal/mol.
- DC – Yeah, J. Chodera’s paper with ML force fields also showed about half a kcal/mol.
JH – All the software is in place to run the whole JACS dataset; we’d just need more compute time.
JH – It’s often not worth driving amide bonds, but I’m not sure how to automate skipping them.
- DM – It could be tricky to figure out which ones to skip scanning.
- JH – I’m thinking of caching torsion parameters for all fragments from QCArchive, and distributing with software. This could play well with the caching functionality that I recently added.
- SB – What’s the cost comparison between normal FFs, XTB, and full QM?
  - JH – XTB is about 1 minute/fragment, full QM is hours. If you get clever with caching you can run the QM faster, but we haven’t incorporated that into bespokefit.
  - SB – Do we ever want to run full QM at this point? XTB seems plenty performant.
  - SB – PB, you’d experimented with XTB1 vs XTB2. What’s the tradeoff?
    - PB – On our set of 59 QM benchmark molecules, relative to a df-CCSD(T)/CBS//MP2-heavy-aug-cc-pVTZ baseline:
      | Specification | RMSE (kcal/mol) | MAE (kcal/mol) |
      | --- | --- | --- |
      | gfn1xtb | 1.5271 | 1.0309 |
      | gfn2xtb | 1.2481 | 0.8554 |
    - JH – The OrbNet paper compares GFN1 vs. GFN2 on TorsionNet. GFN2 is a bit better, but it’s debatable whether it’s worth the cost.
    - LPW – Agree, it’s not clear how representative these plots are.
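
For reference, the two columns in PB’s table are the root-mean-square error, sqrt(mean(e_i²)), and the mean absolute error, mean(|e_i|), where e_i is the difference between the xTB energy and the reference energy for point i. The minimal sketch below computes both. The energies in it are made-up placeholders, not the 59-molecule benchmark data, and exactly how PB aggregated the errors (per torsion point vs. per profile, how relative energies were aligned) is not stated in these notes.

```python
import math


def rmse(errors):
    """Root-mean-square error of a sequence of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error of a sequence of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical relative energies (kcal/mol) along one torsion scan.
# These are illustrative placeholders, NOT the benchmark data.
reference = [0.0, 1.2, 3.4, 2.1, 0.5]
method = [0.0, 1.0, 4.1, 2.6, 0.2]

# Signed error per point: method minus reference.
errors = [m - r for m, r in zip(method, reference)]
print(f"RMSE = {rmse(errors):.4f} kcal/mol, MAE = {mae(errors):.4f} kcal/mol")
```

RMSE is always at least as large as MAE, because squaring weights large errors more heavily; that is consistent with the RMSE column exceeding the MAE column for both specifications.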

Bespokefit conda-forge release

JW – Bespokefit is now on conda-forge. Looking for an initial tester before we ship it to Swope!
- DC – I tried it yesterday. A couple of notes:
  - I needed mamba to install it; is that a problem?
    - JW – I used to think it was, but I’m finding more and more that I need to do everything with mamba now. So I don’t think it’s that bad any more.
    - JH – Did you try installing psi4?
    - DC – No
    - SB – The current bespokefit package doesn’t require psi4; it targets openff-toolkit-base to avoid the AmberTools/psi4 conflict. There was another dependency that had lots of pins; I talked to its maintainers and got those loosened up. I think it’s ready to send to Bill.
    - DC – Time to announce this to the world?
    - SB – Let’s get Bill to test it for a few days before we announce.
    - We’ll have Bill test bespokefit for 3 days, then JH will announce release.
  - It wasn’t clear to me how to have bespokefit use XTB instead of full QM.
    - SB – Agree. We need to document how to select XTB through the Python API, and we also need to add a command-line argument for XTB.
    - JH – Agree, we’ll want to add some pre-built command-line arguments.
JW – I just made an emergency bugfix release of the OpenFF Toolkit. Unfortunately, an unrelated change broke ForceBalance, which means it broke bespokefit, so we need to make a new ForceBalance release.

Overview of vsites and functional form work

DC – The vsite work started before we began working with OpenFF.
(Based on "Exploration and Validation of Force Field Design Protocols through QM-to-MM Mapping")
DC – Sulfur lone-pair vsites show good improvement.
DC – F-C bond charge vsite keeps falling inside the fluorine, with a positive charge
- LPW – Would that basically be a sigma hole?
- DC – It’s the opposite of what we’d expect for a sigma hole: the region that should become positive is instead becoming negative. But fluorine is said to show the weakest sigma-hole effect.
DC – Adding vsites reduces errors for densities and heats of vaporization
DM – Is it accurate to say that, if someone were interested in working with this in the main OpenFF force fields, they could start with your placement of vsites?
- MS – There are two questions for each vsite:
  - A discrete question of “should this vsite exist?”
  - Numerical questions of “what should the values be?”
- MS – I know SB has infrastructure to do the fitting.
- DM – So it seems like this is ready to start in OpenFF.
- MS – Maybe. My student Yu-Tang may be able to start on this, but he’s just beginning grad school, so it’ll take a good amount of training.
- MS – I think we need to pin down our electrostatics fitting procedure first. Will it be AM1, then fitting new BCCs? Without this we don’t have a foundation to begin fitting vsites.
MS – Is there a heuristic to decide which vsites should exist?
- DC – We only have a ~100-molecule set to guide the decision, but it has informed where we put vsites. The results largely indicate we need things like O and S lone pairs.
CC – Which MD engine did you implement vsites in?
- DC – OpenMM, using LocalCoordinatesSite
- CC – At OpenFF, we should keep in mind that we’re concerned about the ability to export to other engines…
- (general – Support for our vsites in GROMACS is fine, but AMBER is unknown)
- DM – I don’t think this should block our research. If we fit vsites that are good, then we can push to get them implemented in engines.
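
DC’s vsites use OpenMM’s LocalCoordinatesSite. As a rough illustration of how such a site is placed, the sketch below follows the construction described in the OpenMM documentation: the origin and two direction vectors are weighted sums of the parent-atom positions, an orthonormal frame is built from them with cross products, and the site sits at a fixed offset in that frame. This is a pure-Python re-implementation for illustration only (all function and parameter names are hypothetical, not OpenMM’s API), and the example weights and offset are arbitrary rather than DC’s fitted values.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def weighted_sum(weights: Sequence[float], points: Sequence[Vec]) -> Vec:
    """Return sum_i w_i * p_i for 3D points."""
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(3))


def cross(a: Vec, b: Vec) -> Vec:
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec) -> Vec:
    """Scale v to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def local_coordinates_site(parents, origin_w, x_w, y_w, local_pos) -> Vec:
    """Place a virtual site in a local frame built from its parent atoms.

    Hypothetical helper mirroring OpenMM's documented construction.
    origin_w should normally sum to 1, and x_w and y_w to 0, so the
    site moves rigidly with the molecule.
    """
    origin = weighted_sum(origin_w, parents)
    x_dir = weighted_sum(x_w, parents)
    y_dir = weighted_sum(y_w, parents)
    z_dir = cross(x_dir, y_dir)  # perpendicular to both directions
    y_dir = cross(z_dir, x_dir)  # re-orthogonalize y against x
    x_hat, y_hat, z_hat = (normalize(v) for v in (x_dir, y_dir, z_dir))
    # Offset the origin by local_pos expressed in the (x, y, z) frame.
    return tuple(origin[k] + local_pos[0] * x_hat[k]
                 + local_pos[1] * y_hat[k] + local_pos[2] * z_hat[k]
                 for k in range(3))


# Example: three parent atoms; the site sits 0.5 units out of their plane.
parents = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
site = local_coordinates_site(
    parents,
    origin_w=(1.0, 0.0, 0.0),   # origin on the first atom
    x_w=(-1.0, 1.0, 0.0),       # x along atom0 -> atom1
    y_w=(-1.0, 0.0, 1.0),       # y roughly along atom0 -> atom2
    local_pos=(0.0, 0.0, 0.5),  # offset along the local z axis
)
print(site)
```

Because the frame is rebuilt from the parent atoms at every step, the site follows the molecule as it rotates and translates, which is what lets an out-of-plane lone-pair site keep its geometry during MD.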
MS – If we agree that the goal of the study is to figure out which vsites SHOULD exist, then … We should also figure out what the numbers should be.
DC – We’ve experimented with replacing the LJ potential with a double-exponential (DExp) potential.
JH – Here are my fits with DExp. This is very preliminary, mostly a proof of concept to show that it could work.
(Figure: original Sage fit)

(Figure: densities, normal on left, DExp on right)
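
For readers unfamiliar with the two functional forms, the sketch below compares 12-6 LJ with one commonly used double-exponential form, U(r) = ε[β e^α/(α−β) · exp(−α r/r_m) − α e^β/(α−β) · exp(−β r/r_m)]. Both are written in terms of a well depth ε and a minimum-energy distance r_m; DExp adds two dimensionless exponents α > β > 0 that set the steepness of the repulsive and attractive walls. We are assuming this is the DExp form used in JH’s fits; the α and β values here are arbitrary placeholders, not fitted parameters. The extra α/β pair may be what MS’s later comment about going from 2 to 4 parameters refers to.

```python
import math


def lennard_jones(r, epsilon, r_min):
    """12-6 LJ written in terms of well depth and minimum-energy distance."""
    x = r_min / r
    return epsilon * (x**12 - 2.0 * x**6)


def double_exponential(r, epsilon, r_min, alpha, beta):
    """Double-exponential potential (assumed form); requires alpha > beta > 0.

    The prefactors are chosen so the minimum is exactly -epsilon at r_min.
    """
    repulsion = beta * math.exp(alpha) / (alpha - beta) * math.exp(-alpha * r / r_min)
    attraction = alpha * math.exp(beta) / (alpha - beta) * math.exp(-beta * r / r_min)
    return epsilon * (repulsion - attraction)


# Placeholder parameters, chosen only for illustration (not fitted values).
eps, r_min, alpha, beta = 0.1, 3.5, 16.5, 5.0  # kcal/mol, angstrom, -, -
for r in (3.0, 3.25, 3.5, 4.0, 5.0):
    print(f"r={r:4.2f}  LJ={lennard_jones(r, eps, r_min):8.4f}  "
          f"DExp={double_exponential(r, eps, r_min, alpha, beta):8.4f}")
```

Both curves share the same minimum (−ε at r_m), so the forms differ mainly in the shape of the walls; the exponential repulsion is what would change behavior in the compressed, high-pressure regime MS mentions.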

JH – Some work with absolv; not finished yet.
- MS – I know how to fix that; we can chat offline. I’ll send DC and JH a series of papers on it.
- DC – I’ll take a look at this before the ad board meeting
MS – A few thoughts:
- It would be nice to jump from 2 to 3 parameters instead of 2 to 4 parameters.
- We want to get to the point where we can do Bayesian inference on several different proposed functional forms (long term, ~2 years). If you’re working along these lines, it may be good to discuss this. But we basically need irrefutable evidence to convince people to switch functional forms.
- Where this matters is pressure dependence, since this changes how we model the repulsive component. So it matters for ions and high-pressure systems.
MS – My to-dos:
- Send papers to DC and JH.
- Begin long-term planning for working together on improved nonbonded modeling.