2022-08-31 BespokeFit Meeting notes

Participants

  • @David Mobley

  • @Joshua Horton

  • @Daniel Cole

  • @Pavan Behara

  • @Chapin Cavender

  • @Jeffrey Wagner

  • Venkata

  • @Matt Thompson

Discussion topics

DEXP results

  • https://docs.google.com/presentation/d/15v2kR8WX2UZKmT5nRy29tWzYSXh-OY-PuTRLT4q6ADo/edit#slide=id.p

  • (B68 water model optimization slide)

    • JW – It looks like the optimization got further from the QM. How did that happen?

    • DC – It was optimized to condensed phase properties, not QM.

  • Slide 8

    • CC – What does alpha-to-beta transition mean?

    • JH – In the DEXP functional form, the potential is zero when alpha and beta have the same value. I’m using lambda scaling to change the difference between alpha and beta from the reference difference (lambda=1, where alpha and beta have the same values as in the DEXP force field) to zero (lambda=0, where alpha and beta are equal).
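      • One possible reading of this scaling, assuming alpha is held at its reference value while the alpha-beta gap is scaled (the convention actually used in the fits may differ):

        \[
          \alpha(\lambda) = \alpha_0, \qquad
          \beta(\lambda)  = \alpha_0 + \lambda \, (\beta_0 - \alpha_0)
        \]

        so lambda = 1 recovers the reference DEXP values (alpha_0, beta_0) and lambda = 0 gives alpha = beta.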

  • DC – So in summary, we’re kinda disappointed in the inaccuracy of the HFEs using DEXP. The thinking is, when we turn to the training set, what are we training against? We’re training against pure liquid densities, mixture densities, etc… But none of those include the vacuum state. So we never worried about transferring molecules from vacuum to condensed phase. So Sage and its predecessors have been developed for decades to get HFEs right, whereas DEXP could be behind here.

    • DM – I’m actually concerned about the extent to which the Sage line was built to get HFEs right. That may be a mistake because it could force errors into other calcs.

    • DC – That makes sense, would recommend continuing and establishing whether this is an N=6 thing.

  • DC – While CC is here: We have this small molecule DEXP potential. We have a water DEXP potential. How far are we from a protein DEXP potential?

    • CC – I’m basically setting up ForceBalance fits to our training data. So you should be able to train something against the same training data. Right now I’m concerned about getting the fits set up/getting the SMIRKS right, and once that’s set up I can upload to a repo and share with Josh.

    • DC – Awesome. We’ll try to stay away and not step on your toes, but please let us know once things are ready.

    • CC – Great. I think it should be simple to swap out the functional form once the training is set up.
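      • A minimal sketch of inspecting the vdW SMIRKS typing with the OpenFF toolkit, for reference; the force field file named below is only an example, and the DEXP handler itself is not shown:

        # List the vdW SMIRKS patterns in a released SMIRNOFF force field. In principle
        # the same SMIRKS typing could carry over when the LJ parameters are swapped for
        # DEXP ones. The .offxml name is only an example (requires openff-forcefields).
        from openff.toolkit.typing.engines.smirnoff import ForceField

        force_field = ForceField("openff-2.0.0.offxml")
        vdw_handler = force_field.get_parameter_handler("vdW")

        for parameter in vdw_handler.parameters:
            # Released OpenFF force fields store each vdW parameter as a SMIRKS pattern
            # plus epsilon and rmin_half values.
            print(parameter.smirks, parameter.epsilon, parameter.rmin_half)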

  • DC – Anything else that we should maybe check re HFEs?

    • DM – Nothing is coming to mind, but I’m wondering whether there’s anything we can do to investigate whether using gas phase data forces errors into other properties. Maybe between the data Owen’s already generated and this data, this could lead to a paper. So, if we say that condensed phase properties are the “gold standard”, and we find that one kind of FF does worse on particular properties, that could be an interesting study/publication. For example, Schrodinger made certain choices about whether to include HFEs in their training.

    • DC – So, start training to HVap and see if it hurts condensed phase properties?

      • DM – Yes. If that was true for multiple functional forms, that would be quite conclusive.

    • DC – Another thing I hadn’t appreciated is how much compute power is needed to do the nonbonded training. In terms of “democratizing FF development”, should we look at “what is the minimal training set to do this right?”. Or is the answer likely just “you need 1000 GPUs”?

      • DM – It’d be super valuable to start answering those questions. But it’s not necessarily your job.

      • DM – PB, have we done condensed phase training locally yet?

      • PB – Not yet, I recall that SB basically always used Lilac. Maybe CC could ship his fits to multiple clusters and compare how they do.

      • JW – We’re having LW run CC’s calcs on Lilac, but I don’t think we should prioritize other sets while so much relies on Rosemary getting out. PRP is probably a bad choice due to pre-emptibility and high reservation levels. Oracle GPU compute seems unlikely to come through.

      • DM – Maybe run on UCI’s free queue?

      • PB – If JH sends me the inputs, I can run on UCI’s GPU free queue

      • JH – Is the idea to replicate the Sage fit? IIRC that was like 1000 GPUs for a week.

        • DM – I think so. We could try reducing the dataset.

      • PB – If all the inputs are ready, it’d just be setting up a conda env and submitting the SLURM jobs (see the sketch below).

      • DM – So yeah, if you can provide inputs we can see whether the free queue can get this done.

      • JH – Great, I’ll look at setting this up and passing it over.
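      • A much-simplified sketch of the submission workflow PB describes above; the env name, partition, walltime, and driver command are all placeholders to be swapped for whatever JH’s inputs actually need:

        # Write and submit a SLURM batch script that activates a conda env and runs a fit.
        # Everything named here (env, partition, resources, driver command) is a placeholder.
        import subprocess
        import textwrap
        from pathlib import Path

        job_script = textwrap.dedent(
            """\
            #!/bin/bash
            #SBATCH --job-name=dexp-nonbonded-fit
            #SBATCH --partition=free-gpu
            #SBATCH --gres=gpu:1
            #SBATCH --time=72:00:00

            source activate dexp-fit        # conda env with the fitting stack installed
            ForceBalance.py optimize.in     # placeholder driver command / input file
            """
        )

        Path("submit_fit.sh").write_text(job_script)
        subprocess.run(["sbatch", "submit_fit.sh"], check=True)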

    • DC – Ok, my plan is to start writing this fairly soon, hopefully with transfer free energy.

  • CC – It sounds like Owen’s work on surrogate modeling may help speed up the nonbonded fit. Though you may have to retrain some things to work with the DEXP form.

    • DC – Interesting. Thanks! It definitely seems like our fits are getting stuck in a local minimum around the initial parameters.

    • JH – We also had the idea of having different starting nonbonded parameter values derived from QM.

      • DC – That was an interesting idea but let’s not do that yet.

  • PB – I wanted to ask about the difference between YANK and AbSolv

    • DC + JH – There was a difference of about 1 kcal/mol.

    • PB – Was AbSolv also using OpenMM?

    • JH – If we do this again, I’ll use the same dataset as Sage.



  • DC – As a reminder, the BespokeFit paper has been put up for review/corrections. Be sure to review that ASAP if you were tagged!


 

 

Action items

Decisions