Do AM1BCC -> AM1BCC-ELF10 charges warrant a refit of LJ types?
SB – At last week’s biopolymer FF call, there was discussion about whether the FFs and toolkit should use AM1BCC ELF10. Previously, in the Toolkit (before version 0.7.0), applying Parsley to a small molecule would give you AM1BCC ELF10 charges if you had access to OpenEye, and single-conformer AM1BCC from AmberTools if not. A regression introduced in OFF Toolkit 0.7.0 removed this behavior. Then, in the project plan for Rosemary, it was determined that charges should be AM1BCC ELF10. There were implementation concerns about always using ELF10.
SB – But the proximal question was “If we use AM1BCC ELF10, do we need to refit the LJ?”. Previously we’d said “it’s Chapin’s choice, but it’s fine if we don’t refit LJ”.
MS – I’d argue that we do not do an LJ refit.
SB – It’s also important to compartmentalize “Chapin’s project” and recognize that it is different from “make Rosemary”. Chapin isn’t responsible for making the final production FF.
…
MG – The idea of using vsites would also necessitate an LJ refit anyway.
SB – That’s true. I’d advise that Chapin continue without an LJ refit, since there are a lot of degrees of freedom and they shouldn’t slow down his work.
DM – ELF10 is only important for big flexible molecules, and our LJ training set is all small, mostly rigid molecules, which wouldn’t show a difference between single-conf AM1 and ELF10.
SB – Agree. And I’d reiterate that CC and MG are the project owners, and they can stick to their written goals.
DM –
MG – Agree
CC – Agree. I think “just do a refit” is cheaper than “do a study to determine whether we should do a refit”.
MS – I advocate not doing an LJ refit.
Decision – Because Chapin’s work is focused on defining the PATHWAY for going from a small molecule FF to a protein FF, it’s not necessary to do an LJ refit after putting ELF10 charges in.
MG – Is there evidence that these parameters are actually causing a problem?
TG – …?
MG – Ok, that makes sense. I see why these parameters are being considered.
TG – I’m curious what the goalposts are to determine whether these parameters should be added to the FF.
JW – This question is probably largely up to SB.
DM – We had previously had the policy of “if it fixes the problem and doesn’t lead to an increase in overall benchmarking score elsewhere then it should go in the FF”
OM – Agree.
DM – I realize that we haven’t defined how much improvement is “enough” to justify a new release. As long as there’s statistically significant improvement on a test set.
OM – Some of the bayesian inference stuff looks at this, where there’s a tradeoff between how many parameters there are and the amount of improvement.
MG – How would we define “statistically significant”?
(General) – We’d want to measure accuracy in QM geometries and energies, and do bootstrapping to take different samples of the test set so that we get a value and error bars.
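(Aside, for clarity: a minimal sketch of the bootstrapping idea above, assuming a generic per-molecule error metric. The function and variable names are illustrative, not part of any OpenFF benchmarking tool.)
```python
# Illustrative only: bootstrap a per-molecule benchmark metric to get a mean
# value with error bars, as described above. `errors_ff_a` / `errors_ff_b`
# are hypothetical stand-ins for, e.g., RMSDs to QM geometries or
# relative-energy errors over the test set.
import numpy as np

def bootstrap_metric(values, n_resamples=1000, seed=0):
    """Resample the test set with replacement; return the mean metric and a
    95% confidence interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    means = np.array([
        rng.choice(values, size=len(values), replace=True).mean()
        for _ in range(n_resamples)
    ])
    return values.mean(), np.percentile(means, [2.5, 97.5])

# Placeholder data standing in for per-molecule errors from two FFs.
errors_ff_a = np.random.rand(500)
errors_ff_b = np.random.rand(500)
mean_a, ci_a = bootstrap_metric(errors_ff_a)
mean_b, ci_b = bootstrap_metric(errors_ff_b)
print(f"FF A: {mean_a:.3f} (95% CI {ci_a[0]:.3f}-{ci_a[1]:.3f})")
print(f"FF B: {mean_b:.3f} (95% CI {ci_b[0]:.3f}-{ci_b[1]:.3f})")
# Non-overlapping confidence intervals are one simple (if rough) reading of
# "statistically significant" improvement.
```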
MG – The test set will need to be “diverse enough”. But we don’t have a definition of what that means.
JW – I’d planned to work on a major refactor of the industry benchmarking tools to make them suitable for internal use and rapid iteration on FFs. Yesterday at the advisory board meeting, there seemed to be consensus that benchmarking isn’t as important as making vsites. If that should drive my priorities, we should focus on Interchange exporters that support vsites rather than on benchmarking tools.
DM – For general info, the advisory board basically said we were too hesitant about proving that we should add vsites.
DN – Will Sage have vsites?
DM –
MS – A functional form change should be “Rosemary”.
…
DM – (In the context of supporting exporters to different formats) We don’t need to support vsite export to charmm.
DN – Will rosemary have vsites?
SB – The current plan is “if vsites are ready”. We’d need to see statistically significant improvement before we implement that.
MS – There should be clear motivating rationale for the methods that we use.
15 min?
Virtual sites?
@Simon Boothroyd
Plan for virtual sites and electrostatics in Rosemary. A tentative plan is here: Electrostatics refitting plan for Rosemary. This is a summary of what I’ve heard, and needs updating from people actually doing the work.
SB – My general plan is, once the infrastructure is in place, to come up with a project plan about how we’re going to implement vsites. So I think major discussion should happen once I have a draft plan. But I have some things to discuss now.
MS – The plan above is a starting point from me, SB is in charge and can overrule any part of it.
SB –
Infrastructure needs.
OpenFF Recharge handles taking a set of molecules and getting ESPs from their wavefunctions, and has a variety of other utilities. This allows fitting of library charges, charge increments, and vsites (though it can’t optimize vsite GEOMETRIES, since those are nonlinear in optimization space).
ForceBalance can refit every parameter mentioned in the above bullet to physical data and QM/electrostatic data.
So the infrastructure is lined up for our needs.
Data:
I’ve got a large set of chemical fragments with diverse chemistries that should be suitable for training. I gave a talk about this, and there’s a video of that talk on the #charge-models channel. Ping me on Slack if there are any questions. There are 60k ESPs that we can train against. We can use these to train BCCs, vsites, …
For testing, we have the ~10k molecules from the industry benchmark set, and I’ve generated ELF10 conformers for them, optimized them, and computed wavefunctions. These are all on QCFractal. However, the process of getting ESPs from that set takes about 3-6 minutes per molecule. So that should be a good test set.
MG – Will the ESPs that you’re computing be stored somewhere?
DM – Zenodo is nice because it gets a DOI and is maintained by a grant-funded library system; it’s not good for pulling down bits and pieces, but it is free. So I advocate that as at least a “public-facing archival” solution.
SB – The file size is too large to store on QCArchive. For the fragment set I don’t think it’ll be too large, because the molecules are small. But the industry set will be large.
JW – Karmen recently set up an AWS account and we can make use of it if needed, but the bandwidth costs must be assessed.
Scientific studies
SB – Not a ton of planning on this; MS’s document above would be a good starting point for discussion.
SB – The first round of fits that we do should be limited to sp2 nitrogens, both in and out of rings. DCole suggests also doing sp3 nitrogens, like amides and amines.
SB – I was thinking we could do two fits to begin with:
Fitting a bunch of vsites on top of AM1BCC. We should be able to do that pretty quickly, and have some basic results for the next meeting.
Fitting both vsites and bccs simultaneously. This gets into high-dimensional space and that’s dangerous.
MS – What would you think about not allowing geometries to change in these optimizations?
SB – DCole and CBy had mentioned some suggestions about using restraints, which could help in this direction.
DC – About the restriction on “how far vsites can move from atoms”: what I’d recommend would be a physical check on how far AWAY they can get from their parent atoms. So we’ve got some simple rules for this, like vsites can’t be more than 0.8 Å from a nitrogen or 1 Å from an oxygen.
SB – That sounds good.
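(Aside: an illustrative sketch of the distance rule DC describes above. The 0.8 Å / 1 Å cutoffs come from the discussion; the helper itself is hypothetical, not part of any OpenFF package.)
```python
# Illustrative only: element-specific limits on how far a vsite may sit
# from its parent atom, per the rule of thumb mentioned above.
MAX_VSITE_DISTANCE = {"N": 0.8, "O": 1.0}  # angstrom

def vsite_distance_ok(parent_element: str, distance: float) -> bool:
    """True if the vsite lies within the allowed distance of its parent
    atom, or if no rule is defined for that element."""
    limit = MAX_VSITE_DISTANCE.get(parent_element)
    return limit is None or distance <= limit

assert vsite_distance_ok("N", 0.5)
assert not vsite_distance_ok("O", 1.2)
```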
MG – So there’s a mechanism to ensure that total charge is maintained as the vsite is added?
SB – Yes. The vsite must take its charge from the parent atoms.
MG – Just one parent atom?
SB – No, sometimes multiple, but currently I’d plan to just take the charge from one parent atom.
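(Aside: a minimal sketch of the charge-conservation bookkeeping just described, with the vsite charge taken from a single parent atom. Plain Python for illustration, not an OpenFF Recharge API.)
```python
# Illustrative only: adding a vsite conserves total molecular charge because
# the vsite charge is subtracted from its parent atom.
def add_vsite_charge(atom_charges, parent_index, vsite_charge):
    """Subtract the vsite charge from its parent atom and return the
    updated atom charges together with the vsite charge."""
    new_charges = list(atom_charges)
    new_charges[parent_index] -= vsite_charge
    return new_charges, vsite_charge

atom_charges = [-0.6, 0.3, 0.3]  # net charge 0.0
new_charges, q_vsite = add_vsite_charge(atom_charges, parent_index=0, vsite_charge=-0.2)
# Total charge (atoms + vsite) is unchanged.
assert abs(sum(new_charges) + q_vsite - sum(atom_charges)) < 1e-12
```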
SB – One potentially “controversial” decision that I’ve made is to use HF/6-31G*. I made this decision based on computational cost. I’d need to see an alternative method of similar cost with clear improvement to change this decision.
MG – What’s the role of the BCCs here? If we fit base charges to 6-31G* ESPs, then apply BCCs…?
SB – So, our training charges would come from HF/6-31G*. Then our method would take a cheap method (like AM1) and apply BCCs and vsites to match the HF/6-31G* ESPs.
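(Aside: a toy sketch of the fitting target described above, assuming per-atom charge corrections standing in for BCCs/vsite charges that are least-squares fit so a classical point-charge ESP matches a target QM ESP on a grid. Names and shapes are assumptions; the real workflow lives in OpenFF Recharge / ForceBalance, and this omits the total-charge constraint and unit handling.)
```python
# Illustrative only: fit per-atom corrections to cheap base charges (e.g.
# single-conformer AM1) so the classical ESP reproduces a target HF/6-31G*
# ESP at a set of grid points.
import numpy as np

def classical_esp(charges, atom_coords, grid):
    """ESP at each grid point from atomic point charges (arbitrary units)."""
    charges = np.asarray(charges)
    # Distances between every grid point and every atom: (n_grid, n_atoms).
    r = np.linalg.norm(grid[:, None, :] - atom_coords[None, :, :], axis=-1)
    return (charges[None, :] / r).sum(axis=-1)

def fit_corrections(base_charges, atom_coords, grid, target_esp):
    """Least-squares per-atom corrections to the base charges that best
    reproduce the target (QM) ESP."""
    r = np.linalg.norm(grid[:, None, :] - atom_coords[None, :, :], axis=-1)
    design = 1.0 / r  # derivative of the ESP with respect to each charge
    residual = target_esp - classical_esp(base_charges, atom_coords, grid)
    corrections, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return corrections
```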
…
JW – In the process of switching the interchange code to the toolkit, I found two bugs in the vsites implementation; we should resolve those first before generating a ton of data or doing significant work. One bug is where a vsite is applied backwards; the second is regarding the hierarchy of vsites.
SB – Let’s sync up later.
DM – One logistical question that came up the other day is “who is driving vsites in a production FF” and/or “who is in charge of Rosemary”, and the answer seems to be “Simon”. So it’ll be really good to have as much written down as possible to ensure that Lily can take over the transition losslessly
SB – That’s good … May
CC – Earlier we had said that we want vsites in our FFs, even if we lose CHARMM. Is that decided to be the tradeoff? Does this seem to overweight the needs of our advisory board over the general community?
DM – This is kind of an “if you build it, they will come” situation. We expect that GROMACS and AMBER will support our vsites. We need to have a compelling reason to encourage CHARMM devs to add vsite support.
MT – I agree with that logic. I think AMBER and GROMACS do indeed support our vsite styles (though AMBER does not have a consistent spec for vsites, so they need to improve that, and we can’t improve it for them).
DM – Yes, we need to encourage the field to implement support or define their spec.
SB – Should we email AMBER to tell them that we’re planning to do vsites in our particular styles, and that we’d like to get that supported?
(General) – yes
DM – I’ll email Dave Case and Daniel Roe, and initiate a conversation with JW.
MG – It’d be good to make FFs both with and without vsites, and measure them against each other to determine how much improvement we get.
DM – That would be cool, and we should do it if we want that information for a study/paper, but we’ve basically been told that people want vsites in our FF whether or not we do this study.
DN – Would we consider this being “conservative” or “cautious”?
DM – I think industry people have frequently had problems with academic collaborations, and they want to encourage us to be less cautious and work faster.
DN – Were vsites in the plan for Rosemary?
DM –
JW – There are some questions, but one important component of the course of action is that “Chapin will not put vsites in his work”.
DM –
DC – Operationally, vsites have ended up requiring a LOT more work in our experiments. So if there are already other unknowns with Rosemary, recognize that adding vsites will add a lot of complexity on top of that.
DM – Maybe DC’s learnings from that process will make it simpler for us. So we haven’t decided to put vsites in Rosemary in this meeting.
SB – Following the WBO torsions episode with the Sage release, I think we should not guarantee vsites in Rosemary. That doesn’t mean that we say that they WON’T be there, but we reserve the right to make the decision at a later time once more is known.
DM – Totally agree.
JW – Agree with SB
MG – Did we end up at the same place as before the ad board meeting?
JW + DM – No, we’re going to prioritize this work, which we weren’t going to do before.
DN – We may want to limit/push back on ad board requests. This could easily become disruptive of ongoing work.
SB – Agree. As a historical note, after the Sage release, I wanted to finish three things before I left OpenFF. Those were bespokefit, vsites, and graph charges. So this isn’t a major redirection, and my work in the previous months was already going in this direction.
DM – The ad board advice was valuable because it turned out that I had a very different view of what they wanted than what they really wanted.
DN – Ok, I’ll take that as the new direction. But I don’t see that a process happened for us to change our priorities in this case. In the future I’d really like there to be a process that happens when we shift org-wide priorities like this.