2021-08-05 Force Field Release meeting notes

Date

Aug 3, 2021

Participants

  • @Pavan Behara

  • @Simon Boothroyd

  • @David Mobley

  • @Daniel Cole

  • @Jeffrey Wagner

  • @Christopher Bayly

  • @bill swope

  • @Hyesu Jang

Goals

Discussion topics

Time

Item

Presenter

Notes

 

 

 

  • CBy – DC, would you use the modified Seminario method for torsions?

    • DC – No, we use it for other terms. We would use a different method for torsion assignment.

  • CBy – I’m interested in seeing how the modified Seminario method can come up with estimates of what a parameter should be, without needing prior parameter definitions. What if we did this and tried to find sub-populations of parameters, instead of trying to find residuals of parameters? For example, take a particular parameter, take all the modified Seminario estimates for what that parameter should be, and see if the values fall into particular bins.

    • SB – That would be super cool. I think JHorton started doing this, to see if we’d see multimodal distributions of parameter values. Did more come of this?

    • DC – I don’t think so. We got as far as a big plot shown previously that showed distributions and the average value. I’m not sure what the next step would be.

    • CBy – The idea would be to fit molecules using the modified Seminario method, and then compare the values it assigns across all the places where that parameter would be applied, as defined by Sage or something. So we’d see a plot of “bond lengths where b49 would be applied, compared to the single value Sage would have assigned”.

    • DC – Cool idea, but I’m not sure how to automate that.

    • CBy – TGokey could take a look at this.

    • DM – I’ve also considered looking at this as a “Gaussian mixture problem”. So if we see a bi- or tri-modal distribution of parameter values, we can imagine that we’re looking at a Gaussian mixture, and figure out which chemistry underlies the different peaks/normal distributions. The problem here is that the overlap regions would be considerable, so we’d need a plan for separating the chemistries in those overlap regions.

    • DC – Our forays into Espaloma are kind of trying to address this problem as well.

    • DC – What we’d want to do to start this off would be to find a type/chemistry that we expect to show these multimodal distributions. I’m envisioning a paper in this direction, where a simple use case is “taking the average and using that in the FF”, and the complex case is “using Espaloma and avoiding typing altogether”.

    • CBy – Toy system could be the bond between two sp2 carbons, which should range from single to double bonds, with all the aromaticity intermediates in between.

    • DM – There’s a plot in the Parsley preprint (figure 10, by Trevor) on “why b7 should be split”, showing the different environments for carbon-nitrogen bonds.

    • DC – I don’t see anything obvious in our initial look at that, but I’ll make a histogram and see if we see this sort of distribution.
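The binning and Gaussian-mixture ideas above could be prototyped roughly as follows. This is only a sketch using the Python standard library: the function names are hypothetical, the two-component EM fit is a generic textbook approach rather than anything the group has settled on, and the input values are synthetic stand-ins (two clusters near 1.34 Å and 1.47 Å), not real modified Seminario output.

```python
import math
import random
from collections import Counter


def bin_estimates(values, bin_width):
    """Count how many per-molecule estimates fall into each fixed-width bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    # Key each bin by its lower edge.
    return {round(idx * bin_width, 6): n for idx, n in sorted(counts.items())}


def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture with plain EM.

    Returns [(mean, sigma, weight), ...] sorted by mean.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # start the two components at the extremes
    sigmas = [(hi - lo) / 4 or 1.0] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate mean, width and weight of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - ri for ri in resp]
            nk = sum(r)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            means[k] = sum(ri * x for ri, x in zip(r, values)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-4)
            weights[k] = nk / len(values)
    return sorted(zip(means, sigmas, weights))


# Synthetic stand-in for "all estimates of one bond parameter" (angstrom).
rng = random.Random(0)
estimates = [rng.gauss(1.34, 0.01) for _ in range(200)]
estimates += [rng.gauss(1.47, 0.01) for _ in range(200)]

print(bin_estimates(estimates, 0.02))
for mean, sigma, weight in fit_two_gaussians(estimates):
    print(f"mean={mean:.3f} sigma={sigma:.3f} weight={weight:.2f}")
```

With well-separated peaks like these the two components are easy to recover; the overlap problem DM raises would show up as broad, strongly overlapping components, where the fit alone does not say which chemistry belongs to which peak.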

15 mins

Re-number force field terms in SAGE?

@Simon Boothroyd

  • SB – Does renaming parameter IDs cause user confusion? It is helpful for chemical perception.

  • DM – It would cause some user confusion for things to be out of order, and the current state has numbers appended with letters. TG got burned by this (assuming that parameters were ordered)

  • CBy – There’s been value for us in identifying these by ID. So, if we have dictionaries between releases we could work around differences. But one thing that we DO use is the similarity in name (eg t49a is like t49b), so I would like to preserve that, or have a way to back-map to it if possible.

  • DM – We’ve thought about encoding hierarchy explicitly in SMIRNOFF. Generally, the a’s and b’s don’t necessarily indicate hierarchy.

  • CBy – SMIRKS as identifier wouldn’t be human-readable. What if, for example, for torsions, we used SMIRKS around central bond?

  • SB – That’s kinda the problem – There are two main cases: we need an ID for the computer, and we also need an ID for a human to look at (SMIRKS is bad for the latter). My proposal is mainly regarding the hash of a parameter: ideally the OFF Toolkit could expose a parameter hash that’s a function of its physical values and the SMIRNOFF section it’s in (which may contain its own settings).

  • SB – One problem that we’re hitting now is that the spec doesn’t enforce the uniqueness of IDs. For example in the Sage release candidate, the parameters for TIP3P water actually collide with main-line FF parameters.

  • JW – In addition to the physical values and SMIRKS, the ID might be indicative of hierarchy.

  • SB – I’ll clarify exactly what I mean by a “parameter key”. In FB it’s common to say “I want to use a particular prior for this one parameter”, and it takes the form of Bonds/SMIRKS or id/k. So I could use more guarantees about the uniqueness of that central ID or SMIRKS. So we have two needs: unique keys for machine lookup, and unique keys for human use, and they may not need to be the same.

  • SB – I could see a problem coming up with changing the system where we talk about a problematic parameter in Sage, but the equivalent parameter in Parsley has a different ID.

  • PB – I’d prefer to keep the same parameter IDs. I think it’ll be useful to plan an ID break when we do the biopolymer FF.

  • SB – As a starting point for iteration, what about a hypothetical openff-1.3.2 that has the new parameter IDs? Then it could be used as a lookup key.

  • JW – To some extent, this is going to set a precedent for how we renumber things and leave breadcrumbs for people looking to relate before- and after- parameters. What about in the future, if a new parameter DOESN’T map to an old one?

  • CBy – It does seem like a major version bump is the right time to change IDs if we’re going to change them. A major version bump should be seen as a “derivative” of the previous major version, whereas a minor version bump should be seen as more incremental.

  • CBy – I think we’ve given in to the temptation of referring to parameters by ID. Maybe we need to change our conventions so that we refer to parameters by SMARTS rather than by specific IDs.

  • SB – Agree

  • (Decision) – We’ll renumber/rename Sage parameter IDs, and leave a sort of dictionary structure to map from old to new parameter IDs.
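To make the three ideas in this item concrete (a content-based parameter hash, a check for colliding IDs, and an old-to-new ID dictionary), here is a minimal standard-library sketch. Everything in it is an assumption for illustration: the record layout, the function names, the IDs, and the numeric values are hypothetical, and this is not the OpenFF Toolkit API or the actual Sage parameter set. Matching old and new parameters by (section, SMIRKS) is also just one possible choice.

```python
import hashlib
import json
from collections import Counter


def parameter_hash(section, smirks, values):
    """Machine-facing key derived from the section, SMIRKS and physical values."""
    payload = json.dumps(
        {"section": section, "smirks": smirks, "values": values},
        sort_keys=True,  # make the hash independent of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def duplicate_ids(parameters):
    """Return the IDs that appear more than once across all parameters."""
    counts = Counter(p["id"] for p in parameters)
    return sorted(pid for pid, n in counts.items() if n > 1)


def build_id_map(old_params, new_params):
    """Map each old ID to the new ID with the same (section, SMIRKS), or None."""
    new_by_key = {(p["section"], p["smirks"]): p["id"] for p in new_params}
    return {
        p["id"]: new_by_key.get((p["section"], p["smirks"])) for p in old_params
    }


# Hypothetical records; IDs and numbers are placeholders, not real FF values.
old = [
    {"id": "b1", "section": "Bonds", "smirks": "[#6X4:1]-[#6X4:2]",
     "values": {"k": 500.0, "length": 1.52}},
    {"id": "b1", "section": "Bonds", "smirks": "[#1:1]-[#8X2H2+0:2]-[#1]",
     "values": {"k": 1000.0, "length": 0.96}},  # water term reusing an ID
    {"id": "b7a", "section": "Bonds", "smirks": "[#6:1]-[#7:2]",
     "values": {"k": 700.0, "length": 1.40}},
]
new = [
    {"id": "b1", "section": "Bonds", "smirks": "[#6X4:1]-[#6X4:2]",
     "values": {"k": 500.0, "length": 1.52}},
    {"id": "b8", "section": "Bonds", "smirks": "[#6:1]-[#7:2]",
     "values": {"k": 700.0, "length": 1.40}},
]

print(duplicate_ids(old))      # IDs that collide in the old set
print(build_id_map(old, new))  # old ID -> new ID (None if no counterpart)
print(parameter_hash(old[0]["section"], old[0]["smirks"], old[0]["values"]))
```

Note that a map keyed on old IDs silently loses entries when IDs collide, which is one more argument for enforcing uniqueness before building the dictionary. A value of None marks an old parameter with no counterpart, which is the breadcrumb case JW asks about.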

  • SB – Process for renumbering/renaming parameters in Sage?

    • JW – We should make an rc2 release of the openff-sage repo, to correct my mistake. Then we should make an identical “full release” of openff-sage, followed by a “full release” in openff-forcefields, followed by package updates on conda-forge.

 

 

 

  • SB – What are the goalposts for sending Interchange in OpenEye? Would you want to see energy equivalence for a big box of molecules?

    • CBy – T4 lysozyme in water with a ligand. Can it make the parameterized object that we can make into an OpenMM System?

    • CBy – The thrombin in the JACS-15 dataset, 2zff, has a sulfur PTM. I’d like to see a pipeline to parameterize this, and see that the energy via ParmEd matches the energy from Interchange.

    • JW – I can’t put a timeline on the second one – it requires a lot of big deliverables that aren’t in the first release.

    • CBy – Maybe we want to look at Interchange adoption in a staged way, with simple protein-ligand stuff initially, and then keep ParmEd for advanced use cases.

    • SB – We could start building towards these, and prioritize a to_parmed method. The important thing here is that we can prioritize things based on your goals.

    • JW – Should TGokey be testing Interchange?

    • SB – What’s the goal? Energy equivalence? Usability and information content? The latter is trickier.

    • JW – Probably the latter

    • SB – I think my work on smirnoffee kinda did this.

    • JW – Maybe a hackathon this year focused on playing with Interchange?

  • JW – Current plans are to have an Interchange <--> ParmEd import/export route, but to deprecate it ASAP.

  • CBy – The Interchange <--> ParmEd route would be useful to keep around. ANY organization that currently uses ParmEd will want this interconversion until Interchange is fully functional/validated/stable.

  • CBy – Our current usage of ParmEd is in the openmm_orion repo, public on GH. We’d like to start replacing these usages of ParmEd with Interchange, maybe gradually if necessary, but to eventually remove it altogether.

  • CBy – There’s a “floe” called something like “proteinligandMD”. When that can run using Interchange instead of ParmEd, then we’ll be over the finish line.
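For the "energy via ParmEd matches the energy from Interchange" goalpost, the comparison step itself could look something like the sketch below. It assumes the per-term energies have already been computed by each route and collected into plain dictionaries; how those numbers are obtained is outside this sketch. The function name, term names, tolerance, and values are all hypothetical placeholders.

```python
import math


def compare_energies(reference, candidate, abs_tol=1e-3):
    """Return {term: (reference, candidate)} for every term that disagrees.

    A term disagrees if it is missing from either side or if the two
    energies differ by more than abs_tol (same units on both sides).
    """
    mismatches = {}
    for term in sorted(set(reference) | set(candidate)):
        ref = reference.get(term)
        new = candidate.get(term)
        if ref is None or new is None:
            mismatches[term] = (ref, new)
        elif not math.isclose(ref, new, rel_tol=0.0, abs_tol=abs_tol):
            mismatches[term] = (ref, new)
    return mismatches


# Placeholder per-term energies (arbitrary units), not real results.
via_parmed = {"Bond": 12.3456, "Angle": 45.6789, "Torsion": 7.8901}
via_interchange = {"Bond": 12.3457, "Angle": 45.6789, "Torsion": 7.9901}

print(compare_energies(via_parmed, via_interchange))
```

Comparing term by term rather than only the total makes it easier to see which part of the parameterization differs when the totals disagree.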

Action items

Decisions