BESMARTS and the OpenFF Roadmap

Participants

  • @James Eastwood

  • @David Mobley

  • @Lily Wang

  • @Trevor Gokey

Goals

  • Determine whether BESMARTS can automate some or all of chemical perception for force field development: “infer [typing for] a FF from scratch”, even if it covers only part of the FF (e.g. just bonds and angles)

  • Assess how the science and engineering work fits the timeframe of Trevor’s education and career plans

Discussion topics

Item

Presenter

Notes

How well do we expect BESMARTS to work for this?


  • TG: Has done some of this before

  • TG: The fitting code is old and needs to be refactored, but the splitting part is at a decent stage

  • TG: Difficulty depends on what level we want to target, e.g. just bonds, just angles, just torsions, etc. Bonds and angles are pretty straightforward once the engineering aspects are in place.

  • DM: What do you think is realistically achievable, in a scientific sense, in a few months? Presumably, e.g., splitting on vdW would be complex because it requires fitting to physical properties

    • TG: Has never pulled the lever on the 3-4 week fit needed to start a FF from scratch. If the engineering is done, a full FF, including torsions, could be fit in 3-4 weeks, depending of course on the size of the molecule set; we could get further/faster by starting from Sage.

  • JE: How much engineering is involved? Or who could estimate it, and what other information is needed to estimate it?

    • TG: The path he’s on right now doesn’t use the toolkit; hooking in the toolkit might make it faster. BUT the hard part is acting as puppet master for the ForceBalance (FB) runs, because so many have to run at the same time. There are ~10k lines of code relating to this and it is old, so it might be easier to just rewrite it from scratch.

  • DM: How much of the engineering work do you expect to do yourself before graduation?

    • TG: I’d like to get this done as a proof of concept of building a transferable FF from scratch, but may not implement everything, e.g. TorsionDrives

    • DM+TG: But that is perhaps lower priority

  • DM/LW: Suggested an MVP of just bonds and angles. But torsions are where we need help the most, so torsions would be really nice.

    • TG: Also, angles may be overfit to the torsions in Sage. We could fit just the impropers, since there are few of them and they could provide significant gains.

  • JE: How does this scale with the size of the molecule set?

    • TG: Multiply a typical FB fitting time by 100-300, because it evaluates many different FF optimizations. For torsions, maybe higher.

    • JE: Is that a linear factor with respect to the size of the molecule set?

    • TG: The scaling comes from the diversity of parameter matches

  • TG: The first priority is to build up the capability for bonds and angles.

    • JE: How much of this can be reused for torsions?

    • TG: In the original implementation, there was a lot of overlap

  • TG: When fitting bonds, they’re very sensitive to reference forces. In general we haven’t really focused on force targets, just optimized geometries etc.

  • DM: Can OpenFF help with the engineering side?

    • TG: Two directions:

      • 1. Designing a dataset to use for fitting parameters from scratch; the current set probably has some redundancy, so it can be reduced

      • 2. Writing some quick, handy functions to convert between OpenFF formats and BESMARTS formats may be useful

        • e.g. writing code to interoperate with OpenEye SMARTS


Where do OpenFF plans and TG’s plans intersect?


  • LW: If we adopt this, it would be as an experimental package owned by the science team for at least a year (similar to NAGL). We would only hand it off to Infrastructure after research was making good progress.

  • LW: Should we use your code, or go our own way to play with BESMARTS?

    • TG: I’m not going to depend on the toolkit. If you want to use it, you probably need a separate codepath.

    • TG: I’m not against writing interfaces / adapters, I just don’t want the toolkit in my core functionality.

  • TG: What would be the intended dataset? Sulfonamides? Maybe we can start small and split some targeted parameters. If you decide what set this is, I can use it in my PhD research.

    • LW: It depends on the size of dataset you are after. Sulfonamide angles or other angles involving sulfur could be interesting.

  • LW: What would be most helpful to you?

    • TG: Inputs and outputs (datasets and benchmarks)

  • DM: What size of molecule set would you be comfortable fitting to?

    • TG: If you trust the process and wait 3 weeks, it could be 1,000-3,000 molecules. Up to 300 would be quite quick and would allow us to iterate.

    • LW: It would be interesting to find out the smallest set that gives us good results

    • LW: Let’s start with sulfur angles. What format do you want the molecules in? SDF?

    • TG: I don’t have a preferred format. SDF is ok.

  • TG: You could use my code by calling out to BESMARTS to generate parameters for you, instead of letting it orchestrate all the fitting.

    • JE: I thought that was what we would be talking about.

    • LW: Doing it that way would put less maintenance burden on you.

    • TG: Yes! But then the question is who writes the interface code.


Action items

Decisions