How well do we expect BESMARTS to work for this? | | TG: Has done some of this before.
TG: Fitting is based on old code that needs to be refactored, but the splitting part is at a decent stage.
TG: Difficulty depends on what level we want to work at, e.g. just bonds, just angles, just torsions, etc. Bonds and angles are pretty straightforward once the engineering aspects are in place.
DM: What do you think is realistic or achievable in a scientific sense in a few months? Assuming, e.g., that splitting on vdW would be complex because of fitting to physical properties.
TG: Has never pulled the lever on a 3-4 week fit, which is what starting a FF from scratch would involve. If the engineering is done, a full FF could be fit, including torsions, in 3-4 weeks, depending of course on the size of the molecule set; could get further/faster if starting from Sage.
JE: Engineering: how much is involved? Or who could estimate it, and what other info is needed to estimate it?
TG: The path he’s on right now doesn’t use the toolkit; hooking in the toolkit might make it faster. BUT the hard part is being the puppet master of the FB runs, because you have to do so many at the same time. There are 10k lines of code relating to this and it is old, so it might be easier to just do it from scratch.
DM: How much of the engineering work do you expect to do yourself before graduation?
TG: I’d like to get this done as a proof of concept of building a transferable FF from scratch, but may not implement everything, e.g. TorsionDrives.
DM+TG: But perhaps lower priority.
DM/LW: Suggested MVP is just bonds and angles. But torsions are where we need the most help. Torsions would be really nice.
JE: How does this scale with the size of the molecule set?
TG: Multiply general FB fitting time by 100-300, because it’s evaluating many different FF optimizations. For torsions, maybe higher.
JE: Is that a linear factor wrt the molecule set?
TG: Scaling comes from the diversity of parameter matches.
TG: First priority is to build up capability for bonds and angles.
TG: When fitting bonds, they’re very sensitive to reference forces. In general we haven’t really focused on force targets, just optimized geometries etc.
DM: Can OpenFF help with the engineering side?
TG: Two directions:
1. Designing the dataset to use for fitting parameters from scratch. The current set probably has a bit of redundancy, so it can be reduced.
2. Writing some quick, handy functions to go between OpenFF formats and besmarts formats may be useful.
|
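Note on the scaling estimate: TG's "multiply general FB fitting time by 100-300" can be turned into a rough wall-clock range. The sketch below is only back-of-envelope arithmetic. The function name, the baseline fit time, and the `n_parallel` argument (which assumes perfectly parallel fits) are illustrative assumptions and were not discussed in the meeting. Per TG, the multiplier for torsions may be higher than 300.

```python
def besmarts_fit_days(baseline_fb_hours, factor_low=100, factor_high=300, n_parallel=1):
    """Return a (low, high) wall-clock estimate in days for a BESMARTS-driven fit.

    baseline_fb_hours: wall time of one ordinary ForceBalance fit (assumed input).
    factor_low, factor_high: the 100-300x multiplier quoted in the notes.
    n_parallel: hypothetical number of fits that can run at the same time,
        assuming ideal parallel speed-up (an optimistic simplification).
    """
    if baseline_fb_hours <= 0 or n_parallel < 1:
        raise ValueError("baseline_fb_hours must be > 0 and n_parallel >= 1")
    low = baseline_fb_hours * factor_low / n_parallel / 24
    high = baseline_fb_hours * factor_high / n_parallel / 24
    return low, high


# Example with an arbitrary 12-hour baseline fit, run serially:
print(besmarts_fit_days(12))  # (50.0, 150.0)
```

The range depends on the diversity of parameter matches rather than purely on molecule count, so the baseline should be measured on a set similar to the one being fit.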
Where do OpenFF plans and TG’s plans intersect? | | LW: If we adopt this, it would be as an experimental package owned by the science team for at least a year (similar to NAGL). We would only hand it off to Infrastructure after research was making good progress.
LW: Should we use your code, or go our own way to play with BESMARTS?
TG: I’m not going to depend on the toolkit. If you want to use it, you probably need a separate codepath.
TG: I’m not against writing interfaces/adapters, I just don’t want the toolkit in my core functionality.
TG: What would be the intended dataset? Sulfonamides? Maybe we can start small and split some targeted parameters. If you decide what set this is, I can use it in my PhD research.
LW: What would be most helpful to you?
DM: What size set of molecules would you be comfortable fitting to?
TG: If you trust the process and wait 3 weeks, it could be 1-3 thousand molecules. Up to 300 would be quite quick and allow us to iterate.
LW: It would be interesting to find out what the smallest set is that would give us good results.
LW: Let’s start with sulfur angles. What format do you want the molecules in? SDF?
TG: I don’t have a preferred format. SDF is ok.
TG: You could use my code by calling out to BESMARTS to generate parameters for you, instead of letting it orchestrate all the fitting.
JE: I thought that is what we would talk about.
LW: Doing it that way would put less maintenance burden on you.
TG: Yes! But then the question is who writes the interface code.
|
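Note on the interface option: the arrangement TG describes, where the caller owns the fitting loop and only asks BESMARTS for candidate parameters, could look roughly like the sketch below. Everything here is hypothetical: `ParameterProposer`, `ToyProposer`, `choose_split`, and the canned SMARTS are illustrative stand-ins, not BESMARTS or OpenFF Toolkit APIs, and `objective` stands in for a real fit.

```python
from typing import Callable, Protocol


class ParameterProposer(Protocol):
    """Hypothetical interface: suggests child SMARTS for a parent parameter."""

    def propose_splits(self, parent_smarts: str, molecules: list[str]) -> list[str]: ...


class ToyProposer:
    """Stand-in for a BESMARTS-backed adapter; returns canned candidates."""

    def __init__(self, candidates: dict[str, list[str]]) -> None:
        self._candidates = candidates

    def propose_splits(self, parent_smarts: str, molecules: list[str]) -> list[str]:
        return list(self._candidates.get(parent_smarts, []))


def choose_split(
    proposer: ParameterProposer,
    parent_smarts: str,
    molecules: list[str],
    objective: Callable[[str], float],
) -> tuple[str, float]:
    """Caller-owned loop: score each proposed split and keep the best one.

    `objective` returns a value to minimise. The parent is returned
    unchanged if no candidate scores lower than it does.
    """
    best, best_score = parent_smarts, objective(parent_smarts)
    for candidate in proposer.propose_splits(parent_smarts, molecules):
        score = objective(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

With this split of responsibilities, the proposer side has no dependency on the toolkit, and the open question from the meeting (who writes the interface code) reduces to who implements the adapter behind `propose_splits`.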