2020-03-23 Roadmap meeting notes

Date

Mar 23, 2020

Participants

@David Mobley
@John Chodera
@Lee-Ping Wang
@Michael Gilson
@Michael Shirts

@Daniel Smith (Deactivated)

@Jeffrey Wagner
@Karmen Condic-Jurkic
@Matt Thompson
@David Cerutti (Deactivated)
@Simon Boothroyd

Concept

Discuss the ideas and problems at a high level and treat possible solutions as a black box. The goal is to define the most important features, inputs and outputs, restrictions, timelines. The exact solutions (product development plans) are to be proposed and devised by smaller designated teams.

Goals

Roadmap planning

Define the main outcomes for the next year for science and infrastructure:
- The number of major force fields to be released (for small molecules and biomolecules) and the most important features defining each release (release vs. generation);
- The most important infrastructure components to be developed in the next year and desirable features (data management);
Key science issues that need to be addressed to support future plans
- These need to proceed independently of major release plans
- Science to be incorporated into release plans when ready
What will NOT be included in the next FF releases and infrastructure support
Other important scientific efforts and outcomes (papers, collaborations, data collection, etc)
Community building – OpenFF affiliates?

Science Roadmap 2020

Additional materials

Discussion topics

Item	Notes

Project management	JW – sounds really good
SB – takes a bit of time, but it’s minimal overhead and very useful in the long run – can shoot the page
DLM – This is nice from a management perspective
KCJ – It’s also nice and searchable. Keeps everything documented in one place
KCJ – Plans for papers should begin with an outline/proposal, where the author list is finalized
DLM – Make sure to talk to John about this
MS – Agree with KCJ’s proposal. But the current system has too many steps. There should be three max.
KCJ – Agree that there are too many steps, will take this into consideration for new line
(General) – We agree that the paper writing procedure takes too long, and we should streamline it.
Aspirational planning	MKG – I’m seeing our current path as extending FF in the infra and data on-hand that we need, so that it’s easy to fit new FFs of differing functional form to existing data, or vice versa. Democratizing science around FFs. So I want BCCs, off-site charges, polarizability, and make it easy to mix and match those with datasets, and crank out FFs. Then we can experiment and learn about how these factors combine into speed/accuracy tradeoffs.
DC – The value of what David is talking about is the presence of datasets from which we can just pull values; that would be extremely valuable. Always thought that having a library would be extremely useful, even if it means picking a single molecular property. Datasets include sensible, clean molecule sets that cover chemical space. There is one general parameter set that is for all intents and purposes good enough, and adding more data and more parameters just doesn’t help. My experience is that common parameters DO work and ARE possible (eg protein backbone torsions+charges are nearly identical). Highly flexible atom typing will let us cover a much larger part of chemical space, and do so well.
MKG – Concurs with everything said. Systemization of parameterization is central, so we don’t do things that don’t make a difference. We don’t want to get lost in parameter space, wants to improve optimization process. (Concerned about the size of parameter space, and how we may easily be fooled into falling into local minima)
LPW – Largely agree with everything said so far. I would like to see development of a unified and systematic approach that works for small molecules and biomolecules, systematic optimization and automated generation of new parameters, some real innovation in functional forms. This comes to mind right now. Wants to see automated generation of NEW parameters. Optimization of parameters based on WBO, material improvements to electrostatic models, etc.
MS – Automated benchmarking is huge, emphasizing a data-driven approach. Molecules are molecules – small molecules, lipids, sugars, proteins… This effort makes it possible to do.
JW – Excited to build new things and automate everything.
SB – Also excited about benchmarking and the data-driven approach, including systematic benchmarking.
DLM – automated benchmarking is a key part of getting where we need to be. Then we can really determine what is “better” on even footing.
KCJ – Do we want to commission expts?
MS – We probably can’t decide that now. But we can let subproject owners identify whether it would be useful to commission new experiments.
JDC – Our goals need to be modest on the “commissioning expts” front, given that NIST has a LARGE budget to do this, and they’re struggling. H-G expts from Gilson lab will help, my lab can provide densities. But collabs with eg. Rafael Wafweiler can provide NMR and x-ray
DLM – Not sure that ThermoML has continuous coverage for everything we need – It may have gaps of coverage
MS –
Force fields	KCJ – Define the main features of FFs and stick with them; they can be changed if everyone agrees.
JDC – What’s more important – Releasing on the DATE we want, or release with the FEATURES we want?
MS – Depends on what will make our funders happiest
JDC – releasing on time would make funders happy rather than adding more features
KCJ & MS – disagree
DLM – Remember that we’re doing product development, on top of science. The science can’t be done on a schedule. The FFs can. The infrastructure is in between.
JDC – So maybe we should do FF releases on a DATE, with whatever features are available
KCJ – Disagree – We should release FFs guaranteed to have certain FEATURES. It’s disruptive for partners to update to a new FF, so we should make sure there’s a clear value-add each time
DLM – This gets to “what’s a release vs. a generation” of a FF. A “release” may include a bit more training or a fix for an observed issue, whereas a “generation” is a change in the science.
JDC – This seems like the best of both worlds.
KCJ – Naming scheme is complicated, and is throwing people off. So let’s be very deliberate about this.
DLM – Maybe we should get rid of the word “release” entirely. “Generation” is a good name for significant changes. Each “generation” will have a new herb name. X only increments if there’s a change in functional form.
(General) – Generation changes should be considered “major releases”
`openforcefields` repo – “FF line → FF generation”
MT – Do we expect to maintain previous herb releases and make point releases of them?
DLM – No
What about people outside OpenFF making and naming new FFs?
(General) disagreement
(General) – Agreed – Sage will be openff-2.0.0
JDC – Short term plan for biomolecular FFs is to just pull in an AMBER FF. Medium term plan is to build out the infrastructure to parameterize a biopolymer, like graph charges. Long term is to pull in NMR and other data to actually optimize the parameters.
KCJ – Let’s focus on naming for now.
JDC – The new FFs will be exactly AMBER – Just OFFXML representations of the exact same parameters.
KCJ – But what will we call them?
General (DM) – We’ll just call it `amber14SB.offxml` and then they will become a part of `openff-X.Y.Z` with documentation describing the details.
DC – metadata with provenance of parameter changing?
SB – Could the toolkit just do a `diff` on parameters and give some more info without bloating FFs with a lot of metadata?
MW – what do people want out of data? In my past project, we had something similar.
(General) – Punt on this topic. We could have the FF person think about this and report what infrastructure would be useful.
Feature specification for next FF generations	DM – suggestions for the next FF generations:
- LJ refit
- WBO interpolation
- BCC refitting
- Protein FF
- More distant future: Conf-independent charge method
DM – dividers across the next two generations
JDC – more torsion data in QCArchive to do a good job with WBO interpolation
DM – we need someone who can take over Chaya’s work – Need to task someone with making FF with few explicit bond orders
JDC – LJ stuff seems ready, but we have a data selection problem and how to constrain LJ parameters not to go too crazy. Simon?
SB – LJ should be able to come out in the next release. We should add some water properties to make sure we stay compatible
JDC – Can we add aqueous measurements of sidechain analogues and other biopolymer-like molecules?
MKG – I would vote for this, it would reassure people. It’s easy, since ethanol is like serine
SB – it depends if we have data for it
MS – the first pass would be just to use it in benchmarking and not in optimization
JDC – we need to find data first. MS, you have some data?
MS – I wasn’t thinking about all quantities when I built that dataset (some aq. free energies)
DLM – We should punt on this so that the appropriate team (Shirts lab) can look into data availability.
Biomolecular FFs
MKG – AMBER-parameterized sidechain analogues, versus Parsley-parameterized?
DLM+MS – Carlos recommended we take the latest QM torsional data from AMBER fits, and refit our torsions to fit that. No need to worry about consistency as we remain internally consistent with LJ parameters.
JDC – Fully agree that’s the first step toward our medium-to-long term vision for biomol FFs
MKG – I’m not excited about parameterizing our LJ in the context of AMBER LJ
JDC – But we need to start converging if we’re hoping to ever be compatible with Amber FFs. We need to include something to constrain.
(General) – We should first do this as a benchmarking experiment, and then see how incompatible the FFs are
SB – (INFRASTRUCTURE) infrastructure burden, currently not possible, we’d need Jeff or Matt to put some time in it
(General) – Disagreement about importance of “compatibility”
DLM – I don’t think compatibility is really that important
JDC – I’m worried that this could break P-L binding affinities.
KCJ – Sacrificing affinity in the short term may be the best way forward.
MKG – Why do we expect this to be worse? Why should FFs be incompatible off the bat? If the benchmarking DOES look bad, then we don’t make a release.
DLM – Decision – We don’t worry too much about “compatibility”, and benchmark the result before we release.
MS – If we want to say “clean break with everything, we’re starting over” – then THAT’s a good time to make our own water.
MKG – So, “don’t change the water, and check for fortuitous compatibility”
SB – Unsure about whether to include water mixtures in Sage release. We’re not sure if it’s going to make sense. There may be a data availability issue. Will report later.
MS + JDC – It’d be good to test out TIP3P-FB while we still have LPW here.
LPW – ForceBalance has a lot of water models now, and they all use the same manually-curated datasets. I think TIP3P-FB is the right place to start. If we want to make a big improvement past this, then mixture data needs to be included. We should validate that OpenFF-evaluation can run aqueous mixtures. Feasibility tests should be easy, but settling on a final model will take a lot of work.
DC – As a control expt, we might take the AMBER parameter set, and try fitting using those. Sidechain analogues were like 17,000 data points, though protein backbone was fit by hand. So we can bring in the AMBER backbone parameters and optimize the sidechain parameters to the same data.
DM – that’s what Carlos was saying, more or less, and we want to start with something like that
JW – Timing – my experience has left me with a lot of things that I haven’t done, but I’m not sure about priority. I would vote for ranking features and infrastructure tasks based on their priority instead of assigning dates.
KCJ – We should add “time windows” for delivery, so we have some semblance of calendar planning
KCJ – Revisit QM theory level?
LPW – Decision was made carefully with respect to conformational energy and with a plan to go forward for a year, but performing a benchmarking study for torsional drives would be important. Important settings/factors for torsiondrives may not be the same as for geometry optimization. It will be important to look at different levels of theory for OTHER properties we want to fit using QM.
JDC – For torsion fitting, Josh (Horton?) is working on automating submission, and may be a good candidate for a study looking into use of minimization trajectories and gradients as an alternative to torsion driving
LPW – Will this sample the top of torsion barriers, and how important is that in fitting?
INFRASTRUCTURE TASKS
- Set up infra for property estimator mixed FF tests
- H-G infrastructure
- P-L infrastructure
- Biopolymer infrastructure (SMARTS typing optimization and infra improvement) and other polymers, too
- Polymer infrastructure (topological residue detection, etc)
- Biopolymer infrastructure (graph charges and/or other scalable solution)
- AMBER-derived library charges (making the offxml file)
- Off-site charges (support in OFFTK)
- Off-site charges (support for conversion to other packages)
- Automated benchmarking + dashboard. May include geometry tools (MM minimization, conformer generation, torsion scanning, conformer scoring)
- Automated QC mol submission
- Python System object
- Automate fitting infrastructure, remove OE dependencies
- Remove `smirnoff_hack.py`
- Docs cleanup and binder-izing all OFFTK examples
- Benchmarking automation/dashboard
- Exception hierarchy
- Shared molecule object/openforcefield-core/pydantic refactor
- SMIRNOFF spec update
- OFF-core/interoperable molecule/Stereochemistry refactor
- Package namespace reorganization
- Developers guide + true community contributions
- Nightly build framework and decision matrix
- A deep dive into toolkit parameterization differences → Josh's SMIRKS differences
- Data infrastructure → How to store datasets? Github, tarballs, images, etc?
- Bayesian infrastructure: ML frameworks
SCIENCE TASKS/STUDIES
- FF compatibility benchmarks
- H-G benchmarking
- Mixture properties
- BCC fitting
- WBO fitting
- LJ refit
- GCN charge model
- Submission of more QM data
- Biopolymer
- Richer torsion data for WBO fitting
- Ligand-expo
- enamine-REAL building blocks
- Whether to use data generated “incidentally” during torsiondrive (eg optimization snapshots, gradients, energies) – these data points are not very well controlled
- More efficient torsion sampling, using fewer grid points during scan
- Benchmarking/re-evaluating our choice of QM theory
- Off-site charge SMIRKS definition/fitting/benchmarking
- Studying addition of new parameters
- Ongoing – Fixing problems
- Automated typing inference from scratch
Supporting science	Atom type creation studies
DM – MKG, are your folks still interested in this?
MKG – Tobias is looking into this, but he may not choose the method that OFF wants. He's interested in AIM models kind of typing.
DLM and DGS – Please loop us in on this
Property collection+selection/Data accessibility
MS – Checking what we can find in ThermoML, and to a lesser extent DIPPR. We have ACCESS to DIPPR data, but not PERMISSION to reuse/redistribute. So we can do feasibility studies on those two, SB will say more. RefProp may also be useful, but I need to see if we can get access to it. Finding data to use to parameterize clogP and mixing data is still on my to-do list. A larger dataset question is the biopolymer dataset – How will we benchmark? We’d talked about getting our scientific advisory board to come up with a list of tests to convince them that biopolymer FF is “good”.
KCJ – DIPPR access status?
MS – Tommy Notts (sp?) is in charge of DIPPR, generally supportive of our goals. But needs AIChE leadership to sign off on republishing. Only about 130 compounds of interest (pure single-compound). SB can say more.
KCJ – Studies planned?
MS – SB is looking into which properties are useful. Data coverage is another question.
SB – ThermoML seems to lack diversity – eg. has lots of highly fluorinated stuff. Lacking in amine/amide data, which makes me nervous about amino acid analogues. We didn’t use DIPPR because of permission concerns and access infrastructure, since we’d need to manually curate a lot of it. It’s so small that the benefit may not be worthwhile.
SB – If we DO want to focus on aqueous data more, we’ll need to find a new source of mixture data. ThermoML and DIPPR don’t have a lot of this.
MS + SB – A good data set to find would be PURE DENSITY data, MIXTURE DENSITY data, and ENTHALPIES OF MIXING. JDC could produce the latter two in his lab. Surface tension would also be good
MKG – Do mixture enthalpies actually improve things?
SB – Yes, in my preliminary benchmark of alcohols and esters they did show broad improvement. Will broaden studies in coming weeks and keep you updated.
FF fitting science + infra
LPW – We should study how to set prior widths and weights for different sorts of data. Simon and I came up with the same approach independently.
LPW – More efficient ways to simultaneously co-optimize valence and nonbonded parameters. Right now we do things serially, which is easier, but less likely to find a true optimum
LPW – idea floating for a long time – in principle you should be able to compute the differential of the parameter much faster if you know which parameters change; we could get some additional performance here
SB – PE already has this? LPW + SB can talk more about this and look into feasibility studies
DM – some of these optimization tasks can be spun off independently – get involved projects
INFRASTRUCTURE TASKS
- More efficient ways to simultaneously co-optimize valence and nonbonded parameters
- Efficient energy calc during optimization based on tracking parameters being changed, and only recomputing energy terms that changed (PE may already have this? LPW + SB can touch base)
SCIENCE TASKS
- Studying which properties are useful
- Data set curation/coverage study
- Study how to set prior widths and weights for different sorts of data during FF optimization.
- Getting our scientific advisory board to come up with a list of tests to convince them that biopolymer FF is “good”.
- CCDC data selection/release (probably not high-priority)
FF improvement	KCJ – This is important to continuously do, since industry partners are interested in continual improvement
DLM – Victoria Lim in my group is working on BenchmarkFF repo – uses various metrics to compare FFs. Has allowed her to identify particular parameters that are a source of error
JDC – Three categories that we want to present in the dashboard:
- quantum chemical deviation
- physical properties
- P-L free energy
KCJ – P-L free energy may be too expensive to continuously evaluate right now. Initially we should stick to small systems, since they’ll also be clearer for identifying sources of error. At this point we’re running P-L before host-guest, which seems out of order.
MKG + JW – pAPRika - PE integration is not production-ready
KCJ – Are there a few H-G systems that we can ship around and test in all frameworks?
DLM – Binding is heading toward standardization in David Hahn’s and Hannah MacDonald’s benchmark system repos
MKG – Dave Slochower had done a paper on systematic benchmarking, that dataset may be available (data set: )
INFRASTRUCTURE TASKS
- pAPRika - PE integration and production-readiness
SCIENCE TASKS
- Pick some (5ish) small reference systems (possibly subset of SAMPL challenges) for P-L and/or H-G free energy, for comparison of different free energy methods
Biomolecular FFs	DC –
- Which quantum method should we use?
- Whether or not to use CMAPs (cross-over terms)
- Off-center charges?
- Expand bond and/or angle complexity to do force matching
DM – we would like to be able to do all of these things and then run experiments to answer these questions by benchmarking different FF versions
DC – I can do much of this in `mdgx` right now.
DM – We will need to settle `mdgx` and FB infrastructure.
DC – Will need to learn our “language” and differences between amber atom typing
MT – Will different functional forms of “latest” FF exist simultaneously?
DM – We originally envisioned this happening. We will probably have one main FF and we’ll focus our efforts on it. But we may end up offering different functional forms that provide different accuracy/cost tradeoffs, and updating them regularly.
KCJ – Thoughts about other biopolymers – DNA/RNA?
MS – Need toolkit infrastructure (represent biopolymers). Then need benchmark infrastructure.
INFRASTRUCTURE TASKS
- CMAP torsions in OFFTK/SMIRNOFF spec
SCIENCE TASKS
- Which quantum method should we use?
- Whether or not to use CMAPs
- Off-center charges?
- Expand bond and/or angle complexity to do force matching
QCArchive strategy	JDC – What can OpenFF push on to best synergize with QCA?
DGS – I’m likely leaving MolSSI soon. Things are very stable now, so remaining team should be able to keep current capabilities active, and continue to add new data. MolSSI is happy to continue taking bug reports/feature requests to best support OpenFF needs moving forward.
JDC – Should we have an OpenFF-employed QC developer?
DGS – Current OpenFF budgeting should account for 50% of a QCA developer. Doing databases and distributed computing is complicated, so there will be a high onboarding cost.
JDC – David Dotson could be a good candidate for this. It’d be good for OpenFF’s long-term health to have an in-house QCA person.
MS – Our budget doesn’t include 50% of a developer for this in 2020.
JDC – Open source science supplement from NIH may be a good source of support for this. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-20-073.html
MT – I can poke around with this code and evaluate how well I could fill that role.
DGS – It’ll be dangerous to learn on QCA in production – Databases are brittle, so you should pursue more formal education about them if this is of interest.
JDC + DLM – Could look at diversity supplements to pull in a CS grad student, or NIH (NOSY?) grant for software scientist
Final prioritization	What can be done outside OpenFF organization or as one-off projects?
What will be included in the next major FF?
- Parsley 1.2 (May 2020): More QM data/parameter fixes
- Sage (Late 2020): LJ refit; some WBO torsion interpolation
INFRASTRUCTURE TASKS
- Training a 50% QCA developer
- More efficient ways to simultaneously co-optimize valence and nonbonded parameters
- Efficient energy calc during optimization based on tracking parameters being changed, and only recomputing energy terms that changed (PE may already have this? LPW + SB can touch base)
- Set up infra for property estimator mixed FF tests
- H-G infrastructure
- P-L infrastructure
- Biopolymer infrastructure (SMARTS typing optimization and infra improvement) and other polymers, too
- Polymer infrastructure (topological residue detection, etc)
- Biopolymer infrastructure (graph charges and/or other scalable solution)
- AMBER-derived library charges (making the offxml file)
- Off-site charges (support in OFFTK)
- Off-site charges (support for conversion to other packages)
- Automated benchmarking + dashboard. May include geometry tools (MM minimization, conformer generation, torsion scanning, conformer scoring)
- Automated QC mol submission
- Python System object
- Automate fitting infrastructure, remove OE dependencies
- Remove `smirnoff_hack.py`
- Docs cleanup and binder-izing all OFFTK examples
- Exception hierarchy
- Shared molecule object/openforcefield-core/pydantic refactor
- SMIRNOFF spec update
- OFF-core/interoperable molecule/Stereochemistry refactor
- Package namespace reorganization
- Developers guide + true community contributions
- Nightly build framework and decision matrix
- A deep dive into toolkit parameterization differences → Josh's SMIRKS differences
- Data infrastructure → How to make maximally accessible? How to store datasets? Github, tarballs, images, etc? Dedicated data person who interfaces with domain experts? Or single person who is a science domain expert AND ALSO a data expert
- Bayesian infrastructure: ML frameworks
- pAPRika - PE integration and production-readiness
- CMAP torsions in OFFTK/SMIRNOFF spec
FF SCIENCE TASKS
- Studying which properties are useful - feasibility studies (high/NOW)
- Data set curation/coverage study (high/NOW)
- Study how to set prior widths and weights for different sorts of data during FF optimization. (low)
- Getting our scientific advisory board to come up with a list of tests to convince them that biopolymer FF is “good”. (medium)
- CCDC data selection/release (probably not high-priority) – spinoff / later
- AMBER FF compatibility benchmarks – part of benchmarking
- H-G benchmarking
- Pick some (5ish) small reference systems (possibly subset of SAMPL challenges) for P-L and/or H-G free energy, for comparison of different free energy methods (high/NOW)
- Study which mixture properties help the most in FF fitting: things Simon is already doing (high); chemical potential-like properties (MEDIUM); octanol-water partition coefficients (MEDIUM)
- BCC fitting (high, DEPENDENT ON LJ REFIT)
- WBO fitting (high, POSSIBLY DEPENDENT ON? Much more torsion data)
- LJ refit (high/now)
- GCN charge model (high, IN PROGRESS)
- Submission of more QM data (ONGOING)
- Biopolymer
- Ensure sidechain data is available in QCA (either recompute or transfer in) – data selection is HIGH, computing (MEDIUM) can happen LATER
- Richer torsion data for WBO fitting (MEDIUM)
- Ligand-expo (MEDIUM/ONGOING)
- enamine-REAL building blocks (MEDIUM/ONGOING)
- Nonzero charge state (high)
- Protomer/tautomer enumerated molecules (high)
- Whether to use data generated “incidentally” during torsiondrive (eg optimization snapshots, gradients, energies) – these data points are not very well controlled (SPINOFF, LOW)
- More efficient torsion sampling, using fewer grid points during scan (SPINOFF, LOW)
- Benchmarking/re-evaluating our choice of QM theory (high)
- Off-site charge SMIRKS definition/fitting/benchmarking (MEDIUM - a lot of work, potentially SPINOFF)
- Studying addition of new parameters (high/return to discuss approach or personnel)
- Ongoing – Fixing problems
- Automated typing inference from scratch
BIOPOLYMERS / PROTEINS
- Which quantum method should we use for biopolymers (should it be the same as small molecules)? (HIGH, dependent on torsional QM benchmark)
- Feasibility/benchmarking studies of torsional CMAPs (DC to look into this) (MEDIUM)
- Feasibility/benchmarking studies of other cross-terms (like in class 2 force fields) (LOW)
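A minimal sketch of the parameter `diff` idea SB raised under Force fields, assuming each force field's parameters are available as a plain dictionary keyed by SMIRKS pattern. The function name `diff_parameters`, the dictionary layout, and the numeric values are all hypothetical illustrations; none of this is the OpenFF toolkit's API.

```python
from typing import Any, Dict

# Hypothetical layout: {smirks_pattern: {attribute_name: value}}
ParameterSet = Dict[str, Dict[str, Any]]


def diff_parameters(old: ParameterSet, new: ParameterSet) -> Dict[str, Any]:
    """Report which parameters were added, removed, or changed between two sets."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed: Dict[str, Dict[str, tuple]] = {}
    for smirks in sorted(set(old) & set(new)):
        # Compare attribute by attribute and keep only the ones that differ.
        fields = {
            name: (old[smirks].get(name), new[smirks].get(name))
            for name in sorted(set(old[smirks]) | set(new[smirks]))
            if old[smirks].get(name) != new[smirks].get(name)
        }
        if fields:
            changed[smirks] = fields
    return {"added": added, "removed": removed, "changed": changed}


# Made-up bond parameters, for illustration only.
old = {
    "[#6:1]-[#1:2]": {"k": 680.0, "length": 1.09},
    "[#6:1]-[#6:2]": {"k": 620.0, "length": 1.52},
}
new = {
    "[#6:1]-[#1:2]": {"k": 700.0, "length": 1.09},
    "[#6:1]=[#8:2]": {"k": 1100.0, "length": 1.22},
}
report = diff_parameters(old, new)
```

Because the report is computed on demand from two parameter sets, nothing extra has to be stored in the force field files themselves, which is the trade-off SB was pointing at. Recording why a parameter changed would still need a separate provenance mechanism.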

Action items

Decisions

We agree that the paper writing procedure takes too long, and we should streamline it.
Sage will be openff-2.0.0 – Generally major increments MAY indicate a functional form/compatibility change, but don’t NEED to. If compatibility DOES change, then major version number MUST increment.
Biomolecular FFs naming – We’ll just call it amber14SB.offxml and then they will become a part of openff-X.Y.Z with documentation describing the details.
Provenance of parameter changes – where to store it, etc. Future STUDY
Biomolecular FFs: We will take C. Simmerling’s view that consistency is overrated, and go ahead with Simon’s LJ refitting plan. Thus, we will do benchmarking of combined protein-ligand systems with separately optimized OpenFF small molecule parameters and existing AMBER protein parameters. If benchmarking results are okay, we can release, and the next step will be a jump to co-parameterization of self-consistent small-molecule and protein parameters. If benchmarking results are not okay, go back and optimize OpenFF parameters in the context of some protein side-chain analogs outfitted with AMBER protein parameters.
Water optimization – start planning co-optimization this year and use LPW’s expertise in designing this study, which will be executed at a later date. TIP3P is a good starting point. Settling on the final parameters will be the slowest step.
STUDY Charges – decide which models to implement, in which order, coupling extent with LJ, etc.
STUDY QM torsional benchmarking
STUDY Inclusion of incidental data in optimization (J. Horton is building infrastructure that will enable it)
STUDY Off-site charges – long-term goal (Science + infrastructure)
STUDY (missing PERSON) New parameter addition – requires a person whom we don’t have at the moment; it will have to wait.
STUDY Identify data missing for property estimation. Start thinking about how to address it. Short-term.
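The version-numbering decision above (a compatibility change MUST bump the major version; a major bump MAY happen without one) can be sketched as a small check. This assumes tags of the form `openff-X.Y.Z`; the function names `parse_version` and `increment_is_allowed` are hypothetical and exist only for this illustration.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse 'openff-2.0.0' (or bare '2.0.0') into (major, minor, patch)."""
    numbers = tag.rsplit("-", 1)[-1]
    major, minor, patch = (int(part) for part in numbers.split("."))
    return major, minor, patch


def increment_is_allowed(old: str, new: str, compatibility_changed: bool) -> bool:
    """Return True if moving from `old` to `new` follows the agreed rule."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        # A new release must carry a strictly higher version number.
        return False
    if compatibility_changed and new_v[0] <= old_v[0]:
        # A compatibility change MUST increment the major version.
        return False
    # A major increment without a compatibility change is permitted.
    return True
```

For example, `increment_is_allowed("openff-1.2.0", "openff-2.0.0", compatibility_changed=False)` is accepted, while a compatibility change shipped as `openff-1.3.0` would be rejected.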