
Idea

Discussion

Effort

Value

Risk

Dependencies

Resources

Timescale

Add example workflows for non-standard uses

  • An “anything goes” repo may solve this; we need to do a better job of publicizing it

-

-

-

Cover proteins, PTMs/NCAAs

Ongoing

High

High

Medium

Proteins: Chapin

PTMs/NCAAs: Infrastructure

Proteins: 4 years

PTMs: 1 year

Expansion of virtual sites and neural network charges

Ongoing

High

High

High

Fix Evaluator

Better, faster fitting

Science team

Cole group

vsites: 3 years

NAGL: 2 years

Expand quality to nucleic acids, lipids

Ongoing

High

High

High

Shirts group

Ken Takaba

Staff

3 years

Co-optimized water

  • What is missing is regularization of pure water properties.

  • Don’t we have the TIP3P results?

  • We want to increase the size of the dataset

  • What is the priority for co-optimization? Start with what was used for Sage.

  • 2-ish FTE months

Medium

High

Medium

Shirts Group

Science

2 years

Make a GUI/system builder

  • I don’t think we need to lead on the development

  • Build molecules interactively

  • Similar to CHARMM GUI

High

High

High

Get involved directly with drug discovery

  • Proving the methods work in real life

  • Provide a source of income

  • Would not necessarily feed back into OpenFF development

  • Would we have a strike team like Schrodinger to incorporate issues?

  • Hands-on MedChem experience would be valuable for employees

  • Could join in with JC start-up ASAP in the short term

  • IP stays with the company, but the software advancements would stay with OpenFF

  • ASAP Alchemy takes prepared inputs, generates poses, and passes them off to alchemiscale + OpenFE

  • Might be more of an OMSF level thing to promote

High

High

Low if we watch for opportunities, higher if we push it

Identify opportunities for fee-for-service

OMSF

ASAP

Better logging in Evaluator

  • Currently there are no useful error messages

  • Debug logging only provides timings

  • If you do not save the working data, you have no messages to debug from, and re-running to get them can take days or weeks

  • We have no tools to debug other than re-running

  • What were the OpenMM settings when it was run?

Medium

High

Low

Infrastructure

1 year
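
A minimal sketch of how a run could record the settings and package versions it used, so that "what were the OpenMM settings when it was run?" has an answer later. It uses only the Python standard library; the package names and the contents of settings are illustrative assumptions, not Evaluator's actual API.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


def provenance_record(settings, packages=("openmm", "openff-evaluator")):
    """Bundle run settings with interpreter and package versions.

    settings is a hypothetical dict of simulation options (timestep,
    thermostat, ...) that should be stored next to the results.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": package_versions(packages),
        "settings": settings,
    }


if __name__ == "__main__":
    record = provenance_record({"timestep_fs": 2.0, "temperature_K": 298.15})
    # Writing this JSON next to each result keeps it even if working data is deleted
    print(json.dumps(record, indent=2))
```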

Adding protocols for osmotic coefficients

  • Draft of harmonic well potential already exists.

  • Can utilize BM and MS to implement

  • Until we do this or the next item (RDFs), we have no way to include ions in fitting.

Medium

High

Low

Infrastructure

Shirts Group

1 year
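
For reference, a hedged sketch of the arithmetic behind an osmotic-coefficient protocol, assuming the flat-bottom harmonic-wall approach in which ions are confined by walls that water can cross. The osmotic pressure is taken as the mean normal force on one wall divided by the wall area, and the osmotic coefficient is that pressure divided by the ideal (van 't Hoff, molarity-based) value ν c R T. Function names and units are assumptions for illustration, not taken from the existing draft.

```python
# 1 kJ mol^-1 nm^-3 in bar: (1e3 J / N_A) / 1e-27 m^3, then Pa -> bar
KJ_PER_MOL_NM3_TO_BAR = 1e3 / 6.02214076e23 / 1e-27 / 1e5
# Gas constant in L*bar/(mol*K)
R_L_BAR = 0.083144626


def osmotic_pressure_bar(mean_wall_force_kj_mol_nm, wall_area_nm2):
    """Hypothetical helper: mean normal force on ONE wall divided by its area."""
    return mean_wall_force_kj_mol_nm / wall_area_nm2 * KJ_PER_MOL_NM3_TO_BAR


def osmotic_coefficient(pressure_bar, molarity, nu, temperature_k):
    """phi = Pi / (nu * c * R * T); nu is ions per formula unit (2 for NaCl)."""
    ideal = nu * molarity * R_L_BAR * temperature_k
    return pressure_bar / ideal


if __name__ == "__main__":
    # Made-up wall force and area, only to show the call sequence
    pi = osmotic_pressure_bar(mean_wall_force_kj_mol_nm=50.0, wall_area_nm2=16.0)
    print(osmotic_coefficient(pi, molarity=1.0, nu=2, temperature_k=298.15))
```

As a sanity check on the units, the ideal osmotic pressure of 1 M NaCl at 298.15 K is about 49.6 bar.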

Adding protocol for radial distribution functions

  • We need RDFs for other things too

  • Only isotropic scattering

  • Can utilize BM and MS to implement

Medium

High

Low

Infrastructure

Shirts group

2 years
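
A minimal sketch of the isotropic g(r) calculation such a protocol would wrap, for a single frame of identical particles in a cubic periodic box using the minimum-image convention. It is pure Python for clarity; a real protocol would likely use an existing trajectory-analysis library and average over frames. The function name and signature are hypothetical.

```python
import math
import random


def radial_distribution(positions, box_length, n_bins, r_max=None):
    """Isotropic g(r) for one frame in a cubic periodic box.

    positions is a list of (x, y, z) tuples. Minimum-image distances are
    only valid up to box_length / 2, so r_max may not exceed that.
    Returns (bin_centers, g_values).
    """
    if r_max is None:
        r_max = box_length / 2
    if r_max > box_length / 2:
        raise ValueError("r_max must be <= box_length / 2 with minimum image")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            # Minimum-image displacement in each dimension
            dx = xj - xi - box_length * round((xj - xi) / box_length)
            dy = yj - yi - box_length * round((yj - yi) / box_length)
            dz = zj - zi - box_length * round((zj - zi) / box_length)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    # Normalize by the pair count an ideal gas at the same density would give
    n_pairs = n * (n - 1) / 2
    volume = box_length ** 3
    centers, g = [], []
    for k in range(n_bins):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        centers.append(0.5 * (r_lo + r_hi))
        g.append(counts[k] / (n_pairs * shell / volume))
    return centers, g


if __name__ == "__main__":
    # Uniform random points (an ideal gas) should give g(r) close to 1
    rng = random.Random(0)
    L = 10.0
    pts = [(rng.uniform(0, L), rng.uniform(0, L), rng.uniform(0, L)) for _ in range(300)]
    for r, gr in zip(*radial_distribution(pts, L, n_bins=10)):
        print(f"{r:5.2f}  {gr:5.2f}")
```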

Tune ion parameters in FFs

  • Once the infra is there then this will be easier

High

High

Low

Adding protocols for osmotic coefficients

Adding protocol for radial distribution functions

Shirts group

2 years

ML potentials FF development

  • One possibility is surrogate optimization

  • This is specifically about ML potentials (not ML training methods or espaloma style stuff)

High

High

High

5 years

Clearer communication about strategic goals

  • Platform to ask a broad question

Low

High

Medium

Staff

Leadership

1 year

Reduce maintenance costs by pinning versions

High

High

High

Write our own solvated topology builder

  • Can engage with PackMol people more

  • Two cases

    • Things we are fitting

    • General cases

High

High

Medium

Evaluator needs this

PackMol team

Infrastructure

1 year

Automatically typed parameters with BeSMARTS

  • Need to adapt to interface with smee

  • Provides a justification for atom typing

High

High

High

Smee

Science

Cole Group?

2 years

Espaloma Typing

  • Graph convolutional model rather than SMARTS model

  • Directly assigns parameters, then uses standard functional forms after that

  • Training is faster because of the continuous embedding

  • Has been fine-tuned on the condensed phase

  • Could use NAGL and train a model with additional outputs

High

High

Medium

Science

Infrastructure

Cole group?

5 years

Get a better benchmarking set which uses all parameters

  • Brent started on this

Medium

High

Low

Provide a minimal estimate of uncertainty for a parameterized molecule

Science

Industry partners

Shirts group

1 year

Provide a minimal estimate of uncertainty for a parameterized molecule

  • Possibly also do a refit if the parameter quality is estimated to be poor

  • I think a parameter quality estimator tool would be interpreted as a “should I use bespokefit” tool

    • But maybe this will help KEEP people from running bespokefit unnecessarily

High

High

Smee

Science

Cole group

Automatic iteration process for generating new QC data and fitting a FF

DC says he can take on most of the effort
  • If we had something that could spit out SMILES strings, generate molecules, and then set up the dataset

High

High

High

Docked conformation strain estimator using nagl

  • Split out automation into two pipelines

    • Valence

    • Nonbonded

  • Several parts of the QC dataset submission process would require a lot of extra work to automate

  • The first part is an active learning application

High

High

High

Active learning for generating new QC data depends on automated fitting

Infrastructure

Science

Cleaning up scripts to improve automated fitting: 1 year

Active learning: 5 years

Docked conformation strain estimator using nagl

DC says he can take on most of the effort

High

High

High

Cole group

Direct polarization supported in infrastructure

  • The spec does not support an offxml for a polarizable FF

  • What if we only want it to work with OpenMM?

  • Do we take over the MPID plug-in?

Medium

High

Low

Further discussion

Switch to using smee instead of ForceBalance

  • Integrating this into the general OpenFF workflow

  • The effort involved with sticking to ForceBalance is very high

  • Things get simpler if we can drop torsion scans. But if we start doing class 2 FF fits something something.

High

High+

Medium

De-risk interaction with evaluator

Cole group

Staff

Achira / Simon

Josh Horton?

2 years

Simplify representations in toolkit

  • It seems like there are a bunch of support requests having to do with stereochemistry

  • We will still need stereochemistry sometimes

  • Users may be annoyed in some cases

Medium

Low

Low to medium

Publicly visible way of asking for support and getting tickets

Low

Low

Low

Issue bot that reminds us to respond to partners and reminds non-partners of the benefits of joining

Making a polymer builder

  • A lot of the effort can be offloaded to MuPT and TB (Shirts Group)

  • Long-term maintenance cost

Medium

Medium

Low

Clearer idea of interests and needs

Shirts

Infrastructure

2 years

Unified logging across more tools

  • Some of our Pharma users ask for provenance

  • Being able to log which toolkit item was used for which task

  • You can initiate a logger ahead of time and then add a specification for level of logging

Medium

Medium

Medium

Infrastructure

1 year
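
One reading of "initiate a logger ahead of time and then add a specification for level of logging", using the standard library logging module. It assumes each OpenFF package logs through logging.getLogger(__name__), so all of their loggers are children of a shared "openff" parent; that naming is an assumption here, not something verified in these notes. Configuring the parent once gives every child the same level and handler.

```python
import logging


def configure_openff_logging(level=logging.INFO, parent="openff"):
    """Give every logger under `parent` one shared handler and level.

    Assumes packages log via logging.getLogger(__name__), which would make
    loggers such as "openff.toolkit..." children of "openff".
    """
    logger = logging.getLogger(parent)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    # Including the logger name records which tool produced each message
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    configure_openff_logging(logging.DEBUG)
    # Hypothetical child logger standing in for one tool's module logger
    logging.getLogger("openff.example.module").debug("assigned charges")
```

This would be called once at startup; calling it repeatedly would attach duplicate handlers.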

Class 2 FF

  • This is not something close to our scale

  • Should the infrastructure support it? Probably only when someone comes to us with the need for it.

  • Very hard to do with SMIRNOFF, but would be easy to do with an espaloma format

  • Non-spec informal functionality in OpenMM only would be much easier to consider

  • Class 2 is the easiest way to make a FF better.

  • Could just be a SMIRNOFF plug-in

High

Medium

High

Revisit LJ Mixing rules (Waldman-Hagler alternative)

  • Involves no extra functional forms

  • Just a different way of combining epsilon and sigma

  • Only really matters with large differences in sigma and epsilon

  • Interchange can already do this with OpenMM

Medium

Medium

High

3 years
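
The two combining rules side by side, as a reference for how far Waldman-Hagler departs from Lorentz-Berthelot (the rule SMIRNOFF force fields such as Sage use). This is a plain sketch of the textbook formulas, not Interchange's implementation; the example parameters are made up.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic mean of sigma, geometric mean of epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def waldman_hagler(sigma_i, eps_i, sigma_j, eps_j):
    """Sixth-power mean of sigma; epsilon scaled by a sigma-dependent factor."""
    s6_i, s6_j = sigma_i ** 6, sigma_j ** 6
    sigma_ij = (0.5 * (s6_i + s6_j)) ** (1.0 / 6.0)
    eps_ij = 2.0 * math.sqrt(eps_i * eps_j) * sigma_i ** 3 * sigma_j ** 3 / (s6_i + s6_j)
    return sigma_ij, eps_ij


if __name__ == "__main__":
    # Hypothetical parameters (nm, kJ/mol), chosen only to show the trend
    for s_j in (0.30, 0.35, 0.45):
        lb = lorentz_berthelot(0.30, 0.5, s_j, 0.8)
        wh = waldman_hagler(0.30, 0.5, s_j, 0.8)
        print(f"sigma_j={s_j:.2f}  LB={lb[0]:.4f},{lb[1]:.4f}  WH={wh[0]:.4f},{wh[1]:.4f}")
```

With equal sigmas the two rules give identical pair parameters, consistent with the note that the choice only matters with large differences; with unequal sigmas, Waldman-Hagler gives a larger sigma_ij and a smaller epsilon_ij.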

Distribute NAGL differently

  • Potential cybersecurity issue

Low

Medium

Medium

Staff

1 year

Track every custom torsion in BespokeFit and append to a long-running XML

  • Add “public” as an adjective (i.e., a public long-running XML)

  • Should be possible within your own group

  • Anyone using this for non-proprietary research could submit

Medium

Medium

High
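
A minimal sketch of the "append to a long-running XML" step using the standard library's xml.etree.ElementTree, assuming the shared file is an ordinary OFFXML with a ProperTorsions section. The function name, the id scheme, and the example attribute values are hypothetical. Deduplication compares SMIRKS strings exactly, so chemically equivalent patterns written differently would not be caught.

```python
import xml.etree.ElementTree as ET


def append_torsion(offxml_text, smirks, torsion_id, terms):
    """Append a <Proper> to <ProperTorsions> unless its SMIRKS is already there.

    terms is a list of (periodicity, phase, k) attribute strings.
    Returns (xml_text, added).
    """
    root = ET.fromstring(offxml_text)
    torsions = root.find("ProperTorsions")
    if torsions is None:
        raise ValueError("no <ProperTorsions> section found")
    if any(p.get("smirks") == smirks for p in torsions.findall("Proper")):
        return offxml_text, False
    attrib = {"smirks": smirks, "id": torsion_id}
    for n, (periodicity, phase, k) in enumerate(terms, start=1):
        attrib[f"periodicity{n}"] = periodicity
        attrib[f"phase{n}"] = phase
        attrib[f"k{n}"] = k
    # New element goes at the end of the section
    ET.SubElement(torsions, "Proper", attrib)
    return ET.tostring(root, encoding="unicode"), True
```

Appending at the end of the section is deliberate: in SMIRNOFF, when several parameters in a section match, the last one wins, so a bespoke torsion added this way would take precedence over the generic one it refines.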

Minimal reimplementation of SMARTS matching algorithms

  • Replace RDKit

  • Allows us to drop a few dependencies

High

Medium

Low

Add an example using Sage to score ligand poses

Low

Medium

Low

Someone needs to do this

Give clear instructions to cite the FF

Low

Medium

Low

Infrastructure

Leadership

1 year

Drop idivfs

Medium

Medium

Low

SMIRNOFF spec change

Infrastructure

SMIRNOFF committee

1 year
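
For context on why idivf is redundant: in the SMIRNOFF proper-torsion energy each force constant is divided by its idivf, so any pair (k_n, idivf_n) can be replaced by a single rescaled constant with idivf_n = 1 and the energy is unchanged. The cost of dropping it is a SMIRNOFF spec change rather than any change in the physics.

$$
E_{\mathrm{proper}}(\phi) = \sum_{n} \frac{k_n}{\mathrm{idivf}_n}\left[1 + \cos\left(\mathrm{periodicity}_n\,\phi - \mathrm{phase}_n\right)\right],
\qquad
k_n' = \frac{k_n}{\mathrm{idivf}_n}
$$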

Direct polarization (in mainline FF?)

  • Working past prototypes with polarizability will be difficult

  • Compatibility with other engines doesn’t exist yet; software support will be an issue

  • With Drude, it took 10-15 years to get the infrastructure working correctly (indirect polarization)

  • Direct may be 5 years of pain

  • Get it working with OpenMM

  • Build an extensive dataset, with 3 papers proving it works, to get GROMACS to incorporate it

  • One pathway is to keep it as a prototype on the OpenFF infrastructure to demonstrate utility; then the value increases and the effort decreases

  • The infrastructure is pretty much there, since it is in a SMIRNOFF format and there is infrastructure in OpenMM, CHARMM, and Tinker

  • We could implement partial polarizability

Write more blog posts

Make it possible to fit a FF in a day or less

Jenn’s LJ Mixing adding r^4 term

High

Fit FF in a Day

  • Exclude condensed phase

  • Valence parameters

  • Needs to be tied to smee or espaloma

Done (if we switch to smee)

Doing a benchmark set

High

Refit BCCs instead of using AM1BCC

  • Difficult to convince the world

  • NAGL2 is when we should do this

Try new charge models with new functional forms

Already in progress

Make a concerted effort to drum up materials science funding by putting together a proposal

Out of scope for the consortium; PIs should do it. When we find people who have used OpenFF tools, we should put them in contact with project leadership.

...

  • DM – No, only the ones that look promising.

  • One of the ways we don’t make stuff scalable is by bringing everything in. Dropping dependencies is a better strategy.

    • If we did WBO again, we would have spent more time de-risking the science before integrating it into the infrastructure.

    • As we invest effort into an idea and it continues not working, our estimate of risk tends to go up.

    • As a project becomes worth considering for integration, we define metrics it should hit to be integrated

    • Project milestones are great, but Jeff also needs expectation of how much time he should allocate from infrastructure team

  • James: PTMs are an example of how OpenFF is interfacing with academic teams. Previously, Chapin was doing most of protein FF work; more recently staff has been providing more assistance to reduce barriers to his work.

    • JW: It was mostly positive, but the infrastructure added extra complexity for reproducibility as the cost of speeding things up.

    • James: We’ll have a workflow in a few hours, but protein FF has been a 1-year project for a long time.

Expansion of virtual sites and partial charges (ongoing work):

  • Is this the best way to approach it? Are there academic collaborators?

  • LW: Maybe someone from the Danny Cole group has been working on it?

  • James: It’s been going on for a long time. Is this still something critical?

  • DLM: It’s not existential. We haven’t shown that it improves accuracy yet.

  • What experiments have been done on this, and where would we find the data summarized?

  • Data is in slides, not papers yet; maybe we could write a blog post

  • It seems like this info isn’t discoverable within our current materials/workflow.

  • There are slide decks floating around somewhere from Lily

  • LW: After training parameters as well and adding virtual sites

  • Should there be a dashboard of active projects and status? This gets into our communications strategy issues.

  • JE: When to pull the plug on virtual sites?

  • It looks like in some cases vsites show improvements, TG is following up on some of this work.

  • Once evaluator is fixed, it should be possible to restart this work in earnest.

  • We should prioritize things that have the most potential and go for them full-steam, instead of waffling on things repeatedly.

  • This is why it got deprioritized - Initial improvements turned out to be due to adding a new vdw type during vsite refits, not actually adding the vsite itself.

  • LW: There hasn’t been progress in the last few months

NAGL 2.0

  • Have discussed second-generation NAGL trained on higher level of theory or virtual sites. Is this a priority? Does it seem like it will give us significant additional improvement?

  • LW: There is growing evidence that some FF problems we see come from bad charges. I think NAGL 2 would deliver a moderate amount of value.

Expand force field quality to nucleic acids and lipids

  • Lipid work is ongoing, but no work has started on nucleic acids yet

  • Ken Takaba has created RNA datasets already within QCArchive

  • Moderna has contributed specifically to fund this effort, so we are obligated to allocate staff time to it (outside of ongoing effort helping with datasets)

  • JE: We should allocate staff time to this area because we have money. There is no one currently assigned. Probably a multi year undertaking.

Co-optimization of water

  • No new features are needed in Evaluator, since we’ll throw out highly viscous compounds. BM is ready to start calculations for new water models. LW has been the major staff resource so far. LW and BM agree they should formalize meetings to help this project progress.

  • First step would be to fix some errors in Evaluator

  • Next step could be to refit using enthalpies of mixing, densities, etc to start co-optimization

  • There’s an existing workflow in ForceBalance to do this

  • One Q is which charges we use - library vs. nagl?

    • Library would be easier during optimization

    • NAGL would be fully self-consistent

    • Could fit NAGL charges to condensed phase properties

  • Need MRS input

GUI/System Builder

  • Don’t do it; the scope would be huge

Get involved directly with drug discovery

  • The action items are to watch for opportunities and let Karmen know

Better logging in evaluator

  • MT could work with scientists and add this pretty easily

Adding protocols for osmotic coeffs/rdfs

  • Osmotic coefficients are more valuable than RDFs, so if we only do one, let’s do that.

  • BM has the implementation in mind but would need help carrying it out.

  • Q: How does this relate to fitting? Would we be fitting to osmotic coeffs, or just benchmarking to them?

    • Just benchmarking

  • This proposal is discussing adding the feature to evaluator, not just a benchmarking suite

  • BM/Shirts lab should have the capacity to work on this project

ML potentials: Treating (multibody) valence terms with learnable neural functional forms

  • Who would do this?

    • No takers currently

  • OpenFF is interested in looking into this potentially, but not in the short term

  • Slow for now; class II force fields are much more tractable in the near term

Clearer communication about strategic goals

  • Topic of this afternoon’s discussion (after coffee break)

Write our own solvated topology builder / engage with PackMol

  • Best option is probably to engage with packmol

  • What do we hope to get from this engagement? I think we need to acknowledge that we own this part of people’s workflows. One option is to re-implement the Packmol algorithm in Rust. Another is to make our own Packmol wrapper that we fully commit to. But this causes MT a ton of pain.

  • JenC: Users could provide an equilibrated box

  • MT: That's not an option we can continue with

  • JW: I disagree. There are other ways to get equilibrated boxes. It's a big thing to own.

  • JenC: Remake mBuild for ourselves. It has a nice wrapper and enhanced MC. That approach would be great because of how it handles equilibrating the boxes. Huge task.

  • MRS: Box packing of rigid vs. flexible molecules. Packmol seems to be the best for rigid.

  • DM: Are we packing boxes of things we use in fitting, or boxes of things that others want to pack? We can’t be doing arbitrary things.

  • MRS: Can we reach out to the Packmol people?

  • JW: We have things that Packmol is not suitable for and we have things that np is suitable for

  • DM: We should not try to pack anything anyone wants; we should focus on improving our ability to pack what we care about.

  • JenC: There is money for polymer builder but not for this.

  • JE: We cannot take on something that meets everybody’s needs

Automatically typed parameters with BeSMARTS

  • MRS: Julianne already has tools for this

  • JE: dependent on adopting Smee.

  • LW: larger in the science team. Cole group could be interested.

  • JW: Potentially Josh Mitchell could contribute to this, since he’ll have just come off the BeSMARTS documentation effort

Espaloma / NAGL++

  • Likely would extend NAGL instead of adopting espaloma/espalomax

  • Would need to be a joint infrastructure/science effort

  • Cole group was asking enough questions about NAGL that suggests they may be working on something similar

  • LW: It’s a long term plan

  • MRS: Many future things we would like to do would be a lot easier in NAGL

Get a better benchmarking set that uses all parameters

  • Ongoing, mostly science team

  • Does XtalPi’s strategy of fragmenting PubChem help here?

    • LW – It seemed promising

    • DM – Could JCl help?

    • JCl – Yes, though I’m on a one year timeline

    • JH – I could help too.

Provide a minimal estimate of uncertainty for a parameterized molecule

  • Fit an ensemble of models with any tool, though likely requires SMEE or related faster tool

  • Could be run by anybody running force field experiments if fitting is fast enough

  • Would require more statistical rigor of training and test sets, and assessing calibration of error on held-out test set

  • Check with Danny Cole, since this is something they could start working on with SMEE

Automatic iteration process for generating new QC data and fitting a FF

  • Slight misnomer: the thinking is to detect both valence and nonbonded terms that need improvement and improve them

  • Need to split into valence and nonbonded fitting

  • We haven’t fit valence and nonbonded at the same time yet

  • We have infrastructure to fit QM and condensed phase data simultaneously, but it isn’t automated

    • Assess what parameters are bad, what data is missing?

  • LW: There are some scripts (Matt was referring to the streamlining ones). It’s a collection, not a single simple thing.

  • MT: After the force field is fit, we run benchmarks. A molecule will be an outlier and we spend time on that. An outcome of that is that we… moiety underrepresenting your data set.

  • LW: Getting QC data would require human steps

  • MRS: We have scripts that do it, but the failure modes are large. It’s manual because when you run this series of scripts they can fail in different ways?

  • LW: They consider exactly what we put in MolSSI

  • JCh:

  • DM:

  • JenC: One pull request per molecule is a lot more work. It requires a lot of time that isn't automatable. Looking at number of basis sets is better. We would have to have an automated system … A lot of infrastructure there to make it automated.

  • JW: We don’t have the resources right now. Automation is going in the right direction.


Docked conformation strain estimator using nagl

Direct polarization supported in infrastructure

  • We already have this, essentially

  • Would be great to have the spec in OpenFF XML instead of plugging it in

  • MT: What level of support from OpenFF infrastructure is needed here?

  • Adding it to the spec creates an expectation that we put it in the toolkit, then that it

  • Could it sit in a PR until we’re ready to integrate it?

  • JW: Seems close to phase 1 completion; stage 2 might be to bring it into OpenFF SMIRNOFFPlugins; stage 3 would be SMIRNOFF spec

  • Question for a future meeting: Should this come into the SMIRNOFF plugins repo?

Switch to using smee instead of ForceBalance

  • Experiences using smee?

    • Used it 1-1.5 years ago, was getting k=0 for some torsions

    • Sounds like torsion issues were removed. smee is a lightweight piece of code. Includes descent (another package) as well. Evaluator isn’t super special, a lot of it is made to handle distributed compute, and we can keep the scientific capabilities without the distributed compute stuff.

  • Would we do co-optimization of water using smee or forcebalance?

    • Given timeline of water project (aim to complete this year) we’ll want to use forcebalance.

  • Will moving to smee break compatibility with some work that uses alternative functional forms? Should be evaluated.

  • Also, we’ll need to own smee in some form (even if it means forking).

  • JH and DC are also invested in smee.

Simplify representation in toolkit

Publicly visible way of asking for support and getting tickets

  • Issue bot knowing if someone is a partner or not

Make a polymer builder

  • The MS group will be working on this; the grant runs for 3 years

  • Eric Jankowski’s group involved along with Tim

  • MRS: we need to know first what OpenFF wants from polymer builder

Unified logging across more tools

  • JW – I was initially apprehensive about this idea, since I thought it meant making new types of objects for provenance of operations, but if it’s more just “import logging” and printing tons of stuff out, then I’m much more receptive.

Revisiting LJ Mixing rules (Waldman-Hagler)

  • MRS: How much would it help? If you are dealing with LJ that are about the same size it would not help much … We could make this a rotating student project.

  • MT – This could be easy if we want to see these as one-offs, there’s just a few places where people would need to add some code.

  • MRS: Push-button noncovalent optimization

  • DM: as long as someone wants to do it we should

  • MRS: Once we have co-optimized water, we have the infrastructure to do it.

  • JW: Concern: would this involve getting into a new cluster? My understanding is that things can be hard to set up on a new cluster.

  • MRS: we need more GPUs

  • BM – I set up Evaluator on the CU Boulder Blanca cluster, and it was easy to set up

Track every custom torsion from BespokeFit and append to a long-running XML

Minimal reimplementation of SMARTS matching algorithms

Add an example using Sage to score ligand poses

Give clear instructions to cite the FF

Drop idivfs

  • JW – I might be net opposed to this since it involves a spec change