...
Idea | Discussion | Effort | Value | Risk | Dependencies | Resources | Timescale | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Add example workflows for non-standard uses |
| - | - | - | ||||||||
Cover proteins, PTMs/NCAAs | Ongoing | High | High | Medium | Proteins: Chapin PTMs/NCAAs: Infrastructure | Proteins: 4 years PTMs: 1 year | ||||||
Expansion of virtual sites and neural network charges | Ongoing | High | High | High | Fix Evaluator Better, faster fitting | Science team Cole group | vsites: 3 years NAGL: 2 years | |||||
Expand quality to nucleic acids, lipids | Ongoing | High | High | High | Shirts group Ken Takaba Staff | 3 years | ||||||
Co-optimized water |
| Medium | High | Medium | Shirts Group Science | 2 years | ||||||
Make a GUI/system builder |
| High | High | High | ||||||||
Get involved directly with drug discovery |
| High | High | Low if we watch for opportunities, higher if we push it | Identify opportunities for fee-for-service | OMSF ASAP | ||||||
Better logging in Evaluator |
| Medium | High | Low | Infrastructure | 1 year | ||||||
Adding protocols for osmotic coefficients |
| Medium | High | Low | Adding protocol Infrastructure Shirts Group | 1 year | ||||||
Adding protocol for radial distribution functions |
| Medium | High | Low | Infrastructure Shirts group | 2 years | ||||||
Tune ion parameters FFs |
| High | High | Low | Adding protocols for osmotic coefficients, Adding protocol for radial distribution functions | Shirts group | 2 years | |||||
ML potentials FF development |
| High | High | High | 5 years | |||||||
Clearer communication about strategic goals |
| Low | High | Medium | Staff Leadership | 1 year | ||||||
Write our own solvated topology builder |
| High | High | Medium | Evaluator needs this | PackMol team Infrastructure | 1 year | |||||
Automatically typed parameters with BeSMARTS |
| High | High | High | Smee | Science Cole Group? | 2 years | |||||
Espaloma Typing |
| High | High | Medium | Science Infrastructure Cole group? | 5 years | ||||||
Get a better benchmarking set which uses all parameters |
| Medium | High | Low | Provide minimal estimate of uncertainty of parameterized molecule, | Science Industry partners Shirts group | 1 year | |||||
Provide minimal estimate of uncertainty of parameterized molecule, |
| High | High | Smee | Science Cole group | |||||||
Automatic iteration process for generating new QC data and fitting a FF | DC says he can take on most of the effort
| High | High | High | Docked conformation strain estimator using nagl |
| High | High | High | Direct polarization supported in infrastructure. Active learning for generating new QC data depends on automated fitting | Infrastructure Science | Cleaning up scripts to improve automated fitting: 1 year Active learning: 5 years |
Docked conformation strain estimator using nagl | DC says he can take on most of the effort | High | High | High | Cole group | |||||||
Direct polarization supported in infrastructure |
| Medium | High | Low | Further discussion | |||||||
Switch to using smee instead of ForceBalance |
| High | High+ | Medium | De-risk interaction with evaluator | Cole group Staff Achira / Simon Josh Horton? | 2 years | |||||
Simplify representations in toolkit |
| Medium | Low | Low to med | ||||||||
Publicly visible way of asking for support and getting tickets | Low | Low | Low | Issue bot that reminds us to respond to partners and reminds non-partners of the benefits of joining | ||||||||
Making a polymer builder |
| Medium | Medium | Low | Clearer idea of interest needs | Shirts Infrastructure | 2 years | |||||
Unified logging across more tools |
| Medium | Medium | Medium | Infrastructure | 1 year | ||||||
Class 2 FF |
| High | Medium | High | ||||||||
Revisit LJ Mixing rules (Waldman-Hagler alternative) |
| Medium | Medium | High | 3 years | |||||||
Distribute NAGL differently |
| Low | Medium | Medium | Staff | 1 year | ||||||
Track every custom torsion in Bespoke Fit and append to long running XML |
| Medium | Medium | High | ||||||||
Minimal reimplementation of SMARTS matching algorithms |
| High | Medium | Low | ||||||||
Add an example using SAGE to score ligand poses | Low | Medium | Low | Someone needs to do this | ||||||||
Give clear instructions to cite the FF | Low | Medium | Low | Drop idivfs | Infrastructure Leadership | 1 year | ||||||
Drop idivfs | Medium | Medium | Low | SMIRNOFF spec change | Infrastructure SMIRNOFF committee | 1 year | ||||||
Write more blog posts | ||||||||||||
Make it possible to fit a FF in a day or less | ||||||||||||
Jenn’s LJ Mixing adding r^4 term | High | |||||||||||
Fit FF in a Day |
| Done (if we switch to smee) | ||||||||||
Doing a benchmark set | High | |||||||||||
Refit BCCs instead of using AM1BCC |
| |||||||||||
Try new charge models with new functional forms | Already in progress | |||||||||||
Make a concerted effort to drum up materials science funding by putting together a proposal | Out of scope for consortium, PIs should do it. And when we find people who have used OpenFF tools put them in contact with project leadership. | |||||||||||
...
DM – No, only the ones that look promising.
One of the ways we don’t make stuff scalable is by bringing everything in. Dropping dependencies is a better strategy.
If we did WBO again, would have spent more time derisking science before integrating infrastructure.
As we invest effort into an idea and it continues not working, estimate of risk tends to go up.
As a project becomes worth looking at for integration, we define metrics it should hit to be integrated.
Project milestones are great, but Jeff also needs expectation of how much time he should allocate from infrastructure team
James: PTMs are an example of how OpenFF is interfacing with academic teams. Previously, Chapin was doing most of protein FF work; more recently staff has been providing more assistance to reduce barriers to his work.
JW: Was mostly positive, but infrastructure added extra complexity for reproducibility as a cost for speeding things up.
James: We’ll have a workflow in a few hours, but protein FF has been a 1-year project for a long time.
Expansion of virtual sites and partial charges (ongoing work):
Is this the best way to approach it? Are there academic collaborators?
LW: Maybe someone from the Danny Cole group has been working on it?
James: It’s been going on for a long time. Is this still something critical?
DLM: It’s not existential. We haven’t shown that it improves accuracy yet.
What experiments have been done on this, and where would we find the data summarized?
Data is in slides, not papers yet, maybe could write a blog post
It seems like this info isn’t discoverable within our current materials/workflow.
There are slide decks floating around somewhere from Lily
LW: After training parameters as well and adding virtual sites
Should there be a dashboard of active projects and status? This gets into our communications strategy issues.
JE: When to pull the plug on virtual sites?
It looks like in some cases vsites show improvements, TG is following up on some of this work.
Once evaluator is fixed, it should be possible to restart this work in earnest.
We should prioritize things that have the most potential and go for them full-steam, instead of waffling on things repeatedly.
This is why it got deprioritized - Initial improvements turned out to be due to adding a new vdw type during vsite refits, not actually adding the vsite itself.
LW: hasn’t been progress in the last months
NAGL 2.0
Have discussed second-generation NAGL trained on higher level of theory or virtual sites. Is this a priority? Does it seem like it will give us significant additional improvement?
LW: growing evidence that some ff problems we see come from bad charges. I think NAGL 2 would deliver moderate amount of value.
Expand force field quality to nucleic acids and lipids
Lipids is ongoing, but no work started on nucleic acids yet
Ken Takaba has created RNA datasets already within QCArchive
Moderna has contributed specifically to fund this effort, so we are obligated to allocate staff time to it (outside of ongoing effort helping with datasets)
JE: We should allocate staff time to this area because we have money. There is no one currently assigned. Probably a multi year undertaking.
Co-optimization of water
No new features needed in evaluator since we’ll throw out highly viscous compounds. BM is ready to start calcs for new water models. LW has been major staff resource so far. LW and BM agree they should formalize meetings to help this project progress
First step would be to fix some errors in Evaluator
Next step could be to refit using enthalpies of mixing, densities, etc to start co-optimization
There’s an existing workflow in ForceBalance to do this
One Q is which charges we use - library vs. nagl?
Library would be easier during optimization
NAGL would be fully self-consistent
Could fit NAGL charges to condensed phase properties
Need MRS input
GUI/System Builder
Don’t do - Scope would be huge
Get involved directly with drug discovery
The action items are to watch for opportunities and let Karmen know
Better logging in evaluator
MT could work with scientists and add this pretty easily
Adding protocols for osmotic coeffs/rdfs
osmotic coeffs are more valuable than rdf, so if we only do one, let’s do that.
BM has the implementation in mind, would need help with implementation.
Q: How does this relate to fitting? Would we be fitting to osmotic coeffs, or just benchmarking to them?
A: Just benchmarking
This proposal is discussing adding the feature to evaluator, not just a benchmarking suite
BM/Shirts lab should have the capacity to work on this project
ML potentials : Treating (multibody) valence terms with learnable neural functional forms
Who would do this?
No takers currently
OpenFF is interested in looking into this potentially, but not in the short term
Slow for now; class II force fields much more tractable in near term
Clearer communication about strategic goals
Topic of this afternoon’s discussion (after coffee break)
Write our own solvated topology builder / engage with PackMol
Best option is probably to engage with packmol
What do we hope to get from this engagement? I think we need to acknowledge that we own this part of people’s workflows. So one option is to re-implement the packmol algorithm in Rust. Another is to make our own packmol wrapper that we fully commit to. But this causes MT a ton of pain.
JenC: Users providing an equilibrated box
MT: That's not an option we can continue
JW: I disagree. There are other ways to get equilibrated boxes. It's a big thing to own.
JenC: remaking mbuild for ourselves. Has a nice wrapper. Enhanced MC. That approach would be great because of the process of equilibrating the boxes. Huge task.
MRS: Box packing of rigid and flexible. packmol seems to be the best for rigid.
DM: Box packing of things we use in fitting, or boxing things that others want to pack? We can't be doing arbitrary things.
MRS: Can we reach out to the packmol people?
JW: we have things that packmol is not suitable for and we have things that np is suitable for
DM: We should not try to pack everything that anyone wants. We should focus on improving our ability to pack what we care about.
JenC: There is money for polymer builder but not for this.
JE: We cannot take on something that meets everybody's needs
Automatically typed parameters with BeSMARTS
MRS: Julianne already has tools for this
JE: dependent on adopting Smee.
LW: larger in the science team. Cole group could be interested.
JW: Potentially Josh Mitchell could contribute to this, since he’ll have just come off the BeSMARTS documentation effort
Espaloma / NAGL++
Likely would extend NAGL instead of adopting espaloma/espalomax
Would need to be a joint infrastructure/science effort
Cole group was asking enough questions about NAGL that suggests they may be working on something similar
LW: It’s a long term plan
MRS: Many future things we would like to do would be a lot easier in NAGL
Get a better benchmarking set that uses all parameters
Ongoing, mostly science team
Does the XtalPi strategy of fragmenting PubChem help here?
LW – It seemed promising
DM – Could JCl help?
JCl – Yes, though I’m on a one year timeline
JH – I could help too.
Provide minimal estimate of uncertainty of parameterized molecule,
Fit an ensemble of models with any tool, though likely requires SMEE or related faster tool
Could be run by anybody running force field experiments if fitting is fast enough
Would require more statistical rigor of training and test sets, and assessing calibration of error on held-out test set
Check with Danny Cole, since this is something they could start working on with SMEE
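The ensemble idea above can be illustrated with a toy sketch. It assumes nothing about SMEE or any OpenFF tool: the "model" is a one-parameter fit y = k * x to synthetic data, and every function name here is hypothetical. It shows the three pieces mentioned in the notes: fitting several models on resampled training data, using their spread as a minimal uncertainty estimate, and checking that estimate against a held-out test set.

```python
import random
import statistics


def make_toy_data(n, rng, true_slope=2.0, noise=0.1):
    """Synthetic (x, y) pairs; a stand-in for real training or test data."""
    xs = [rng.uniform(1.0, 3.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_slope(xs, ys):
    """Least-squares slope for the toy model y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit n_models toy models, each to a resampled copy of the training set."""
    n = len(xs)
    slopes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


def predict_with_uncertainty(slopes, x):
    """Ensemble mean and spread (standard deviation) of the prediction at x."""
    preds = [k * x for k in slopes]
    return statistics.mean(preds), statistics.stdev(preds)


def coverage(slopes, xs, ys, n_sigma=2.0):
    """Fraction of held-out points within n_sigma ensemble spreads of the mean."""
    hits = 0
    for x, y in zip(xs, ys):
        mean, std = predict_with_uncertainty(slopes, x)
        if abs(y - mean) <= n_sigma * std:
            hits += 1
    return hits / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_y = make_toy_data(40, rng)
    test_x, test_y = make_toy_data(10, rng)
    slopes = bootstrap_ensemble(train_x, train_y, n_models=50, rng=rng)
    mean, std = predict_with_uncertainty(slopes, 2.0)
    print(f"prediction at x=2.0: {mean:.3f} +/- {std:.3f}")
    print(f"held-out coverage at 2 sigma: {coverage(slopes, test_x, test_y):.2f}")
```

In this sketch the ensemble spread only reflects how much the fitted parameter moves between resamples, so held-out coverage can fall well below the nominal level. That gap is the kind of calibration problem the held-out check is meant to expose.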
Automatic iteration process for generating new QC data and fitting a FF
Slight misnomer - The thinking is to detect both valence and nonbonded terms that need improvement and improve them
Need to split into valence and nonbonded fitting
We haven’t fit valence and nonbonded at the same time yet
We have infrastructure to fit QM and condensed phase data simultaneously, but it isn’t automated
Assess what parameters are bad, what data is missing?
LW: There are some scripts, Matt referring to the streamlining ones. A collection, not a simple thing.
MT: After a force field is fit, we run benchmarks. A molecule will be an outlier and we spend time on that. An outcome of that is that we… moiety underrepresenting your data set.
…
LW: Getting QC data would require human steps
MRS: We have scripts that do it, but the failure modes are large. It's manual because when you run this series of scripts they can fail in different ways.
LW: they consider exactly what we put in molssi
JCh:
DM:
JenC: 1 pull request per molecule; it's a lot more work. It requires a lot of time that isn't automatable. Looking at number of basis sets is better. We would have to have an automated system … A lot of infrastructure is needed there to make it automated.
JW: We don't have resources right now. Automation is going in the right direction.
Docked conformation strain estimator using NAGL
Direct polarization supported in infrastructure
We already have this, essentially
Would be great to have the spec in OpenFF XML instead of plugging it in
MT: What level of support from OpenFF infrastructure is needed here?
Adding to spec creates expectation that we put it in toolkit, then that it
Could it sit in a PR until we’re ready to integrate it?
JW: Seems close to phase 1 completion; stage 2 might be to bring it into OpenFF SMIRNOFFPlugins; stage 3 would be SMIRNOFF spec
Question for a future meeting: Should this come into the SMIRNOFF plugins repo?
Switch to using smee instead of ForceBalance
Experiences using smee?
Used it 1-1.5 years ago, was getting k=0 for some torsions
Sounds like torsion issues were removed. smee is a lightweight piece of code. Includes descent (another package) as well. Evaluator isn’t super special, a lot of it is made to handle distributed compute, and we can keep the scientific capabilities without the distributed compute stuff.
Would we do co-optimization of water using smee or forcebalance?
Given timeline of water project (aim to complete this year) we’ll want to use forcebalance.
Will moving to smee break compatibility with some work that uses alternative functional forms? Should be evaluated.
Also, we’ll need to own smee in some form (even if it means forking).
JH and DC are also invested in smee.
Simplify representation in toolkit
Publicly visible way of asking for support and getting tickets
Issue bot knowing if someone is a partner or not
Make a polymer builder
MS group will be working on this, grant goes 3 years
Eric Jankowski’s group involved along with Tim
MRS: we need to know first what OpenFF wants from polymer builder
Unified logging across more tools
JW – I was initially apprehensive about this idea since I thought it was like making new types of objects for provenance of operations, but if it’s more just "import logging" and printing tons of stuff out, then I’m much more receptive.
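As a minimal sketch of the lightweight approach described above, the following uses only Python's standard logging module. The logger name and the pack_box function are hypothetical stand-ins, not part of any OpenFF package. Library code only emits messages through a named logger, and the calling script decides the level and format.

```python
import logging

# Hypothetical logger name, used only for illustration.
logger = logging.getLogger("openff_example.workflow")


def pack_box(n_molecules: int) -> int:
    """Stand-in for a real workflow step; only the logging calls matter here."""
    logger.info("Packing box with %d molecules", n_molecules)
    if n_molecules > 1000:
        logger.warning("Large system (%d molecules); packing may be slow", n_molecules)
    logger.debug("Finished packing")
    return n_molecules


if __name__ == "__main__":
    # The application, not the library, configures handlers and verbosity.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    pack_box(1500)
```

If each tool used a logger under a shared name prefix, a user could raise or lower verbosity for all of them with a single setLevel call on the parent logger.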
Revisiting LJ Mixing rules (Waldman-Hagler)
MRS: How much would it help? If you are dealing with LJ that are about the same size it would not help much … We could make this a rotating student project.
MT – This could be easy if we want to see these as one-offs, there’s just a few places where people would need to add some code.
MRS: push button non-covalent optimization
DM: as long as someone wants to do it we should
MRS: once we can get cooptimized water, we have the infrastructure to do it.
JW: Concern: would this involve getting into a new cluster? My understanding is that things can be hard to set up on a new cluster.
MRS: we need more GPUs
BM – I set up evaluator on the CU Boulder Blanca cluster, it was easy to set this up
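For the mixing-rule discussion above, a small sketch comparing the standard Lorentz-Berthelot rules with the Waldman-Hagler combining rule, which is what the notes appear to refer to. The function names and parameter values are illustrative only and are not taken from any force field. The equal-size case reproduces the point that the two rules agree when the LJ sizes are about the same.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic mean for the size, geometric mean for the well depth."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij


def waldman_hagler(sigma_i, eps_i, sigma_j, eps_j):
    """Sixth-power mean for the size, with a size-dependent well depth."""
    s6_sum = sigma_i**6 + sigma_j**6
    sigma_ij = (s6_sum / 2.0) ** (1.0 / 6.0)
    eps_ij = 2.0 * math.sqrt(eps_i * eps_j) * (sigma_i**3 * sigma_j**3) / s6_sum
    return sigma_ij, eps_ij


if __name__ == "__main__":
    # Illustrative values only (length in angstrom, energy in kcal/mol).
    for pair in [(3.0, 0.10, 3.0, 0.20), (2.0, 0.10, 4.0, 0.20)]:
        print(pair, lorentz_berthelot(*pair), waldman_hagler(*pair))
```

When the two sizes differ, the sixth-power mean gives a larger cross-term size and a shallower well than Lorentz-Berthelot, so any effect would show up mainly for pairs with mismatched sizes.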
Track every custom torsion in Bespoke Fit and append to long running XML
Minimal reimplementation of SMARTS matching algorithms
Add an example using SAGE to score ligand poses
Give clear instructions to cite the FF
Drop idivfs
JW – I might be net opposed to this since it involves a spec change