Info
This scientific roadmap includes the next two planned force field releases and a list of the scientific studies that need to be performed in 2020. Each study has a priority assigned to it. This roadmap may be updated continuously, but the overall status and priorities will be revised in June 2020.
Force Fields
Upcoming force field versions:
Version | Codename | Features | Expected release date | Comment / Blocker |
---|---|---|---|---|
| | Parsley | | Expected: May 2020. Released: Jun 3, 2020 | |
| | Parsley | | When ready; September to December 2020 | |
| | Sage | | Expected: release date to be set by November 2020, with a three-month lead time. Likely release between December 2020 and February 2021. | |
Scientific studies
Below is the list of scientific studies that need to be performed in 2020. It will be updated every three months, as suggested in the science project management workflow. Each study should link to its Confluence page with more information about study design, execution, and results. The study design should be submitted before the study begins.
When possible, estimate start and end dates before a study begins. Record the actual start and end dates for each study below the estimated dates.
Labels
Category | Labels |
---|---|
Priority | |
Effort | |
Status | |
Study | Priority | Effort | Science dependencies | Infrastructure dependencies | Comment | Start date | End date | Status | Driver/Team |
Chemical perception |
Addition of new parameters – manually fixing problems |
|
| Made easier by benchmarking dashboard (Optional) | Made easier by benchmarking dashboard (Optional) |
| Hyesu Jang, David Mobley, Jessica Maat (Deactivated), Victoria Lim (Deactivated) |
Automated typing inference from scratch |
|
| Organise a meeting to coordinate efforts. Update: Tobias Huefner is doing some basic research, but we don’t have a timeline defined here. Perhaps a more specific study could look at typing issues, similar to Schauperl’s work on LJ typing. |
| Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin is interested. Trevor Gokey is also actively working in this area. |
Mixture Properties |
| In the writing stage. |
| Driver: Simon Boothroyd |
|
| Parent study for in a long-term progress stage. |
| Driver: Simon Boothroyd |
Chemical potential-like properties |
| Non-bonded optimization | Implementation in | Need to evaluate the data first (testing needed). Add Confluence page here. |
| Simon Boothroyd, spinoff (student) | Octanol-water |
Solvent-solvent partition coefficients |
| Implementation in |
|
| Simon Boothroyd, spinoff (student) |
Data coverage and availability |
| Feasibility studies | Check the available data and identify missing data points; decide later what to do about any gaps. We will use what we have for Sage. | Ongoing |
QM Data Generation |
QM dataset selection (training data) for OpenFF-1.2.0 |
| Need to expand to the benchmarking set. |
QM dataset selection for OpenFF-2.0.0 |
|
| David Mobley, Jessica Maat (Deactivated), Hyesu Jang |
Benchmarking/re-evaluating our choice of QM theory |
| (Optional) QC Dataset submission infrastructure | Test on whole torsiondrives. Keep to 10-50 torsiondrives; more is better. |
|
| Hyesu Jang (lead); Lee-Ping Wang, Pavan Behara. Hyesu Jang is also leading molecule set selection, with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated). |
Protomer/tautomer enumerated molecules |
| QM level of theory validation (QMLoTV) | Protonation/tautomer enumeration integration (Joshua Horton is doing the OE version in the toolkit; there’s currently no good protonation state enumeration with RDKit – see
|
Data selection for ionic species | What experimental data would we need in order to include charged molecules? |
| Lead: Simon Boothroyd (oversight, to assemble team?) |
Data on molecules with nonzero formal charges |
| QM level of theory validation (QMLoTV) | (Optional) QC Dataset submission infrastructure | Coupled to benchmarking. |
Enamine REAL fragment coverage |
| Automated fragmentation integration (Joshua Horton) |
Ligand Expo fragment coverage |
| Automated fragmentation integration (Joshua Horton) | Ligand Expo has higher priority than Enamine REAL. |
Richer torsion data for WBO fitting |
| WBO torsion implementation |
| (Person needed to continue the work of Chaya Stern (Deactivated); probably Pavan Behara with input from Jessica Maat (Deactivated), or vice versa. Overseen by Simon Boothroyd(?), eventually coordinating with Simon Boothroyd as he moves to his new position.) |
Biopolymer data selection (ensure sidechain data is available in QCA) |
| One dataset is ready, but much more data needs to be generated if we want sidechain sampling. |
Biopolymer data computation |
| (Optional) QC Dataset submission infrastructure |
More efficient torsion sampling with fewer grid points during scans |
|
Fitting |
Addition of new parameters – manually fixing problems |
|
| Ongoing |
| Hyesu Jang, David Mobley, Jessica Maat (Deactivated), Victoria Lim (Deactivated) |
LJ refitting (Sage) |
|
WBO refitting (Sage) |
| More torsion data | WBO torsion implementation: Done. | Implement what Chaya has already done, as soon as infrastructure is ready. Infrastructure is now ready. | After May meeting | Late 2020 (Sep 2020) |
| Jessica Maat (Deactivated), Hyesu Jang, someone else to continue where Chaya left off, and Pavan Behara. Pavan Behara will eventually oversee. |
BCC refitting |
| LJ refit | Patterns for BCCs; could start with something simple like bond SMARTS. | ChargeIncrementModel implementation (early May) | Person needed (spinoff); David Mobley can help |
Study how to set prior widths and weights for different sorts of data during FF optimization
Became higher priority due to the need for fixes. Headed towards a 1.3 release in late 2020.
Lee-Ping Wang, Hyesu Jang, spinoff?
Value in fitting of data generated “incidentally” during torsiondrives, e.g. optimization snapshots, gradients, and energies (we have little control over these data points)
Some parts of the bespoke workflow (OpenFF Bespokefit)
Once we have more people working on fitting, someone can run this study.
Benchmarking
Small reference systems for fast testing of FE infrastructure – 5-10 small reference systems, possibly a subset of SAMPL challenge systems, for comparing different free energy methods without having to use large protein-ligand (P-L) systems for test calculations
Should use SAMPLing challenge systems plus a couple more similar ones.
ASAP
Benchmarking/re-evaluating our choice of QM theory
See above; to be done while fitting 2.0 (datasets ready). Hyesu Jang leading.
CCDC data selection/release
Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board
Organise a meeting with our IAB and invite them to the May meeting. Done.
DC and MS will start conversations to get this going.
April / May
openff-1.2.0 (Parsley) benchmarking
Minor release of Parsley
Benchmarking dashboard
Done in preprint form, but without a benchmarking dashboard. Torsion benchmarking is still needed; use the work just done for the OpenFF 1.0 paper.
JDC is trying to get a complete FE set run by D. Rufa.
Mid 2020
Done-ish
openff-2.0.0 (Sage) benchmarking
Release of Sage
Benchmarking dashboard
Late 2020
Biopolymers
Which quantum method should we use for biopolymers (should it be the same as small molecules)?
QM benchmarking study
Short term – using the same method and same level of theory as ANI (wB97D)
Feasibility/benchmarking studies of torsional CMAPs
After protein FF implementation
CMAP support in OFFTK
Feasibility/benchmarking studies of other cross-terms
Support for cross-terms in OFFTK
MS – the importance of cross-terms will be related to the number of types
Charges
GCN charge model
In a few steps:
conda-installable tool to assign charges
integration of the tool into OFFTK under the ChargeIncrementModel keyword (and exposure of the relevant keywords)
John Chodera, Yuanqing Wang, Josh Fass (Deactivated) (maybe John Herr)
Off-site charge SMIRKS definition/fitting/benchmarking
VirtualSite support in OFFTK
Helpful discussion in Slack: https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100
Infrastructure expected in September 2020
Bayesian inference and surrogate modeling
Testing Bayesian inference on an analytical model
Nearing completion
Generalizing analytical model for Bayesian inference and testing methods
We don’t need the Bayesian framework to work immediately. Proof-of-concept work to give us an analytical form for early testing.
slower
Owen Madin (and a student)
Constructing full Bayesian architecture with reweighting and simulation to build surrogate models
Analytical Bayesian inference testing
ForceBalance → pytorch, torchMD (timemachine)
Simon Boothroyd, Owen Madin, Matt Thompson, John Herr(?)
Owen Madin (science, not software)
Automated typing inference from scratch
Full-time person needed – to be discussed further. Work on Bayesian-based typing (Josh Fass’s work)
Josh Fass (Deactivated) and Tobias Wulsdorf may assist here. → Tobias Huefner
Other
Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang
Lack of bandwidth; Bill Swope could potentially help advise on data selection.
Thinking about metals / ions / salts / ionic liquids
Biologically relevant; will become high priority at some point. Owen Madin, Matt Thompson, spinoff
Thinking about ionic liquids
Alchemical force fields (for alchemical free energies)
Soft-core potentials. JDC might have people in his lab working on this; MS is interested in joining the effort.
Continuous (smearnoff) typing
ESPALOMA