Info |
---|
This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap can be continuously updated, but the overall status and priorities will be revised and updated in June 2020. |
...
Version | Codename | Features | Expected release date | Comment / Blocker |
---|---|---|---|---|
| Parsley | | Expected: May 2020. Released: Jun 3, 2020. | |
| Parsley | | When ready. September-December 2020. | |
| Sage | | Expected: late 2020 (Nov). Release date to be set by November 2020, with a three-month lead time. Likely release between December 2020 and February 2021. | |
...
Study | Priority | Effort | Science dependencies | Infrastructure dependencies | Comment | Start date | End date | Status | Driver/Team | ||||||||||||||||||||||||
Chemical perception | |||||||||||||||||||||||||||||||||
Addition of new parameters – manually fixing problems |
|
| Made easier by benchmarking dashboard (Optional) | Made easier by benchmarking dashboard (Optional) |
| Hyesu Jang, David Mobley, Jessica Maat (Deactivated), Victoria Lim (Deactivated) | |||||||||||||||||||||||||||
Automated typing inference from scratch |
|
| Organise a meeting to coordinate efforts. Update: Tobias Huefner is doing some basic research, but we don’t have a timeline defined here. Perhaps define a more specific study of typing issues, similar to Schauperl’s work on LJ typing. |
| Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin interested. Trevor Gokey is also actively working in this area. | ||||||||||||||||||||||||||||
Mixture Properties | |||||||||||||||||||||||||||||||||
| In the writing stage. |
| Driver: Simon Boothroyd | ||||||||||||||||||||||||||||||
|
| Parent study for in a long-term progress stage. |
| Driver: Simon Boothroyd | |||||||||||||||||||||||||||||
Chemical potential-like properties |
| Non-bonded optimization | Implementation in | Need to evaluate the data first (testing needed). Add Confluence page here. |
|
| |||||||||||||||||||||||||||
Solvent-solvent partition coefficients |
| Implementation in |
|
|
| ||||||||||||||||||||||||||||
Data coverage and availability |
| Feasibility studies | Check the available data and identify missing data points. How to fill the gaps will be decided later. We will use what we have for Sage. | Ongoing | |||||||||||||||||||||||||||||
QM Data Generation | |||||||||||||||||||||||||||||||||
QM dataset selection (training data) for OpenFF-1.2.0 |
| Need to expand to benchmarking set. |
| ||||||||||||||||||||||||||||||
QM dataset selection for OpenFF-2.0.0 |
|
| |||||||||||||||||||||||||||||||
Benchmarking/re-evaluating our choice of QM theory |
| (Optional) QC Dataset submission infrastructure | Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better.
|
| Hyesu Jang (lead), Lee-Ping Wang. Hyesu Jang is also leading molecule set selection, with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated). | ||||||||||||||||||||||||||||
Protomer/tautomer enumerated molecules |
| QM level of theory validation (QMLoTV) | Protonation/tautomer enumeration integration (Joshua Horton doing OE version in toolkit; there’s currently no good protonation state enumeration with RDKit – see
|
|
| ||||||||||||||||||||||||||||
Data selection for ionic species | What kind of experimental data would we need to include charged molecules? |
| |||||||||||||||||||||||||||||||
Data on molecules with nonzero formal charges |
| QM level of theory validation (QMLoTV) | (Optional) QC Dataset submission infrastructure |
| Pavan | ||||||||||||||||||||||||||||
Enamine REAL fragment coverage |
| Automated fragmentation integration Joshua Horton |
| ||||||||||||||||||||||||||||||
Ligand Expo fragment coverage |
| Automated fragmentation integration Joshua Horton | Ligand Expo has higher priority than Enamine REAL. |
| |||||||||||||||||||||||||||||
Richer torsion data for WBO fitting |
| WBO torsion implementation |
| (Person needed to continue work of Chaya Stern (Deactivated); probably Pavan with input from Jessica Maat (Deactivated), or vice versa. Overseen by Simon Boothroyd?) | |||||||||||||||||||||||||||||
Biopolymer data selection (ensure sidechain data is available in QCA) |
| One dataset ready, but a lot more data needs to be generated if we want sidechain sampling |
| ||||||||||||||||||||||||||||||
Biopolymer data computation |
| (Optional) QC Dataset submission infrastructure |
| ||||||||||||||||||||||||||||||
More efficient torsion sampling with fewer grid points during scan |
|
| |||||||||||||||||||||||||||||||
Fitting | |||||||||||||||||||||||||||||||||
Addition of new parameters – manually fixing problems |
|
| Ongoing |
| Hyesu Jang, David Mobley, Jessica Maat (Deactivated), Victoria Lim (Deactivated) | ||||||||||||||||||||||||||||
LJ refitting (Sage) |
|
| |||||||||||||||||||||||||||||||
WBO refitting (Sage) |
| More torsion data | WBO torsion implementation. Done. | Implement what Chaya has already done. As soon as infrastructure is ready. Done. | After May meeting | Late 2020 (Sep 2020) |
| ||||||||||||||||||||||||||
BCC refitting |
| LJ refit. Patterns for BCCs; could start with something simple like bond SMARTS. | ChargeIncrementModel implementation (early May) |
| |||||||||||||||||||||||||||||
Study how to set prior widths and weights for different sorts of data during FF optimization |
| Lee-Ping Wang, Hyesu Jang. Spinoff? | |||||||||||||||||||||||||||||||
Value of data generated “incidentally” during torsiondrive in fitting, e.g. optimization snapshots, gradients, energies (low control over these data points) |
| Some parts of Bespoke workflow |
|
| |||||||||||||||||||||||||||||
Benchmarking | |||||||||||||||||||||||||||||||||
Small reference systems for fast testing of free energy (FE) infrastructure – 5-10 small systems, possibly a subset of the SAMPL challenges, for comparing different free energy methods without running test calculations on large protein-ligand (P-L) systems |
|
| Should use SAMPLing challenge systems plus a couple more similar ones. | ASAP |
| ||||||||||||||||||||||||||||
Benchmarking/re-evaluating our choice of QM theory |
|
|
| ||||||||||||||||||||||||||||||
CCDC data selection/release |
|
| |||||||||||||||||||||||||||||||
Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board |
| Organise the meeting with our IAB, invite to May meeting. Done. DC and MS will start conversations to get this going. | April / May |
| |||||||||||||||||||||||||||||
| Minor release of Parsley | Benchmarking dashboard | Done in preprint form, but no benchmarking dashboard. Still need torsion benchmarking; utilize work just done for OpenFF 1.0 paper. JDC is trying to get a complete FE set run by D. Rufa. | Mid 2020 | Done-ish | ||||||||||||||||||||||||||||
| Release of Sage | Benchmarking dashboard | Late 2020 |
| |||||||||||||||||||||||||||||
Biopolymers | |||||||||||||||||||||||||||||||||
Which quantum method should we use for biopolymers (should it be the same as small molecules)? |
| QM benchmarking study | Short term – using the same method and same level of theory as ANI (wB97D) |
| |||||||||||||||||||||||||||||
Feasibility/benchmarking studies of torsional CMAPs |
| After protein FF implementation | CMAP support in OFFTK |
| |||||||||||||||||||||||||||||
Feasibility/benchmarking studies of other cross-terms |
| Support for cross-terms in OFFTK | MS – Importance of cross-terms will depend on the number of types |
| |||||||||||||||||||||||||||||
Charges | |||||||||||||||||||||||||||||||||
GCN charge model |
| In a few steps:
|
| John Chodera, Yuanqing Wang, Josh Fass (Deactivated) (maybe John Herr) | |||||||||||||||||||||||||||||
Off-site charge SMIRKS definition/fitting/benchmarking |
|
| VirtualSite support in OFFTK | Helpful discussion in Slack (https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100). Infrastructure expected in September 2020. |
|
| |||||||||||||||||||||||||||
Bayesian inference and surrogate modeling | |||||||||||||||||||||||||||||||||
Testing Bayesian inference on an analytical model |
|
| Nearing completion |
| |||||||||||||||||||||||||||||
Generalizing analytical model for Bayesian inference and testing methods |
|
| Proof-of-concept work to give us an analytical form for early testing |
| Owen Madin (and a student) | ||||||||||||||||||||||||||||
Constructing full Bayesian architecture with reweighting and simulation to build surrogate models |
|
| Analytical Bayesian inference testing | ForceBalance → pytorch, torchMD (timemachine) |
| John Herr, Owen Madin (science, not software) | |||||||||||||||||||||||||||
Automated typing inference from scratch |
|
| Bayesian-based typing (Josh Fass’s work) | ||||||||||||||||||||||||||||||
Other | |||||||||||||||||||||||||||||||||
Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang |
|
| Lack of bandwidth. Bill Swope could potentially advise on data selection. |
| |||||||||||||||||||||||||||||
Thinking about metals / ions / salts |
|
| Biologically relevant; will become high priority at some point |
| |||||||||||||||||||||||||||||
Thinking about ionic liquids | |||||||||||||||||||||||||||||||||
Alchemical force fields (for alchemical free energies) |
| Soft-core potentials. JDC might have people in his lab working on this; MS is interested in joining the effort. | |||||||||||||||||||||||||||||||
Continuous (smearnoff) typing | ESPALOMA |
...