This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap can be continuously updated, but the overall status and priorities will be revised and updated in June 2020.
Force Fields
Upcoming force field versions:
Version | Codename | Features | Expected release date |
---|---|---|---|
| Parsley | | May 2020 |
| Sage | | Late 2020 (November) |
Scientific studies
The following list covers the scientific studies that need to be performed in 2020. It will be updated every 3 months, as suggested in the science project management workflow. Each study should be linked to its Confluence page with more information about study design, execution and results. The study design should be submitted before the study begins.
Where possible, estimate start and end dates before a study starts. Record the actual start and end dates for each study below the estimated dates.
Labels
Category | Labels |
---|---|
Priority | HIGH | MEDIUM | LOW |
Effort | HIGH | MEDIUM | LOW |
Status | NOT STARTED | IN PROGRESS | COMPLETED |
Study | Priority | Effort | Science dependencies | Infrastructure dependencies | Comment | Start date | End date | Status | Driver/Team |
---|---|---|---|---|---|---|---|---|---|
Chemical perception | |||||||||
Addition of new parameters – manually fixing problems | HIGH | HIGH | Made easier by benchmarking dashboard (Optional) | Made easier by benchmarking dashboard (Optional) | IN PROGRESS | Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated) | |||
Automated typing inference from scratch | MEDIUM | HIGH | Organise a meeting to coordinate efforts. | Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin interested. | |||||
Mixture Properties | |||||||||
HIGH | IN PROGRESS | Driver: Simon Boothroyd | |||||||
HIGH | HIGH | IN PROGRESS | Driver: Simon Boothroyd | ||||||
Chemical potential-like properties | MEDIUM | Non-bonded optimization | Implementation in | Need to evaluate the data first (testing needed) | IN PROGRESS | Simon Boothroyd Spinoff (student) | |||
Octanol-water partition coefficients | MEDIUM | Implementation in | Data needed, harder problem | NOT STARTED | Simon Boothroyd spinoff (Student) | ||||
Data coverage and availability | HIGH | Feasibility studies | Check the available data and identify missing data points. Decide later how to handle the gaps. We will use what we have for Sage. | Ongoing | |||||
QM Data Generation | |||||||||
QM dataset selection (training data) | HIGH | Need to expand to benchmarking set. | IN PROGRESS | ||||||
Benchmarking/re-evaluating our choice of QM theory | HIGH | (Optional) QC Dataset submission infrastructure | Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better. | IN PROGRESS | Hyesu Jang (lead), Lee-Ping Wang. Hyesu Jang is also leading molecule set selection with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated) | ||||
Protomer/tautomer enumerated molecules | HIGH | QM level of theory validation (QMLoTV) | Protonation/tautomer enumeration integration (Joshua Horton doing OE version in toolkit; there’s currently no good protonation state enumeration with RDKit – see ) | ||||||
Data on molecules with nonzero formal charges | HIGH | QM level of theory validation (QMLoTV) | (Optional) QC Dataset submission infrastructure | ||||||
Enamine REAL fragment coverage | MEDIUM | Automated fragmentation integration (Joshua Horton) | |||||||
Ligand Expo fragment coverage | MEDIUM | Automated fragmentation integration (Joshua Horton) | Ligand Expo has higher priority than Enamine REAL. | ||||||
Richer torsion data for WBO fitting | LOW | WBO torsion implementation | (Person needed to continue the work of Chaya Stern (Deactivated); probably Pavan with input from Jessica Maat (Deactivated), or vice versa. Overseen by Simon Boothroyd?) | ||||||
Biopolymer data selection (ensure sidechain data is available in QCA) | HIGH | ASAP | |||||||
Biopolymer data computation | MEDIUM | (Optional) QC Dataset submission infrastructure | |||||||
More efficient torsion sampling with fewer grid points during scan | LOW | SPINOFF | |||||||
Fitting | |||||||||
Addition of new parameters – manually fixing problems | HIGH | HIGH | Ongoing | IN PROGRESS | Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated) | ||||
LJ refitting (Sage) | HIGH | IN PROGRESS | |||||||
WBO refitting (Sage) | HIGH | More torsion data | WBO torsion implementation | Implement what Chaya has already done. As soon as infrastructure is ready. | After May meeting | Late 2020 (Sep 2020) | Jessica Maat (Deactivated) Hyesu Jang Someone else to continue where Chaya left off | ||
BCC refitting | HIGH | LJ refit. Patterns for BCCs; could start with something simple like bond SMARTS. | ChargeIncrementModel implementation (early May) | Person needed (SPINOFF); David Mobley can help | |||||
Study how to set prior widths and weights for different sorts of data during FF optimization | LOW | Lee-Ping Wang Hyesu Jang Spinoff? | |||||||
Value of data generated “incidentally” during torsiondrive in fitting, e.g. optimization snapshots, gradients, energies (low control over these data points) | LOW | Some parts of Bespoke workflow | Joshua Horton SPINOFF | ||||||
Benchmarking | |||||||||
Small reference system for fast testing of FE infrastructure – 5-10 small reference systems, possibly subset of SAMPL challenges, for comparison of different free energy methods to avoid using large P-L systems for test calculations | HIGH | LOW | Should use SAMPLing challenge systems plus a couple more similar ones. | ASAP | NOT STARTED | ||||
Benchmarking/re-evaluating our choice of QM theory | HIGH | MEDIUM | NOT STARTED | ||||||
CCDC data selection/release | LOW | SPINOFF | |||||||
Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board | MEDIUM | Organise the meeting with our IAB, invite to May meeting | April / May | ||||||
| Minor release of Parsley | Benchmarking dashboard | Done in preprint form, but no benchmarking dashboard. Still need torsion benchmarking; utilize work just done for OpenFF 1.0 paper. | Mid 2020 | Done-ish | ||||
| Release of Sage | Benchmarking dashboard | Late 2020 | ||||||
Biopolymers | |||||||||
Which quantum method should we use for biopolymers (should it be the same as small molecules)? | MEDIUM | QM benchmarking study | |||||||
Feasibility/benchmarking studies of torsional CMAPs | MEDIUM | After protein FF implementation | CMAP support in OFFTK | ||||||
Feasibility/benchmarking studies of other cross-terms | LOW | Support for cross-terms in OFFTK | |||||||
Charges | |||||||||
GCN charge model | HIGH | In a few steps: | IN PROGRESS | John Chodera Yuanqing Wang | |||||
Off-site charge SMIRKS definition/fitting/benchmarking | MEDIUM | HIGH | VirtualSite support in OFFTK | Helpful discussion in Slack: https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100 | SPINOFF (but interface with David Cerutti (Deactivated) work?) | ||||
Bayesian inference and surrogate modeling | |||||||||
Testing Bayesian inference on an analytical model | MEDIUM | LOW | IN PROGRESS | ||||||
Generalizing analytical model for Bayesian inference and testing methods | MEDIUM | MEDIUM | We don’t need Bayesian framework to work immediately | IN PROGRESS | |||||
Constructing full Bayesian architecture with reweighting and simulation to build surrogate models | MEDIUM | HIGH | Analytical Bayesian inference testing | NOT STARTED | |||||
Automated typing inference from scratch | HIGH | HIGH | Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Wulsdorf may assist here. | ||||||
Other | |||||||||
Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang | LOW | HIGH | spinoff | ||||||
Thinking about metals / ions / salts / ionic liquids | LOW | HIGH | Owen Madin Matt Thompson spinoff |