OpenFF Science Roadmap 2020
This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap will be continuously updated.
Force Fields
Upcoming force field versions:
| Version | Codename | Features | Expected release date | Comment / Blocker |
|---|---|---|---|---|
| | Parsley | | Expected: May 2020. Released: Jun 3, 2020. | |
| | Parsley | | When ready. September-December 2020. | |
| | Sage | | Expected: release date to be set by November 2020, with a three-month lead time. Likely release between December 2020 and February 2021. | |
Scientific studies
This is the list of scientific studies that need to be performed in 2020. It will be updated every 3 months, as suggested in the science project management workflow. Each study should be linked to its Confluence page with more information about study design, execution, and results. The study design should be submitted before the study begins.
Where possible, estimate the start and end dates before a study starts. Record the actual start and end dates for each study below the estimated dates.
Labels
| Category | Labels |
|---|---|
| Priority | High, Medium, Low |
| Effort | High, Medium, Low |
| Status | Not started, In progress, Prototype, Completed, Blocked |
Study | Priority | Effort | Science dependencies | Infrastructure dependencies | Comment | Start date | End date | Status | Driver/Team |
Chemical perception | |||||||||
Addition of new parameters – manually fixing problems
| high | high |
| Made easier by benchmarking dashboard (Optional) | Made easier by benchmarking dashboard (Optional) |
|
| in progress | @Hyesu Jang @David Mobley @Jessica Maat (Deactivated) @Victoria Lim (Deactivated) |
Automated typing inference from scratch | Medium | high |
|
| Organise a meeting to coordinate efforts. Update: @Tobias Huefner is doing some basic research, but we don’t have a timeline defined here. Perhaps a more specific study to look at typing issues similar to Schauperl’s work on LJ typing. |
|
| in progress (slowly) | Full-time person needed – to be discussed further. Work of @Josh Fass (Deactivated) and @Tobias Huefner may assist here. @Owen Madin interested. @Trevor Gokey is also actively working in this area. |
Mixture Properties | |||||||||
high |
|
|
| In the writing stage. |
|
| COMPLETED | Driver: @Simon Boothroyd | |
high | high |
|
| Parent study for in a long-term progress stage. |
|
| in progress | Driver: @Simon Boothroyd | |
Chemical potential-like properties | Medium |
| Non-bonded optimization | Implementation in | Need to evaluate the data first (testing needed). Add Confluence page here. |
|
| In Progress PROTOTYPE | @Simon Boothroyd spinoff |
Solvent-solvent partition coefficients | Medium |
|
| Implementation in |
|
|
| not started | @Simon Boothroyd spinoff |
Data coverage and availability | high |
| Feasibility studies |
| Check the available data and identify missing data points. How to fill the gaps will be decided later; for Sage we will use what we have. |
|
| Ongoing | @Simon Boothroyd @Owen Madin @Michael Shirts |
QM Data Generation | |||||||||
QM Data Selection (training data) for OpenFF-1.2.0 | high |
|
|
| Need to expand to benchmarking set. |
|
| completed | @David Mobley @Jessica Maat (Deactivated) @Hyesu Jang |
QM dataset selection for OpenFF-2.0.0 | high |
|
|
|
|
|
| in progress | @David Mobley @Jessica Maat (Deactivated) @Hyesu Jang |
Benchmarking/re-evaluating our choice of QM theory | high |
|
| (Optional) QC Dataset submission infrastructure | Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better.
|
|
| in progress | @Hyesu Jang lead; @Lee-Ping Wang @Pavan Behara @Hyesu Jang also leading molecule set selection with help from @Jessica Maat (Deactivated) and @Victoria Lim (Deactivated) |
Protomer/tautomer enumerated molecules | high |
| QM level of theory validation (QMLoTV) | Protonation/tautomer enumeration integration (@Joshua Horton doing OE version in toolkit; there’s currently no good protonation state enumeration with RDKit – see |
|
|
| Prototyped | @Joshua Horton |
Data selection for ionic species |
|
|
|
| What kind of experimental data would we need to include charged molecules? |
|
| Not started | Lead: @Simon Boothroyd (oversight, to assemble team?) |
Data on molecules with nonzero formal charges | high |
| QM level of theory validation (QMLoTV) | (Optional) QC Dataset submission infrastructure | Couples to benchmarking |
|
| Not started | @Pavan Behara |
Enamine REAL fragment coverage | MEDIUM |
|
| Automated fragmentation integration @Joshua Horton |
|
|
| in progress | @Trevor Gokey |
Ligand Expo fragment coverage | MEDIUM |
|
| Automated fragmentation integration @Joshua Horton | Ligand Expo has higher priority than Enamine Real. |
|
| Not started |
|
Richer torsion data for WBO fitting | Low |
|
| WBO torsion implementation |
|
|
|
| (Person needed to continue the work of @Chaya Stern (Deactivated). This will be @Pavan Behara with @Jessica Maat (Deactivated), eventually coordinating with @Simon Boothroyd as he moves to his new position.) |
Biopolymer data selection (ensure sidechain data is available in QCA) | high |
|
|
| One dataset ready, but a lot more data needs to be generated if we want sidechain sampling |
|
| in progress | @David Cerutti (Deactivated) |
Biopolymer data computation | MEDIUM |
|
| (Optional) QC Dataset submission infrastructure |
|
|
| in progress | @David Cerutti (Deactivated) @David Dotson |
More efficient torsion sampling with fewer grid points during scan | Low |
|
|
|
|
|
|
| spinoff |
Fitting | |||||||||
Addition of new parameters – manually fixing problems | high | high |
|
|
| Ongoing |
| in progress | @Hyesu Jang @David Mobley @Jessica Maat (Deactivated) @Victoria Lim (Deactivated) |
LJ refitting (Sage) | high |
|
|
|
|
| in progress | @Simon Boothroyd and @Owen Madin | |
WBO refitting (Sage) | high |
| More torsion data | WBO torsion implementation: Done. | Implement what Chaya has already done. Infrastructure ready. @Pavan Behara and @Jessica Maat (Deactivated) to divide up chemical space for fitting/test fixes. @Hyesu Jang will do central fitting of production FF. Additional discussion: https://openforcefield.atlassian.net/wiki/spaces/OFFO/pages/670629936/90-day+plan+Onboarding+for+Pavan+Behara?accessType=view&grantAccess=true&username=5f592e4d0b6be2006ef56679&userFullName=Pavan%20Behara#Notes-from-9%2F16%2F20-planning-meeting-(feel-free-to-migrate) | After May meeting | Late 2020 (Sep 2020) | in progress | @Jessica Maat (Deactivated) and @Pavan Behara . @Pavan Behara will eventually oversee. |
BCC refitting | high |
| LJ refit Patterns for BCCs; could start with something simple like bond SMARTS. | ChargeIncrementModel implementation (early May) |
|
|
| in progress | @Simon Boothroyd @Owen Madin |
Study how to set prior widths and weights for different sorts of data during FF optimization | medium |
| |||||||