This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap can be continuously updated, but the overall status and priorities will be revised and updated in June 2020.

Force Fields

Upcoming force field versions:

    Version

    Codename

    Features

    Expected release date

    Comment / Blocker

    openff-1.2.0

    Parsley

    • Redesigned QM dataset for parameterization with better/broader coverage

    • Parameter fixes

    Expected: May/June 2020

    Released: Jun 3, 2020

    openff-1.3.0

    Parsley

    • Improvements to priors by element, etc., allowing for more accurate fitting

    • Filtering of molecules used in fitting certain torsions (esp. for amides) to improve parameter quality (came up in the process of fixing a problem elsewhere)

    • Discussed in this release call: 2020-09-17 FF Release Meeting notes

    When ready. September-December 2020.

    openff-2.0.0

    Sage

    • LJ refit (based on the ongoing feasibility study)

    • Limited WBO torsion interpolation for systems for which data already exists (more torsional data is needed for broader application)

    Expected: Release date to be set by November 2020, with a three-month lead time. Likely release between December 2020 and February 2021.

    • How fast can we do WBO interpolations (Pavan)

    • Simon Boothroyd needs to get in touch with David Hahn and folks from the Chodera lab to discuss some free energy benchmarking after LJ fitting

    • Late 2020/early 2021 still feasible
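For orientation, the WBO torsion interpolation listed for Sage can be pictured as a torsion force constant that varies with the Wiberg bond order of the central bond. The snippet below is a minimal sketch assuming simple linear interpolation between two reference bond orders; the function name, the reference bond orders, and the k values are hypothetical illustrations, not fitted parameters or the toolkit's API.

```python
def interpolate_torsion_k(wbo, bo1, k1, bo2, k2):
    """Linearly interpolate (or extrapolate) a torsion force constant.

    wbo      -- Wiberg bond order of the central bond (hypothetical input)
    bo1, k1  -- first reference bond order and its force constant
    bo2, k2  -- second reference bond order and its force constant
    """
    if bo1 == bo2:
        raise ValueError("reference bond orders must differ")
    slope = (k2 - k1) / (bo2 - bo1)
    return k1 + slope * (wbo - bo1)


# Hypothetical reference points: k = 1.0 at bond order 1, k = 5.0 at bond order 2.
k_partial = interpolate_torsion_k(1.5, 1.0, 1.0, 2.0, 5.0)
print(k_partial)
```

A bond order between the two reference points yields a force constant between k1 and k2; values outside that range are extrapolated along the same line.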

    Scientific studies

    The list below covers the scientific studies that need to be performed in 2020; it will be updated every 3 months, as suggested in the science project management workflow. Each study should be linked to its Confluence page with more information about study design, execution, and results. The study design should be submitted before the study begins.

    Where possible, estimate start and end dates before a study starts. Record the actual start and end dates for each study below the estimated dates.

    Labels

    Category | Labels
    Priority | HIGH (red), MEDIUM (yellow), LOW (green)
    Effort | HIGH (red), MEDIUM (yellow), LOW (green)
    Status | NOT STARTED, IN PROGRESS (yellow), PROTOTYPE (blue), COMPLETED (green), BLOCKED (red)
    ...

    Study

    Priority

    Effort

    Science dependencies

    Infrastructure dependencies

    Comment

    Start date

    End date

    Status

    Driver/Team

    Chemical perception

    Addition of new parameters – manually fixing problems

    Status
    colourRed
    titlehigh

    Status
    colourRed
    titlehigh

    Made easier by benchmarking dashboard (Optional)

    Status
    colourYellow
    titlein progress

    Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)

    Automated typing inference from scratch

    Status
    colourYellow
    titleMedium

    Status
    colourRed
    titlehigh

    Organise a meeting to coordinate efforts.

    Update: Tobias Huefner is doing some basic research, but we don’t have a timeline defined here. Perhaps define a more specific study of typing issues, similar to Schauperl’s work on LJ typing.

    Status
    colourYellow
    titlein progress
    Slowly

    Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin is interested. Trevor Gokey is also actively working in this area.

    Mixture Properties

    Binary Mixture Data Feasibility Study

    Status
    colourRed
    titlehigh

    In the writing stage.

    Status
    colourGreen
    titleCOMPLETED

    Driver: Simon Boothroyd
    Team: Michael Shirts Owen Madin

    Non-bonded optimization

    Status
    colourRed
    titlehigh

    Status
    colourRed
    titlehigh

    Parent study; in a long-term progress stage.

    Status
    colourYellow
    titlein progress

    Driver: Simon Boothroyd
    Team: Michael Shirts Owen Madin

    Chemical potential-like properties

    Status
    colourYellow
    titleMEDIUM

    Non-bonded optimization

    Implementation in Evaluator

    Need to evaluate the data first (testing needed). Add Confluence page here.

    Status
    colourYellow
    titleIn Progress

    Status
    colourBlue
    titlePROTOTYPE

    Simon Boothroyd

    Status
    colourPurple
    titlespinoff

    Solvent-solvent partition coefficients

    Status
    colourYellow
    titleMEDIUM

    Implementation in Evaluator

    • Data needed, harder problem.

    • Update: with access to solubility phase data (MNSOL), data availability is less of a problem now

    Status
    titlenot started

    Simon Boothroyd

    Status
    colourPurple
    titlespinoff

    Data coverage and availability

    Status
    colourRed
    titlehigh

    Feasibility studies

    Check the available data and identify missing data points. What to do about the gaps can be decided later; for Sage we will use what we have.

    Ongoing

    Simon Boothroyd Owen Madin Michael Shirts

    QM Data Generation

    QM dataset selection (training data) for OpenFF-1.2.0

    Status
    colourRed
    titlehigh

    Need to expand to benchmarking set.

    Status
    colourGreen
    titlecompleted

    David Mobley Jessica Maat (Deactivated) Hyesu Jang

    QM dataset selection for OpenFF-2.0.0

    Status
    colourRed
    titlehigh

    Status
    colourYellow
    titlein progress

    David Mobley Jessica Maat (Deactivated) Hyesu Jang
    Lead: Hyesu Jang ??

    Benchmarking/re-evaluating our choice of QM theory

    Status
    colourRed
    titlehigh

    (Optional) QC Dataset submission infrastructure

    Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better.

    • Some datasets are ready, but analysis is still required (Hyesu Jang). Pavan might help with this.

    • No changes will be made to our fitting data until after the 2.0 release, but a decision may be made before that (e.g., while fitting)

    Status
    colourYellow
    titlein progress

    Hyesu Jang lead; Lee-Ping Wang Pavan Behara

    Hyesu Jang also leading molecule set selection with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated)

    Protomer/tautomer enumerated molecules

    Status
    colourRed
    titlehigh

    QM level of theory validation (QMLoTV)

    Protonation/tautomer enumeration integration (Joshua Horton is doing the OE version in the toolkit; there’s currently no good protonation state enumeration with RDKit – see https://github.com/openforcefield/openforcefield/issues/526)

    • Enumeration currently works only with OpenEye

    Status
    colourBlue
    titlePROTOTYPED

    Joshua Horton

    Data selection for ionic species

    What kind of experimental data would we need to include charged molecules?

    Status
    titleNot started

    Lead: Simon Boothroyd (oversight, to assemble team?)

    Data on molecules with nonzero formal charges

    Status
    colourRed
    titlehigh

    QM level of theory validation (QMLoTV)

    (Optional) QC Dataset submission infrastructure

    Couples to benchmarking

    Status
    titleNot started

    Pavan Behara

    Enamine REAL fragment coverage

    Status
    colourYellow
    titleMEDIUM

    Automated fragmentation integration Joshua Horton

    Status
    colourYellow
    titlein progress

    Trevor Gokey

    Ligand Expo fragment coverage

    Status
    colourYellow
    titleMEDIUM

    Automated fragmentation integration Joshua Horton

    Ligand Expo has higher priority than Enamine REAL.

    Status
    titleNot started

    Richer torsion data for WBO fitting

    Status
    colourGreen
    titleLow

    WBO torsion implementation

    • What data to generate and

    (Person needed to continue the work of Chaya Stern (Deactivated); this will probably be Pavan Behara with input from Jessica Maat (Deactivated), or vice versa. Overseen by Simon Boothroyd (?), eventually coordinating with Simon Boothroyd as he moves to his new position.)

    Biopolymer data selection (ensure sidechain data is available in QCA)

    Status
    colourRed
    titlehigh

    One dataset ready, but a lot more data needs to be generated if we want sidechain sampling

    Status
    colourYellow
    titlein progress

    David Cerutti (Deactivated)

    Biopolymer data computation

    Status
    colourYellow
    titleMEDIUM

    (Optional) QC Dataset submission infrastructure

    Status
    colourYellow
    titlein progress

    David Cerutti (Deactivated) David Dotson

    More efficient torsion sampling with fewer grid points during scan

    Status
    colourGreen
    titleLow

    Status
    colourPurple
    titlespinoff

    Fitting

    Addition of new parameters – manually fixing problems

    Status
    colourRed
    titlehigh

    Status
    colourRed
    titlehigh

    Ongoing

    Status
    colourYellow
    titlein progress

    Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)

    LJ refitting (Sage)

    Status
    colourRed
    titlehigh

    Non-bonded optimization

    Status
    colourYellow
    titlein progress

    Simon Boothroyd and Owen Madin

    WBO refitting (Sage)

    Status
    colourRed
    titlehigh

    More torsion data

    WBO torsion implementation: done.

    Implement what Chaya has already done, as soon as infrastructure is ready. Done: infrastructure is ready.

    Pavan Behara and Jessica Maat (Deactivated) to divide up chemical space for fitting/test fixes. Hyesu Jang will do central fitting of production FF. Additional discussion: https://openforcefield.atlassian.net/wiki/spaces/OFFO/pages/670629936/90-day+plan+Onboarding+for+Pavan+Behara?accessType=view&grantAccess=true&username=5f592e4d0b6be2006ef56679&userFullName=Pavan%20Behara#Notes-from-9%2F16%2F20-planning-meeting-(feel-free-to-migrate)

    After May meeting

    Late 2020 (Sep 2020)

    Status
    colourYellow
    titlein progress

    Jessica Maat (Deactivated), Hyesu Jang, and Pavan Behara. Pavan Behara will eventually oversee.

    BCC refitting

    Status
    colourRed
    titlehigh

    LJ refit

    Patterns for BCCs; could start with something simple like bond SMARTS.

    ChargeIncrementModel implementation (early May)

    Status
    colourYellow
    titlein progress

    Simon Boothroyd Owen Madin

    Study how to set prior widths and weights for different sorts of data during FF optimization

    Status
    colourYellow
    titlemedium

    Became higher priority due to need for fixes. Headed towards a 1.3 release late 2020.

    Status
    colourYellow
    titlein progress

    Lee-Ping Wang Hyesu Jang Spinoff?

    Value of data generated “incidentally” during torsiondrive in fitting, e.g. optimization snapshots, gradients, energies (low control over these data points)

    Status
    colourGreen
    titleLow

    Some parts of the bespoke workflow (OpenFF Bespokefit)

    • Once we have more people working on fitting, someone can run this study

    Joshua Horton (question)

    Status
    colourPurple
    titlespinoff

    Benchmarking

    Small reference system for fast testing of FE infrastructure – 5-10 small reference systems, possibly subset of SAMPL challenges, for comparison of different free energy methods to avoid using large P-L systems for test calculations

    Status
    colourGreen
    titleLow

    Status
    colourGreen
    titleLow

    Should use SAMPLing challenge systems plus a couple more similar ones.

    ASAP

    Status
    titleNot started

    Benchmarking/re-evaluating our choice of QM theory

    Status
    colourRed
    titlehigh

    Status
    colourYellow
    titlemedium

    See above; to be done while fitting 2.0 (datasets ready). Hyesu Jang leading.

    Status
    titleNot started

    Lee-Ping Wang (question) Hyesu Jang Pavan Behara

    CCDC data selection/release

    Status
    colourGreen
    titleLow

    Status
    colourPurple
    titlespinoff

    Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board

    Status
    colourRed
    titleHIGH

    Organise the meeting with our IAB, invite to May meeting. Done.

    DC and MS will start conversations to get this going.

    April / May

    Status
    colourYellow
    titlein progress

    David Cerutti (Deactivated) Michael Shirts

    openff-1.2.0 (Parsley) benchmarking

    Minor release of Parsley

    Benchmarking dashboard

    Done in preprint form, but no benchmarking dashboard. Still need torsion benchmarking; utilize work just done for OpenFF 1.0 paper.

    JDC is trying to get a complete FE set run by D. Rufa.

    Mid 2020

    Done-ish

    openff-2.0.0 (Sage) benchmarking

    Release of Sage

    Benchmarking dashboard

    Late 2020

    Status
    titleNot started

    Biopolymers

    Which quantum method should we use for biopolymers (should it be the same as small molecules)?

    Status
    colourYellow
    titlemedium

    QM benchmarking study

    Short term – using the same method and same level of theory as ANI (wB97D)

    Status
    titleNot started

    Lee-Ping Wang David Cerutti (Deactivated)

    Feasibility/benchmarking studies of torsional CMAPs

    Status
    colourYellow
    titlemedium

    After protein FF implementation

    CMAP support in OFFTK

    Status
    titleNot started

    David Cerutti (Deactivated)

    Feasibility/benchmarking studies of other cross-terms

    Status
    colourGreen
    titleLow

    Support for cross-terms in OFFTK

    MS – the importance of cross-terms will be related to the number of types

    Status
    titleNot started

    Charges

    GCN charge model

    Status
    colourRed
    titlehigh

    In a few steps:

    • conda-installable tool to assign charges

    • integration of tool into OFFTK under ChargeIncrementModel keyword (and exposure of relevant keywords)

    Status
    colourYellow
    titlein progress

    John Chodera Yuanqing Wang Josh Fass (Deactivated) (maybe John Herr)

    Off-site charge SMIRKS definition/fitting/benchmarking

    Status
    colourYellow
    titlemedium

    Status
    colourRed
    titlehigh

    VirtualSite support in OFFTK

    Helpful discussion in Slack: https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100

    Infrastructure expected in September 2020

    Status
    titleNot started

    Status
    colourPurple
    titlespinoff
    (but interface with David Cerutti (Deactivated) work?)

    Bayesian inference and surrogate modeling

    Testing Bayesian inference on an analytical model

    Status
    colourYellow
    titlemedium

    Status
    colourGreen
    titleLow

    Nearing completion

    Status
    colourYellow
    titlein progress

    Owen Madin

    Generalizing analytical model for Bayesian inference and testing methods

    Status
    colourRed
    titleLOW

    Status
    colourYellow
    titlemedium

    Proof-of-concept work to give us an analytical form for early testing

    Status
    colourYellow
    titlein progress
    slower

    Owen Madin (and a student)

    Constructing full Bayesian architecture with reweighting and simulation to build surrogate models

    Status
    colourRed
    titleLOW

    Status
    colourRed
    titlehigh

    Analytical Bayesian inference testing

    ForceBalance → pytorch, torchMD (timemachine)

    Status
    titleNot started

    John Herr (question)

    Owen Madin (science, not software)

    Automated typing inference from scratch

    Status
    colourRed
    titlehigh

    Status
    colourRed
    titlehigh

    Bayesian-based typing (Josh Fass’s work)

    Josh Fass (Deactivated) Tobias Huefner

    Other

    Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang

    Status
    colourGreen
    titleLow

    Status
    colourRed
    titlehigh

    Lack of bandwidth, potentially Bill Swope could help advise with data selection.

    Status
    colourPurple
    titlespinoff

    Thinking about metals / ions / salts

    Status
    colourGreen
    titleLow

    Status
    colourRed
    titlehigh

    Biologically relevant, will become high priority at some point

    Status
    colourPurple
    titlespinoff

    Thinking about ionic liquids

    Alchemical force fields (for alchemical free energies)

    Status
    colourGreen
    titleLow
    /
    Status
    colourYellow
    titleMEDIUM

    Soft core potentials. JDC might have people in his lab working on it, MS is interested to join the effort.

    Continuous (smearnoff) typing

    ESPALOMA

    Yuanqing Wang

    ...