Infrastructure Roadmap 2020
List of infrastructure tasks for 2020. Each task should be linked to its Confluence or GitHub page with more information. See also Science Roadmap 2020.
Labels
Category | Labels |
---|---|
Priority | HIGH | MEDIUM | LOW |
Effort | HIGH | MEDIUM | LOW |
Status | Not started | In Progress | PROTOTYPE | Completed | BLOCKED | |
Roadmap
Infrastructure tasks | Priority | Effort | Blocking science? | Infrastructure Dependencies | Start date | End/Due date | Status | Driver |
Architecture / General infrastructure | ||||||||
Low |
|
| Likely not migrating toolkit and forcefields repo until toolkit 1.0, but new packages should follow the |
|
| IN PROGRESS | @Matt Thompson | |
95%+ core package uptime and deployment (OpenFF TK, s99F, OpenFFs) | High | High |
| Will be made easier by conda-forge migration once OpenMM gets moved over | Ongoing |
| IN PROGRESS | @John Chodera @Jeffrey Wagner @Matt Thompson @David Dotson |
Monitoring dashboard, Nightly builds, deployment tests, error severity/triage policy | medium |
|
| Nightly builds will be easier after conda-forge migration | April 2020 |
| IN PROGRESS | @Jaime Rodríguez-Guerra (Deactivated) @Matt Thompson @David Dotson |
High |
| No, but slowing some things/diverting effort from elsewhere. | Chemper package creation; migration of CMILES and Fragmenter functionality into OFFTK | May 2020 |
| IN PROGRESS PROTOTYPE | @Joshua Horton @David Dotson | |
Automate finding FF discrepancies and submitting torsion drives: find more molecules that have underexplored parameters. The tool ingests large molecule datasets, compares OpenFF to ANI energies, and flags the most discrepant molecules for QM calculation. Emphasize plugin architecture so other people can easily add new criteria. Relevant Slack discussion. | medium |
|
|
|
|
|
| @David Dotson @Trevor Gokey |
General “reproducible computation” records and data infrastructure | High |
|
| Interoperable molecule class |
|
|
| @Simon Boothroyd @Joshua Horton |
Bayesian infrastructure: ML frameworks |
|
| Bayesian Fitting | Analytically Differentiable System Object |
|
| BLOCKED |
|
Off-site charges (support for conversion to other packages) | MEDIUM |
|
| Hard to spec without VirtualSite Handler implementation |
|
|
|
|
|
|
| Won’t be open source until fragmenter is refactored to be OE-free; Chemper conda-forge migration; CMILES conda-forge migration; QC submission infrastructure |
|
| IN PROGRESS PROTOTYPE | @Joshua Horton | |
High |
| Analytical parameter gradient-based fitting, possibly other ML or Bayesian optimization routines | (Optional) spec from MolSSI interoperable molecule workgroup; OpenFF-core refactor | March 2020 |
| IN PROGRESS PROTOTYPE | @Matt Thompson | |
HIGH |
|
|
| June 2020 |
| IN PROGRESS PROTOTYPE | @Matt Thompson @Jeffrey Wagner | |
Remove | MEDIUM |
|
|
|
|
|
|
|
Refactor Fragmenter / remove OE dependence / Base off OpenFF Molecule | High |
|
| Graph-based charges/WBOs |
|
|
| @David Dotson @John Chodera @Matt Thompson |
High |
|
|
| Late 2020 |
| IN PROGRESS | @Jaime Rodríguez-Guerra (Deactivated) Peter Eastman; Anthony Scopatz (contracted) @John Chodera @Jeffrey Wagner Levi Naden | |
High |
|
| OpenMM conda-forge migration | Late 2020 |
| IN PROGRESS | @Jaime Rodríguez-Guerra (Deactivated) @Jeffrey Wagner | |
Toolkit | ||||||||
MEDIUM |
| Off-site charge fitting | Likely to be reworked in the long-term to better work with the System object | March 2020 |
| IN PROGRESS | @Trevor Gokey | |
HIGH |
|
|
|
|
| Complete | @Matt Thompson @David Dotson | |
HIGH |
| Biopolymer fitting |
| March 2020 |
| IN PROGRESS | @David Cerutti (Deactivated) @Jeffrey Wagner | |
Polarizability ParameterHandler | LOW |
| Polarizable fitting |
|
|
|
|
|
A deep dive into toolkit parametrization differences (Josh Fass SMIRKS differences) / Automate complaining about cases where incoming molecule/chemistry is bad/misformatted | High |
|
|
|
|
|
| Spinoff (Potentially Shirts lab undergrad?) |
Refactor/make our own Exception hierarchy; implement some problems as catchable warnings. | MEDIUM |
|
|
|
|
| IN PROGRESS | @Matt Thompson @Simon Boothroyd @Jeffrey Wagner |
Implement friendly default behavior, with an option for custom validation logic when loading large datasets/high-volume pipelines. Consider making a moleculefixer for common data problems. | MEDIUM |
|
|
|
|
|
|
|
openforcefield-core/pydantic refactor (possibly driving a SMIRNOFF spec update) | High |
|
| Aromaticity refactor; Stereochemistry refactor |
|
|
|
|
LOW |
|
| RDKit doesn’t have helpful protonation state enumeration; need to publicize and see if the community wants to contribute there: https://github.com/openforcefield/openforcefield/issues/526. Could use EPIK from the Schrödinger suite? Example in OpenMolTools. | Mar 2020 | July 2020 (incomplete) | PROTOTYPE BLOCKED | @Joshua Horton building on work of @Chaya Stern (Deactivated) | |
RDKit stereochemistry and tautomer enumeration | High |
|
| This is implemented in the toolkit; see here
|
|
| Complete | @Jeffrey Wagner |
Interoperable molecule/stereochemistry/aromaticity refactor | MEDIUM |
|
| Need to decide on desired behavior for how stereochemistry and aromaticity are handled. Also need to decide which molecule formats should be losslessly round-trippable. |
|
|
| @Jeffrey Wagner |
Biopolymer infrastructure (SMARTS typing optimization) | High, but can be after protein FF port |
| Biopolymer fitting |
|
| Dec 31 2020 |
| @Jeffrey Wagner |
Biopolymer infrastructure (infra improvement/Topology refactor/automated polymer unit recognition) | High, but can be after protein FF port |
| Biopolymer fitting | Should discuss design with OpenEye |
|
|
|
|
Biopolymer infrastructure (graph charges and/or other scalable solution) | High, but can be after protein FF port |
| Biopolymer charge fitting |
|
|
|
| @Yuanqing Wang @Josh Fass (Deactivated) |
CMAP torsions in OFFTK/SMIRNOFF spec | LOW |
| CMAP fitting |
|
|
|
|
|
Fitting | ||||||||
Migrate FF optimization to ML framework | HIGH |
|
|
|
|
|
|
|
Automate fitting infrastructure, remove OE dependencies | HIGH |
|
| QC Submission infrastructure (for QCMol-->graph mol conversion) |
|
|
| @Jeffrey Wagner @Hyesu Jang |
PE parallelization (Fractalization? Key-value store in cloud? F@H? etc) | HIGH |
|
| MolSSI packaging Fractal separately from QCFractal |
|
|
| @David Dotson @John Chodera @Simon Boothroyd |
Benchmarking | ||||||||
H-G benchmarking | Medium |
|
|
|
|
|
|
|
Medium |
|
|
| Mid 2019? |
| IN PROGRESS | @David Hahn | |
Property estimator mixed FF tests (mix AMBER and SMIRNOFF system components) | LOW |
| This may not be necessary. |
|
|
|
|
|
Automated benchmarking + dashboard. May include geometry tools (MM minimization, conformer generation, torsion scanning, conformer scoring) | HIGH |
|
| (Optional) Reliable QCMol → OFFMol conversion/CMILES deviation checks | ??? |
| IN PROGRESS | Dashboard: @Jaime Rodríguez-Guerra (Deactivated) @David Dotson @John Chodera @Trevor Gokey |
Documentation / Community / Training | ||||||||
Docs cleanup and Binder-izing all examples | Medium |
|
|
|
|
| IN PROGRESS | @Matt Thompson (binder) |
Developers guide + true community contributions/branch OE license issue resolution | HIGH |
|
|
|
| Dev docs will be a living document | IN PROGRESS |
|
Training a 50% QCA developer | Medium |
|
|
| April 2020 |
| IN PROGRESS | @David Dotson |
Compute hosting for bespoke workflow on Hypernet Labs' Galileo Platform | Medium |
|
| Bespoke workflow prototype |
|
| IN PROGRESS | @Joshua Horton @Jeffrey Wagner |
|
|
| A way to create CHARMM residue template files (ParmEd Issue #1103) |
|
| BLOCKED |
| |
External |
|
|
|
|
| IN PROGRESS | Marti Municoy; Victor Guallar; @Jeffrey Wagner @David Mobley |